Simple Image Based Persistence in Squeak
One of the nicest things about prototyping in Smalltalk is that you can delay the need to hook up a database during much of your development, and if you’re lucky, possibly even forever.
It’s a mistake to assume every application needs a relational database, or even a proper database at all. It’s all too common for developers to wield a relational database as a golden hammer that solves all problems, but for many applications it introduces a level of complexity that can make development feel like wading through a pond full of molasses, where you spend much of your time trying to keep the database schema and the object schema in sync. It kills both productivity and fun, and god dammit, programming should be fun!
This complexity is sometimes justified, but many times it’s not. Many business applications and prototypes are built to replace manual processes using Email, Word, and Excel. Word and Excel, by the way, aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses. MySQL became wildly popular long before it supported transactions, so it’s pretty clear a wide range of apps just don’t need that, no matter how much relational weenies say it’s required.
It shouldn’t come as a surprise that one can take a single step up the complexity ladder and build simple applications that aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses BETTER than Word and Excel, while purposely not taking a further step up to a real database, which would introduce a level of complexity that might blow the budget and make the app infeasible.
No object relational mapping layer (not even Rails and ActiveRecord) can match the simplicity, performance, and speed of development one can get just using plain old objects that are kept in memory all the time. Most small office apps with no more than a handful of users can easily fit everything into memory. This is the idea behind Prevayler.
The basic idea is to use a command pattern to apply changes to your model. You can then log the commands, snapshot the model, and replay the log in case of a crash to bring the last snapshot up to date. Nice idea, if you’re OK creating commands for every state changing action in your application and being careful with how you use timestamps so replaying the logs works properly. I’m not OK with that. It introduces a level of complexity that is overkill for many apps and is likely the reason more people don’t use a Prevayler like approach.
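To give a feel for that overhead, here’s a minimal sketch of what a single command might look like. This is not Prevayler’s actual API, just an illustration: AddCommentCommand, its instance variables, #setAuthor:text:, #executeOn:, and #addCommentBy:text:at: are all hypothetical names. Notice that the timestamp is captured when the command is created rather than when it runs, so replaying the log later produces the same model.

```smalltalk
"Hypothetical command class; every name below is illustrative only."
Object subclass: #AddCommentCommand
	instanceVariableNames: 'author text timestamp'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'CommandSketch'

setAuthor: anAuthor text: aString
	"Instance side. Capture the clock once, at creation, so a replay does not see a different time."
	author := anAuthor.
	text := aString.
	timestamp := DateAndTime now

executeOn: aBlog
	"Instance side. Apply the change to the model; the same method runs live and during log replay."
	aBlog addCommentBy: author text: text at: timestamp
```

You’d need one of these for every action that changes state, plus the logging and replay machinery around them.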
One might attempt to use the Smalltalk image itself as a database (and many try), but this is rife with problems. My average image is well over 30 megs, saving it takes a bit of time, and saving it while processing http requests risks all kinds of things going wrong as the image prepares for what is essentially a shutdown/restart cycle.
Using a ReferenceStream to serialize objects to disk Prevayler style, but ignoring the command pattern part and just treating it more like crash proof image persistence, is a viable option if your app won’t ever have that much data. Rather than trying to minimize writes with commands, you just snapshot the entire model on every change. This isn’t as crazy as it might sound; most apps just don’t have that much data. This blog, for example, a year and a half old, with around 100 posts and 1500 comments, has a 2.1 megabyte MySQL database, which would be much smaller as serialized objects.
If you’re going to have a lot of data, clearly this is a bad approach, but if you’re already thinking about how to use the image for simple persistence because you know your data will fit in RAM, here’s how I do it.
It only takes a few lines of code in a single abstract class, which you can subclass for each project, to make a Squeak image fairly robust and crash proof, and more than capable enough to allow you to just use the image, no database necessary. We’ll start with a class…
Object subclass: #SMFileDatabase
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SimpleFileDb'

SMFileDatabase class
    instanceVariableNames: 'lock'
All the methods that follow are class side methods. First, we’ll need a method to fetch the directory where rolling snapshots are kept.
backupDirectory
    ^ (FileDirectory default directoryNamed: self name) assureExistence
The approach I’m going to take is simple: a subclass will implement #repositories to return the root object that needs to be serialized. I just return an array containing the root collection of each domain class.
repositories
    self subclassResponsibility
The subclass will also implement #restoreRepositories: which will restore those repositories back to wherever they belong in the image for the application to use them.
restoreRepositories: someRepositories
    self subclassResponsibility
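To make those two hooks concrete, here’s a sketch of what a subclass might look like. MyBlogDatabase, BlogPost, BlogComment, and the class side #repository and #repository: accessors are hypothetical names I’m assuming for illustration; substitute wherever your app actually keeps its root collections.

```smalltalk
SMFileDatabase subclass: #MyBlogDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'MyBlog'

repositories
	"Class side of MyBlogDatabase. Answer the root collection of each domain class, in a fixed order."
	^ Array
		with: BlogPost repository
		with: BlogComment repository

restoreRepositories: someRepositories
	"Class side of MyBlogDatabase. Put the collections back in the same order #repositories answered them."
	BlogPost repository: (someRepositories at: 1).
	BlogComment repository: (someRepositories at: 2)
```

The only contract between the two methods is the order of the array, so keep them next to each other and change them together.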
Should the image crash for any reason, I want the last backup to be fetched from disk and restored. So I need a method to detect the latest version of the backup file, which I’ll stick a version number in when saving.
lastBackupFile
    ^ self backupDirectory fileNames
        detectMax: [ :each | each name asInteger ]
Once I have the file name, I’ll deserialize it with a read only reference stream (don’t want to lock the file if I don’t plan on editing it).
lastBackup
    | lastBackup |
    lastBackup := self lastBackupFile.
    lastBackup ifNil: [ ^ nil ].
    ^ ReferenceStream
        readOnlyFileNamed: (self backupDirectory fullNameFor: lastBackup)
        do: [ :f | f next ]
This requires extending ReferenceStream with #readOnlyFileNamed:do:. Just steal the code from FileStream, so nicely provided by Avi Bryant, which encapsulates the #close of the stream behind #do:. Much nicer than having to remember to close your streams.
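If you’d rather not go digging, the extension might look something like the sketch below. This is my guess at the shape, not the FileStream code verbatim: it assumes wrapping a FileStream with #on: is enough to get a working ReferenceStream in your Squeak version, so check it against your image. The matching #newFileNamed:do:, which the save method relies on, follows the same pattern.

```smalltalk
readOnlyFileNamed: fileName do: aBlock
	"Class side of ReferenceStream (extension). Open fileName read only, hand the
	reference stream to aBlock, and close it even if aBlock raises an error.
	Assumes #on: can wrap the underlying FileStream directly."
	| stream |
	stream := self on: (FileStream readOnlyFileNamed: fileName).
	^ [ aBlock value: stream ] ensure: [ stream close ]

newFileNamed: fileName do: aBlock
	"Class side of ReferenceStream (extension). Same shape, but creates a new file for writing."
	| stream |
	stream := self on: (FileStream newFileNamed: fileName).
	^ [ aBlock value: stream ] ensure: [ stream close ]
```

The #ensure: block is the whole point: the file gets closed whether the block finishes normally or blows up halfway through.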
Now I can provide a method to actually restore the latest backup. Later, I’ll make sure this happens automatically.
restoreLastBackup
    self lastBackup ifNotNilDo: [ :backup | self restoreRepositories: backup ]
I like to keep around the last x number of snapshots to give me a warm fuzzy feeling that I can get old versions should something crazy happen. I’ll provide a hook for an overridable default value in case I want to adjust this for different projects.
defaultHistoryCount
    ^ 15
Now, a quick method to trim the older versions so I’m not filling up the disk with data I don’t need.
trimBackups
    | entries versionsToKeep |
    versionsToKeep := self defaultHistoryCount.
    entries := self backupDirectory entries.
    entries size < versionsToKeep ifTrue: [ ^ self ].
    ((entries sortBy: [ :a :b | a first asInteger < b first asInteger ])
        allButLast: versionsToKeep)
        do: [ :entry | self backupDirectory deleteFileNamed: entry first ]
OK, I’m ready to actually serialize the data. I don’t want multiple processes all trying to do this at the same time, so I’ll wrap the save in a critical section, #trimBackups, figure out the next version number, and serialize the data (#newFileNamed:do:, another stolen FileStream method), making sure to #flush it to disk before continuing (don’t want the OS doing any write caching).
saveRepository
    | version |
    lock critical: [
        self trimBackups.
        version := self lastBackupFile
            ifNil: [ 1 ]
            ifNotNil: [ self lastBackupFile name asInteger + 1 ].
        ReferenceStream
            newFileNamed: (self backupDirectory fullPathFor: self name) , '.' , version asString
            do: [ :f | f nextPut: self repositories; flush ] ]
So far so good, let’s automate it. I’ll add a method to schedule the subclass to be added to the start up and shutdown sequence. You must call this for each subclass, not for this class itself.
UPDATE: This method also initializes the lock and must be called prior to using #saveRepository, this seems cleaner.
enablePersistence
    lock := Semaphore forMutualExclusion.
    Smalltalk addToStartUpList: self.
    Smalltalk addToShutDownList: self
So on shutdown, if the image is actually going down, just save the current data to disk.
shutDown: isGoingDown
    isGoingDown ifTrue: [ self saveRepository ]
And on startup we can #restoreLastBackup.
startUp: isComingUp
    isComingUp ifTrue: [ self restoreLastBackup ]
Now, if you want a little extra snappiness and you’d rather not make the user wait for the flush to disk, I’ll add a little convenience method for saving the repository on a background thread.
takeSnapshot
    [ self saveRepository ]
        forkAt: Processor systemBackgroundPriority
        named: 'snapshot: ' , self class name
And that’s it: half a Prevayler, and a more robust, easy to use method that’s a bit better than trying to shoehorn the image into being your database for those small projects where you really really don’t want to bother with a real database (blogs, wikis, small apps, etc). Just sprinkle a few MyFileDbSubclass saveRepository or MyFileDbSubclass takeSnapshot calls around your application wherever you feel it’s important, and you’re done.
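In practice the whole thing boils down to three calls. A quick sketch, using the same placeholder MyFileDbSubclass name for whatever your concrete subclass is called:

```smalltalk
"One time, for each subclass (this also creates the lock):"
MyFileDbSubclass enablePersistence.

"After a change that matters, block until the snapshot is on disk:"
MyFileDbSubclass saveRepository.

"After a less critical change, snapshot in the background instead:"
MyFileDbSubclass takeSnapshot.
```

Which one to use where is a judgment call: #saveRepository makes the caller wait but the data is on disk when it returns, while #takeSnapshot returns immediately and leaves a small window where a crash would lose the change.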
Here’s a file out if you just want the code fast, SMFileDatabase.st