Simple Image Based Persistence in Squeak

One of the nicest things about prototyping in Smalltalk is that you can delay the need to hook up a database during much of your development, and if you’re lucky, possibly even forever.

It’s a mistake to assume every application needs a relational database, or even a proper database at all. It’s all too common for developers to wield a relational database as a golden hammer that solves all problems, but for many applications it introduces a level of complexity that can make development feel like wading through a pond full of molasses, where you spend much of your time trying to keep the database schema and the object schema in sync. It kills both productivity and fun, and god dammit, programming should be fun!

That complexity is sometimes justified, but many times it’s not. Many business applications and prototypes are built to replace manual processes using Email, Word, and Excel. Word and Excel, by the way, aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses. MySQL became wildly popular long before it supported transactions, so it’s pretty clear a wide range of apps just don’t need that, no matter how much relational weenies say it’s required.

It shouldn’t come as a surprise that one can take a single step up the complexity ladder and build simple applications that aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses BETTER than Word and Excel, while purposely not taking a further step up to a real database, which would introduce a level of complexity that might blow the budget and make the app infeasible.

No object relational mapping layer (not even Rails and ActiveRecord) can match the simplicity, performance, and speed of development one can get just using plain old objects that are kept in memory all the time. Most small office apps with no more than a handful of users can easily fit everything into memory; this is the idea behind Prevayler.

The basic idea is to use a command pattern to apply changes to your model. You can then log the commands, snapshot the model, and replay the log in case of a crash to bring the last snapshot up to date. Nice idea, if you’re OK creating commands for every state changing action in your application and being careful with how you use timestamps so replaying the logs works properly. I’m not OK with that. It introduces a level of complexity that is overkill for many apps and is likely the reason more people don’t use a Prevayler-like approach.

One might attempt to use the Smalltalk image itself as a database (and many try), but this is rife with problems. My average image is well over 30 megs, saving it takes a bit of time, and saving it while processing HTTP requests risks all kinds of things going wrong as the image prepares for what is essentially a shutdown/restart cycle.

Using a ReferenceStream to serialize objects to disk Prevayler style, while ignoring the command pattern part and treating it more like crash proof image persistence, is a viable option if your app won’t ever have that much data. Rather than trying to minimize writes with commands, you just snapshot the entire model on every change. This isn’t as crazy as it might sound; most apps just don’t have that much data. This blog, for example, a year and a half old with around 100 posts and 1500 comments, has a 2.1 megabyte MySQL database, which would be much smaller as serialized objects.

If you’re going to have a lot of data, clearly this is a bad approach, but if you’re already thinking about how to use the image for simple persistence because you know your data will fit in RAM, here’s how I do it.

It only takes a few lines of code in a single abstract class, which you can subclass for each project, to make a Squeak image fairly robust and crash proof, and more than capable enough to let you just use the image, no database necessary. We’ll start with a class…

Object subclass: #SMFileDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SimpleFileDb'

SMFileDatabase class
	instanceVariableNames: 'lock'

All the methods that follow are class side methods. First, we’ll need a method to fetch the directory where rolling snapshots are kept.

backupDirectory
	^ (FileDirectory default directoryNamed: self name) assureExistence.

The approach I’m going to take is simple: a subclass will implement #repositories to return the root object that needs to be serialized. I just return an array containing the root collection of each domain class.

repositories
	self subclassResponsibility

The subclass will also implement #restoreRepositories: which will restore those repositories back to wherever they belong in the image for the application to use them.

restoreRepositories: someRepositories
	self subclassResponsibility
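
As a rough sketch of what a concrete subclass might look like, assume two hypothetical domain classes, BlogPost and BlogComment, that each keep their instances in a class side collection reachable through made-up #repository and #repository: accessors. None of these names are part of SMFileDatabase; they only illustrate the two hooks.

```smalltalk
"BlogFileDatabase, BlogPost and BlogComment are hypothetical names."
SMFileDatabase subclass: #BlogFileDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SimpleFileDb'

"Class side: hand back the root collections, in a fixed order."
repositories
	^ Array
		with: BlogPost repository
		with: BlogComment repository

"Class side: put them back in the same order they were saved."
restoreRepositories: someRepositories
	BlogPost repository: someRepositories first.
	BlogComment repository: someRepositories last
```

The only contract between the two methods is the order of the array. Because the whole array goes through a single ReferenceStream, objects shared between the collections should come back as shared objects rather than as separate copies.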

Should the image crash for any reason, I want the last backup to be fetched from disk and restored. So I need a method to detect the latest version of the backup file, which I’ll stick a version number in when saving.

lastBackupFile
	^ self backupDirectory fileNames
		detectMax: [:each | each name asInteger]

Once I have the file name, I’ll deserialize it with a read only reference stream (don’t want to lock the file if I don’t plan on editing it)

lastBackup
	| lastBackup |
	lastBackup := self lastBackupFile.
	lastBackup ifNil: [ ^ nil ].
	^ ReferenceStream
		readOnlyFileNamed: (self backupDirectory fullNameFor: lastBackup)
		do: [ : f | f next ]

This requires that you extend ReferenceStream with #readOnlyFileNamed:do:. Just steal the code from FileStream, so nicely provided by Avi Bryant, which encapsulates the #close of the stream behind #do:. Much nicer than having to remember to close your streams.
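
For reference, a minimal sketch of such an extension could look like the following. It is not necessarily identical to the FileStream original; it assumes that DataStream class>>on: and FileStream class>>readOnlyFileNamed: behave as they do in a stock Squeak 3.9 image, and it leaves out any handling for a file that can’t be opened.

```smalltalk
"Class side of ReferenceStream (a sketch, not the original FileStream code)."
readOnlyFileNamed: aFileName do: aBlock
	| stream |
	"Wrap a read only file stream so the file isn't locked for writing."
	stream := self on: (FileStream readOnlyFileNamed: aFileName).
	"Answer the block's value, closing the stream even if the block fails."
	^ [ aBlock value: stream ] ensure: [ stream close ]
```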

Now I can provide a method to actually restore the latest backup. Later, I’ll make sure this happens automatically.

restoreLastBackup
	self lastBackup ifNotNilDo: [ : backup | self restoreRepositories: backup ]

I like to keep around the last x number of snapshots to give me a warm fuzzy feeling that I can get old versions should something crazy happen. I’ll provide a hook for an overridable default value in case I want to adjust this for different projects.

defaultHistoryCount
	^ 15

Now, a quick method to trim the older versions so I’m not filling up the disk with data I don’t need.

trimBackups
	| entries versionsToKeep |
	versionsToKeep := self defaultHistoryCount.
	entries := self backupDirectory entries.
	entries size < versionsToKeep ifTrue: [ ^ self ].
	((entries sortBy: [ : a : b | a first asInteger < b first asInteger ])
		allButLast: versionsToKeep)
			do: [ : entry | self backupDirectory deleteFileNamed: entry first ]

OK, I’m ready to actually serialize the data. I don’t want multiple processes all trying to do this at the same time, so I’ll wrap the save in a critical section, #trimBackups, figure out the next version number, and serialize the data (#newFileNamed:do: is another stolen FileStream method), making sure to #flush it to disk before continuing (don’t want the OS doing any write caching).

saveRepository
	| version |
	lock critical:
		[ self trimBackups.
		version := self lastBackupFile
			ifNil: [ 1 ]
			ifNotNil: [ self lastBackupFile name asInteger + 1 ].
		ReferenceStream
			newFileNamed: (self backupDirectory fullPathFor: self name) , '.' , version asString
			do: [ : f | f nextPut: self repositories ; flush ] ]
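
The #newFileNamed:do: extension used above might be sketched along the same lines as its read only sibling. Again, this is an assumption about its shape rather than the original code; it relies only on DataStream class>>on: and FileStream class>>newFileNamed:.

```smalltalk
"Class side of ReferenceStream (a sketch, not the original FileStream code)."
newFileNamed: aFileName do: aBlock
	| stream |
	"Create the file and wrap it in a ReferenceStream for writing."
	stream := self on: (FileStream newFileNamed: aFileName).
	"Answer the block's value, closing the stream even if the block fails."
	^ [ aBlock value: stream ] ensure: [ stream close ]
```

Since every save uses a fresh version number, the file name should never collide with an existing backup.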

So far so good, let’s automate it. I’ll add a method to schedule the subclass to be added to the start up and shutdown sequence. You must call this for each subclass, not for this class itself.

UPDATE: This method also initializes the lock and must be called prior to using #saveRepository, this seems cleaner.

enablePersistence
	lock := Semaphore forMutualExclusion.
	Smalltalk addToStartUpList: self.
	Smalltalk addToShutDownList: self

So on shutdown, if the image is actually going down, just save the current data to disk.

shutDown: isGoingDown
	isGoingDown ifTrue: [ self saveRepository ]

And on startup we can #restoreLastBackup.

startUp: isComingUp
	isComingUp ifTrue: [ self restoreLastBackup ]

Now, if you want a little extra snappiness and don’t want to make the user wait for the flush to disk, I’ll add a little convenience method for saving the repository on a background thread.

takeSnapshot
	[self saveRepository] forkAt: Processor systemBackgroundPriority
		named: 'snapshot: ' , self class name

And that’s it: half a Prevayler, and a more robust, easy-to-use method that’s a bit better than trying to shoehorn the image into being your database for those small projects where you really, really don’t want to bother with a real database (blogs, wikis, small apps, etc). Just sprinkle a few MyFileDbSubclass saveRepository or MyFileDbSubclass takeSnapshot calls around your application wherever you feel it’s important, and you’re done.
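
Put together, the calls from a workspace might look something like this. MyFileDbSubclass is a placeholder for whatever concrete subclass you wrote, not a class in the file out.

```smalltalk
"Run once per subclass; this also creates the lock used by #saveRepository."
MyFileDbSubclass enablePersistence.

"After a change that matters, save and wait for the write to finish..."
MyFileDbSubclass saveRepository.

"...or hand the write to a background process and carry on."
MyFileDbSubclass takeSnapshot.

"Load the newest snapshot by hand, without restarting the image."
MyFileDbSubclass restoreLastBackup.
```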

Here’s a file out if you just want the code fast, SMFileDatabase.st


07 December 2007 > Squeak Image Updated

Just a quick notification that I updated my Squeak image (several of you have asked).

It’s based on Damien Cassou’s latest Squeak Dev Image (Squeak 3.9), an awesome base image with all the necessary goodies a developer needs. Of course I’ve loaded up my window customizations and preferences, nicer looking fonts, and have all the preferences set the way I like.

Nothing major in this update other than keeping up with the latest versions of everything I use. I have removed the SentorsaBrowser and TrickRefractoringBrowser because the new OmniBrowser is just getting too good to use anything else. Enjoy.


Seaside 2.8 Released

Quoted from the Seaside mailing list…

After a beta phase of two months we release the final version of Seaside 2.8. Most bugs fixed during this period were either long standing (already in 2.7), minor or portability related. Together with the dozens of Seaside 2.8 applications already in production today this gives a pretty good feeling about this version. A special mention deserves Roger Whitney, thanks to him we went from 99 commented classes to 144.

We have a list of new features [1] and a migration guide [2] on our homepage.

Squeak users can get it either from SqueakMap, Universes or directly via Monticello (Seaside2.8a1-lr.518). A special note for Squeak users, do not load Seaside 2.8 into an image that has already Seaside 2.7 in it. If you use Squeak 3.7 you will have to load SeasideSqueak37 as well.

VisualWorks users can get it from Store (2.8a1-lr.518,tkogan).

GemStone/S users can load Seaside2.8g1-dkh.522.

[1] http://www.seaside.st/community/development/seaside28a1
[2] http://www.seaside.st/documentation/migration

Cheers
The Seaside Team

I’ve been using 2.8 for a while now in development and for several weeks in production. It’s solid and very easy to port to for 2.7 users. Upgrade as soon as you can; it’s quite a bit snappier and uses much less memory.


Smalltalk Concurrency, Playing With Futures

Concurrency is always a source of problems in complex systems and one of the coolest patterns I’ve seen for simplifying it is Futures. I thought I’d explore the idea today and hack up a quick implementation of a dynamic Future proxy.

The basic idea is to take a block of code, schedule it on another thread and return a dynamic proxy that if accessed, blocks until the value returns. This lets useful work continue on the main thread until you access the value.

A nice way to break up a big task concurrently might be to #collect: all the futures for a bunch of work processes you have, say fetching rates for a bunch of hotels that require calls to outside systems that may or may not return quickly, and then aggregate the results at the end.

Here’s the complete implementation. It’s quite simple, but it seems to work pretty well while playing around in a workspace and makes concurrency seem less of a beast.

First the class, a subclass of ProtoObject since we’re building a proxy…

ProtoObject subclass: #SFuture
	instanceVariableNames: 'futureValue error lock'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'OnSmalltalk'

Then a #value: write accessor, which sets up the lock, eagerly kicks off the process, and signals and clears the lock after fetching the future value.

value: aBlock
    lock := Semaphore new.
    [futureValue := [aBlock on: Error do: [:err | error := err]]
                ensure:
                    [lock signal.
                    lock := nil]] fork

Now a #value read accessor that blocks if the lock still exists, re-throws any error that may have happened on the worker thread in the context of the main thread, and finally returns the future value.

value
	lock ifNotNil: [lock wait].
	error ifNotNil:
			[error
				privHandlerContext: thisContext;
				signal].
	^ futureValue

A quick testing method for checking if the future has finished executing (useful for doing what work you can with the results that have returned).

hasValue
	^lock isNil

And the all important #doesNotUnderstand: override that intercepts any message sent to the proxy and sends it to the future value, causing the thread to block until the result is finished computing.

doesNotUnderstand: aMessage
    ^ self value
		perform: aMessage selector
		withArguments: aMessage arguments

Finally, a single extension method on BlockContext makes using the future more natural. It calls fixTemps so I can collect future values in a loop with the assumption that the block will act like a proper closure.

BlockContext>>future
	^ SFuture new value: self fixTemps

Now we can ask any block for its future value and just pretend we have it. Executing some test code in a workspace…

value1 := [200 timesRepeat:[Transcript show: '.']. 6] future.
value2 := [200 timesRepeat:[Transcript show: '+']. 6] future.
Transcript show: 'other work'.
Transcript show: (value1 + value2).

This reveals the string ‘other work’, a long string of interspersed periods and pluses, and finally 12, the result of adding the value returned by each future. In all, a pretty nice way to handle concurrency. I’ll have to see where I can simplify some code with the use of Futures; I can already think of a few places.
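
The hotel rates idea from earlier can be sketched the same way in a workspace. This is only an illustration: each Delay stands in for a slow call to an outside system, and the numbers are made up.

```smalltalk
| futures total |
"Start three slow 'rate lookups' at once; each Delay simulates a remote call."
futures := #(300 100 200) collect: [:ms |
	[(Delay forMilliseconds: ms) wait. ms] future].

"Summing sends #value to each future, blocking only on those not yet done."
total := futures inject: 0 into: [:sum :each | sum + each value].
Transcript show: total printString; cr.
```

Since the three blocks wait concurrently, the whole thing should take roughly as long as the slowest lookup rather than the sum of all three.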

