Archive for the tag 'Databases'

Simple Image Based Persistence in Squeak

One of the nicest things about prototyping in Smalltalk is that you can delay the need to hook up a database during much of your development, and if you’re lucky, possibly even forever.

It’s a mistake to assume every application needs a relational database, or even a proper database at all. It’s all too common for developers to wield a relational database as a golden hammer that solves all problems, but for many applications they introduce a level of complexity that can make development feel like wading through a pond full of molasses where you spend much of your time trying to keep the database schema and the object schema in sync. It kills both productivity and fun, and god dammit, programming should be fun!

This is sometimes justified, but many times it’s not. Many business applications and prototypes are built to replace manual processes using Email, Word, and Excel. Word and Excel by the way, aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses. MySql became wildly popular long before it supported transactions, so it’s pretty clear a wide range of apps just don’t need that, no matter how much relational weenies say it’s required.

It shouldn’t come as a surprise that one can take a single step up the complexity ladder, and build simple applications that aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses BETTER than Word and Excel while purposely not taking a further step and moving up to a real database which would introduce a level of complexity that might blow the budget and make the app infeasible.

No object relational mapping layer (not even Rails and ActiveRecord) can match the simplicity, performance, and speed of development one can get just using plain old objects that are kept in memory all the time. Most small office apps with no more than a handful of users can easily fit everything into memory; this is the idea behind Prevayler.

The basic idea is to use a command pattern to apply changes to your model; you can then log the commands, snapshot the model, and replay the log in case of a crash to bring the last snapshot up to date. Nice idea, if you’re OK creating commands for every state changing action in your application and being careful with how you use timestamps so replaying the logs works properly. I’m not OK with that; it introduces a level of complexity that is overkill for many apps and is likely the reason more people don’t use a Prevayler like approach.
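
As a rough illustration of the log-and-replay idea described above, here is a workspace sketch. It is not Prevayler's API: blocks stand in for command objects, and the names model, log, execute, and restored are made up for the example. A real implementation would use serializable command objects and append them to a file rather than an in-memory collection.

```smalltalk
"Workspace sketch of the command-log idea. Blocks stand in for command
 objects; all variable names here are hypothetical."
| model log execute restored |
model := OrderedCollection new.
log := OrderedCollection new.

"Every state change goes through execute, which records the command
 before applying it to the model."
execute := [:command |
	log add: command.
	command value: model].

execute value: [:m | m add: 'first post'].
execute value: [:m | m add: 'second post'].

"After a crash: start from the last snapshot (empty here) and replay the log."
restored := OrderedCollection new.
log do: [:command | command value: restored].

"restored now holds the same two elements as model."
restored = model
```

The cost the post complains about is visible even here: every change to the model has to be expressed as a command and routed through one choke point.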

One might attempt to use the Smalltalk image itself as a database (and many try), but this is rife with problems. My average image is well over 30 megs, saving it takes a bit of time, and saving it while processing http requests risks all kinds of things going wrong as the image prepares for what is essentially a shutdown/restart cycle.

Using a ReferenceStream to serialize objects to disk Prevayler style, but ignoring the command pattern part and just treating it more like crash proof image persistence is a viable option if your app won’t ever have that much data. Rather than trying to minimize writes with commands, you just snapshot the entire model on every change. This isn’t as crazy as it might sound, most apps just don’t have that much data. This blog for example, a year and a half old, around 100 posts, 1500 comments, has a 2.1 megabyte MySql database, which would be much smaller as serialized objects.
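
For anyone who hasn't used ReferenceStream before, a minimal round trip looks roughly like this. The file name and the little dictionary model are invented for the example; the calls follow the usage shown in the ReferenceStream class comment in Squeak, so check them against your own image.

```smalltalk
"Workspace sketch: write a whole model to disk and read it back.
 The file name and the model contents are made up for illustration."
| model stream restored |
model := Dictionary new.
model at: #posts put: (OrderedCollection with: 'Hello world').

"Serialize the entire object graph reachable from model."
stream := ReferenceStream fileNamed: 'model.obj'.
stream nextPut: model.
stream close.

"Read it back as a fresh copy of the same graph."
stream := ReferenceStream fileNamed: 'model.obj'.
restored := stream next.
stream close.
```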

If you’re going to have a lot of data, clearly this is a bad approach, but if you’re already thinking about how to use the image for simple persistence because you know your data will fit in ram, here’s how I do it.

It only takes a few lines of code in a single abstract class that you can subclass for each project to make a Squeak image fairly robust and crash proof and more than capable enough to allow you to just use the image, no database necessary. We’ll start with a class…

Object subclass: #SMFileDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SimpleFileDb'

SMFileDatabase class
	instanceVariableNames: 'lock'

All the methods that follow are class side methods. First, we’ll need a method to fetch the directory where rolling snapshots are kept.

backupDirectory
	^ (FileDirectory default directoryNamed: self name) assureExistence.

The approach I’m going to take is simple: a subclass will implement #repositories to return the root object that needs to be serialized. I just return an array containing the root collection of each domain class.

repositories
	self subclassResponsibility

The subclass will also implement #restoreRepositories: which will restore those repositories back to wherever they belong in the image for the application to use them.

restoreRepositories: someRepositories
	self subclassResponsibility
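
To make the two hooks concrete, here is a minimal sketch of what a subclass might look like. BlogDatabase, BlogPost, and BlogComment, along with their class side #repository and #repository: accessors, are made up for illustration; substitute whatever holds the root collections in your own app.

```smalltalk
SMFileDatabase subclass: #BlogDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SimpleFileDb'

"Class side of BlogDatabase (hypothetical)."

repositories
	"Answer one array holding the root collection of each domain class."
	^ Array
		with: BlogPost repository
		with: BlogComment repository

restoreRepositories: someRepositories
	"Put each root collection back, in the same order #repositories answered them."
	BlogPost repository: (someRepositories at: 1).
	BlogComment repository: (someRepositories at: 2)
```

The only contract between the two methods is the order of the array, so it pays to keep them next to each other.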

Should the image crash for any reason, I want the last backup to be fetched from disk and restored. So I need a method to detect the latest version of the backup file, which I’ll stick a version number in when saving.

lastBackupFile
	^ self backupDirectory fileNames
		detectMax: [:each | each name asInteger]

Once I have the file name, I’ll deserialize it with a read only reference stream (don’t want to lock the file if I don’t plan on editing it)

lastBackup
	| lastBackup |
	lastBackup := self lastBackupFile.
	lastBackup ifNil: [ ^ nil ].
	^ ReferenceStream
		readOnlyFileNamed: (self backupDirectory fullNameFor: lastBackup)
		do: [ : f | f next ]

This requires you extend ReferenceStream with #readOnlyFileNamed:do:, just steal the code from FileStream so nicely provided by Avi Bryant that encapsulates the #close of the streams behind #do:. Much nicer than having to remember to close your streams.
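
The extension itself isn't shown in the post, so the following is only a guess at its shape, not the author's file-out. It assumes ReferenceStream class>>on: and FileStream class>>readOnlyFileNamed: behave as they do in a stock Squeak image, and it wraps the block in #ensure: so the stream is closed even if the block fails.

```smalltalk
"ReferenceStream class side. A hypothetical reconstruction of the
 extension, not the code from the original file-out."
readOnlyFileNamed: aFileName do: aBlock
	| stream |
	"Wrap a read only file stream in a ReferenceStream."
	stream := self on: (FileStream readOnlyFileNamed: aFileName).
	"Answer the block's result, closing the stream no matter what."
	^ [aBlock value: stream] ensure: [stream close]
```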

Now I can provide a method to actually restore the latest backup. Later, I’ll make sure this happens automatically.

restoreLastBackup
	self lastBackup ifNotNilDo: [ : backup | self restoreRepositories: backup ]

I like to keep around the last x number of snapshots to give me a warm fuzzy feeling that I can get old versions should something crazy happen. I’ll provide a hook for an overridable default value in case I want to adjust this for different projects.

defaultHistoryCount
	^ 15

Now, a quick method to trim the older versions so I’m not filling up the disk with data I don’t need.

trimBackups
	| entries versionsToKeep |
	versionsToKeep := self defaultHistoryCount.
	entries := self backupDirectory entries.
	entries size < versionsToKeep ifTrue: [ ^ self ].
	((entries sortBy: [ : a : b | a first asInteger < b first asInteger ])
		allButLast: versionsToKeep)
			do: [ : entry | self backupDirectory deleteFileNamed: entry first ]

OK, I’m ready to actually serialize the data. I don’t want multiple processes all trying to do this at the same time, so I’ll wrap the save in a critical section. Inside it I’ll #trimBackups, figure out the next version number, and serialize the data (#newFileNamed:do: is another stolen FileStream method), making sure to #flush it to disk before continuing (don’t want the OS doing any write caching).

saveRepository
	| version |
	lock critical:
		[ self trimBackups.
		version := self lastBackupFile
			ifNil: [ 1 ]
			ifNotNil: [ self lastBackupFile name asInteger + 1 ].
		ReferenceStream
			newFileNamed: (self backupDirectory fullPathFor: self name) , '.' , version asString
			do: [ : f | f nextPut: self repositories ; flush ] ]

So far so good, let’s automate it. I’ll add a method to schedule the subclass to be added to the start up and shutdown sequence. You must call this for each subclass, not for this class itself.

UPDATE: This method also initializes the lock and must be called prior to using #saveRepository, this seems cleaner.

enablePersistence
	lock := Semaphore forMutualExclusion.
	Smalltalk addToStartUpList: self.
	Smalltalk addToShutDownList: self

So on shutdown, if the image is actually going down, just save the current data to disk.

shutDown: isGoingDown
	isGoingDown ifTrue: [ self saveRepository ]

And on startup we can #restoreLastBackup.

startUp: isComingUp
	isComingUp ifTrue: [ self restoreLastBackup ]

Now, if you want a little extra snappiness and you don’t need to make the user wait for the flush to disk, I’ll add a little convenience method for saving the repository on a background thread.

takeSnapshot
	[self saveRepository] forkAt: Processor systemBackgroundPriority
		named: 'snapshot: ' , self class name

And that’s it, half a Prevayler and a more robust easy to use method that’s a bit better than trying to shoehorn the image into being your database for those small projects where you really really don’t want to bother with a real database (blogs, wikis, small apps, etc). Just sprinkle a few MyFileDbSubclass saveRepository or MyFileDbSubclass takeSnapshot’s around your application whenever you feel it important, and you’re done.
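
Put together, using it might look something like the sketch below. BlogDatabase, BlogPost, and aNewPost are hypothetical names for a subclass, a domain class with a class side repository, and a freshly created object; only #enablePersistence, #saveRepository, and #takeSnapshot come from the code above.

```smalltalk
"Run once per image, e.g. from a workspace; this also creates the lock."
BlogDatabase enablePersistence.

"Somewhere in application code, after a change worth keeping.
 aNewPost is a hypothetical domain object."
BlogPost repository add: aNewPost.
BlogDatabase saveRepository.

"Or, to avoid blocking the current request:"
BlogDatabase takeSnapshot.
```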

Here’s a file out if you just want the code fast, SMFileDatabase.st

Squeak Smalltalk and Databases

I’ve been working in Smalltalk and Seaside for quite a while now, but something I haven’t quite gotten around to yet is trying to hook Squeak up to a database in a manner that I think could actually scale for a professional project. Now, I mean directly hook it up; so far, professionally, I’ve been using it against web services written in .Net against Microsoft SQL Server, which scales just fine, but leaves me still working in .Net, and I’d much rather work in pure Smalltalk.

Object Databases

I’ve tried several object databases, GOODS, Magma, and OmniBase, and while interesting experiences, I find them not quite acceptable for various reasons. OmniBase is file based, and has odd semantics that make hooking it up to multiple images and programming web apps against it difficult.

GOODS is very low level and bare bones, it works great, but you have to pre-index all your data, it has no query capabilities beyond what you provide in your object model, which can make performance quite horrible unless you know exactly what you’re doing and make very strict choices about how your data is stored. GOODS is also a one man show, so I’m not thrilled by the support I’d have available, were I trying to do something serious with it, though it works great for hobby and prototype programming.

Magma has queries, and is very similar to GOODS as far as ease of use goes but like GOODS, it’s a one man show, and I just wouldn’t feel comfortable doing anything truly serious on a one man show kind of database. This, for Squeak at least, seems to rule out object databases, for me anyway, though I hear Gemstone is going to support Seaside. I’m hopeful, for Gemstone is truly an enterprise ready object database, it’s just vaporware at this point, nothing production ready.

All of them seem rather slow when it comes to bulk inserts, and there are various solutions and workarounds depending on which version of Squeak you’re running, but a guy can only jump through so many hoops before he says “fuck it”. I’ve hit that point more than a few times when working with larger datasets and trying to do bulk inserts or queries.

Now, don’t get me wrong, GOODS, Magma, and OmniBase are great products, and I’m sure they have their uses. They’re just not something I’d throw up to my boss and say “hey, let’s use this for this big upcoming project”, because it’s hard enough throwing Squeak at them and having to support that decision over something like .Net which everyone already knows how to use, let alone taking away their familiar relational databases. I’d actually prefer something like Gemstone; time will tell if that preference pans out.

Relational Databases

So, on to relational databases. Squeak has ODBC support, but it’s single threaded and blocks the VM when querying, so while it works for demos and low traffic apps against pretty much any database, I wouldn’t try anything too big with it; it just can’t scale. Blocking the whole VM, every time you run a query, just leaves me feeling a bit dirty and not too proud of whatever I just wrote.

Squeak basically supports two popular relational databases natively that I know of, MySQL and PostgreSQL. Now, I use MySQL for this blog, so I have some experience managing it, and I’m just not a big fan; compared to Microsoft SQL Server, which I work with professionally daily, MySQL just sucks. PostgreSQL is a different story: I’m quite impressed with the latest release, which looks and runs very nice on Windows servers and has a nice admin tool.

MySQL is IMHO not much better than Microsoft Access, it’s not an enterprise database. PostgreSQL, I think is much more comparable to SqlServer and Oracle and could be used for any size app. I have much more faith in its abilities and it has some cool features like table inheritance, which to an object bigot like myself, just makes me think relational databases aren’t totally void of innovation.

Now if I can just talk my DBA into giving up Sql Server, ummm… yea, not gonna happen, but I’ve got a side project coming up that’ll be totally green field development, no legacy database to worry about, hence my renewed interest in PostgreSQL, my new database of choice with Squeak.

Getting Started

So I installed the latest PostgreSQL, installed the PostgreSQL Client for Squeak from SqueakMap, then the GLORP port from SqueakMap, and gave it a shot.

I was immediately confronted with a nasty error that reminded me why I gave up last time I tried PostgreSQL. Something about the PostgreSQL driver’s state machine not being valid. Luckily, enough time had passed that a few minutes Googling turned up a simple answer this time, unlike the last.

PostgreSQL installs with MD5 password authentication turned on, which Squeak doesn’t support out of the box. There are two fixes: either install the Cryptography package from SqueakMap, or switch the authentication method in PostgreSQL to “password” with a simple configuration change to pg_hba.conf. I chose the latter; since the default install only accepts local connections anyway, I’m not too concerned with encryption.
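
For reference, the change in pg_hba.conf amounts to swapping the method on the local host line from md5 to password. The exact line, address, and column layout vary by PostgreSQL version and install, so treat this as an illustration rather than a copy of any particular file. The server needs to reload its configuration afterwards.

```text
# TYPE  DATABASE    USER        ADDRESS               METHOD
# Before (typical default for local IPv4 connections):
# host  all         all         127.0.0.1/32          md5
# After: plain password authentication, local connections only
host    all         all         127.0.0.1/32          password
```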

PostgreSQL fires up and runs without a problem, and the GLORP tests all run fine, so now I just have to learn GLORP and how to map my objects into PostgreSQL, but that is going to have to be another story. I’m going to map the simple blog in Seaside into a PostgreSQL database to learn GLORP, and I’ll post that code once I figure it out. I’ve done a bit already, and so far, I’m quite happy with GLORP.

Rails vs Seaside From a Java Developer

Here’s an interesting post from a Java guy trying to decide between Ruby on Rails and Seaside. He has quite a few interesting things to say concerning the shortcomings of Ruby on Rails, and how well Seaside handles that complexity with ease.

He also has a few complaints about Seaside, most of them valid. Seaside still isn’t the full stack solution that Ruby on Rails is. We still have to handle object relational mapping, something Ruby on Rails gives you for free. Nor does Seaside deal with object validation and errors, I use Magritte for this. Magritte rocks, but I’m not sure the average guy trying out Seaside will find it, or learn how to use it. From an outside point of view, Ruby on Rails looks like a much more complete solution.

We have Glorp, which can do this, but only against Postgres Sql in Squeak. Nothing against Postgres, but seriously, in the real business world, it’s either Microsoft Sql Server or Oracle; and it’s also usually a legacy database, so we really need something like Glorp for those databases, because something like ActiveRecord is just too brain dead to work.

Most of the schemas I have to work with suck, and can’t be changed because people used the database as an integration point for multiple apps (God I’m tired of seeing people make this mistake). Doing a simple class = table, object = row mapping just doesn’t cut the mustard for legacy development against existing databases.

Seaside is far more advanced than Ruby on Rails, and is a much better web framework for doing anything complex, but we’re still missing the market on CRUD apps. CRUD against a popular business database is still far too difficult using Squeak. I’m sure Visualworks has far better database support, but I want something free… I want Squeak, I want Squeak to work with Sql Server, I want a Pony… :(

Podcasts: OOPSLA Panel on Objects and Databases

Sadly, I wasn’t able to make the trip to OOPSLA this year, but I found a few podcasts that covered the Objects and Databases: State of the Union in 2006, as well as a few other interesting ones. It’s great to hear these issues being discussed; dealing with databases and objects is still one of the fundamental problems in our industry.

There aren’t any easy answers, there are benefits to both object databases, and relational databases. The butchering of an object model so it can be shoehorned into a relational database via some mapping layer isn’t the answer. Industry wide, the relational folk seem to think this is a solved problem and object databases are modern dinosaurs. Mathematically, they may be right, pragmatically, they’re absolutely wrong, and the academics are wrong as well. I don’t care how pretty the relational model is, programming to it is a nightmare and doesn’t fit the way programmers actually work.

Databases will learn to deal with objects as objects, without crippling them, or they will be replaced with ones that can, period.

Whatever the future of databases is, they will need to learn to deal with random queries (a weakness of object databases) and fast access to objects with a nearly transparent API (a weakness of relational databases). As Esther Dyson says, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.”

There has to be a better solution!
