Archive for the tag 'Sql'

Simple Image Based Persistence in Squeak

One of the nicest things about prototyping in Smalltalk is that you can delay the need to hook up a database during much of your development, and if you’re lucky, possibly even forever.

It’s a mistake to assume every application needs a relational database, or even a proper database at all. It’s all too common for developers to wield a relational database as a golden hammer that solves all problems, but for many applications they introduce a level of complexity that can make development feel like wading through a pond full of molasses, where you spend much of your time trying to keep the database schema and the object schema in sync. It kills both productivity and fun, and god dammit, programming should be fun!

This is sometimes justified, but many times it’s not. Many business applications and prototypes are built to replace manual processes using Email, Word, and Excel. Word and Excel by the way, aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses. MySql became wildly popular long before it supported transactions, so it’s pretty clear a wide range of apps just don’t need that, no matter how much relational weenies say it’s required.

It shouldn’t come as a surprise that one can take a single step up the complexity ladder, and build simple applications that aren’t ACID compliant, don’t support transactions, and manage to successfully run most small businesses BETTER than Word and Excel while purposely not taking a further step and moving up to a real database which would introduce a level of complexity that might blow the budget and make the app infeasible.

No object relational mapping layer (not even Rails and ActiveRecord) can match the simplicity, performance, and speed of development one can get just using plain old objects that are kept in memory all the time. Most small office apps with no more than a handful of users can easily fit everything into memory; this is the idea behind Prevayler.

The basic idea is to use a command pattern to apply changes to your model. You can then log the commands, snapshot the model, and replay the log in case of a crash to bring the last snapshot up to date. Nice idea, if you’re OK creating commands for every state changing action in your application and being careful with how you use timestamps so replaying the logs works properly. I’m not OK with that; it introduces a level of complexity that is overkill for many apps and is likely the reason more people don’t use a Prevayler like approach.

One might attempt to use the Smalltalk image itself as a database (and many try), but this is rife with problems. My average image is well over 30 megs, saving it takes a bit of time, and saving it while processing http requests risks all kinds of things going wrong as the image prepares for what is essentially a shutdown/restart cycle.

Using a ReferenceStream to serialize objects to disk Prevayler style, but ignoring the command pattern part and just treating it more like crash proof image persistence is a viable option if your app won’t ever have that much data. Rather than trying to minimize writes with commands, you just snapshot the entire model on every change. This isn’t as crazy as it might sound, most apps just don’t have that much data. This blog for example, a year and a half old, around 100 posts, 1500 comments, has a 2.1 megabyte MySql database, which would be much smaller as serialized objects.

If you’re going to have a lot of data, clearly this is a bad approach, but if you’re already thinking about how to use the image for simple persistence because you know your data will fit in ram, here’s how I do it.

It only takes a few lines of code in a single abstract class that you can subclass for each project to make a Squeak image fairly robust and crash proof, and more than capable enough to allow you to just use the image, no database necessary. We’ll start with a class…

Object subclass: #SMFileDatabase
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'SimpleFileDb'

SMFileDatabase class
	instanceVariableNames: 'lock'

All the methods that follow are class side methods. First, we’ll need a method to fetch the directory where rolling snapshots are kept.

backupDirectory
	^ (FileDirectory default directoryNamed: self name) assureExistence.

The approach I’m going to take is simple: a subclass will implement #repositories to return the root object that needs to be serialized. I just return an array containing the root collection of each domain class.

repositories
	self subclassResponsibility

The subclass will also implement #restoreRepositories: which will restore those repositories back to wherever they belong in the image for the application to use them.

restoreRepositories: someRepositories
	self subclassResponsibility

Should the image crash for any reason, I want the last backup to be fetched from disk and restored. So I need a method to detect the latest version of the backup file, which I’ll stick a version number in when saving.

lastBackupFile
	^ self backupDirectory fileNames
		detectMax: [:each | each name asInteger]

Once I have the file name, I’ll deserialize it with a read only reference stream (don’t want to lock the file if I don’t plan on editing it)

lastBackup
	| lastBackup |
	lastBackup := self lastBackupFile.
	lastBackup ifNil: [ ^ nil ].
	^ ReferenceStream
		readOnlyFileNamed: (self backupDirectory fullNameFor: lastBackup)
		do: [ : f | f next ]

This requires you extend ReferenceStream with #readOnlyFileNamed:do:, just steal the code from FileStream so nicely provided by Avi Bryant that encapsulates the #close of the streams behind #do:. Much nicer than having to remember to close your streams.

Now I can provide a method to actually restore the latest backup. Later, I’ll make sure this happens automatically.

restoreLastBackup
	self lastBackup ifNotNilDo: [ : backup | self restoreRepositories: backup ]

I like to keep around the last x number of snapshots to give me a warm fuzzy feeling that I can get old versions should something crazy happen. I’ll provide a hook for an overridable default value in case I want to adjust this for different projects.

defaultHistoryCount
	^ 15

Now, a quick method to trim the older versions so I’m not filling up the disk with data I don’t need.

trimBackups
	| entries versionsToKeep |
	versionsToKeep := self defaultHistoryCount.
	entries := self backupDirectory entries.
	entries size < versionsToKeep ifTrue: [ ^ self ].
	((entries sortBy: [ : a : b | a first asInteger < b first asInteger ])
		allButLast: versionsToKeep)
			do: [ : entry | self backupDirectory deleteFileNamed: entry first ]

OK, I’m ready to actually serialize the data. I don’t want multiple processes all trying to do this at the same time, so I’ll wrap the save in a critical section, #trimBackups, figure out the next version number, and serialize the data (#newFileNamed:do: another stolen FileStream method), ensuring to #flush it to disk before continuing (don’t want the OS doing any write caching).

saveRepository
	| version |
	lock critical:
		[ self trimBackups.
		version := self lastBackupFile
			ifNil: [ 1 ]
			ifNotNil: [ self lastBackupFile name asInteger + 1 ].
		ReferenceStream
			newFileNamed: (self backupDirectory fullPathFor: self name) , '.' , version asString
			do: [ : f | f nextPut: self repositories ; flush ] ]

So far so good, let’s automate it. I’ll add a method to schedule the subclass to be added to the start up and shutdown sequence. You must call this for each subclass, not for this class itself.

UPDATE: This method also initializes the lock and must be called prior to using #saveRepository, this seems cleaner.

enablePersistence
	lock := Semaphore forMutualExclusion.
	Smalltalk addToStartUpList: self.
	Smalltalk addToShutDownList: self

So on shutdown, if the image is actually going down, just save the current data to disk.

shutDown: isGoingDown
	isGoingDown ifTrue: [ self saveRepository ]

And on startup we can #restoreLastBackup.

startUp: isComingUp
	isComingUp ifTrue: [ self restoreLastBackup ]

Now, if you want a little extra snappiness and you’d rather not make the user wait for the flush to disk, I’ll add a little convenience method for saving the repository on a background thread.

takeSnapshot
	[self saveRepository] forkAt: Processor systemBackgroundPriority
		named: 'snapshot: ' , self class name

And that’s it, half a Prevayler and a more robust easy to use method that’s a bit better than trying to shoehorn the image into being your database for those small projects where you really really don’t want to bother with a real database (blogs, wikis, small apps, etc). Just sprinkle a few MyFileDbSubclass saveRepository or MyFileDbSubclass takeSnapshot’s around your application whenever you feel it important, and you’re done.

Here’s a file out if you just want the code fast, SMFileDatabase.st
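
For anyone who wants to peek at the snapshot directory from outside the image, the numbering scheme is easy to mirror in bash. This is only a sketch under assumptions: it takes the snapshot files to be named `<ClassName>.<version>` (for example `MyDb.1`, `MyDb.2`), which is what #saveRepository appears to produce, and the function names `list_snapshots`, `latest_snapshot`, and `next_version` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a snapshot directory from the shell.
# Assumes snapshot files are named <ClassName>.<version>, e.g. MyDb.1, MyDb.2.

# Print snapshot file names in ascending version order, one per line.
list_snapshots() {
    local dir=$1 f
    for f in "$dir"/*.*; do
        [ -f "$f" ] || continue
        f=${f##*/}
        # Skip anything whose suffix after the last dot is not all digits.
        case ${f##*.} in
            ''|*[!0-9]*) continue ;;
        esac
        printf '%s %s\n' "${f##*.}" "$f"
    done | sort -n | cut -d' ' -f2-
}

# Print the newest snapshot name, or nothing if there are none.
latest_snapshot() {
    list_snapshots "$1" | tail -n 1
}

# Print the version number the next save would use (1 for an empty directory).
next_version() {
    local latest
    latest=$(latest_snapshot "$1")
    if [ -z "$latest" ]; then
        echo 1
    else
        # 10# forces base ten so a suffix like 08 is not read as octal.
        echo $(( 10#${latest##*.} + 1 ))
    fi
}
```

Something like `latest_snapshot MyDb` would then name the file that #restoreLastBackup should pick up on the next start. The sort is numeric on purpose, since a plain alphabetical sort would put `MyDb.10` before `MyDb.2`.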

My Journey to Linux

I’m a Windows guy, I’ve always been a Windows guy. Windows today is more stable than ever. Seems now would be the best time of all to be a Windows guy. Slowly but surely though, I’m becoming a Linux guy.

Truth is, I was always a Microsoft guy, and that simply included Windows along with all of their development products. I used to be a hardware/network technician. I’d set up and maintain networks for medium to small businesses. Windows was always the way to go here, it’s what the users were accustomed to and expected. I’d usually set up a Windows NT server and from a dozen to maybe 30 client computers running various versions of Windows including NT workstation. So Windows was just something I was always familiar with.

Even back then, I had the occasional urge to try other things. One of my first experiences with Linux involved using it as a firewall for a windows network on some cheap throwaway hardware that wasn’t good for much else. But it always seemed a pain to use, and I didn’t really understand it, despite having it working quite well for what I intended. I just didn’t see the point of not having a nice GUI and using cryptic commands to do everything.

Later, I learned to program in VBScript and VB using ASP and SQL. I became a web developer and abandoned the hardware gig. Software was so much more interesting. ASP became ASP.net, and VB became C# when I realized how crappy a language VB actually was. What made me want to change was my discovery of the original Wiki. I found a place where real programmers hung out and discussed anything and everything. I realized the world was bigger than VB. VB.Net fixed many of the issues with VB and is pretty much equivalent to C# in all but one area… culture.

What I really was abandoning was the VB culture. I’d outgrown it, I wanted to be involved in a culture that cared more about programming well. The VB culture is dominated by amateur programmers that are just happy to get something working, they tend to care very little about things like architecture, or patterns, or the aesthetics of good code. They don’t think of themselves as amateurs, many of them consider themselves experts, but start talking about object oriented programming or functional programming and the confused looks on their faces tells you they’ve not really looked into such things very deeply. Many think simply using classes makes code object oriented.

I was still firmly in the Microsoft camp at this point, though my change to C# had opened my eyes to Java, and more importantly object oriented programming. It was the Wiki that introduced me to Smalltalk. I just couldn’t help but notice how much Smalltalk was referenced whenever object oriented programming was discussed, nor how many famous authors’ credentials included a Smalltalk background. I decided I had to check out this Smalltalk thing. Now, at the same time, I was checking out the Lisp thing as well, but that’s not relevant to this story.

So I’m a web developer, my seeking tends to be guided by the need to make my job easier, to find better ways to automate myself. Obviously, I discovered Seaside. Seaside got me into a non Microsoft language. Around the same time, a buddy of mine who I’d met on the Wiki suggested cygwin. I’d been talking about wanting to learn a little more about Linux and he said I could do so without leaving Windows by using a better shell. Cygwin was the beginning of the end for Windows.

I started finding reasons to grep, cat, sed, sort, uniq. This was pretty cool, I was still in Windows but had a Linux command line and the shell became a bigger part of my toolbox. Now I find myself using a non Microsoft programming language, and having discovered PostgreSQL, a non Microsoft database. And now bash for my shell. Hmm…

So now I’m still hosting my apps on Windows servers, but I keep having problems crop up. I recently did a write up on Scaling Seaside which included a bash script for making sure the Seaside services were always up and running. Problem is, turns out the only thing making my Seaside services seem to die, was the bash script itself. Somehow lynx gums up Windows after a certain period of time and Windows starts having random network errors. I’ve taken the script down and now have another one running that uses wget and simply notifies me should any site I’m monitoring go down, or come back up.

So I find myself using all open source non Microsoft tools for everything except for the server’s operating system. Having become quite comfortable on the command line, it finally hit me, stop screwing with all these problems on Windows and try Linux again. Setting up new Seaside services on Windows is a multi step pain in the butt. I thought I’d give Linux a try and see how far it’s come since the last time I tried it. Boy was I surprised. In the next post I’ll detail my experience setting up a Linux server for hosting Seaside.

22 February 2007 > Squeak Image Updated

Just a quick notification that I’ve updated my squeak image. I do this occasionally to keep my base up to date with the latest and greatest of the frameworks I use.

In this update I’m sharing my current 3.9 image. It’s based on Damien Cassou’s Squeak Dev Image, an awesome base image with all the necessary goodies a developer needs. Of course I’ve loaded up my window customizations, nicer looking fonts, and have all the preferences set the way I like.

This image includes PostgreSQL drivers and Glorp, since I’m now doing some development with them and consider them part of my base tool set. If my image isn’t to your liking, I highly recommend learning to build and maintain your own using Damien’s as a starting point. It will save you a lot of time, and he’s done an awesome job building and sharing these images. Thanks Damien.

Scaling Seaside

I’ve been busy with non Seaside projects lately, but one of the things I have squeezed in was a bit of configuration to make Seaside scale better. I was having some performance problems when more than a few sessions were running concurrently, and after discussing the issue on the Seaside developers list, Avi popped in from DabbleDB and told us the way to scale was to load balance many VMs with a few sessions running on each. So I had to read up a bit on the issue and learn what to do.

It turns out one of the best ways to learn about scaling Seaside is to read about scaling Ruby on Rails. The architecture for scaling both is pretty much the same. Ruby developers use a web server called Mongrel, a light weight single threaded server that works well with Rails but isn’t a heavy duty web server like Apache. This is much the same position Seaside is in with Comanche, though not single threaded, the Squeak VM can’t take advantage of multiple processors and doesn’t do well with too many concurrent connections.

UPDATE: I neglected to mention one major requirement of Seaside, your load balancer must support sticky sessions. Seaside uses sessions heavily and does not support the shared nothing approach where every request can hit a different server. This isn’t an issue unique to Seaside, many frameworks use sessions and must deal with this. Other frameworks handle such issues either by sticking a session to a server, or by having a shared session cache that all the servers can access, such as memcached or a sql server. Currently, to the best of my knowledge, no one has externalized Seaside sessions in this manner, so sticky sessions is the only viable approach.

The solution for both is quite simple really: set up a heavier duty web server/load balancer (Apache/LiteHttpd) to serve up static content, and load balance and proxy connections to a farm of light weight application servers (Mongrel/Comanche) running on other ports. I’m actually using an F5 as my front end load balancer, but Apache has all the necessary features including the ability to create pools of virtual servers which it will load balance requests across with its new mod_proxy.

One only need Google scaling Rails to find many examples of detailed setup information and articles for such a setup, so I won’t repeat it here. I will mention that neither Mongrel nor Comanche is anywhere near as rock solid stable as Apache, so one thing you’ll want when having a setup like this to ensure maximum uptime is a process running to poll all of your servers to ensure they aren’t hung up for any reason.

UPDATE: I don’t want to imply Comanche is unstable. It is very rare that a service needs to be reset, I only do this because it “can” happen, not because it happens a lot. Seaside is very stable and under normal conditions, doesn’t crash.

Here’s a little bash script I found somewhere that makes checking a site for a specific response string simple and easy to use from the command line allowing you to easily schedule some scripts to reset any hung processes. Oh, I use cygwin on all my servers so I can have a decent Unix command line on my Windows servers.

checkSite.sh UPDATE: This script eventually freaks out Windows and causes random network errors because lynx somehow eats up network resources. Don’t use it, Seaside is quite stable without it.

#!/bin/bash

if [ $# -lt 2 ]; then
    exit
fi

URL=$1
findText=$2

lynx -dump -error_file=/tmp/x$$ $URL 2>/dev/null | grep "$findText" &>/dev/null
if [ $? != 0 ]; then
    echo WARNING: Specified search text was not found
fi

stcode=`awk '/STATUS/{print $2}' /tmp/x$$ 2>/dev/null`
if [ $? != 0 ]; then
    echo site is down
fi

for code in $stcode
do
    case $code in
      200) echo OK;;
      302) echo redirecting
           awk -F/ '/URL/{print "  ",$3}' /tmp/x$$;;
      *)   echo $code
    esac
done

if [ -f /tmp/x$$ ]; then
    rm /tmp/x$$
fi

Then I use this in another script built specifically to monitor instances of my Seaside app. Though this could be more generic, I haven’t bothered yet because I only have one Seaside site in production to worry about.

checkSeaside.sh

echo "$1 $2 $3..."
sh checkSite.sh http://$1:$3/seaside/someApp "Some Required Text" | grep WARNING &>/dev/null
if [ $? = 0 ]; then
    echo "restarting $2 on $1"
    psservice \\\\$1 restart "$2" >/dev/null #NT util to restart services on remote machines
    echo "Restarting $1 $2" | wsendmail -- -CGI -Ssome.mail.server.com -s"App Monitor reset $1 $2" someEmail@someAddress.com -Fmonitor@someAddress.com -P1
fi
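
Given the UPDATE note warning against the lynx based checkSite.sh, a wget based check is one possible replacement. What follows is a minimal sketch and not the script actually running in production: the function names and the `FETCH` hook are hypothetical, and the only wget options used are `-q`, `-O -`, `--timeout`, and `--tries`. It still prints a line starting with WARNING on any failure, so the `grep WARNING` in checkSeaside.sh would keep working.

```shell
#!/bin/bash
# Sketch of a wget based site check. FETCH is a hypothetical hook naming the
# command that downloads a URL to stdout, so it can be swapped out for testing.

FETCH=${FETCH:-fetch_with_wget}

fetch_with_wget() {
    # -q: quiet, -O -: write the page to stdout,
    # --timeout/--tries: give up quickly instead of hanging on a dead server.
    wget -q -O - --timeout=10 --tries=1 "$1"
}

# check_site URL TEXT
# Prints OK if the page was fetched and contains TEXT, otherwise a WARNING line.
check_site() {
    local url=$1 text=$2 body
    if ! body=$("$FETCH" "$url" 2>/dev/null); then
        echo "WARNING: no response from $url"
        return 2
    fi
    # -F treats TEXT as a fixed string rather than a regular expression.
    if printf '%s\n' "$body" | grep -qF -- "$text"; then
        echo "OK"
        return 0
    fi
    echo "WARNING: Specified search text was not found"
    return 1
}
```

Usage would look something like `check_site http://serverName:3001/seaside/someApp "Some Required Text"`. The return codes (0, 1, 2) are an arbitrary choice here, meant only to let a caller tell a missing string from a dead server.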

Then on each web server, I’m running a pool of 10 instances of Seaside setup as services and ensuring they’re up by scheduling a simple batch file with the windows task scheduler.

monitorSomeApp.cmd

@echo off
bash checkSeaside.sh serverName "Some Service1" 3001
bash checkSeaside.sh serverName "Some Service2" 3002
bash checkSeaside.sh serverName "Some Service3" 3003
bash checkSeaside.sh serverName "Some Service4" 3004
bash checkSeaside.sh serverName "Some Service5" 3005
bash checkSeaside.sh serverName "Some Service6" 3006
bash checkSeaside.sh serverName "Some Service7" 3007
bash checkSeaside.sh serverName "Some Service8" 3008
bash checkSeaside.sh serverName "Some Service9" 3009
bash checkSeaside.sh serverName "Some Service10" 3010
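
Since cygwin is on the box anyway, the ten near identical lines could also be written as a loop in bash. A rough sketch, assuming the service names and ports keep following the `Some ServiceN` / `300N` pattern shown above; `monitor_pool` and the `CHECK` hook are made up names, not part of the original scripts.

```shell
#!/bin/bash
# Sketch: run the per-instance check over a whole pool in one loop.
# CHECK is a hypothetical hook so the loop can be dry-run with echo.

CHECK=${CHECK:-bash checkSeaside.sh}

# monitor_pool SERVER COUNT BASE_PORT
# Checks "Some Service1".."Some ServiceCOUNT" on ports BASE_PORT+1..BASE_PORT+COUNT.
monitor_pool() {
    local server=$1 count=$2 base_port=$3 i
    for (( i = 1; i <= count; i++ )); do
        # $CHECK is left unquoted on purpose so "bash checkSeaside.sh" splits into words.
        $CHECK "$server" "Some Service$i" $(( base_port + i ))
    done
}
```

Calling `monitor_pool serverName 10 3000` would run the same ten checks as the batch file, and growing or shrinking the pool becomes a one number change.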

Yea, I’m mixing and matching bash and dos scripts, so sue me! Anyway, the setup works great, I’m running 30 instances of Squeak across 3 servers and these scripts ensure they’re always up and responding, and reset them and email me if they go down for any reason. Response time is now much better and I can fully take advantage of the multiple processors on the web boxes.

My process isn’t nearly as fancy as Avi’s (he’s dynamically bringing images up and down based on the host header), but balancing the connections across a bunch of fixed sized pools works well. I started with 10 processes per box, just for the hell of it, but I’ll increase or decrease the size of the pool as load dictates to eventually find the sweet spot for the pool size. For now, 10 per box works, I’ve got plenty of spare ram.

Of course, doing a setup like this means you’ll need to automate your deployment process for new code as well. So far this means keeping a master image I upgrade with new code and test, then a script to take down each process, copy the image file over the old one, and bring the process back up. Seems Avi’s doing the same thing, works pretty well so far.
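
The take down, copy, bring up cycle might be sketched roughly like this. Everything specific in it is an assumption: the `app.image` file name, one directory per instance, and the `STOP_CMD` / `START_CMD` hooks, which default to a harmless echo and would need to be pointed at whatever actually stops and starts a service on your boxes.

```shell
#!/bin/bash
# Sketch of a rolling image deployment. STOP_CMD and START_CMD are hypothetical
# hooks; by default they only echo what they would do.

STOP_CMD=${STOP_CMD:-echo stop}
START_CMD=${START_CMD:-echo start}

# deploy_image MASTER_IMAGE INSTANCE_DIR...
# For each instance: stop it, replace its image with the master, start it again.
deploy_image() {
    local master=$1 dir
    shift
    [ -f "$master" ] || { echo "master image not found: $master" >&2; return 1; }
    for dir in "$@"; do
        $STOP_CMD "$dir" || return 1
        # Copy under a temporary name and rename afterwards, so a failed copy
        # never leaves a half written image where the old one was.
        cp "$master" "$dir/app.image.new" || return 1
        mv "$dir/app.image.new" "$dir/app.image" || return 1
        $START_CMD "$dir" || return 1
    done
}
```

Because instances are handled one at a time, the rest of the pool keeps serving requests while each one is swapped, though any sessions stuck to the instance being restarted would be lost.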
