SandstoneDb, Simple ActiveRecord Style Persistence in Squeak

By Ramon Leon - 14 July 2008 under Databases, Performance, Programming, Ruby, Seaside, Smalltalk, Sql, Squeak

On Persistence, Still Not Happy

Persistence is hard and something you need to deal with in every app. I've written about what's available in Squeak, written about simpler image-based solutions for really small systems where just dumping out to one file is sufficient; however, nothing I've used so far has satisfied me completely for various reasons, so before I get to the point of this post, let me do a quick review of my current thoughts on the matter.

Relational Databases

Tired of em, I don't care how much they have to offer me in the areas of declarative indexing and queries, transactions, triggers, stored procedures, views, or any of the handful of things they offer that I don't really want from them. The price they make me pay in programming just isn't worth it for small systems. I don't want my business logic in the database. I don't want to use a big mess of tables to model all my data as a handful of global variables, aka tables, that multiple applications share and modify freely. What I do want from them, transactional persistence of my object model, they absolutely suck at and all attempts to shoehorn an object model into a relational database ends up being an exercise in frustration, compromise, and cussing. I think using a database as an integration point between multiple applications is a terrible idea that just leads to a bunch of fragile applications and a data model you can't change for fear of breaking them. Enough said, on to more object-oriented approaches!

Active Record

Ruby on Rails has brought the ActiveRecord pattern mainstream, which was as far as I know, first popularized in Martin Fowler's book Patterns Of Enterprise Application Architecture which largely dealt with all the various known methods of mapping objects to databases. Initially, I wasn't a fan of the pattern and preferred the more complex domain model with a metadata mapping, but having written an object-relational mapper at a previous gig, used several open-source ones, as well as tried out several pure object databases, I've come to appreciate the simplicity and explicitness of its simple API.

If you have to work with a relational database, this is a fairly good compromise for doing so. You can't bind a real object model to a relational database cleanly without massive effort, so don't try, just revel in the fact that you're editing rows rather than trying to hide it. It works reasonably well, and it's easy to get other team members to use it because it's simple.

"Simplicity is the ultimate sophistication" -- Leonardo Da Vinci

Other Approaches

A total OO purist, or a young one still enamored with patternitis, wouldn't want objects to save themselves as an ActiveRecord does. You can see this in the design of most object-oriented databases available, it's considered a sin to make you inherit from a class to obtain persistence. I used to be one of those guys too, but I've changed my mind in favor of pragmatism. The typical usage pattern is to create a connection to the OODB server which basically presents itself to you as a persistent dictionary of some sort where you put objects into it and then "commit" any unsaved changes. They will save any object and leave it up to you what your object should look like, intruding as little as possible on your domain, so they say.

Behind the scenes there's some voodoo going on where this persistent dictionary tries to figure out what's actually been changed either by having installed some sort of write barrier that marks objects dirty automatically when they get changed, comparing your objects to a cached copy created when they were originally read, or sometimes even explicitly forcing the programmer to manually mark the object dirty. The point of all of this complexity of course, is to minimize writes to the disk to reduce IO and keep things snappy.

Simplicity Matters

What seems to be overlooked in this approach is the amount of accidental complexity that is imposed upon the programmer. If I have to open a connection to get a persistent dictionary to work with, I now have to store this configuration information, manage the creation of this connection, possibly pool it if it's an expensive resource, and decide where to hang this dictionary so I can have access to it from within my application. This is usually some sort of current session object I can always reach such as a WASession subclass in Seaside. Now, this all actually seems pretty normal, but should it be?

I'm not saying this is wrong, but one has to be aware of the trade-offs made for any particular API or style. At some point, you have to wonder if we're not suffering from some form of technical Stockholm syndrome where we forget that all this complexity is killing us and we forget just how painful it really is because we've grown accustomed to it.

Sit down and try explaining one of your programs that use some of this stuff to another programmer unfamiliar with your setup. If you really pay attention, you'll notice just how much of the explaining you're doing has nothing to do with the actual problem you're trying to solve. Much of it is just accidental complexity for plumbing and scaffolding that crept in. If you spend more time explaining the persistence framework than your program and the actual problem it's solving, then maybe that's a problem you'll want to revisit sometime. Do I really want to write code somewhat like...

user := User firstName: 'Ramon' lastName: 'Leon'.
self session commit: [ self session users at: user id put: user ].

with all the associated configuration setup and cognitive load of remembering what I called the accessor to get #users and how I'm hashing the user for this or that class while remembering the semantics of what exactly is committed, or whether I forgot to mark something dirty, or would I rather do something more straight forward and simple like this...

user := User firstName: 'Ramon' lastName: 'Leon'.
user save.

And just assume the object knows how to persist itself and there's no magic going on? If I say save I just know it commits to disk, whether there were any changes or not. No setup, no configuration, no magic, just save the damn object already.

Contrary to popular belief, disk IO is not the bottleneck, my time is the bottleneck. Computers are cheap, ram is cheap, disks are cheap, programmer's time is usually by far the largest expense on any project. Something simple that just works OK but solidly every time is far more useful to me than something complex that works really really well most of the time but still breaks in weird ways occasionally, forcing me to dig into someone else's complex code for change detection or topological insertion sorting and blow a week of programmer time working on god damn plumbing. I want to spend as much time as possible when programming working on my actual problem, not fighting with the persistence framework to get it to behave correctly or map my object correctly.

A Real Solution

Of course, GemStone is offering GLASS, a 4 gig persistent image that just magically solves all your problems. That will be the preferred option for persistence when you really need to scale in the Seaside world, and I for one will be using it when necessary; however, it does require a 64 bit server and introduces the small additional complexity of changing to an entirely different Smalltalk and learning its class library. Definitely an option if you outgrow Squeak. But will you? I'll get into GemStone more in another post when I can get more into it and give it the attention it deserves, but my main point now is that there's still a need for simple GemStone'ish like persistence for Squeak.

Reality Check

Let's be honest, most apps don't need to scale. Most apps in the real world are written to run small businesses, which DHH calls the fortune five million. The simple fact is, in all likelihood scaling is not and probably won't ever be your problem. We might like to think we're writing the next YouTube or Twitter, but odds are we're not. You can make a career just replacing spreadsheets from hell with simple applications that make people lives easier without ever once hitting the limits of a single Squeak image (such was the inspiration for DabbleDb), so don't waste your time scaling.

You don't have a scaling problem unless you have a scaling problem. Even if you do have an app that needs to scale, it'll probably need 2 or 3 back end supporting applications that don't and it's a waste of time making them scale if they don't need too. If scaling ever becomes a problem, be happy, it's a nice problem to have unless you're doing something stupid like giving away all of your services for free and hoping you'll figure out that little money thing later on.

Conventions Rule

Ruby on Rails has shown us that beyond making things easier with ActiveRecord, things often need to be made more structured and less configurable. Configuration is a hidden complexity that Java has shown can kill any chance for any real productivity, sometimes having more configuration than actual code. It's amazing how much simpler programs can get if you just have the guts to make a few tough choices, decide how you want to do things, and always do it that way. Ruby on Rails true contribution to the programming community was its convention over configuration philosophy, ActiveRecord itself was in use long before Rails.

Convention over configuration is really just a nice way of the framework writer saying "This is how it's done and if you don't like it, tough." The problem then of course becomes finding a framework with conventions you agree with, but it's a big world, you're probably a programmer if you're reading this, so if you can't find something, write your own. The only problem with other people's frameworks is that they're other people's frameworks. There's nothing quite like living in a world of your own creation.

What I Wanted

I wanted something like ActiveRecord from Rails but not mapped to a relational database, that I could use with Seaside and Squeak for small applications. I've accepted that if I need to scale, I'll use GemStone, this limits what I need from a persistence solution for Squeak.

For Squeak, I need a simple, fast, configuration free, crash-proof, easy to use object database that doesn't require heavy thinking to use, optimize, or explain to others that allows me to build and iterate prototypes and small applications quickly without having to keep a schema in sync or stop to figure out why something isn't working, or why it's too slow to be usable.

I don't want any complex indexing schemes to be necessary, which means I want something like a prevalence system where all the objects are kept in memory all the time so everything is just automatically fast. I basically just want my classes in Squeak to be persistent and crash-proof. I don't need a query language, I have the entire Smalltalk collections hierarchy at my disposal, and I sure as hell don't need SQL.

I also don't want a bunch of configuration. If I want to find all the instances of a User in memory I can simply say...

someUsers := User allInstances.

Without having to first go and configure what memory #allInstances will refer to because obviously I want #allInstances in the current image. After all, isn't a persistent image what we're really after to begin with? Don't we just want our persistent objects to be available to us as if they were just always in memory and the image could never crash? Shouldn't our persistent API be nearly as simple?

Since I'm basically after a persistent image, I don't need any configuration; the image is my configuration. It is my unit of deployment and I've already got one per app/customer anyway. I don't currently, nor do I plan on running multiple customers out of a single image so I can simply assume that when I persist an instance, it will be stored automatically in some subdirectory in the directory my image itself is in, overridable of course, but with a suitable default. If I want to host another instance of a particular database, I'll put another image in a different directory and fire it up.

And now I'm finally getting to the point...

SandstoneDb

Since I couldn't find anything that worked exactly the way I wanted, though Prevayler was pretty close, I just wrote my own. It's a simple object database that uses SmartRefStreams to serialize clusters of objects to disk. Ordinary ReferenceStreams can mix up your instance variables when deserializing older versions of a class.

The root of each cluster is an ActiveRecord / OODB hybrid. It makes ActiveRecord a bit more object oriented by treating it as an aggregate root and its class as a repository for its instances. I'm mixing and matching what I like from Domain Driven Design, Prevayler, and ActiveRecord into a single simple framework that suits me.

SandstoneDb API

To use SandstoneDb, just subclass SDActiveRecord and restart your image to ensure the proper directories are created, that's it, there is no further configuration. The database is kept in a subdirectory matching the name of the class in the same directory as the image. This is a Prevayler like system so all data is kept in memory written to disk on save; on system startup, all data is loaded from disk back into memory. This keeps the image itself small.

Like Prevayler, there's a startup cost associated with loading all the instances into memory and rebuilding the object graph, however once loaded, accessing your objects is blazing fast and you don't need to worry about indexing or special query syntaxes like you would with an on-disk database. This of course limits the size of the database to whatever you're willing to put up with in load time and whatever you can fit in ram.

To give you a rough idea, loading up a 360 meg database containing about 73,000 hotel objects on my 3ghz Xeon Windows workstation takes about 57 minutes. That's an average of about 5k per object. Hefty and definitely pushing the upper limits of acceptable. Of course, load time will vary depending upon your specific domain and the size of the objects. This blog is nearly two years old and only has a few hundred objects varying from 2k to 90k, some of my customers have been using their small apps for nearly a year and only accumulated 500 to 600 business objects averaging 0.5k each. Load time for apps this small is insignificant and using a relational database would be akin to using a sledgehammer to hang an index card with a thumbtack.

API

SandstoneDb has a very simple API for querying and iterating on the class side representing the repository for those instances:

queries

#atId: (for fetching a record by its #id)
#atId:ifAbsent:
#do: (for iterating all records)
#find: (for finding first matching record)
#find:ifAbsent:
#find:ifPresent:
#findAll (for grabbing all records)
#findAll: (for finding all matching record)

Being pretty much just variations of #select: and #detect:, little if any explanation is required for how to use these. The #find naming is to make it clear these queries could potentially be more expensive than just the standard #select: and #detect:.

Though it's memory-based now, I'm leaving open the option of future implementations that could be disk-based allowing larger databases than will fit in memory; the same API should work regardless.

There's an equally simple API for the instance side:

Accessors that come in handy for all persistent objects.

#id (a UUID string in base 36)
#createdOn
#updatedOn
#version (useful in critical sections to validate you're working on the version you expect)
#indexString (all instance variable's asStrings as a single string for easy searching)

Actions you can perform on a record.

#save (thread safe)
#save: (same as above but you can pass a block if you have other work you want done while the object is locked)
#critical: (grabs or creates a Monitor for thread safety)
#abortChanges (rollback to the last saved version)
#delete (thread safe)
#validate (for subclasses to override and throw exceptions to prevent saves)

You can freely have records holding references to other records but a record must be saved before it can be referenced. If you attempted to save an object that references another record that answers true to #isNew, you'll get an exception. Saves are not cascaded, only the programmer can know the proper save order his object model requires. To do safe cascaded saves would require actual transactions. Saves are always explicit, if you didn't save it, it wasn't saved, there is no magic, and you should never be left scratching your wondering if your objects were saved or not.

Events you can override to hook into a records life cycle.

#onBeforeFirstSave
#onAfterFirstSave
#onBeforeSave
#onAfterSave
#onBeforeDelete
#onAfterDelete

Be careful with these, if an exception occurs you will prevent the life cycle from completing properly, but then again, that might be what you intend.

A testing method you might find useful on occasion.

#isNew (answers true prior to the first successful save)

Only subclass SDActiveRecord for aggregate roots where you need to be able to query for the object, for all other objects just use ordinary Smalltalk objects. You DO NOT need to make every one of your domain objects into ActiveRecords, this is not Ruby on Rails, choosing your model carefully gives you natural transaction boundaries since the save of a single ActiveRecord and all ordinary objects contained within is atomic and stored in a single file. There are no real transactions so you can not atomically save multiple ActiveRecords.

A good example of an aggregate root object would an #Order class, while its #LineItem class just be an ordinary Smalltalk object. A #BlogPost is an aggregate root while a #BlogComment is an ordinary Smalltalk object. #Order and #BlogPost would be ActiveRecords. This allows you to query for #Order and #BlogPost but not for #LineItem and #BlogComment, which is as it should be, those items don't make much sense outside the context of their aggregate root and no other object in the system should be allowed to reference them directly, only aggregate roots can be referenced by other objects.

This of course means should you improperly reference say an #OrderItem from an object other than its parent #Order (which is the root of the file they're bother stored in), then you'll ultimately end up referencing a copy rather than the original because such a reference won't be able to maintain its identity after an image restart.

In the real world, this is more than enough to write most applications. Transactions are a nice to have feature, they are not a must-have feature and their value has been grossly oversold. Starbucks doesn't use a two-phase commit, and it's good to remind yourself that the world chugs on anyway, mistakes are sometimes made and corrective actions are taken, but you don't need transactions to do useful work. MySql became the most popular open-source database in existence long before they added transactions as a feature.

Here are some examples of using an ActiveRecord...

person := Person find: [ :e | e name = 'Joe' ].
person save.
person delete.
user := User find: [ :e | e email = 'Joe@Schmoe.com' ] ifAbsent: [ User named: 'Joe' email: 'Joe@Schmoe.com' ].
joe := Person atId: anId.
managers := Employee findAll: [ :e | e subordinates notEmpty ].

Concurrency is handled by calling either #save or #save: and it's entirely up to the programmer to put critical sections around the appropriate code. You are working on the same instances of these objects as other threads and you need to be aware of that to deal with concurrency correctly. You can wrap a #save: around any chunk of code to ensure you have a lock on that object like so...

auction save:[ auction addBid: (Bid price: 30 dollars user: self session currentUser) ].

While #critical: lets you decide when to call #save, in case you want other stuff inside the critical section of code to do something more complex than a simple implicit save. When you're working with multiple distributed systems, like a credit card processor, transactions don't really cut it anyway so you might do something like save the record, get the auth, and if successful, update the record again with the new auth...

auction critical: [ 
    [ auction
        acceptBid: aBid;
    save;
    authorizeBuyerCC;
        save ] 
     on: Error do: [ :error | auction reopen; save ] ]

That's about all there is to using it, there are some more things going on under the hood like crash recovery and startup but if you really want to know how that works, read the code. SandstoneDb is available on SqueakSource and is MIT licensed and makes a handy development and prototyping or small application database for Seaside. If you happen to use it and find any bugs or performance issues, please send me a test case and I'll see what I can do to correct it quickly.

Comments (automatically disabled after 1 year)

Claus 6239 days ago

Hi,

Cannot look at the code for now, so I have to ask -

How hard do you think would it be to port this to VW? I am actually looking for something like this for VW ...

Cheers,

Claus

Ramon Leon 6239 days ago

Should be pretty trivial assuming VW has something similar to a SmartRefStream for serializing objects.

Sungjin 6239 days ago

Finally an article on SandstoneDb :-)

I'm not sure you can remember my problem with SandstoneDb - my problem was its loading time; when image starts, all records are read in and my application does have many(> 800) number of small records. I've stolen your idea on prevaylor style and mixed it with OmniBase and I managed to reduce the loading/starting time. If SandstoneDb could reduce its data loading time, I think SandstoneDb would be the best solution for small web application, because it's very easy and intuitive to use.

Great work, though I'm not using it currently, it does let me know many useful concept on persistence.

Ramon Leon 6239 days ago

Have you tested the newest version? The newer versions have had some load time improvements.

Robert 6238 days ago

Have you thought about possibly creating some sort of memory pool that could allow multiple images that are running on different systems to pool there memory together as if it was one large segment of memory? I know your not in the business of building large scale object stores but thought it might be interesting to see how far SandstoneDb could scale.

Robert 6238 days ago

Sorry just red your description on SqueakSource I think you answer my question :-)

Aaron Davis 6237 days ago

This looks really interesting. The only problem I have with is the ID field in ActiveRecord objects (alluded to by the #atId: message). I think that an ID was really a hack to make relational databases work. I haven't looked at the code yet, so maybe it is necessary in this case.

Anyway, my real question is this: Is this meat to be a replace/upgrade to "Simple, Image-based Persistance in Squeak," or is it just an alternate path?

I have been working on a personal project in order to learn Squeak and Seaside, which I have been thinking of scrapping and starting over, since it is rather unwieldy, and I am sure not very Squeaky (though I would probably use some of the old code).

If I do restart, I will likely use SandstoneDb. If not, does it make sense to switch to it?

Ramon Leon 6237 days ago

No, id's are not a hack to make relational databases work, object identity is not a concept of relational algebra, in fact it's quite frowned upon. Adding an id column to every table is something us object heads tend to do because we want object identity, and think object identity alone is sufficient to identify an object without looking at pure values of the columns as a real relational weenie would prefer.

The reason you need and #id by the way, is that sometimes you need to pass a reference to your object outside your system, like in an email, so you pass the id so you can look it up when the user clicks a link. When possible, always pass the real object, not the #id, just recognize that isn't always possible, so you need an #id of some sort. Some people use base36 integers for small ids (reddit for example), but I prefer a base36 UUID so I can just generate it without having some global counter to increment.

SandstoneDb is a bit of an upgrade from "Simple, Image-based Persistence in Squeak", it commits much faster because the whole database isn't written on every change, just the changed record is. A SandstoneDb database could be quite a bit larger and it also gives you somewhat of a model to work against, rather than leaving everything up to you. I'd certainly switch to SandstoneDb, clearly, since I myself have.

Avalon 6235 days ago

Hello Ramon!

Its really interesting idea and i fully agree with you about small web projects. Even financial reporting application for very big industry corporation works perfect on one image. I am thinking about using Sandstone for persistence (now i am using image based one). I have the following question: What objects should i inherit from activerecord in my situation. I have templates which contains different kind of rows. I have reports which contains hashes of values where key is the specific row. I also have class level hash for fast accessing rows. So what should be inherited from activerecord? For example if class A is inherited from AR and contains instance variable bInstance of class B, which is inherited just from object. If i save class A, will object bInstance be saved too?

Thank you

Ramon Leon 6235 days ago

"What objects should i inherit from activerecord in my situation."

For your model, I couldn't say without knowing more; however, I'd say in general the active records should the be classes you want to directly query for the most, those which are whole objects and not just some small subpart of another object.

"If class A is inherited from AR and contains instance variable bInstance of class B, which is inherited just from object. If i save class A, will object bInstance be saved too?"

Yes, that's the intended usage exactly.

Adam Howard 6234 days ago

"...classes you want to directly query for the most, those which are whole objects and not just some small subpart of another object."

What happens when these two cases do not overlap. For instance, a Project contains a collection of Invoices. An Invoice contains a collection of LineItems. The Project is obviously a top level object and so should be an ActiveRecord. The LineItems are obviously subparts of the Invoice object and should be regular objects. What about the Invoice? It may be necessary to query for a particular invoice but it cannot exist with out it's associated Project. In relational terms this would be defined with a "cascade delete".

My other question has to do with unit testing. In your example test the tearDown method deletes all instances of SDPersonMock. If I was developing a UI for dealing with SDPersonMock, how could I maintain the data I have been using during development seperately from the instances created during testing?

Thanks for introducing me to another great persistence scheme for Squeak!

Ramon Leon 6234 days ago

I would make both Project and Invoice active records, they are clearly self contained. A LineItem is not, it means nothing outside the context of its Invoice, so it's just a regular object.

As for dealing with the deletion of a Project, it's a simple matter to define a cascading delete within project...

onBeforeDelete
    self lineItems do: #delete

"If I was developing a UI for dealing with SDPersonMock"

Simple answer, don't do that, leave my test objects alone. If I were testing actual biz objects, I'd subclass them which would put them into different directories or I'd be more careful and delete only what I created during the tests or more likely, not keep production data in my local test directory.

More complex answer, for your own classes, if you don't like where objects are stored by default, then on your class side, override the #defaultBaseDirectory and put your records wherever you want.

Aaron Davis 6232 days ago

I re-read my previous comment, and boy was it clumsy. What I meant to say, was that I was under the impression that and ID was a hack to make ORMs work. I know that IDs aren't part of the relational model of things, and I avoid them in my tables when possible, so I don't know why I said that. Whoops.

In any case, thanks for the reply. Your response makes a lot of sense. I hadn't thought about it much, but now I can see why you would want object IDs. Somehow, just using a UUID feels wrong though, like using a surrogate PK in a table. Does SandstoneDb allow you to override the "Primary Key" field? Can I use a symbol for example?

Ramon Leon 6232 days ago

"Somehow, just using a UUID feels wrong though, like using a surrogate PK in a table."

It's not wrong, it's how objects work. An objects identity is independent of any of it's fields values, these are objects, not tables, there's no such thing as a primary key and you should stop thinking of it as a key. The rules that apply to a relational model do not apply here. There's no reason to want to mess with how #id is implemented since the only time you should ever use it is when you pass a reference to the object outside the system. I'm anxious to hear an actual reason, if you have one (feels wrong doesn't count)?

By the way, there's also nothing wrong with using a surrogate PK in a table, pragmatically speaking, it's the only way to ensure ever changing business rules don't require ever changing code. In my experience natural keys are nice in theory, but in practice using natural keys only works for a small subset of lookup tables, but primary entity tables like transactions, customers, orders, products, etc, are always better off using a surrogate key.

Aaron Davis 6231 days ago

From a pragmatic standpoint, I agree you usually have to use surrogate keys. That doesn't make it right. In theory, to my mind anyway, whenever it is possible to use a real field, you should. For example, TBUSERS has a Surrogate USERID column, despite the fact that the record can be reasonably identified by the USER_NAME column. You might argue that TBUSERS is a lookup table, and I wouldn't disagree. The examples you mentioned (transactions, customers, products) are fine candidates to surrogate keys, because no small subset of the can be used to adequately identify the record.

I also would also use a surrogate key when meaningful key might be one field, but that field is long, since it would lead to typing mistakes. But that's the relational model.

The actual use case I had in mind relates to the application I mentioned before, which is meant to facilitate lending video games between my co-workers. I would like to be able to bookmark a list of all games on Xbox 360, for example. The URL could be something like 'http://localhost/seaside/gamelist/systems/Xbox+360.' Since you say that the only purpose of #id is to pass a reference to the outside world in a URL, I think this applies. Technically, there is nothing stopping me from doing that without using the UUID.

I suppose I am just stuck in the Relational model of things. I've been working on implementing ActiveRecord style models at my job, mostly because I understand the draw of working with smart objects, rather than dumb data (I think you said that in a comment on Enfranchised Mind). However, my thought process is still clouded by the relationship between the RDBMS and my objects.

Ramon Leon 6230 days ago

Exactly, there's nothing stopping you from doing that right now, you just have to get over the idea that an object's identity has anything at all to do with its values, it doesn't.

The relational model, the theory, doesn't hold up well in actual practice because the real world is messy and prone to change and programs need stable keys that are fast to join on and don't ever require updating. If you made the username the key, you'd have to cascade update everything that joined to it every time the user decided to change his name. Throw theory out the window, because that's just a dumb thing to do and puts a huge amount of extra and totally unnecessary load and complexity onto both the db and the application.

Surrogate keys are part of the physical model, not the logical model, so just pretend they aren't there and put appropriate constraints on your logical model, such as making username unique. But use the surrogate for joins and deletes and such, because practically speaking, it just makes programs vastly simpler and more immune to changes in the business rules. You just never know when a manager will want something like declaring the username just an alias and allowing dupes and deciding the email field is now the login field. If you'd used username as they key, you'd have to rewrite much of your application and queries, with a surrogate, nothing changes but some constraints.

One thing to keep in mind is that users don't give a rats ass about relational theory or your database, they only care about the application, to them the application is everything, so the applications needs MUST take priority over everything else, the database is there to serve the application, theory be damned.

Göran Krampe 6218 days ago

Hi Ramon!

I am slightly curious why you prefer rolling your own instead of using Magma? Although there are plenty of reasons for rolling your own stuff - it is FUN for example - the original article dismisses Magma because it is a "one man show". But it doesn't seem very logical to roll your own instead.

regards, Göran

Ramon Leon 6218 days ago

Sure it makes sense, if I'm the one man. Something breaks, I can fix it, something doesn't work the way I like, I can make it, and as you said, it's fun, it's been a great leaning experience for me and it gives me a hobby project to hack on constantly. It's also blazing fast, much faster than Magma because I'm using data that's always in memory, not on disk. No mucking with indexes and slow queries, no complex configuration and connection pools, just a simple crash proof Squeak image that does what I need.

Magma's tackling a much harder problem, it's trying to be Gemstone, server based, multi image, huge databases and it pays for that with a great deal of complexity and bloat. If you're writing a blog, a small cms, or a news site like say ycombinator (which uses an approach similar to mine), you don't need all that, you just need basic simple crash proof persistence. You don't need indexes when all your data will easily fit in ram. You don't need limited query languages when you can just bang away on the ordinary collection protocol without fear of hitting the disk.

I've used Magma, GOODS, OmniBase, Glorp, ODBC with Tantalus, and SPrevayler, and in none of them am I as productive or can bang things out as quickly as I do with SandstoneDb. There's a simplicity to the ActiveRecord pattern that just doesn't get in my way like all those others do. There's under 2000 lines of code in SandstoneDb, which keeps it grokable in a way Magma just can't be. Now if I can just figure out how to add transactions...

BTW, did you give up on blogging? I'd sure like to see you write some more, for example, your recent contest with Janko on tuning Kom, I'd love to see that written up with some insight into your thought process as you do performance tuning and how you use the profiler, what you look for, what your hunches are and where they lead you. On the list, you tell us what you did, but you don't tell us how you got there or show us any dead ends you may have chased down and been wrong.

On Smalltalk