
Scaling Seaside

UPDATE: This advice is somewhat outdated and applied only when I was hosting Seaside on Windows servers (which I advise against); see Scaling Seaside Redux: Enter the Penguin for more up-to-date advice.

I've been busy with non-Seaside projects lately, but one of the things I have squeezed in was a bit of configuration to make Seaside scale better. I was having performance problems when more than a few sessions were running concurrently, and after I discussed the issue on the Seaside developers list, Avi popped in from DabbleDB and told us the way to scale was to load balance many VMs with a few sessions running on each. So I had to read up on the issue and learn what to do.

It turns out one of the best ways to learn about scaling Seaside is to read about scaling Ruby on Rails; the architecture for scaling both is pretty much the same. Ruby developers use a web server called Mongrel, a lightweight single-threaded server that works well with Rails but isn't a heavy-duty web server like Apache. Seaside is in much the same position with Comanche: though Comanche isn't single threaded, the Squeak VM can't take advantage of multiple processors and doesn't do well with too many concurrent connections.

UPDATE: I neglected to mention one major requirement of Seaside: your load balancer must support sticky sessions. Seaside uses sessions heavily and does not support the shared-nothing approach where every request can hit a different server. This isn't an issue unique to Seaside; many frameworks use sessions and must deal with it, either by sticking a session to a server or by keeping sessions in a shared cache that all the servers can access, such as memcached or a SQL server. Currently, to the best of my knowledge, no one has externalized Seaside sessions in this manner, so sticky sessions are the only viable approach.

The solution for both is quite simple, really: set up a heavier-duty web server/load balancer (Apache/lighttpd) to serve static content, and load balance and proxy connections to a farm of lightweight application servers (Mongrel/Comanche) running on other ports. I'm actually using an F5 as my front-end load balancer, but Apache has all the necessary features, including the ability to create pools of virtual servers that it will load balance requests across with its new mod_proxy.
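
I'm fronting mine with an F5, but for anyone using Apache instead, a minimal sketch of such a pool might look like the following. This is illustrative only, not my actual configuration: the ports, route ids, and ROUTEID cookie are assumptions, and it presumes mod_proxy, mod_proxy_balancer, and mod_headers are loaded. The Set-Cookie line is what provides the sticky sessions mentioned in the update above.

# Tag each response with the worker that served it so later requests stick to that image
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy balancer://seasidepool>
    BalancerMember http://127.0.0.1:3001 route=1
    BalancerMember http://127.0.0.1:3002 route=2
    BalancerMember http://127.0.0.1:3003 route=3
    ProxySet stickysession=ROUTEID
</Proxy>

ProxyPass        /seaside balancer://seasidepool/seaside
ProxyPassReverse /seaside balancer://seasidepool/seaside

Whatever sits in front, the requirement is the same: once a session starts on one image, every request for that session must keep landing on that same image.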

One need only Google "scaling Rails" to find plenty of detailed setup information and articles for such a setup, so I won't repeat it here. I will mention that neither Mongrel nor Comanche is anywhere near as rock-solid stable as Apache, so one thing you'll want with a setup like this, to ensure maximum uptime, is a process that polls all of your servers to make sure they aren't hung up for any reason.

UPDATE: I don't want to imply Comanche is unstable; it is very rare that a service needs a restart. I only do this because it "can" happen, not because it happens a lot. Seaside is very stable and, under normal conditions, doesn't crash.

Here's a little bash script I found somewhere that makes checking a site for a specific response string simple and easy to do from the command line, allowing you to schedule scripts that reset any hung processes. Oh, I use Cygwin on all my servers so I can have a decent Unix command line on my Windows boxes.

checkSite.sh UPDATE: This script eventually freaks out Windows and causes random network errors because lynx somehow eats up network resources. Don't use it; Seaside is quite stable without it.

#!/bin/bash
# checkSite.sh - fetch a URL with lynx and verify the response contains some text

if [ $# -lt 2 ]; then
    echo "usage: $0 <url> <required text>"
    exit 1
fi

URL=$1
findText=$2
tmpFile=/tmp/x$$

# Dump the page, logging HTTP status info to a temp file, and search for the text
lynx -dump -error_file="$tmpFile" "$URL" 2>/dev/null | grep "$findText" &>/dev/null
if [ $? != 0 ]; then
    echo "WARNING: Specified search text was not found"
fi

# Pull the HTTP status code(s) lynx recorded; if the temp file can't be read, the site is down
stcode=`awk '/STATUS/{print $2}' "$tmpFile" 2>/dev/null`
if [ $? != 0 ]; then
    echo "site is down"
fi

for code in $stcode
do
    case $code in
      200) echo OK;;
      302) echo redirecting
           awk -F/ '/URL/{print "  ",$3}' "$tmpFile";;
      *)   echo "$code"
    esac
done

if [ -f "$tmpFile" ]; then
    rm "$tmpFile"
fi
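
By itself, checkSite.sh is handy for a quick manual check from the shell; for example (the URL and search text here are just placeholders):

sh checkSite.sh http://localhost:3001/seaside/someApp "Welcome"

It prints OK for a 200 response, and a WARNING line if the expected text isn't in the page, which is what the next script keys off.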

Then I use this in another script built specifically to monitor instances of my Seaside app. Though this could be more generic, I haven't bothered yet because I only have one Seaside site in production to worry about.

checkSeaside.sh

echo "$1 $2 $3..."
sh checkSite.sh http://$1:$3/seaside/someApp "Some Required Text" | grep WARNING &>/dev/null
if [ $? = 0 ]; then
    echo "restarting $2 on $1"
    psservice \\\\$1 restart "$2" >/dev/null #NT util to restart services on remove machines
    echo "Restarting $1 $2" | wsendmail -- -CGI -Ssome.mail.server.com \
        -s"App Monitor reset $1 $2" someEmail@someAddress.com -Fmonitor@someAddress.com -P1
fi

Then, on each web server, I'm running a pool of 10 instances of Seaside set up as services, and I make sure they're up by scheduling a simple batch file with the Windows Task Scheduler (an example of scripting the schedule follows the batch file).

monitorSomeApp.cmd

@echo off
bash checkSeaside.sh serverName "Some Service1" 3001
bash checkSeaside.sh serverName "Some Service2" 3002
bash checkSeaside.sh serverName "Some Service3" 3003
bash checkSeaside.sh serverName "Some Service4" 3004
bash checkSeaside.sh serverName "Some Service5" 3005
bash checkSeaside.sh serverName "Some Service6" 3006
bash checkSeaside.sh serverName "Some Service7" 3007
bash checkSeaside.sh serverName "Some Service8" 3008
bash checkSeaside.sh serverName "Some Service9" 3009
bash checkSeaside.sh serverName "Some Service10" 3010

Yea, I'm mixing and matching bash and DOS scripts, so sue me! Anyway, the setup works great: I'm running 30 instances of Squeak across 3 servers, and these scripts ensure they're always up and responding, resetting them and emailing me if they go down for any reason. Response time is now much better, and I can fully take advantage of the multiple processors on the web boxes.

My process isn't nearly as fancy as Avi's (he's dynamically bringing images up and down based on the host header), but balancing the connections across a bunch of fixed-size pools works well. I started with 10 processes per box, just for the hell of it, and I'll increase or decrease the pool as load dictates to eventually find the sweet spot for the pool size. For now, 10 per box works; I've got plenty of spare RAM.

Of course, doing a setup like this means you'll need to automate your deployment process for new code as well. So far this means keeping a master image that I upgrade with new code and test, then a script to take down each process, copy the image file over the old one, and bring the process back up (a rough sketch of that loop is below). Seems Avi's doing the same thing; it works pretty well so far.
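
For the curious, that deployment loop amounts to something like the sketch below. The service names, ports, and paths are placeholders, and it assumes the same Cygwin plus psservice setup the monitoring scripts use; if your instances also need the .changes file, copy that alongside the image.

#!/bin/bash
# deploySomeApp.sh <server> - push the tested master image out to one web server (sketch)

SERVER=$1
MASTER=/deploy/someApp.image              # tested master image (placeholder path)

for n in 1 2 3 4 5 6 7 8 9 10
do
    psservice \\\\$SERVER stop "Some Service$n" >/dev/null
    cp "$MASTER" "//$SERVER/seaside/instance$n/someApp.image"   # placeholder share/path
    psservice \\\\$SERVER start "Some Service$n" >/dev/null
done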

Comments (automatically disabled after 1 year)

Avi Bryant 6269 days ago

Ramon, thanks for writing this up.

Nate Drake 6268 days ago

Dugg: http://digg.com/programming/Scaling_Seaside

Ramon Leon 6268 days ago

No prob Avi, someone had to.

Gábor Farkas 6268 days ago

I always wondered how Seaside deals with session data.

I mean, is it persisted to some kind of database after every request, like it happens with shared-nothing architectures (Rails, Django, etc.)?

Because if not, then when you load balance, you have to make sure that all the requests for a certain session arrive at the same Seaside instance, or not?

So, how is it with Seaside?

Ramon Leon 6268 days ago

Yes, you have to use sticky sessions so that once a session is established it stays on the same server. Every front end worth using has an option to enable sticky sessions.

[...] Leon at On Smalltalk has made a good start at providing an answer on how to scale Seaside. His answer is to run multiple Squeak images at the same time, and have a load balancer choose [...]

Ramiro Diaz Trepat 6262 days ago

The problem I still see with scaling Seaside apps is that you cannot use a reverse proxy and decide which parts of your web site NOT to dynamically generate at a certain point. With Seaside you HAVE to generate everything for every request, and you cannot cache pages because the parameters _s and _k are different for every session. So, for instance, if I have a large catalog of products that rarely change, and a web page that displays the information for a particular product, I will always have to dynamically generate this page for that particular product; it cannot be cached and statically served from a reverse proxy, even if it got generated 20 times in the last second. I believe this is a real waste of resources, makes Seaside applications a lot harder to scale, and probably makes them unsuitable in practice for really large applications. Maybe there is a way around this issue, I don't know. I have asked about this at least a couple of times on the mailing list but got no answer.

Ramon Leon 6261 days ago

The _s is different for every session, the _k is different for each request. Don't get too hung up on that; you can just as easily parse the state from the URL path if needed, allowing more RESTful pages. Pier does this.

Seaside, btw, isn't designed to build mostly static "web" sites; it's designed to build highly dynamic "applications". Applications comparable in complexity to desktop applications, where caching has little if any value. Applications so complex that doing them in another framework isn't really feasible.

Also, the fact that it renders pages dynamically doesn't make Seaside any harder to scale; it simply makes Seaside require more resources (hardware) to scale. But if you're serving up so much static or mostly static content that you're really concerned about caching, then maybe Seaside isn't the framework you need.

Ramiro Diaz Trepat 6260 days ago

Hello Ramón, maybe I did not express myself properly. By no means was I talking about a static web site in my post. I am talking about caching a dynamic web site, which is enormously different. Most dynamic web sites can benefit enormously from this, including yours. I disagree about Seaside being designed for "highly dynamic" web sites only; where would you put the threshold that divides regular dynamic web sites from "highly dynamic" ones? I think there is no use in making this categorization. Not to talk about hypothetical web sites, let's talk about your real, neat travel reservation web site (which I think is the coolest Seaside app I've seen so far, along with DabbleDB). Is it "highly dynamic"? Don't you have, for instance, a catalog of hotels? Imagine your web site becomes really popular during the next soccer World Cup and you have 50 requests per second querying about a specific hotel in a specific town. I don't think hotel information changes really often; nevertheless, you still have to generate it dynamically for each of those 50 requests per second you are serving.
Imagine you could dynamically generate the page for the hotel only when the underlying object model changed, and all the time in between the page is served from a cache, at lightning speed and, in comparison, without consuming almost any resources. Wouldn't that produce a MUCH MUCH better use of your servers? You could probably handle an order of magnitude or more traffic, and you would not lose any of your dynamic functionality. Just like in all the other frameworks. So no, I'm not talking about static web sites, nor "less dynamic" web sites than what Seaside was designed for.

I also think that one of the biggest problems in my beloved Squeak community is looking the other way when someone points to a problem. I've seen so many pragmatic and clear problems dragged to a philosophical ground of discussion about "what is really good", where any opinion can be relativized and then dismissed. For instance, when a new guy comes and says that the Squeak user interface is outdated: poor dude, then come the philosophical threads about "what is really good". Who can dare to say they hold the truth? And the problem is annihilated. So we should learn to acknowledge the problems, not to look the other way with articulate rhetoric. I believe the problem I am pointing out, Seaside's inability to be proxied, is a MAJOR problem; it really sucks. It makes Seaside applications consume a lot more resources than they should and also be a lot harder to administer on a high-traffic web site. I HOPE I AM WRONG, since I really love Seaside.

Ramon Leon 6260 days ago

I'm not disagreeing that the ability to cache the site would be nice. Certainly it would, but it's a very complicated issue because Seaside relies so heavily on sessions to enable all the magic that makes working in it so great.

Those pages have so many Ajax callbacks in them that assume there is state sitting on the server waiting to answer them. Were a page somehow served from a cache, the server-side session would eventually expire, the state would disappear, and the cached page would no longer work. The cached content is also user specific, so without some kind of major core rewrite of the one thing that makes Seaside different and a joy to work with, sessions, I just don't see how such a page can be cached.

I do have caching in the site, but I do it at the db call level rather than at the page level. To make a page cache work, you'd have to make the calls stateless, encoding the necessary state information directly into the URL and re-fetching the objects per request, and running stateless with data encoded into the URL is everything Seaside tries to avoid.

I could be totally wrong; maybe someone smarter than me will come along and see an elegant way to do it without losing that Seaside feel, but I just don't see how it'd work. Of course, I'd not complain at all if someone figured out how to do it. ;)

At the moment, my time costs far more than web servers do, so I'm happy to throw hardware at the problem to make it scale and serve up everything dynamically. As far as I'm concerned, it's a small price to pay for the joy of programming in Seaside.

I would of course, love to see a deeper discussion on the subject by those more knowledgeable than me.

Vincent Girard-Reydet 6226 days ago

Ramiro,

I think it is possible to perform caching with Seaside, but not in the traditional way of doing it. For sure, you cannot cache the whole content of a page on a front end. But what about caching pieces of pages in the server's memory? I personally do it for configuration stuff, such as a list of hotels would be. The only thing you would need to do is perform dynamic replacement of session keys in the cached content, something quite easy and performant with Squeak. The major problem of applications such as the one you're describing is accessing the database; most of the time, this is the real bottleneck. If you cache the results of your queries in memory, and there's nothing preventing you from doing that with Seaside, then you would have a very fast generation time. I agree with you, it would not be as fast as a completely cached page. But it would still be fast. Another approach could be to have static front-end pages querying a back-office Seaside app. This requires you to THINK about your application differently, but it is also a viable approach. Anyway, Seaside requires you to think about your app differently.

Ramon Leon 6226 days ago

If you're using Seaside, you're already thinking quite differently. ;)

Michael Gorsuch 6161 days ago

Hi Ramon - I'm starting to nervously work with Seaside to develop some of my side projects. I love the entire Seaside paradigm and have already seen my productivity go up, so I see no point in 'turning back' to my former Ruby addiction now.

My question to you is: how many concurrent sessions can a single image really handle? I saw that you mentioned 'a few', but am hoping to get a little more clarity. Considering that my work right now will be mostly 'start-up like', I need to know whether I can scale any of this out on my VPS or not.

Thanks for all the insightful articles and contributions!

David Mitchell 6160 days ago

See this thread:

http://www.nabble.com/Server-sizing---100-concurrent-users-tf3851191.html#a10940905

[...] that are running in Squeak (or VW, or Dolphin) arrange for all http session requests to be routed to the same VM, using some flavor of session affinity. Given the GemStone limitation that only one concurrent [...]

Sebastian 6082 days ago

I'm using monit for this. It's very simple and it even has a web interface to manage monitored processes. It can also restart processes for different reasons and send you email alerts.

[...] pointed me at this post about Seaside. Based on some side conversation and the panel I thought others might be interested as [...]
