
Scaling Seaside Redux: Enter the Penguin

In my last post, My Journey to Linux, I spoke about Linux. Having decided to use Linux to host my Seaside servers, I was now faced with picking a distro. I chose Ubuntu Server. I've heard good things about both Debian and Ubuntu, a distribution based on it, and I wanted something clean, free of bloat, and designed to be used as a server. I also wanted to try a distro I'd never used before. In the past, I've tried Red Hat, its offshoot Fedora, and Slackware. I almost tried Suse, but it seemed too aimed at the desktop, and I was looking for something to use as a server. Ubuntu seemed to fit the bill nicely. I grabbed a spare test box and decided to explore hosting Seaside on a Linux server by installing Ubuntu.

I popped in a CD burned from the downloaded ISO image and was greeted with a nice menu asking me what I wanted to do. I chose to install a server to the hard drive and proceeded through an install that reminded me very much of the old NT4 install, only better. The install finished, successfully detected all the hardware, autoconfigured the network (using the local DHCP server), and left me with a command-line login prompt, exactly what I wanted. At this point I was very impressed: setting up a Linux server took a fraction of the time of setting up a Windows server and was completely trouble-free. I wouldn't use DHCP for a production server, but it'll do fine for testing purposes.

Now, on to what I thought would be the hard part: setting up the necessary software. I haven't yet loaded emacs, so I'll use pico to whip the server into shape.

I spent a few minutes looking into Debian's package system, learned what I needed, and then away I went. First I needed to ensure the server was up to date.

sudo apt-get update
sudo apt-get upgrade

OK, that was easy. Now I need to install some software. Three of the main things I wanted weren't in the default Ubuntu repositories: Squeak, for hosting Seaside; Daemontools, because I found out that's what Lukas uses to maintain his Seaside services; and HAProxy, to load balance the processes. I quickly found repositories for all of them and added them to the list of available repositories.

sudo pico /etc/apt/sources.list

and added...

#daemontools
deb http://smarden.org/pape/Debian/ sarge unofficial
deb-src http://smarden.org/pape/Debian/ sarge unofficial
#squeak
deb http://ftp.squeak.org/debian/ stable main
deb-src http://ftp.squeak.org/debian/ stable main
#haproxy
deb http://ftp.sysif.net/debian sid main

Install the signing key for the HAProxy repository...

wget http://ftp.sysif.net/debian/apt_key.asc
sudo apt-key add apt_key.asc

Now fetch the new lists by updating again....

sudo apt-get update

OK, time to install everything I want. I need Squeak, an FTP server, Apache, Daemontools for managing services, an SSH server for remote access, HAProxy for load balancing, Samba for networking with my other Windows servers, and Emacs because I'm going to be using it. I chose HAProxy for two reasons: there's no official Ubuntu version of Apache 2.2 with mod_proxy_balancer, so installing it is a pain, and even with mod_proxy_balancer I haven't been able to get it to work with Seaside. So, on to the install, simple enough...

sudo apt-get install squeak daemontools vsftpd 
sudo apt-get install apache2 apache2-utils 
sudo apt-get install openssh-server emacs21 haproxy samba

This was actually one command, but three fit more nicely on the blog. Now configure the FTP server...

sudo emacs /etc/vsftpd.conf

and make a few changes...

anonymous_enable=NO
local_enable=YES
write_enable=YES

and restart the ftp service...

sudo /etc/init.d/vsftpd restart
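If you'd rather script these edits than make them in an editor, here's a minimal sketch of a helper that sets a KEY=VALUE line in a config file. The name set_conf is hypothetical, it assumes GNU sed (for -i), and it assumes the plain KEY=VALUE format vsftpd.conf uses, where disabled defaults show up as commented-out lines such as #local_enable=YES.

```shell
# set_conf KEY VALUE FILE (hypothetical helper, assumes GNU sed -i):
# rewrites an existing KEY= line or a commented-out #KEY= line,
# and appends KEY=VALUE if the key isn't in the file at all.
# VALUE must not contain "|" or "&", which are special to the sed command.
set_conf() {
    local key=$1 value=$2 file=$3
    if grep -q "^#\{0,1\}$key=" "$file"; then
        sed -i "s|^#\{0,1\}$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

From a root shell, running set_conf for anonymous_enable NO, local_enable YES, and write_enable YES against /etc/vsftpd.conf should reproduce the three changes above; restart vsftpd afterwards as shown.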

Now set up Apache with the modules I need...

sudo a2enmod rewrite
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod deflate

and restart the Apache service...

sudo /etc/init.d/apache2 restart

And I'm mostly set up. Wow, that was easy, much easier than I'd assumed it would be.

Time to set up Seaside. I'm no Linux expert, but I see other services running out of /etc/serviceName, so I'll set up my Seaside services the same way. I'm going to run 10 processes, so I'll create one directory per process, i.e. squeak1, squeak2, etc. Using daemontools, I simply have to set up a directory for my service with the files it needs and a shell script called "run" that will kick off the service. Here's the script...

#!/bin/bash
# supervise runs ./run from inside the service directory, so SqueakProd is found there
exec squeakvm -mmap 200m -headless SqueakProd "" port 3001

I'm using -mmap to limit each process to a maximum of 200 MB of RAM, and then I feed Seaside the port number to start on, so squeak1 runs on port 3001, squeak2 on 3002, etc. In each folder I put SqueakProd.image, SqueakProd.changes, and SqueakV39.sources, and I chmod 755 the run script.
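Creating ten near-identical directories by hand is tedious, so here's a minimal sketch that builds them in a loop. The helper name make_squeak_services and the idea of a separate source directory holding the prepared image files are my assumptions, not part of the setup above; each generated run script is the one shown above, with the port set to 3000 plus the directory's index.

```shell
# make_squeak_services BASE_DIR SRC_DIR [COUNT] (hypothetical helper):
# creates BASE_DIR/squeak1..squeakCOUNT, copies the image files from SRC_DIR
# (assumed to hold SqueakProd.image, SqueakProd.changes, SqueakV39.sources)
# and writes an executable daemontools run script into each one.
make_squeak_services() {
    local base_dir=$1 src_dir=$2 count=${3:-10} i=1 dir
    while [ "$i" -le "$count" ]; do
        dir="$base_dir/squeak$i"
        mkdir -p "$dir"
        cp "$src_dir/SqueakProd.image" "$src_dir/SqueakProd.changes" \
           "$src_dir/SqueakV39.sources" "$dir/"
        # Same run script as above; only the port changes (3000 + i).
        cat > "$dir/run" <<EOF
#!/bin/bash
exec squeakvm -mmap 200m -headless SqueakProd "" port $((3000 + i))
EOF
        chmod 755 "$dir/run"
        i=$((i + 1))
    done
}
```

Run from a root shell as something like make_squeak_services /etc /path/to/prepared/files 10 (the second path is a placeholder), it would produce /etc/squeak1 through /etc/squeak10.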

Because I'm using daemontools, I can now start each service simply by creating a symbolic link in the /service directory that points to the corresponding /etc/squeakX directory.

sudo ln -s /etc/squeak1 /service/squeak1

My services are started within a few seconds and maintained by daemontools.
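The linking step can be looped over all ten directories. This is a sketch with a hypothetical helper name; it relies on svscan noticing new entries in /service, which it checks for every five seconds.

```shell
# link_squeak_services BASE_DIR SERVICE_DIR (hypothetical helper):
# symlinks every BASE_DIR/squeakN directory into SERVICE_DIR so that
# svscan picks it up and starts a supervise process for it.
link_squeak_services() {
    local base_dir=$1 service_dir=$2 dir
    for dir in "$base_dir"/squeak[0-9]*; do
        [ -d "$dir" ] || continue   # skip if the glob matched nothing
        # -n replaces an existing link instead of descending into it,
        # so re-running the helper is harmless.
        ln -sfn "$dir" "$service_dir/$(basename "$dir")"
    done
}
```

As root, link_squeak_services /etc /service would create /service/squeak1 through /service/squeak10.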

Now let's set up the load balancer. Seaside requires session persistence to do the magic it does, so we need to configure HAProxy to use a cookie to ensure a user gets routed to the same server each time. All that talk about statelessness being necessary is crap, an old onion in the web framework recipe that isn't at all necessary and is actually crippling. Yes, stateless websites are easier to scale, but they're much harder to develop, because state exists; it's just a matter of whether you marshal it manually or let the framework do it. I'll take the latter, because it scales well enough for what I need and saves me boatloads of time.
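To make the cookie idea concrete, here's a toy model, in shell, of what cookie-based persistence combined with round-robin balancing does. It is only an illustration, not HAProxy's implementation: a request carrying a cookie that names a known server goes back to that server, and any other request gets the next server in rotation (HAProxy would then insert a cookie naming it). The server names follow the app1instN cookie values used in the HAProxy config; three servers stand in for ten.

```shell
# Toy model of "cookie SEASIDE insert" + "balance roundrobin" (illustration
# only, not HAProxy's code).
SERVERS="app1inst1 app1inst2 app1inst3"
NEXT=0      # index of the next server in the rotation
ROUTED=""   # server chosen for the most recent request

# route COOKIE: sets ROUTED to the server that would handle the request.
route() {
    local cookie=$1 s n=0 i=0
    for s in $SERVERS; do
        if [ "$s" = "$cookie" ]; then
            ROUTED=$s              # known cookie: stick to that server
            return 0
        fi
        n=$((n + 1))
    done
    for s in $SERVERS; do          # no valid cookie: round-robin
        if [ "$i" -eq "$NEXT" ]; then
            ROUTED=$s
        fi
        i=$((i + 1))
    done
    NEXT=$(( (NEXT + 1) % n ))     # HAProxy would now insert SEASIDE=$ROUTED
}
```

The function sets a global rather than printing, because the rotation counter has to survive between calls and a $(...) subshell would lose the update. Calling route "" twice yields app1inst1 then app1inst2, while route app1inst1 keeps returning app1inst1 no matter where the rotation is.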

sudo emacs /etc/haproxy/haproxy.cfg

using these settings for now (these could change based on load testing later)...

global
    log 127.0.0.1 local0
    maxconn 32000
    chroot /usr/share/haproxy
    pidfile /var/run/haproxy.pid
    uid 33
    gid 33
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    redispatch
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen smalltalk 0.0.0.0:8080
    mode http
    cookie SEASIDE insert nocache
    balance roundrobin
    server app1 127.0.0.1:3001 cookie app1inst1 check
    server app2 127.0.0.1:3002 cookie app1inst2 check
    server app3 127.0.0.1:3003 cookie app1inst3 check
    server app4 127.0.0.1:3004 cookie app1inst4 check
    server app5 127.0.0.1:3005 cookie app1inst5 check
    server app6 127.0.0.1:3006 cookie app1inst6 check
    server app7 127.0.0.1:3007 cookie app1inst7 check
    server app8 127.0.0.1:3008 cookie app1inst8 check
    server app9 127.0.0.1:3009 cookie app1inst9 check
    server app10 127.0.0.1:3010 cookie app1inst10 check
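Since the ten server lines differ only by their index, they can be generated rather than typed. A minimal sketch, with a hypothetical function name, that prints lines in the same format as above:

```shell
# haproxy_server_lines [COUNT] [BASE_PORT] (hypothetical helper): prints
# "server" lines in the same format as the listen section above.
haproxy_server_lines() {
    local count=${1:-10} base_port=${2:-3001} i=1
    while [ "$i" -le "$count" ]; do
        printf '    server app%d 127.0.0.1:%d cookie app1inst%d check\n' \
            "$i" "$((base_port + i - 1))" "$i"
        i=$((i + 1))
    done
}
```

With the defaults, it prints exactly the ten lines above, ready to paste into the listen section; changing the count keeps the ports and cookie names in step with the directories created for daemontools.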

Then I need to enable HAProxy; it's disabled by default.

sudo emacs /etc/default/haproxy

and set...

STARTUP=1

then restart the service

sudo /etc/init.d/haproxy restart

Now I hit the box on port 8080 with a web browser to make sure everything works, and it does. I now have 10 load-balanced, sticky-session-enabled instances of Seaside running behind HAProxy, sweet! Time to set up Apache as the front end to this beast and offload all the static content, URL rewriting, HTTP compression, HTTPS, and logging to it. Only dynamic requests will be proxied to the Seaside cluster. This also allows me to mix in other frameworks when necessary (Rails, .Net), all proxied behind Apache.

Now I want to disable the default site and create a new virtual host...

sudo a2dissite default
sudo emacs /etc/apache2/sites-available/linuxweb1

with the following settings...

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName linuxweb1
    DocumentRoot /var/www
    RewriteEngine On
    ProxyRequests Off
    ProxyPreserveHost On
    UseCanonicalName Off

    # http compression
    DeflateCompressionLevel 5
    SetOutputFilter DEFLATE
    AddOutputFilterByType DEFLATE text/html text/plain text/xml \
        application/xml application/xhtml+xml text/javascript text/css
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    BrowserMatch ^Mozilla/4.0[678] no-gzip
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

    #proxy to seaside if file not found
    RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
    RewriteRule ^/(.*)$ http://localhost:8080/$1 [P,L]

    # Logfiles
    ErrorLog  /var/log/apache2/linuxweb1.error.log
    CustomLog /var/log/apache2/linuxweb1.access.log combined
</VirtualHost>
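The RewriteCond/RewriteRule pair is what splits static from dynamic traffic. As a rough illustration only, here's a shell function that mimics that decision; it is not how mod_rewrite works internally, and the function name is hypothetical. Note that -f only matches regular files, so a request for a directory such as / falls through to Seaside as well.

```shell
# route_request DOCROOT PATH: toy model of the rewrite rules above.
# Prints "static" when DOCROOT/PATH is an existing regular file (Apache
# serves it itself), otherwise the URL the request would be proxied to.
route_request() {
    local docroot=$1 path=$2
    if [ -f "$docroot/$path" ]; then
        echo "static"
    else
        echo "proxy http://localhost:8080$path"
    fi
}
```

For example, with a stylesheet under /var/www/css, route_request /var/www /css/site.css reports static, while route_request /var/www /seaside/config reports the HAProxy URL on port 8080.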

Now I need to enable Apache's proxy for localhost so requests can be forwarded to Squeak; it's disabled by default. Edit the file /etc/apache2/mods-available/proxy.conf, or just add this to your virtual host section.

<Proxy *>
    AddDefaultCharset off
    Order deny,allow
    Allow from localhost
</Proxy>

ProxyVia On

and enable the new site and reload Apache

sudo a2ensite linuxweb1
sudo /etc/init.d/apache2 force-reload

Now I just check the box by pointing my web browser at http://linuxweb1/seaside/config to verify everything works. It does; my session sticks to a single Seaside instance. It's easy to tell if something is set up wrong: without sticky sessions, no Seaside link works. All in all, I'm very pleased with this setup. It runs great, performs well, and, being Linux, should be easy to replicate to new servers as I add them.
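A browser check works, but you can also see the sticky cookie from the command line. This sketch (hypothetical helper name) pulls the SEASIDE cookie value out of raw response headers; the value is the per-server cookie string from the HAProxy config, so it tells you which backend (app1inst1 through app1inst10) handled the request.

```shell
# seaside_cookie (hypothetical helper): reads raw HTTP response headers on
# stdin and prints the value of the SEASIDE cookie, if one was set.
seaside_cookie() {
    tr -d '\r' | sed -n 's/^[Ss]et-[Cc]ookie: *SEASIDE=\([^;]*\).*/\1/p'
}
```

One way to feed it is curl -s -D - -o /dev/null http://linuxweb1/seaside/config | seaside_cookie, which dumps the response headers to stdout and discards the body. Empty output means no SEASIDE cookie came back, which points at the HAProxy cookie settings.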

I'd love to pretend everything actually went this smoothly, but in reality there were many points during this install where I had to stop and learn something new about Linux or about configuring the various components required. Honestly, my brain hurts a little after all this, but in a good way. I tackled a lot of new stuff all at once, but I had a lot of fun doing it.

If anyone has a better setup or pointers about mistakes I've made, I'd love to hear about it; this is my first crack at using Linux as a web server.

NOTE: I left out the Samba setup because it deals with various things on my local Windows network that aren't relevant to this setup. As with any article this detailed, I may have taken a step and forgotten to write it down, so I'll make corrections as they come up in the comments.

UPDATE: Thanks to a tip in the comments, and having learned Linux a little better now, I use "aptitude update/upgrade/install" rather than "apt-get update/upgrade/install" because aptitude will track dependencies and clean them out automatically when I uninstall stuff.

UPDATE: Since I switched to Linux, my Seaside site has become incredibly stable and much faster than when it was hosted on Windows Server 2003. Part of the speedup comes from HTTP compression; however, the CPU load is also lower than before. The hardware is slightly different as well: I went from 3 dual-processor Windows servers to 2 dual-processor Linux servers (both hyper-threaded, so each looks like it has 4 processors). So it's not an apples-to-apples comparison, but I'll give Linux the credit anyway ;)

Comments

Joe Jones 6501 days ago

This article is a diamond among gems. Very nice, helpful and concise. Thank you.

Karl O. Pinc 6501 days ago

Your mention of using Debian packages with Ubuntu is a little spooky. I assume you mean Ubuntu's packages.

Synaptic and aptitude will keep track of which packages are installed only as dependencies and remove them when they're un-needed. I use them instead of apt-get. There are other tools that let you do the work manually, but why? "dpkg -S", "dpkg -L" and "apt-cache showpkg" are the only apt commands I find I need besides aptitude.

You'll do better with emacs if you open a terminal window and "sudo bash; emacs" and then use C-x, C-F and C-x, C-K. In other words, leave emacs running forever and use multiple buffers. (Take the emacs tutorial. I tend to use job control a lot (man bash) and sometimes emacs is left running as a background job.) The other emacs trick is to put "export EDITOR=emacs ; export VISUAL=emacs" in "~/.bashrc".

The other thing I like in my .bashrc is: alias cp='cp -i --strip-trailing-slashes' alias mv='mv -i --strip-trailing-slashes' (you may or may not want the -i) This makes your tab completion sane when working with directories and softlinks. (See: info coreutils "trailing slashes")

Frankly, if you're the only admin, sudo is a pain. No harm in doing "su -" in a terminal window and just use that window when you want to be root. You'll also have root's $PATH set up properly so you'll not have to type full paths to stuff in /sbin and /usr/sbin. You can make the terminal a different color or something to distinguish it, fiddle with root's .bashrc etc.

You may also find the screen and script commands handy.
Sadly, screen does not play nice with emacs. I've heard you can put "escape ^Bb" into ~/.screenrc to have screen use C-b rather than C-a as its escape key.

"tail -f" is your friend when dealing with daemons. E.g.: tail -f /var/log/apache2/error.log

P.S. Debian testing (etch) is about to go stable, and the security team is supporting it, so there's an argument for using it over Ubuntu. It'll probably have more packages.

Ramon Leon 6501 days ago

@Karl, thanks for the info, I didn't know about aptitude. As for the repositories, yeah, I meant Ubuntu, though it is a Debian distro.

Thanks for the emacs tips, and the admin tips as well, though I like the sudo because I won't be the only admin on the system once it's deployed. Already been doing the tail thing.

As for Debian over Ubuntu, I'm too new to all this to judge, but Ubuntu has the polish and production support I'd need to sell it here, so I'll stick with it at the moment.

Geert 6501 days ago

What is your network topology? I have an old box hanging around too and I would like to give it a whirl myself (currently just a couple of laptops using a Linksys wireless DSL router). Is that SeasideProd.image your SqueakDev.image, and why do you need SqueakV3.sources as well as SqueakV39.sources (Squeak newbie)?

Ramon Leon 6501 days ago

Just a standard C class private network with an NT box doing DHCP, your router probably does the same.

As for the image, no, for prod, I use a bare 3.9 image with no goodies installed, only deployed code. I'm actually about to change that and use one of Pavel's mini images but haven't gotten around to it yet.

As for the sources files, I'm not sure, it's just what the squeak VM Debian version did when I ran it, so I assumed it was necessary. I could be wrong.

Mark Miller 6501 days ago

Re: Karl's comments

Interesting. I've used full Emacs a few times in my career, but couldn't get used to it fast enough to find it useful. For what I needed at the time I stuck with uEmacs. It seemed to have a simpler command set. I understand though that it's a very powerful tool. You can get it to do a lot of things. I've heard of people using it for e-mail sending/receiving, full-screen debugging (with gdb), newsreading, web browsing, etc. in addition to editing code. I think it's safe to say if you need a tool that manipulates/displays text, no matter the application, Emacs can do it. It's just a matter of programming it to do the job, or finding a mod someone else has made for it. I wouldn't be surprised if there was an RSS feed reader for it as well.

Re: Squeak sources

It's my understanding that the Sources file is where all the source code for the Squeak system, including your application code, is stored. If you're deploying a Squeak image for something where you won't need access to the existing source code in the future, then I'd say it's safe to not include Sources. It's certainly safe to not deploy Changes, since that's just for backtracking in case you want to go back to older code. Changes is really only useful during development.

Ramon Leon 6501 days ago

That's not quite how it works. My understanding is that the sources file holds the code for the last cut version, say 3.0 or 3.9. Changes is anything changed since that version was cut, including your app code; it's basically a transaction log for source code. Both files are necessary to see the source in your image while debugging. This is why the sources file can be shared by many images, but each image needs its own changes file.

For example, the previous sources file was version 3.0, and all the squeak versions since, up to 3.9, made the changes file grow. The file got too big so a new sources file had to be cut, and now the changes file is tiny again.

Deploying them is necessary if you want to be able to get into your live images and debug problems that happen at runtime, and truthfully, on your own servers, I can't see a reason not to deploy them.

Also, the image is the only thing strictly necessary to run, but you need to prep it to allow it to run without throwing errors.

Karl O. Pinc 6500 days ago

@Mark

The emacs tutorial, IMO, is important because it focuses you on the essential commands. Aside from the basics like saving and opening files, the most useful commands are the movement and the cut and paste commands (kill and yank). When you get these down you don't have to reach for the mouse and replace your hands on the keyboard, an operation which takes a surprising amount of time and mental attention. Your productivity goes way up, especially in combination with macros. The only other really useful emacs command that comes to mind is M-q, the word-wrap-this-paragraph command.

Sebastian 6496 days ago

Very interesting post for Seasiders, Ramon!!! I think I'll be using another Linux distro, but it's essentially the same. To install daemontools, did you compile it or is there some .rpm? Will that HAProxy cookie require cookies to be enabled in client browsers? Do you have a domain dedicated to that app, or are you using a subdomain? It would be nice to use subdomains for Seaside apps like app1.mydomain.com and app2.mydomain.com, where app1 and app2 might each have 5 to 10 Squeak services. Do you know the rewrite rule? Perhaps something like:

RewriteRule ^app1/(.*)$ http://localhost:9090/$1 [P,L]
RewriteRule ^app2/(.*)$ http://localhost:9091/$1 [P,L]

where 9090 is the balanced app1 and 9091 is the balanced app2. What do you think?

Ramon Leon 6496 days ago

I didn't compile it, I loaded the package via apt-get, this is detailed in the article. Yes, I'm requiring cookies, but I require them anyway for the rest of the site. I'm not on my own domain, sharing a domain with the old version of the application. I'm using a rewrite rule on the url to split off traffic to the new app. As for subdomains, I'd personally do that with a different virtual host, not a rewrite rule.

Ramon Leon 6493 days ago

@Karl, I see what you mean about Ubuntu vs Debian, I hadn't realized the difference. I'm now trying out Debian itself.

Victor 6493 days ago

This post killed!! Thanks.

Ramon Leon 6493 days ago

No problem.

Simon 6492 days ago

Ramon - thanks for an extremely informative article! I'll be using this for reference in future.

Ramon Leon 6492 days ago

You're welcome.

Martial 6485 days ago

Hi,

Very informative post! I just noticed you copy the SqueakV39.sources file into each directory, but you don't need to. For the Unix Squeak VM, you simply add one sources file in <prefix>/lib/squeak/3.9.8 (with <prefix> = /usr or /usr/local; I prefer to compile squeakvm by hand). Every image without a sources file in its directory will then use this file. If you've got 50 images running with 50 .sources files, that adds up to 800MB of wasted disk space.

I use Ubuntu too. It's the best Linux I know. I've used Debian for 10 years as an optimal server solution (my own server has only 256MB of RAM). I use Pavel's KernelImage and it works very well. I do testing by playing with a personally modified Pier-Blog, and the image reaches only 8MB.

Thanks for your enlightenment. It will help every Seasider.

Kurt Miebach 6485 days ago

This is so brilliant! It's almost exactly what I am planning to do on a debian sarge amd 64 machine. I will gladly follow your instructions. It covers all that is necessary.

Thank you!

Kurt Miebach

Kurt Miebach 6464 days ago

Has anyone done this on Etch already?

Radek Skokan 6442 days ago

Very useful article, thanks!

Just a note: I had to enable also Apache's proxy_http (a2enmod proxy_http).

Richard Eng 6392 days ago

Hmmm, I don't get it. When I try:

wget http://ftp.sysif.net/debian/apt_key.asc

I get the error message:

Resolving ftp.sysif.net... failed: Name or service not known.

Why hasn't anyone else experienced this???

cdrick 6389 days ago

it seems http://ftp.sysif.net/ is down :s

Ramon Leon 6389 days ago

Hmm... It is down, but the official HAProxy site still links to it so maybe it's temporary.

cédrick 6389 days ago

I hope so ;) . I couldn't compile the last version on ubuntu (anyway, it was just for test purposes :) ).

Ramon, what is the version you're using ?

I also have 2 remarks: I use ubuntu feisty right now...

-to make the Apache rewriting work, I had to enable the proxy_http mod (a2enmod proxy_http), not only proxy. Otherwise I get a forbidden message (503).

-I also have a problem with daemontools (dependency problems prevent the configuration of daemontools-run (>=0.76.1)). I think this is because /etc/inittab doesn't exist in Ubuntu... I'll see later anyway...

Are you using ubuntu or debian ?

cédrick 6389 days ago

...ok it's ubuntu server ;)

for daemontools (and ubuntu), I found the following link: http://packages.ubuntu.com/cgi-bin/download.pl?arch=all&file=pool%2Fmultiverse%2Fd%2Fdaemontools-installer%2Fdaemontools-installer0.76-9all.deb&md5sum=39d30c5399c937a1836782d525d857fd&arch=all&type=main

seems to give the same error...

did you have the same problem with /etc/inittab?

Thanks ;)

Ramon Leon 6389 days ago

Nope, I had no problems at all, but I'm using Ubuntu 6.06 LTS, not Feisty, so maybe that's related. As for proxy_http, maybe I did as well and just forgot to write it down, I'll add it to the article.

cédrick 6389 days ago

actually, proxy_http mod loads proxy mod but that's pointless ;)

[...] Ramon Leon now has a step-by-step guide to installing a Squeak image on Linux. It doesn't just get the image itself running [...]

Sebastian 6292 days ago

Hey Ramon, have you tried monit instead of daemontools? I've had a couple of problems with daemontools and I'm using monit now.

It's better because it not only starts or restarts the services but can also send you email alerts under lots of conditions (CPU usage, memory, etc.). In Ubuntu it's painless, and its configuration is very reasonable. It even has a web interface. Take a look; I've found it really good.

Ramon Leon 6292 days ago

I haven't yet, though I've been meaning to play with monit for its alerting features. I have munin up and running for some performance charts and graphs. I didn't learn about monit until after I had my production setup up and running.

Ric8ard 6221 days ago

Hi,

I was following this last night (with some ubuntu servers installed into VMware boxes). I've found that for ubuntu versions after 6.06, daemontools is broken (but I haven't fixed it yet), since the Ubuntu developers have decided that inittab is in need of replacement and thus have replaced it! ( Cue head-scratching and lots of "what!?! no inittab?!?" :-/ )

Also, using apt(itude|-get) to retrieve and install squeak showers me with a whole bunch of essentially: "this is broken and will not be installed" (and this is from the squeak.org repo).

Still, building things by hand is good practice, right? ;-)

regardless, thanks for the info - very useful.

Ramon Leon 6221 days ago

Daemontools isn't broken, but the install is. It thinks inittab still works, so while everything installs OK, the svscan process never gets started. Here's what you need to do.

start on runlevel 2
start on runlevel 3
start on runlevel 4
start on runlevel 5

stop on runlevel 0
stop on runlevel 1
stop on runlevel 6

respawn
exec /command/svscanboot

Save this as /etc/event.d/svscan; /etc/event.d is what replaces inittab. After saving it, set its permissions to match the other files there and do a quick reboot, and daemontools should be chugging along just fine.

I just recently set up another Linux box and ran into this, but otherwise I kept basically the same process, though I no longer use HAProxy, because I'd now use Apache's mod_proxy_balancer instead. I also use aptitude rather than apt-get now. I avoid building by hand if at all possible; I much prefer binary packages that are known to work. I didn't run into any broken packages, but I did waste a few hours not realizing I needed more than just the squeakvm package to start my services. It took me a while to notice that the VM didn't include any .sources files; once I installed the squeak package, everything fired right up.

Ric8ard 6221 days ago

Thanks for the extra info, Ramon. You are of course right about the daemontools installer being broken rather than daemontools itself. I'll blame it on a late night last night. I don't quite know why aptitude is complaining about squeak being 'broken' (broken dependencies, my 6.06 server says), but I shall have another go tonight, hopefully. (Oddly enough, I have an Ubuntu 7.10 server that installed squeak just fine, so I think sorting daemontools on that might be an easier proposition.)

Cheers again for the info.

[...] Ramon Leon on setting up Seaside on Ubuntu with daemontools and load balancing (tags: linux seaside smalltalk programming ubuntu) [...]

Chad 6185 days ago

Re: daemontools, I ran into the same issues on Debian Etch. I ended up using runit, which appears to be an updated daemontools and installed nicely with apt.

Simon Michael 6108 days ago

As mentioned above screen by default interferes with emacs keybindings - but dtach works perfectly. A must-have for server admins!

$ ssh server sudo apt-get install dtach
$ alias serveremacs='ssh server -t dtach -A emacs emacs'
$ serveremacs   # start or connect to emacs on server
