Generating a Site Map for OnSmalltalk
By Ramon Leon - 9 December 2008 under Programming, Seaside, Smalltalk
OK, so any website that wants to be indexed well by Google (and those other guys) should be generating an XML sitemap for the search engines to index. A sitemap is nothing fancy, though it can get more complex if you choose to take advantage of more of its features; I prefer a simple version with everything marked as updated weekly.
I also prefer to invoke the generation of the sitemap manually and to generate it as a static file that Apache can serve up, rather than having Seaside build one dynamically (though I'll probably change my mind later). My blog has an admin panel with a menu option to generate the sitemap, which invokes...
generateSiteMap
    | siteMap |
    siteMap := SBSiteMapGenerator blogRoot: 'http://onsmalltalk.com/'.
    siteMap generateFromItems: { (SBPost new) } ,
        (SBPost findAll: [ :e | e isPublished ]) , SBTag findAll.
    (siteMap pingGoogleWithMap: 'http://onsmalltalk.com/sitemap.xml')
        ifTrue: [ self message: 'Map generated and Google notified successfully.' ]
        ifFalse: [ self message: 'Map generated but Google notification failed.' ]
The first item in the list, the empty new post, creates an item without a slug, which represents the root of the site. I don't bother pinging the other search engines; the vast majority of my traffic comes from Google, and the rest will find me eventually. So let's run through the generation of this sitemap. It's only a few methods. The class declaration...
Object subclass: #SBSiteMapGenerator
    instanceVariableNames: 'document root blogRoot'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'OnSmalltalkBlog-Config'
A couple of accessors for the blog root...
blogRoot
    ^ blogRoot

blogRoot: aRoot
    blogRoot := aRoot
And a constructor that uses it...
blogRoot: aRootUrl
    ^ self new
        blogRoot: aRootUrl;
        yourself
Since I'm going to write the sitemap to disk, I'll need to know where to put it, and I'll want it configurable...
siteMapPath
    ^ (FileDirectory
        on: (SSConfig at: #blogWebRoot default: FileDirectory default fullName))
            fullNameFor: 'sitemap.xml'
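For anyone following along outside a Squeak image, the "configured directory, falling back to the default one" lookup can be sketched roughly like this in Python. This is only an analogue, not the blog's code: the `config` dict and the `site_map_path` function are hypothetical stand-ins for `SSConfig` and `siteMapPath`.

```python
import os


def site_map_path(config):
    """Return the absolute path of sitemap.xml inside the configured web root.

    `config` is a hypothetical plain dict standing in for SSConfig. When it
    has no 'blogWebRoot' entry, the current working directory is used.
    """
    web_root = config.get("blogWebRoot", os.getcwd())
    return os.path.join(os.path.abspath(web_root), "sitemap.xml")
```

Calling `site_map_path({})` resolves against the current directory, while passing a `blogWebRoot` entry points the file at the directory Apache serves.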
Now a method to generate the document, add the items to it, and write the file to disk...
generateFromItems: someItems
    "Build the document, add an entry per item, and overwrite sitemap.xml on disk."
    document := XMLDocument new
        version: '1.0';
        encoding: 'UTF-8';
        yourself.
    root := (XMLElement named: 'urlset').
    root attributeAt: 'xmlns' put: 'http://www.sitemaps.org/schemas/sitemap/0.9'.
    root attributeAt: 'xmlns:xsi' put: 'http://www.w3.org/2001/XMLSchema-instance'.
    root attributeAt: 'xsi:schemaLocation' put: 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'.
    document addElement: root.
    someItems do: [ :e | self addItem: e ].
    FileStream forceNewFileNamed: self siteMapPath
        do: [ :f | f nextPutAll: document asString ]
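The shape of that method (build a `urlset` root with the schema attributes, then replace the file on disk) can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration under assumptions, not a port of the Smalltalk: `build_urlset` and `write_site_map` are hypothetical names.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset():
    """Create the empty <urlset> root with the sitemap schema attributes."""
    root = ET.Element("urlset")
    root.set("xmlns", SITEMAP_NS)
    root.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance")
    root.set("xsi:schemaLocation", SITEMAP_NS + " " + SITEMAP_NS + "/sitemap.xsd")
    return root


def write_site_map(root, path):
    """Serialize the tree to `path`, replacing any previous file contents."""
    xml_text = ET.tostring(root, encoding="unicode")
    # Mode "w" truncates an existing file, so bytes left over from an older,
    # longer sitemap cannot survive past the new closing tag.
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(xml_text)
```

The truncating write matters: if the old file is reopened without being emptied first, a shorter document can leave stray text after `</urlset>` and the result is no longer well-formed.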
For each item, I'll want to generate an entry. The item is expected to respond to two methods, #updatedOn and #slug. All of my posts and tags respond to these, so I can just toss them into a single list of items...
addItem: anItem
    | url location lastModification isoString changeFreq |
    url := root addElement: (XMLElement named: 'url').
    location := url addElement: (XMLElement named: 'loc').
    location addContent: (XMLStringNode string: self blogRoot , anItem slug).
    changeFreq := url addElement: (XMLElement named: 'changefreq').
    changeFreq addContent: (XMLStringNode string: 'weekly').
    lastModification := url addElement: (XMLElement named: 'lastmod').
    isoString := String streamContents:
        [ :stream | anItem updatedOn printOn: stream withLeadingSpace: false ].
    lastModification addContent: (XMLStringNode string: isoString).
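To make the resulting entry concrete, here is a minimal Python sketch of the same three child elements. It assumes each item boils down to a slug string and a timezone-aware timestamp; `add_item` and its parameters are hypothetical names, not part of the blog's code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def add_item(urlset, blog_root, slug, updated_on):
    """Append one <url> entry; an empty slug yields the site root itself."""
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = blog_root + slug
    ET.SubElement(url, "changefreq").text = "weekly"
    # isoformat() on a timezone-aware datetime produces the ISO 8601 form
    # used for <lastmod>, for example 2008-11-30T23:00:17+00:00.
    ET.SubElement(url, "lastmod").text = updated_on.isoformat(timespec="seconds")
    return url


urlset = ET.Element("urlset")
add_item(urlset, "http://onsmalltalk.com/", "",
         datetime(2008, 11, 30, 23, 0, 17, tzinfo=timezone.utc))
```

Passing an empty slug, as the empty new post does, leaves `loc` as just the blog root.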
With the file generated, we're ready to let Google know we've updated it...
pingGoogleWithMap: aMap
    ^ (WAUrl new
        hostname: 'www.google.com';
        addToPath: 'webmasters/tools/ping';
        addParameter: 'sitemap' value: aMap;
        yourself) asString asUrl retrieveContents content
            includesSubString: 'Sitemap Notification Received'
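The request itself is just a GET with the sitemap address as a query parameter. As a hedged Python sketch (it builds the URL and checks a response body, but deliberately performs no network call), assuming an `http` scheme and with `google_ping_url` and `ping_succeeded` as hypothetical names:

```python
from urllib.parse import urlencode


def google_ping_url(site_map_url):
    """Build the ping URL described in the post; does not perform the request."""
    query = urlencode({"sitemap": site_map_url})
    return "http://www.google.com/webmasters/tools/ping?" + query


def ping_succeeded(response_body):
    """Mirror the post's success check: look for the acknowledgement text."""
    return "Sitemap Notification Received" in response_body
```

Note that `urlencode` percent-encodes the sitemap address, so the colon and slashes in it do not get confused with those of the ping URL.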
And that's it. Google knows the site has changed, has all of its valid URLs, and most of the time is crawling the site within minutes, if not instantly.
I've got to say, I'm not missing WordPress at all; it's a lot more fun just building your own blog.
Comments (automatically disabled after 1 year)
Thanks for the heads up.
Ah, dumb mistake on my part: I wasn't creating a new sitemap every time, but opening up the existing one and appending to it.
I really like your blog and I read it often. When I tried to check out the sitemap you generated at http://onsmalltalk.com/sitemap.xml ...
I got: XML Parsing Error: not well-formed. Location: http://onsmalltalk.com/sitemap.xml, Line Number 438, Column 60: <lastmod>2008-11-30T23:00:17+00:00</lastmod></url></urlset>/urlset>
You might have a small bug that is duplicating part of the closing tag.