Implementing Related Posts for OnSmalltalk
By Ramon Leon - 8 December 2008 under Smalltalk, Seaside, Programming, Performance
I found a few minutes to sit down and implement a simple related posts feature for the blog. Thought I'd take the simple method of just counting the number of tags the posts have in common, sorting them, dropping those with nothing in common, and grabbing the top x posts...
SBPost>>relatedPosts
^ (((((self class publicPosts copyWithout: self)
collect: [ :post | post -> (self tags count: [ :t | post tags includes: t ]) ])
reject: [ :e | e value = 0 ])
sortBy: [ :a :b | a value > b value ])
collect: [ :e | e key ])
pageSize: self relatedPostCount page: 1
I was looking at it afterwards and thought something looked familiar about that algorithm. After stripping out two lines it became a bit more obvious...
SBPost>>relatedPosts
^ (((self class publicPosts copyWithout: self)
collect: [ :post | post -> (self tags count: [ :t | post tags includes: t ]) ])
sortBy: [ :a :b | a value > b value ])
collect: [ :e | e key ]
It's a Schwartzian Transform, named after our own Randal Schwartz (Squeak board member / one man Seaside evangelist). If you ever run into him, ask him about it, it's a funny story.
You could get the same result with just the sort block; a more naive implementation...
SBPost>>relatedPosts
^ ((self class publicPosts copyWithout: self)
sortBy: [:p1 :p2 |
(self tags count: [ :t | p1 tags includes: t ])
> (self tags count: [ :t | p2 tags includes: t ]) ])
But you'd end up doing the tag count computation on each comparison, much more expensive than calculating once up front and then sorting.
By the way, a nice justification for re-inventing the wheel and writing your own blogging software, is to have a side project just for fun that lets you come up with excuses for features like this that don't feel like work and give you something to write about as well.