On Simple Functional Idioms
By Ramon Leon - 6 July 2007 under Programming, Ruby, Smalltalk
I ran across an post by David Altenburg about the enumerate, map, filter, accumulate pattern. These are certainly common functional techniques, but they're also common Smalltalk techniques. We call them do:, collect:, select:, and inject:into:.
He showed a Java implementation of a simple and common programming problem...
int maxSalary = 0; for (Employee employee : employees) { if (Role.PROGRAMMER.equals(employee.getRole())) { int salary = employee.getSalary(); if (salary > maxSalary) { maxSalary = salary; } } }
Comparible to a C# 1.0 version...
int maxSalary = 0; foreach(Employee employee in employees) { if(employee.Role == Role.PROGRAMMER && employee.Salary > maxSalary) { maxSalary = employee.Salary; } }
Then he does a Scheme version using a more functional style...
(define (salary-of-highest-paid-programmer records) (accumulate max 0 (map salary (filter programmer? records))))
And a Ruby version of similar style, note the Smalltalk'ishness of it...
employees. select {|emp| :programmer == emp.role }. map {|emp| emp.salary }. inject {|m, v| m > v ? m : v}
And for comparison here's a Smalltalk version...
(employees select: [:emp | emp role = #programmer ] thenCollect: [:emp | emp salary ]) inject: 0 into: [:a :b | a max: b ]
Though if I found the pattern that common I'd define a quick helper method on Collection to clean up the syntax to this...
employees select: [:emp | emp role = #programmer ] collect: [:emp | emp salary ] inject: 0 into: [:a :b | a max: b ]
What's interesting about them to me is really the difference between the Java/C# approaches and the more composable Scheme/Ruby/Smalltalk style. Because Java/C# 1.0 lack first class functions (C# actually has them now in 2.0 and a nice lambda syntax in 3.0) and a literal syntax for declaring them, everything ends up being a special case of for/foreach. They lack the ability to factor out common patterns like select:, collect:, and inject:into: into higher order functions.
As a side effect, select: turns into nested if's and a loop, collect: into a temporary variable and a loop, and inject:into: into both an if, a temporary variable, and a loop. You end up writing these patterns inline over and over rather than just reusing existing versions from the base library.
Languages that can't support and make idiomatic the use of higher order functions, aren't worth using these days in most domains. They're just too damn much work. However, using functional techniques doesn't at all mean using a functional language. All of these work quite well in good object oriented languages like Smalltalk and Ruby.
Comments (automatically disabled after 1 year)
Um. Why can't Java and C# support higher order functions or make them idiomatic? I'm not a fan of those languages. Maybe I just still don't get functional programming.
If you can write a procedure of any kind that takes an input and returns an output, don't you then have some kind of higher order function? So if you were using Java and wanted map/filter/reduce/inject/whatever, why not take a few minutes and write a module/class/helper with those items in it?
Even better would be to support Iterators like Python. I've been using Python's 'itertools' module which has lazy versions of map, filter (select), etc..., using Python's simple Iterator's protocol. As a result I've been able to build up complex filter chains by grouping small functions together (isactive, hasdates, is_past). In basic Smalltalk, Ruby, etc, if I were to nest these like I'm doing in Python, the collection would be iterated for each little case. But with Iterators it only gets run through once, even though I'm nesting a lot of calls.
Inside of all of these 'ifilter' calls, and to the outside world, it all looks the same. The input and output is always an iterable. you can just keep stacking them up.
For example, you could do the following (assuming you had the small 'is_...' functions defined):
import itertools, operator maxsalary = max(itertools.imap(operator.attrgetter('salary'), itertools.ifilter(isprogrammer, employees)))
Or you can use Python's "generator comprehensions" and do the following without having to put up with map and filter for such a simple use case:
max_salary = max(emp.salary for emp in employees if emp.role is Programmer)
Note that Python's built in 'max()' function can take in either an iterable/sequence, or multiple arguments (ie: max([1,3,5,7,9]) and max(1,3,5,7,9). No need to deal with inject or reduce here, which I must admit is the one fuctional idiom I can do without (writing a function like Python's max() is a lot easier to read in production code. Same thing with sum() - much better than reduce(operator.add, seq) or reduce(lambda a,b: a+b, seq))
I used to love using Prototype.js when I did Javascript because it added the collect, inject, etc methods to most array-like objects. But it didn't apply them to all. So I was constantly wasting cycles converting things to arrays or wasting cycles and memory when nesting maps/filters/reduces because it had to make a new list every time. Then I went to MochiKit, which is heavily influenced by Python. In its base library it has 'map' and 'filter' and other helpers for this style of programming, which I do love to use. But even better - it has an 'Iterators' module that does Python style Iterators and provides a lot of lazy implementations of the same features. By using adapters, it can provide the iterator protocol for 'array-like' objects (some annoying pseudo-collections that JS has that have less methods than the base Array). Check it out as a good example of providing "simple functional idioms" to a C-based for-loop style language like Javascript.
http://www.mochikit.com/doc/html/MochiKit/Iter.html
Using Iterators and Generators in Python (and with MochiKit in Javascript.. and I believe JS 1.7 sports some iterator features natively now), you can get some really powerful composition. By taking in and returning an iterable, which may be lazy, you could build up a large complex filter/query together over time by running the results through more generator functions, and still have it exhaust its primary resource only once. You don't need long methods like 'select:thenCollect:andInject:into' because you can just nest function calls - while still taking advantage of an object oriented language.
Sorry, this is a bit of an exhaustive comment. In summary - I rarely program in Javascript without MochiKit's Python based functional idioms by my side. Any C-based language can do the same. Granted, it's not idiomatic for everyone in that case, but it's still very useful. With a little bit of work, you can write a library that can make the most intolerable language more bearable. And also - I think there are even better idioms like Python's Iterators and Generators which can be used to compose very powerful filters (using these same basic techniques) without running through the collection each time, all by supporting a very very simple iterator protocol.
Piers, here's a single walk, which I personally prefer as well,
employees inject: 0 into: [:x :ea | x max: (ea isProgrammer ifTrue: [ea salary] ifFalse: [x])]
The io version is most compact:
employees select(role == "programmer") max(salary)
See http://www.dekorte.com/blog/blog.cgi?do=item&id=2752
@jeff, without first class functions, these patterns can't be idiomatic because you're forced to define every little predicate as a separate class making it less work to just write the foreach instead. Newer version of C# actually do have first class functions, and 3.0 even has a nice syntax for them. That doesn't help much when the code you maintain is written in 1.1 and you're too busy to justify upgrading.
"If you can write a procedure of any kind that takes an input and returns an output, don't you then have some kind of higher order function?"
No, that'd be a regular function. A higher order function takes other functions, not their values as it's inputs or outputs.
As for those worried about multiple iterations of the collection, optimization isn't the point. The point is composition of existing code in a chain. A very common chain that occurs over and over in code and could be optimized behind the scenes without cluttering the expression of the solution with imperative constructs.
Declarative solutions are nicer because they express only the "what" while the imperative solutions get all bogged down in the "how".
The IO solution is interesting, I'd considered that same solution in Smalltalk but left it out because of the Symbol value hack it uses to avoid the block
(employees select: [:emp | emp role = #programmer ]) detectMax: #salary
Boris: I actually prefer the higher order select: thenCollect: andInject: into:, which can be implemented as:
select: selBlock thenCollect: collBlock andInject: anObject into: injBlock ^self inject: anObject into: [:i :rawItem| (selBlock value: rawItem) ifTrue: [injBlock value: i value: (collBlock value: rawItem) ] ifFalse: [i]]
Damn, what's the magic for putting a code block in comments?
pre tags of course, fixed.
Well, the real beauty is that you could go nuts with this and simply create things to do,
employees max: [:ea | ea salary] if: [:ea | ea isProgrammer]
where,
max: fetch if: condition ^self inject: 0 into: [:max :ea | max max: ((condition value: ea) ifTrue: [fetch value: ea] ifFalse: [max])]
Of course you could also use that do things like finding the largest even number in a collection,
numbers max: [:ea | ea] if: [:ea | ea even]
Of course, if you had such predicates as isProgrammer on the model, then I'd just call your method like this...
employees max: #salary if: #isProgrammer
But this is really just another version of select:detectMax
(employees select: #isProgrammer) detectMax: #salary
Well, I personally hate perform:'s since they fall outside of rewrite capabilities, but that's a whole other discussion. Either way, we could come up with a million ways of doing the same thing, but in some cases it is important to have composite patterns that do a single walk even though they tend to be a little less reusable, but such is the case with all composites :)
> For all its faults, the Java version does at least only walk through the collection once.
That's an optimization for performance instead of an optimization for clarity (programs should be written for humans to understand, and incidentally for computers to execute). Plus it can be solved by fusion (which the Haskell guys are currently exploring)
> So if you were using Java and wanted map/filter/reduce/inject/whatever, why not take a few minutes and write a module/class/helper with those items in it?
Because there is no possibility to easily inject behavior, in Java or C# you need to creates classes for everything, and at best you can create an anonymous class with a previously defined type, and a very low genericity.
> In basic Smalltalk, Ruby, etc, if I were to nest these like I'm doing in Python, the collection would be iterated for each little case. But with Iterators it only gets run through once, even though I'm nesting a lot of calls.
That's a runtime/optimization issue, it has no relation to the concept itself, which is exactly the same in Python, Smalltalk, Ruby, Scheme or Haskell.
I just realised, I meant to point out Adam Turoff's excellent article about this very topic at: http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-with-for-loop.html
So I'll do it now.
And I could have sworn I'd tried using <pre> before and had the comment rejected. Ho hum.
Ach. Thanks for clearing things up. I guess I've just gotten so used to Python and other dynamic languages over the past ten or so years. I forgot that the big static languages made such things so difficult. No wonder my eyes glazed over whenever anyone tried to explain the rationale for Templates in C++. I hope that I never have to return to such a world.
[...] On Simple Functional Idioms [...]
Excellent comments here.
What Ramon is describing reminds me of a time when we were using an API-based database engine more than 10 years ago, at a company where I used to work. The library was lightening fast. We hadn't found another DB engine that could match its speed. The API did not understand the concept of joins, though. So we did them manually. We'd set up one filter for one side of the join, and another for the other, iterate over both of them, using an outer and inner loop (for the one-to-many relationships), and we'd test for a condition to then launch into another set of steps, or perhaps populate a UI. We did this over and over again. It was so repetitive it started to get boring and tiresome. It had the same feel to it as the enumerate, map, filter, accumulate pattern in the article Ramon links to. I think we tried the idea of creating a function to do this work, but the language (C) was not flexible enough to allow it. I suppose we could've littered the code with special-case, one-use functions and passed them in as function pointers. We knew of function pointers. I assume the scheme wouldn't have worked, because we didn't implement this. Quite a few of our iteration loops were customized with various interim logic steps. It was never quite the same loop twice, though as a basic pattern it was very consistent.
Then, on a different project the customer required us to work with an Oracle database, and I learned SQL. What a difference! I could express joins in a succinct, declarative manner, and get the desired result. It made me wonder why we did it the other way for so long.
What a great example Mark. Database joins and where clauses are a perfect example of where for loops can be replaced by more reusable declarative syntax to greatly simplify the most common use cases.
Databases are also a good example of suffering the same problem, the failure to abstract beyond the primitive operations. Databases have one primitive operation for viewing data, the "SELECT". It's the one hammer that must fit all problems.
When working with collections in Smalltalk, we've abstracted beyond that into a handful of commonly used methods starting with #select:, but also #detect:, #detectMax:, #detectMin:, #detectSum:, #reject:, #collect:, #inject:into:, #includes:, #contains:, #do:, etc..
Some languages encourage abstractions and allow you to build upon your previous work, others don't. Sometimes this is a limitation of the language, sometimes it's a limitation of the culture.
C# is an interesting example where the language has become capable, but the culture is seriously lagging behind.
FWIW (I'm a month and something late on commenting) there are two ways to improve the "Smalltalk" way.
First, you could use "Lazy Collections", which provide an implementation of #select:/#collect: that does not create intermediate collections. Then you would do this without any performance penalty:
programmers := employees select: [:emp | emp role = #programmer ]. salaries := programmers collect: [:emp | emp salary ] ^salaries inject: 0 into: [:a :b | a max: b ]
But since this breaks code in interesting ways, there is another way around it, which is to implement #select: and friends on Streams. GNU Smalltalk does this for example, so that you can write:
data := employees readStream. programmers := data select: [:emp | emp role = #programmer ]. salaries := programmers collect: [:emp | emp salary ] ^salaries inject: 0 into: [:a :b | a max: b ]
This creates three different kinds of streams (a ReadStream, a FilteringStream, a CollectingStream) and not even a single temporary collection. Everything is just hidden behind the factory methods, which are polymorphic with Collection's.
Since in the upcoming version of GNU Smalltalk (3.0, release candidate will be ready in a two/three months I guesstimate) database rows are a subclass of Stream, you get this for free for database access too!
All very interesting ideas, guess I'll have to give GNU Smalltalk a try soon, I hear it's pretty good at scripting so I was going to get to it anyway.
Your welcome, I guess I'll have to release a new prerelease soon because there were some big bugfixes recently.
I hope to port Seaside at the ESUG conference next week...
Why smalltalk has no operator/keyword for piping ?
obj collect: [ :x | ...] | filter: [:x | ...] | select: [:x | ...]
?
but has... the ";" ?
I don't care to send messages to the reciever of the previous message, because I return self when no return value is important for example ....
point x: 10 | y: 20 | z: 30
x returns self, y returns self, z returns self -> which is the updated point.
works exactly as: point x:10; y:20 ; z:30.
But you lose PIPES !!!!! Can somebody elaborate on this ?
I don't like your idea of writing helper methods like ...
employees select: [:emp | emp role = #programmer ] collect: [:emp | emp salary ] inject: 0 into: [:a :b | a max: b ]
What if I need to inject first, and another case where I have other seletes and injects not just 3 operations. I that is your raccomanded style you would need to write for me quite a lot of helpers ... but at least in smalltalk you CAN do it :-) .. but again I think this is a Smalltalk Issue.
Note that I'm working an a programming language that will have support for multiple dispatch, Array programming, based heavily on Smalltalk and a Pipe&Filters style...
What about this ?
Smalltalk ...
employees select: [:emp | emp role = #programmer ] collect: [:emp | emp salary ] inject: 0 into: [:a :b | a max: b ]
employees $ (employees:role = `programmer) :salary :inject [\a b\ a :max b]
$ = selection ... takes an array on the left and an array of booleans on the right ... returns elements from the first array that have TRUE on the same index of the second array.
array :filter
sends all the objects in array to the filter process yielding an array of filtered objects.
Check F-Script on Google. Array programming rocks on collections. It has ALL smalltalk stuff PLUS message patterns. There is a paper there where they literally smash functional programming/lambas/blocks ... they have convinced mee that array programming is the right way to think about collections. Note that love functional programming. but array programming is better...
Again on smalltalk ...
Stupid example (please replace add1 wich ... YourComplexDifficultLongCollectionOperation)
{10. 20. 30. 40} :add1 :add1 :add1 :add1 :add1 >>{15. 25. 35. 45}
Compare with smalltalk ...
Please I'm a smalltalk novice correct me If I'm wrong....
((((({10. 20. 30. 40} collect: [ :elem | elem :add1]) collect: [ :elem | elem :add1]) collect: [ :elem | elem :add1]) collect: [ :elem | elem :add1]) collect: [ :elem | elem :add1])
I think that PIPES are more important than the cascade operator as most of the cases where cascading is helpfull have the previous message returning self !!!... there are exceptions but I'd like not to loose pipes for them.
Any comments ?
Actually, if you're dying to work with collections so tersely, you just rig up your collection to send any message it doesn't understand to collect by default, so your example would look like this...
(((({10. 20. 30. 40} add1) add1) add1) add1) add1.
Perfectly valid Smalltalk. You could get rid of the parens by defining a pipe on object like so
Object>>| anArg ^ self perform: anArg
And then pipe symbols together like so...
{10. 20. 30. 40} add1 | #add1 | #add1 | #add1 | #add1
But you'd have to change the compiler to make the pipe work like ; without the symbols (I think).
However, Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic so I probably wouldn't do it.
You can override the default compiler and do whatever you like with the syntax, if you're brave.
Actually, if you're dying to work with collections so tersely....
I usually do stuff using lots lists, maps, sets, ... instead of making complex hierarchies ... i do .. filter here, than zip it, make a map, select a few xyz ... check if is in a set ... re-filter , remap dispatch to this and to that ... remap ... ... stuff like that ...
and Smalltalk supports very well the kind of code I like to write ... because of it's collection protocol ... but there is a BETTER way of doing it ... which doesn't conflict at all with object orientation and even smalltalk keyword syntax. That is ... like this guy is doing ... http://www.fscript.org/documentation/OOPAL.pdf ... at least in my humble opinion.
The paper also explains ... what is the problem with this:
.... just rig up your collection to send any message it doesn't understand to collect by default, so your example would look like this...
(((({10. 20. 30. 40} add1) add1) add1) add1) add1.
{10. 20. 30. 40} add1 | #add1 However, Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic so I probably wouldn't do it.
Actually, if you're dying to work with collections so tersely....; .... just rig up your collection to send any message it doesn't understand to collect by default, so your example would look like this....
(((({10. 20. 30. 40} add1) add1) add1) add1) add1.
{10. 20. 30. 40} add1 | #add1
However, Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic so I probably wouldn't do it.
(I can't get my full post on ...)
.... continues ....
I don't agree with ... >> However, Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic so I probably wouldn't do it.
@Fabio
This would make an excellent question for the Squeak-dev mailing list.
It's not a matter of agreement, it's a matter of fact. I agree that array oriented stuff would be cool, and is better, but the statement "Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic" is a true one.
I've read that paper before, very nice, but it's a proposal to merge the techniques into OO, it's not something that has been done and accepted to the point of being able to call it idiomatic.
I would however, like to see the adoption of such techniques as well.
Sorry Ramon ... I couldn't finish the post there is some problem on the server ... I was explaining why I don't agree with ... "Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic" ... I got cut out buy the server :-( I'll try to post it all again ... Then if I can't do it ... I'll ask my questions on the Squeak mlist
>> Actually, if you're dying to work with collections so tersely....
I usually do stuff using lots lists, maps, sets, ... instead of making complex hierarchies ... i do .. filter here, than zip it, make a map, select a few xyz ... check if is in a set ... re-filter , remap dispatch to this and to that ... remap ... ... stuff like that ...
and Smalltalk supports very well the kind of code I like to write ... because of it's collection protocol ... but there is a BETTER way of doing it ... which doesn't conflict at all with object orientation and even smalltalk keyword syntax. That is ... like this guy is doing ... http://www.fscript.org/documentation/OOPAL.pdf ... at least in my humble opinion.
The paper also explains ... what is the problem with this: >--------- .... just rig up your collection to send any message it doesn't understand to collect by default, so your example would look like this...
(((({10. 20. 30. 40} add1) add1) add1) add1) add1.
{10. 20. 30. 40} add1 | #add1
However, Smalltalk is an object oriented language, not an array oriented one, and none of this stuff would be considered idiomatic so I probably wouldn't do it.
Got cut again ... I'll post on the squeak dev mailing list ...
For all its faults, the Java version does at least only walk through the collection once.
Admittedly, if you were in a mood to optimize, it's possible to write select:thenCollect: and, by extension, select:thenCollect:andInject:into: in a way that only loops once (or possibly twice in the case of fixed size collections).
The beauty of functional methods like this being that you only have to do the optimization once, and every user benefits without affecting client code complexity. Result.