Building Pipelines with Lambdas
Consider a toy pipeline:
An off-topic remark is in order now. If you insist on doing everything from scratch, stock up on free time first. Some past Sunday I did not feel like doing much, so I thought I would spend it writing this blog post. I ended up spending it starting to write this blog post. That is, I got stuck on the above diagram. Coding SVG by hand is tedious, so instead I chose to write some Lisp functions that would make it easier to draft flowcharts. As it stands, I use LaTeX syntax and a custom parser for my blog, but my parser supported only positional arguments, not named ones, and I thought named arguments would be clearer for the flowchart commands, so I had to implement those first. And so it went...
But let us go back to the diagram. I want to implement this pipeline using functional programming, where each process is a separate function agnostic of the other functions. There are a number of conceptually different approaches to achieving such modularity; I want to try some of them out, and I also want to run a few simple benchmarks. My goal is to organize my understanding of these ideas, but any chance visitor is welcome to tread along.
A good place to start, for comparison purposes, is with a more imperative implementation. (All the code examples are in Lisp, some benchmarks are also for Scheme, Python, and Julia.)
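The original snippet is not reproduced here; a minimal sketch of what such an imperative version might look like, assuming the pipeline generates integers, accumulates them into the triangular series, and prints each partial sum (the names are mine):

```lisp
;; A sketch of the imperative version: generation, accumulation,
;; and printing are all interleaved in one loop.
(defun pipeline (n)
  (let ((sum 0))
    (loop for i from 1 to n
          do (incf sum i)
             (print sum))))
```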
This snippet nicely summarizes why modularity is important: without it the code becomes one big mess, or in this case one small mess, very quickly. However, this code has one advantage: it works in an “online” mode. The data is processed the moment it arrives. This feature is important for network applications, for responsive UIs, for processing big data, and for parallelism. And we want to preserve this feature when translating the code into a more functional style.
One way to write a functional program is to let the processes earlier in the pipeline call the processes later in the pipeline. Further, for prototyping and for modularity purposes it is convenient if the functions are agnostic of one another, so that they can be combined into arbitrary pipelines (just like Unix pipes). This level of modularity can be achieved with callbacks. (For the purpose of testing continuations, which I do later on, I have also rewritten some loops as recursive functions.)
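A sketch of such a callback-based pipeline, with my own names for the stages; each function knows only the sink it was handed:

```lisp
;; The source pushes values into whatever sink it is given;
;; the loop is written as a recursive function.
(defun source (n sink)
  (labels ((rec (i)
             (when (<= i n)
               (funcall sink i)
               (rec (1+ i)))))
    (rec 1)))

;; A processing stage is a function that, given a sink,
;; returns a new sink closing over its state.
(defun triangular (sink)
  (let ((sum 0))
    (lambda (x)
      (incf sum x)
      (funcall sink sum))))

(defun sink (x)
  (print x))

;; The pipeline is just composition of the stages.
(defun pipeline (n)
  (source n (triangular #'sink)))
```

Note that `sink` can serve both as a function name and as a parameter name without conflict, which is the Lisp-2 convenience mentioned below.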
The code is longer now, but each separate function is easy to understand, easy to modify, and easy to reuse. There is a cost to wrapping your head around the concept of nested lambdas, but one would hope that to be a one-time investment. The new definition of pipeline shows that having a Lisp-2 instead of a Lisp-1 is not always bad.
One subtle point is worth mentioning. If a pipeline is organized with callbacks, as we have it now, then resources can be protected with unwind-protect. No reliance on garbage collection and finalization is necessary. Here is an example.
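For instance, a file-reading source might look like this (a sketch; `file-source` is a hypothetical name):

```lisp
;; Because the source drives the whole pipeline, the stream stays
;; within the dynamic extent of unwind-protect while every
;; downstream callback runs, and is closed even on a non-local exit.
(defun file-source (path sink)
  (let ((stream (open path)))
    (unwind-protect
         (loop for line = (read-line stream nil)
               while line
               do (funcall sink line))
      (close stream))))
```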
Another way to program the pipeline, which is the opposite of the callback approach, is to let the functions later on in the pipeline request the next value from the functions earlier on in the pipeline. This way would be natural in Python, where there is native support for generators. Lisp does not have generators but they can be implemented in a general way with continuation-passing style. To start off with CPS, let us consider an even simpler pipeline.
With direct style, we have:
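Something along these lines, with the source printing its values directly (a sketch of the simpler pipeline):

```lisp
;; Direct style: the source prints each value itself.
(defun source (n)
  (loop for i from 1 to n
        do (print i)))
```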
First, let us rewrite the loop as a recursive call:
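The same source with the loop expressed as a recursive helper:

```lisp
;; The loop becomes a recursive local function.
(defun source (n)
  (labels ((rec (i)
             (when (<= i n)
               (print i)
               (rec (1+ i)))))
    (rec 1)))
```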
We can now convert source to CPS by “inverting” it:
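In CPS, the “rest of the loop” becomes an explicit continuation handed to a helper, here called yield&; a sketch:

```lisp
;; yield& receives the produced value and the continuation
;; that resumes the loop.
(defun yield& (x k)
  (print x)
  (funcall k))

;; The source no longer prints; it hands each value, together
;; with the rest of the computation, to yield&.
(defun source (n)
  (labels ((rec (i)
             (when (<= i n)
               (yield& i (lambda () (rec (1+ i)))))))
    (rec 1)))
```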
So far, so good: everything is still working! At this point we can already implement a generator, but I want to use the cl-cont library and its call/cc function, and to go in that direction let us first inline the yield& function.
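With yield& inlined, the continuation is an explicit lambda right where the value is produced:

```lisp
;; yield& inlined: the continuation k is now visible in source.
(defun source (n)
  (labels ((rec (i)
             (when (<= i n)
               (let ((k (lambda () (rec (1+ i)))))
                 (print i)
                 (funcall k)))))
    (rec 1)))
```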
Up until now we have simply been calling the function print down the pipeline. Doing so is no different from using callbacks. This time around, however, we have a continuation, which we can save in a global variable, release control, and then regain control by invoking the continuation.
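A sketch: instead of invoking the continuation, source stores it in a global variable and returns the current value; each invocation of the saved continuation produces the next value.

```lisp
(defvar *k* nil)

(defun source (n)
  (labels ((rec (i)
             (when (<= i n)
               ;; save the rest of the loop, hand the value back
               (setf *k* (lambda () (rec (1+ i))))
               i)))
    (rec 1)))

;; (source 3)    ; returns 1, the loop is suspended
;; (funcall *k*) ; resumes the loop: returns 2
;; (funcall *k*) ; returns 3; one more call returns nil
```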
Voilà, we have a working generator. Now, instead of rewriting functions into CPS by hand, we can use the cl-cont library. (For those familiar with Scheme, call/cc as implemented by cl-cont is not a true call with continuation. In Scheme, if continuation k is called from some execution branch, the execution will never implicitly return to that branch. In Lisp, the execution will implicitly return to that branch at the end of the with-call/cc block.)
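With cl-cont (loadable via Quicklisp), the same generator might be sketched as follows, assuming cl-cont's behaviour that a value returned from the call/cc function without invoking the continuation becomes the value of the whole with-call/cc block:

```lisp
;; (ql:quickload "cl-cont")
(defvar *k* nil)

(defun source (n)
  (cl-cont:with-call/cc
    (loop for i from 1 to n
          do (cl-cont:call/cc
              (lambda (k)
                ;; save the continuation; suspend by returning
                ;; the value instead of calling k
                (setf *k* k)
                i)))))

;; (source 3)        ; first value
;; (funcall *k* nil) ; resumes the loop, yielding the next value
```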
The continuation can be saved in a closure instead of a global variable:
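A sketch: make-source returns a thunk, and the continuation lives in the closure variable k instead of a global.

```lisp
;; Each call to the returned thunk produces the next value.
(defun make-source (n)
  (let ((k nil))
    (lambda ()
      (if k
          (funcall k nil)
          (cl-cont:with-call/cc
            (loop for i from 1 to n
                  do (cl-cont:call/cc
                      (lambda (cc)
                        (setf k cc)
                        i))))))))

;; (let ((next (make-source 3)))
;;   (funcall next)   ; first value
;;   (funcall next))  ; next value
```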
We further need a way to stop the iteration once the stream finishes. This can be accomplished, for example, with conditions (exceptions).
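A sketch, with a hypothetical stream-finished condition signalled once the loop runs out of values:

```lisp
;; Signalled when the underlying loop is exhausted.
(define-condition stream-finished () ())

(defun make-source (n)
  (let ((k nil))
    (lambda ()
      (if k
          (funcall k nil)
          (cl-cont:with-call/cc
            (loop for i from 1 to n
                  do (cl-cont:call/cc
                      (lambda (cc)
                        (setf k cc)
                        i)))
            (signal 'stream-finished))))))

;; The consumer catches the condition to stop iterating:
;; (handler-case
;;     (let ((next (make-source 3)))
;;       (loop (print (funcall next))))
;;   (stream-finished () nil))
```

Since the signal is raised while the consumer is inside its own handler-case, the condition propagates to exactly the right place.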
Finally, we can wrap our new solution in a macro, so that we have a clean syntax for defining and using generators (these particular macros are not fully general, but they will suffice for our purposes).
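My actual macros are not shown here; a sketch of how defstream and dostream might be put together on top of the machinery above, with an anaphoric yield and a stream-finished condition (not fully general, as noted):

```lisp
(define-condition stream-finished () ())

;; defstream defines a function returning a generator thunk;
;; yield is available anaphorically in the body.
(defmacro defstream (name args &body body)
  `(defun ,name ,args
     (let ((k nil))
       (lambda ()
         (if k
             (funcall k nil)
             (cl-cont:with-call/cc
               (flet ((yield (x)
                        (cl-cont:call/cc
                         (lambda (cc)
                           (setf k cc)
                           x))))
                 ,@body
                 (signal 'stream-finished))))))))

;; dostream drives a generator until the stream finishes.
(defmacro dostream ((var stream) &body body)
  `(handler-case
       (let ((next ,stream))
         (loop for ,var = (funcall next)
               do (progn ,@body)))
     (stream-finished () nil)))
```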
The resulting defstream and dostream are as concise as Python code. Kudos to Lisp for making it possible to implement a non-trivial foreign paradigm and yet make it feel like an existing part of the language. There will be performance costs, of course, and we shall see just how large those are later on. But also, kudos to Python for making generators easy in the first place. In comparison, Julia implements more general tasks, but that makes the syntax for simple generators more cumbersome.
We can implement the pipeline using generators now.
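Assuming defstream and dostream behave as described above, the pipeline might read (stage names are mine):

```lisp
;; Each stage pulls from the stream it is given and yields
;; its own values.
(defstream ints (n)
  (loop for i from 1 to n
        do (yield i)))

(defstream triangular (in)
  (let ((sum 0))
    (dostream (x in)
      (incf sum x)
      (yield sum))))

(defun pipeline (n)
  (dostream (x (triangular (ints n)))
    (print x)))
```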
With callbacks, each function was written as if processing a single chunk of data. With generators, each function is written as if iterating over all chunks of data, because generators allow us to interrupt a function’s execution. The former style is more concise for 1-to-1 pipelines, becomes contrived for more complex flows, and is not possible in some cases. The latter style, in my opinion, is easier to understand for more complex flows. Here is an example where generators will do the trick, but callbacks are not possible:
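One such case is consuming two streams in lockstep: each step needs one value from each source, so no single push-driven source can be in charge. A sketch, assuming generator thunks that signal a stream-finished condition when exhausted:

```lisp
;; Pair up two generators value by value. With callbacks, each
;; source would push on its own schedule, and no stage could pull
;; one value from each in lockstep.
(defun zip-print (a b)
  (handler-case
      (loop (print (list (funcall a) (funcall b))))
    (stream-finished () nil)))
```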
On the other hand, unwind-protect is not applicable anymore when generators are used, and if a resource at the beginning of a pipeline needs to be managed, then finalization has to be used.
Lastly, Lisp allows for dynamic binding of global variables. With callbacks, the source can control the behaviour of the sink without threading extra arguments through the whole pipeline, which can be useful when writing a printer. With generators, the sink can control the behaviour of the source, which can be useful when writing a parser.
We discussed top-to-bottom and bottom-to-top control flows, and we programmed them with Lisp language primitives. We can also get concurrency by using OS threads (languages like Erlang and Go provide green threads as concurrency primitives in the language itself). There is the Bordeaux-Threads library for using threads in Lisp portably, but let me use SBCL extensions directly, because I want to try those.
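A sketch of the threaded pipeline using SBCL's sb-thread together with the mailboxes from the sb-concurrency contrib as unbounded pipes (stage names are mine):

```lisp
(require :sb-concurrency)

;; Each stage runs in its own thread and communicates through
;; mailboxes; :done marks the end of the stream.
(defun source (n out)
  (sb-thread:make-thread
   (lambda ()
     (loop for i from 1 to n
           do (sb-concurrency:send-message out i))
     (sb-concurrency:send-message out :done))))

(defun triangular (in out)
  (sb-thread:make-thread
   (lambda ()
     (let ((sum 0))
       (loop for x = (sb-concurrency:receive-message in)
             until (eq x :done)
             do (incf sum x)
                (sb-concurrency:send-message out sum))
       (sb-concurrency:send-message out :done)))))

;; The main thread plays the role of the sink.
(defun pipeline (n)
  (let ((p1 (sb-concurrency:make-mailbox))
        (p2 (sb-concurrency:make-mailbox)))
    (source n p1)
    (triangular p1 p2)
    (loop for x = (sb-concurrency:receive-message p2)
          until (eq x :done)
          do (print x))))
```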
The threads code is mostly analogous to the generators code. It could be made more concise with an extra macro or two, but let me leave it at that. For simplicity, I also do not control the sizes of the pipes (in production code that would be a bad idea).
Many of us learned programming not as computer scientists but as a tool for our own domains. My domain is game-theoretic models and econometrics. That is to say, I never need to write time-critical code. If anything, the opposite: the longer my models take to solve, the more time I have to chat with colleagues over a cup of coffee. Where does my obsession with ill-designed benchmarks come from, then? So, benchmarks.
I have run the code snippets 100 times with a triangular series of length 100,000, sans (pause 1) and (print ...). I have also benchmarked Python, Julia, and Scheme implementations of the generators approach, because Python and Julia support generators natively, and because Scheme supports call/cc natively. There is also Go in the mix as a modern champion for green threads. (Importantly, I use unbuffered channels for a fair comparison with generators.) The tests have been run on an AMD Ryzen 3600. I have not invoked any compiler optimizations, nor have I used type declarations. (In the table below each row links to the respective code.)
|Approach||Implementation||Time|
|Imperative (Lisp)||SBCL 2.1.1||1.9 s|
|Callbacks (Lisp)||SBCL 2.1.1||2.4 s|
|Generators (Lisp)||SBCL 2.1.1||21 s|
|Generators (Scheme)||Chez 9.5.4||5.0 s|
|Generators (Python)||Python 3.9.2||28 s|
|Generators (Julia)||Julia 1.6.0||165 s|
|Threads (Lisp)||SBCL 2.1.1||32 s|
|Green threads (Go)||Go 1.16.4||28 s|
So, callbacks impose relatively little extra penalty in Lisp compared with the more imperative implementation. Lisp does not have native generators, and implementing them, while possible, is expensive. Python is slow, as is typical for a benchmark heavy on function calls. Generators in Chez Scheme, implemented on top of call/cc, come clearly ahead of the other generators. As for Julia, I decided to benchmark it after I got the results for Scheme and Python, hoping that maybe Julia beats Python fair and square. Fat disappointment.
Now, threads. I know thread switching is expensive, but I honestly thought they would take the top spot, especially because there are no restrictions on pipe sizes in the Lisp threads code, so all the functions can run in parallel. It turns out threads are roughly as slow as Python’s generators. Green threads in Go perform just as poorly. Supposedly, Go excels at async IO, which this benchmark most certainly is not. Still, I was curious whether green threads in Go could be used as a lightweight concurrency primitive. Chez Scheme is a faster choice.
I posted this benchmark on reddit, and as mfraddit subsequently pointed out, I cheated somewhat when it comes to big integers. The benchmark, as presented, generates big integers for higher values of the triangular series. In the preceding table, Lisp, Scheme, and Python handle big integers correctly but at additional computational expense, whereas the Julia and Go code overflows, and the overflow gets silently ignored. In the table below, I have redone the computations using explicit big integers in Julia and Go. While I had no intention of benchmarking big-integer operations at the start, and this is all rather accidental, the results do seem to suggest that the standard big-integer libraries in Julia and Go are not very good.
On a closing note, my example is a contrived one: most of the processing time is spent moving data chunks around instead of processing them. If processing time becomes significant, there will be little difference between callbacks and generators, and threads, green or not, are likely to come out on top.
The benchmarks were updated on 30 May, 2021. Notably, back in 2018 I used Chicken Scheme, and surprisingly it was slower than the non-native generators in Lisp. In 2021 I switched to Chez, and Chez is just faster, at least on this benchmark.