Free Applicative Functors in Haskell
====================================
A few days ago, I found myself thinking about free monads, specifically [`Prompt` monads][5], and wondering if there were equivalents for applicative functors. I didn’t realize it at the time, but apparently there was a recent bit of discussion about this in the Haskell community, starting with a [blog post by Tom Ellis][1], and leading to a [roundup of implementations][2] by Roman Cheplyaka.
The consensus implementation, at least the one provided by the [`free` package][3], looks roughly like this:
> {-# LANGUAGE GADTs, RankNTypes #-}
>
> module FreeApp where
>
> import Control.Applicative
>
> data Free f a where
> Pure :: a -> Free f a
> App :: f a -> Free f (a -> b) -> Free f b
>
> liftFree :: f a -> Free f a
> liftFree x = App x (Pure id)
>
> lowerFree :: Applicative f => Free f a -> f a
> lowerFree (Pure x) = pure x
> lowerFree (App x xs) = x <**> lowerFree xs
>
> instance Functor (Free f) where
> fmap f (Pure a) = Pure (f a)
> fmap f (App x xs) = App x (fmap (f .) xs)
>
> instance Applicative (Free f) where
> pure = Pure
>
> Pure f <*> y = fmap f y
> App x xs <*> y = App x (fmap flip xs <*> y)
Essentially, `Free f a` consists of a sequence of “effects” (values of type `f a`, for some `a`) and a function which combines the results of those effects to produce the final answer.
You can also see that this is going to have bad asymptotic performance. `fmap f x`, for example, will take time *O*(*n*) time, where *n* is the number of effects in `x`. Worse, `x <*> y` will take *O*(*n*^2^ + *m*) time, since it calls `fmap` *n* times on `x` and once on `y`.
My first implementation
-----------------------
Of course, I didn’t bother to look for other implementations until after I’d found my own. I think I considered one like `Free` above, but rejected it because I didn’t like that `fmap` was so expensive. Instead, I kept the function separate from the effect sequence. To make it easy to call `fmap` without needing to walk the tree, I made the function accept all the results from the effects in a single argument, represented as a nested tuple.
Here’s the type for the effect sequence:
> data CSeq f a where
> CNil :: CSeq f ()
> CCons :: f a -> CSeq f u -> CSeq f (a,u)
>
>
> reduce :: Applicative f => CSeq f u -> f u
> reduce CNil = pure ()
> reduce (CCons x xs) = (,) <$> x <*> reduce xs
So if we have operations `x :: f a`, `y :: f b`, and `z :: f c`, we can put them in sequence and get
`CCons x (CCons y (CCons z CNil)) :: Seq f (a,(b,(c,())))`.
And here’s the implementation itself:
> data C1 f a = forall u. C1 (u -> a) (CSeq f u)
>
> liftC1 :: f a -> C1 f a
> liftC1 a = C1 fst (CCons a CNil)
>
> lowerC1 :: Applicative f => C1 f a -> f a
> lowerC1 (C1 f x) = f <$> reduce x
>
> instance Functor (C1 f) where
> fmap f (C1 u x) = C1 (f . u) x
>
> instance Applicative (C1 f) where
> pure a = C1 (const a) CNil
>
> C1 f x <*> C1 g y = rebaseC1 x C1 (\v u -> f u (g v)) y
>
> rebaseC1 :: CSeq f u -> (forall x. (x -> y) -> CSeq f x -> z) ->
> (v -> u -> y) -> CSeq f v -> z
> rebaseC1 CNil k f = k (\v -> f v ())
> rebaseC1 (CCons x xs) k f =
> rebaseC1 xs (\g s -> k (\(a,u) -> g u a) (CCons x s))
> (\v u a -> f v (a,u))
(The prefix C, incidentally, stands for “Curry”, because when I was naming it, I got currying and uncurrying confused. `C1` uses an uncurried function; `Free` uses a curried one.)
The only real tricky part is `rebaseC1`. It may not be obvious, but it essentially acts like `++`, traversing the first sequence and creating a new one by appending the second sequence. The difference is that this also has to modify the return functions and that the return type depends on the input types. I could have used type families here, but for some reason I chose to just stick with existential types; the continuation passed in the second argument is polymorphic in the sequence type, allowing me to just pass `C1` when I call it.
The real “trick” in `rebaseC1` involves the `y` parameter, which changes in the recursive call to allow us to pass values through the later parts of the computation and then reconstruct them. So if the third parameter, `f` has type `v -> (a,u) -> y`, in the recursive call it will have type `v -> u -> a -> y`. The modified continuation will eventually convert the function `g :: x -> a -> y` into a one with type `(a,x) -> y` and pass it to the original continuation.
Asymptotically, this is more efficient than `Free`. All of the operations are *O*(1), aside from `lowerC1` and `<*>`, which are *O*(*n*). It’s pretty clear that *O*(*n*) is the best something like `lowerC1` could be, since it has to evaluate every effect, but can we get a faster `<*>`?
My second implementation
------------------------
Consider that `<*>` is essentially appending one sequence to another. In lists, appending works by replacing the final `[]` of the first list with the second list. That is,
1:2:3:[] ++ 4:5:[] = 1:2:3:4:5:[]
We can speed up append by taking the end of the list as a parameter, allowing us to use `[]` or another list as we choose. Then appending simply becomes function composition.
(\n -> 1:2:3:n) . (\n -> 4:5:n) = (\n -> 1:2:3:4:5:n)
Finding an analogous trick for `C1` is harder, obviously, but after a bunch of tweaking, this is what I came up with:
> newtype C2 f a = C2
> { unC2 :: forall u y z.
> (forall x. (x -> y) -> CSeq f x -> z) ->
> (u -> a -> y) -> CSeq f u -> z }
>
> liftC2 :: f a -> C2 f a
> liftC2 a = C2 (\k f s -> k (\(a,s) -> f s a) (CCons a s))
>
> lowerC2 :: Applicative f => C2 f a -> f a
> lowerC2 x = unC2 x (\f s -> f <$> reduce s) (\() -> id) CNil
>
> instance Functor (C2 f) where
> fmap g x = C2 (\k f -> unC2 x k (\s -> f s . g))
>
> instance Applicative (C2 f) where
> pure a = C2 (\k f -> k (flip f a))
>
> x <*> y = C2 (\k f -> unC2 y (unC2 x k) (\s a g -> f s (g a)))
Now, all operations are *O*(1) except `lowerC2`.
As a quick explanation, `x :: C2 f a` is a function taking three arguments. The first is a continuation (of the same type used in `reduceC1`) saying what to do with the final effect sequence and function. The third is the tail of the effect sequence, that `x` can add to using `CCons`. The second is a function which explains how to take the results of the tail effects and the value produced by `x` and combine them into the final return value. You can look at `liftC2` and `pure` to see how they use them.
`<*>` uses the same trick as `rebaseC1` of being sneaky with the `y` type parameter, allowing us to pass an extra value from `x` to `y`. That’s also why the function doesn’t have the seemingly more logical argument order `a -> u -> v`.
Notice in `<*>` that the first argument is put into the continuation of the second one. This is the secret to `C2`’s speed: it essentially traverses the expression tree from right to left, building the sequence of events from last to first. Unlike `Free` or `C1`, `C2` never has to traverse and reconstruct the effect sequence.
Naturally, `C1` and `C2` are easily interconvertible:
> c2toC1 :: C2 f a -> C1 f a
> c2toC1 x = unC2 x C1 (\() a -> a) CNil
>
> c1toC2 :: C1 f a -> C2 f a
> c1toC2 (C1 g x) = C2 (\k f -> rebaseC1 x k (\v u -> f v (g u)))
Testing for speed
-----------------
I’ve talked a bit about asymptotic performance, but that isn’t the end of the story with efficiency. `C1` and `C2` do a bit work converting functions between curried and uncurried forms, so we should do some experiments to see whether they’re actually any better.
Here’s one. I generated a balanced binary tree containing between 10 and 100 nodes, each of which is a trivial call to `liftX`, used `sequenceA` to traverse the tree, and then used `lowerX` to run the computation. I then used [Criterion][4] to tell me how long they took. The result was that `C2` was easily faster than `C1`, which was much faster than `Free`.
Other experiments are more stark. For example, expressions of the form
f <$> lift () <*> lift () <*> lift () …
are something of a worst case for `Free` and `C1`, particularly as the number of arguments rose. In my experiments, the time for `Free` went from 0.58 μs to 14 μs as the number of arguments rose from 3 to 11. `C2`, in contrast, rose only from 0.049 μs to 0.16 μs.
Obviously, this experiment is artificial, but its results hold up in the tree traversal experiment.
The funny thing is, if I had started my exploration by looking for other implementations, I might have found the existing ones and stopped, my curiosity satisfied. Because I was (re)inventing the concept for myself, I pushed harder to find the best one I could.^[I'm omitting, in fact, four implementations I came up with that use a curried function—in fact, `C1` and `C2` were renamed to contrast with them.
When I finally pulled out Criterion, I saw that they could beat `Free`, but `C2` was better than all of them. Since they’re also more complicated, it didn’t seem worth it to mention them.] The results are a significant improvement over the current state of the art.
It may be that no one will ever need a free applicative functor. It certainly may be that no one will need the speed improvement of `C2` over `Free`. We’ve still learned two lessons here: don’t append when you can compose, and don’t use curried functions if you’re going to compose them.
[1]: http://web.jaguarpaw.co.uk/~tom/blog/2012/09/09/towards-free-applicatives.html "Towards Free Applicatives"
[2]: http://ro-che.info/articles/2013-03-31-flavours-of-free-applicative-functors.html "Flavours of free applicative functors"
[3]: http://hackage.haskell.org/packages/archive/free/3.4.1/doc/html/Control-Applicative-Free.html "free-3.4.1: Control.Applicative.Free"
[4]: http://hackage.haskell.org/package/criterion-0.8.0.0 "The criterion package"
[5]: http://hackage.haskell.org/packages/archive/MonadPrompt/1.0.0.3/doc/html/Control-Monad-Prompt.html#t:Prompt "MonadPrompt-1.0.0.3: Control.Monad.Prompt.Prompt"