Free Applicative Functors in Haskell ==================================== A few days ago, I found myself thinking about free monads, specifically [`Prompt` monads], and wondering if there were equivalents for applicative functors. I didn’t realize it at the time, but apparently there was a recent bit of discussion about this in the Haskell community, starting with a [blog post by Tom Ellis], and leading to a [roundup of implementations] by Roman Cheplyaka. The consensus implementation, at least the one provided by the [`free` package], looks roughly like this: > {-# LANGUAGE GADTs, RankNTypes #-} > > module FreeApp where > > import Control.Applicative > > data Free f a where > Pure :: a -> Free f a > App :: f a -> Free f (a -> b) -> Free f b > > liftFree :: f a -> Free f a > liftFree x = App x (Pure id) > > lowerFree :: Applicative f => Free f a -> f a > lowerFree (Pure x) = pure x > lowerFree (App x xs) = x <**> lowerFree xs > > instance Functor (Free f) where > fmap f (Pure a) = Pure (f a) > fmap f (App x xs) = App x (fmap (f .) xs) > > instance Applicative (Free f) where > pure = Pure > > Pure f <*> y = fmap f y > App x xs <*> y = App x (fmap flip xs <*> y) Essentially, `Free f a` consists of a sequence of “effects” (values of type `f a`, for some `a`) and a function which combines the results of those effects to produce the final answer. You can also see that this is going to have bad asymptotic performance. `fmap f x`, for example, will take time *O*(*n*) time, where *n* is the number of effects in `x`. Worse, `x <*> y` will take *O*(*n*^2^ + *m*) time, since it calls `fmap` *n* times on `x` and once on `y`. My first implementation ----------------------- Of course, I didn’t bother to look for other implementations until after I’d found my own. I think I considered one like `Free` above, but rejected it because I didn’t like that `fmap` was so expensive. Instead, I kept the function separate from the effect sequence. To make it easy to call `fmap` without needing to walk the tree, I made the function accept all the results from the effects in a single argument, represented as a nested tuple. Here’s the type for the effect sequence: > data CSeq f a where > CNil :: CSeq f () > CCons :: f a -> CSeq f u -> CSeq f (a,u) > > > reduce :: Applicative f => CSeq f u -> f u > reduce CNil = pure () > reduce (CCons x xs) = (,) <\$> x <*> reduce xs So if we have operations `x :: f a`, `y :: f b`, and `z :: f c`, we can put them in sequence and get `CCons x (CCons y (CCons z CNil)) :: Seq f (a,(b,(c,())))`. And here’s the implementation itself: > data C1 f a = forall u. C1 (u -> a) (CSeq f u) > > liftC1 :: f a -> C1 f a > liftC1 a = C1 fst (CCons a CNil) > > lowerC1 :: Applicative f => C1 f a -> f a > lowerC1 (C1 f x) = f <\$> reduce x > > instance Functor (C1 f) where > fmap f (C1 u x) = C1 (f . u) x > > instance Applicative (C1 f) where > pure a = C1 (const a) CNil > > C1 f x <*> C1 g y = rebaseC1 x C1 (\v u -> f u (g v)) y > > rebaseC1 :: CSeq f u -> (forall x. (x -> y) -> CSeq f x -> z) -> > (v -> u -> y) -> CSeq f v -> z > rebaseC1 CNil k f = k (\v -> f v ()) > rebaseC1 (CCons x xs) k f = > rebaseC1 xs (\g s -> k (\(a,u) -> g u a) (CCons x s)) > (\v u a -> f v (a,u)) (The prefix C, incidentally, stands for “Curry”, because when I was naming it, I got currying and uncurrying confused. `C1` uses an uncurried function; `Free` uses a curried one.) The only real tricky part is `rebaseC1`. It may not be obvious, but it essentially acts like `++`, traversing the first sequence and creating a new one by appending the second sequence. The difference is that this also has to modify the return functions and that the return type depends on the input types. I could have used type families here, but for some reason I chose to just stick with existential types; the continuation passed in the second argument is polymorphic in the sequence type, allowing me to just pass `C1` when I call it. The real “trick” in `rebaseC1` involves the `y` parameter, which changes in the recursive call to allow us to pass values through the later parts of the computation and then reconstruct them. So if the third parameter, `f` has type `v -> (a,u) -> y`, in the recursive call it will have type `v -> u -> a -> y`. The modified continuation will eventually convert the function `g :: x -> a -> y` into a one with type `(a,x) -> y` and pass it to the original continuation. Asymptotically, this is more efficient than `Free`. All of the operations are *O*(1), aside from `lowerC1` and `<*>`, which are *O*(*n*). It’s pretty clear that *O*(*n*) is the best something like `lowerC1` could be, since it has to evaluate every effect, but can we get a faster `<*>`? My second implementation ------------------------ Consider that `<*>` is essentially appending one sequence to another. In lists, appending works by replacing the final `[]` of the first list with the second list. That is, 1:2:3:[] ++ 4:5:[] = 1:2:3:4:5:[] We can speed up append by taking the end of the list as a parameter, allowing us to use `[]` or another list as we choose. Then appending simply becomes function composition. (\n -> 1:2:3:n) . (\n -> 4:5:n) = (\n -> 1:2:3:4:5:n) Finding an analogous trick for `C1` is harder, obviously, but after a bunch of tweaking, this is what I came up with: > newtype C2 f a = C2 > { unC2 :: forall u y z. > (forall x. (x -> y) -> CSeq f x -> z) -> > (u -> a -> y) -> CSeq f u -> z } > > liftC2 :: f a -> C2 f a > liftC2 a = C2 (\k f s -> k (\(a,s) -> f s a) (CCons a s)) > > lowerC2 :: Applicative f => C2 f a -> f a > lowerC2 x = unC2 x (\f s -> f <\$> reduce s) (\() -> id) CNil > > instance Functor (C2 f) where > fmap g x = C2 (\k f -> unC2 x k (\s -> f s . g)) > > instance Applicative (C2 f) where > pure a = C2 (\k f -> k (flip f a)) > > x <*> y = C2 (\k f -> unC2 y (unC2 x k) (\s a g -> f s (g a))) Now, all operations are *O*(1) except `lowerC2`. As a quick explanation, `x :: C2 f a` is a function taking three arguments. The first is a continuation (of the same type used in `reduceC1`) saying what to do with the final effect sequence and function. The third is the tail of the effect sequence, that `x` can add to using `CCons`. The second is a function which explains how to take the results of the tail effects and the value produced by `x` and combine them into the final return value. You can look at `liftC2` and `pure` to see how they use them. `<*>` uses the same trick as `rebaseC1` of being sneaky with the `y` type parameter, allowing us to pass an extra value from `x` to `y`. That’s also why the function doesn’t have the seemingly more logical argument order `a -> u -> v`. Notice in `<*>` that the first argument is put into the continuation of the second one. This is the secret to `C2`’s speed: it essentially traverses the expression tree from right to left, building the sequence of events from last to first. Unlike `Free` or `C1`, `C2` never has to traverse and reconstruct the effect sequence. Naturally, `C1` and `C2` are easily interconvertible: > c2toC1 :: C2 f a -> C1 f a > c2toC1 x = unC2 x C1 (\() a -> a) CNil > > c1toC2 :: C1 f a -> C2 f a > c1toC2 (C1 g x) = C2 (\k f -> rebaseC1 x k (\v u -> f v (g u))) Testing for speed ----------------- I’ve talked a bit about asymptotic performance, but that isn’t the end of the story with efficiency. `C1` and `C2` do a bit work converting functions between curried and uncurried forms, so we should do some experiments to see whether they’re actually any better. Here’s one. I generated a balanced binary tree containing between 10 and 100 nodes, each of which is a trivial call to `liftX`, used `sequenceA` to traverse the tree, and then used `lowerX` to run the computation. I then used [Criterion] to tell me how long they took. The result was that `C2` was easily faster than `C1`, which was much faster than `Free`. Other experiments are more stark. For example, expressions of the form f <\$> lift () <*> lift () <*> lift () … are something of a worst case for `Free` and `C1`, particularly as the number of arguments rose. In my experiments, the time for `Free` went from 0.58 μs to 14 μs as the number of arguments rose from 3 to 11. `C2`, in contrast, rose only from 0.049 μs to 0.16 μs. Obviously, this experiment is artificial, but its results hold up in the tree traversal experiment. The funny thing is, if I had started my exploration by looking for other implementations, I might have found the existing ones and stopped, my curiosity satisfied. Because I was (re)inventing the concept for myself, I pushed harder to find the best one I could.^[I'm omitting, in fact, four implementations I came up with that use a curried function—in fact, `C1` and `C2` were renamed to contrast with them. When I finally pulled out Criterion, I saw that they could beat `Free`, but `C2` was better than all of them. Since they’re also more complicated, it didn’t seem worth it to mention them.] The results are a significant improvement over the current state of the art. It may be that no one will ever need a free applicative functor. It certainly may be that no one will need the speed improvement of `C2` over `Free`. We’ve still learned two lessons here: don’t append when you can compose, and don’t use curried functions if you’re going to compose them. : http://web.jaguarpaw.co.uk/~tom/blog/2012/09/09/towards-free-applicatives.html "Towards Free Applicatives" : http://ro-che.info/articles/2013-03-31-flavours-of-free-applicative-functors.html "Flavours of free applicative functors" : http://hackage.haskell.org/packages/archive/free/3.4.1/doc/html/Control-Applicative-Free.html "free-3.4.1: Control.Applicative.Free" : http://hackage.haskell.org/package/criterion-0.8.0.0 "The criterion package" : http://hackage.haskell.org/packages/archive/MonadPrompt/1.0.0.3/doc/html/Control-Monad-Prompt.html#t:Prompt "MonadPrompt-1.0.0.3: Control.Monad.Prompt.Prompt"