Parsing the ContT monad, or how to write advanced goto, generators and fibers in Haskell
In this article I will analyze the monad ContT
and I'll show you how to return it return
and other control-flow operators from imperative programming languages, which I missed so much when I started learning Haskell.
0. Introduction to continuations
Consider the expression f (g x)
. This approach to passing arguments does not scale. If the names are cumbersome and there is a lot of nesting, the code becomes completely unreadable:
largeFunctionFoo (largeFunctionBar (largeFunctionBaz x))
There are many ways to combat this, but the two most famous are pipes x |> g |> f
and continuation passing style (CPS – Continuation passing style) ($ x) g' f' id
.
Continuations are very similar to callbacks. A continuation-enabled function performs calculations, calls a continuation, and returns its result. For example, the function g'
can be defined like this:
g' x cont = cont (g x)
Let's immediately note one interesting feature: the CPS function knows about its continuation. While in f (g x)
functions g
nothing is known about the function f
functions g'
the continuation is conveyed explicitly. This allows her to control the continuation, call it multiple times, or not call it at all.
1. Add continuations to Haskell
Let's define the type for continued values:
data ContT r m a = ContT {runCont :: (a -> m r) -> m r}
Further in the text the function a -> m r
will often be called cc
– current continuation
To get used to the definition, let's look at a couple of examples:
Let's try to determine the continue number 5:
cont5 :: ContT r m Integer
cont5 = ContT $ \cc -> cc 5 -- ContT ($ 5), если кратко
cont5
is a function that receives a continuation and calls it with the number 5.
You can be sure that cont5
actually stores the number you need by calling cont5
with continuation print
:
ghci> runCont cont5 print
5
Let's define the function of adding one to a continued number:
addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = ContT $
\cc -> runCont contX $
\x -> cc (x + 1)
addOne
receives a continued number and constructs a new number based on it. The function is continued inside cc
calls the old number contX
with continuation \x -> cc (x + 1)
which in turn causes continuation cc
with meaning x + 1
.
Let's check that (addOne cont5)
stores the number 6:
ghci> runCont (addOne cont5) print
6
To make it convenient to work with continued values, let’s define class instances for them Functor
, Applicative
And Monad
:
instance Functor (ContT r m) where
fmap f contX = ContT $
\cc -> runCont contX $
\x -> cc (f x)
instance Applicative (ContT r m) where
pure a = ContT ($ a)
contF <*> contX = ContT $
\cc -> runCont contF $
\f -> runCont contX $
\x -> cc (f x)
instance Monad (ContT r m) where
return = pure
contX >>= f = ContT $
\cc -> runCont contX $
\x -> runCont (f x) $
\res -> cc res
Now calculations with continuations can be written much easier. The functions from the examples can be redefined like this:
cont5 :: ContT r m Integer
cont5 = return 5
addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = do
x <- contX
return (x + 1)
So far all examples using ContT
were pretty straightforward. Continuations were called exactly once, no skipping occurred. We'll fix this in the next chapter.
2. Building callCC
Let's look again at the type ContT
:
data ContT r m a = ContT {runCont :: (a -> m r) -> m r}
Did you notice? If we have a value like m r
you can return it without calling cc
. Then all subsequent actions (which were performed inside cc
) will be skipped.
Let's fix it temporarily r = ()
, m = IO
and write skip3 – a continuable value that skips actions and prints 3:
skip3 :: ContT () IO a
skip3 = ContT $ \cc -> print 3 -- ContT (const $ print 3)
Note that the type a
not important because cc
not called. Let's check that skip3
works:
test1 = do
return 5
test2 = do
skip3
return 5
ghci> runCont test1 print
5
ghci> runCont test2 print
3
This makes it possible to skip steps, but due to fixation r
And m
, skip3
turned out to be too limited. Let's write a more flexible mechanism.
To do this, let's look at the situation from the inside. ContT
:
cont :: ContT r m a
cont = ContT $ \cc -> _ -- ?
cc
inside cont
has type a -> m r
. This makes it possible to get m r
, which can already be used for skipping. Let's add a definition skip
inside cont
:
cont :: ContT r m a
cont = ContT $
\cc ->
let skip a = ContT (const $ cc a)
in _ -- ?
skip
has type a -> ContT r m b
, skip a
works similarly skip3
. So inside cont
we can run some “small” function to which we will pass the function skip
to switch to external ContT r m a
.
This is how we come to the definition of the function callCC
(call with current continuation):
type Switch r m a = forall b. a -> ContT r m b
callCC :: (Switch r m a -> ContT r m a) -> ContT r m a
callCC f = ContT $
\cc ->
let switchToCC = \a -> ContT (const $ cc a)
in runCont (f switchToCC) cc
This function will be the basis for almost all future designs.
Let's look at a few examples with callCC
.
Let's start with the ability to terminate a function early:
test = callCC $ \exit -> do
lift $ putStrLn "Reachable"
exit ()
lift $ putStrLn "Unreachable"
ghci> runCont test (const $ return ())
Reachable
Operators can be returned break
And continue
:
test :: ContT r IO ()
test = do
forM_ [1 .. 10] $ \i -> do
callCC $ \continue -> do
when (i == 5) $ do
continue ()
lift $ print i
ghci> runCont test (const $ return ())
1
2
3
4
6
7
8
9
10
test :: ContT r IO ()
test = do
callCC $ \break -> do
forM_ [1 .. 10] $ \i -> do
when (i == 5) $ do
break ()
lift $ print i
ghci> runCont test (const $ return ())
1
2
3
4
Let us note one very important feature: the function switchToCC
does not interrupt the “small” function. It ends the current calculation and switches to the one from which the function was called callCC
. If somehow switchToCC
will be able to go beyond callCC
calling this function will do this switch:
test = do
val {- label -} <- callCC $ \exit -> do
exit 10
-- rest
exit
takes us to the point label
execution continues with assignment val
.
Let's move on to more advanced examples using calls switchToCC
outside callCC
3. Label function (aka goto, also similar to the useState hook)
In this chapter we will write a function label
. It gets the initial value and returns a pair (restart, value)
.
restart
allows you to restart the calculation with a different valuevalue
– present value
The definition of this function is actually quite simple:
label :: a -> ContT r m (a -> ContT r m b, a)
label init = callCC $
\switch ->
let restart val = switch (restart, val)
in return (restart, init)
Now we can describe some things in an imperative style
For example, loops:
test = do
(restart, counter) <- label 0
lift $ print counter
when (counter < 10) $ do
restart $ counter + 1
ghci> runCont test (const $ return ())
0
1
2
3
4
5
6
7
8
9
10
You can write a couple setjmp
/longjmp
. It's even easier than label
because they don't need to carry any additional state:
setjmp = do
(restart, _) <- label ()
let longjmp = restart ()
return longjmp
test = do
longjmp <- setjmp
lift $ print 10
longjmp
ghci> runCont test (const $ return ())
10
10
10
10
10
10
...
4. Generators, fibers and scheduler
Let's try to write even more advanced mechanisms for program management.
Let's talk about what generator control might look like. To do this, you can see how they are made in other languages. Usually the keyword is used for this yield
however, there are several nuances:
We don't have the option to add a keyword, so
yield
will be a functionIt is necessary that from
yield x
you could get the next oneyield
. If you use the same one twiceyield
we will end up at the same point, which is not suitable for usNeed to be able to call
exit
(it also needs to be updated when callingyield
) to return to the scheduler
Let's put the control functions in a separate structure:
data Controls r m x = Controls
{ yield :: x -> ContT r m (Controls r m x)
, exit :: forall b. ContT r m b
}
Then the generator code could look like this:
test controls = do
controls <- yield controls 1
controls <- yield controls 2
controls <- yield controls 3
exit controls
It may seem strange that we explicitly call exit controls
for the output at the end of the generator, but later it will be clear why this is.
For now, just at the type level, let’s prohibit the generator from terminating without calling exit
:
type Generator r m x = Controls r m x -> ContT r m Void
Now let's write a function to run the generator until the next yield
.
It will return a pair of the value and the next part of the generator or Nothing
if the generator has terminated:
runToYield :: Generator r m x -> ContT r m (Maybe (x, Generator r m x))
runToYield generator = callCC $ \exitContext -> do
let exit = exitContext Nothing
yield value = callCC $ \continueGenerator ->
exitContext $ Just (value, continueGenerator)
controls = Controls{yield, exit}
generator controls
-- Если генератор добрался сюда, происходит что-то похожее на провал стека в
-- императивных языках. Дальнейшее исполнение, если убрать error, хоть и
-- определено, но будет очень странным и контринтуитивным
error "Generator exit invariant violated"
Let's look at this code line by line:
callCC $ \exitContext -> do
– create a new context in which to launch the generatorexit = exitContext Nothing
– function to return to the parent contextyield value
– define a function to transfer the valuecallCC $ \continueGenerator ->
– capture the state of the generator from the insideexitContext $ Just (value, continueGenerator)
– go to the parent context, return the value and the next part of the generatorcontrols = Controls{yield, exit}
– simply define a convenient synonymgenerator controls
– start the generatorerror "Generator exit invariant violated"
– we are using typeVoid
prevented the generator from terminating in anything other than a callexit
. This line of code should be unavailable
Fibers are generators that have yield
passes control back to the scheduler (without passing any values).
Let's define a couple of convenient synonyms for them:
type Fiber r m = Generator r m ()
suspend :: Controls r m () -> ContT r m (Controls r m ())
suspend controls = yield controls ()
Let's write a scheduler to conveniently launch multiple fibers. It will work very simply: take a list of fibers, run them all once, remove those that have completed. Repeat until there are unfinished fibers left:
scheduler :: [Fiber r m] -> ContT r m ()
scheduler threads = do
let round threads = do
nextThreads <- forM threads $ \thread -> do
res <- runToYield thread
return $ snd <$> res
return $ catMaybes nextThreads -- Фильтрация файберов
(loop, threads) <- label threads
threadsLeft <- round threads
when (length threadsLeft /= 0) $ do
loop threadsLeft
All that remains is to check that the fibers really work:
debug :: (MonadIO m) => String -> m ()
debug str = liftIO $ putStrLn str
fiberA :: Fiber r IO
fiberA controls = do
debug "Started fiber A"
controls <- suspend controls
debug "Running fiber A"
controls <- suspend controls
debug "Exiting fiber A"
exit controls
fiberB :: Fiber r IO
fiberB controls = do
debug "Started fiber B"
controls <- suspend controls
debug "Running fiber B"
controls <- suspend controls
debug "Running fiber B again"
controls <- suspend controls
debug "Exiting fiber B"
exit controls
ghci> runCont (scheduler [fiberA, fiberB]) (const $ return ())
Started fiber A
Started fiber B
Running fiber A
Running fiber B
Exiting fiber A
Running fiber B again
Exiting fiber B
OK it's all over Now. Thank you for your attention