Parsing the ContT monad, or how to write advanced goto, generators and fibers in Haskell

In this article I will analyze the monad ContTand I'll show you how to return it return and other control-flow operators from imperative programming languages, which I missed so much when I started learning Haskell.

0. Introduction to continuations

Consider the expression f (g x). This approach to passing arguments does not scale. If the names are cumbersome and there is a lot of nesting, the code becomes completely unreadable:

largeFunctionFoo (largeFunctionBar (largeFunctionBaz x))

There are many ways to combat this, but the two most famous are pipes x |> g |> f and continuation passing style (CPS – Continuation passing style) ($ x) g' f' id.

Continuations are very similar to callbacks. A continuation-enabled function performs calculations, calls a continuation, and returns its result. For example, the function g' can be defined like this:

g' x cont = cont (g x)

Let's immediately note one interesting feature: the CPS function knows about its continuation. While in f (g x) functions g nothing is known about the function ffunctions g' the continuation is conveyed explicitly. This allows her to control the continuation, call it multiple times, or not call it at all.

1. Add continuations to Haskell

Let's define the type for continued values:

data ContT r m a = ContT {runCont :: (a -> m r) -> m r}

Further in the text the function a -> m r will often be called cc – current continuation

To get used to the definition, let's look at a couple of examples:

Let's try to determine the continue number 5:

cont5 :: ContT r m Integer
cont5 = ContT $ \cc -> cc 5 -- ContT ($ 5), если кратко

cont5 is a function that receives a continuation and calls it with the number 5.

You can be sure that cont5 actually stores the number you need by calling cont5 with continuation print:

ghci> runCont cont5 print
5

Let's define the function of adding one to a continued number:

addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = ContT $
  \cc -> runCont contX $
    \x -> cc (x + 1)

addOne receives a continued number and constructs a new number based on it. The function is continued inside cccalls the old number contX with continuation \x -> cc (x + 1)which in turn causes continuation cc with meaning x + 1.

Let's check that (addOne cont5) stores the number 6:

ghci> runCont (addOne cont5) print
6

To make it convenient to work with continued values, let’s define class instances for them Functor, Applicative And Monad:

instance Functor (ContT r m) where
  fmap f contX = ContT $
    \cc -> runCont contX $
      \x -> cc (f x)

instance Applicative (ContT r m) where
  pure a = ContT ($ a)

  contF <*> contX = ContT $
    \cc -> runCont contF $
      \f -> runCont contX $
        \x -> cc (f x)

instance Monad (ContT r m) where
  return = pure

  contX >>= f = ContT $
    \cc -> runCont contX $
      \x -> runCont (f x) $
        \res -> cc res

Now calculations with continuations can be written much easier. The functions from the examples can be redefined like this:

cont5 :: ContT r m Integer
cont5 = return 5

addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = do
  x <- contX
  return (x + 1)

So far all examples using ContT were pretty straightforward. Continuations were called exactly once, no skipping occurred. We'll fix this in the next chapter.

2. Building callCC

Let's look again at the type ContT:

data ContT r m a = ContT {runCont :: (a -> m r) -> m r}

Did you notice? If we have a value like m ryou can return it without calling cc. Then all subsequent actions (which were performed inside cc) will be skipped.

Let's fix it temporarily r = (), m = IO and write skip3 – a continuable value that skips actions and prints 3:

skip3 :: ContT () IO a
skip3 = ContT $ \cc -> print 3 -- ContT (const $ print 3)

Note that the type a not important because cc not called. Let's check that skip3 works:

test1 = do
  return 5

test2 = do
  skip3
  return 5

ghci> runCont test1 print
5
ghci> runCont test2 print
3

This makes it possible to skip steps, but due to fixation r And m, skip3 turned out to be too limited. Let's write a more flexible mechanism.

To do this, let's look at the situation from the inside. ContT:

cont :: ContT r m a
cont = ContT $ \cc -> _ -- ?

cc inside cont has type a -> m r. This makes it possible to get m r, which can already be used for skipping. Let's add a definition skip inside cont:

cont :: ContT r m a
cont = ContT $
  \cc ->
    let skip a = ContT (const $ cc a)
     in _ -- ?

skip has type a -> ContT r m b, skip a works similarly skip3. So inside cont we can run some “small” function to which we will pass the function skip to switch to external ContT r m a.

This is how we come to the definition of the function callCC (call with current continuation):

type Switch r m a = forall b. a -> ContT r m b

callCC :: (Switch r m a -> ContT r m a) -> ContT r m a
callCC f = ContT $
  \cc ->
    let switchToCC = \a -> ContT (const $ cc a)
     in runCont (f switchToCC) cc

This function will be the basis for almost all future designs.

Let's look at a few examples with callCC.

Let's start with the ability to terminate a function early:

test = callCC $ \exit -> do
  lift $ putStrLn "Reachable"
  exit ()
  lift $ putStrLn "Unreachable"

ghci> runCont test (const $ return ())
Reachable

Operators can be returned break And continue:

test :: ContT r IO ()
test = do
  forM_ [1 .. 10] $ \i -> do
    callCC $ \continue -> do
      when (i == 5) $ do
        continue ()

      lift $ print i

ghci> runCont test (const $ return ())
1
2
3
4
6
7
8
9
10

test :: ContT r IO ()
test = do
  callCC $ \break -> do
    forM_ [1 .. 10] $ \i -> do
      when (i == 5) $ do
        break ()

      lift $ print i

ghci> runCont test (const $ return ())
1
2
3
4

Let us note one very important feature: the function switchToCC does not interrupt the “small” function. It ends the current calculation and switches to the one from which the function was called callCC. If somehow switchToCC will be able to go beyond callCCcalling this function will do this switch:

test = do
  val {- label -} <- callCC $ \exit -> do
    exit 10
  -- rest

exit takes us to the point labelexecution continues with assignment val.

Let's move on to more advanced examples using calls switchToCC outside callCC

3. Label function (aka goto, also similar to the useState hook)

In this chapter we will write a function label. It gets the initial value and returns a pair (restart, value).

restart allows you to restart the calculation with a different value
value – present value

The definition of this function is actually quite simple:

label :: a -> ContT r m (a -> ContT r m b, a)
label init = callCC $
  \switch ->
    let restart val = switch (restart, val)
     in return (restart, init)

Now we can describe some things in an imperative style

For example, loops:

test = do
  (restart, counter) <- label 0

  lift $ print counter

  when (counter < 10) $ do
    restart $ counter + 1

ghci> runCont test (const $ return ())
0
1
2
3
4
5
6
7
8
9
10

You can write a couple setjmp/longjmp. It's even easier than label because they don't need to carry any additional state:

setjmp = do
  (restart, _) <- label ()

  let longjmp = restart ()

  return longjmp

test = do
  longjmp <- setjmp
  lift $ print 10

  longjmp

ghci> runCont test (const $ return ())
10
10
10
10
10
10
...

4. Generators, fibers and scheduler

Let's try to write even more advanced mechanisms for program management.

Let's talk about what generator control might look like. To do this, you can see how they are made in other languages. Usually the keyword is used for this yieldhowever, there are several nuances:

We don't have the option to add a keyword, so yield will be a function
It is necessary that from yield x you could get the next one yield. If you use the same one twice yieldwe will end up at the same point, which is not suitable for us
Need to be able to call exit (it also needs to be updated when calling yield) to return to the scheduler

Let's put the control functions in a separate structure:

data Controls r m x = Controls
  { yield :: x -> ContT r m (Controls r m x)
  , exit :: forall b. ContT r m b
  }

Then the generator code could look like this:

test controls = do
  controls <- yield controls 1
  controls <- yield controls 2
  controls <- yield controls 3
  exit controls

It may seem strange that we explicitly call exit controls for the output at the end of the generator, but later it will be clear why this is.
For now, just at the type level, let’s prohibit the generator from terminating without calling exit:

type Generator r m x = Controls r m x -> ContT r m Void

Now let's write a function to run the generator until the next yield.
It will return a pair of the value and the next part of the generator or Nothing if the generator has terminated:

runToYield :: Generator r m x -> ContT r m (Maybe (x, Generator r m x))
runToYield generator = callCC $ \exitContext -> do
  let exit = exitContext Nothing
      yield value = callCC $ \continueGenerator ->
        exitContext $ Just (value, continueGenerator)

      controls = Controls{yield, exit}

  generator controls
  -- Если генератор добрался сюда, происходит что-то похожее на провал стека в
  -- императивных языках. Дальнейшее исполнение, если убрать error, хоть и
  -- определено, но будет очень странным и контринтуитивным
  error "Generator exit invariant violated"

Let's look at this code line by line:

callCC $ \exitContext -> do – create a new context in which to launch the generator
exit = exitContext Nothing – function to return to the parent context
yield value – define a function to transfer the value
callCC $ \continueGenerator -> – capture the state of the generator from the inside
exitContext $ Just (value, continueGenerator) – go to the parent context, return the value and the next part of the generator
controls = Controls{yield, exit} – simply define a convenient synonym
generator controls – start the generator
error "Generator exit invariant violated" – we are using type Void prevented the generator from terminating in anything other than a call exit. This line of code should be unavailable

Fibers are generators that have yield passes control back to the scheduler (without passing any values).
Let's define a couple of convenient synonyms for them:

type Fiber r m = Generator r m ()

suspend :: Controls r m () -> ContT r m (Controls r m ())
suspend controls = yield controls ()

Let's write a scheduler to conveniently launch multiple fibers. It will work very simply: take a list of fibers, run them all once, remove those that have completed. Repeat until there are unfinished fibers left:

scheduler :: [Fiber r m] -> ContT r m ()
scheduler threads = do
  let round threads = do
        nextThreads <- forM threads $ \thread -> do
          res <- runToYield thread
          return $ snd <$> res

        return $ catMaybes nextThreads -- Фильтрация файберов

  (loop, threads) <- label threads
  threadsLeft <- round threads

  when (length threadsLeft /= 0) $ do
    loop threadsLeft

All that remains is to check that the fibers really work:

debug :: (MonadIO m) => String -> m ()
debug str = liftIO $ putStrLn str

fiberA :: Fiber r IO
fiberA controls = do
  debug "Started fiber A"
  controls <- suspend controls
  debug "Running fiber A"
  controls <- suspend controls
  debug "Exiting fiber A"
  exit controls

fiberB :: Fiber r IO
fiberB controls = do
  debug "Started fiber B"
  controls <- suspend controls
  debug "Running fiber B"
  controls <- suspend controls
  debug "Running fiber B again"
  controls <- suspend controls
  debug "Exiting fiber B"
  exit controls

ghci> runCont (scheduler [fiberA, fiberB]) (const $ return ())
Started fiber A
Started fiber B
Running fiber A
Running fiber B
Exiting fiber A
Running fiber B again
Exiting fiber B

OK it's all over Now. Thank you for your attention