Restricted IO in Haskell

This article describes a mechanism for creating your own restricted version of the IO monad in Haskell, one that permits only selected I/O operations.

It is considered good practice to structure a Haskell program by separating the code into blocks that perform I/O and blocks that consist entirely of pure functions, i.e. functions that perform no I/O but only take data as input and return it in transformed form. Such pure blocks are essentially functions in the mathematical sense: they take an argument and return a value. They are reminiscent of programs from the dawn of the computer era, when data from punched cards was loaded at the very start, processed for some time, and the final result was then printed, with no interaction expected while the program was running.

To add interactivity to a program while preserving the mathematical purity of its functions as far as possible, an approach roughly like the following is used:

mainLoop :: ReadParams -> ApplicationState -> IO ()
mainLoop readParams appState = do
    -- IO operation that reads user input (keyboard, mouse, etc.)
    -- and loads the required data from disk, from a database, or over the network.
    -- There must be no other logic here!
    inputData <- ioGetInputData readParams appState

    -- Pure function. All of the program's logic lives inside it.
    let newState = processBusinessLogic inputData appState

    -- IO operation: display information on screen, save the required data to a file, a database, etc.
    -- Again, there is no logic here other than data output.
    ioOutputData newState

    mainLoop readParams newState

This is the approximate structure of the program's main loop, deliberately simplified to a single thread. In a real application it makes sense to run the I/O operations ioGetInputData and ioOutputData in separate threads, for example using forkIO, so that the interaction feels instantaneous and lag-free to the user. But that is not the subject of this article. So, without loss of generality, we will assume that each step of mainLoop runs in less than 1/60 of a second 🙂

All of the application's business logic is contained in the function processBusinessLogic. While processBusinessLogic is running, it may turn out that more source data needs to be loaded, but it has no way to do so, because it is a pure function. It has to wait for the next step of the loop: processBusinessLogic records in newState what data needs to be loaded, and on the next step ioGetInputData loads the new portion of data. This is why ioGetInputData takes appState as an argument. Of course, in a real application there is no point in passing the whole appState to ioGetInputData; it is enough to pass only the information that says what needs to be loaded.
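
As a minimal sketch of this request-through-state idea: the field names loadedFiles, pendingLoads and report, the file name config.txt, and the simplified signatures below are all assumptions made for illustration, not part of the original program.

```haskell
-- Hypothetical application state; all field names are illustrative.
data ApplicationState = ApplicationState
  { loadedFiles  :: [(FilePath, String)] -- file contents loaded so far
  , pendingLoads :: [FilePath]           -- what the pure logic asks to load next
  , report       :: Maybe String         -- the result, once the input is available
  }

-- Pure step. It cannot read "config.txt" itself, so when the file is
-- missing it records a request in the new state and returns.
processBusinessLogic :: [(FilePath, String)] -> ApplicationState -> ApplicationState
processBusinessLogic inputData st =
  case lookup "config.txt" files of
    Nothing  -> st { loadedFiles = files, pendingLoads = ["config.txt"] }
    Just cfg -> st { loadedFiles = files
                   , pendingLoads = []
                   , report = Just ("config has " ++ show (length (lines cfg)) ++ " lines")
                   }
  where
    files = loadedFiles st ++ inputData

-- IO step. It receives only the list of requested paths, not the whole
-- state, and contains no logic besides loading.
ioGetInputData :: [FilePath] -> IO [(FilePath, String)]
ioGetInputData = mapM (\path -> (,) path <$> readFile path)
```

In the main loop, ioGetInputData would be called with pendingLoads appState, so the file requested on one step becomes available to the pure logic on the next one.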

Unfortunately, in practice it is not always possible to stick to this architectural pattern. If the logic frequently needs I/O, writing code in this style becomes inconvenient. In addition, waiting for the next loop step every time the business logic needs to continue can hurt performance badly. For example, a recursive algorithm that traverses directories on disk to find a file needs an IO operation to read the contents of a folder at every step. If the search had to wait for a new step of mainLoop each time, performance would degrade dramatically.
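
To make the directory example concrete, here is a rough sketch of such a search written directly in IO. It assumes the directory and filepath packages; the name findFileRec is hypothetical, and the sketch ignores permission errors and symbolic-link cycles that real code would have to handle.

```haskell
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- Depth-first search for an entry called `name` under `dir`.
-- Every level of the recursion performs IO (listDirectory and
-- doesDirectoryExist), so the logic cannot be split into one
-- "read everything" phase followed by one pure phase.
findFileRec :: FilePath -> FilePath -> IO (Maybe FilePath)
findFileRec name dir = listDirectory dir >>= go
  where
    go [] = pure Nothing
    go (entry : rest)
      -- Note: this matches any entry with that name, file or directory.
      | entry == name = pure (Just (dir </> entry))
      | otherwise = do
          let path = dir </> entry
          isDir <- doesDirectoryExist path
          found <- if isDir then findFileRec name path else pure Nothing
          case found of
            Just _  -> pure found
            Nothing -> go rest
```

Forcing each listDirectory call to wait for its own step of mainLoop would turn one search into as many loop iterations as there are directories visited.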

Therefore, very often you have to compromise and write business logic in the IO monad, deliberately sacrificing the architecture.

So, if for some reason we are forced to write business logic in the IO monad, is it possible to modify the IO monad so that code inside it can perform only the IO operations it actually needs? Of course! Let's do it now.

First, consider a simple problem. The code inside processBusinessLogic needs to obtain the system time (for example, to seed a random number generator). processBusinessLogic requires no other IO operations. Ideally, of course, the system time would be obtained in ioGetInputData and passed to processBusinessLogic as an argument, but we have already decided that for some reason this is impossible. Surely we should not give processBusinessLogic access to the full IO monad for such a small thing?

How do we restrict IO? We wrap it in another type (let's call it GetTime), make that type a monad by implementing the corresponding instances, and do not give the user access to the wrapper's data constructor. Then, from within the GetTime monad, it is impossible to run any IO operations other than those implemented in the GetTime module and exported from it (in this example, the single function getTime).

module GetTime
  ( GetTime (), -- Important! The UnsafeGetTime constructor must not be exported
    runGetTime, -- the "runner" for the GetTime monad
    getTime, -- the only permitted IO operation
  )
where

import Control.Monad (ap)
import qualified Data.Time as Time

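-- GetTime is a wrapper around IO; its constructor is not accessible outside this module.
-- runGetTime is deliberately a plain function rather than a record field: an exported
-- field could be used in record-update syntax to smuggle in an arbitrary IO action.
newtype GetTime a = UnsafeGetTime (IO a)

runGetTime :: GetTime a -> IO a
runGetTime (UnsafeGetTime io) = io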

instance Functor GetTime where
  -- standard Functor implementation for a wrapper type
  fmap f (UnsafeGetTime io) = UnsafeGetTime (f <$> io)

instance Applicative GetTime where
  -- also entirely standard
  pure = UnsafeGetTime . pure
  -- Did you know this is possible? The function ap implements
  -- (<*>) in terms of (>>=), since we are writing a monad anyway
  (<*>) = ap

instance Monad GetTime where
  -- and again the standard Monad implementation for a wrapper type
  (UnsafeGetTime io) >>= k = UnsafeGetTime $ io >>= runGetTime . k

-- With the UnsafeGetTime constructor in hand we can wrap any IO operation,
-- but outside this module that is impossible
getTime :: GetTime Time.UTCTime
getTime = UnsafeGetTime Time.getCurrentTime

And indeed, any attempt to perform another IO operation inside the GetTime monad results in a type error. All we can do is call getTime, which is exactly what we wanted.

module BusinessLogic where
import GetTime

someBusinessLogic :: GetTime String
someBusinessLogic = do
    t <- getTime

    -- print "Unsuccessful Hack"
    -- ^^^ If you uncomment the line above, the compiler complains:
    -- Couldn't match type `IO' with `GetTime' -- Expected: GetTime () -- Actual: IO ()

    -- but if we had UnsafeGetTime, we would have access to every IO operation, for example:
    -- UnsafeGetTime $ print "Ho-ho-ho"

    -- Although we are inside the GetTime monad, we are not obliged to return UTCTime;
    -- we can return anything, for example a string
    pure ("Current time: " ++ show t)

Did everything work out? Well, not quite. Haskell has the function unsafeCoerce, which can “convert” any data type into any other; in essence it simply tells the compiler not to type-check at that point. So the line unsafeCoerce $ print "Successful Hack" breaks our entire security scheme.

Fortunately, there is the Safe pragma, which prohibits the use of unsafeCoerce and of any functions derived from it. It is enough to put the Safe pragma in a single place, the module from which the GetTime monad is run (for example, Main). A Safe module may import only modules that the compiler itself accepts as safe (or that are explicitly marked Trustworthy), so we can be sure that none of the code running inside the GetTime monad, however large it is, uses unsafeCoerce or similar functions; otherwise the compiler reports an error.

{-# LANGUAGE Safe #-}
module Main where

import GetTime
import qualified BusinessLogic

main :: IO ()
main = do
  -- run the single function getTime
  timeResult <- runGetTime getTime
  print timeResult

  -- run an arbitrarily large piece of code that is
  -- guaranteed to perform no IO operations other than getTime
  stringResult <- runGetTime BusinessLogic.someBusinessLogic
  putStrLn stringResult

By analogy with GetTime, it is clear how to build a restricted IO monad for the general case.

Suppose we want some code to be able to work with the database and the file system, but we do not want to give it full access to the entire database and the entire disk; instead we limit its rights to certain directories and database tables. This time we will call the wrapper type RIO, for Restricted IO. This is also the name of a package on Hackage.

module RIO
  ( RIO (), -- wrapper around the IO monad, exported without its constructor
    Permission(..), -- restriction settings
    runRIO,
    rioReadFile, -- a few permitted IO operations
    rioWriteFile,
    rioReadFromDB,
    rioWriteToDB
  )
where

import Control.Monad (ap)
import Control.Monad.Reader (MonadIO (liftIO), ReaderT (runReaderT), asks)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS


-- The Permission data type defines access to the required directories and database tables
data Permission = Permission
  { allowedReadDirs :: [FilePath],
    allowedWriteDirs :: [FilePath],
    allowedReadDBTables :: [String],
    allowedWriteDBTables :: [String]
  }

-- wrapper type around the IO monad
newtype RIO a = UnsafeRIO {unRIO :: ReaderT Permission IO a}

runRIO :: Permission -> RIO a -> IO a
runRIO permissions routine = runReaderT (unRIO routine) permissions

-- The Functor, Applicative, and Monad implementations are entirely analogous to the previous example.
instance Functor RIO where
  fmap f (UnsafeRIO io) = UnsafeRIO (fmap f io)

instance Applicative RIO where
  pure = UnsafeRIO . pure
  (<*>) = ap

instance Monad RIO where
  (UnsafeRIO ioA) >>= k = UnsafeRIO $ ioA >>= unRIO . k

-- Permitted IO operations
rioReadFile :: FilePath -> RIO (Maybe ByteString)
rioReadFile file =
  UnsafeRIO $ do
    readDirs <- asks allowedReadDirs
    if checkFilePath readDirs file
      then Just <$> liftIO (BS.readFile file)
      else pure Nothing

rioWriteFile :: FilePath -> ByteString -> RIO Bool
rioWriteFile file content =
  UnsafeRIO $ do
    writeDirs <- asks allowedWriteDirs
    if checkFilePath writeDirs file
      then liftIO (BS.writeFile file content) >> pure True
      else pure False

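-- The remaining functions can be implemented along the same lines.
-- Here they are left as stubs for the sake of the example.

-- Placeholder types (an assumption: the original leaves them undefined;
-- a real program would use the types of its database library).
type Connection = String
type TableName = String
type Fields = [String]
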
rioReadFromDB :: Connection -> TableName -> Fields -> RIO (Maybe [[ByteString]])
rioReadFromDB con table fields = undefined

rioWriteToDB :: Connection -> TableName -> Fields -> [[ByteString]] -> RIO Bool
rioWriteToDB con table fields content = undefined

checkFilePath :: [FilePath] -> FilePath -> Bool
checkFilePath = undefined
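
The stub checkFilePath is left undefined above. One possible way to fill it in is sketched below, assuming the filepath package. This is only an illustration: it compares path components textually, rejects every path that contains a ".." component, and does not resolve symbolic links or convert relative paths to absolute ones, all of which a production check would need to consider.

```haskell
import Data.List (isPrefixOf)
import System.FilePath (normalise, splitDirectories, takeDirectory)

-- A file is accepted when its path contains no ".." component and its
-- directory is equal to, or nested inside, one of the allowed directories.
-- The comparison is made component by component, so the allowed directory
-- "data/csv" does not accidentally admit "data/csv_files/x".
checkFilePath :: [FilePath] -> FilePath -> Bool
checkFilePath allowedDirs file =
  ".." `notElem` fileParts && any insideAllowed allowedDirs
  where
    normalised = normalise file
    fileParts = splitDirectories normalised
    dirParts = splitDirectories (takeDirectory normalised)
    insideAllowed dir = splitDirectories (normalise dir) `isPrefixOf` dirParts
```

With allowedReadDirs = ["input_data/csv_files/"], this sketch accepts "input_data/csv_files/file1.csv" and rejects "input_data/csv_files/../../secret.txt".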

Note that the wrapper type is defined as newtype RIO a = UnsafeRIO {unRIO :: ReaderT Permission IO a}, and not like this:

newtype RIO' a = UnsafeRIO {unRIO' :: IO a}

type RIO a = ReaderT Permission RIO' a

Otherwise the user would have access to the ReaderT monad and could therefore replace the contents of Permission, for example by using the function local.
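
The following sketch shows what such an escalation might look like under the weaker definition. It is a self-contained illustration, not the module above: Permission is cut down to a single field, readableDirs and escalate are hypothetical names, and ReaderT is imported from the transformers package (the same type that Control.Monad.Reader re-exports).

```haskell
import Control.Monad.Trans.Reader (ReaderT, asks, local, runReaderT)

-- Simplified permission record for the demonstration.
newtype Permission = Permission { allowedReadDirs :: [FilePath] }

-- The weaker design: only the inner IO wrapper is opaque ...
newtype RIO' a = UnsafeRIO' { unRIO' :: IO a }

instance Functor RIO' where
  fmap f (UnsafeRIO' io) = UnsafeRIO' (fmap f io)

instance Applicative RIO' where
  pure = UnsafeRIO' . pure
  UnsafeRIO' f <*> UnsafeRIO' x = UnsafeRIO' (f <*> x)

instance Monad RIO' where
  UnsafeRIO' io >>= k = UnsafeRIO' (io >>= unRIO' . k)

-- ... while the ReaderT layer stays visible to every caller.
type RIO a = ReaderT Permission RIO' a

runRIO :: Permission -> RIO a -> IO a
runRIO permission routine = unRIO' (runReaderT routine permission)

-- A permitted operation: report which directories may be read.
readableDirs :: RIO [FilePath]
readableDirs = asks allowedReadDirs

-- Untrusted code needs no constructor at all: `local` swaps in
-- whatever Permission value it likes for the wrapped computation.
escalate :: RIO [FilePath]
escalate = local (const (Permission ["/"])) readableDirs
```

Here runRIO (Permission ["input_data/"]) escalate yields ["/"] rather than ["input_data/"]. When the ReaderT is hidden inside the newtype, as in the RIO module above, local cannot be applied to a RIO value from outside the module.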

Using the RIO monad from other modules is entirely analogous to using the GetTime monad. Don't forget to add the Safe pragma.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Safe #-}

module RunRIO where

import Data.ByteString (ByteString)
import Data.Maybe (fromMaybe)
import RIO

conn = "Provider=PostgreSQL..."

routine :: RIO ByteString
routine = do
  mayFile <- rioReadFile "input_data/csv_files/file1.csv"
  mayData <- rioReadFromDB conn "users.accounts" ["Username", "email"]
  _ <- rioWriteToDB conn "log.common_logs" ["severity", "message"] [["info", "write OK"]]
  pure (fromMaybe "" mayFile)

main :: IO ()
main = do
  let permission =
        Permission
          { allowedReadDirs = ["input_data/csv_files/"],
            allowedWriteDirs = [],
            allowedReadDBTables = ["users.accounts", "transactions.transactions", "log.common_logs"],
            allowedWriteDBTables = ["log.common_logs"]
          }
  bs <- runRIO permission routine
  print bs

With this technique you can divide a program into blocks, each with exactly the set of IO operations it needs. This reduces the likelihood of accidental errors when writing or refactoring code, and it makes the whole system more secure: an accidental (or deliberately introduced) error cannot damage the database or the file system, or gain unauthorized access to information.