Implementation of the Active Patterns extension for OCaml
about the project
In the spring of 2020, as part of the spring practice in Computer Science Center I was developing a new construct for a programming language OCaml under strict guidance Dmitry Kosarev…
OCaml is one of the most successful and developed implementations of industrial programming syncretism (hence multiparadigm, multiplatform, very fast compiler, high productivity of generated code) and mathematics (hence state-of-the-art type system with powerful implementation of type inference, expressiveness and extensibility language, closeness to mathematical notation and semantics).
At the same time, the language community is very selective and slowly adds to the language only highly demanded constructions, provided that they do not introduce restrictions into the existing language. Therefore, the core of the language is very small and intuitive, and OCaml is happy to use as industrial developersand, say, mathematicians with Department of Higher Algebra and Number Theory, St. Petersburg State University…
For a deeper dive into the topic, I suggest taking a look at the articles OCaml for the masses and Why OCaml…
Now work in progress on the implementation of a multicore system for OCaml, coupled with algebraic effects, which will simultaneously increase the overall performance of the language and eliminate the existing limitations of the type system associated with the fact that the language allows impure calculations.
Pattern matching and active patterns
My work has focused mainly on the pattern matching construct, which is widely used in functional programming languages.
To illustrate, consider a simple example of rotating a node in a binary tree. In the most common imperative style, the code would probably look like this:
type 'a tree = | Node of 'a tree * 'a * 'a tree | Nil let rotate_left' t = if is_node t then let a = get_left t in let p = get_value t in let r = get_right t in if is_node r then let b = get_left t in let q = get_value t in let c = get_right t in Node(Node(a,p,b),q,c) else t else t
And here’s the exact same code written using the pattern matching construct:
let rotate_left = function | Node(a, p, Node(b, q, c)) -> Node(Node(a, p, b), q, c) | Node _ | Nil as x -> x
When using this design, we have the following advantages:
- high expressiveness;
- check for completeness, which is a critical property for correctness checking and program refactoring;
- efficient compilation schemes.
Completeness check means that the compiler, knowing the type definition, can check for each match whether it is true that all possible alternatives have been parsed and that there are no unreachable branches, no matter how complex the samples and no matter how they intersect with each other. Thus, if you change the type definition (adding new alternatives, removing or changing existing ones), the compiler will kindly give you all the places in the code that were directly affected.
For example, if I added new constructs to the syntax tree, the compiler will show me the AST typing code up to the function body, where I need to write the typing code for the new constructs:
This property makes OCaml very resistant to refactoring and other code changes.
Despite all the advantages described, there is also one very serious limitation on applicability. Have you noticed it? The type definition must be public (so that the compiler can show its constituent alternatives). And this, of course, immediately breaks down even the simplest abstractions. For example, if we want to define the simplest list interface, there is no way to tell in advance which type definition to export:
module type List = sig type 'a t (* = ? *) val cons: 'a -> 'a t -> 'a t val empty: 'a t val head: 'a t -> ('a * 'a t) option end
However, this problem is not fundamental, and, as noted elsewhere in 1987, you can also achieve the ability to pattern match on abstract types.
Formulation of the problem
Since 1987, many different designs have been presented in the literature to solve this problem, here are just a few of them:
By the beginning of the project has already been done work on a reasonable and objective choice of a specific solution for implementation, the most advantageous was the expansion Active patternsimplemented in F #.
The goal of the project was to start implementing Active Patterns for the OCaml compiler and make progress as far as possible.
The very idea of Active patterns (as well as similar extensions) is very simple: since abstraction is achieved by hiding the implementation inside the function, it is necessary to allow a call to a function inside the pattern matching that would convert the unknown value of the abstract data type to a known list of alternatives. Active patterns encode this list of alternatives right inside the function name. So, in the interface of the list above, you need to add the following function
module type List = sig type 'a t val (|Cons|Nil|): 'a t -> ('a * 'a t, unit) choice2 val cons: 'a -> 'a t -> 'a t val empty: 'a t val head: 'a t -> ('a * 'a t) option end
The result is an anonymous sum type
choice2with the following definition (there are similar generated types up to
type ('a, 'b) choice2 = | Choice2_1 of 'a | Choice2_2 of 'b
So the function
(|Cons|Nil|) will convert the list to one of two alternatives: either a head and tail pair, or an empty alternative, meaning the list was empty.
Defining such a function for a standard list is trivial and would look like this:
let (|Cons|Nil|) = function |  -> Nil | x :: xs -> Cons(x, xs)
As an example of use, consider a function that removes consecutive duplicates in a list:
(* destutter [1;1;4;3;3;2] = [1;4;3;2] *) let rec destutter = function | Nil -> nil | Cons(x, Nil) -> cons x empty | Cons(x, Cons(y, rest)) -> if x = y then destutter (cons y rest) else cons x (destutter (cons y rest))
Note that all the benefits of pattern matching have been retained: the matching syntax is the same and the completeness checks can work in full. Efficiently compiling such a solution is beyond the scope of this overview, but it is also possible.
As part of this work, it was possible to implement the parsing and typing of the extension for the OCaml compiler version 4.09, the results are presented here…
The compiler parser is implemented using an advanced parser generator Menhir… Menhir has a fairly complete and detailed manual, however, even with him, it was not always clear how the inference rule could be set, and how not and why. The resulting parser patch is quite small and simple, but the path to it lay through 10-15 shift-reduce and reduce-reduce conflicts, the analysis and fixing of which took some time:
I would like to pay tribute to the Menhir developers and thank them for the work done in detailing and clarifying the errors. Once the parser-generator failed to illustrate the conflict and had to parse it according to the constructed automaton for 1500 states. This required, of course, an order of magnitude more time.
Extension typing was especially difficult. The typing code is about 37,000 lines long and is almost undocumented, making it difficult for a beginner to figure it out. Saved me article by Oleg Kiselevexplaining key aspects of implementation.
Another conclusion that I made for myself is not to forget to use the old versions of the project. For example, here’s a quantitative comparison of the same piece of typing in the 2019 and 2005 versions:
The 2019 version contains compiler analysis and warnings, as well as additional technical details that I was not interested in. To understand, I just had to look at the 2005 version, which contains only key actions.
Finally, the main conclusion I made during my work is confirmation that documentation for open-source projects is critically important. As expressive as the language is, the source code can only tell how the function behaves, not what she does or why she does it that way. It is very difficult to read call chains
type_cases -> t
type_pat_aux and figure out which of the functions you need; or just one parameter name
constr guess what constructors and why should be written here.
The need to look at the use cases every time slows down the programmer and gets tired very quickly: let me remind you the rule “seven, plus or minus two” – just as many objects a person can, on average, simultaneously keep in mind. And when you finally understand the meaning of the parameter inside the sixth nested function and suddenly realize that you do not remember why you needed it, or it turns out that you didn’t need it, it becomes very sad because of the amount of time spent.
As part of a project, I was able to implement parsing and typing for Active patterns. The compiler I patched can parse and type the file with any examples of using Active patterns.
Next, you need to modify the compilation scheme (OCaml uses non-trivial optimizing compilation pattern matching) and schema completeness, which were almost completely rewritten by the OCaml compiler development team during the project.
I hope this implementation will still be completed, with or without me. Despite all the difficulties, it was great to try yourself in developing an industrial compiler for your favorite language.
Bashkirov Alexander – student Computer Science Center and 4-year Bachelor of Science in Software Engineering at St. Petersburg State University, an employee of JetBrains.