Avoiding recompilation hell in Elixir with mix xref
Recompilation hell: what is it like?
Elixir is an amazing language and it has been a huge privilege to work with it for over a decade (how time flies)!
I'd like to point out an issue that, if ignored, can seriously impact your team's productivity. Yes, I'm talking about (re)compiling a module.
You make several changes to one file in your code base and click “recompile”. Boom: Compiling 93 files (.ex). Then you make another change and boom: Compiling 103 files (.ex).
We've all been through this. This problem has a solution. Whether the solution is painful depends on how long the problem has been left unresolved in your codebase.
If you don't fix this problem, the number of recompiled files will likely grow as your project grows, making it more difficult to get rid of.
Why is this important
Before I tell you how to detect, fix, and prevent this problem from happening again, I'd like to briefly share why it's important and why you should definitely worry about it.
Feedback time, that is, the time it takes a developer to go from making a change to seeing its effect, is the single most important individual performance metric you should track.
Tighten the feedback loop: if you change one module, ideally only one module should be recompiled. If you're doing local web development, your local page should load almost instantly. There are exceptions to this rule, but if you are an exception, you will know that you are an exception.
Step 1: Finding the Problem
Well, detecting it is easy: you change one module, and several are recompiled? Yes, you are in recompilation hell. But how do you know which circle of hell you are in?
You can use the most underrated tool in the Elixir ecosystem!
mix xref tutorial
mix xref is without a doubt the most underrated tool in the Elixir ecosystem. Think of it as a Swiss Army knife that provides insight into the relationships between modules in your codebase.
Step 2: Understanding the Problem
Okay, you know you're in recompilation hell. But do you know why you are in hell? My guess: you sinned with macros.
To fully understand the problem, you first need to understand the interactions between modules and what creates a compilation dependency.
(Re)compiling a module in Elixir
Let's understand once and for all how modules are recompiled in Elixir.
Scenario 1: Runtime Dependencies
In this first scenario there are only runtime function calls: A1 calls B1, which calls C1.
# lib/a1.ex
defmodule A1 do
  def call_b, do: B1.call_c()
end
# lib/b1.ex
defmodule B1 do
  def call_c, do: C1.do_something()
end
# lib/c1.ex
defmodule C1 do
  def do_something, do: IO.puts("I did something!")
end
Here's the output of mix xref graph:
lib/a1.ex
└── lib/b1.ex
lib/b1.ex
└── lib/c1.ex
lib/c1.ex
This tells us that lib/a1.ex has a runtime dependency on lib/b1.ex, which in turn has a runtime dependency on lib/c1.ex, which has no dependencies.
NOTE: We can tell that this is a runtime dependency because the mix xref output shows no additional information next to the file path. If these were compile-time dependencies, we would see “lib/a1.ex (compile)”.
Recompilation output when changing any of these modules individually:
Runtime dependencies are good because they mean that changing one module will only require recompiling that one module. In an ideal world, each module in your codebase would only depend on other modules at runtime.
Scenario 2: Compile-time dependencies
Alas, we do not live in an ideal world. Let's model a compile-time dependency:
# lib/a2.ex
defmodule A2 do
  @b B2
  def call_b, do: @b.say_hello()
end
# lib/b2.ex
defmodule B2 do
  @person C2.get_person()
  def say_hello, do: IO.puts("Hello #{@person}")
end
# lib/c2.ex
defmodule C2 do
  def get_person, do: :mike
end
As you probably know, module attributes are evaluated at compile time (which is why you should use Application.compile_env/3 in module attributes). For the code above this means the following:
Attribute @b in A2 contains :"Elixir.B2", an atom referring to the B2 module.
Attribute @person in B2 contains :mike, the atom returned by the call to C2.get_person/0.
Elixir Quiz: Which modules above will have dependencies at compile time?
lib/a2.ex
└── lib/b2.ex
lib/b2.ex
└── lib/c2.ex (compile)
lib/c2.ex
As you would expect, B2 depends on C2 at compile time, because if we change the result of C2.get_person/0 to, ahem, :joe, B2 will need to be recompiled so that the @person module attribute can be re-evaluated.
Somewhat surprisingly, A2 has no compile-time dependencies. The compiler is smart enough to know that no change to the B2 module will ever change the value of @b.
Recompilation output when changing any of these modules individually:
Should we worry about these compile time dependencies? Personally, I wouldn't worry too much until they become a noticeable problem. They can become noticeable when you start, for example, calling a specific function from a module attribute in multiple modules.
The reason I wouldn't bother initially is that this type of compile-time dependency is usually easy to resolve. So go ahead, let your codebase evolve and grow, and if you have any rough edges, you can fix them later.
Of course, this does not mean that you can neglect it. When writing or reviewing code, always pay special attention to module attributes. If you see a compilation dependency that could become a nuisance in the future—or that could easily be removed right now—go ahead and save yourself future hassle.
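When a module attribute only caches the result of a function call, one way to remove the compile-time dependency is to move the call into a function body, which turns it into a runtime call like the ones in Scenario 1. Below is a minimal sketch of that refactoring. The module names Sketch.Person and Sketch.Greeter are hypothetical stand-ins for C2 and B2, and say_hello returns the string instead of printing it so the result is easy to check.

```elixir
# Hypothetical stand-in for C2
defmodule Sketch.Person do
  def get_person, do: :mike
end

# Hypothetical stand-in for B2, without the module attribute
defmodule Sketch.Greeter do
  # The call now happens inside a function body, so it is evaluated
  # at runtime rather than while Sketch.Greeter is being compiled
  defp person, do: Sketch.Person.get_person()

  def say_hello, do: "Hello #{person()}"
end
```

The trade-off is that the value is looked up on every call instead of once at compile time, which is usually acceptable for cheap functions.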
Scenario 3: Transitive compile-time dependencies
The third scenario I'd like to demonstrate is when your code has transitive compile time dependencies. If you suffer from the pain of recompilation, you are probably encountering this scenario.
Below is the smallest demo I could come up with:
defmodule A3 do
  @runtime_dependency_on_b B3
  def foo, do: :foo
end
defmodule B3 do
  @runtime_dependency_on_c C3
end
defmodule C3 do
end
defmodule ClientOne do
  @compile_time_dependency_on_a A3.foo()
end
defmodule ClientTwo do
  @compile_time_dependency_on_a A3.foo()
end
defmodule ClientThree do
  @compile_time_dependency_on_a A3.foo()
end
You have three “clients”. Clients are just users of A3 (with a regular compile-time dependency). A3 then has a runtime dependency on B3, which also has a runtime dependency on C3. Doesn't sound so bad, right? Most importantly, this looks like something that happens regularly in any given codebase.
By “client” I mean that there could be multiple modules using A3. Think of events, endpoints, or jobs, each of which has a compile-time dependency on its “core” implementation.
This is what the compilation graph of these modules looks like:
Looking at the graph above, we expect that changes in lib/a3.ex will lead to the recompilation of all N of its clients. Indeed, this is exactly what happens. But what do you expect to be recompiled if you change lib/c3.ex?
Only C3, right? Yeah, me too. But this is what happens when I change C3:
Compiling 4 files (.ex)
Compiled lib/c3.ex
Compiled lib/client_three.ex
Compiled lib/client_two.ex
Compiled lib/client_one.ex
Oh, not good. What we are seeing here is transitive compile-time dependencies in action.
Although my minimal example seems far from reality, you will typically encounter a situation where:
You have a Core.Event implementation with a macro
You have several Event.MyEventName modules that use the macro from Core.Event
You call Services.Log from the macro
You call Services.User from inside Services.Log
You call Schemas.User from inside Services.User
Every time you change a user's schema, you also recompile each of your events.
Now imagine following a similar pattern with endpoints, job definitions, schemas, authorization, error handling, data validation, analytics. You can easily end up in a situation where everything is tangled up in a web of modules. One change will essentially recompile every single module in your codebase.
This is the worst! This is the eighth circle of hell! It can reduce your productivity by several orders of magnitude! Do yourself (and your team) a favor and get rid of them, or keep a close eye on them so they don't grow unintentionally.
In practice, the worst offenders are macros. You may be affected even if you don't write macros yourself: “macro-heavy” libraries in particular are prone to transitive dependencies (Absinthe is worth mentioning by name: every codebase I've seen using Absinthe suffers from this).
Step 3: Troubleshooting
At this point you know that you are in hell and, more importantly, you know why you are in hell. How can you get out? Essentially, it all comes down to two simple steps:
Identify problematic modules in key or common areas of your codebase; and
Refactor them to avoid creating compile-time dependencies.
Will it be difficult? It depends on how spaghetti-like your modules are. If you have a big ball of mud, untangling your dependencies will likely be painful.
However, as difficult as this task may be, I can guarantee that it is definitely worth the effort. And once you untangle it, if you continue reading this blog until the end, you will learn what you need to do to prevent transitive dependencies from occurring in the future.
Identifying transitive dependencies
Let's go! First you will need to determine which modules are affected.
The simplest thing you can do is:
Change a module that triggers the recompilation of several other modules.
Run mix compile --verbose.
You now have a list of all modules that have been recompiled.
But we can do better! We can get the full “path” between two files using mix xref. This will make it much easier for you to understand where exactly the entanglement is happening:
mix xref graph --source SOURCE --sink TARGET
Here, TARGET is the file that you modified and that triggered the recompilation of several unrelated files, and SOURCE is the file that you think (or know) starts the compilation chain. If you don't know which file is SOURCE, just use the last file shown by the mix compile --verbose command you ran above. It is probably the SOURCE you need.
Below is the result for scenario 3:
lib/client_one.ex
└── lib/a3.ex (compile)
    └── lib/b3.ex
        └── lib/c3.ex
In the above example, if you can remove the compile-time dependency between client_one.ex and a3.ex, or remove the runtime dependency between a3 and b3, you will break the compilation chain.
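If the clients really need the value at compile time, another way to break the chain is to point the compile-time dependency at a module that depends on nothing else. Here is a minimal sketch, assuming the function the clients need can be moved out of A3. Sketch.A3Constants and Sketch.ClientFour are hypothetical names that are not part of the demo above.

```elixir
# Hypothetical leaf module: it references no other project module,
# so nothing can be reached transitively through it
defmodule Sketch.A3Constants do
  def foo, do: :foo
end

# Hypothetical client: it still depends on Sketch.A3Constants at
# compile time, but the chain ends there
defmodule Sketch.ClientFour do
  @compile_time_dependency_on_constants Sketch.A3Constants.foo()

  def value, do: @compile_time_dependency_on_constants
end
```

With a layout like this, changes to B3 or C3 should no longer reach the client, because neither is reachable from the leaf module.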
Removing compile-time dependencies
You have defined the compilation chain and now you need to break it. Below you will find several strategies that you can use to refactor problematic modules.
Strategy 1: Move macros into dedicated modules without additional dependencies
Initially we have Core.Event, a module containing both macros and general functions that can be used by events.
# lib/strategy_1/before/event_user_created.ex
defmodule S1.Before.Event.UserCreated do
  use S1.Before.Core.Event
  @name :user_created_event
  def new(user_id), do: %{user_id: user_id}
end
# lib/strategy_1/before/core_event.ex
defmodule S1.Before.Core.Event do
  defmacro __using__(_) do
    quote do
      alias __MODULE__
      @before_compile unquote(__MODULE__)
    end
  end
  defmacro __before_compile__(_) do
    quote do
      @name || raise "You must specify the event name via @name"
      def get_name, do: @name
    end
  end
  def emit(%{id: id, data: _data}),
    do: S1.Before.Services.Log.info("Emitting event_id=#{id}")
end
# lib/strategy_1/before/service_log.ex
defmodule S1.Before.Services.Log do
  def info(value), do: IO.puts("[info] #{value}")
end
Notice how Services.Log ends up being part of the compilation chain simply because it is referenced by the Core.Event module, even though it plays no role in the macro itself. The command below tells us that there is a transitive dependency (mix xref calls them “compile-connected”) between event_user_created.ex and core_event.ex:
> mix xref graph --label compile-connected | grep before
lib/strategy_1/before/event_user_created.ex
└── lib/strategy_1/before/core_event.ex (compile)
This strategy is to split Core.Event into two parts: Core.Event.Definition with the macros and Core.Event with the common functions.
# lib/strategy_1/after/core_event.ex
defmodule S1.After.Core.Event do
  def emit(%{id: id, data: _data}),
    do: S1.After.Services.Log.info("Emitting event_id=#{id}")
end
# lib/strategy_1/after/core_event_definition.ex
defmodule S1.After.Core.Event.Definition do
  defmacro __using__(_) do
    quote do
      alias Core.Event
      @before_compile unquote(__MODULE__)
    end
  end
  defmacro __before_compile__(_) do
    quote do
      @name || raise "You must specify the event name via @name"
      def get_name, do: @name
    end
  end
end
What you should remember here is that your Definition module must not have compile-time dependencies on other parts of the application, otherwise you're back to square one.
If you can't get around calling another module in your codebase, try following the same pattern: make sure you extract the necessary functions into a dedicated module. The goal is to isolate the functions needed by the macro in their own modules.
By applying this strategy, we are now free from transitive dependencies:
> mix xref graph --label compile-connected | grep after
(empty output, meaning there are no transitive dependencies)
You can find a complete example on GitHub.
Strategy 2: Module reference at runtime (Absinthe example)
This strategy (ahem, hack) is to reference the module at runtime to break the chain between queries.ex and resolver.ex.
# lib/strategy_2/before/queries.ex
defmodule S2.Before.Queries do
  use Absinthe.Schema.Notation
  object :queries do
    field :user, :user do
      arg(:id, :integer)
      # We are calling the resolver directly, as any sane person would
      resolve(&S2.Before.Resolver.query_user/3)
    end
  end
end
Here's an alternative:
# lib/strategy_2/after/queries.ex
defmodule S2.After.Queries do
  use Absinthe.Schema.Notation
  object :queries do
    field :user, :user do
      arg(:id, :integer)
      resolve(fn parent, args, resolution ->
        # Here, we remove the compile-time reference by building
        # the resolver module dynamically
        resolver().query_user(parent, args, resolution)
      end)
    end
  end
  # Either option below works
  defp resolver, do: :"Elixir.S2.Before.Resolver"
  # defp resolver, do: Module.concat(S2.Before, Resolver)
end
But Renato, is it safe? Could this cause some module to become stale because it was not recompiled? This is a good question. And I don't have a good answer, other than that I have used this hack for at least 8 years and I don't remember a single case where it caused inconsistencies.
Disadvantages of this approach:
Less code readability because you're doing something unexpected.
Worse performance as additional function calls are required.
Are the disadvantages worth the advantages of not having a compilation chain? Almost always yes! The readability degradation can be abstracted away (as seen in the example). The performance overhead is negligible in most cases (a single SQL query will be at least 10,000 times slower).
This strategy is especially useful for Absinthe: when you break the chain between queries/mutations and resolvers, you effectively protect yourself from transitive dependencies that could potentially affect your entire codebase!
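The same idea can be tried without Absinthe. The sketch below is an illustration only: Sketch.Resolver and Sketch.Queries are hypothetical modules, and the resolver returns a hard-coded tuple instead of querying anything.

```elixir
# Hypothetical resolver module
defmodule Sketch.Resolver do
  def query_user(_parent, %{id: id}, _resolution), do: {:ok, %{id: id}}
end

# Hypothetical caller that never names Sketch.Resolver directly
defmodule Sketch.Queries do
  def resolve_user(parent, args, resolution) do
    resolver().query_user(parent, args, resolution)
  end

  # The module name is assembled when the function runs
  defp resolver, do: Module.concat(Sketch, Resolver)
end
```

Calling Sketch.Queries.resolve_user(nil, %{id: 1}, nil) dispatches to the resolver at runtime. Keeping the lookup in one private function is what hides the readability cost from the rest of the module.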
You can find a complete example on GitHub.
Strategy 3: Keep Macros Simple
My third strategy is to keep macros simple when possible. In other words, make sure they only link to built-in modules and/or third party libraries.
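As an illustration, here is a minimal sketch of a macro that touches only built-in modules. Sketch.EventDefinition and Sketch.UserCreated are hypothetical names, and passing the event name as an option to use is an assumption made for this example rather than the approach shown in Strategy 1.

```elixir
defmodule Sketch.EventDefinition do
  defmacro __using__(opts) do
    # Keyword is a built-in module, so this lookup adds no dependency
    # on the rest of the application
    name = Keyword.fetch!(opts, :name)

    quote do
      # The generated code references nothing outside the caller
      def get_name, do: unquote(name)
    end
  end
end

defmodule Sketch.UserCreated do
  use Sketch.EventDefinition, name: :user_created_event

  def new(user_id), do: %{user_id: user_id}
end
```

Because the macro module references no other application module, there is nothing for a chain to continue through.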
Strategy 4: Don't use macros
Macros are tempting, I know. Is not using them an option? If so, I recommend you take it.
Macros have the disadvantages of making code more difficult to read, more difficult to test, more difficult to document, less intuitive, and less idiomatic. On the other hand, they have the positive side of reducing boilerplate and ensuring consistency and conventions throughout the codebase.
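For comparison, here is a minimal macro-free sketch of an event. Sketch.PlainEvent and Sketch.PlainEvents.UserCreated are hypothetical names. The boilerplate that a macro would generate (the name function) is simply written by hand.

```elixir
# Hypothetical shared module with plain functions only
defmodule Sketch.PlainEvent do
  def build(name, data) when is_atom(name) and is_map(data) do
    %{name: name, data: data}
  end
end

# Hypothetical event module: no use, no macros
defmodule Sketch.PlainEvents.UserCreated do
  def name, do: :user_created_event

  # Called from a function body, so this is a runtime call
  def new(user_id), do: Sketch.PlainEvent.build(name(), %{user_id: user_id})
end
```

The cost is one hand-written function per event. The benefit is that the only link between the two modules is a runtime call, as in Scenario 1.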
Step 4: Prevent this situation from happening again
After a lot of work, you managed to get rid of the compilation chains! Congratulations! Your colleagues will be very grateful for your refactoring.
Detecting transitive dependencies in your CI pipeline
First, you need to find out how many chains you currently have:
mix xref graph --label compile-connected | grep "(compile)" | wc -l
The above command will show you the number of compile-connected (transitive) dependencies you have.
You can then apply the following check in your pipeline:
- name: Check for transitive compilation dependencies
  run: mix xref graph --label compile-connected --fail-above NUMBER
Replace NUMBER with the current (or target) number of transitive dependencies in your codebase. Your goal should be zero.
By following this method you will prevent:
new chains from being introduced.
existing chains from growing.
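To run the same check locally, one option is a Mix alias. The fragment below is a sketch of a mix.exs, assuming a project named :my_app. The module name, app name, version, and the alias name xref.check are all placeholders, and the threshold of 0 should be replaced with your own NUMBER.

```elixir
# mix.exs (fragment of a hypothetical project)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      aliases: aliases()
    ]
  end

  defp aliases do
    [
      # Same command as the CI step, with 0 as the threshold
      "xref.check": ["xref graph --label compile-connected --fail-above 0"]
    ]
  end
end
```

With an alias like this in place, mix xref.check would run the same check on a developer machine before the code reaches CI.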
Wait! You have a CI pipeline, right? If you don't have one but are using GitHub, just add this basic workflow template. It compiles, tests, and checks formatting and transitive dependencies. This is a good starting point. GitHub Actions is free, including for private repositories.
Worth mentioning
Pattern matching on structs no longer creates compile-time dependencies
If you've been working with Elixir for a few years now, you might remember that this used to be a problem:
defmodule Services.User do
  alias Schemas.User
  def query(%User{id: id}) do
    # …
  end
end
The above code used to create a compile-time dependency between Services.User and Schemas.User. As of Elixir 1.11 (~2020), this is no longer the case. Now this type of pattern matching creates an “export” dependency, which only forces a recompile if the struct of Schemas.User changes.
Conclusion: Don't be afraid of aliasing and pattern matching on structs. It makes your code better, safer, easier to read, and won't introduce unnecessary compile-time dependencies.
Don't (blindly) rely on mix xref for boundaries
As you saw, a module can reference another one dynamically. When this happens, mix xref cannot tell you that A may call B.
Conclusion: don't blindly trust the mix xref output, especially if you're trying to enforce security/boundaries between modules.
Visualize your dependencies with DepViz and Graphviz
There is an amazing tool that you should use right now: DepViz.
Just go to https://depviz.jasonaxelson.com/.
Generate your .dot file with mix xref graph --format dot.
Upload it and visualize your clean, well-organized module structure.
Alternatively, you can simply use Graphviz:
Generate your .dot file with mix xref graph --format dot.
Generate an .svg of your graph using dot -Tsvg xref_graph.dot -o xref_graph.svg.
Open it in your browser (or wherever) and visualize the hierarchy of modules.
Conclusion
Whether you are a CTO, a technical lead, or an individual team member, please, for your own sake, pay attention to the feedback loop.
When starting a new project, an experienced engineer will keep a close eye on this because they know how important a fast cycle is in the long run. However, less experienced engineers, or experienced engineers without much experience with Elixir, may not realize that they are harming the feedback loop until it is too late.
Optimize the feedback loop that your developers (or you) go through every day. I insist: when it comes to performance, this is the first metric you should care about.