main implementation options
In this article we will look at the main tools for implementing Elxir distributed computing.
Basics
Let's start with the main character – the virtual machine BEAM. BEAM is the heart of both Erlang and Elixir. Unlike most virtual machines, BEAM was originally designed with the needs of telecommunications systems in mind, where fault tolerance and low latency are important.
Main features of BEAM:
Easy processes: BEAM manages millions of lightweight processes with minimal overhead. These processes are isolated.
Multithreading and Parallelism: BEAM automatically distributes processes across available processor cores.
Actor model: In Elixir and Erlang, processes do not share memory and communicate with each other exclusively through messages.
Hot code swap: BEAM supports hot code update.
To understand how easy the process is on BEAM, let's look at a simple example:
spawn(fn -> IO.puts("Hello from a process!") end)
This code creates a new process that prints a message to the console. Creating and terminating the process requires virtually no resources.
Now let's look at the three main components on which distributed systems are built in Elixir: Node, Process And Message Passing.
Nodes
Node — is an independent instance of the BEAM virtual machine. In the context of distributed systems, a node is a single node that can run on a single server or on multiple different servers.
To connect one node to another, use the command:
Node.connect(:"node2@localhost")
You can run multiple nodes on the same or different hosts, and each node will communicate with the others via messages. All nodes in the network must use the same cookie
for authentication.
Processes
Processes in Elixir are the fundamental building blocks of distributed systems. Each process is isolated and communicates with other processes only through messages.
An example of a simple process that can receive messages:
defmodule SimpleProcess do
def start do
spawn(fn -> loop() end)
end
defp loop do
receive do
{:msg, from, text} ->
IO.puts("Received message: #{text}")
send(from, {:ok, "Message processed"})
loop()
end
end
end
# Запуск процесса
pid = SimpleProcess.start()
# Отправка сообщения
send(pid, {:msg, self(), "Hello, process!"})
# Получение ответа
receive do
{:ok, reply} -> IO.puts("Reply: #{reply}")
end
A process can receive and process messages, and send responses.
Message transmission
Message passing in Elixir is done using operators send
And receive
This model of interaction is called actor model and allows you to create systems in which processes do not block each other.
As mentioned earlier, processes in Elixir are isolated from each other and do not share memory, which reduces the likelihood of multithreading errors.
Now let's look at how to set up and run a distributed cluster on Elixir.
Running multiple nodes is the first step towards creating a distributed cluster.
This is how you can run multiple nodes on your local PC:
# Запуск первой ноды
iex --sname node1 --cookie my_secret_cookie
# Запуск второй ноды в новом терминале
iex --sname node2 --cookie my_secret_cookie
Once the nodes are running, they can be connected to each other:
# Подключение node1 к node2
Node.connect(:"node2@localhost")
# Проверка подключения
Node.list() # должно вывести [:node2@localhost]
Once the nodes are connected, you can organize interaction between processes on different nodes. To do this, just use the same commands send
And receive
but specifying the PID of the process on another node.
Example of sending a message to another node:
# На node1
pid = Node.spawn(:"node2@localhost", fn ->
receive do
{:msg, text} -> IO.puts("Message from node1: #{text}")
end
end)
# На node2
send(pid, {:msg, "Hello from node1!"})
The process is created on node2
but sending the message is done with node1
.
It is important not only to organize interaction between processes, but also to ensure their reliability. This is achieved with the help of supervisors — OTP components that automatically restart processes if they fail.
An example of a simple supervisor that manages processes on different nodes:
defmodule ClusterSupervisor do
use Supervisor
def start_link(_) do
Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
end
def init(:ok) do
children = [
{Task, fn -> Node.spawn(:"node1@localhost", MyWorker, :start_link, []) end},
{Task, fn -> Node.spawn(:"node2@localhost", MyWorker, :start_link, []) end}
]
Supervisor.init(children, strategy: :one_for_one)
end
end
# Запуск супервизора
ClusterSupervisor.start_link(nil)
The supervisor manages two processes running on different nodes. If either process fails, the supervisor will automatically restart it.
Useful tools
Distributed task
Task
— is a basic abstraction in Elixir for managing asynchronous tasks, which simplifies starting processes and processing results. In the context of distributed systems Task
allows you to run tasks on different nodes, making the code more parallel and, therefore, faster.
Built-in module Task
can be used to create distributed tasks that run on remote nodes.
Let's say there is a task that is time-consuming and resource-intensive for a single computer. You can distribute this task across multiple nodes in a cluster using Task
:
defmodule DistributedTaskExample do
def start(nodes) do
nodes
|> Enum.map(&Task.async(fn -> perform_task(&1) end))
|> Enum.map(&Task.await/1)
end
defp perform_task(node) do
Node.spawn(node, fn ->
# симуляция длительной операции
Process.sleep(1000)
IO.puts("Task completed on node: #{Node.self()}")
end)
end
end
# Использование:
nodes = [:"node1@localhost", :"node2@localhost"]
DistributedTaskExample.start(nodes)
Create and run tasks on multiple nodes using Task.async/1
And Task.await/1
Each node performs its task, and the results are synchronously collected on the node from which the launch was initiated.
Task
good for relatively simple tasks that can be easily parallelized without requiring complex state management or error handling.
GenServer
GenServer
— is a component in the OTP ecosystem on Elixir, which is a server process with the ability to handle requests from other processes. It has a set of abstractions for building robust servers that run on a single node or in a distributed system.
GenServer
– a great choice when you need to maintain state between calls, handle messages, asynchronous requests, or manage complex computations.
To start using GenServer
let's create a simple example of a distributed server that will maintain a common state between all connected nodes:
defmodule DistributedCounter do
use GenServer
# API
def start_link(initial_value) do
GenServer.start_link(__MODULE__, initial_value, name: __MODULE__)
end
def increment() do
GenServer.call(__MODULE__, :increment)
end
def get() do
GenServer.call(__MODULE__, :get)
end
# Callbacks
def init(initial_value) do
{:ok, initial_value}
end
def handle_call(:increment, _from, state) do
{:reply, state + 1, state + 1}
end
def handle_call(:get, _from, state) do
{:reply, state, state}
end
end
# запуск на одной ноде
{:ok, _pid} = DistributedCounter.start_link(0)
# увеличение счётчика с другой ноды
Node.connect(:"node2@localhost")
:rpc.call(:"node2@localhost", DistributedCounter, :increment, [])
# получение значения счётчика
DistributedCounter.get() # должно вернуть 1
DistributedCounter
is a distributed counter that increments its value when the method is called increment/0
. :rpc.call/4
allows you to call functions on remote nodes.
Swarm
Swarm
— is a library for dynamically distributing processes in an Elixir cluster. It automatically manages the distribution of processes between nodes, ensuring a uniform load and high fault tolerance. Swarm
integrates well with GenServer
and other OTP components.
Suppose you want to create a service that handles many tasks, and you want these tasks to be distributed across all available nodes. Using Swarm
you can easily achieve this:
defmodule MyWorker do
use GenServer
use Swarm, strategy: :gossip
def start_link(_) do
GenServer.start_link(__MODULE__, %{}, name: :my_worker)
end
def handle_call(:do_work, _from, state) do
# Симуляция работы
Process.sleep(1000)
{:reply, :ok, state}
end
end
# Запуск работы
Swarm.register(:my_worker, MyWorker, [])
GenServer.call(:my_worker, :do_work)
Swarm
will automatically distribute the process MyWorker
to available nodes.
Horde
Horde
— is a library for creating distributed supervisors and registries.
Example of use Horde
to create a distributed supervisor that manages a set of processes:
defmodule MyApp.DistributedSupervisor do
use Horde.DynamicSupervisor
def start_link(_) do
Horde.DynamicSupervisor.start_link(
name: __MODULE__,
strategy: :one_for_one,
members: :auto
)
end
def start_worker(worker_module, args) do
Horde.DynamicSupervisor.start_child(__MODULE__, {worker_module, args})
end
end
defmodule MyWorker do
use GenServer
def start_link(_) do
GenServer.start_link(__MODULE__, %{}, name: :my_worker)
end
def handle_call(:do_work, _from, state) do
Process.sleep(1000)
{:reply, :ok, state}
end
end
# Запуск супервизора
{:ok, _pid} = MyApp.DistributedSupervisor.start_link(nil)
# Добавление worker в распределённый супервизор
MyApp.DistributedSupervisor.start_worker(MyWorker, [])
Horde
controls the supervisor, which distributes processes across nodes. If one of the nodes fails, Horde
will automatically move processes to other nodes.
In conclusion, we invite all system analysts and those interested in this area to an open lesson on August 20 – there we will consider the differences between user scenarios (Use Cases) and user stories (User Story). You can sign up for the lesson for free on the online course page.