Sharding schemes in PostgreSQL
This article will take a quick look at how different shardin schemes are implemented in PostgreSQL
Implementation of sharding schemes
Horizontal sharding
Horizontal sharding, or row-level sharding, involves dividing the data of one table into many shards, where each shard is stored in a separate database or on a separate server. Data is distributed based on a specific sharding key, which can be, for example, user ID
Horizontal sharding is suitable for high volume transactions and operations where high availability is needed.
Let's look at how horizontal sharding can be implemented using triggers and partition tables as an example. Partitioning involves dividing one large table into several smaller tables, called partitions, which logically represent one table.
Let's say there is a table with orders orders
which needs to be partitioned by year:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE NOT NULL,
customer_id INT NOT NULL,
amount DECIMAL NOT NULL
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2020 PARTITION OF orders FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE orders_2021 PARTITION OF orders FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
-- и так далее для каждого года
When inserting data into a table orders
PostgreSQL will automatically route the data to the appropriate partition based on the value order_date
.
If you have a user table that you want to partition by user country, you can do it like this:
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
country TEXT NOT NULL
) PARTITION BY LIST (country);
CREATE TABLE users_usa PARTITION OF users FOR VALUES IN ('USA');
CREATE TABLE users_canada PARTITION OF users FOR VALUES IN ('Canada');
CREATE TABLE users_germany PARTITION OF users FOR VALUES IN ('Germany');
-- и так далее для каждой страны
In cases where more granular control of sharding is preferable, triggers can be used to distribute data across shards:
CREATE FUNCTION orders_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF (NEW.order_date < '2021-01-01') THEN
INSERT INTO orders_2020 VALUES (NEW.*);
ELSIF (NEW.order_date >= '2021-01-01' AND NEW.order_date < '2022-01-01') THEN
INSERT INTO orders_2021 VALUES (NEW.*);
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_before_insert
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION orders_insert_trigger();
Before inserting into table orders
the trigger checks the order date and redirects the record to the appropriate partition.
Vertical sharding
Vertical sharding involves dividing a table into columns, where each shard contains a specific set of columns from the original table. Unlike horizontal sharding, which splits rows, vertical sharding splits data based on the functionality or context of the data.
Suitable for applications where individual modules or functions work with different sets of data.
Suppose there is a table users
, which contains a large number of columns, some of which are used much more often than others. For optimization, you can divide this table into two: one will contain frequently used data, for example, user_id
, name
, email
and the other is less frequently used data, for example, address
, phone_number
etc.
-- Таблица для часто используемых данных
CREATE TABLE users_common (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL
);
-- Таблица для менее часто используемых данных
CREATE TABLE users_details (
user_id INTEGER NOT NULL REFERENCES users_common(user_id),
address TEXT,
phone_number TEXT,
CONSTRAINT pk_users_details PRIMARY KEY (user_id)
);
To implement vertical sharding across multiple servers, you can use FDW, which allows PostgreSQL to interact with data stored outside of the current database. Using FDW, you can create a logical separation of columns across different servers.
Let's say you want to store detailed information about users on a separate server.
Installing the extension
postgres_fdw
on the main server:
CREATE EXTENSION postgres_fdw;
Let's create a server configuration to access the external server:
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'foreign_host', dbname 'foreign_db', port '5432');
Create a custom display for the external server:
CREATE USER MAPPING FOR local_user
SERVER foreign_server
OPTIONS (user 'foreign_user', password 'foreign_password');
We create an external table, which will be a vertical shard on an external server:
CREATE FOREIGN TABLE users_details_foreign (
user_id INTEGER NOT NULL,
address TEXT,
phone_number TEXT
) SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'users_details');
Functional sharding
Functional sharding is an approach in which data is partitioned and distributed based on business logic or application functionality. That is, dividing data by user type, geo region, etc.
For example, let's divide user and order data according to different schemes within one database
CREATE SCHEMA users_data;
CREATE SCHEMA orders_data;
Let's create tables in the appropriate schemas
-- Таблица пользователей в схеме users_data
CREATE TABLE users_data.users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);
-- Таблица заказов в схеме orders_data
CREATE TABLE orders_data.orders (
order_id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users_data.users(user_id),
order_date DATE NOT NULL,
amount DECIMAL(10, 2) NOT NULL
);
To access data in different schemas, simply specify the full table name, including the schema name:
-- Вставка данных в таблицу пользователей
INSERT INTO users_data.users (username, email) VALUES ('user1', 'user1@example.com');
-- Вставка данных в таблицу заказов
INSERT INTO orders_data.orders (user_id, order_date, amount) VALUES (1, '2023-01-01', 100.00);
You can also use FDW:
CREATE EXTENSION postgres_fdw;
Let's create an FDW server:
CREATE SERVER remote_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote_host', dbname 'remote_db', port '5432');
Let's create a user:
CREATE USER MAPPING FOR CURRENT_USER
SERVER remote_server
OPTIONS (user 'remote_user', password 'remote_password');
Let's create an external table:
CREATE FOREIGN TABLE users_data.users_remote (
user_id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
) SERVER remote_server OPTIONS (schema_name 'public', table_name 'users');
In conclusion, I would like to recommend free webinar of the BI analytics course.