Analyzing Distributed Data in PostgreSQL with FDW

Setting up FDW in PostgreSQL

To start working with FDW, you first need to install the necessary extension. In our case, we will use postgres_fdwwhich allows you to connect to other PostgreSQL databases.

Connect to your database and run the following command to install the extension:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

This command will create an extension that allows the PostgreSQL instance to communicate with other PostgreSQL databases.

Now it's time to create a server to connect to. Let's say there is a remote database that we need to work with. To do this, we need to specify where this database is located.

Create a server:

CREATE SERVER my_remote_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote_host', dbname 'remote_db', port '5432');

Here my_remote_server — is the server name, and remote_host And remote_db — the address and name of the remote database, respectively.

Now we need to set up user authentication using user mapping:

CREATE USER MAPPING FOR local_user
SERVER my_remote_server
OPTIONS (user 'remote_user', password 'remote_password');

Replace local_user, remote_user And remote_password to your values.

To access data from a remote database, you need to create external tables that will represent the tables of the remote server.

Create an external table:

CREATE FOREIGN TABLE remote_table (
    id SERIAL PRIMARY KEY,
    name TEXT,
    value NUMERIC
)
SERVER my_remote_server
OPTIONS (table_name 'actual_table_name');

Now there is remote_tablewhich refers to the table actual_table_name in a remote database.

Querying Remote Data via FDW

Let's start with the simplest thing – performing queries on remote data.

To retrieve data from a remote table, run:

SELECT * FROM remote_table;

This query will return all rows from the remote table.

To reduce the amount of returned data and speed up processing, let's add a filter:

SELECT * FROM remote_table WHERE value > 100;

Here we get only those records where value more than 100.

FDW allows you to perform JOIN with local tables. Let's see how to do it:

SELECT l.id, l.name, r.value
FROM local_table l
JOIN remote_table r ON l.id = r.id
WHERE r.value > 100;

This query joins data from a local table local_table and remote remote_tablefiltering results by value value.

Once the query has been executed, you can process the data as usual using standard PostgreSQL functions. For example, to calculate the average:

WITH filtered_data AS (
    SELECT name, value
    FROM remote_table
    WHERE value > 100
)
SELECT AVG(value) AS avg_value FROM filtered_data;

In this example, we first filter the data and then calculate the average.

Sometimes you need to make sure that the deleted data actually exists before proceeding further. This can be done by:

SELECT COUNT(*) FROM remote_table WHERE value IS NOT NULL;

FDW is not limited to PostgreSQL. You can connect to other databases, such as MySQL or SQLite, using the appropriate FDW extensions. For example, to connect to MySQL, use mysql_fdw:

CREATE EXTENSION IF NOT EXISTS mysql_fdw;

CREATE SERVER my_mysql_server
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host 'mysql_host', port '3306');

Then we create the user mapping and external tables just like we did for PostgreSQL.

Conclusion

Foreign Data Wrappers allow you to integrate data from different sources, simplifying analysis and management.

How do you use FDW in your projects? Share your experience in the comments!


And you can study modern principles and practices of architecture, effective management of a team of analysts, trends in the international market on the online course “Systems Analyst. Team Lead”.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *