graph database you didn't know existed

PostgreSQL (Postgres) is a powerful relational database capable of storing a wide variety of data types and structures. When we need to store graph data structures, we often turn to databases that are positioned as a suitable solution for this, such as Neo4J or Dgraph. But take your time! While Postgres is usually overlooked when working with graph data structures, it excels at efficiently storing and querying graph data.

Understanding Graph Data Structures

Before proceeding to explain Postgres as a graph database, we need to understand what a graph data structure is. A graph, or graph data structure, is a set of nodes and edges, where each node represents an entity or object, and each edge represents a link between two nodes.

A visual example of a graph data structure.

To start thinking about graphs in code, you can write the following TypeScript:

class Node {
  edges: Edge[] = [];
  data: string;
}

class Edge {
  previousNode: Node;
  nextNode?: Node;
}

Each node (node) contains a list of its edges, and each edge (edge) contains a link to the next/previous node. As we will learn from SQL below, nodes do not always need to know about their edges.

Facebook is a popular social network that uses a graph to describe people and their connections. A person can have friends and these friends also have their own list of friends. Each person is represented by a node, and each friendship can be represented by an edge. Graphs are used to model many different systems, for example, npm dependencies, work processestransport systems, production lines and much more!

Storing graph data structures in Postgres

To store a graph in Postgres, we just need to create two tables:

nodes

And

edges

. Table

nodes

will store information about each entity, and in the table

edges

information about the relationships between these entities will be stored.

Let’s start by creating a table nodes:

CREATE TABLE nodes (
  id SERIAL PRIMARY KEY,
  data VARCHAR(255)
);

The table we defined

nodes

has two columns:

id

And

data

. Column

id

contains auto-incrementing integer values that serve as the table’s primary key. Column

data

are strings that store additional data associated with the node. In this example, we’ll keep it simple and keep only a string column, but in the real world, this table could contain anything and have any number of columns.

The most important table when creating a graph data structure is the table edges:

CREATE TABLE edges (
  previous_node INTEGER REFERENCES nodes(id),
  next_node INTEGER REFERENCES nodes(id),
  PRIMARY KEY (previous_node, next_node)
);

We have created two columns,

previous_node

And

next_node

, denoting the relationship between nodes. Each of these columns stores the foreign key for the node. The important conclusion is that the table

edges

refers to two rows in the same table. An edge can only have one pair

previous_node

And

next_node

so we use a composite primary key so that each edge is unique and cannot refer to itself.

Having created tables, we can insert data into them.

INSERT INTO nodes (data) VALUES ('Bob');
INSERT INTO nodes (data) VALUES ('Hank');
INSERT INTO nodes (data) VALUES ('Jeff');

And now let’s connect the nodes with edges:

INSERT INTO edges (previous_node, next_node) VALUES (1, 2);
INSERT INTO edges (previous_node, next_node) VALUES (1, 3);

If we were to visualize this graph, it would look like this:

Querying Graph Data Structures in Postgres

Having created a graph data structure, we can start executing queries on it using the SQL we know and love!

Want to know who Bob is friends with?

SELECT id, data
FROM nodes
JOIN edges ON nodes.id = edges.next_node
WHERE edges.previous_node = 1;

Finding everything

nodes

associated with the node having

id

1 (id of Bob).

Looks like Bob is popular! But what if we want to know who Bob’s friends are friends with?

Let’s insert some more nodes and edges to show this:

INSERT INTO nodes (data) VALUES ('Sally');
INSERT INTO nodes (data) VALUES ('Sue');
INSERT INTO nodes (data) VALUES ('Sam');

INSERT INTO edges (previous_node, next_node) VALUES (2, 4);
INSERT INTO edges (previous_node, next_node) VALUES (3, 4);
INSERT INTO edges (previous_node, next_node) VALUES (4, 5);

To query all friends of Bob’s friends, we can extend the previous query by joining the table again

edges

but then maintaining the database would become a nightmare, because we would have to join for each “level” of the graph.

Postgres has a built-in feature that allows us to query graph data without knowing exactly how many joins we need: recursive queries. Recursive queries allow you to traverse the graph, starting at a given node and moving along its edges to some given endpoint.

To create a recursive query to find all of Bob’s friends and their friends, we need to write the following SQL:

WITH RECURSIVE friend_of_friend AS (
  SELECT edges.next_node
  FROM edges
  WHERE edges.previous_node = 1
  UNION
  SELECT edges.next_node
  FROM edges
  JOIN friend_of_friend ON edges.previous_node = friend_of_friend.next_node
)
SELECT nodes.data
FROM nodes
JOIN friend_of_friend ON nodes.id = friend_of_friend.next_node;

At first, this may seem incomprehensible, so let’s analyze the commands. A recursive query has two parts: a base case and a recursive case. The base case is where we want to start the query. A recursive case is a “loop” that keeps executing until some endpoint is reached.

WITH RECURSIVE {name} AS (
  {base case}
  UNION
  {recursive case}
)

The above shows the simplest SQL structure of a recursive query.

In our example, we need to start the query with Bob’s friends to find the edges where Bob (id: 1) is previous_node. Then, in the recursive case, we continuously join the tables edges with itself until we reach the end of Count Bob (that is, until we reach friend_of_friend.next_node = NULL). Outside of the recursive case, we merge it all together. We need to request nodesassociated with the edges from the recursive query so that we can get the names of all of Bob’s friends.

Finally

Postgres built-in functions can store and query graph data structures. We used a similar approach in my previous work to dynamically generate work instructions on a production line. Based on the given parameters and the rules defined in each edge, it is possible to generate a correct document by traversing the entire graph stored in Postgres. If you are already using Postgres to work with relational data, then you can integrate graph data structures into your existing database by adding extra systems!