Exploring the “small world” graph with Neo4j

When my sister moved to another city and got to know the neighbors, it turned out that her neighbor’s grandparents and our grandparents were good friends and talked while living nearby in another city – two generations ago. It is interesting when such unexpected connections are discovered. According to network theory, paths connecting nodes in a network are often shorter than one might think.
If everyone in the network knows k other people, then we can simplistically assume that, starting from this person and completing n transitions from node to node, we will find kⁿ human. Given the exponential growth, it would take very little time to build a path from any particular person to any other person in the graph.
But in real social networks, many people who are familiar to a particular person also know each other. This crossover between friends of friends reduces the number of new people I can turn to after each hop in the path from the starting node. It can be difficult to find paths that start in a tight-knit community and then branch out to the farthest corners of the network.
AT chapter 20 of the book Networks Crowds and Markets its authors David Easley and John Kleinberg provide a theoretical apparatus describing how phenomena can arise in the real world that fit into the “small world” graph. This theory combines the idea homophilia, according to which similar people are clustered together, and the idea of weak ties, where relationships branch out throughout the network. Explanation based on work Duncan Watts and Steve Strogatz. Let’s follow these examples with code written with Neo4j.
To execute the code below, enter sandbox and select “New Project.” (New project). Next, from the proposed project blanks, select “Graph Data Science”.

The sandbox is loaded with a large amount of Game of Thrones data that we won’t need for this exercise. Let’s remove them.
MATCH (n)
DETACH DELETE n;
Now let’s lay out a grid of Person nodes, 30 by 30. Each Person node will receive three properties: a row, a column, and coordinates.
UNWIND RANGE(1,30) AS row
UNWIND RANGE(1,30) AS column
CREATE (p:Person {row:row, column:column,
coordinates:toString(column) + ', ' + toString(row)});
We create relationships KNOWS
between each specific person and the person located in the grid to his right.
MATCH (p1:Person), (p2:Person)
WHERE p1.row = p2.row
AND p1.column = p2.column - 1
CREATE (p1)-[:KNOWS]->(p2);
We create relationships KNOWS
between each specific person and the person located in the grid strictly below him.
MATCH (p1:Person), (p2:Person)
WHERE p1.column = p2.column
AND p1.row = p2.row - 1
CREATE (p1)-[:KNOWS]->(p2);
Between each specific person and two persons adjacent to him diagonally and located one row below.
MATCH (p1:Person)-->(p2:Person)
WHERE p1.row = p2.row-1
WITH (p1), (p2)
MATCH (p2)--(p3)
WHERE p3.row = p2.row
CREATE (p1)-[:KNOWS]->(p3);
We have a beautiful lattice pattern. Choose Person
and their neighbors, and then we’ll see how these relationships work.
MATCH (p:Person {row:5, column:5})--(n)
RETURN *;

I see a lot of triangles in this picture. Let’s use a library for data science on graphs to determine what is the proportion of such people in the circle of acquaintances of a given person who also know each other. First, let’s create a named graph in memory.
CALL gds.graph.create('base-grid', 'Person', {KNOWS:
{orientation:"UNDIRECTED"}});
Next, we will execute the algorithm for counting triangles.
CALL gds.alpha.triangleCount.stream('base-grid')
YIELD coefficient
RETURN avg(coefficient) AS averageClusteringCoefficient

A result of 0.44 means that, on average, only about half of the acquaintances of a particular person also know each other. This level of neighbor overlap can result in relatively long paths compared to what we might expect in a network where most nodes are of degree 8.
Let’s test this idea. Let’s call the procedure allShortestPaths
to find the shortest path between every two nodes in the graph. Next, we summarize the results.
CALL gds.alpha.allShortestPaths.stream('small-world')
YIELD sourceNodeId, targetNodeId, distance
RETURN max(distance) AS maxDistance,
min(distance) AS minDistance,
avg(distance) AS avgDistance,
count(distance) AS pathCount;

I was impressed that my free sandbox server was able to compute 809,100 paths in about a second. The diameter of a graph is the longest of the shortest paths between nodes. The diameter of 29 transitions in this graph is quite logical. No two nodes will be further apart than those located on opposite edges of the grid.
Consider the distribution of path lengths.
CALL gds.alpha.allShortestPaths.stream('base-grid')
YIELD sourceNodeId, targetNodeId, distance
RETURN distance, count(*) AS pathCount
ORDER BY distance;
Here is the chart with the results.

Now let’s modify the graph by adding random relationships between nodes that are not adjacent in the grid. You can compare these weak ties to relationships like this: there are some people you know who fall outside of your main circle of friends.
Let’s use the procedure apoc.periodic.iterate
to iterate over all nodes and add two nonadjacent links to each node.
CALL apoc.periodic.iterate(
'MATCH (p1) RETURN p1',
'MATCH (p2)
WHERE p1 <> p2 AND NOT EXISTS((p1)--(p2))
WITH p1, p2, rand() AS rand
ORDER BY rand LIMIT 2
MERGE (p1)-[:KNOWS {adjacent:False}]->(p2)',
{batchSize:1}
)
Let’s take another look at Person at coordinates 5, 5. This node now has ten links. Eight of them in the grid are adjacent, and two of the new links are non-adjacent.
MATCH (p:Person {row:5, column:5})--(n)
RETURN *;

Let’s create a new graph in memory, which we will analyze using the Graph Data Science Library.
CALL gds.graph.create('small-world', 'Person',
{KNOWS:{orientation:"UNDIRECTED"}});
What is the average clustering factor for the new graph?
CALL gds.alpha.triangleCount.stream('base-grid')
YIELD coefficient
RETURN avg(coefficient) AS averageClusteringCoefficient

In the new column, on average, there are 18% of people who know the person in question and at the same time know each other. With two new friends per person and less overlap between friends, there should be shorter paths between nodes.
Let’s run it again allShortestPaths
and check the statistics.
CALL gds.alpha.allShortestPaths.stream('small-world')
YIELD sourceNodeId, targetNodeId, distance
RETURN max(distance) AS maxDistance,
min(distance) AS minDistance,
avg(distance) AS avgDistance,
count(distance) AS pathCount;

The diameter of the graph has sharply decreased from 29 to 5, and the average distance has been reduced to only 3. Indeed, the world is small!
Consider a new distribution of path lengths.
CALL gds.alpha.allShortestPaths.stream('small-world')
YIELD sourceNodeId, targetNodeId, distance
RETURN distance, count(*) AS pathCount
ORDER BY distance;

There is a very small percentage of such people who are five steps apart from each other. The vast majority of shortest paths now involve three or four hops.
We started with a network where everyone only knew their direct neighbors. By adding two non-adjacent links for each node, we drastically reduced the graph diameter and the average length of the shortest path. You can re-run the code with more or less nonadjacent links for each node and see how the results change. Easley and Kleinberg show that the “small world” model develops even when not every person has non-contiguous connections. By creating a single nonadjacent link for a subset of nodes, it is still possible to significantly reduce the length of paths between nodes.