Part 1 – Breadth First Search
Let's imagine that you got a job at 2GIS many years ago and you had the honor of writing an algorithm that will plot the shortest car route from point A to point B.
You go looking for information and fortunately you come across this article, where we will discuss the following topics:
In the first part:
What are graphs, how to read and compose them
How does the breadth first search algorithm work?BFS)
What is a two-way queue (module
deque
)
In the second part:
Before we continue, I want to invite you to my telegram channel Don Pythonwhere I share interesting features of the language, non-obvious tricks and solve various problems.
Introduction to Graphs
The article will not contain a full-fledged graph theory. If you want to dive deeper than a general definition with examples, I encourage you to check out this comprehensive resource.
Graphs are mathematical structures that consist of vertices (or nodes) and edges connecting them. There are different types of graphs: directed (where the edges have a direction) and undirected (where the direction does not matter), weighted (with edge weights) and unweighted.
In the process of analyzing the algorithms, we will discuss undirected unweighted and weighted graphs. And in this part we will make do with an undirected and unweighted graph:
Using this graph, we simulate a map of a small area, where the circular shapes are nodes and the lines connecting them are edges. It is also not difficult to guess that the nodes imitate buildings (places), and the edges imitate roads.
Let's ask ourselves: if someone wants to get from the node Home to the node Theaterwhat will be the shortest route for the upcoming journey? Based on the available data, which does not take into account the directions and lengths of roads, we can say that the shortest route will be like this:
[Home, Park, Cafe, Theater]
Since edges in this graph have no weight, we assign a unit to each edge, which tells us that in order to go to a neighboring vertex it is necessary to overcome a certain route unit. Thus, we can “drive” to the theater in two ways:
[Home, Park, Cafe, Theater]
– 3 ribs[Home, Park, Museum, Shop, Theater]
– 4 ribs
Now we see that the route highlighted in the image is shorter than the other possible route by one.
Visually this is quite easy to determine, but how to represent it using code?
Now we have a theoretical minimum for analyzing the first algorithm.
Breadth First Search (BFS)
BFS is a classic algorithm for working with unweighted graphs.
Breadth-first search is often used in problems related to finding the shortest path, analyzing connections, or studying possible transition options in various systems.
Our first case is just an unweighted graph in which we are looking for the shortest path between vertices.
Before moving on to the solution, let's see visually how the graph is traversed:
To work with a graph, we need to somehow represent it as an object in code. In our case it will be a dictionary:
city_map = {
'Home': ['Park', 'School', 'Mail'],
'Park': ['Home', 'Museum', 'Cafe'],
'School': ['Home', 'Library', 'Mail'],
'Mail': ['Home', 'School', 'Hospital'],
'Library': ['School', 'Hospital'],
'Hospital': ['Library', 'Mail', 'Office'],
'Cafe': ['Park', 'Theater', 'Office'],
'Museum': ['Park', 'Shop'],
'Shop': ['Museum', 'Theater'],
'Theater': ['Shop', 'Cafe'],
'Office': ['Cafe', 'Hospital']
}
This representation is convenient for storing connections between the vertices of a graph, where the keys are the vertices and the values are the lists of neighboring vertices to which they are connected.
Let's write a function to find the shortest path from Home to Theaterand then we will analyze the final code step by step.
from collections import deque
def bfs_shortest_path(city_map, start, goal):
queue = deque([[start]])
visited = set()
while queue:
path = queue.popleft()
node = path[-1]
if node in visited:
continue
visited.add(node)
if node == goal:
return path
for neighbor in city_map.get(node, []):
new_path = list(path)
new_path.append(neighbor)
queue.append(new_path)
return None
city_map = {...}
start="Home"
goal="Theater"
shortest_path = bfs_shortest_path(city_map, start, goal)
print("Кратчайший путь от Home до Theater:", shortest_path)
# >>> Кратчайший путь от Home до Theater: ['Home', 'Park', 'Cafe', 'Theater']
Let's discuss what's going on.
Two-Way Queue in Python
To work, we import the class deque
built in python module collections
. You can read in detail here Here. I’ll briefly explain why we need this.
Class deque
(from “double-ended queue” – two-way queue) in python convenient for working with a collection of elements where add and delete operations elements at the beginning and at the end list. Regular lists are slow to do this because the elements have to be shifted and deque
optimized for such operations, performing them in O(1).
from collections import deque
# Создаем deque и добавляем элементы
queue = deque([1, 2, 3])
queue.append(4) # добавляет 4 в конец
queue.appendleft(0) # добавляет 0 в начало
print(queue) # deque([0, 1, 2, 3, 4])
# Удаляем элементы
queue.pop() # удаляет последний элемент
queue.popleft() # удаляет первый элемент
print(queue) # deque([1, 2, 3])
You don't need to know more for the current task.
Initializing a function
After import, we create a function that takes in arguments graph, start and end nodes. In the first lines, we initialize a queue to store paths, a set to add visited nodes, and for now we simply return None.
def bfs_shortest_path(city_map, start, goal):
queue = deque([[start]])
visited = set()
return None
In the example with
deque
we passed a one-dimensional list, but here we pass a two-dimensional one. In order not to get confused, do not pay attention to this for now, then everything will become clear.
while loop
We cannot say exactly how many iterations we will need to achieve the desired result. Therefore, the function will work until it goes through all possible paths, or until it reaches the goal. For this we will use a loop while
.
while queue:
As long as there are objects in the queue, the loop runs. We'll find out why the line should end later.
Now we need to perform three actions and two checks:
1. We extract the first path from the queue:
while queue:
path = queue.popleft()
In the first iteration, the path is extracted [Home]
since in our case this is the starting node and other paths are still unknown. Next, the following paths will be added to check. Here it becomes clear why we created a queue with a two-dimensional list: a path often consists of a sequence of elements, so we need a list of lists (paths).
2. Let's get the last node in the current path:
while queue:
path = queue.popleft()
node = path[-1]
We have extracted (removed from the queue and assigned to a variable path
) the first path from the queue. From this path we extract (without changing the path) the last node. The latter, because as seen in the image, a graph is a data structure that branches from top to bottom. Breadth-first search traverses all nodes and at each iteration starts from the outermost (last) accessible one.
3. Check whether the node has been visited:
while queue:
path = queue.popleft()
node = path[-1]
if node in visited:
continue
If the node has been visited, we do nothing more and move on to the next iteration. Why can some vertices appear twice? Return to the graph image and notice that, for example, the node School is a neighbor of two higher or equal nodes: Home And Mail. Since the node Home has a direct connection to the node Schoolafter the first iteration of the loop while
the queue will look like this:
deque([['Home', 'Park'], ['Home', 'School'], ['Home', 'Mall']])
When the loop processes the path at index 1, node School will be processed. But when it comes to the knot Mailthe following path will be queued:
['Home', 'Mall', 'School']
When processing this path, the loop will have to deal with the School node again. But since this node has already been visited, at this stage we ignore further actions using the condition.
4. Mark the visited node:
while queue:
path = queue.popleft()
node = path[-1]
if node in visited:
continue
visited.add(node)
Now that we have visited a node, we need to add it to the set for visited nodes so as not to return to it again.
5. Compare the current node with the target node:
while queue:
path = queue.popleft()
node = path[-1]
if node in visited:
continue
visited.add(node)
if node == goal:
return path
If the current node is equal to the target node, then the task is considered completed and the function should return the current path.
For loop Adding a path to the queue.
If the while loop has not completed and all checks have passed, the function must process all neighbors of the current node. Here we will use a for loop since the number of iterations depends on the number of neighbors.
def bfs_shortest_path(city_map, start, goal):
...
while queue:
...
for ...
...
1. Let's go through the list of neighbors:
def bfs_shortest_path(city_map, start, goal):
...
while queue:
...
for neighbor in city_map.get(node, []):
...
To access the current node in the dictionary city_map
we use the method get()
. Why don't we contact you directly? Method get()
guarantees safe handling. If suddenly the node is not in the dictionary, then the code will not fail with an error, but instead of the requested value it will simply return an empty list.
2. Create a new path and add the following node to it:
def bfs_shortest_path(city_map, start, goal):
...
while queue:
...
for neighbor in city_map.get(node, []):
new_path = list(path)
new_path.append(neighbor)
...
path
represents the current path from the start node to node
. This is conditionally static data within the scope of the algorithm, so it cannot be changed directly. To avoid changes, create a new path and copy the current one into it. This allows us to extend the path for each neighbor without changing the original path.
Next we add the current neighbor neighbor
to the end of the path copy new_path
. Now new_path
represents the path from the starting node to neighbor
through node
.
3. Add a new path to the queue:
def bfs_shortest_path(city_map, start, goal):
...
while queue:
...
for neighbor in city_map.get(node, []):
new_path = list(path)
new_path.append(neighbor)
queue.append(new_path)
...
After adding new_path
V queue
BFS will be able to continue the search starting with the new neighbor in the next iteration.
The function is ready. All that remains is to call it with the necessary parameters.
Calling a function
Let's determine the beginning and end of the path and call the function:
def bfs_shortest_path(city_map, start, goal):
...
start="Home"
goal="Theater"
shortest_path = bfs_shortest_path(city_map, start, goal)
print("Кратчайший путь от Home до Theater:", shortest_path)
# >>> Кратчайший путь от Home до Theater: ['Home', 'Park', 'Cafe', 'Theater']
I suggest going back to the full code, fixing the solution, rewriting the code and playing with different graphs and input data.
Comment
If there are multiple identical short paths in the graph, the algorithm will return the first short path found. If you know how to improve the solution to return all short paths, share your answer in the comments.
Speaking of our example, if you start your search from Mail to Theaterthen the function will return the following path:
['Mail', 'Home', 'Park', 'Cafe', 'Theater']
Although there is another way, which at first glance is more obvious:
['Mail', 'Hospital', 'Office', 'Cafe', 'Theater']
This was the first part. In the next part we will look at a more interesting case and will work with an undirected weighted graph and apply Dijkstra's algorithm to find the shortest path.
Good coding!
Telegram channel about development in python – Don Python