Classic Computer Science problems in Python. Book Review

Hello, Habr!

One of our most interesting Python books over the past year has been “Classic Computer Science Problems in Python“by David Kopec.

For those who have not yet had time to familiarize themselves with this book, we offer its review, written according to the original edition in October 2019. You can also check out a small discussion at Reddit… Also, everyone can express their opinion on the additional printing – for this, at the end of the article there is a voting ballot.

I really liked the book “Classical Computer Science Problems in Python” by David Kopec. It really addresses a lot of various problems, detailed information on which I have not come across before. Just a few examples: neural networks, constraint satisfaction problems, genetic algorithms, minimax algorithm. Unlike many other books on algorithms and programming problems, this book shows complete (but small) programs that you can easily explore on your own.

I enjoy learning algorithms. I worked through the book “Programmer career», Took several MOOC courses, in particular,”Design and Analysis of Algorithms“from professors Tim Rafgardena… Nevertheless, in Kopec’s book there were also such algorithms, the mention of which I have not seen before.

CHAPTERS I LIKED THE MOST

Neural networks… My favorite section in the book is about neural networks. The author describes the creation of a full-fledged neural network. Meanwhile, he describes how all its parts work – in general and in particular. Describes how neurons and layers are arranged, how the activation function works, how backpropagation is used during training, how weights are corrected.

Finally, the neural network is used as an example of Fisher’s irises. This is a classic dataset, compiled in the 1930s, with 150 specimens of flowers belonging to different types of irises (50 specimens each). Each specimen is accompanied by four numerical values: the length of the outer perianth lobe; the width of the outer perianth lobe; the length of the inner perianth lobe; the width of the inner perianth lobe. The values ​​are normalized first. Then a three-layer mesh is created. There are four neurons in the input layer (one for each category of input values ​​from the sample), in the hidden layer there are six neurons, and in the output layer there are three neurons (one for each type of iris considered).

140 out of 150 samples were randomly selected, which were then used to train the network. Training is run over 140 samples in 50 iterations. The network is then used to predict which of the three iris species the remaining 10 samples belong to. In most cases, the network correctly identifies at least 8 out of 10 samples, and quite often, all 10.

I really liked this experience: to program an entire neural network from scratch without resorting to machine learning libraries. I experimented with this program for some time (all the code for the book is posted on Github). For example, I have printed a printout of all weights in all neurons after the network has been fully trained. I wanted to see if there would be any similarity between the weights between different runs. The weights turned out to be strikingly different from run to run, although the predictive success in all cases remained similar. Maybe this is an elementary truth, but I was very interested in seeing this for myself.

Adversarial search… In this chapter, we are introduced to the minimax algorithm. He finds the best move in a zero-sum game with two opponents. The idea is to analyze potential moves, speaking on behalf of one or the other opponent; each opponent reacts to the last of the moves made until the game is over (or the maximum recursion depth is reached). In the first example from the book, the AI ​​is playing tic-tac-toe. In this case, the entire sequence of moves can be fully analyzed, since the size of the playing field is only 3 by 3 cells.
As in the other chapters, the general structure is first developed here, which is then discussed in relation to complex cases. After the example with tic-tac-toe, the game “Four in a row“. In this case, the assessment of moves is much more difficult than in “Tic-Tac-Toe”, so here the depth search is limited to three moves. I turned to this chapter a lot, since I had never seen a description of the minimax algorithm before.

Satisfaction Tasks… A general framework is also developed here, which is then used to solve several problems. At the heart of this structure is recursive backtracking. The first problem solved in this chapter is the task of coloring the map of Australia. Is it possible to get by with just three colors and color the map so that adjacent areas on either side of any border between regions are different in color? (Answer: yes). The second task is to place eight queens on the chessboard, making sure that no queen is under attack from any other. The third task is to arrange the word list in a grid to form a square for searching words. Finally, the structure is used to solve the classic crypto-arithmetic puzzle SEND + MORE = MONEY. The goal is to find out which digit each letter corresponds to so that the resulting arithmetic equality is correct.

I liked this chapter because it contains a lot of very diverse examples, and also because I have not previously used this approach for solving such problems (at least systematically).

Genetic algorithms… Before I read this chapter, I only had a very general idea of ​​how genetic algorithms work. The code for this chapter demonstrates a chromosome class that is instantiated by randomly selected genes. A chromosome can assess its own adaptability to the environment and implements crossbreeding (a combination of itself with another chromosome of the same type), and also mutates (makes small, completely random changes in itself).

The algorithm starts with a random population. In each generation of this population, the algorithm checks (using the fitness function) if there is a sufficiently good solution (fitness is above a given threshold value). If there is no such solution, then a new generation is created through repeated operations of crossing pairs of individuals (the higher the fitness of each individual individual, the more likely that these individuals will go into action). Further, several individuals are selected for mutations. Now these new individuals give rise to a new population and, again, their fitness functions are tested. The cycle repeats for a given number of generations.

The following tasks are considered as examples: maximizing a mathematical expression, the “SEND + MORE = MONEY” puzzle mentioned above, and ordering the items in a list to minimize the size of the list when compressed. The merit of this chapter, like many others, is that all concepts are shown in the context of a compact yet complete solution.

Search… The algorithms in this chapter were not new to me (except for A *), but the chapter is still interesting because of the examples it contains. The author demonstrates the principles of Breadth First Search and Depth First Search using the example of exiting a maze. Labyrinths are ordinary 9×9 grid cells, in which obstacles are randomly placed. Displayed on the screen using only ASCII characters. With the help of algorithms, a path through the grid is found, going diagonally, while avoiding obstacles. The path found in this way is also drawn through the maze.

You can easily change these programs to test how the algorithms work in different scenarios. After getting familiar with Breadth First Search, the author talks about A *. The difference is that the paths are drawn using a heuristic (Euclidean distance), which is used as a guideline and tells you which paths to draw first. The final example in this chapter uses a search function to help you get three missionaries and three ogres in a canoe across a river. The limitation is that the number of cannibals cannot exceed the number of missionaries either on the shore or in the boat, otherwise the cannibals will eat the missionaries. I think this is a very clever and interesting application of the search algorithm.

OTHER CHAPTERS

Other chapters contain algorithms that I am already familiar with.
Simple tasks (first chapter). The first chapter contains many small examples. It first describes the different ways to calculate Fibonacci numbers (in particular, recursive, with memoization, iterative, and with a Python generator). The next section discusses bit manipulation for compression and XOR ciphers. Finally, we will need recursion in this chapter to solve the Hanoi Towers problem.

Graph problems… Chapter 4 discusses graph algorithms for finding the shortest path and minimum spanning tree, using a map of the United States as an example. Dijkstra’s algorithm is also considered.

K-Means Clustering… This chapter shows you how to group data points into a predetermined number of clusters based on the relative distance from each point to the center of the cluster. The examples in this chapter were not as interesting to me as the examples in the other chapters.

Other tasks… The final chapter deals with the knapsack problem, the traveling salesman problem, and the mnemonics of telephone numbers.

WHAT WOULD YOU ATTACH

All programs use Python’s type hints. I have an ambivalent attitude to such tips. On the one hand, types contribute to a better understanding of the program. But on the other hand, I am not used to seeing them in Python, and sometimes I had to understand these hints first before I started to understand the logic of the program.

If you want to complain about a section, then the one that talks about dynamic programming. In my opinion, it lacks completeness. Dynamic programming is a tricky thing, and if you are going to use it in real-world solutions, you will need additional examples on this topic.

CONCLUSION

The main reasons why I really liked this book are the breadth of the algorithms considered, full-fledged (at the same time, compact solutions) that are easy to investigate on their own, as well as interesting examples selected to illustrate the algorithms.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *