"Deep Reinforcement Learning: AlphaGo and Other Technologies": book announcement

Hello!

One of the best books on reinforcement learning is now available for preorder: the Russian translation of "Deep Reinforcement Learning Hands-On" by Maxim Lapan. Here is the cover of the Russian edition:

To give you a feel for the book's contents, we offer a translation of the overview the author wrote for the release of the original edition.

Hello!

I am a self-taught deep learning enthusiast. So when representatives of the Packt publishing house contacted me and proposed writing a practical book about the current state of deep reinforcement learning, I was a little scared, but after some hesitation I agreed, optimistically thinking: "Oh, this will be an interesting experience."
I will not pretend the work was a walk in the park, far from it: no days off, no free time, the constant fear of writing something stupid, and the chase after deadlines for every chapter (two weeks per chapter, including the example code). Overall, though, it was a positive and very interesting experience.

Before briefly describing the contents of each chapter, let me explain the idea behind the whole book.
When I started experimenting with RL more than four years ago, I had the following sources of information at my disposal:

  • The book Reinforcement Learning: An Introduction by Sutton and Barto
  • Research papers on arxiv.org
  • David Silver's course.

Maybe there was something else, but these were the most important sources of information. All of them are very far from practice:

  • The book by Sutton and Barto, also known as “The RL book,” provides only the theoretical foundations of this discipline.
  • RL papers are published almost daily, but they still rarely contain links to working code: only formulas and algorithms. If you are lucky, the hyperparameters are listed.
  • David Silver's course was taught at University College London (UCL) in 2015. It gives a very good overview of the methods that existed at the time and builds intuition for them; however, theory again prevails over practice.

At the same time, I was deeply hooked by the DeepMind paper ("A neural network can learn to play Atari games from pixels! Wow!"), and I felt that behind the dry theory lay great practical value. So I spent a lot of time studying the theory, implementing various methods and debugging them. As you can probably guess, it was not easy: you can spend a couple of weeks polishing a method only to discover that your implementation is incorrect (or, even worse, that you misunderstood the formula). I do not consider such learning a waste of time; on the contrary, I think it is the most reliable way to learn anything. But it does take a lot of time.

Two years later, when I started working on the text, my main goal was to give thorough, practical information on RL methods to readers who are just getting acquainted with this fascinating discipline, as I once was.

Now a little about the book itself. It is focused primarily on practice, and I tried to minimize the amount of theory and formulas. The key formulas are there, but no proofs are given. Mostly I try to give an intuitive understanding of what is happening rather than striving for maximum rigor.

At the same time, the reader is assumed to have basic knowledge of deep learning and statistics. The book includes a chapter with an overview of the PyTorch library (since all examples use PyTorch), but that chapter cannot be considered a self-contained introduction to neural networks. If you have never heard of loss functions and activation functions before, start with other books; there are many of them today. (Note: for example, the book "Deep Learning".)

In the book you will find many examples of varying complexity, from the simplest (the cross-entropy method in the CartPole environment takes about 100 lines of Python) to fairly large projects such as training AlphaGo Zero or an RL agent for stock trading. The sample code is fully available on GitHub; in total there are more than 14k lines of Python code.
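
To give a sense of scale for the simplest example mentioned above, here is a minimal sketch of the cross-entropy method on CartPole. This is not the book's code: the network size, batch size, percentile, and learning rate are illustrative choices, and it assumes the classic Gym API where env.step() returns four values.

```python
# Minimal cross-entropy method on CartPole (illustrative sketch, not the book's code).
import numpy as np
import torch
import torch.nn as nn
import gym

env = gym.make("CartPole-v0")
obs_size = env.observation_space.shape[0]   # 4 observation values for CartPole
n_actions = env.action_space.n              # 2 discrete actions: left / right

# Small policy network: observation -> action logits
net = nn.Sequential(nn.Linear(obs_size, 128), nn.ReLU(), nn.Linear(128, n_actions))
objective = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

def play_episode():
    """Play one episode with the current policy; return total reward and (obs, action) pairs."""
    obs = env.reset()
    steps, total_reward = [], 0.0
    while True:
        logits = net(torch.as_tensor(obs, dtype=torch.float32))
        probs = torch.softmax(logits, dim=0).detach().numpy()
        action = int(np.random.choice(n_actions, p=probs))
        next_obs, reward, done, _ = env.step(action)
        steps.append((obs, action))
        total_reward += reward
        if done:
            return total_reward, steps
        obs = next_obs

for iteration in range(50):
    episodes = [play_episode() for _ in range(16)]
    rewards = [r for r, _ in episodes]
    bound = np.percentile(rewards, 70)              # keep only the "elite" episodes
    train_obs, train_act = [], []
    for r, steps in episodes:
        if r >= bound:
            train_obs += [s for s, _ in steps]
            train_act += [a for _, a in steps]
    optimizer.zero_grad()
    logits = net(torch.as_tensor(np.array(train_obs), dtype=torch.float32))
    loss = objective(logits, torch.as_tensor(train_act, dtype=torch.long))
    loss.backward()
    optimizer.step()
    print(f"{iteration}: mean reward={np.mean(rewards):.1f}, loss={loss.item():.3f}")
```

The whole idea fits in this loop: play a batch of episodes, keep those above a reward percentile, and train the policy network to imitate the actions taken in them.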

The book consists of 18 chapters covering the most important aspects of modern deep reinforcement learning:

  • Chapter 1: provides background on the reinforcement learning paradigm and shows how it differs from supervised and unsupervised learning. It introduces the central mathematical model of reinforcement learning: Markov decision processes (MDPs). The introduction to MDPs is step by step: I start with Markov chains, extend them to Markov reward processes (by adding a reward component) and, finally, to full Markov decision processes, where the agent's actions also enter the picture.
  • Chapter 2: covers OpenAI Gym, a generalized RL API designed to work with a variety of environments, including Atari games, classic problems such as CartPole, continuous control tasks, and so on.
  • Chapter 3: gives a quick overview of the PyTorch API. It is not meant as a complete guide to DL, but it lays the groundwork for the following chapters. If you use other deep learning tools, it should serve as a good introduction to PyTorch's elegant model, so that the examples in later chapters are easier to follow. At the end of the chapter, we train a simple GAN that generates and discriminates between Atari screenshots from different games.
  • Chapter 4: covers one of the simplest yet most powerful methods: the cross-entropy method. In this chapter, we train our first network, which learns to solve the CartPole environment.
  • Chapter 5: opens the second part of the book, devoted to value iteration methods. It covers a simple tabular learning approach based on the Bellman equation, applied to the FrozenLake environment (a small illustration follows after this list).
  • Chapter 6: introduces DQN applied to Atari games. The agent's architecture is exactly the same as in the famous DeepMind paper.
  • Chapter 7: explores several advanced DQN extensions that improve the stability and performance of the basic DQN. It covers the methods from the paper "Rainbow: Combining Improvements in Deep RL"; all of them are implemented in the chapter, and I explain the ideas behind them. The methods are: n-step DQN, double DQN, noisy networks, prioritized replay buffer, dueling networks, and categorical DQN. At the end of the chapter, all the methods are combined into a single code example, exactly as in the Rainbow paper.
  • Chapter 8: describes the first medium-sized project, illustrating the practical side of RL on real-world problems. Using DQN, an agent is trained to trade stocks.
  • Chapter 9: opens the third part of the book, on policy gradient methods. It introduces these methods and their strengths and weaknesses compared with the value iteration methods considered earlier. The first method in this family is REINFORCE.
  • Chapter 10: describes how to deal with one of the most serious issues of policy gradient methods: the high variance of the gradient. After experimenting with the basic PG method, you will get acquainted with the actor-critic method.
  • Chapter 11: shows how to parallelize the actor-critic method on modern hardware.
  • Chapter 12: the second practical example, this time about natural language processing. In this chapter, we train a simple chatbot with RL methods on the Cornell Movie-Dialogs Corpus.
  • Chapter 13: another practical example, on web automation, using MiniWoB as the platform. Unfortunately, OpenAI abandoned MiniWoB, so information about it is hard to find (here and here are a few bits). But the idea behind MiniWoB is brilliant, so in this chapter I show how to set it up and train an agent to solve some of its tasks.
  • Chapter 14: opens the fourth and final part of the book, dedicated to more advanced methods and techniques. It focuses on continuous control tasks and describes the A3C, DDPG, and D4PG methods for solving problems in several PyBullet environments.
  • Chapter 15: goes deeper into continuous control problems and introduces the trust region idea, using the TRPO, PPO, and ACKTR methods as examples.
  • Chapter 16: is devoted to gradient-free ("black-box") reinforcement learning methods, positioned as more scalable alternatives to DQN and PG methods. Evolution strategies and genetic algorithms are applied to several continuous control problems.
  • Chapter 17: examines model-based RL approaches and describes DeepMind's attempt to bridge the gap between model-based and model-free methods. The chapter implements the I2A agent for Breakout.
  • Chapter 18: the last chapter of the book discusses the AlphaGo Zero method, applied to the game Connect 4. The trained agent is then wired into a Telegram bot to verify the results.
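
Since Chapter 5 revolves around the Bellman equation, here is a hedged sketch of tabular value iteration on FrozenLake, referenced in the Chapter 5 item above. For brevity it reads the environment's transition table directly instead of estimating it from played episodes, and it assumes the classic Gym environment id "FrozenLake-v0" with its unwrapped .P attribute; it is an illustration, not the book's code.

```python
# Tabular value iteration on FrozenLake via the Bellman equation (illustrative sketch).
import gym
import numpy as np

env = gym.make("FrozenLake-v0")
n_states = env.observation_space.n
n_actions = env.action_space.n
P = env.unwrapped.P          # P[s][a] = list of (prob, next_state, reward, done)
GAMMA = 0.9

def action_value(V, s, a):
    # Bellman backup: Q(s, a) = sum over s' of p(s'|s,a) * (r + gamma * V(s'))
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r, _ in P[s][a])

V = np.zeros(n_states)
for _ in range(100):         # sweep the state space until the values stabilize
    V = np.array([max(action_value(V, s, a) for a in range(n_actions))
                  for s in range(n_states)])

# Extract the greedy policy from the converged values
policy = [int(np.argmax([action_value(V, s, a) for a in range(n_actions)]))
          for s in range(n_states)]
print(V.reshape(4, 4))
print(policy)
```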

That's all! I hope you enjoy the book.
