How the paths branch. On the orientation and training of unmanned vehicles

  1. To recognize patterns in clear weather, at night, in rain or fog

  2. To recognize and predict the maneuvers of other drivers, including dangerous ones

  3. To recognize children, animals and disabled people, in particular wheelchair users

  4. To recognize other cars from unusual angles

  5. To combat malicious techniques specifically aimed at disorienting an unmanned vehicle. Examples of such “adversarial attacks” include sticking bright reflective stickers onto a road sign or wearing images of road signs on clothing (this especially concerns the STOP sign, whose shape is distinctive regardless of the alphabet in which its inscription is written).

Taken together, these requirements mean that machine vision must be implemented in the image and likeness of human vision.

The orientation of self-driving cars is provided by machine learning algorithms, which currently allow safe driving only when trained on extensive specific datasets. In such datasets, it is extremely difficult to take into account all edge cases and prescribe the procedure for responding to inappropriate actions of other drivers in the flow of traffic.

Trying to solve precisely this range of problems, a group led by Eshed Ohn-Bar at Boston University proposed two innovations: first, limit the set of data that the car relies on at any given moment, and second, move away from traditional machine learning towards reinforcement learning, so that the car masters road traffic much like a child learning to walk – by imitating others. Such learning allows the car to generalize the road maneuvers made by other cars and pedestrians, evaluate traffic from “different points of view” and identify blind spots. The car is still guided by the navigator’s map, but at the same time it builds an up-to-date map of its immediate surroundings and analyzes how other cars turn, overtake and give way.

The proposed algorithm allows the car to detect and avoid obstacles, as well as distinguish other cars from pedestrians. In effect, the car extrapolates the “points of view” of other road users and translates them into its own coordinate system.
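To make the “translation of points of view” more concrete, here is a minimal sketch of how an observation made in another car’s local frame can be re-expressed in the ego vehicle’s coordinate system using a planar rigid transform. This is my own illustration, not code from the Boston University project; the 2D simplification and all names are assumptions.

```python
import numpy as np

def pose_to_matrix(x, y, heading):
    """Build a 3x3 homogeneous transform for a planar pose (position + heading)."""
    c, s = np.cos(heading), np.sin(heading)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

def other_frame_to_ego(point_in_other, other_pose, ego_pose):
    """Re-express a point observed in another car's local frame in the ego frame.

    point_in_other : (x, y) in the other vehicle's coordinate system
    other_pose     : (x, y, heading) of the other vehicle on the shared map
    ego_pose       : (x, y, heading) of the ego vehicle on the shared map
    """
    world_from_other = pose_to_matrix(*other_pose)
    world_from_ego = pose_to_matrix(*ego_pose)
    ego_from_other = np.linalg.inv(world_from_ego) @ world_from_other
    p = np.array([point_in_other[0], point_in_other[1], 1.0])
    return (ego_from_other @ p)[:2]

# Example: a pedestrian seen 5 m ahead of another car becomes a point in the ego frame.
print(other_frame_to_ego((5.0, 0.0),
                         other_pose=(20.0, 10.0, np.pi / 2),
                         ego_pose=(0.0, 0.0, 0.0)))
```

The same composition of transforms extends to full 3D poses once the heading is replaced by a rotation matrix or quaternion.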

In 2021, Eshed Ohn-Bar and his graduate student Jimuyang Zhang tested self-driving car models in two virtual cities. One of them resembled the “training” environment; in particular, it had no sharp turns. The second contained not only complex intersections (with up to five roads converging) but also unexpectedly arising obstacles. Even so, in this simulation the car successfully reached its destination 92% of the time. A fragment of such a simulation is shown below.

However, the convolutional neural networks that underlie such algorithms do not remember the past and therefore do not accumulate experience, no matter how many times a car has driven along a particular road. The problem becomes even harder when visibility is reduced by bad weather.

Since 2020, researchers from Cornell University (from the College of Computing and Information Science and the College of Engineering) have published several papers on tools designed to give a self-driving car something like such memories. More precisely, it is not about memories as such, but about building “hindsight” from a 3D point cloud. Such a system is described in the article “Hindsight is 20/20: leveraging past traversals to aid 3d perception“, in which previously traveled routes (past traversals) serve as the training dataset for the car. The tool’s source code is posted on GitHub. During the training run the car still needs to be driven by a human, but via a keyboard and from the back seat, as one of the experiment participants, Carlos Diaz-Ruiz, demonstrates:

The training dataset was collected in the vicinity of Ithaca, New York, over 40 runs along a 15-kilometer circular route during a year and a half. Although the car carries a lidar in addition to optical sensors, these runs revealed significant inaccuracy in recognizing atypical objects from afar. For example, if the car “saw” a tree with an irregularly shaped crown, on the first scan it could easily mistake that tree for a pedestrian, correcting the mistake only after driving closer. The researchers therefore supplemented the generated dataset with images taken from other cars that drove the same route.

The resulting dataset, called Ithaca365, includes more than 600,000 images, with special attention paid to shots that capture different weather conditions (snow-covered road, rain, fog). Here is one of them as an example:

The HINDSIGHT algorithm uses neural networks to build representations of the objects the car drives past. The descriptions of these objects are then reduced in dimensionality (some of the features are discarded), an approach the group called SQuaSH (Spatial-Quantized Sparse History). The resulting simplified representations are plotted onto a virtual map. A very similar principle apparently underlies the workings of human memory.

In addition, all points captured by the lidar along a given route are stored in a local SQuaSH database, and only this positional information is fully “remembered” by the machine. The database can be continuously updated and shared between any vehicles that have the HINDSIGHT+SQuaSH combination installed.
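To illustrate the “spatially quantized sparse history” idea, here is a toy sketch of how per-point lidar descriptors might be compressed and filed into a sparse dictionary keyed by quantized coordinates. It is a simplification of my own, under assumed names and sizes, not the actual HINDSIGHT/SQuaSH code.

```python
import numpy as np
from collections import defaultdict

VOXEL_SIZE = 0.5   # metres per spatial bin (assumed value)
FEATURE_DIM = 8    # descriptor size after compression (assumed value)

def quantize(xyz, voxel_size=VOXEL_SIZE):
    """Map a 3D point to the integer key of the voxel that contains it."""
    return tuple(np.floor(np.asarray(xyz) / voxel_size).astype(int))

class SparseHistory:
    """A toy 'SQuaSH-like' store: only voxels that were actually observed are kept."""

    def __init__(self):
        self.store = defaultdict(list)

    def add_scan(self, points, descriptors):
        """Compress each point's descriptor and file it under its voxel key."""
        for xyz, desc in zip(points, descriptors):
            compressed = desc[:FEATURE_DIM]          # crude dimensionality reduction
            self.store[quantize(xyz)].append(compressed)

    def query(self, xyz):
        """Return the averaged historical descriptor for the voxel around xyz, if any."""
        hits = self.store.get(quantize(xyz))
        return np.mean(hits, axis=0) if hits else None

# Example: remember one scan, then ask what was previously seen near a location.
history = SparseHistory()
scan_points = np.random.rand(100, 3) * 20    # fake lidar points
scan_descs = np.random.rand(100, 32)         # fake per-point descriptors
history.add_scan(scan_points, scan_descs)
print(history.query(scan_points[0]))
```

The point of the sparse layout is that only voxels the car has actually visited consume memory, which is what makes continuously updating and sharing such a database between vehicles practical.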

The described approaches seem promising not so much for cars as for rail public transport and warehouse mobile robots (conditions in which environmental variability is reduced to a minimum). However, such “passive learning” is clearly not enough for full-fledged driving, so next we will look at a more advanced approach – active learning.

Active learning

This subcategory of machine learning is a form of supervised learning organized as a loop, in which the algorithm itself can query the data source for new information and label data independently. Active learning can significantly simplify the preparation of a training dataset: it greatly reduces the minimum amount of data that has to be labeled manually. Active learning amounts to progressive improvement of the dataset, and it is also easy to automate. Here is how standard supervised learning and active learning differ:

Obviously, even this approach cannot be implemented without the participation of a human (teacher), but, as discussed above with the Ithaca365 dataset as an example, a pioneer car can collect the primary dataset almost on its own, merely under the supervision of a graduate student. When applying active learning to self-driving cars, the most important thing is not to misjudge the size of the dataset that must be labeled manually. It is also important to set the uncertainty threshold beyond which the machine must request new data. This approach is called “uncertainty sampling” and is implemented using three main techniques (more details here); a small code sketch of all three is given after the list.

  • Least confidence. A certain (high) confidence threshold is specified, and if the model encounters data it interprets with confidence below that threshold (for example, less than 99.9%), it requests new data about the object and proceeds with its analysis.

  • Margin sampling (minimum margin). This approach is intended to correct the shortcomings of the first one and keeps two labels for each data item: the “most likely” interpretation and the “runner-up”. The model can reason over both labels at once and logically arrive at a conclusion about what kind of object is in front of it.

  • Entropy. The degree of uncertainty is measured for each individual variable, so (un)certainty in interpreting the data becomes a spectrum. The model determines what the given object most likely is, requests new data and continues to “develop” exactly the “version” that was assigned the highest probability at the previous step.
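Here is the promised sketch of the three criteria in plain numpy; the function names are mine, and the snippet is an illustration rather than any particular library’s API.

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - probs.max(axis=1)

def margin(probs):
    """Uncertainty is high when the top prediction and the runner-up are close."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return 1.0 - (top2[:, 1] - top2[:, 0])

def entropy(probs):
    """Uncertainty = entropy of the full predicted distribution."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Example: class probabilities for three objects predicted by a detector.
p = np.array([[0.98, 0.01, 0.01],    # confident -> no query needed
              [0.50, 0.45, 0.05],    # top two almost tied -> query
              [0.34, 0.33, 0.33]])   # near-uniform -> query

for name, score in [("least confidence", least_confidence(p)),
                    ("margin", margin(p)),
                    ("entropy", entropy(p))]:
    print(name, np.round(score, 3))
```

Whichever criterion is chosen, the items whose score exceeds the configured threshold are the ones sent out for additional data or manual labeling.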

Active learning applied to self-driving cars

The approaches described above are largely statistical, so even active learning will fail when interpreting edge cases or illogical behavior on the road (by pedestrians or reckless drivers). It is relatively easy to train an AI to follow standard scenarios and recognize objects that appeared in the training dataset. But we simply will not find even a minimal initial set of objects occurring in situations like these:

Complications may arise from unusual road obstacles, exotic animals, or “humorous” road signs. The algorithm may also fail when interpreting a tow truck or a car-carrier trailer, to the point of mistaking such images for an attempted adversarial attack (two or more vehicles combined in one picture).

With active learning, both the system itself and its development become significantly more complicated. The training process takes more effort, although the result is of higher quality, which justifies the added complexity. On the other hand, the scope for bugs in the code and for errors in interpreting results widens, and it takes longer to obtain the first results than with plain supervised learning.

Above, I pointed out how important it is to account for at least some edge cases during active learning. Yet if the model’s sensitivity to such cases is raised even slightly too high, it will essentially fixate on outliers. The model will continually “request clarification” about them; as a result, outliers will increasingly penetrate the manually labeled dataset, and the whole dataset will degrade because of it.
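One simple way to keep this failure mode in check is to cap how many of the most “uncertain” examples enter the manually labeled pool in each round. Below is a hedged sketch of such a cap; the threshold, the budget and the function name are assumptions of mine, not a recipe from the works cited here.

```python
import numpy as np

def select_for_labelling(uncertainty, threshold=0.5, budget=100):
    """Pick examples the model is unsure about, but never more than `budget` per round.

    uncertainty : per-example uncertainty scores (e.g. entropy from the sketch above)
    threshold   : minimum uncertainty for an example to be considered at all
    budget      : hard cap so rare outliers cannot flood the labeled dataset
    """
    candidates = np.where(uncertainty > threshold)[0]
    # Take the most uncertain candidates first, but stop at the budget.
    order = candidates[np.argsort(uncertainty[candidates])[::-1]]
    return order[:budget]

# Example: 1000 unlabeled frames, only the 100 most uncertain go to the annotators.
scores = np.random.rand(1000)
print(len(select_for_labelling(scores, threshold=0.6, budget=100)))
```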

Here we arrive at the most delicate aspect of training self-driving cars – how to teach a car to distinguish pedestrians, especially in the dark or in difficult weather. In 2020, NVIDIA conducted a study which confirmed that with active learning the model learns to detect pedestrians at night three times better than with manual labeling of the training dataset. But the range of situations is not limited to pedestrian detection. For example, how will a self-driving car interpret a “Men at work” sign?

If a human driver sees the highway suddenly narrow because two of the four lanes are blocked, he will immediately assume there is an accident or road work ahead. Consequently, a traffic jam may already be forming, and if not, there are likely to be obstacles, construction equipment and live people on the road surface ahead, clearing up the aftermath of an accident, doing road work or checking documents. Moreover, returning to the Ithaca365 example, such a situation may well arise on a familiar route you have driven every day for a year without ever observing anything of the kind.

But for an unmanned vehicle, such a “data outlier” is not even the worst of it. Far more important is that the car will have to react to changing traffic patterns (other cars changing lanes in the stream) as well as to human gestures. Pedestrian gesticulation plays a decisive role even in simpler situations: for example, a person approaches an unregulated pedestrian crossing, estimates the distance to an approaching car and decides to let it pass, performing at least one of three actions:

1) Pausing while watching the driver’s reaction – will he slow down to let the pedestrian cross?

2) Waving a hand to the driver, in which case the gesture means neither “hello” nor “stop” but “go ahead”.

3) Stepping up to the curb, taking a smartphone out of a pocket and starting to scroll the feed, thereby signaling no intention of crossing the road right now.

An interesting study of this kind was carried out in 2017 by specialists at Cruise. The company develops software for self-driving cars and operates its own fleet of 200+ such vehicles. San Francisco was chosen as the test site: the city sits on rugged, hilly terrain, which partly explains its extremely difficult traffic, and it is very densely populated.

Cruise added motion-capture technology to its cars’ software after becoming interested in the following scenario: suppose one pedestrian is “hailing”, trying to flag down a passing car, while a second pedestrian walks beside him. The second pedestrian notices an acquaintance on the opposite side of the street and waves to him. How is a self-driving car to tell these two gestures apart?

Enlisting the help of 3D artists and game designers, the company collected a dataset of dozens of gestures and poses, acted out by invited actors and recorded in the machine’s memory.

Motion capture systems fall into two main categories: optical and non-optical. In the optical approach, multiple cameras are used, distributed evenly around the scene being filmed. From the streaming video coming off these cameras, the positions of markers attached to a suit can be calculated with high accuracy (by triangulation); even facial expressions are captured this way. It was this technology that was used to animate Smaug and Gollum in the films based on the Lord of the Rings universe, as well as the natives in the Avatar films. However, this approach is feasible only in a studio, so Cruise opted for the non-optical (sensor-based) option. This technology relies on microelectromechanical systems (MEMS) – portable, wireless and requiring no studio. The sensors are likewise built into a suit, which carries 19 packages attached to the head, torso and limbs. Each package, about the size of a small coin, contains an accelerometer, a gyroscope and a magnetometer. A single unit for the whole suit houses the battery (on the belt), a data bus and a Wi-Fi module.
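To give an idea of what the data from such a sensor package looks like in processing, here is a minimal sketch of a complementary filter that fuses gyroscope and accelerometer readings into a tilt angle. This is a generic textbook technique, not Cruise’s or any suit vendor’s actual code; the sampling rate and the blending coefficient are assumed values.

```python
import numpy as np

DT = 0.01       # sampling period, s (assumed 100 Hz)
ALPHA = 0.98    # how much we trust the gyroscope between accelerometer corrections

def complementary_filter(gyro_rates, accels, dt=DT, alpha=ALPHA):
    """Estimate a limb's pitch angle from one MEMS package (gyro + accelerometer).

    gyro_rates : angular velocity around the pitch axis, rad/s, per sample
    accels     : (ax, az) accelerometer readings, m/s^2, per sample
    """
    angle = 0.0
    history = []
    for rate, (ax, az) in zip(gyro_rates, accels):
        gyro_angle = angle + rate * dt          # integrate angular velocity
        accel_angle = np.arctan2(ax, az)        # gravity direction gives absolute tilt
        angle = alpha * gyro_angle + (1 - alpha) * accel_angle
        history.append(angle)
    return np.array(history)

# Example: a limb rotating at 0.5 rad/s while the accelerometer sees gravity shift.
t = np.arange(0, 2, DT)
gyro = np.full_like(t, 0.5)
acc = np.stack([np.sin(0.5 * t), np.cos(0.5 * t)], axis=1) * 9.81
print(complementary_filter(gyro, acc)[-1])      # close to 1.0 rad after 2 s
```

In a full suit, a filter of this kind would run for each of the 19 packages, producing the sequences of joint angles from which poses and gestures are reconstructed.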

This motion-capture approach made it possible to teach the car to recognize a variety of actions, including:

1) Trying to catch a taxi

2) Scrolling your phone while walking

3) Stepping onto the roadway when the sidewalk is blocked by construction work

4) The gesture with which the parking attendant indicates exactly where the car should park

Conclusion

All the experiments described here turned out to be successful, at least enough to become material for papers, proofs of concept or further development. I would venture that these developments are more applicable to relatively slow freight transport, or to mobile robots serving warehouses, airports or shopping centers. In addition, given how confidently the machine navigates a route it has traveled many times, such a robot could replace a person in hard-to-reach places, for example in caves or under water. But for now there are clearly more problems than solutions, and I invite you to discuss them in the comments.
