Samsung's Moscow Artificial Intelligence Center: employee stories
Below are mini-interviews with the Center's staff, who spoke at the annual Artificial Intelligence Forum held at the Center last December. We talked to colleagues from two laboratories: the Computer Vision and Visual Modeling Laboratory and the Multimodal Data Analysis Laboratory.
About Samsung AI Center
Samsung invests about 8% of its annual sales revenue in research and development, one of the highest shares in the world. The company holds the largest portfolio of active patents in the US and has filed applications for most of the hottest technologies in Europe. Over the next three years, Samsung will invest $22 billion in the development of 5G and artificial intelligence technologies.
Samsung Research, a research division of Samsung Electronics, unites 21 research centers around the world:
Samsung Research units on the world map (from the site https://research.samsung.com/)
Seven of them specialize exclusively in AI. The Moscow AI Center opened on May 29, 2018; the other six are in Seoul, Montreal, Toronto, New York, Cambridge and Mountain View.
The main research area at the Samsung AI Center in Moscow is machine learning, an approach that has been successfully applied to speech recognition, computer vision and data analysis. The Center is directed by Viktor Lempitsky, Ph.D., associate professor at the Skolkovo Institute of Science and Technology, the most cited Russian scientist in his subject category in 2018 and winner of the Scopus Award Russia 2018 for his contribution to the development of the field.
The architects and designers of the Moscow AI Center's office were inspired by the idea of digital infinity. The concept aims to create a space that maximizes creativity in a comfortable environment: mobile furniture and movable multifunctional partitions let you merge several meeting rooms and configure the workspace as needed.
Lecture by Mikhail Romanov (Senior Engineer, Visual Understanding Lab) for students of Samsung AI Bootcamp 2018 in the Matrix meeting room
The meeting rooms are named after films about artificial intelligence (The Matrix, The Terminator, Bicentennial Man, Ex Machina, etc.); each has screens on both sides, and you can write on the walls with markers. Tablets mounted at the meeting-room doors use face recognition to show availability and let you reserve a room.
Open space with ergonomic furniture: movable tables, specially designed chairs
The AI Center has sports and recreation areas where you can play table tennis in a special sound-absorbing room, do yoga and fitness, take a shower and change clothes. There are even a few capsules for a short nap!
The Samsung AI Forum takes place at the Moscow AI Center every year. Its aim is to bring together outstanding scientists from Russia and abroad so they can share knowledge and experience and propose ideas for solving the most pressing problems in AI. Last December, at the second annual Forum, our Moscow colleagues presented research results that can be used to build full-fledged AI-based services, as well as to develop applications and components for the company's products.
Laboratory of Computer Vision and Visual Modeling
The head of the laboratory is Anton Konushin, Ph.D., associate professor at the HSE and at the Faculty of Computational Mathematics and Cybernetics (VMK) of Moscow State University, where he also heads the joint laboratory of Samsung and MSU.
Mikhail Romanov and Igor Slinko, who also work in the Computer Vision and Visual Modeling Laboratory, are the authors of the course "Neural Networks and Computer Vision". This is the first free mass online course Samsung Research launched in Russia, in 2019, and the guys are our pioneers. The course covers the use of neural networks for image analysis from the ground up; it requires no specialized knowledge, only basic higher mathematics and statistics and a readiness to program in Python. The course already has 24,000 enrolled students. And the killer feature is the prospect of employment: several people have already joined the Center after interviews.
Our laboratory has two large groups: the first works on Depth Estimation (estimating depth from images), the second on SLAM (simultaneous localization and mapping). There are also small teams with different tasks: for example, my colleague Danil Galeev and I used to work on GANs (generative adversarial networks), and now on domain adaptation.
Domain adaptation is when we train a neural network model on one domain and then test it on another. The two most common domains are synthetic data and real data. This problem setting is the most relevant one, because synthetic data can be generated in any quantity and is cheap. For example, you can generate many images of cities and train a self-driving car on them, which is much easier than driving a real car around real cities and collecting real data.
Clearly, if we train a neural network on synthetic data and simply transfer it to real data, it will not work very well. How do we reduce this gap? You can generate a lot of labeled synthetic data and train a neural network on it, and then add a lot of unlabeled real data (resources go only into collecting the data, not into labeling it). By combining labeled and unlabeled data in this way, we achieve a significant increase in the accuracy of neural network models.
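The "labeled synthetic plus unlabeled real" recipe can be sketched as a combined loss. This is only a minimal illustration, not the lab's actual method: it assumes a classifier that outputs probabilities, and uses entropy minimization on the unlabeled domain, one common trick in unsupervised domain adaptation.

```python
import numpy as np

def cross_entropy(probs, labels):
    # Supervised loss on labeled (synthetic) samples.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def entropy(probs):
    # Unsupervised loss on unlabeled (real) samples: penalizing
    # prediction entropy pushes the model to make confident
    # predictions on the real domain without needing any labels.
    return -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))

def combined_loss(p_synthetic, y_synthetic, p_real, weight=0.1):
    # Total objective: labeled synthetic term plus a weighted
    # unlabeled real term (the weight is an illustrative choice).
    return cross_entropy(p_synthetic, y_synthetic) + weight * entropy(p_real)
```

In a real training loop both terms would be computed per mini-batch and minimized jointly by gradient descent; the numbers here stand in for model outputs.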
Examples of different domains in the DomainNet dataset: clipart, infographics, painting, sketch, photo, graphics. The object is the same, but the domains are different.
Konstantin spoke at the Samsung AI Forum with the report "AdaptIS: Adaptive Instance Selection Network".
I am interested in algorithms that help solve real problems, for example automating everyday routine tasks. Human labor is the most expensive resource, so I want to work on things that can be turned into benefits for people.
In my opinion, artificial intelligence has two development paths. Either it will be "strong", and we will end up with something like the Holy Grail: the emergence of strong AI will change everything in our lives, and I find it hard to predict what will happen. Or we will only be able to speak of "weak" AI, in which case robotics is probably the most interesting direction. Self-driving vehicles belong to it too, since a driverless car is essentially a road robot. Replacing drivers with robots raises the question of social consequences: we all live in a society, and technology can bring about global social change. I reflect on this topic.
One of my recent articles is devoted to Instance Segmentation: finding and delineating every object of interest in an image. We select each object with a pixel-wise mask, i.e. for every point we indicate whether that pixel belongs to the object or not. This fits well with the concept of Visual Scene Understanding, because the first step in understanding an image is understanding which objects are present in it. Object Detection algorithms also address this problem, but there each object is marked with a rectangle, and the rectangles overlap heavily: that gives too simple, too rough an approximation of where the object is. Look at what ordinary indoor scenes are like (I am not talking about ideal showroom interiors, clean and tidy): a real apartment is a sofa with pillows and other things lying on it.
When I started on this task, I found that existing algorithms do not handle such cases well. We arrived at a new algorithm, which we presented in our paper. It can select objects with arbitrarily complex intersections; the only requirement is that at least one pixel of the object is visible. The algorithm is based on the hypothesis that you can always find a pixel in the picture that belongs to a specific object. If not a single pixel of the object is visible, there is no object; and if a person can see the object, then there is a pixel that belongs to it. The algorithm finds such pixels and, from them, recovers the entire mask of the object.
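The "one visible pixel is enough" idea can be illustrated with a deliberately toy sketch: grow a mask outward from a single seed pixel over connected pixels of the same value. The real AdaptIS model predicts the mask with a neural network; this flood fill only mirrors the hypothesis, and the grid and function name are invented for the example.

```python
from collections import deque

def mask_from_seed(image, seed):
    # Starting from one pixel known to belong to the object,
    # recover its full mask by flooding over 4-connected
    # neighbors that share the seed's value.
    rows, cols = len(image), len(image[0])
    target = image[seed[0]][seed[1]]
    mask, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in mask and image[nr][nc] == target):
                mask.add((nr, nc))
                queue.append((nr, nc))
    return mask
```

For example, on a 3×3 grid where the value 1 marks one object, a seed anywhere on that object recovers all of its pixels, however irregular the shape.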
Now we are working on interactive segmentation, which is also a very important task. Returning to the previous problem: to train Instance Segmentation you need high-quality pixel-wise annotation of all the objects in the pictures, and that is expensive, because manually tracing the outline of every object takes a very long time. Interactive segmentation automates this annotation. Instead of outlining an object's polygon, a person simply clicks on the object, making a so-called positive click. Either the object is selected from the first click, or, if it didn't work out (say, some parts of the object were missed, or, on the contrary, something extra was included), we add a negative click.
As a result, instead of tracing the entire object with a pixel-wise outline, we reduce the problem to marking, with a simple click, which areas should or should not be selected. Practice shows that in most pictures objects can be selected with high accuracy within ten clicks. This is a huge difference: annotation becomes several times faster.
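A common way to feed clicks into a segmentation network (not necessarily the one used by the lab) is to encode them as distance maps: one channel for positive clicks, one for negative, each cell holding the distance to the nearest click. A minimal sketch:

```python
import math

def click_map(height, width, clicks):
    # Each cell holds the Euclidean distance to the nearest click.
    # Interactive-segmentation networks often take such maps as
    # extra input channels alongside the image itself.
    return [[min(math.dist((r, c), click) for click in clicks)
             for c in range(width)]
            for r in range(height)]
```

Positive and negative clicks would each get their own map, so the network sees both where the user confirmed the object and where it was told "not here".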
The mask the algorithm produces when you click a point on the object
Multimodal Data Analysis Laboratory
The head of the laboratory is Sergey Nikolenko, Ph.D., senior researcher at the St. Petersburg branch of the V. A. Steklov Institute of Mathematics (POMI RAS), associate professor at the Higher School of Economics in St. Petersburg, and co-author of the book "Deep Learning: Diving into the World of Neural Networks".
At the Samsung AI Forum, Gleb gave a presentation on "High-Resolution Daytime Translation Without Domain Labels".
Our laboratory works on generative models and computational photography. There is a set of tasks on reconstructing three-dimensional structure, i.e. recreating the three-dimensional shape of a complex object from several photographs. There are also tasks related to obtaining universal representations of pictures or of objects in pictures. All of this, broadly speaking, revolves around neural networks. From an applied point of view, the most impressive applications are those where a person interacts with generative models, from implicit effects to cases where the model acts as a tool for a person, for example in music synthesis.
I mainly work on generative models combined with human-machine interaction. It is interesting! Something complicated, like a neural network, turns into a tool like a camera, used for momentary pleasure or a sensory experience: you press three buttons and get something cool, without thinking much about how it works but roughly understanding what the result will be, although sometimes it turns out to be something unexpected.
Our study solves a task that looks rather simple at first glance. Given a landscape photograph as input, the algorithm produces a set of photographs of the same landscape at different times of day. For example, given a daytime photo of a city, what would it look like in the evening, at night, in the morning, and at the moments in between, so as to make a smooth, beautiful video? The technology works at resolutions up to 4K.
We work with landscapes because in landscapes the change of day or season is the most obvious. Building interiors do not change much over the day, except perhaps for some reflections and glare, which depend on various factors such as how the grilles and shutters sit on the windows. With landscapes everything is clear: you have the sun, the sky, a large space that needs to be lit differently, with something drawn onto it. If the algorithm makes the transition from night to day, it needs to brighten the dark areas, and from day to night, to darken everything correctly.
Looking at a landscape, it is not very hard for a person to imagine exactly how it will change with the time of day or the season. It was very interesting to model what is essentially human perception, without spending an insane amount of time collecting real images and videos for each landscape.
At the Samsung AI Forum, Dmitry presented two reports: "Free-Lunch Saliency via Attention in Atari Agents" and "Perceptual Gradient Networks".
The main research area of the Multimodal Data Analysis Laboratory is the set of tasks around image generation and processing, and over the past year I managed to work on two projects in this area. In the first half of the year I worked on Reinforcement Learning (RL), a machine learning approach in which the system being trained (the agent) learns by interacting with some environment. Simply put, the learning process can be thought of as a game: actions leading to rewards are encouraged, and actions leading to failure are avoided.
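The "encourage rewarded actions" idea can be shown with the smallest possible RL setup, a multi-armed bandit; this is an illustrative sketch, not related to the Atari agents discussed below, and all names and parameters are invented for the example.

```python
import random

def train_bandit(win_probs, steps=2000, eps=0.1, seed=0):
    # Epsilon-greedy agent on a multi-armed bandit: mostly pull the
    # arm with the best estimated value (exploit), occasionally pull
    # a random arm (explore). Rewarded arms get higher estimates.
    rng = random.Random(seed)
    values = [0.0] * len(win_probs)   # running value estimate per arm
    counts = [0] * len(win_probs)     # pulls per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(win_probs))
        else:
            arm = max(range(len(win_probs)), key=lambda i: values[i])
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update: estimate drifts toward observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values
```

After training on arms with payoff probabilities 0.2 and 0.8, the agent's value estimate for the second arm ends up clearly higher, which is exactly the "reward shapes behavior" loop described above, only in miniature.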
My project was about understanding which parts of the picture the neural network implementing an RL agent is looking at. That is, we needed to understand how the network works and what we ultimately managed to teach it; for this we build a module into it that shows which parts of the original image it attends to. My first report at the Forum was about how we tried a number of different ways to embed this piece into a neural network. The challenge was to embed it so that nothing else broke. We seem to have succeeded, but with some flaws: the visualization of the map of the importance of parts of the picture is not very sharp. We experimented with making it sharper, but unfortunately the agents started performing worse as a result.
Left: a sharp map, but a weak agent. Right: a rough map, but a strong agent.
The second report, "Perceptual Gradient Networks", was about optimizing perceptual loss, a loss function used almost everywhere images are generated by neural networks. To use perceptual loss, developers first run a forward pass through a neural network and then a backward pass. The backward pass is computationally expensive. We wanted to get rid of this double pass and replace it with another neural network through which everything is done in a single forward pass; this increases speed and reduces memory requirements. Now we are improving the architecture of this second network, aiming to radically cut memory costs without sacrificing quality.
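For readers unfamiliar with the term: perceptual loss compares images in the feature space of a fixed network rather than pixel by pixel. The sketch below is a toy stand-in, with a one-layer "feature extractor" instead of a real pretrained network such as VGG; computing gradients of this loss normally requires backpropagating through the feature network, which is the expensive backward pass the report aims to eliminate.

```python
import numpy as np

def features(img, w):
    # Toy stand-in for a frozen pretrained feature extractor:
    # a fixed linear map followed by ReLU. Purely illustrative.
    return np.maximum(img @ w, 0.0)

def perceptual_loss(generated, target, w):
    # Distance between images measured in feature space rather
    # than raw pixel space: small when the images "look alike"
    # to the feature network, even if pixels differ.
    return float(np.mean((features(generated, w) - features(target, w)) ** 2))
```

In practice the feature weights `w` come from a network pretrained on image classification and are never updated; only the image generator is trained against this loss.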
I am interested in everything related to Reinforcement Learning, because it is the area closest to general artificial intelligence (General AI). Other areas, such as computer vision, human pose reconstruction or sound analysis, are more narrowly specialized. They are certainly more useful in the near term: they can already be taken and built into self-driving cars or search engines. With a few exceptions, this cannot be said of RL, but RL can tackle tasks that otherwise cannot be solved at all. For example, thanks to these technologies, people have learned to play very complex computer games such as Dota and StarCraft at a very high level. In general, RL is a method for optimizing anything toward whatever goals you set.
If you have reached the end of the article and are still interested, even though most of the terms were unclear, the good news is that Samsung has free online courses on Stepik, to which we invite you. We wrote about them earlier in the blog (1, 2).
And for those to whom what our colleagues talked about is far from new, the open positions at Samsung Research may be of interest. Right now there are openings for Data Scientist (2 people), Machine Learning Engineer (2 people), and Deep Learning Engineer.