Samsung's New Free Online Course: Text Analysis with Neural Networks

We routinely use Internet search, talk to chat bots, and read documents in any language thanks to machine translation. Telling the robot vacuum cleaner to start cleaning with your voice? Nothing special anymore. For many people, voice assistants on a smartphone have become part of everyday life. The future in which a computer, having read a stray note about football, adjusts the tone of the weather news accordingly has already arrived.

How does it all work? And how do you become a specialist in NLP (Natural Language Processing — not to be confused with neuro-linguistic programming :))?

Habr readers asking themselves these questions are invited to the recently opened Samsung Research Russia online course. Details under the cut…

Authors of the course “Neural Networks and Text Processing”

This June we wrote about the launch of our first online course, "Neural Networks and Computer Vision." It turned out to be a success: it already has more than 20 thousand students and excellent reviews, and in September it even won a Stepik Award in the "Best Course from New Authors" category!

Five months have passed since the first course launched, and we have not been idle! Armed with the experience gained and inspired by our colleagues' success, a second team of authors, developers at the Samsung Center for Artificial Intelligence in Moscow and machine learning experts Roman Suvorov, Anastasia Yanina, and Alexey Silvestrov, with continued editorial support from Nikolay Kapyrin, tackled an enormous amount of work, and on October 15 the second course, "Neural Networks and Text Processing," launched on the "Samsung Research Russia Open Education" channel on the Stepik platform.

The course is designed to take 7 weeks. If you spend an average of 3-5 hours a week watching video lectures, answering questions, and completing practical assignments, you will understand what is under the hood of modern search engines, chat bots, and text generators. The team put a lot of effort into making sure that after completing just this one course, students can confidently navigate these technologies at the level of a junior developer, or of any technical specialist who has no prior NLP experience but now has to deal with it.

So what are the distinctive advantages of our course?

  • it was developed by the team at the Samsung Center for Artificial Intelligence, who have a solid track record of commercial projects in this area
  • it combines theory and practice: you will see how to build neural networks for text processing in PyTorch, implement the most relevant architectures, and learn to adapt them to your needs
  • as with the first course on computer vision, the best graduates are invited to interview at Samsung Research Russia!

The infographic below summarizes the content and current statistics of the new course:

Graduates of the course receive certificates. Two options are possible:

  • a regular certificate, for which you earn points by solving all the problems in the main part of the course;
  • a certificate with honors: for this you will need to solve all the problems for the highest score, pass the theoretical tracks of the course (where the tasks are similar to those we give candidates at job interviews) and solve the final problem on Kaggle.

Teachers and course developers

Roman Suvorov
Senior Engineer, Samsung Center for Artificial Intelligence in Moscow
He has worked in data analysis, machine learning, and natural language processing since 2011.

"In 2013 neural networks captured my attention and haven't let go since, although I don't forget the classical approaches either"

Anastasia Yanina
Engineer, Samsung Center for Artificial Intelligence in Moscow

"I have been doing data analysis and NLP since 2015. I graduated from the Moscow Institute of Physics and Technology (FIVT) and ShAD, and now teach machine learning at PhysTech"

Alexey Silvestrov
Senior Engineer, Samsung Center for Artificial Intelligence in Moscow
"I worked on classic NLP in 2009-2012 as a student, then on deep learning NLP in 2015-2017, and later switched to neural network generation of music and images. I am a graduate of the VMiK faculty of Moscow State University."
Nikolay Kapyrin
Producer of online courses and curator of educational programs on artificial intelligence, Samsung Research Russia
“I plan to write an article on Habr about technical and methodological problems that we solved while we made two online courses in a year”

Course program

1. Introduction

In this module, as a first approximation, we learn what text processing by machine learning means today, what the difficulties are, and which tasks of linguistics can currently be solved only by machine learning methods.

  1. Hello! Tell us about yourself!
  2. In general terms: natural language and text
  3. Features of natural language processing
  4. In general terms: linguistic analysis
  5. In general terms: feature extraction
  6. Applied text processing tasks and module summary

2. Vector text model and classification of long texts

The math begins. Sparse vector models, tokens, mutual information… what is all this? We will walk through methods for translating the multidimensional, multifaceted structure contained in text into numbers, so that ML algorithms can get to work.

  1. The vector text model and TF-IDF
  2. Creating a neural network for working with text
  3. Theoretical questions: the vector text model
  4. Workshop: classification of news texts
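
As a taste of the module's starting point, here is a minimal pure-Python sketch of TF-IDF weighting (the course itself works in PyTorch; this toy version, with illustrative data, only shows the idea of turning text into numbers):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a sparse dict mapping token -> TF-IDF weight."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each token occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            tok: (count / len(tokens)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return vectors

docs = ["the cat sat", "the dog barked", "the cat barked"]
vecs = tfidf_vectors(docs)
# "the" occurs in every document, so its IDF (and hence its weight) is zero,
# while rarer, more informative words get positive weights.
```

The sparse dicts here stand in for the sparse vectors the module discusses; a real pipeline would map them into fixed-size arrays for an ML model.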

3. Basic neural network methods for working with texts

Can we use fully connected neural networks? What is the "convolution over text" operation? Isn't convolution an operation on matrices? The answers are in this module, where we study the first successful attempts to teach neural networks to work with the meaning of text.

  1. A general algorithm for working with texts using neural networks
  2. Distributional semantics and vector representations of words
  3. Workshop: food recipes and Word2Vec in PyTorch
  4. Theoretical questions: fundamentals of text processing with neural networks
  5. The main types of neural network models for text processing
  6. Convolutional neural networks for text processing
  7. Workshop: POS tagging with convolutional neural networks
  8. Theoretical questions: convolutional neural networks in text processing
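
To hint at the distributional-semantics idea behind Word2Vec, here is a small illustrative sketch (not course material) of how skip-gram training pairs are extracted from a sentence, so that every word learns to predict its neighbors:

```python
def skipgram_pairs(tokens, window=2):
    """Extract (center, context) training pairs as in Word2Vec's skip-gram
    model: every word is paired with the words around it."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("we cook food recipes daily".split(), window=1)
# e.g. ('we', 'cook'), ('cook', 'we'), ('cook', 'food'), ...
```

Training a network to predict the context token from the center token is what pushes words used in similar contexts toward similar vectors.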

4. Language models and text generation

We dive deeper into neural networks. Text can be of any length, but only recurrent neural networks let the algorithm generate text without special tricks. We tried to teach the network to read; now we will give it the chance to write.

  1. Recurrent neural networks
  2. Language modeling
  3. Workshop: generating names and slogans with an RNN
  4. Aggregation and the attention mechanism
  5. The Transformer and self-attention
  6. Workshop: language modeling with a Transformer
  7. Theoretical questions: language models and Transformers
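
The course trains RNN language models in PyTorch; as a self-contained stand-in, here is a count-based character bigram model (all data illustrative) that generates names the same way a neural language model does, one token at a time:

```python
import random
from collections import Counter, defaultdict

def train_bigram_lm(names):
    """Count-based character bigram model: P(next char | current char),
    with ^ and $ as start and end markers."""
    counts = defaultdict(Counter)
    for name in names:
        chars = ["^"] + list(name) + ["$"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=10):
    """Sample one character at a time until the end marker appears."""
    out, ch = [], "^"
    while len(out) < max_len:
        options, weights = zip(*counts[ch].items())
        ch = rng.choices(options, weights=weights)[0]
        if ch == "$":
            break
        out.append(ch)
    return "".join(out)

model = train_bigram_lm(["anna", "ada", "alan"])
name = generate(model, random.Random(0))
```

An RNN replaces the bigram count table with a hidden state that summarizes the whole prefix, but the sampling loop is essentially the same.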

5. Sequence conversion: 1-to-1 and N-to-M

But what if the input is text and the output also needs to be text? That is a job for a translator, for whom, as we know, context matters most. If you need to translate one sequence of text into another, or into several, this module will give you everything you need!

  1. Recognizing the flat structure of short texts
  2. Workshop: recipe recognition
  3. Workshop: aspect-based sentiment analysis as NER
  4. Sequence conversion (seq2seq)
  5. Workshop: generating code snippets from Stack Overflow
  6. Theoretical questions
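
The seq2seq setup can be hinted at without any framework. Below is a small illustrative sketch of how one training example is prepared for teacher forcing (the token lists and helper name are hypothetical, echoing the Stack Overflow code-generation workshop):

```python
def make_seq2seq_example(src_tokens, tgt_tokens, bos="<bos>", eos="<eos>"):
    """Build one (source, decoder_input, decoder_target) triple for
    teacher-forced seq2seq training: the decoder reads the target shifted
    right by one step and learns to predict the next token at each position."""
    decoder_input = [bos] + tgt_tokens
    decoder_target = tgt_tokens + [eos]
    return src_tokens, decoder_input, decoder_target

src, dec_in, dec_tgt = make_seq2seq_example(
    ["how", "to", "sort", "a", "list"],   # hypothetical question tokens
    ["sorted", "(", "xs", ")"],           # hypothetical code tokens
)
# dec_in  -> ['<bos>', 'sorted', '(', 'xs', ')']
# dec_tgt -> ['sorted', '(', 'xs', ')', '<eos>']
```

The encoder consumes `src`, and the decoder is trained so that position i of `dec_in` predicts position i of `dec_tgt`; at inference time the decoder's own outputs are fed back instead.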

6. Transfer learning, model adaptation

Do you have a great project but no superhero computing resources? Then take a ready-made neural network and train it to solve your particular problem! You only need to know a few model names and a few training tricks, and the job is done.

  1. Contextualized representations and knowledge transfer
  2. Workshop: pytorch-transformers, or how to run BERT
  3. Workshop: BERT for question answering
  4. Theoretical questions
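
Running BERT itself requires the pytorch-transformers library and a downloaded checkpoint, but the core transfer-learning recipe (freeze the pretrained backbone, train only a small task head) can be sketched in plain Python; all names and numbers below are illustrative:

```python
import math

def train_head(features, labels, lr=0.5, epochs=300):
    """Transfer learning in miniature: the pretrained 'backbone' is frozen,
    so its outputs (features) are fixed, and only a tiny logistic-regression
    head is trained on the downstream task."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Imagine these 2-d vectors came out of a frozen pretrained encoder.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
w, b = train_head(feats, labels)
```

With BERT the "features" are contextualized token representations and the head is usually a single linear layer, but the division of labor is the same: the expensive part is pretrained once, the cheap part is trained on your data.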

7. Final Kaggle Competition and Conclusion

Hovering over the "Start training" button, do you already see multidimensional chains of pseudo-characters unfolding and the machine's attention flowing between concepts? Then show what you can do in our final competition!

  1. What else to read and how to develop further
  2. Kaggle competition: problem overview and baseline solution

Student requirements

The course is designed for students who already have some grounding in machine learning.

What do you need to start the course?

  1. Have basic knowledge of neural networks
  2. Have basic knowledge in the field of mathematical statistics
  3. Be prepared to program in Python

We can say that "Neural Networks and Text Processing" is a continuation of our first course on computer vision, since it relies on the basic knowledge of neural networks we have already covered there.

Perhaps you already know something about NLP: that it is not just about editing text; that building chat bots, summarizing a text, classifying emotions, and answering questions from Wikipedia are far from simple tasks and still require research. These tasks will become accessible to you after you complete this course. Most importantly, we will teach you to ask the right questions in the world of modern NLP, and whether you find the answers yourself or an external neural network finds them for you makes little difference. What's next? That is up to you.

Are you with us?

Then welcome to the online course!
