Text processing: the sequence-to-sequence model

Today we will study seq2seq (sequence-to-sequence) neural networks: why they are needed, where they are used, and how they work.

Sequence

The mathematical definition of a sequence looks like this:

A sequence is a numbered collection of arbitrary objects, among which repetitions are allowed. Moreover, the order of the objects matters. The numbering is most often done with natural numbers, from zero to infinity.

Examples of sequences include:

  • houses on the street;

  • letters in a word;

  • words in a sentence;

  • frames in a film;

  • sounds in a word.

If you shuffle all the frames of a film, for example, the plot is lost. Mix up the leaves, branches, trunk, and roots of a tree, and you get a different kind of plant entirely.
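As a tiny illustration, shuffling a sequence destroys the information carried by its order. A minimal Python sketch (the word chosen here is arbitrary):

```python
import random

word = list("sequence")   # letters in a word form a sequence
random.shuffle(word)      # same objects, different order

print("".join(word))      # some jumble of letters -- the word is gone
```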

A common task is to obtain one sequence from another. For example:

  • from an English sentence – a Russian one (machine translation);

  • from a question – an answer (chatbots);

  • from an article – an abstract (summarization);

  • from film frames – a description of the action (video captioning);

  • from sounds – written text (speech-to-text);

  • from words in a sentence – sounds (text-to-speech).

All these problems can be solved using neural networks.
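To make the idea concrete, here is a minimal sketch of a seq2seq encoder-decoder in PyTorch. It is an illustration only: the class names, vocabulary sizes, and hidden size are assumptions, not code from this article or any specific library.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumptions, not from the article)
SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

class Encoder(nn.Module):
    """Reads the source sequence and compresses it into a hidden state."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):                   # src: (batch, src_len)
        _, hidden = self.rnn(self.embed(src))
        return hidden                         # (1, batch, HID)

class Decoder(nn.Module):
    """Generates the target sequence, conditioned on the encoder state."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, tgt, hidden):           # tgt: (batch, tgt_len)
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden       # logits over target vocab

# Usage: encode the source, then decode conditioned on its final state
src = torch.randint(0, SRC_VOCAB, (2, 7))     # batch of 2 source sequences
tgt = torch.randint(0, TGT_VOCAB, (2, 5))     # teacher-forced target input
enc, dec = Encoder(), Decoder()
logits, _ = dec(tgt, enc(src))
print(logits.shape)                           # torch.Size([2, 5, 1200])
```

The essential design choice is that the encoder's final hidden state is the only bridge between the two sequences: the decoder generates the target conditioned entirely on that compressed representation.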

Let's refresh what we learned earlier about recurrent neural networks.

People don't start thinking from scratch every second. As you read this post, you understand each word based on your understanding of the previous words. We don't throw everything out of our heads and start thinking from scratch each time; our thoughts persist.

Traditional neural networks do not have this property, and this is their main drawback. Let's imagine, for example, that we want to classify the events that happen in a movie. It's unclear how a traditional neural network could use reasoning about previous events in a movie to infer information about subsequent ones.

This problem is solved by recurrent neural networks (RNNs): networks that contain feedback connections and can therefore store information.

In the usual RNN diagram (the image is omitted here), a fragment of a neural network A takes an input value x_t and outputs a value h_t; a feedback loop passes information from one step of the network to the next.
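That feedback loop can be written down in a few lines. Below is a minimal NumPy sketch of the standard vanilla-RNN update h_t = tanh(W_x x_t + W_h h_{t-1} + b); the dimensions and random data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT, HIDDEN = 4, 8                        # illustrative sizes

# Vanilla RNN cell parameters
W_x = rng.normal(0, 0.1, (HIDDEN, INPUT))   # input-to-hidden weights
W_h = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # hidden-to-hidden (feedback) weights
b = np.zeros(HIDDEN)

def rnn_step(x_t, h_prev):
    """One step: the new state depends on the input AND the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Run the cell over a sequence; h carries information between steps
sequence = rng.normal(size=(5, INPUT))      # 5 time steps
h = np.zeros(HIDDEN)
for x_t in sequence:
    h = rnn_step(x_t, h)
print(h.shape)                              # (8,)
```

Because h is fed back into rnn_step at every iteration, each new state depends on the entire history of inputs seen so far, which is exactly the "memory" that traditional feed-forward networks lack.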
