Transformers in machine learning: where is this heading?

In 2017, Google researchers published "Attention Is All You Need" [1], in which they proposed a new neural network architecture, called the Transformer, and showed its effectiveness for translating text. In 2022, this technology was used to create AlphaCode [2], which learns to solve some competitive programming problems.

Programming has long been out of reach for artificial intelligence, but now this milestone has been passed. So is the Transformer really the kind of artificial intelligence that will be able to replace programmers and other knowledge workers in the near future? Or is it just one step in the development of technologies that can help people in their work but cannot replace them?

To answer this question, let's first consider the two main models of machine translation. The first relies on the concept of meaning: the words of the text are mapped to some universal language of semantic entities, and these meanings are then rendered in the words of another language to form the translated text. This model dominated in the 1970s and 1980s and became automated in the 1990s. The idea behind the second model is that translating a text does not, in general, require understanding its meaning. Based on this model, tools for machine-assisted translation began to appear in the 2000s; they match whole pieces of text in one language against pieces of text in another.

Is operating with meanings a necessary part of a quality translation? Suppose there are two languages that each allow us to describe, for example, a certain class of machines. The task of translation is to produce a description of a machine in one language from its description in the other. Practice shows that such a translation really does not need the concept of meaning (the very idea of automatic translation came precisely from the field of translating specifications).

However, not all texts written in these languages will be descriptions of machines. Not all texts that describe machines will correspond to operable machines, that is, machines able to perform some function in computable time (performing a function can be understood here by comparing the machine with another machine that performs the inverse function). And some of the working machines described will be able to perform their functions only for a limited range of input conditions.

With such wonderful machine description languages at hand, it is therefore tempting to use them for other purposes, for example, to design new machines. We describe a new machine (write a text) and, before building it in hardware, we want to know: is it a machine at all? If so, can it perform a useful function? And if so, under what conditions and restrictions?

The translator does not necessarily appeal to the meaning of what is written. It does not matter to the translator whether the text describes something workable and useful; the translation will be produced in any case. The fact that automatic translation does not have to appeal to meaning is neither a strength nor a weakness of automatic translation, but a feature of the task itself. Evaluating whether a machine will work, by contrast, requires significantly greater intellectual effort, and perhaps even a qualitatively different kind of intelligence, than translating texts from one language into another.

Design by description is, in a sense, a process of translation (from the language of instructions and specifications into the language of hardware). But a person is able to imagine a machine and how it will work before designing and testing it, that is, to give preliminary answers to some questions about the performance and functions of a hypothetical machine before it is assembled. This is where the need arises to use the meanings of words as conceivable entities that can interact with each other without being physically embodied.

Is the Transformer a step towards using meaning in machine learning or a step away from it? So far, this cannot be answered unambiguously. We can only say that the choice of translation as the application indicates that Transformers are still being directed at areas where meaning is not necessary. The training program itself, and the interface between the learning automaton and its environment, are designed in such a way that computing meanings brings no benefit, even if meanings can be computed in the Transformer model. The fact that the machine is required to output a program (and a program is an entity that can be executed on a physically existing device and perform some useful function) does not mean that the machine appeals to the meaning of the program it writes. The machine has no way to run the code, no way to see the errors that arise, and accordingly no way to learn from these mistakes: it operates only with words and phrases, not with the meanings of these words and phrases.

Based on the results of the AlphaCode experiment, we can only say that there are programming tasks that do not require operating with meanings, and that in the near future we will probably see many automatic tools that learn to port code between languages and platforms.

The authors of the Transformer themselves say they see signs that "attention heads" (specific elements of their neural network) exhibit behavior related to the syntactic and semantic structure of sentences. Isn't this the ability to work with meanings? The next post is an analysis of the paper "Attention Is All You Need" [1], where we will explain in simple terms what was done and how it connects to other work.
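As a small preview of what a single attention head computes, here is a minimal sketch of the scaled dot-product attention formula from [1], softmax(QK^T / sqrt(d_k)) V, written in plain Python. The function names `softmax` and `attention` and the toy vectors are illustrative assumptions, not the paper's code, and the sketch leaves out the learned projection matrices, multiple heads, and masking that a real Transformer uses.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one weight
    distribution over the keys for every query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy data: two queries attending over three positions.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = attention(queries, keys, values)
for q, w in zip(queries, weights):
    print(q, "->", [round(x, 3) for x in w])
```

Each row of weights shows how strongly one position "looks at" the others. Whether such weights amount to working with meaning is exactly the question raised above; the arithmetic itself only compares vectors.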

  1. Ashish Vaswani et al. Attention Is All You Need.

  2. Yujia Li et al. Competition-Level Code Generation with AlphaCode.
