Yandex Translator, Google Translate and DeepL

Many companies, or their employees, have used an online translator for work at least once. It is fast and convenient, but the result is not always accurate, and an inaccurate translation can have unpleasant consequences: clients or business partners may misunderstand it, which can damage the company's reputation.

In our previous publication, my team and I reviewed the best localization programs of 2024; you can read it at this link. Today we will look at popular machine translation (MT) engines: Yandex Translator, Google Translate and DeepL. We will evaluate the capabilities of each, compare their pros and cons, explain which tasks each translator suits, and tell you why it is better not to translate important documents online. The article cites third-party research and tests of these translators, with links to them. Happy reading!

Table of contents

1) Machine translation technologies. How do they work?
2) Criteria for comparing online translators
3) General information about translators:
– Yandex Translator
– Google Translate
– DeepL
4) Table
5) Testing engines and common mistakes
6) Conclusions

1. Machine translation technologies. How do they work?

Translation technologies improve every year, and the language barrier between speakers of different languages keeps shrinking. Modern translation systems are built on neural networks and artificial intelligence (AI), which improve translation quality. Their main task is not simply to replace words with equivalents in another language, but to take into account the grammar, context and even style of the text. But it wasn't always like this! Let's look at what machine translation used to be like.

The history of MT began in 1947, when the mathematician Warren Weaver first proposed using computers for translation. Over the next few years many scientists tried to implement the idea, and in 1954 it worked: IBM, together with Georgetown University, gave a public demonstration of their experiment.

At that time the technology was only beginning to develop, and there was just one method: direct translation. It had many shortcomings and produced many errors, but it was still a major breakthrough.

Since then, other approaches have emerged that translate texts quickly and as close as possible to how a native speaker would. Let's look at some of them:

  • RBMT (analytical), or rule-based machine translation. One of the very first such technologies. This approach relies on linguistic databases that are as complete as possible: the larger the database, the more accurate the translation. Such databases include dictionaries, reference books, grammar descriptions and information about the regularities of the language. The translation algorithms themselves are no less important. Together, all of this determines the quality of the final translated text.
    The system performs morphological and syntactic analysis and then synthesizes sentences, all in a short time. One of the main disadvantages of RBMT is that it ignores context, because the system strictly follows the rules written into it.

  • CBMT, or corpus-based machine translation. It followed RBMT and originated in the 1980s. This method uses a collection of parallel texts (corpora) in two languages. Unlike RBMT, CBMT relies on collecting and reusing actual translations: the system finds matches across the corpora and, based on them, can translate almost any material. Its weakness is translation quality. The smaller the corpus, the worse the result, because the sample is too small. And if the texts in the corpus contain errors, the system can adopt them, treating them as a model.

  • EBMT, or example-based machine translation. In use since 1984. Unlike the corpus-based translation discussed above, EBMT uses a database of sentences or text fragments that have already been translated. When the system receives a text, it splits it into segments (sentences), searches its database for similar segments, and builds the translation from the matches. EBMT also remembers each new text, forming a translation memory that further expands its database. Its disadvantages are almost the same as those of CBMT: if the system's memory holds too little text, translation quality suffers. It may also struggle with long sentences that have complex grammar.

  • SMT, or statistical machine translation. SMT has several subtypes, but here we will cover only the basics. The idea goes back to Warren Weaver, whom we mentioned at the very beginning, but the computers of his day were not powerful enough to realize it. It became practical in the 1990s. The SMT model is based on probability theory, namely Bayes' theorem: its task is to find the most likely translation of a sentence from one language into another. The more often a translation option appears in the training texts, the more likely it is to be chosen as correct. As with the other systems, the disadvantages are heavy dependence on the volume of texts in the database and difficulty taking context into account. Until 2016, even Google Translate used SMT. If you want to go deeper into this topic, we recommend the book “Statistical Machine Translation” by Philipp Koehn.

  • NBMT, or neural machine translation (more often abbreviated NMT). How does it work? Neural networks are loosely modeled on the way the human brain processes data. This gives NBMT a major advantage over other systems: the ability to take context and grammar into account at a deeper level. Unlike SMT, which relies on how often translation options occur, NBMT analyzes entire sentences and texts in context, which produces more accurate and natural translations.

  • HMT, or hybrid machine translation. This method can combine several of the approaches described above: RBMT, CBMT, EBMT, SMT and NBMT. In the 2010s, Systran became one of the first companies to introduce hybrid MT, combining SMT and RBMT. A particularly important development for hybrid machine translation was the advent of neural networks, which significantly improved the quality of the final translations.
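
To make the SMT item concrete: by Bayes' theorem, the most likely translation e of a source sentence f is the one that maximizes P(e) * P(f | e). Below is a minimal Python sketch of that choice. The function name, the German example and every probability are invented for illustration; real systems estimate these values from large parallel corpora.

```python
def best_translation(source, candidates, language_model, translation_model):
    """Pick the candidate e that maximizes P(e) * P(source | e)."""
    def score(candidate):
        # P(e): how fluent the candidate is in the target language.
        prior = language_model.get(candidate, 0.0)
        # P(f | e): how well the candidate explains the source sentence.
        likelihood = translation_model.get((source, candidate), 0.0)
        return prior * likelihood
    return max(candidates, key=score)


# Toy, hand-written probabilities (assumptions, not taken from any real corpus).
language_model = {"the house": 0.6, "the home": 0.3, "house the": 0.01}
translation_model = {
    ("das Haus", "the house"): 0.7,
    ("das Haus", "the home"): 0.4,
    ("das Haus", "house the"): 0.7,
}

choice = best_translation("das Haus", list(language_model), language_model, translation_model)
print(choice)  # the house
```

The scrambled candidate “house the” matches the source words just as well as “the house”, but its low fluency score P(e) rules it out.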

2. Criteria for comparing online translators

We compared the translators (Yandex Translator, Google Translate and DeepL) on several criteria and summarized the results in one table, so you can weigh all the characteristics and pick the option that suits you.

The table covers:

  • Language support: how many languages each engine offers for translation.

  • Support for rare languages: which translators support rare languages.

  • Cost of paid features: what the paid plans of each translator cost.

  • Integration options: whether the engine supports integration with software systems and platforms.

3. General information about translators

Yandex Translator

Few people know this, but when the service launched in 2011 it offered only three languages: Russian, English and Ukrainian. Today, according to official data, the list includes 96 languages, among them less widely used ones such as Haitian Creole (Haiti), Galician (Galicia) and Malagasy (Madagascar). The developers also added some unusual languages. Since 2016, anyone can translate into Sindarin, the Elvish language invented by J. R. R. Tolkien. A year later, Yandex learned to translate into the language of emoji. For example, this is how it renders the title of our publication:

As for technology, Yandex uses hybrid machine translation (HMT), which combines statistical machine translation (SMT) with neural machine translation (NBMT) using YandexGPT. To combine them, the developers built an algorithm based on the CatBoost learning method. It evaluates multiple candidate translations and shows the one it considers the best fit.

Google Translate

According to the latest data, as of 2024 Google Translate offers translation in 244 languages. The biggest surprise came in June 2024, when Google announced the largest expansion in the service's history: 110 new languages, about a quarter of which are African.

From October 2007 the company used statistical machine translation (SMT), and in 2016 it introduced its own neural machine translation system, GNMT. It incorporates example-based translation (EBMT), which we discussed earlier. Keep in mind that this system is not available for every language.

The image shows the languages for which EBMT translation is supported. The results with this technology are better:

DeepL

DeepL is often compared with Google Translate and Yandex Translator. Since its launch in 2017 it has been recognized for its high-quality translations and has quickly become popular among language professionals. It uses neural machine translation. Its architecture is trained on massive amounts of data, which lets it capture the context and meaning of a text better than traditional statistical or phrase-based machine translation models.

DeepL currently supports about 30 languages, far fewer than the other services, but this has not prevented it from gaining popularity.

Let's move on to comparing the main characteristics of translators.

4. Table

| Criterion | DeepL | Yandex Translator | Google Translate |
|---|---|---|---|
| Language support | 30+ | 90+ | 140+ |
| Integrations | Yes, 700+ integrations | Yes, there are API integrations | Yes, more than 290 ready-made integrations |
| Support for rare languages | No | Yes | Yes |
| Price | Paid for companies, tariffs start from 7.49 € | Paid for integration into applications and web services | For free |

As you can see, DeepL's characteristics make it well suited to enterprise work. Its 700+ integrations let you use it well beyond CAT systems, and everyone can choose a plan that fits their needs. However, it does not support rarer languages. So if you do not need many paid features and rarer languages matter to you, Yandex Translator or Google Translate is the better choice.

5. Testing engines and common mistakes

In June, the company Intento published a full report on the state of machine translation in 2024. You can view it in more detail and download it on the official website. In total, 52 MT engines and LLMs took part, including Google Translate, Yandex Translator and DeepL.

The study showed that GPT-4o and DeepL outperform other machine translation solutions. The analysis tested 11 language pairs in 9 domains, such as finance and legal. Google takes 3rd place, while Yandex lags far behind its competitors in only 14th place:

Comparison of different machine translation systems based on the number of times they performed best for certain language pairs and domains
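
A ranking of this kind can be produced by a simple tally. The Python sketch below assumes that each engine has one quality score per (language pair, domain) combination and counts how often each engine holds the top score. The engine names and scores are placeholders, not Intento's data, and the report's actual scoring method is not reproduced here.

```python
from collections import Counter


def count_wins(scores):
    """Count, per engine, how many (language pair, domain) cells it wins.

    `scores` maps (language_pair, domain) -> {engine: quality score}.
    Engines tied for the top score in a cell each get a win.
    """
    wins = Counter()
    for engine_scores in scores.values():
        top = max(engine_scores.values())
        for engine, score in engine_scores.items():
            if score == top:
                wins[engine] += 1
    return wins.most_common()


# Placeholder scores for hypothetical engines (illustration only).
scores = {
    ("en-de", "legal"): {"engine_a": 0.91, "engine_b": 0.88, "engine_c": 0.80},
    ("en-de", "finance"): {"engine_a": 0.85, "engine_b": 0.90, "engine_c": 0.82},
    ("en-es", "legal"): {"engine_a": 0.93, "engine_b": 0.89, "engine_c": 0.93},
}

print(count_wins(scores))  # [('engine_a', 2), ('engine_b', 1), ('engine_c', 1)]
```

A count of wins rewards engines that are best somewhere, even if they are weak elsewhere, which is worth keeping in mind when reading such a chart.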


Main conclusions of the study:

  • Mistranslations account for 80% of the most common errors.

  • ChatGPT and DeepL showed the best results among the systems tested.

  • The number of errors associated with complex structures that machine systems cannot always process correctly has decreased.

  • Most translation errors involve changes in meaning and incorrect use of words or phrases.

We also decided to supplement this publication with our own experience of working with these translators. For this article, we collected common errors encountered in the translation process and show how each translator behaves in different cases.

While using machine translation, we identified several types of errors:

  • Semantic errors: incorrect word choice or distortion of meaning.

  • Syntactic errors: unnatural constructions that a native speaker would not use.

  • Grammatical errors: violations of agreement.

The first type, semantic errors, occurs most often. These errors stem from the most important problem of any MT engine: the inability to grasp context and naturally choose the right meaning of a word. Fortunately, they are not critical and are easy to spot when skimming the text. For example:

Correct translation of the phrase: “Chips have entered the chuck jaw.” DeepL coped worst here, failing to find a suitable translation for several words at once. Unlike Google Translate and Yandex Translator, it could not translate the words “cartridge” and “shavings”. And none of the engines managed to translate the phrase “chips got in” at all.

The last example of this type of error is a syntactic and lexical calque:

Unfortunately, none of the MT engines translated the sentence correctly. Meaning of the original text: the fashion house has released a collection of clothes and accessories (group) for summer holidays. The engines ignored the logic of sentence construction in the target language and copied the structure of the original.

The second type is syntactic errors. They occur less often than semantic ones but are much more dangerous, because they are hard to detect: the text may look correct on the surface yet contain subtle inaccuracies.

DeepL failed this task completely: it did not take into account that word order in Russian can be either direct or inverted. As a result, it proposed the option “to bring the passport into compliance with some requirements.”

The third type is grammatical errors, within the structure of a language unit. In this case, a violation of agreement:

As we can see, almost every translator we tested has problems with the same types of errors, above all mistranslation. This is confirmed not only by our own experience but also by the Intento analysis discussed above.

Ranked by frequency, the errors are:

  • Mistranslations: more than 80%.

  • Errors in translating idioms.

  • Omitted phrases.

  • Other errors.

6. Conclusions

According to research, the error rate of online translators falls every year, and the advent of neural networks lets the systems learn faster and minimize some types of inaccuracies. However, the technology is still imperfect and cannot translate as well as experienced professionals.

If your goal is to translate simple everyday topics and light conversational dialogues, online tools will do an excellent job, making few mistakes and sparing you awkward situations. For these purposes, any of the systems in today's review will do. In our experience, DeepL performs best.

Here are a few more cases where machine translation might be right for you:

  • General topics: simple sentence structure, without complex turns of phrase.

  • Medicine/pharma: simple documents with a uniform structure and simple terminology. Serious instructions, where a person’s life depends on a correct translation, should only be translated with the help of a human translator.

  • Short manuals and instructions that describe a sequence of actions, without highly specialized terminology.

  • Some educational materials.

But if you need to translate something more serious, it is better to turn to professionals. MT is especially bad at marketing materials, because good copy sometimes breaks the rules a trained machine is used to. As a result, the machine processes the text incorrectly and produces errors.

A complete list of materials that you should not translate with MT on your own:

  • Technical texts with complex terminology.

  • Highly specialized texts: scientific articles, analytical studies, law enforcement standards and requirements, engineering specifications, patent documentation, etc.

  • Literary texts with complex figures of speech.

  • Documents containing classified information.

If you want to save on translation services, there is a way. Many agencies offer PEMT, which combines machine translation with post-editing: an editor eliminates inaccuracies and makes the quality of your text much better. There is also a proofreading service, in which a native speaker checks the text for errors.

Demand for machine translation with editing is growing (at our agency it accounts for 60% of all orders), while there is almost no demand for MT alone. This suggests that the quality of raw MT is still not high, and the texts still need proofreading and editing.

Today, companies very often use machine translation for everyday tasks. But if you need to translate important documents, especially those containing classified information, it is better to contact a translation agency.

Every time you load text into an online translator, the service may store the information that passes through it. The system needs this in order to learn from your texts and improve. Sometimes translators pull such texts out of the system to analyze them and reduce the number of errors. In such cases, there is no guarantee that the information will not “leak” to competitors or be used by third parties. For this reason, it is better to entrust the translation to professionals. You don’t have to worry about confidentiality: translators work in special CAT platforms, and the information never leaves them.

Perhaps in a few years MT will approach the level of native speakers. But in the meantime, the translation of important texts should be trusted only to professionals.

Which translator do you use?
