GigaChat MAX – a new, powerful GigaChat model
When updating the flagship model, we significantly improved the tokenizer: it encodes text more efficiently and handles programming languages and LaTeX. We paid special attention to the latter two: by adding frequent keywords and support for spaces, tabs, and line breaks, we increased the accuracy of handling code data and improved the resulting metrics.
A tokenizer splits text into "pieces" (tokens), which the language model then uses to analyze and generate text. The tokenizer's job is to do this as efficiently as possible, and this efficiency is usually measured either with fertility (the average number of tokens per word) or with the average number of characters per token. Encoding efficiency matters for inference optimization: the model should produce more text for the same number of tokens.
The measurements below demonstrate the improved encoding efficiency of GigaChat MAX compared to its predecessor. In the table, a higher number (stronger text compression) is better.
characters/token:

| model | code_go | code_python | math | LaTeX | text_en | text_ru |
|---|---|---|---|---|---|---|
| GigaChat-Max | 2.920 | 4.247 | 1.271 | 1.794 | 4.377 | 4.085 |
| GigaChat-Pro | 2.466 | 3.552 | 1.283 | 2.146 | 4.069 | 3.571 |
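For orientation, the two metrics mentioned above can be computed as in the minimal sketch below. The token lists would come from a real tokenizer; the example tokenization here is made up for illustration:

```python
def chars_per_token(text: str, tokens: list) -> float:
    """Average characters per token: higher means the tokenizer
    compresses the text into fewer tokens (better for inference)."""
    return len(text) / len(tokens)

def fertility(text: str, tokens: list) -> float:
    """Fertility: average tokens per whitespace-separated word;
    lower is better."""
    return len(tokens) / len(text.split())

# Toy example with a hypothetical tokenization of a code snippet:
text = "def main(): return 0"
tokens = ["def", " main", "():", " return", " 0"]  # made-up split
print(chars_per_token(text, tokens))  # 20 chars / 5 tokens = 4.0
print(fertility(text, tokens))        # 5 tokens / 4 words = 1.25
```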
Fighting cycles in GigaChat
One of the most annoying problems with large language models is looping during response generation. You don't have to look far for an example: the very first comment on one of our articles was about loops 🙂
The problem is common to the technology as a whole and shows up even among the world's industry leaders. The well-known penalties in popular libraries (frequency_penalty, presence_penalty, no_repeat_ngram_size, etc.) do not solve the problem, and in some cases visibly degrade generation.
So we developed and implemented our own anti-loop algorithm, which steers generation and suppresses n-gram repetition much more gently than the mechanisms listed above. The algorithm works dynamically, allowing flexible control over loop suppression during generation.
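Our exact algorithm is not public, but the general idea of a *soft* n-gram penalty can be sketched as follows: instead of hard-banning any token that would repeat an n-gram (as no_repeat_ngram_size does), reduce its logit in proportion to how often that n-gram has already occurred. All names and the penalty schedule below are illustrative assumptions, not GigaChat's implementation:

```python
from collections import defaultdict

def soft_ngram_penalty(generated: list, logits: dict, n: int = 3,
                       penalty: float = 1.5) -> dict:
    """Illustrative soft n-gram penalty (not GigaChat's actual code).

    For the current (n-1)-token context, find every token that has
    already completed this n-gram in `generated`, and subtract a
    penalty proportional to the repeat count instead of banning it."""
    if len(generated) < n - 1:
        return logits
    context = tuple(generated[-(n - 1):])
    counts = defaultdict(int)
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == context:
            counts[generated[i + n - 1]] += 1
    out = dict(logits)
    for tok, c in counts.items():
        if tok in out:
            # the more often the n-gram has repeated, the stronger the damping
            out[tok] -= c * penalty
    return out
```

A dynamic variant could scale the penalty with generation length, or switch the damping off entirely inside code blocks and verbatim-repeat requests.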
Naturally, anti-looping should not always be enabled:
when writing code, key and repeated elements, such as variable and function names, must not be altered;
legal documents that reuse the same terms must not be distorted;
the same applies to facts;
if you ask the model to repeat something, it should repeat it.
We accounted for all these scenarios and seasoned them with a batch of heuristics. Now GigaChat does not repeat itself unless you explicitly ask it to:
Not just for developers
At the beginning of the article, we wrote that the easiest way to use the new GigaChat is through the Telegram bot or the web version. For developers and anyone who wants to customize GigaChat, we suggest using the public API; its documentation is available here.
And for those who need to tailor GigaChat to their needs, we created a convenient interface called Studio. In a couple of clicks, it lets you:
create bots for your tasks;
customize API parameters for yourself (for now these are generation temperature and response length);
test various system prompts;
and much more.
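The two API parameters Studio currently exposes (generation temperature and response length) map onto a chat-completions-style request. The sketch below assembles such a request body; treat the field names, the model name, and the default values as assumptions drawn from the common OpenAI-style schema, and check the official GigaChat API documentation for the exact endpoint, authentication flow, and parameter set:

```python
import json

def build_chat_request(system_prompt: str, user_message: str,
                       model: str = "GigaChat-Max",
                       temperature: float = 0.7,
                       max_tokens: int = 512) -> dict:
    """Assemble a hypothetical chat-completions request body.

    `temperature` and `max_tokens` correspond to the two knobs
    mentioned above: generation temperature and response length."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = build_chat_request(
    "Tu sei GigaChat. Rispondere alle richieste degli utenti.",
    "Ciao! Come stai?",
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```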
To try it out, follow our tutorial: we'll show how to turn GigaChat into an Italian-speaking assistant!
Follow the link and register with Sber ID or any other method convenient for you.
Select “Go to PlayGround”.
Select the Early Access environment, then the model – GigaChat-Max works best. Generation is limited only by your token balance; the Pro and MAX models share one balance, while Lite has a separate one.
Enter the system prompt (in English: "You are GigaChat. Answer user requests.") and we get GigaChat-Italian!
“Tu sei GigaChat. Rispondere alle richieste degli utenti.”
Just one step – and the bot is ready! Try creating your own characters or using GigaChat in more work scenarios. Keep in mind that this feature is still in Beta, so draft the prompt carefully: it must contain detailed, clear instructions, otherwise GigaChat may drop out of the role. GigaChat MAX works best as the base model, but even the smaller models in the line can deliver excellent results if you pay attention to the quality of the instructions.
Instead of a conclusion
In this article, we covered some of the work done and the quality gains of GigaChat MAX. The model surpasses Russian and most foreign counterparts in the clarity and informativeness of its answers. But that's not all: this article describes an intermediate result. GigaChat MAX has not finished its training: right now the model is continuing to train on our clusters to conquer new heights.
And once again, the highlight of the article: the GigaChat team is launching its Telegram channel (subscribe), where we talk more about artificial intelligence, the latest technologies, and how GigaChat gets smarter every day!
Stay tuned! 🙂
Acknowledgments
I would like to sincerely thank:
Our colleagues from the markup, analytics, and alignment groups, whose examples are used above and whose training data make GigaChat better and better.
The community of experts who helped create unique STEM data.
The team of engineers and ML specialists who organized and supported the entire complex process of training the model and delivering it to our users.
And finally, many thanks to Evgeniy Kosarev (@evgenijkkk), Fedor Minkin (@fminkin), Grigory Leleitner (@RunFMe), Nikita Savushkin, Sergey Porkhun, Vladimir Karlov, Victoria Alenchik, Nikita Pastukhov, Daria Latortseva, Eldar Damirov, Konstantin Krestnikov (@Rai220), Sergei Markov (@oulenspiegel), Sergei Averkiev (@averkij), and GigaChat-Max (yes, it too ;)) for writing this text, the many useful edits, and the sensible comments that made the review much brighter and better.