GIGA R&D DAY: conference materials

At the recent GIGA R&D DAY conference, organized by the SberDevices R&D team, participants discussed the latest advances in the development of GigaChat, NLP, Vision, and Audio.

The event brought together leading specialists and experts in artificial intelligence, who shared their ideas and developments. On GigaChat's birthday, we are sharing video recordings of the talks and their slide decks, which cover a wide range of topics, from multimodality and multi-expertise to alignment and speech generation.

R&D GigaChat: directions and focuses

Valery Ternovsky and Alexander Kapitanov reviewed the key areas of experimentation in GigaChat NLP. They discussed pretraining recipes, multi-expertise, multi-agent approaches, and GigaQ*, and also covered multimodality: images, video, sound, 2D/3D, and image manipulation.

Slides

Research in Alignment GigaChat

Nikita Sidorov shared his experience of bringing research solutions into GigaChat development and talked about how the team works on its alignment.

Slides

GigaSearch or RAG on GigaChat

Prokhor Gladkikh described the development and implementation of GigaSearch, a Retrieval-Augmented Generation system built on GigaChat. He elaborated on the challenges the team faced and demonstrated gains in quality metrics for answering factual questions.
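The general shape of such a pipeline can be sketched in a few lines. This is a minimal illustration of Retrieval-Augmented Generation, not the GigaSearch implementation: the overlap-based scoring and the prompt template are placeholder assumptions standing in for a real search backend and prompting scheme.

```python
# Minimal RAG sketch: retrieve the top-k text chunks by a toy
# word-overlap score, then compose a grounded prompt for the LLM.
# Illustration only, not the actual GigaSearch implementation.

def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Compose the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\nQuestion: {query}"

docs = [
    "The Volga is the longest river in Europe.",
    "Moscow is the capital of Russia.",
    "Python was created by Guido van Rossum.",
]
print(build_prompt("What is the longest river in Europe?", docs))
```

In a production system the toy `score` function would be replaced by a real retriever (for example, a full-text or vector index), which is where the chunking and ranking challenges discussed in the talk come in.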

Slides and Q&A

Question: Why do you use Open Search? Have you tried other systems?

  • The search team already had extensive experience with this system, which is why they chose it. There have also been experiments with other systems.

Question: How do you filter out provocative topics? Politics, drugs, etc.

Question: Can you tell us more about the database for extraction – why the approach based on chunks is used? Have you tried the approach on knowledge graphs?

Question: How do you measure the quality of the model and changes in production metrics?

  • Before each release there is quality control, including via automatic metrics. Slices of production traffic are also annotated.

Question: GigaChat does not adhere well to the output format; even with JSON it does not always follow the format. How can I get GigaChat to call GigaFunctions correctly?

  • The quality of a function call depends strongly on how detailed and precise the descriptions of the function, its parameters, and its output parameters are. It is worth investing effort in this direction.
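To illustrate that advice, a function declaration can spell out the function, every parameter, and the return value explicitly. The schema below is a hypothetical example in the common JSON-Schema-style function-calling format, not the actual GigaFunctions API; the function name and fields are invented for illustration.

```python
# Hypothetical function declaration for an LLM function-calling API.
# Detailed, unambiguous descriptions of the function, its parameters,
# and its return value are what the answer above recommends.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city. "
                   "Call this whenever the user asks about weather conditions.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name in nominative case, e.g. 'Moscow'.",
            },
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units; defaults to 'celsius'.",
            },
        },
        "required": ["city"],
    },
    "return_parameters": {
        "type": "object",
        "properties": {
            "temperature": {"type": "number", "description": "Current temperature."},
            "conditions": {"type": "string", "description": "e.g. 'light rain'."},
        },
    },
}

def validate_call(schema: dict, args: dict) -> bool:
    """Check that a model-produced call supplies every required argument
    and uses only declared parameter names."""
    required = schema["parameters"].get("required", [])
    allowed = schema["parameters"]["properties"].keys()
    return all(r in args for r in required) and all(a in allowed for a in args)

print(validate_call(get_weather, {"city": "Moscow"}))   # True
print(validate_call(get_weather, {"units": "celsius"})) # False: 'city' missing
```

Validating model-produced arguments against the schema on the application side is a cheap complement to improving the descriptions themselves.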

Question: Are relevance and credibility assessed independently or is credibility assessed only for what is relevant?

Question: Are anaphora resolved by GigaChat itself, or is there a separate model?

mGPT models for the minority languages of Russia

Igor Churin and Maria Tikhonova spoke about new experiments with the multilingual mGPT model presented at the EMNLP conference. Particular attention was paid to 23 mGPT fine-tunes on monolingual corpora of languages of the minority peoples of Russia and the CIS countries. This set provides a unique opportunity to harness the power of language models for low-resource languages.

Slides and Q&A

Question: How long did it all take you?

  • Fine-tuning mGPT for up to 250k steps takes about 3 days on an A100 with 80 GB of memory; training all the models to the plateau took about 2 months.
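Those numbers are roughly consistent with sequential training on a single GPU, as a quick back-of-the-envelope check shows (the assumption that the runs were strictly sequential on one A100 is ours, not stated in the answer):

```python
# Sanity check on the timeline above: 23 monolingual fine-tunes at
# ~3 days each comes out to roughly two months if the runs are
# executed one after another on a single GPU.
n_models = 23
days_per_model = 3
total_days = n_models * days_per_model
print(total_days)                 # 69 GPU-days
print(round(total_days / 30, 1))  # ~2.3 months
```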

Question: What use do you see for this model other than a translator?

Question: What can you say about the quality of your models on languages with different features (synthetic/analytical, internal inflection, hallucinations, etc.)?

  • We have not yet studied how the model's performance depends on the mentioned language features. However, we did look at the relationship with the writing system and found a strong dependence on whether a language uses the Latin alphabet. There is also a relationship between the size of a language's training corpus and the quality in that language.

Question: What was the size of the token dictionary on the basis of which you were running cross-entropy loss? Were any hacks used like adaptive softmax, negative sampling, etc.?

  • The model's vocabulary is 100k tokens. We used a single tokenizer for all languages. During training we tried to follow the classic approach from the original GPT-3 paper, without the mentioned techniques. Detailed information on training and the hyperparameters used can be found in the original article: https://arxiv.org/abs/2204.07580.
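For reference, the "classic" setup mentioned above means computing cross-entropy loss with a full softmax over the whole vocabulary, with no adaptive softmax or negative sampling. A minimal sketch of that computation (with a toy 5-token vocabulary standing in for mGPT's 100k):

```python
import math

def cross_entropy(logits: list[float], target: int) -> float:
    """Full-softmax cross-entropy: -log p(target), normalizing over the
    entire vocabulary (no adaptive softmax or negative sampling)."""
    m = max(logits)                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    log_z = m + math.log(sum(exps))  # log of the partition function
    return log_z - logits[target]    # -log softmax(logits)[target]

# Toy example with a 5-token "vocabulary"; mGPT normalizes over 100k tokens,
# which is why the full softmax dominates training cost at that scale.
logits = [2.0, 0.5, 0.1, -1.0, 0.3]
print(round(cross_entropy(logits, target=0), 4))  # ≈ 0.4732
```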

Practical aspects of ranking responses from Salyut virtual assistants

Artem Snegirev spoke about methods for ranking responses from virtual assistants. He shared his experience working with data, methods for improving response quality, and memory and time optimizations.

Slides

MERA: a benchmark for evaluating fundamental models

Alena Fenogenova, Albina Akhmetgareeva and Maria Tikhonova spoke in detail about the methodology of the MERA benchmark and its features, and also analyzed 21 tasks assessing the model's skills, including common sense, goal setting, logic, world knowledge, memory, mathematics, ethics, and much more.

Slides

SAGE v1.1.0: multilingual spelling and punctuation correction

Nikita Martynov spoke about transformer models for spelling correction in Russian and English, which outperform open spell checkers (Yandex.Speller, JamSpell, Hunspell) and proprietary models (GPT-3.5, GPT-4). Nikita also described updates to the SAGE library: expanded markup in datasets, a metric that takes into account different aspects of spelling, and additions to the family of open pre-trained models.

Slides

Panel discussion: GPT-5, how to catch up and overtake Western competitors in Russian realities

Sergey Markov, GigaChat Research and R&D Program Manager.
Konstantin Krestnikov, leader of the GigaChain project (GigaChat SDK), AI agents ambassador.
Ivan Oseledets, CEO of AIRI, professor at Skoltech.
Tatyana Shavrina, open-source LLM enthusiast, Senior Research Fellow, Institute of Linguistics, RAS.
Denis Dimitrov, Kandinsky project leader, AIRI scientific consultant.

How to teach a model to understand sign language

Alexander Nagaev spoke about the key features of sign language and the main problems that arise when translating it. Computer vision technologies were presented, the differences between the tasks of gesture recognition and sign speech translation were described, as well as the specifics of the data for solving these problems.

Slides

Generative 3D, fast synthesis and reconstruction of 3D objects

Mikhail Mazurov: The study of diffusion models has opened up the possibility of transferring textual concepts onto a digital canvas. What more could one want? Bring it all into 3D! We will find out how to create almost any object in 3D space using neural networks, how to make Kandinsky look around a corner, and whether a future like Ready Player One awaits us.

Download presentation

Quiet! Now there will be that same scene: how to automatically find the most catchy moments in a video

Marina Bessmertnaya spoke about an automated pipeline for analyzing video content. Her team created a system that works with natural language queries to identify interesting moments in videos.

Slides

LLM approaches in speech synthesis

Boris Zhestkov discussed the problems of speech generation with LLMs and considered the potential and limitations of these architectures and the application of LLMs to various tasks in the speech domain: architectures, audio tokenization, and data collection and validation pipelines.

Download presentation

Control of speech characteristics in the speech synthesis model and instructional data

Artemy Tarazanov presented a method for representing speech characteristics that allows you to control the tempo, tone, energy, expression and articulation of speech in a speech synthesis model based on FastSpeech architectures. He shared approaches to creating an instructional dataset for speech synthesis using LLM.

Download presentation

If you can't say it, sing it! Synthesis of singing at the touch of a button

Maxim Smolyakov spoke about vocal synthesis and generation of singing with text accompaniment.

Download presentation


We are grateful to all the SberDevices R&D experts for their contribution and willingness to share knowledge and experience. We invite you to the Salute AI Telegram channel, where SberDevices ML specialists share their work in NLP, CV, Speech, and other areas.

Be sure to come to future SberDevices events!
