Updated ruMTEB benchmark and leaderboard

Hello everyone! My name is Roman Solomatin, and I represent the AI-Run team at X5 Tech, where we work on generative networks in general and language models in particular. A few months ago, the Russian-speaking AI developer community received a new evaluation tool: the ruMTEB benchmark (Massive Text Embedding Benchmark). It is designed to assess how well models represent Russian-language texts and allows an objective comparison of embedding models, i.e. models that transform text into numerical vectors, with a focus on Russian (more details in the authors' article). However, the first version of ruMTEB covered only 6 models and had no convenient leaderboard.

We decided to improve the situation and make the benchmark more representative and useful for the community: we tested 20 more models and added a leaderboard tab where you can see the results of each model. This will help developers better navigate model selection for their projects.

Status of the ruMTEB leaderboard as of 09/23/2024

What is vectorization used for?

An embedding model converts text into numerical vectors, which makes it possible to work with text data efficiently (a short sketch follows the list below).

Here are the key areas of its application:

  • Text classification – assigning texts to categories.

  • Semantic textual similarity (STS) – determining how similar two sentences are in meaning.

  • Text clustering – grouping similar texts together.

  • Bitext mining – finding the corresponding sentence in another language, e.g. for machine translation.

  • Search (Retrieval) – finding texts relevant to a query.
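
To make this concrete, here is a minimal sketch of the STS scenario using sentence-transformers; the model name sergeyzh/rubert-tiny-turbo is just an example taken from the recommendations below, and any embedding model from the leaderboard would be used the same way.

from sentence_transformers import SentenceTransformer, util

# Example model; swap in any embedding model from the leaderboard
model = SentenceTransformer("sergeyzh/rubert-tiny-turbo")

sentences = [
    "Кошка спит на диване.",
    "На диване дремлет кот.",
    "Курс акций вырос на 5%.",
]

# Encode texts into fixed-size numerical vectors
embeddings = model.encode(sentences)

# Cosine similarity between vectors reflects semantic similarity (STS)
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)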

For more information on using LLMs and embeddings, read our article “Integrating LLM into Enterprise Chatbots: RAG Approach and Experiments”.

Tested models

We tested the following models:

Results and recommendations

Based on the benchmark, the following conclusions can be drawn:

  • Maximum performance: if you can allocate significant resources, the best choice is intfloat/e5-mistral-7b-instruct. It shows the best results, but requires a lot of memory and compute.

  • Optimal balance: if you have a few free gigabytes of VRAM, it is worth looking at deepvk/USER-bge-m3, BAAI/bge-m3 and intfloat/multilingual-e5-large-instruct (see the loading sketch after this list). They offer a good balance between quality and resource consumption.

  • Resource-efficient solutions: for developers with limited resources, sergeyzh/LaBSE-ru-turbo and deepvk/USER-base are a good fit. These models run on average configurations and provide decent quality of text representation.

  • Minimum requirements: if you have virtually no resources, sergeyzh/rubert-tiny-turbo will be the best choice. It works even on very modest configurations while remaining quite effective.
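
As a quick illustration of how any of these models can be plugged in, here is a hedged sketch of the retrieval scenario with sentence-transformers. deepvk/USER-bge-m3 is used only as an example; the other models load the same way, and instruct-style models such as intfloat/multilingual-e5-large-instruct additionally expect a task instruction prepended to queries, so check each model card.

from sentence_transformers import SentenceTransformer

# Example: one of the "optimal balance" models; any model from the list above works similarly
model = SentenceTransformer("deepvk/USER-bge-m3")

queries = ["Как вернуть товар без чека?"]
documents = [
    "Возврат товара возможен в течение 14 дней при наличии паспорта.",
    "Магазин работает ежедневно с 9:00 до 21:00.",
]

# Encode and rank documents by cosine similarity to the query (Retrieval scenario)
query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)
scores = (query_emb @ doc_emb.T)[0]
print(sorted(zip(scores, documents), reverse=True))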

How do you get on the leaderboard?

If you want to test your model and add it to the ruMTEB leaderboard, follow these steps:

1. Running the model on MTEB

First, you need to run your model on MTEB (Massive Text Embedding Benchmark). This can be done in two ways:

  1. via Python API;

  2. using the command line.

Using the Python API:

import mteb
from sentence_transformers import SentenceTransformer

# Replace with the name of your model on Hugging Face
model_name = "your-username/your-model"
model = SentenceTransformer(model_name)

# Load the Russian-language benchmark tasks and run the evaluation
benchmark = mteb.get_benchmark("MTEB(rus)")
evaluation = mteb.MTEB(tasks=benchmark)
evaluation.run(model, output_folder="results")

Using the command line:

mteb run -m {your_model_name} -t {task_names}

These commands will save the results to the folder results/{model_name}/{model_revision}.
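
If you want to glance at the saved scores before submitting them, a minimal sketch along these lines can help; it only assumes that the results folder contains one JSON file per task, and the path is a placeholder you should adjust to your actual model name and revision.

import json
from pathlib import Path

# Placeholder path: adjust to your actual model name and revision
results_dir = Path("results") / "your_model_name" / "your_model_revision"

for result_file in sorted(results_dir.glob("*.json")):
    with result_file.open() as f:
        data = json.load(f)
    # Print the file name and whatever score payload it contains
    print(result_file.stem, data.get("scores", data))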

2. Formatting the results

Once you have the results, you need to format them to add them to the leaderboard. Use the following command:

mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md

If your model already has a README.md file, you can merge the results with the existing description:

mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md --from_existing your_existing_readme.md

3. Adding metadata to the model repository

Copy the contents of model_card.md to the beginning of the README.md file in your model repository on Hugging Face. This is necessary for your results to be picked up by the leaderboard. See an example README.md file for reference.
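
If you prefer to do this step programmatically rather than through the web editor, a minimal sketch using huggingface_hub could look like this; the repository id your-username/your-model is a placeholder for your own repo, and you need to be logged in (huggingface-cli login).

from huggingface_hub import upload_file

# Placeholder repository id: replace with your own model repo
repo_id = "your-username/your-model"

# Upload the generated model card as the repository README
upload_file(
    path_or_fileobj="model_card.md",
    path_in_repo="README.md",
    repo_id=repo_id,
    repo_type="model",
)

Note that this overwrites README.md in the repository, so it assumes model_card.md already contains your full description (for example, generated with --from_existing).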

4. Waiting for the leaderboard update

After adding the metadata to your model's README.md, just wait for the leaderboard to update automatically. The update happens daily, and your model will appear in the list after the next update.

Conclusion

The new leaderboard and the expanded list of tested models will help you choose the most suitable tool for your tasks, whether you are working on powerful servers or on devices with minimal resources.

Authors: Roman Solomatin, Michiel Egorov
