Introduction to Weights & Biases

My name is Igor Stureiko, and I am a teacher and leader of the MLOps course at OTUS. Any machine learning engineer quickly discovers the need to record and structure the experiments they run, and a tool for managing the machine learning life cycle becomes indispensable: tracking experiments, managing and deploying models and projects.

MLOps helps manage this lifecycle by providing a set of best practices, tools, and frameworks that enable teams to develop, test, deploy, and monitor machine learning models in a scalable and reliable manner.

In this article I will briefly introduce such a tool from the company Weights & Biases, which has been undeservedly overlooked in the Russian-speaking community.

Working today requires rapid development and evaluation of models. We have to juggle many components: exploratory data analysis, training different models, combining trained models into ensembles, and so on.

Many components = many places to go wrong = a lot of time spent debugging. You might miss an important detail and have to retrain the model, train on the wrong data (data leakage), or use the wrong model to generate representations.

This is where Weights & Biases comes in. It’s a great MLOps tool: it provides a Dashboard for visualizing metrics and experiment parameters, and a model registry that lets you deploy a model to production with a single command.

  • Dashboard (experiment tracking): Log and visualize experiments in real time = Store data and results in one convenient place. Think of it as an experiment repository.

  • Artifacts (versioning of datasets + models): Store and version datasets, models, and results = know exactly what data the model was trained on (see the sketch below).
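As a minimal sketch of the Artifacts workflow (the project name and file path here are placeholders, not from the article):

import wandb

run = wandb.init(project="my-first-project")

# Describe a dataset artifact; each log_artifact call produces a new
# version (v0, v1, ...) when the contents change
artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_file("data/train.csv")  # hypothetical local path
run.log_artifact(artifact)

run.finish()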

W&B Dashboard

Dashboard (experiment tracking)
Use the Dashboard as a central location to organize and visualize results from your machine learning models.

  • Track metrics;

  • Visualize results;

  • Access results from anywhere;

  • Keep everything in a single cloud environment;

  • Monitor model performance in real time.

Visualization of results
W&B Dashboard supports a wide range of information types – visualize graphs, images, video, audio, 3D objects and more.

In addition, the Dashboard is interactive – by hovering over charts, you can get more options and information.
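As an illustration, here is a minimal sketch of logging rich media alongside scalar metrics; the project name and the random image are stand-ins:

import numpy as np
import wandb

run = wandb.init(project="media-demo")  # hypothetical project name

# An image is logged like any other value; wandb.Audio, wandb.Video
# and wandb.Object3D follow the same pattern
image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
run.log({"sample": wandb.Image(image, caption="random noise")})

run.finish()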

Displaying summary information

Tracking Experiments

Before we look at the main components of W&B, let's start with installation.
You can install it using pip:

pip install wandb

Next, you need to create an account on W&B. After that, you can log in using the following command:

import wandb

wandb.login(key='your_wandb_api_key')

Once you have installed W&B and logged in, you can start tracking your experiments. To do this, initialize a run with the following command:

wandb.init(project="my-first-project", entity="my-company")

The project parameter is the name of the project you are working with on W&B. If the project already exists, it will be used; if not, it will be created. You can also use the entity parameter to specify the username or team under which runs are logged; unlike projects, the entity must already exist on W&B.

Experiments are tracked by logging metrics and parameters. To log a metric, use the wandb.log() method. For example, to log the model's loss, you can use the following command:

wandb.log({"loss": loss})
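Since wandb.log() takes a dictionary, several metrics can be logged in one call, optionally pinned to an explicit step; here loss, acc, and epoch are assumed to come from your training loop:

# log several metrics at once, tied to a specific training step
wandb.log({"loss": loss, "accuracy": acc}, step=epoch)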

Let's put it all together and look at the project code. Since the API key is a unique identifier for each user, I put it separately in the config file. So:

import wandb  # import the wandb library
import random  # import random for generating random numbers
import config  # import the configuration file with the API key

wandb.login(key=config.wandb_key)  # log in to wandb

# set the hyperparameters of the run
epochs = 10
# lr = 0.01

for _ in range(5):
    lr = random.uniform(0.01, 0.02)
    # start logging the results of the experiment
    run = wandb.init(
        # set the project name in the logging system
        project="my-test-project",
        # record the hyperparameters to track
        config={
            "learning_rate": lr,
            "epochs": epochs,
        },
    )

    # add some randomness to our demo example
    offset = random.random() / 5

    # run the model "training"
    for epoch in range(2, epochs):
        acc = 1 - 2**-epoch - random.random() / epoch - offset
        loss = 2**-epoch + random.random() / epoch + offset
        print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
        wandb.log({"accuracy": acc, "loss": loss})

    # close the run so the next loop iteration starts a fresh one
    run.finish()

This code will create 5 runs of our experiment with different values of the lr parameter. Now we need to visually evaluate the results obtained.
In the output, Wandb reports that it has saved the calculation results locally and synchronized them with the Wandb server:

Run data is saved locally in /Users/<user>/Documents/Programming/wandb_intro/wandb/run-20240807_111913-mm1xaqfl

Syncing run still-wind-1 to Weights & Biases (docs)

View project at https://wandb.ai/<user>/my-test-project

We can view the results of the experiments in two ways:

  • locally by running the wandb server;

  • on the wandb portal.

Often, the data we operate with is a trade secret or the organization's policy does not allow us to use the wandb portal to display and track the results. In this case, we can run our own Wandb server locally from a Docker image.

To do this, simply execute the command:

wandb server start

After that, the wandb system will download the server image and run it locally. For more information, please refer to the official documentation: https://docs.wandb.ai/guides/hosting/self-managed/basic-setup
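By default the local server listens on http://localhost:8080; assuming that default, you can point the client at it instead of the cloud:

# log in against the local server instead of wandb.ai
wandb login --host=http://localhost:8080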

We will use the wandb portal, which allows us to display the results of the experiments we have conducted.

In the console, Wandb told us that the portal is available at https://wandb.ai/<user>/my-test-project, where all the detailed information is displayed:

Results of computational experiments

On the left are the runs within the experiment, on the right are the graphs of the monitored parameters. Since we did not assign any names to our runs, Wandb assigned them automatically.
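If you want readable names instead, wandb.init() accepts a name parameter; the naming scheme below is just an example:

run = wandb.init(
    project="my-test-project",
    name=f"baseline-lr-{lr:.4f}",  # hypothetical naming scheme
    config={"learning_rate": lr, "epochs": epochs},
)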

Conclusion

Wandb is a convenient service for tracking the results of computational experiments and logging their parameters. Along with MLflow and DVC Studio, Weights & Biases is a powerful tool whose adoption in daily practice will improve the quality of your work and ensure precise control over the results of your experiments.

The source code used in the article is available on GitLab.

Learn more about the MLOps course.
