Managing multi-task learning with independent models


Introduction

Multi-task learning is a machine learning approach in which a single model is trained on several tasks at once, in contrast to conventional single-task training, where the model is trained on only one task. Multi-task learning can offer greater efficiency and better generalization, because the model learns representations shared across tasks.

However, multi-task training can create problems for model management, such as:

  • The complexity of training and testing the model on several tasks

  • Lack of a common metric to evaluate model performance on all tasks

  • Lack of a way to track and compare the results of different experiments

  • Difficulty in interpreting and explaining the model on multiple tasks

MLflow provides a number of tools for managing multi-task learning, including:

  • Experiment Tracking: Track and compare the results of different experiments on each task

  • Packaging and Serving Models: packaging the models for easy sharing and deployment

  • Integration with popular frameworks and cloud platforms for managing multi-task learning

MLflow thus addresses these problems and enables more efficient management of multi-task learning: tracking and comparing experiments, packaging models, and integrating with other tools and platforms.

Multi-task learning with independent models

What is the difference between multi-task learning with independent models and conventional multi-task learning? In conventional multi-task learning, a single model is trained on multiple tasks at once, sharing the same model across tasks. The idea sounds appealing, but it does not always work well in practice. Sometimes training a separate model for each task works better, because the gradient updates for different tasks can interfere with or contradict each other, making joint training inefficient. This is called negative transfer.

In addition, separately trained models are easier to manage. Multi-task training with independent models makes it possible to track and compare results, and to package and deploy a separate model for each task individually. Let's look at this in practice.

Exercise

Let's take a ready-made example problem and solution, Multi-Task Learning for Classification with Keras, and change part of the pipeline and code to switch from one shared model to two independent models, one per task.

Multiclass classification problem: the model must classify the image into one of several categories of animals and vehicles (cat, dog, horse, car, truck, and others).

Binary classification task: the model should determine if the image is just an animal or a vehicle.

Loading a data set

First we take a well-known ready-made dataset; to keep the code simple, we import the TensorFlow framework and get the already split training and test data:

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

CIFAR-10 contains 60,000 images of size 32x32x3 in 10 classes: 0 – airplane, 1 – car, 2 – bird, 3 – cat, 4 – deer, 5 – dog, 6 – frog, 7 – horse, 8 – ship, 9 – truck.
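A quick sanity check of the split (CIFAR-10 ships as 50,000 training and 10,000 test images, with labels of shape (n, 1)):

print(x_train.shape, y_train.shape)   # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)     # (10000, 32, 32, 3) (10000, 1)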

Preprocessing

Next, a helper function generates the binary labels. By converting the single set of labels into two, we get two datasets: one for binary classification (animals vs. vehicles) and one for multi-class classification (10 classes).

def generate_binary_labels(yy, animal_classes):
  # 0 for animal classes, 1 for vehicle classes
  yy_2 = [0 if y in animal_classes else 1 for y in yy]
  return yy_2

# 0 = animal, 1 = vehicle (y_train / y_test are the raw CIFAR-10 labels here)
y_train_2 = generate_binary_labels(y_train, [2, 3, 4, 5, 6, 7])
y_test_2 = generate_binary_labels(y_test, [2, 3, 4, 5, 6, 7])

For multi-task learning it is important that the training labels are specific to each task: with n tasks, n different label arrays are defined. Here the first task needs integer labels from 0 to 9 (one per class, i.e. multi-class classification), and the second task needs labels 0 and 1 (binary classification). The data already comes with labels 0 to 9, and the binary labels are generated from them: if an instance is an animal (2 – bird, 3 – cat, 4 – deer, 5 – dog, 6 – frog, 7 – horse), it gets label 0; if it is a vehicle (0 – airplane, 1 – car, 8 – ship, 9 – truck), it gets label 1.

from tensorflow.keras.utils import to_categorical

def preprocess_data_cifar10(x_train, y_train_1, x_test, y_test_1):

    # Build the binary labels from the original 0-9 labels: 0 = animal, 1 = vehicle
    y_train_2 = generate_binary_labels(y_train_1, [2, 3, 4, 5, 6, 7])
    y_test_2 = generate_binary_labels(y_test_1, [2, 3, 4, 5, 6, 7])

    # One-hot encode the labels of both tasks
    n_class_1 = 10
    n_class_2 = 2
    y_train_1 = to_categorical(y_train_1, n_class_1)
    y_test_1 = to_categorical(y_test_1, n_class_1)
    y_train_2 = to_categorical(y_train_2, n_class_2)
    y_test_2 = to_categorical(y_test_2, n_class_2)
    return x_train, y_train_1, y_train_2, x_test, y_test_1, y_test_2


x_train, y_train_1, y_train_2, x_test, y_test_1, y_test_2 = preprocess_data_cifar10(x_train, y_train, x_test, y_test)

After the binary labels are generated, to_categorical is applied. This is a function from TensorFlow's keras.utils module that converts class labels to a categorical (one-hot) format: a matrix of zeros and ones in which each row corresponds to one example and the index of the 1 corresponds to that example's class. Here it transforms y_train_1 and y_test_1 into a categorical format with n_class_1 = 10 classes, and y_train_2 and y_test_2 into a categorical format with n_class_2 = 2 classes. This format is required for training.

As a result, the function returns the training and test data plus two label arrays for the two classification problems: multi-class classification with labels 0 to 9, and binary classification with labels 0 and 1, where 0 corresponds to animals and 1 to vehicles.
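A quick check of the resulting label shapes (a small sanity-check snippet; the expected shapes follow from applying to_categorical to the 50,000/10,000 split):

print(y_train_1.shape, y_train_2.shape)   # expected: (50000, 10) (50000, 2)
print(y_test_1.shape, y_test_2.shape)     # expected: (10000, 10) (10000, 2)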

Training the first and second models

We create two convolutional neural network models separately. For comparison, the conventional multi-task architecture from the article Multi-Task Learning for Classification with Keras has one image input and two outputs (2 classes and 8 classes), while here the two independent models output 2 classes and 10 classes.

Model creation function for the multi-class classification convolutional neural network (model1):

def create_task_learning_model(x_train_shape, y_train_shape):

    inputs = tf.keras.layers.Input(shape=(x_train_shape[1], x_train_shape[2], x_train_shape[3]), name="input")

    # Convolutional feature extractor
    main_branch = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=1)(inputs)
    main_branch = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(main_branch)
    main_branch = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=1)(main_branch)
    main_branch = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(main_branch)
    main_branch = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=1)(main_branch)
    main_branch = tf.keras.layers.Flatten()(main_branch)
    main_branch = tf.keras.layers.Dense(3512, activation='relu')(main_branch)

    # Multi-class head: softmax over y_train_shape[1] classes (10 for CIFAR-10)
    task_1_branch = tf.keras.layers.Dense(1024, activation='relu')(main_branch)
    task_1_branch = tf.keras.layers.Dense(512, activation='relu')(task_1_branch)
    task_1_branch = tf.keras.layers.Dense(256, activation='relu')(task_1_branch)
    task_1_branch = tf.keras.layers.Dense(128, activation='relu')(task_1_branch)
    task_1_branch = tf.keras.layers.Dense(y_train_shape[1], activation='softmax')(task_1_branch)

    model = tf.keras.Model(inputs=inputs, outputs=[task_1_branch])
    model.summary()
    return model

After the model is created, model.compile() is used to configure it for training.

model = create_task_learning_model(x_train.shape, y_train_1.shape)
model.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=['accuracy'])

Here the Adam optimizer is used, the categorical_crossentropy loss function for multi-class classification, and the accuracy metric for assessing model quality.

Next, model.fit() trains on the x_train and y_train_1 training data for 50 epochs with batch_size=128. Progress output is suppressed by the parameter verbose=0.

model1_history = model.fit(x_train, y_train_1, epochs=50, batch_size=128, verbose=0)

The training history is stored in the variable model1_history.

The creation function for the binary classification convolutional network model2 is analogous to the one described above, but with a slightly different architecture and different compilation options for the binary task. The differences are:

  • One fully connected layer (the 1024-neuron layer) is removed

  • The output of the convolutional network model is binary

  • binary_crossentropy is used as the loss function for binary classification

def create_task_learning_model(x_train_shape, y_train_shape):

    inputs = tf.keras.layers.Input(shape=(x_train_shape[1], x_train_shape[2], x_train_shape[3]), name="input")

    # Same convolutional feature extractor as in the first model
    main_branch = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=1)(inputs)
    main_branch = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(main_branch)
    main_branch = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=1)(main_branch)
    main_branch = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(main_branch)
    main_branch = tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=1)(main_branch)
    main_branch = tf.keras.layers.Flatten()(main_branch)
    main_branch = tf.keras.layers.Dense(3512, activation='relu')(main_branch)

    # Binary head: two sigmoid outputs (y_train_shape[1] == 2)
    task_2_branch = tf.keras.layers.Dense(512, activation='relu')(main_branch)
    task_2_branch = tf.keras.layers.Dense(256, activation='relu')(task_2_branch)
    task_2_branch = tf.keras.layers.Dense(100, activation='relu')(task_2_branch)
    task_2_branch = tf.keras.layers.Dense(y_train_shape[1], activation='sigmoid')(task_2_branch)

    model = tf.keras.Model(inputs=inputs, outputs=[task_2_branch])
    model.summary()
    return model

model = create_task_learning_model(x_train.shape, y_train_2.shape)
model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=['accuracy'])
model2_history = model.fit(x_train, y_train_2,
                        epochs=50, batch_size=128, verbose=0)

If you run into problems with training determinism, see this Stack Overflow answer; the following approach worked in my practice:

import tensorflow as tf
import numpy as np
import random
import os

SEED = 0

def set_seeds(seed=SEED):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)

def set_global_determinism(seed=SEED):
    set_seeds(seed=seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    tf.config.threading.set_inter_op_parallelism_threads(1)
    tf.config.threading.set_intra_op_parallelism_threads(1)

# Call the above function with the seed value
set_global_determinism(seed=SEED)

What is this for? It is useful when you need to reproduce the results of an experiment or debug a model that uses randomness. Without this code, random numbers are generated from a different seed on each run, so the results differ. If you want to reproduce the same result, you must explicitly set the seed so that the generators start from the same state.

Saving Trained Models

# PATH_MODEL is the path where the model is stored (one per model, e.g. PATH_MODEL_1 / PATH_MODEL_2)
model.save(PATH_MODEL)
new_model = tf.keras.models.load_model(PATH_MODEL)
new_model.summary()

This code saves a trained model to the specified path, then loads it back and prints its summary for verification. The trained model can then be reused without retraining; the same is done for the second trained model.

Overall score

import tensorflow as tf

# Load both trained models
model1 = tf.keras.models.load_model(PATH_MODEL_1)
model1.summary()
model2 = tf.keras.models.load_model(PATH_MODEL_2)
model2.summary()

# Evaluate each model on its own test labels
e1 = model1.evaluate(x_test, y_test_1)
print('Task1 evaluate: ', e1)
e2 = model2.evaluate(x_test, y_test_2)
print('Task2 evaluate: ', e2)

# Combine into multi-task scores
sum_loss = e1[0] + e2[0]
print('Multi task evaluate sum loss: ', sum_loss)
ave_acc = (e1[1] + e2[1]) / 2
print('Multi task evaluate average accuracy: ', ave_acc)

This code loads the trained models from the previously saved paths PATH_MODEL_1 and PATH_MODEL_2 and prints information about them with summary(). The models are then evaluated on the test data (x_test with y_test_1 and y_test_2) using evaluate(), and the total loss and average accuracy are computed. MLflow will display these values. By the way, enable the GPU with export CUDA_VISIBLE_DEVICES='0' if one is available.

Directed acyclic pipeline graph (DAG)

The directed acyclic pipeline graph (pipeline DAG for short) covers everything from loading the dataset to the overall evaluation; the DVC framework is used for orchestration and the DagsHub platform for the visual graph.

The prominent red arrows pointing to stage files (Stage File) in the graph will be discussed later in the MLflow sections.

MLflow: Tracking experiments with independent models

MLflow can be used to manage multi-task learning with independent models just as it is used for conventional multi-task learning. There are already many articles about MLflow on Habr, so I will not write a detailed guide here; I will only show the various experiment parameters and results, without covering how to set up model training.

API code

To track experiments, you only need to add a few lines of code.

The MLflow library is used to track experiments. mlflow.set_experiment("model1") sets the experiment name in MLflow. mlflow.start_run() starts a new run, which is used to track metrics and model parameters. A convenient extra is mlflow.tensorflow.autolog(), which automatically logs metrics and model parameters while the training code runs, although it is not required. mlflow.log_metric("test_loss", e[0]) and mlflow.log_metric("test_acc", e[1]) write the test loss and accuracy metrics to the run.
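Putting the calls described above together, a minimal sketch for the first model might look like this (this is not the original listing; it reuses create_task_learning_model and the data prepared earlier):

import mlflow
import mlflow.tensorflow

mlflow.set_experiment("model1")
mlflow.tensorflow.autolog()   # optional: automatic logging of fit() parameters and metrics

with mlflow.start_run():
    model = create_task_learning_model(x_train.shape, y_train_1.shape)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train_1, epochs=50, batch_size=128, verbose=0)

    # Log the test metrics explicitly, as described above
    e = model.evaluate(x_test, y_test_1)
    mlflow.log_metric("test_loss", e[0])
    mlflow.log_metric("test_acc", e[1])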

For the second model, model2, you add the same code to track its experiments in MLflow. However, each model needs its own experiment, so you change "model1" to "model2" (or any other name) to keep the experiments of the different models separate.

For the overall evaluation of multi-task learning, the code is slightly different:

This code first creates an experiment named main_evaluate and then starts a run using the context manager. Next, model1 and model2 are loaded from the paths PATH_MODEL_1 and PATH_MODEL_2 respectively and evaluated on the test data with model1.evaluate and model2.evaluate. The total loss and average accuracy are then computed and logged with mlflow.log_metric("sum_loss", sum_loss) and mlflow.log_metric("ave_acc", ave_acc).
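Put together, a minimal sketch of that run (assuming the model paths and test arrays defined earlier):

import mlflow
import tensorflow as tf

mlflow.set_experiment("main_evaluate")

with mlflow.start_run():
    model1 = tf.keras.models.load_model(PATH_MODEL_1)
    model2 = tf.keras.models.load_model(PATH_MODEL_2)

    e1 = model1.evaluate(x_test, y_test_1)
    e2 = model2.evaluate(x_test, y_test_2)

    sum_loss = e1[0] + e2[0]
    ave_acc = (e1[1] + e2[1]) / 2
    mlflow.log_metric("sum_loss", sum_loss)
    mlflow.log_metric("ave_acc", ave_acc)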

Experiment tracking interface

To open the MLflow UI, run the "mlflow ui" command in a terminal or console. This starts a web server on the local machine on port 5000; you can then open a browser at http://localhost:5000 to view the MLflow interface. There you can see all the experiments run with MLflow, including the three experiments (one per model plus the overall evaluation), each shown as rows of runs in its view. The first model:

The second model:

And the overall evaluation of training:

The MLflow views show the experiment runs for the two models and the overall evaluation score. In each view, the last two rows show identical results and parameters; this is the effect of the determinism fix described above. Changing the epochs parameter of the two models changes the results in the first two rows of each view.

You should now have an idea of how to use MLflow for multi-task learning and can move on to a more detailed guide. Next, we'll look at serving the models.

MLflow: Serving Models

Model serving is the process of making a trained model available so that it can be used in production. This may include starting a server that hosts the model, keeping it available to clients, and updating the model as needed.

Server start

MLflow can be used to run a server for each independent model. First, turn off the GPU with export CUDA_VISIBLE_DEVICES='' to avoid problems with two models sharing the same graphics card. Then open the MLflow interface, copy the path (file:///<path>/mlruns/run_id/uuid/artifacts/model) to the best model from the "Run Name" column, and use it to start the server with mlflow models serve --no-conda -m file:///<path>/mlruns/run_id/uuid/artifacts/model -h 0.0.0.0 -p 8001. For the second model, run mlflow models serve with the same arguments but on a different port, for example: mlflow models serve --no-conda -m file:///<path>/mlruns/run_id/uuid/artifacts/model -h 0.0.0.0 -p 8002. This lets the second model run without conflicting with the first.

Here is an example run command for the first model:

mlflow models serve --no-conda -m file:///home/yayay/yayay/git/github/mlflow_MTL/src/mlruns/404852075031124987/3e3d293adfcd4c509ce51a445089c417/artifacts/model -h 0.0.0.0 -p 8001

For the second model, use a different path on a different port:

mlflow models serve --no-conda -m file:///home/yayay/yayay/git/github/mlflow_MTL/src/mlruns/909331581180947176/db0f2cbb10a64aeeb768d5408fcb9cca/artifacts/model -h 0.0.0.0 -p 8002

Unfortunately, MLflow does not allow serving each model under its own route on the same port (for example, 0.0.0.0/model1, 0.0.0.0/model2, and so on), as discussed on Stack Overflow. As an alternative, you can use other model deployment tools such as Seldon, FastAPI, and others.
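For illustration only, here is a minimal sketch of the FastAPI alternative: both saved Keras models served on one port under separate routes. The route names, the ImageRequest schema, and the model paths are assumptions made for this example; it could be run with something like uvicorn serve_two:app.

import numpy as np
import tensorflow as tf
from fastapi import FastAPI
from pydantic import BaseModel

# Assumed paths to the models saved earlier
PATH_MODEL_1 = "models/model1"
PATH_MODEL_2 = "models/model2"

app = FastAPI()
model1 = tf.keras.models.load_model(PATH_MODEL_1)   # multi-class model
model2 = tf.keras.models.load_model(PATH_MODEL_2)   # binary model

class ImageRequest(BaseModel):
    instances: list   # nested lists with shape (n, 32, 32, 3)

@app.post("/model1/invocations")
def predict_multiclass(req: ImageRequest):
    x = np.array(req.instances, dtype=np.float32)
    return {"predictions": model1.predict(x).tolist()}

@app.post("/model2/invocations")
def predict_binary(req: ImageRequest):
    x = np.array(req.instances, dtype=np.float32)
    return {"predictions": model2.predict(x).tolist()}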

Validate Requests and Responses

Request and response validation against the models can be done using the requests library, which provides tools for sending HTTP requests and handling server responses.
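The payload j and the label dictionaries used in the snippet below are not shown in the original listing; here is one way they might be built, assuming the scoring server accepts a TF-serving style "instances" payload and that x_test from earlier is available:

# Human-readable names for the outputs of the two tasks (chosen here for illustration)
dict_binary_label = {0: "animal", 1: "vehicle"}
dict_multiclass_label = {0: "airplane", 1: "car", 2: "bird", 3: "cat", 4: "deer",
                         5: "dog", 6: "frog", 7: "horse", 8: "ship", 9: "truck"}

# Take one test image and wrap it in the JSON payload expected by the scoring server
sample = x_test[7]                      # any 32x32x3 image from the test set
j = {"instances": [sample.tolist()]}    # plain lists, since JSON cannot carry numpy arrays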

For example, you can send a POST request to the address of the server where the model is deployed using the code:

import requests
import numpy as np

# Binary model served on port 8002
r2 = requests.post('http://0.0.0.0:8002/invocations', json=j)
print(r2.status_code)
dict_r2 = r2.json()
print(
    "predicted binary labels:",
    dict_binary_label[np.argmax(dict_r2["predictions"][0])]
)

# Multi-class model served on port 8001
r1 = requests.post('http://0.0.0.0:8001/invocations', json=j)
print(r1.status_code)
dict_r1 = r1.json()
print(
    "predicted multiclass labels:",
    dict_multiclass_label[np.argmax(dict_r1["predictions"][0])],
)

The output is status code 200 (HTTP is working) and the responses: animal and horse:

(Screenshot: command-line output)

The whole task and solution are complete, and the full code can be viewed to verify that everything works correctly.
