How to Customize LLM with Supervised Fine-Tuning

In the rapidly evolving field of Natural Language Processing (NLP), fine-tuning has become a powerful and efficient tool for adapting pre-trained Large Language Models (LLMs) to specific tasks. Pre-trained LLMs (such as the GPT family) have shown significant improvements in language understanding and generation. However, these pre-trained models are typically trained on huge amounts of text data using unsupervised learning and may not be optimized for a narrow task.

Fine-tuning addresses this gap by taking advantage of the general language understanding gained during pre-training and adapting it to the target task using supervised learning. By fine-tuning a pre-trained model on a task-specific dataset, NLP developers can achieve impressive results with much less training data and far fewer computational resources than training a model from scratch. Fine-tuning is particularly important for LLMs, because retraining them on an entire dataset is computationally prohibitive.

Comparison of LLM pre-training and fine-tuning

The success of fine-tuning has led to many cutting-edge results across a wide range of NLP tasks and has made it a standard practice in developing highly accurate language models. Researchers and practitioners continue to explore variations and optimizations of fine-tuning techniques to further expand the capabilities of NLP.

In this article, we will take a deeper look at the process of fine-tuning instruction-based LLMs in two different ways: directly with the transformers library, and with the trl module.


Supervised fine-tuning

Supervised fine-tuning is the adaptation of a pre-trained LLM to a specific task using labeled data. The data for fine-tuning is collected from a set of responses that have been manually validated; this is the main difference from unsupervised methods, where the data is not validated in advance. LLM pre-training is usually done unsupervised, while fine-tuning is usually supervised.

In supervised fine-tuning, a pre-trained LLM is fitted to this labeled dataset using supervised learning techniques. The model weights are adjusted based on gradients derived from a task-specific loss function that measures the difference between the LLM's predictions and the reference labels.

The supervised fine-tuning process enables the model to learn task-specific patterns and nuances present in the labeled data. By adapting its parameters according to the distribution of the specific data and the requirements of the task, the model becomes specialized, providing high accuracy in performing the target task.
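To make this concrete, here is a minimal sketch (in PyTorch, assuming a causal LM; Trainer computes this internally) of the kind of task-specific loss involved: cross-entropy between the model's next-token predictions and the reference tokens from the labeled data.

    import torch
    import torch.nn.functional as F

    def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab), the LLM's raw predictions
        # labels: (batch, seq_len), reference token ids from the labeled dataset
        # Shift by one position so that position t predicts token t+1
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = labels[:, 1:].contiguous()
        return F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )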

Let's say we have a pre-trained LLM. In response to the question "I can't log in to my account. What should I do?" it simply answers: "Try resetting your password using the 'Forgot Password' option."

A dry and concise answer from a pre-trained LLM to a support question

Now imagine that you need to create a chatbot for a technical support service. Although the answer shown above may be correct, it is inadequate as a support response, which requires more engagement, a different format, additional contact information, and so on. This is where supervised fine-tuning comes in.

A better response to a technical support question after fine-tuning that meets business requirements

If you provide the model with a set of validated training examples, it will learn to answer prompts and questions better. In the example in the figure above, we taught the model the empathy of a technical support agent.

Here are some of the reasons why you might need to fine-tune your LLMs:

  • Getting better-quality answers that are aligned with your business principles.
  • Feeding the model new specific/sensitive data that was not available during the pre-training phase, so that the LLM adapts to your particular knowledge base.
  • Teaching the LLM to respond to new prompts it has not seen before.

The transformers library

Hugging Face's transformers has by far become the most popular library for training and fine-tuning models, including LLMs. Model fine-tuning has always been one of its core features, available almost transparently through its helper class Trainer.

However, recently, with the release of the trl module for reinforcement learning, a new class, SFTTrainer, was introduced that is more narrowly focused on supervised fine-tuning of LLMs. Let's break down the differences.

Fine-tuning with the Trainer class

The Trainer class simplifies pre-training and fine-tuning of models, including LLMs. It requires the following arguments:

  • a model, loaded using AutoModelWithLMHead.from_pretrained;
  • TrainingArgs;
  • a training dataset and an evaluation dataset;
  • a data collator, which applies various transformations to the datasets. One of these is padding (to create batches of equal length), although that could also be done with the tokenizer. For LLMs, however, the collator plays a different role and is mandatory for this type of training: it prepares the labels for the language-modeling objective, either masking random tokens (masked LM) or shifting them so the model predicts the next token (causal LM).

Fine-tuning a pre-trained LLM using the Trainer class
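A minimal sketch of this setup (the GPT-2 checkpoint and the hyperparameter values are assumptions for illustration; train_dataset, eval_dataset, and data_collator are built in the next snippet):

    from transformers import AutoModelWithLMHead, Trainer, TrainingArguments

    # Newer transformers versions recommend AutoModelForCausalLM instead
    model = AutoModelWithLMHead.from_pretrained("gpt2")

    training_args = TrainingArguments(
        output_dir="./results",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=500,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
    )
    trainer.train()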

eval_dataset and train_dataset are objects of the Dataset class. Datasets can be created from many different formats. In this case, let's assume I have datasets in two txt files, located at <TRAINING_DATASET_PATH> and <TEST_DATASET_PATH>. Then, to get a Dataset for both splits, as well as a Collator, it is enough to do the following:
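One way to do that, sketched here using the legacy TextDataset helper (the GPT-2 tokenizer and the block_size value are assumptions):

    from transformers import AutoTokenizer, TextDataset, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path="<TRAINING_DATASET_PATH>",
        block_size=128,
    )
    eval_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path="<TEST_DATASET_PATH>",
        block_size=128,
    )

    # mlm=False keeps the causal (next-token) objective; mlm=True masks random tokens
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)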

Data preprocessing is a mandatory step before training.

If we want to fine-tune an LLM on instructions, the following datasets may be useful:

  1. GPT4All Dataset: GPT4All (pairs, English, 400k items) – a combination of some subsets of OIG, P3 and Stack Overflow, covering general QA questions and modified creative questions.
  2. RedPajama-Data-1T: RedPajama (PT, mostly English, 1.2 trillion tokens, 5 TB) is a fully open source pre-training dataset based on the LLaMA methodology.
  3. OASST1: OpenAssistant (pairs, dialog, multilingual, 66497 conversation threads) – A large, human-written and annotated, high-quality conversation dataset designed to improve LLM responses.
  4. databricks-dolly-15k: Dolly2.0 (pairs, English, 15K+ elements) is a dataset of human-written prompts and answers, including tasks such as question answering and summarizing.
  5. AlpacaDataCleaned: Alpaca/LLaMA-like models (pairs, English) – a cleaned-up version of Alpaca, GPT_LLM and GPTeacher.
  6. GPT-4-LLM Dataset: several models like Alpaca (pairs, RLHF, English, Chinese, 52k English and Chinese items, 9k unnatural-instruction items) – a dataset generated by GPT-4 and other LLMs to improve pairs and RLHF, including instruction and comparison data.
  7. GPTeacher: (pairs, English, 20k items) – a dataset containing GPT-4 generated tasks, including generative tasks from Alpaca and new tasks like role-playing games.
  8. Alpaca data: Alpaca, ChatGLM-fine-tune-LoRA, Koala (dialogue, pairs, English, 52k items, 21.4 MB) — a dataset generated by text-davinci-003 to enhance the ability of language models to follow human instructions.
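Most of these can be pulled straight from the Hugging Face Hub with the datasets library; for example, a sketch for databricks-dolly-15k (assuming its usual Hub id):

    from datasets import load_dataset

    # dolly-15k rows have instruction / context / response / category fields
    dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
    print(dolly[0]["instruction"])
    print(dolly[0]["response"])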

Fine-tuning using the SFTTrainer class of the trl library

As mentioned above, there is another class, SFTTrainer, which was later added to trl, the Hugging Face library designed for reinforcement learning. Since supervised fine-tuning is the first stage of Reinforcement Learning from Human Feedback (RLHF), the developers decided to split it out into a separate class, adding auxiliary functions that would otherwise have to be implemented manually with Trainer. Let's see what that looks like.

Fine-tuning a pre-trained LLM using the trl.SFTTrainer class
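A minimal sketch, assuming an earlier trl API in which SFTTrainer accepts dataset_text_field and packing-related options directly (newer trl versions move these into SFTConfig):

    from datasets import load_dataset
    from trl import SFTTrainer

    # Any dataset with a raw-text column works; imdb is just an illustration
    dataset = load_dataset("imdb", split="train")

    trainer = SFTTrainer(
        model="gpt2",               # SFTTrainer can also load the model by name
        train_dataset=dataset,
        dataset_text_field="text",  # the column that holds the training text
        max_seq_length=512,
    )
    trainer.train()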

You may not have noticed anything new in the example above. And you would be right, because the SFTTrainer class inherits from Trainer, which you can verify by studying the source code. It requires essentially the same things: a model, a train_dataset, an evaluation dataset, and a collator.

However, the SFTTrainer class adds several features that make training easier when working with LLMs. Let's list them:

  1. Support for peft: SFTTrainer supports the Parameter-Efficient Fine-Tuning (peft) library, which includes LoRA, QLoRA, and so on. LoRA adds adapters whose weights are the only parameters being fine-tuned, while the rest stay frozen; QLoRA is a quantized version of the same idea. Both methods greatly reduce fine-tuning time, which is especially important for LLMs given their high computational costs (a configuration sketch follows the figure below).

Comparison of fine-tuning, Lora and QLora
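As a sketch, a LoRA configuration can be passed straight to SFTTrainer via peft_config (the hyperparameter values here are illustrative, not recommendations):

    from peft import LoraConfig
    from trl import SFTTrainer

    peft_config = LoraConfig(
        r=16,                  # rank of the adapter matrices
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )

    trainer = SFTTrainer(
        model="gpt2",
        train_dataset=dataset,       # as in the previous snippet
        dataset_text_field="text",
        peft_config=peft_config,     # only the adapter weights are trained
    )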

  2. Batch packing: instead of using the tokenizer to pad sentences up to the maximum length supported by the model, packing concatenates multiple inputs together, which increases the effective capacity of each batch (a sketch follows the figure below).

Packed samples 1,2,3
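Conceptually, packing concatenates tokenized samples into one stream, separated by an EOS token, and slices the stream into fixed-length blocks. A rough sketch (the helper below is hypothetical; in trl itself packing is enabled with SFTTrainer(..., packing=True)):

    def pack(tokenized_samples, block_size, eos_token_id):
        # Concatenate all samples into a single token stream, separated by EOS
        stream = []
        for ids in tokenized_samples:
            stream.extend(ids)
            stream.append(eos_token_id)
        # Slice the stream into full blocks; no tokens are wasted on padding
        return [
            stream[i : i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)
        ]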

Fine-tuning was originally a mechanism for adapting transformer architectures. Whether training a language model to predict the next and/or masked token, or training a token-sequence classifier (among other tasks), fine-tuning is supervised, because it requires data verified by human annotators.

In the case of LLMs, their responses usually need to be tailored to the customer's requirements, since the models are typically trained on large amounts of open, publicly available data.

Fine-tuning of an LLM can be done in many different ways. One of the simplest options is the Trainer class from the transformers library, which has long been used for fine-tuning any transformer-based models.

Hugging Face recently released a new library, trl, designed for Reinforcement Learning from Human Feedback (RLHF). One of the main stages of such training is supervised fine-tuning, so the developers created a new class, SFTTrainer, which manages the process and supports parameter-efficient fine-tuning (peft) and packing.

Did you like the article? You can find even more content on data labeling, Data Mining, and ML in our Telegram channel "Where is the data, Lebowski?"

  • How to prepare for data collection so as not to fail in the process?
  • How to Work with Synthetic Data in 2024?
  • What is specific about working on ML projects? And how do you label 1,500 ore bubbles in a single photo without going crazy?

Read all about it in “Where's the data, Lebowski?”
