Prompt Engineering Techniques Guide

  • Used to automatically generate text content (articles, product descriptions, answers to frequently asked questions)

  • Used to analyze and understand natural language, which allows for automation of NLP tasks (classification, entity extraction, question answering)

  • Used to create personalized recommendations and offers for customers based on their preferences and behavior

  • Integrated into various business applications and systems to automate routine tasks (reporting, responding to customer requests, planning)

Today, you don't train an LLM from scratch; you can find an open-source pre-trained or instruction-tuned LLM to your taste and requirements. Adapting an LLM to a specific task may include the following stages: prompt engineering, PEFT (parameter-efficient fine-tuning), and full fine-tuning. Usually, PEFT is applied right after creating MVP prompts and evaluating the initial LLM, but PEFT is more demanding: it requires sufficient data for additional training, high-quality labeling (preferably written by annotators and validated by the business), concise and understandable prompt instructions, and GPU resources for the computations.

Yet surprisingly little time is usually devoted to prompt engineering, even though it can produce concise and effective instructions for potential additional training of the model and thereby reach the minimum acceptable metric values for a good MVP.

In this guide we will talk about (1) preparation for prompt engineering, (2) the basic principles of writing a prompt, its structure, and the types of tasks to be solved, and (3) advanced techniques that use reasoning to improve answer quality and reduce the likelihood of hallucinations.

The guide is based on open sources and my personal experience.

Preparing for Prompt Engineering

Choosing an LLM

LLMs have been updated rapidly lately, with new models appearing all the time. I usually pick models on Papers with Code and extractum.io, and I also follow HuggingFace updates and Telegram DS channels.

Look at the metric values on benchmarks such as MMLU and GLUE – they test the model's ability to solve basic language understanding tasks.

The selection of NLP models is usually based on the following criteria.

Model size

Usually models from 7B to 70B are considered. The choice depends on (1) the amount of available compute and the ability to shrink the model (through quantization, pruning, etc.), (2) inference speed (the smaller the model, the faster you get an answer), and (3) answer quality (the larger the model, the better the answer).

In my practice I use this miracle calculator from HuggingFace, which helps estimate in advance how much GPU or CPU memory the model will occupy, taking reduced numerical precision (int4, int8) into account, before loading the model into your workspace.
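If you just need a quick sanity check, the same estimate can be done by hand: the weights alone take roughly the number of parameters times the bytes per parameter, plus some overhead for the inference framework and activations. A minimal Python sketch (the 20% overhead factor is my own assumption, not a HuggingFace number):

# Rough estimate of memory needed to load model weights at a given precision.
# The overhead factor is an assumption; real usage depends on the framework.
def estimate_model_memory_gb(num_params, bytes_per_param, overhead=1.2):
    return num_params * bytes_per_param * overhead / 1024**3

for name, params in [("7B", 7e9), ("70B", 70e9)]:
    for precision, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name} @ {precision}: ~{estimate_model_memory_gb(params, nbytes):.1f} GB")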

Knowledge of the required languages

Most LLMs were trained on English instructions and texts. Make sure the model supports your language and can generate answers in it. New multilingual models have been coming out recently, so there will be plenty to choose from.

There is no point in translating a prompt from your language into another. First, you would have to check the quality of the translation. Second, there is a study from Google – in short: write in your language, there is no need to translate the prompt.

List of NLP tasks the model was trained on

Check that this list covers the type of ML task you have selected. We will talk about this below.

Reducing the technical specification to an ML task

First, you need to define the problem statement. This step is good practice for understanding the wording of the technical specification, and it determines the format of the request and answer in future prompts. The goal is simple: to be able to explain your problem to yourself and to the LLM. Current LLMs were trained on many different types of NLP tasks and subtasks, which is what allows the model to adapt to new requests.

It is best to reduce your problem to well-known task types, such as:

  • Classification

  • Summarization

  • QA (Question and Answer)

  • Translation (human languages, text-to-code)

  • Named Entity Recognition (NER)

Let's create an MVP prompt

Prompt structure

Back in the days of active GPT-2 use, the prompt usually consisted of a prefix, for example: “summarize/translate this text ...”, “answer ...”. Today, when we talk about a prompt, we mean a full-fledged instruction in the human sense of the word.

Slide 1. Example of modern instructional prompts

Let's talk about the classification of prompts.

System prompt – a prompt that defines the model's behavior and how it should communicate with users. Here we set the role and immerse the model in the context of the task domain, that is, the kinds of requests it will have to handle. Do not ignore this type of prompt: it will help achieve better results than the default system prompt declared in the model.

Custom prompt (user prompt) — a prompt in which we formulate our specific request to get an answer. It is better to describe the user prompt in detail and clearly, and for the best result, to follow a structure. Below is the user prompt structure template I adhere to. It includes the following sections:

  1. Context

Describe the input data and what it is.

“You are given a dialogue between Romeo and Juliet, delimited by <dialog> tags”

  2. Request

Specify your main request, what problem the model should solve.

“Find information in the dialogue about Romeo's future intentions and tell what he plans to do next”

  3. Answer format

Specify what the answer should look like (depending on the type of ML problem being solved):

  • brief or detailed

  • what words should it start with

  • how should the model behave if there is no context

"Answer format
- Answer briefly
- Start the answer with 'Romeo plans to'
- Write 'No answer' if you cannot find the information"

  4. Motivation (optional)

The model sometimes responds better if you indicate the importance of the request or motivate it. For example:

  • “Follow the format, conditions, and order of answers!”

  • “This is important for my career!”

  • “THIS IS VERY IMPORTANT TO ME!!!”

  • or just show the model this guy's face and the text from the slide below

Slide 2. I WILL GIVE A MILLION DOLLARS TO ANYONE WHO DECIDES…

The motivation to earn money has a positive effect on quality, and the more money, the better the metrics.
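To keep these sections consistent across experiments, I find it convenient to assemble the user prompt from named parts. Below is a minimal Python sketch of such a template; the function name is mine, and the section texts are the English versions of the Romeo and Juliet examples above:

# Assemble a user prompt from the Context / Request / Answer format / Motivation sections.
def build_user_prompt(context, request, answer_format, motivation=""):
    parts = [context, request, "Answer format:\n" + answer_format]
    if motivation:
        parts.append(motivation)
    return "\n\n".join(parts)

prompt = build_user_prompt(
    context="You are given a dialogue between Romeo and Juliet, delimited by <dialog> tags.\n<dialog>{dialog}</dialog>",
    request="Find information in the dialogue about Romeo's future intentions and tell what he plans to do next.",
    answer_format="- Answer briefly\n- Start the answer with 'Romeo plans to'\n- Write 'No answer' if you cannot find the information",
    motivation="Follow the format, conditions, and order of answers!",
)
print(prompt)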

Trying out prompt variants

There are no universal prompts – each model will respond differently.

Try creating different variations of both individual sections and the prompt structure as a whole. For example, make different combinations of section variants and evaluate each one against your metrics.

Advice: the section variants should differ not only in individual words but also structurally – changing the original prompt by one word makes no sense.

In addition, you can search the Internet for other templates – they will all be more or less similar to each other.
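In practice this turns into a small grid search over section variants. Here is a sketch of such a loop; the variants, the stub generate() function, and the tiny evaluation set are all placeholders for your own model call and labeled data:

from itertools import product

# Hypothetical variants of two prompt sections; replace them with your own.
context_variants = [
    "You are given a dialogue delimited by <dialog> tags.",
    "Below is a dialogue between two characters, wrapped in <dialog> tags.",
]
format_variants = ["Answer briefly.", "Answer in one short sentence."]

def evaluate(prompt_template, eval_set, generate):
    # Exact-match accuracy of model answers against reference answers.
    hits = sum(
        generate(prompt_template.format(dialog=item["dialog"])).strip() == item["reference"].strip()
        for item in eval_set
    )
    return hits / len(eval_set)

# Stub model call and a tiny eval set, only so the sketch runs end to end.
generate = lambda prompt: "No answer"
eval_set = [{"dialog": "...", "reference": "No answer"}]

# Score every combination of section variants and keep the best one.
scores = {
    (c, f): evaluate(c + "\n<dialog>{dialog}</dialog>\n" + f, eval_set, generate)
    for c, f in product(context_variants, format_variants)
}
best = max(scores, key=scores.get)
print(best, scores[best])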

Improving the prompt

Prompt engineering does not stand still. In recent years, new types of prompts have appeared that work through the model's own “reasoning”.

Chain-of-Thought (CoT)

Chain-of-Thought (CoT) is a method that aims to encourage the model to reason logically and explain its steps when solving a problem. It differs from standard prompting, which only asks for an answer, not an explanation of the solution process. It is enough to show an example of reasoning in a few-shot example for the model to produce similar reasoning for your request.

Slide 3. Example of CoT work in comparison with one-shot

In the paper, the authors compare plain one-shot prompting with one-shot CoT. They showed its effectiveness on math word problems stated in text – where, without reasoning, the model tends to get the order of arithmetic operations wrong and fails to reach the correct answer.
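For illustration, a one-shot CoT prompt can be built like this; the worked example below is the arithmetic one popularized by the original CoT paper:

# One-shot CoT: a single solved example demonstrates the reasoning style,
# and the model is expected to reason the same way about the new question.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"
)
print(cot_prompt)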

Zero-shot CoT

This approach is an extension of the original CoT – instead of using a shot example, we use a trigger phrase to invoke the model's reasoning. This is the famous phrase “Let's think step by step”.

In the paper, the trigger phrase is used in the prompt as the beginning of the model's response. Today, this wording can be placed in the instructions themselves, for example, in the “answer format” section.

Slide 4. An example of comparison of Zero-shot CoT with other approaches

Moreover, there was a study that searched for other trigger phrases (slide below). Use a trigger phrase tailored to your task and language.

Slide 5. Using different trigger phrases and their effectiveness
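In code, the difference from few-shot CoT is literally one line: append the trigger phrase to the question (or put it into the answer format section). A minimal sketch:

# Zero-shot CoT: no worked example, just a trigger phrase that invites reasoning.
question = "When I was 6 my sister was half my age. Now I'm 70, how old is my sister?"
trigger = "Let's think step by step."
zero_shot_cot_prompt = f"Q: {question}\nA: {trigger}"
print(zero_shot_cot_prompt)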

Prompt Chaining

If your task is too complex to describe in one call, it is better to decompose it into subtasks.

Once these subtasks are defined, the LLM receives a request for one subtask, and its response is then used as input for the next request. This is called a “prompt chain”: the task is broken down into subtasks to create a chain of prompt operations.

This approach increases the transparency of your solution, as well as its controllability and reliability. It means you can more easily debug problems with model responses, and analyze and improve performance at the specific stages that need it.

In my experience, the following chain works well (a code sketch follows the prompts):

Prompt 1. Summarize the document.
We filter out unnecessary information, leaving only the context we need.

"You are given part of a dialogue between Romeo and Juliet, delimited by <dialog> tags. Write a summary of the dialogue, including Romeo's further actions and plans."

Prompt 2. Make the target request.

"In the summary delimited by <summary> tags, find information about Romeo's further plans, if there is any"

Prompt 3 (optional). Confirm the extracted answer.

This helps filter out hallucinations (falsely extracted answers).

"You are given a dialogue between Romeo and Juliet and the following statement about Romeo's plans: {statement}. Does this statement follow from the dialogue?"

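Wired together, the chain might look like the sketch below; generate() stands in for whatever client you use to call the model, and the prompts are the English versions of the three above:

# Prompt chaining: the output of each step is substituted into the next prompt.
def answer_about_romeo(dialog, generate):
    # Step 1: summarize, keeping only the context we need.
    summary = generate(
        "You are given part of a dialogue between Romeo and Juliet, delimited by <dialog> tags. "
        "Write a summary of the dialogue, including Romeo's further actions and plans.\n"
        f"<dialog>{dialog}</dialog>"
    )
    # Step 2: the target request, asked against the summary.
    statement = generate(
        "In the summary delimited by <summary> tags, find information about Romeo's further plans, if there is any.\n"
        f"<summary>{summary}</summary>"
    )
    # Step 3 (optional): verify the extracted answer against the original dialogue.
    verdict = generate(
        "You are given a dialogue between Romeo and Juliet and the following statement "
        f"about Romeo's plans: {statement}. Does this statement follow from the dialogue?\n"
        f"<dialog>{dialog}</dialog>"
    )
    return statement if verdict.strip().lower().startswith("yes") else "No answer"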
Self-Consistency

Among the interesting techniques, I would like to mention self-consistency. The idea is to use few-shot CoT to sample many different lines of reasoning and then pick the most appropriate answer as the most frequent one among them.

Example of a prompt:

Q: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent 5 * 3 = $15. So she has $23 - $15 = $8 left. The answer is 8.
Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A:

Answer 1: When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.

Answer 2: When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.

Answer 3: When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.

In practice, this helps improve the quality of the answers. The downside is that the LLM is called multiple times for a single document.
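A sketch of the voting step; it assumes generate() samples a new completion on every call (temperature above zero) and that the final numeric answer can be pulled out of the completion text:

import re
from collections import Counter

def self_consistency_answer(prompt, generate, n_samples=5):
    # Sample several reasoning paths and return the most frequent final answer.
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt)  # should be stochastic (temperature > 0)
        numbers = re.findall(r"-?\d+", completion)
        if numbers:
            answers.append(numbers[-1])  # take the last number as the final answer
    return Counter(answers).most_common(1)[0][0] if answers else "No answer"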

Self-discover

Relatively recently, Google proposed another promising approach to prompt engineering. The idea is this: let the model decide for itself how to reason about your task.

The problem is solved in three steps.

Step 1. Select

First, the model selects which reasoning modules are best suited to the task at hand.

Slide 6. Prompt for the Select stage

Step 2. Adapt

The selected reasoning modules are adapted by the model to the task domain.

Slide 7. Prompt for the Adapt stage

Step 3. Implement

The list of adapted reasoning modules is then used to generate an answer.

Slide 8. Prompt for the Implement stage

This approach has already been implemented in the self-discover framework, which ships with a list of 39 reasoning modules. You can extend this list with your own modules or with ready-made ones taken from Promptbreeder by Google DeepMind.
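As a rough illustration, the three stages can be wired up as a plain prompt chain; the stage prompts below are my paraphrases of the slides, the module list is a small illustrative subset rather than the full 39, and generate() is again a placeholder for your model call:

# Self-discover as three chained prompts: SELECT, ADAPT, IMPLEMENT.
REASONING_MODULES = [
    "Break the problem down into smaller subproblems.",
    "Think about the problem step by step.",
    "Look at the problem from a critical, skeptical point of view.",
]  # illustrative subset; the framework itself ships 39 modules

def self_discover(task, generate):
    modules = "\n".join(f"- {m}" for m in REASONING_MODULES)
    # Step 1: SELECT the reasoning modules relevant to the task.
    selected = generate(
        f"Select the reasoning modules that are most useful for solving this task.\nTask: {task}\nModules:\n{modules}"
    )
    # Step 2: ADAPT the selected modules to the task domain.
    adapted = generate(
        f"Adapt the selected reasoning modules to the specifics of this task.\nTask: {task}\nSelected modules:\n{selected}"
    )
    # Step 3: IMPLEMENT the adapted reasoning plan to produce the answer.
    return generate(
        f"Using the adapted reasoning plan below, solve the task and give the final answer.\nTask: {task}\nPlan:\n{adapted}"
    )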

Conclusion

These are not all the advanced techniques I would like to tell you about. Let's summarize the steps of prompt-engineering.

Prepare for Prompt Engineering

  • Formulate your technical task as an ML task

  • Select a few LLMs to experiment with

  • Define target response evaluation metrics

Write an MVP prompt

  • Create your first instruction following the prompt structure.

  • Experiment with different prompt options

Improve the prompt through reasoning

Thanks for your attention! Write in the comments whether this guide was useful to you, and share your experience with prompt engineering.
