Easily create dynamic prompts in Python

This article will be useful for Python developers working with language models (LLMs).

Recently I needed a tool for generating prompts in Python code. I didn't want to use complex solutions, so I created a small library called FlexiPrompt. Here are its main advantages:

  • Easily integrates into existing code

  • Allows you to quickly and flexibly set up a dialogue with LLM

  • Can split one LLM into multiple agents by customizing their communication through templates (see the sketch after the basic example below)

What it looks like in code

Here's a simple example of using FlexiPrompt:

from flexi_prompt import FlexiPrompt

fp = FlexiPrompt()
inner_fp = FlexiPrompt({"another_field1": "nested value1, "})
inner_fp.another_field2 = "nested value2"
inner_fp.another_field1().another_field2()

fp.final_prompt = "Here is: $inner_fp, $some_field, $some_callback"
fp.inner_fp = inner_fp
fp.some_field = 42
fp.some_callback = input  # Example: type "user input"

print(fp.final_prompt().build())  
# Output: Here is: nested value1, nested value2, 42, user input
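
To illustrate the multi-agent point from the list above: each agent can be its own FlexiPrompt with its own template, and the same model is simply called with whichever prompt gets built. The sketch below is hypothetical (the agent names and templates are made up) and uses only the API shown in the example above:

critic = FlexiPrompt()
critic.prompt = "You are a strict reviewer. Critique this draft: $draft"
critic.draft = "My first draft"
critic_request = critic.prompt().build()  # send this to the LLM as the "critic" agent

writer = FlexiPrompt()
writer.prompt = "You are a writer. Rewrite the draft using this critique: $critique"
writer.critique = "critique text returned by the LLM"
writer_request = writer.prompt().build()  # send this to the same LLM as the "writer" agent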

Case Study: Improving LLM Answers Through Self-Evaluation

Let's look at an interesting example of using FlexiPrompt. We will create a system where language models evaluate and improve their own responses. Here's how it works:

  1. Receive a request from the user

  2. Generate a response with the first neural network

  3. Ask two different neural networks to rate the answer and take the average score

  4. Generate a response with the second neural network

  5. Rate that answer the same way

  6. If one of the answers receives the maximum score, save it as the best and stop

  7. Repeat steps 2-6 up to 5 times, keeping track of the best answer

  8. Return the best answer to the user

Implementation

For this example we will use the OpenAI and Anthropic APIs. Here is the basic structure of the code:

import os

from flexi_prompt import FlexiPrompt
from openai import OpenAI
from anthropic import Anthropic

# Set up API keys and clients (this example runs in Google Colab)
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY_TEST1")

openai = OpenAI()
antropic = Anthropic()

def get_openai_answer(question, openai):
    openai_completion = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    )
    return openai_completion.choices[0].message.content

def get_antropic_answer(question, antropic):
    message = antropic.messages.create(
        max_tokens=4096,
        temperature=0,
        model="claude-3-haiku-20240307",
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": [{"type": "text", "text": question}]}],
    )
    return message.content[0].text

fp = FlexiPrompt()

# Set up prompts
fp.question = "Your question here"
fp.rate_prompt = """
Rate the answer to the question from 1 to 9, where 1 is the worst answer.
Be rigorous in your evaluation. Give back only one number as your answer.

Question: 
 $question
Answer: 
 $answer
"""

# Main loop
MAX_ATTEMPTS = 5
THRESHOLD_SCORE = 9
best_rate = 0
best_answer = ""

for attempt in range(MAX_ATTEMPTS):

    fp.answer = get_openai_answer(fp.question().build(), openai)
    answer_rate = get_answer_rate(fp.rate_prompt().build(), openai, antropic)

    if answer_rate > best_rate:
        best_rate = answer_rate
        best_answer = fp.answer

    fp.answer = get_antropic_answer(fp.question().build(), antropic)
    answer_rate = get_answer_rate(fp.rate_prompt().build(), openai, antropic)

    if answer_rate > best_rate:
        best_rate = answer_rate
        best_answer = fp.answer

    if best_rate >= THRESHOLD_SCORE:
        break

print(best_answer)
print("The answer rate is:", best_rate)

This approach produces better responses from language models by using their own ability to evaluate and improve their answers. The complete example code is on GitHub.

Comparison with alternatives

I looked at Haystack, LangChain and a few smaller libraries.

Most out-of-the-box solutions bundle a lot of functionality beyond prompting, and almost all of them use Jinja under the hood.

Jinja itself is a heavier solution and is not designed specifically for prompts; it is better suited to large-scale projects.
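
For comparison, here is roughly what the first example could look like with plain Jinja2 (a sketch, not taken from any library's docs); nesting prompts and chaining them would have to be wired up by hand:

from jinja2 import Template

template = Template("Here is: {{ inner }}, {{ some_field }}, {{ some_callback() }}")
print(template.render(
    inner="nested value1, nested value2",
    some_field=42,
    some_callback=input,  # Jinja2 can call a callable passed into the render context
))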

FlexiPrompt focuses on simple projects: you don't have to build up classes and abstractions, yet you still get flexible output.

Plans

For now, there are obvious things that need to be added: the ability to escape special characters and safe string concatenation.

Further on, I would like efficient and reliable parsing of model answers that accounts for the liberties LLMs take with output format. I think it should convert strings into objects or fire triggers.
