OpenAI on New AI Models That Can Reason

OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before answering and can produce a long internal chain of thought before responding to the user.

o1 models excel at scientific reasoning, placing in the 89th percentile on competitive programming questions (Codeforces), ranking among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on physics, biology, and chemistry questions (GPQA).

There are two reasoning models available in the API:

  • o1-preview: An early preview of our o1 model, designed to reason about hard problems using broad general knowledge about the world.

  • o1-mini: A faster, cheaper version of o1, especially effective for coding, math, and science tasks that don't require extensive general knowledge.

o1 models show significant progress in reasoning, but they are not intended to replace GPT-4o in all use cases.

For applications that require image input, function calls, or consistently fast response times, the GPT-4o and GPT-4o mini models will continue to be the right choice. However, if you plan to develop applications that require deep reasoning and are designed for longer response times, o1 models can be a great choice. We can't wait to see what you build with them!

The o1 models are currently in beta.

Access is limited to developers on usage tier 5 (check your usage tier here), with low rate limits (20 RPM). We are working on adding more features, increasing rate limits, and expanding access to more developers in the coming weeks.
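While the rate limit is still 20 RPM, a simple exponential-backoff wrapper can help keep requests under the cap. This is a minimal sketch; the retry count and delays are arbitrary choices, not official guidance, and in practice you would catch openai.RateLimitError specifically rather than a bare Exception:

```python
import time

def with_backoff(make_request, retries=5, base_delay=3.0):
    """Call make_request, retrying on exceptions (e.g. rate-limit
    errors) with exponential backoff: 3s, 6s, 12s, ..."""
    for attempt in range(retries):
        try:
            return make_request()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```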

Quick Start

Both o1-preview and o1-mini are available through the chat completions endpoint.

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user", 
            "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
        }
    ]
)

print(response.choices[0].message.content)

Depending on the amount of reasoning required by the model to solve the problem, these queries can take anywhere from a few seconds to a few minutes.

Beta version limitations

During the beta testing phase, many parameters of the Chat Completions API are not yet available. In particular:

  • Modalities: text only; images are not supported.

  • Message types: user and assistant messages only; system messages are not supported.

  • Streaming: not supported.

  • Tools: tools, function calls, and response format parameters are not supported.

  • Logprobs: not supported.

  • Other: temperature, top_p and n are fixed to 1, and presence_penalty and frequency_penalty are fixed to 0.

  • Assistants and Batch: These models are not supported in the Assistants API and Batch API.
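Because system messages are rejected during the beta, one workaround is to fold system-style instructions into the first user message. A minimal sketch, assuming you want a single merged user turn (the helper name is ours, not part of the SDK):

```python
def fold_system_prompt(instructions, user_text):
    """Merge system-style instructions into a single user message,
    since o1 beta models only accept user and assistant roles."""
    return [{"role": "user", "content": f"{instructions}\n\n{user_text}"}]

messages = fold_system_prompt(
    "You are a concise assistant. Answer in one sentence.",
    "What is a mutex?",
)
```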

We will add support for some of these options in the coming weeks as we move out of beta. Features such as multimodality and tool usage will be included in future o1 series models.

How Reasoning Works

o1 models introduce reasoning tokens. The models use these tokens to “think”, breaking down their understanding of the prompt and considering multiple approaches before formulating an answer. After generating reasoning tokens, the model produces its answer as visible completion tokens and discards the reasoning tokens from its context.

Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, and reasoning tokens are discarded.

Although reasoning tokens are not visible through the API, they still occupy space in the model's context window and are billed as output tokens.
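The carry-over behavior can be sketched as follows. Only the visible messages are appended to the history, because the API has already discarded the reasoning tokens; here complete_fn stands in for a call to client.chat.completions.create that returns the visible completion text:

```python
def ask(complete_fn, history, user_text):
    """Run one conversation turn. Only the user message and the
    visible assistant reply are kept in history -- reasoning tokens
    never appear in it and are never re-sent."""
    history.append({"role": "user", "content": user_text})
    reply = complete_fn(history)  # visible completion text only
    history.append({"role": "assistant", "content": reply})
    return reply
```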

Managing the context window

The o1-preview and o1-mini models offer a context window of 128,000 tokens. Each completion has an upper limit on the maximum number of output tokens, which includes both invisible reasoning tokens and visible completion tokens. The maximum output token limits are:

  • o1-preview: up to 32,768 tokens

  • o1-mini: up to 65,536 tokens

When generating completions, it is important to ensure there is enough space in the context window for reasoning tokens. Depending on the complexity of the task, the models can generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response, under completion_tokens_details:

usage: {
  total_tokens: 1000,
  prompt_tokens: 400,
  completion_tokens: 600,
  completion_tokens_details: {
    reasoning_tokens: 500
  }
}
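For the usage object above, the number of visible completion tokens is simply completion_tokens minus reasoning_tokens:

```python
usage = {
    "total_tokens": 1000,
    "prompt_tokens": 400,
    "completion_tokens": 600,
    "completion_tokens_details": {"reasoning_tokens": 500},
}

reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning
# 600 completion tokens = 500 reasoning + 100 visible
```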

Cost control

To control costs in o1-series models, you can limit the total number of tokens generated by the model (including reasoning and completion tokens) using the max_completion_tokens parameter.

In previous models, the max_tokens parameter controlled both the number of tokens generated and the number of tokens visible to the user, which were always equal. However, in the o1 series, the total number of tokens generated can exceed the number of visible tokens due to internal reasoning tokens.

Because some applications may rely on max_tokens matching the number of tokens received from the API, the o1 series introduces max_completion_tokens to explicitly control the total number of tokens generated by a model, including both reasoning tokens and visible completion tokens. This explicit choice ensures that existing applications will not break when using the new models. The max_tokens parameter continues to work as before for all previous models.
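A small helper can pick the right parameter per model. This is our own convenience sketch, not part of the SDK, and the `o1` prefix check is an assumption that only o1-series model names start that way:

```python
def token_limit_param(model, limit):
    """Return the kwarg controlling output length: o1-series models
    use max_completion_tokens, earlier models keep max_tokens."""
    if model.startswith("o1"):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}

# e.g. client.chat.completions.create(model=m, messages=msgs,
#                                     **token_limit_param(m, 30000))
```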

Allocating space for thought

If the number of generated tokens reaches the context window limit or the max_completion_tokens value you set, you will receive a chat completion response with finish_reason set to length. This can happen before any visible completion tokens are produced, meaning you may pay for input and reasoning tokens without receiving a visible response.

To prevent this, make sure there is enough space in the context window, or raise the max_completion_tokens value. OpenAI recommends reserving at least 25,000 tokens for reasoning and output when you start experimenting with these models. As you learn how many reasoning tokens your prompts require, you can adjust this buffer accordingly.
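One way to apply this advice is to start with the recommended 25,000-token buffer and enlarge it whenever the model hits the limit before emitting visible output. A sketch under the assumption that complete_fn(messages, limit) wraps one API call and returns (finish_reason, visible_text); the doubling strategy and the ceiling are our own choices:

```python
def get_visible_answer(complete_fn, messages, limit=25000, max_limit=100000):
    """Retry with a larger max_completion_tokens whenever the model
    runs out of room while still reasoning (finish_reason == "length"
    with no visible text)."""
    while limit <= max_limit:
        finish_reason, text = complete_fn(messages, limit)
        if finish_reason != "length" or text:
            return text  # finished normally, or at least produced output
        limit *= 2  # ran out of room while reasoning; enlarge the buffer
    return None
```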

Prompting Tips

These models work best with direct prompts. Some prompt engineering techniques, such as few-shot prompting or telling the model to “think step by step”, may not improve performance and can sometimes hinder it. Here are some best practices:

  • Keep your prompts simple and direct: Models understand and respond well to short and clear instructions that do not require detailed directions.

  • Avoid chain-of-thought prompts: Since these models reason internally, there is no need to encourage them to “think step by step” or “explain their reasoning.”

  • Use delimiters for clarity: Use delimiters such as triple quotation marks, XML tags, or section titles to clearly mark distinct parts of the input, helping the model interpret each section correctly.

  • Limit additional context in RAG: When providing additional context or documents, include only the most important information to avoid complicating the model's answer.
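As an illustration of the delimiter advice, a RAG-style prompt might mark its sections with XML tags; the tag names and the sample document are arbitrary choices for this sketch:

```python
document = "Quarterly revenue grew 12% year over year."
question = "How fast did revenue grow?"

# Each section is wrapped in a tag so the model can tell the
# instructions, the source text, and the question apart.
prompt = (
    "Answer the question using only the document below.\n\n"
    f"<document>\n{document}\n</document>\n\n"
    f"<question>\n{question}\n</question>"
)
```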

Prompt examples

OpenAI o1 series models can implement complex algorithms and produce code. This prompt asks o1 to refactor a React component based on specific criteria.

from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red
  text. 
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- For formatting, use four space tabs, and do not allow any lines of code to 
  exceed 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    <li>
      {book.title}
    </li>
  );

  return (
    <ul>{listItems}</ul>
  );
}
"""

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

OpenAI o1 series models can also create multi-step plans. In this example, o1 is asked to plan a file-system structure for a complete solution, along with the Python code that implements the desired use case.

from openai import OpenAI

client = OpenAI()

prompt = """
I want to build a Python app that takes user questions and looks them up in a 
database where they are mapped to answers. If there is a close match, it retrieves 
the matched answer. If there isn't, it asks the user to provide an answer and 
stores the question/answer pair in the database. Make a plan for the directory 
structure you'll need, then return each file in full. Only supply your reasoning 
at the beginning and end, not throughout the code.
"""

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

OpenAI o1 series models have shown excellent performance in STEM research. Prompts that ask for support with basic research tasks should show strong results.

from openai import OpenAI
client = OpenAI()

prompt = """
What are three compounds we should consider investigating to advance research 
into new antibiotics? Why should we consider them?
"""

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user", 
            "content": prompt
        }
    ]
)

print(response.choices[0].message.content)

Use cases

Some examples of using o1 in real-world scenarios can be found in the OpenAI Cookbook.
