How to understand that this is ChatGPT 4?

I suggest asking models 4o and 3.5 a couple of questions. We will access 4o in two ways, through the BotHub shell and through the official ChatGPT app, so we can compare the answers. Let me note right away that responses via the API and via the official UI may differ. Why? Because of the system prompt, settings, and parameters. A developer who buys API access can configure everything to their liking, whereas in the official UI everything is already configured for them. As far as I know, OpenAI has not published the internal instructions or system prompts used in ChatGPT, but they certainly exist; we ordinary users simply aren't allowed to see them in full. I do remember a Medium article describing a way to extract that very ChatGPT system prompt, which, by the way, spells out the model name the assistant should give if asked (this method no longer works):

Author: Sawradip. Source.

Keep in mind, though, that this material is from July, and the OpenAI team may have updated the system prompts since then.

An example of what a system prompt looks like in an official UI is publicly available from Anthropic (the Claude models). Claude, for instance, operates on the principle: whatever is not written in the prompt, I don't know, and if the prompt is empty, I will make things up.

*Claude is just a convenient example for understanding the difference between the API and the official UI; we won't use it further.
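To make the API-vs-UI difference concrete, here is a minimal sketch, assuming the OpenAI Python SDK; the model name and the system prompt text are illustrative placeholders, not the article's actual setup.

```python
# Minimal sketch: querying GPT-4o through the API, where the developer
# chooses the system prompt (in the official UI a hidden one is set by OpenAI).
# Assumes the openai Python SDK (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Over the API you decide what (if anything) goes into the system role.
        {"role": "system", "content": "You are a terse assistant."},  # illustrative
        {"role": "user", "content": "what happened on February 6, 2023"},
    ],
)
print(response.choices[0].message.content)
```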

Current data

Let's start with the data. The training data for model 4o runs up to October 2023, while for 3.5 it ends in September 2021.

Our prompt will be:

what happened on February 6, 2023

ChatGPT-4o (BotHub):

ChatGPT-4o (OpenAI):

ChatGPT-3.5:

So, we get completely different answers before our eyes, a clear demonstration of the difference in access to up-to-date information (within the limits of the training data, of course). To make sure you are dealing with ChatGPT 4o, just ask about something that appeared on the Internet after September 2021 but before October 2023 (without the search function, of course). ChatGPT 3.5 will either fail to answer such a question or answer it incorrectly, since its knowledge is limited.
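If you have API access, the same check can be scripted. A hedged sketch, assuming the OpenAI Python SDK and the public API model names (adjust them if you go through a shell like BotHub):

```python
# Send the same cutoff-probing question to both models and compare the answers.
from openai import OpenAI

client = OpenAI()
question = "what happened on February 6, 2023"

for model in ("gpt-4o", "gpt-3.5-turbo"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    print(f"--- {model} ---")
    print(reply.choices[0].message.content)
```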

Logic

We know that version 4o is much better than 3.5 at logic; let's check this with a specific example.

Our prompt will be as follows:

Two boats are floating along the river parallel to each other. Each moves at a speed of 30 km/h. At what speed relative to the shore does their common center move?

ChatGPT-4o (BotHub):

ChatGPT-4o (OpenAI):

ChatGPT-3.5:

Comparing the models on a logic problem, you will notice that ChatGPT 3.5 shows a poor understanding of the problem statement and arrives at an incorrect conclusion, while ChatGPT 4o is more accurate in logical reasoning and physics problems. Because it does not understand the task, ChatGPT 3.5 dives into details where they are not needed (or, conversely, skips them, more on that later) and only confuses you, whereas version 4o immediately identifies which elements matter for the solution and applies the correct approach.
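For reference, the expected answer is 30 km/h: both boats have the same velocity, so the midpoint between them moves with that same velocity. A tiny numeric sketch of this reasoning (my own check, not any model's output):

```python
# Two boats with identical velocity vectors of 30 km/h along the river;
# the midpoint moves at the average of the two velocities, i.e. also 30 km/h.
v_boat_1 = (30.0, 0.0)  # km/h
v_boat_2 = (30.0, 0.0)  # km/h, parallel course

v_center = tuple((a + b) / 2 for a, b in zip(v_boat_1, v_boat_2))
speed_center = (v_center[0] ** 2 + v_center[1] ** 2) ** 0.5
print(speed_center)  # 30.0 km/h relative to the shore
```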

Mathematics

Now let's take a math problem that is simple but requires attention.

Our prompt:

A brick weighs 1 kg and half a brick. How much does a brick weigh?

ChatGPT-4o (BotHub):

ChatGPT-4o (OpenAI):

ChatGPT-3.5:

So, as I noted earlier, ChatGPT 3.5 may instead avoid explaining how it arrived at its conclusion, and the response we just received is a clear demonstration of that. You will see no logic or reasoning, only an incorrect interpretation of the problem, which, as in the previous task, only creates confusion. You will also get the wrong answer, since version 3.5 has large gaps in mathematics compared to version 4o.
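For the record, the correct answer is 2 kg: if a brick weighs 1 kg plus half a brick, then half a brick weighs 1 kg. A one-line algebra check (my own verification, not a model response), using sympy:

```python
# Solve brick = 1 + brick / 2 symbolically; the solution is 2 kg.
from sympy import Eq, solve, symbols

brick = symbols("brick", positive=True)
print(solve(Eq(brick, 1 + brick / 2), brick))  # [2] -> the brick weighs 2 kg
```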


Thus, by running models through tasks that are simple for humans but sometimes baffling for machines, you can easily tell which one is the model you came for and which one is the impostor. GPT 3.5 only handles basic queries; as soon as you dig deeper and give it tasks involving reasoning, logic, and calculations beyond the basic level, it falls apart, and you will know you are dealing with ChatGPT 3.5. At the same time, keep in mind that GPT 4/4o is not as good as it might seem while reading this short article. The article gives examples of tasks that help you spot ChatGPT 3.5, but it does not put GPT 4/4o on a pedestal: our blog has many model comparisons (for example, the recent Grok 2 API release) and explorations of hallucinations and cognitive distortions that demonstrate flaws in models, including GPT-4o.

Thank you all for your attention! And don't trust models when they tell you their version or context size (:
