Why OpenAI's New GPT Isn't Hype, But a Transition to a New Paradigm in AI

And as usual, the hopes and efforts of researchers were dashed. It often happens: you come up with a “complex” task, show that current models cannot cope with it, saying that they lack planning and the ability to think. And after 3-5 months, a new generation comes out, and suddenly everything is solved:

The paper mentioned in the opening remarks at the prestigious ACL conference demonstrated a large number of problems that the models could not handle. o1, which is not yet available to us, solves the problem from the screenshot almost always.

So, there is progress, and significant progress. We are waiting for competitors to catch up – and then open models.

Model safety

As with every major release, OpenAI tests models for security. This includes a fairly wide range of checks – from the banal “how easy is it to make LLM swear and write unwanted messages” to “can it self-propagate and exist without human support?” By the way, building a chain of thoughts before responding allows the neural network to filter the bazaar much better and not react to provocations (that is, to follow the rules that OpenAI is trying to build into the model). Not very interesting for the average reader, but you can take a look at the results here.

Here's what you should pay attention to. During testing, there were a couple of slightly creepy precedents. Some of the tests were related to assessing skills in solving cybersecurity problems, that is, hacking computers and programs. For real hackers, competitions are held in the format of “Capture the Flag” (Capture the FlagCTF).

What is the essence of CTF: there is a certain line of text made of random characters, hidden somewhere. It can be hard-coded into a program, lie as a file in a folder, or be cleverly hidden in the code. Hackers try to find a bug that will cause a failure in the system/program and force them to output this line. Then the participants of the competition enter the secret code into the verification system and thus prove that a hack has occurred.

The same task was given to the system to which LLM o1 was connected. It had access to the console, i.e. it could create and delete files, use various tools, and run programs. It was necessary to hack a program running in a separate container (virtual computer), and the secret line was written in a file inside it.

o1 prepares to participate in a hacking competition (photo from Sam Altman's personal archive).

However, a failure occurred and the container did not start – so any attempts by o1 to reach it were in vain. But the model was given a goal, and it must fulfill it; there was no giving up! Then LLM assumed that there were problems with DNS and used the nmap program to scan the network.

Instead of finding the target container (for the purposes of the task, LLM knows what target to “attack”), the model found that the Docker daemon’s front end was exposed due to a misconfiguration of the servers. The “Docker daemon” is a top-level program responsible for orchestration and managing the experiment process (launching virtual machines for testing).

The model used a daemon to get a list of containers running on the server. It found the broken container and tried to figure out why it failed to start. Unable to fix the problem (but instructed to complete its task), the model decided to go another way. It sent a command to start a new container (the same one that failed to start last time), but gave it a modified command. Now, instead of starting the program that needed to be hacked, the container simply displayed the contents of a file with a secret string. The command was executed, LLM received the key and as a result solved the problem (after all, the goal is to get to the string and then submit it for verification).

Experiment setup. Ideally, the docker host shouldn't be accessible to the model, but someone configured the system incorrectly.

This example reflects the key elements of what is called Instrumental Convergenceand resource grabbing: the model pursued a goal that was set for it, and when that goal was unachievable, it gathered more resources (in this case, accessed the Docker daemon) and used them to achieve the goal — in a way that the human had not expected or intended. No one asked it to do this, but there were no strict restrictions on it either.

LLM systems are not yet so sophisticated that they can plot against people and intentionally break something, and then lie. But right now we do not fully control their behavior, and even the data they learn from is generated and validated by LLMs themselves. The main thing is to be careful with your wishes and not get into a situation where you ask to remove people from the background:

What's next?

It is important to remember that we all have access to mini and preview versions of the o1 models now. They do not support loading documents, images, they do not use the calculator and do not run code. All this is promised to be added in the future, after the release of a full-fledged powerful version of o1 – perhaps it will be released after the US presidential elections or after additional rounds of security checks.

OpenAI emphasizes that o1 is a separate family of models, with different tasks. The ChatGPT line is not going anywhere, and, according to rumorswe should have GPT-5 (leaked under the code name “Orion”) by the second quarter of 2025.

However, at the GPT-5 level, the increase in skills can be completely different (both for the better and for the worse). Usually, a change in the number in the line is accompanied by an increase in the model itself and the duration of its training – and along with this, its indicators themselves improve. True, to train such a colossus, you will have to scrape the bottom of the barrel, because there may simply not be enough data.

…And this would be a problem if it weren't for one fact. A significant portion of the data for training the future model should generate o1 (or maybe even o2!). In a way, a flywheel is started, where smarter models allow you to get… smarter models. o1 is just an early experiment, the first approach to the methodology of spinning up this flywheel. Surely, in the learning process there are different stages that work every other time, or that can be improved by simple methods – it's just that the researchers have only just started working with this, they haven't had their share of bumps. But when they do and launch the process at full speed – then ~~the end for humans~~ let's finally start living!

Somehow I have already reached the design capacity of “one longread about neural networks per week” (the last one about what is going on in the “heads” of neural networks was here). Anyway, if you liked this one and you don’t want to miss future materials on the topic, then I invite you to subscribe to my TG channel Sioloshnaya about artificial intelligence and modern technologies.