LLMs can be quite creative at solving tricky riddles, but not always

The cognitive abilities of LLMs can be assessed in two ways: by their capacity for logical thinking and building chains of reasoning, or by their ability to solve riddles that demand ingenuity and unconventional thinking. For some reason, almost all attention goes to the first, while the second is barely explored at all.

People have these skills to varying degrees, but what about models?

I have run many experiments with different models, both top-tier and mid-level, and all of them showed a generally good level at solving all kinds of riddles, from simple up to, but not including, complex ones (not a single model can solve the complex ones).

Sometimes it seems to me that models are even better at solving riddles than at logical reasoning, to say nothing of mathematical problems, where their weaknesses are especially noticeable.

I’m not even afraid to say that, in the overall standings, the models are significantly ahead of me in insight, so to speak, and in non-stereotypical approaches to solving ingenious riddles.

Let me clarify right away that we will mostly not be talking about top-tier models such as OpenAI o1 and claude3.5-sonnet, but about simpler ones, no larger than 35B parameters, yet quite capable and easy to deploy at home.

But claude3.5-sonnet, Gemini-1.5 Pro, GPT-4o, and OpenAI o1 also often fail to give correct answers, especially in the category of complex riddles.
And if we compare the top models by the number of correct guesses, the absolute leaders, by a large margin, are OpenAI o1 in first place and claude3.5-sonnet in second, which also lays out the entire course of its reasoning competently.

The models solve simpler riddles easily on the first try. More complex ones, or those with a tricky catch, they can also solve, but only on the second attempt or even after many tries. Nevertheless, as a rule, they do solve them in the end.

For example, this riddle:

Not wanting to admit that the Red Army could defeat them, the Germans argued that the Great Patriotic War was won by “General Frost”, “General Dirt” and “General Mouse”. Everything is clear about frost and dirt.
But what does the mouse have to do with it?

Model response:

The German military used the term “General Mouse” as a metaphorical way to describe the problems caused by the spread of rodents among their soldiers and equipment during the Great Patriotic War against the USSR. Rodents were a serious problem at the front, especially in winter, when they could get into buildings where soldiers were quartered. This led to the following problems:

  1. Infection: Rodents can carry various diseases such as plague, typhus and tularemia.

  2. Food Destruction: They can also eat food supplies, which can cause food shortages among soldiers.

  3. Damage to equipment: Rodents can damage wires and other parts of equipment, which can lead to breakdowns and reduced efficiency of combat vehicles.

The correct answer was supposed to be this: mice chewed through the wiring of German tanks.
The model gave this answer in its third point, but expanded it with other inferences.

We see an attempt at complex thinking in the hope of including the most plausible options in the answer.

Another example:

Why do you think Alexander the Great ordered all his soldiers to shave their beards before battle?

Model response:

Alexander the Great may have ordered his soldiers to shave their beards for several reasons:

  1. to simplify the treatment of wounds during combat;

  2. to prevent the enemy from grabbing a soldier's beard in hand-to-hand combat;

  3. to give the troops a more uniform look and improve morale;

  4. or to demonstrate strength and neatness.

The correct answer was supposed to be this: so that their enemies could not grab them by the beard.
The model gave this answer in its second point, but again in an expanded form.

These two examples demonstrate well the combination of two types of thinking: logical and associative.
Moreover, associative thinking is more developed and works better than logical thinking. In tasks that require fewer associations (guesses, assumptions) and more logic, models get confused more often, lose the thread of reasoning, and sometimes simply talk nonsense.

For example, the following riddle, which requires non-standard thinking, can be solved, but not always correctly on the first try; sometimes several attempts are needed.

Imagine that you are sailing on the sea in a boat. Suddenly the boat starts to sink, you find yourself in the water and sharks swim up to you.
What to do to save yourself from sharks?

Or like this:

Two people approach the river.
There is a boat at the shore that can carry only one person.
But both people crossed to the opposite bank.
How did they do it?

Here the models’ imagination knows no limits: the second person crosses a bridge or wades across, or swims alongside the boat holding onto the side with his hand, or the boat is pulled back with a rope, and so on ad infinitum.

But there is also a class of riddles that even the top models have never been able to solve with the expected answer (with the exception of OpenAI o1 and claude3.5-sonnet).

Here are two examples.

How should you throw a chicken egg forward so that it can fly three meters without breaking?

They offered all sorts of solutions: wrapping the egg in a soft scarf, throwing it onto a pillow or into a bowl of water, even attaching a small parachute to the egg, and so on. (OpenAI o1 and claude3.5-sonnet guess it, but the others don’t.)

Two frames from my favorite childhood film, “Youths in the Universe” (1974), illustrate this situation well.

As a person close to nuclear physics and quantum mechanics, I really liked this answer from the LLM Qwen2.5-72B-instruct:

In quantum mechanics there is the concept of superposition, when a particle can be in several states simultaneously before the moment of measurement. We can apply this idea to our egg, although this is, of course, a purely theoretical and slightly humorous solution.

Egg preparation:
Imagine that your egg is in a state of superposition: it is both whole and broken. This state can be symbolically designated as whole + broken.

Egg Toss:
When you throw an egg, it flies through space maintaining its quantum state of superposition. This means that it is both whole and broken until you take a measurement (look at it).

Landing:
When the egg lands, the wave function collapses. At this moment it settles into one of its states: either whole or broken. If you believe in quantum mechanics, there is a non-zero chance that the egg will remain intact.

Practical implementation:

Meditation:
Before throwing the egg, do a short meditation, imagining how the egg flies and remains intact. This can help you tune into the desired state of superposition.

Positive thinking:
Trust that the egg will remain intact. The strength of your intention can affect the result (although this relates more to esotericism than to physics).

Experiment:
Do the experiment several times. After all, quantum mechanics suggests that, given enough tries, the egg should remain intact at least once.

Conclusion:

Although this solution is purely theoretical and humorous, it demonstrates how problems can be approached from unconventional perspectives. In reality, of course, to prevent the egg from breaking, it is better to use practical methods such as protective materials or a parachute. But sometimes it’s good to step back a little and think about things from an unusual perspective!
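As an aside, the model's "enough tries" remark is really just the complement rule from probability. A toy sketch (the per-throw survival probability here is an invented illustrative number, not a measurement):

```python
# If a single throw leaves the egg intact with probability p, then after
# n independent throws the chance of at least one intact landing is
# 1 - (1 - p)^n, which approaches 1 as n grows.

def p_at_least_one_intact(p: float, n: int) -> float:
    """Probability that at least one of n independent throws succeeds."""
    return 1 - (1 - p) ** n

# With a 5% survival chance per throw, 100 throws make success near-certain:
print(round(p_at_least_one_intact(0.05, 100), 3))  # → 0.994
```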

Who will now say that the model is devoid of creativity and innovative thinking?

Another example:

One man was sailing on a ship at sea and got caught in a storm. He was carried to an island where
there were no men; only girls lived there. They tied him up and decided to execute him.
But he came up with an ingenious way to escape and asked to be granted a last word. After he stated his last wish, not a single girl wanted to kill him, and he was released.
What did he tell them that was so special?

And again, many different options were proposed, including erotic ones, but none of them came close to the expected answer (claude3.5-sonnet did not solve it either).

Moreover, if you then tell the model the correct answer, it is clear that it fully understands the hidden meaning and gives an adequate interpretation. Why it never offers this option on its own is unclear.
Or rather, it is clear: the associations leading to this option are too weak, whether due to insufficient training quality or to the mechanism that processes those associations.

However, comparing with a person: there are riddles we cannot solve either, although once we are told the answer, it seems so simple and obvious. When it comes to solving riddles, we are very much like the models.

I wonder: if we ran a rigorous scientific experiment in which a person and an LLM each solved a sufficiently large list of riddles, and the answers were then shown to a panel of experts, would the experts be able to tell which solutions were the model's and which were the human's?
I think it's unlikely.

Speculating further, one could even build a special benchmark on a curated list of creative riddles and answers, selected by level of difficulty and creativity.

Riddle: a benchmark that ranks models by ingenuity, lateral thinking, and understanding of hidden meaning.

There may be models that take the top positions in it and turn out to be more human-like, rather than those focused purely on logic and a general understanding of meaning.
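If someone wanted to prototype such a Riddle benchmark, even a crude harness would do: store each riddle together with the key idea of its expected answer, and score a model's reply by whether that idea appears. A minimal sketch; the riddle data, function names, and stub model below are all invented for illustration:

```python
# Hypothetical riddle-benchmark harness. A real version would call an
# actual model; here `model` is any callable mapping a question string
# to an answer string, so the scoring logic runs standalone.

RIDDLES = [
    {
        "question": "Why did Alexander the Great order his soldiers to shave?",
        "keywords": ["grab", "beard"],  # key idea the answer must contain
    },
    {
        "question": "What does 'General Mouse' refer to?",
        "keywords": ["chewed", "wiring"],
    },
]

def is_correct(answer: str, keywords: list) -> bool:
    """Crude check: every keyword of the expected key idea must appear."""
    text = answer.lower()
    return all(k in text for k in keywords)

def run_benchmark(model, riddles=RIDDLES) -> float:
    """Return the fraction of riddles the model answers correctly."""
    solved = sum(is_correct(model(r["question"]), r["keywords"]) for r in riddles)
    return solved / len(riddles)

# Stub model that always gives the 'mouse' answer, so it solves 1 of 2:
stub = lambda q: "Mice chewed through the wiring of German tanks."
print(run_benchmark(stub))  # → 0.5
```

Keyword matching is of course the weakest possible judge; a production version would more likely use a human panel or an LLM grader, but the ranking idea stays the same.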

And finally, to add a little humor, here is a dialogue with one model that solved the riddle quite creatively:

Here is a girl sitting, and when she gets up and leaves, under no circumstances will you be able to sit in the same place.
Where was she sitting?

claude3.5-sonnet easily gave an accurate and detailed answer, but the Chinese qwen2.5-32b (32B is not much, after all) began to joke.
Model response:

Condition: A girl is sitting somewhere, but after she gets up and leaves, no one else can sit in that place.

Answer: The girl would have to sit on a nail.

Explanation:
If the girl was sitting on a nail, then after she gets up and leaves, the nail will remain there and prevent anyone else from sitting on that spot. Anyone who tries to sit there may suffer pain from the nail piercing their skin, or at least discomfort from its sharp end. Thus, no one will be able to sit in the same place until the nail is removed.

My remark:

But how did the girl sit directly on the nail without experiencing pain, but others will? This is an obvious contradiction.

Model response:

This is an apparent contradiction, but the correct solution may be that the girl was sitting not on a nail but on the toilet. When she gets up and leaves, no one else can take her place on the toilet, because only girls who are going to use it for its intended purpose can go in there.
In this case it becomes a logical solution, without having to attribute to the girl the ability to sit on a painful object that would be unbearable for other people.

I:

The correct answer is on your lap.

Model:

Yes! The correct answer is indeed “on your lap.” If the girl was sitting on your lap, then when she gets up and leaves, you can no longer take her place, because that place is your own lap, and you cannot sit on your own lap.
This solution demonstrates a creative and original approach to solving the problem.

Well, in some ways the model is even right. I certainly don’t want to experience the discomfort of sitting on a nail, or on a toilet in the women’s restroom 🙂
