ChGK-GPT, or how good is the new ChatGPT o1-preview at the sports version of “What? Where? When?”

On September 12, 2024, OpenAI presented its new o1-preview model, the first model that, according to its creators, can “reason.”

I decided to see how well it could answer questions from the game “What? Where? When?”

Experiment One: Olympiad Programming Problem

After reading the news, I rushed almost immediately to experiment with this latest word in AI. Having learned that the model had been tested on Codeforces problems, I first gave it one of my favorite graph problems, from the Northern Eurasia Finals of ACM ICPC 2017–2018. The model cracked it in 38 seconds. This was the first alarm bell: back when I was preparing for ACM, I could not come up with the idea of the solution on my own (in 2017–2018 I was not yet competing in ACM and only trained on these problems).

Below is my dialogue with o1-preview. I stated the problem in mathematical language; the original, of course, has an entertaining storyline. The original problem can be found at the link (problem C, “Connections”). The idea of the solution is very elegant, so try to find it yourself!

My question:

Hi. Let's say we are given a strongly connected directed graph, that is, we can get from any vertex to any other. The graph has n vertices and m > 2n edges. We need to come up with an algorithm to leave exactly 2n edges so that the graph remains strongly connected.

Model response:

Hello!

To solve this problem, you can use the following algorithm:

Step 1: Building a spanning tree

  1. Select an arbitrary vertex s in the graph.

  2. Perform a depth-first search (DFS) on the original graph G starting from vertex s to build a spanning tree T1. This tree will contain all the vertices of the graph and n - 1 edges.

Step 2: Building a reverse spanning tree

  1. Build the reverse graph G^R, in which the directions of all edges of the original graph G are inverted.

  2. Perform a depth-first search (DFS) on the reverse graph G^R starting from the same vertex s to build a second spanning tree T2. This tree will also contain all the vertices and n - 1 edges.

Step 3: Merge trees and add extra edges

  1. Merge the edges from T1 and T2. The total number of edges in the combined subgraph will be at most 2n - 2.

  2. Since the original graph has more than 2n edges (m > 2n), there are extra edges available. Choose any two edges from the remaining ones and add them to the combined subgraph so that the total number of edges becomes exactly 2n.

Step 4: Check for strong connectivity

  1. The merged subgraph after step 3 will be strongly connected. This is ensured by the fact that:

    • From vertex s you can reach any other vertex along the edges of T1.

    • Vertex s can be reached from any vertex along the edges of T2 (remember that T2 was built in the reverse graph, so in the original graph these edges lead to s).

    • The added two extra edges maintain strong connectivity and bring the total number of edges to 2n.

Summary:

We have constructed a strongly connected subgraph with exactly 2n edges, as required.

Code example (pseudocode):

function reduceEdgesTo2n(G):
    s = arbitrary vertex in G
    T1 = DFS(G, s)           // Spanning tree from s
    G_reverse = reverseGraph(G)
    T2 = DFS(G_reverse, s)   // Spanning tree in the reverse graph
    H = union(T1, T2)
    while number of edges in H < 2n:
        add arbitrary edge from G to H not already in H
    return H

Note:

  • The algorithm guarantees that the subgraph H will be strongly connected and contain exactly 2n edges.

  • Adding arbitrary edges at the last step does not break strong connectivity, but can only strengthen it.

Hope this helps you solve your problem!
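To make the model's idea concrete, here is a minimal Python sketch of it. The function names and the input format (vertices numbered 0..n-1, edges given as a list of (u, v) pairs) are my own assumptions, and the code assumes the input really is strongly connected and has at least 2n edges. One deliberate deviation from the model's pseudocode: both trees are built with BFS instead of DFS. The argument in Step 4 only needs every vertex to be reachable from s in G and in G^R, and any search tree provides that.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Forward and reverse adjacency lists: vertex -> [(neighbor, edge index)]."""
    out_adj = defaultdict(list)
    in_adj = defaultdict(list)  # adjacency of the reverse graph G^R
    for i, (u, v) in enumerate(edges):
        out_adj[u].append((v, i))
        in_adj[v].append((u, i))
    return out_adj, in_adj


def search_tree(adj, root):
    """BFS from root; returns (reached vertices, indices of the tree edges)."""
    seen = {root}
    tree_edges = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, i in adj[u]:
            if v not in seen:
                seen.add(v)
                tree_edges.add(i)  # edge through which v was first reached
                queue.append(v)
    return seen, tree_edges


def reduce_edges_to_2n(n, edges, s=0):
    """Return indices of exactly 2n edges forming a strongly connected subgraph.

    Assumes vertices 0..n-1, a strongly connected input and len(edges) >= 2n.
    """
    out_adj, in_adj = build_adjacency(edges)
    _, t1 = search_tree(out_adj, s)  # T1: paths from s to every vertex
    _, t2 = search_tree(in_adj, s)   # T2: in G, paths from every vertex to s
    kept = t1 | t2                   # at most 2n - 2 distinct edges
    for i in range(len(edges)):      # pad with arbitrary extra edges
        if len(kept) >= 2 * n:
            break
        kept.add(i)                  # no-op if i is already kept
    return sorted(kept)


def is_strongly_connected(n, edges):
    """True iff every vertex is reachable from 0 and can reach 0."""
    out_adj, in_adj = build_adjacency(edges)
    forward, _ = search_tree(out_adj, 0)
    backward, _ = search_tree(in_adj, 0)
    return len(forward) == n and len(backward) == n


if __name__ == "__main__":
    # Complete directed graph on 4 vertices: 12 edges, so m > 2n = 8.
    edges = [(u, v) for u in range(4) for v in range(4) if u != v]
    kept = reduce_edges_to_2n(4, edges)
    print(len(kept), is_strongly_connected(4, [edges[i] for i in kept]))  # prints: 8 True
```

Edges are tracked by index rather than by endpoints, so parallel edges in the input are treated as distinct, and the padding loop never counts the same edge twice. The strong-connectivity check uses the same observation as Step 4: it is enough to test reachability from one vertex in the graph and in its reverse.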

And what about “What? Where? When?”?

In addition to competitive programming, my friends and I sometimes play in sports “What? Where? When?” tournaments. For those who don't know: the principle is the same as in the TV game. You need to answer questions using logic and reasoning rather than erudition (although, of course, general knowledge is sometimes needed too). The main difference from the TV version is that answers must be worded very precisely, because they are handed in on slips of paper rather than discussed aloud with the host.

The key word in the previous paragraph is reasoning. And the new model, as they say, can reason. That's why I decided to test o1-preview on “What? Where? When?” questions.

For comparison, I took the questions from a 2022 tournament in which my friends and I played: the European Universities League, Round IV. The questions can be found here, and the results of the participating teams of “leather bags” (that is, humans) are here. Our team, “Kavo and Sho”, answered 23 questions out of 36.

So, let's hold a tournament with three participants:

  1. The human team, which includes your humble servant (hereinafter referred to as Kavo and Sho)

  2. The ChatGPT 4o model

  3. The ChatGPT o1-preview model

The fourth participant can be you, my dear reader. Try to answer the questions as you read. If you are reading alone, without a team, you can give yourself two minutes per question instead of one. At the end, we will compare results.

If you are not interested in reading the questions, answering them, or seeing the models' answers, you can skip straight to the results.

After each question I give only the answer and the alternative answers that were also accepted (listed as “Credit”). If you want the authors' comments on the answers (for example, if the logic is unclear), they are at the same link as the questions.

For o1-preview, I also give in parentheses the time it spent thinking.

Questions

1. In one cartoon, the villains decide to execute a character in full public view in a large square. To do this, the character is tied by the legs and arms to THEM right after midnight. Name THEM.

Answer: clock hands.
Credit: hour and minute hands.

  • Kavo and Sho – ❌ (I think we arrived late for the game that day, but that's just an excuse)

  • ChatGPT 4o – ❌ (answered “four horses”)

  • ChatGPT o1-preview – ✅ (gave the correct answer, but reasoned in English, time – 47 seconds)

2. A historical novel describes how people who have lost their homes wander about, and their hair looks gray. Name any of the three cities these people are walking away from.

Answer: Pompeii
Credit: Herculaneum, Stabiae.

3. Lionel Strachey writes that Queen Victoria's life after the death of her husband, Prince Albert, is poorly covered by biographers, as if SHE had descended not only over her life but also over the chronicles. Name HER in one word.

Answer: veil
Credit: the exact answer.

4. Recently, John Chappell tied black ribbons around a number of identical objects standing on the ground, knocked on each one, and asked THEM not to fly away, promising that their new master would be kind. Who are THEY?

Answer: bees
Credit: queen bees.

5. The hero of a seventeenth-century play finds an admirer under the windows of his beloved. Soon the admirer runs away wearing a “collar”. What is this collar made of?

Answer: from the guitar.
Credit: from mandolin; from lute.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (gave the answer “from the sword”)

  • ChatGPT o1-preview – ❌ (gave the answer “from the chamber pot”, 17 seconds)

6. The hero of one novel likes to eat for free, and to do so he sneaks into certain places. There the FIRST take him for one of the SECOND, and the SECOND take him for one of the FIRST. But after an incident in which he is beaten by both the FIRST and the SECOND, the hero decides to stop. Name the FIRST or the SECOND in two words.

Answer: the groom's guests.
Credit: guests of the bride; friends of the groom; friends of the bride.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (answered “wedding guests”; let's be strict, that's a fail)

  • ChatGPT o1-preview – ❌ (answered “birds (or animals)”, 66 seconds)

7. In the 1980s, the energy sector was developed in the Mosul region of Iraq. In 2018, thanks to HER, an archaeological discovery was made in the same region. In a famous work, SHE led to the conclusion of peace. What is SHE?

Answer: drought
Credit: dryness

8. The heroine of one book changes her views on raising children and starts rearranging the furniture. “From now on, it's the wardrobes that will DO IT,” she declares. What three words did we replace with DO IT?

Answer: stand in the corner.
Credit: stand in the corners; stand at the corners.

  • Kavo and Sho – ❌ (a painful miss)

  • ChatGPT 4o – ❌ (answered “raising my kids”)

  • ChatGPT o1-preview – ❌ (gave the answer “say no”, 28 seconds)

9. Researchers believe that HIS capricious behavior led to the demise of the Mochica civilization that occupied the coast of Peru. Name HIM, using a hyphen.

Answer: El Niño current.
Credit: El Niño.

10. In one act, Buster Keaton's mother and father threw him into the audience. The young comedian was a big hit, but in some states where HE was in force, Buster was passed off as a dwarf. Describe HIM in six words.

Answer: Child abuse law.
Credit: Child Labor Prohibition Act.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (the logic is right, but the answer has 5 words, not 6, so it's a fail)

  • ChatGPT o1-preview – ❌ (close to correct, and the word count is right, but still a fail, 16 seconds)

11. Buster Keaton had no foreign accent and had experience performing in vaudeville, where he sang and recited. Therefore IT, when it arrived, was not a problem for him. Name IT in two words.

Answer: talkies.
Credit: the exact answer.

12. One of the variants of the Coptic cross ends with triangles. Who do all these triangles symbolize?

Answer: apostles
Credit: any answer containing the word “apostles” without additional incorrect information.

13. A surveyor in one novel uses THEM in his work, although he is not religious. THEY are also what the chains of bubbles in champagne are called. What are THEY?

Answer: beads
Credit: the exact answer.

  • Kavo and Sho – ✅ (I don't remember how we got this)

  • ChatGPT 4o – ❌ (gave the answer – “Levels”)

  • ChatGPT o1-preview – ✅ (72 seconds, and without a team!)

14. The herbivorous therizinosaurus had a very long, curved claw. This claw was a kind of HER. What sea animal's name begins with HER?

Answer: killer whale
Credit: the exact answer.

15. The word THEM in this question is a substitution. The action of one novel takes place in the eighteenth century. The authorities set up a complaint book for the inhabitants of a remote province. The complainants fill the book with many of THEM. Write the generally accepted three-letter abbreviation for THEM in scientific literature.

Answer: рис. (the Russian abbreviation for “рисунок”, a drawing or figure)
Credit: the exact answer.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (gave the answer – “QED”, aka ЧТД)

  • ChatGPT o1-preview – ❌ (answered “мат”, that is, obscene language, 41 seconds)

16. Clara Peters was not a member of the guild, so she signed her name in the form of an engraving on cutlery. Name, in French, the things in which this cutlery can be seen.

Answer: still lifes (natures mortes)
Credit: the exact answer.

17. In this question, X replaces two words that begin with adjacent letters of the alphabet. When creating Xs, farmers are advised to use varieties whose stalks reach two meters in height. Alexander Pozharenko, the creator of one X, was nicknamed by his friends after a statesman. Which one?

Answer: Nikita Sergeevich Khrushchev.
Credit: by last name.

18. The hero of an American novel thinks better in the fresh air. He says that four or five of THEM are needed to solve a difficult problem. What kind of THEM appeared in the title of the novel's film adaptation?

Answer: green
Credit: green mile.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (answered – “Long walk”)

  • ChatGPT o1-preview – ❌ (entirely correct reasoning, but the answer was not in the required format, 66 seconds)

19. Alberto Vázquez-Figueroa writes that politics and conscience, like a large suitcase, always look in different directions. The name of which TV show did we replace in the question?

Answer: “Heads and Tails”.

20. Participants in the conservation project “HE for the Snow Leopard” photograph snow leopards and also collect samples of fur, for example from paws or tails. Name HIM with a word of foreign origin.

Answer: passport
Credit: document

21. A certain Sibthorp was against the construction of HIM and during the debates even prayed for hail. An English football club took its name from HIM. Name HIM in two words.

Answer: Crystal Palace.
Credit: Crystal Palace.

22. One novel describes a scientific project to study lightning, codenamed “3141.” What word did we leave out of the question?

Answer: ball
Credit: the exact answer.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (gave the answer – “pi”)

  • ChatGPT o1-preview – ❌ (gave the answer – “pi”, 50 seconds)

23. The hero of one novel is up to no good and, upon learning that the anthem lasts two and a half minutes, he cuts it accordingly… What?

Answer: Bickford fuse.
Credit: fuse; wick (accepted on appeal).

24. In one novel, researchers from the future suggest that THEY were once believed to be the seat of the mind. The need for THEM disappeared due to changes in diet. Name THEM.

Answer: wisdom teeth.
Credit: third molars.

25. In the fifteenth century, Bartolomeo Colleoni was considered the best military commander. At the end of his career, Colleoni was appointed to lead HIM and planned to begin operations in Albania. What is HE?

Answer: crusade.
Credit: the exact answer.

26. Hong Kong is a large city with very dense development. One Hong Kong resident recalled that when he was a child, he wanted to get on a plane. The main character of the novel “The Kite Runner” loves to DO THIS. Answer in three words what DO THIS is.

Answer: fly a kite.
Credit: fly a kite.

  • Kavo and Sho – ❌ (I don't remember why)

  • ChatGPT 4o – ✅

  • ChatGPT o1-preview – ✅ (6 seconds)

27. Under the ruler Topiltzin, things were going very well in the Central American Toltec state. Symbolically, Topiltzin is believed to have been the first to brew HIM. Name HIM.

Answer: chocolate
Credit: cocoa

28. The planet Mapusa from the Star Wars universe is famous for its minerals. The creature that Obi-Wan Kenobi first encounters on this planet looks like HIM. “HIM” is the title of a spy movie. Name HIM.

Answer: mole
Credit: the exact answer.

29. A character in a novel sees a picturesque THIS, in which one of them faces the opposite direction. Name THIS in two words that begin the same way.

Answer: field of sunflowers.
Credit: sunflower field.

  • Kavo and Sho – ❌

  • ChatGPT 4o – ❌ (gave the answer – “double portrait”)

  • ChatGPT o1-preview – ❌ (gave the answer – “Two-faced Janus”, 14 seconds)

30. A doctor from one TV series diagnoses a very silent patient and says that he feels like HIM. In one of the cartoons, a deceived husband is waiting in line to see HIM. Who is HE?

Answer: vet
Credit: veterinarian.

31. The hero of one novel lives in the south and claims that the postman sometimes brings him mail from distant countries in his bag. Which word in the question did we change almost entirely?

Answer: pelican
Credit: the exact answer.

  • Kavo and Sho – ✅

  • ChatGPT 4o – ❌ (answered – “wind”)

  • ChatGPT o1-preview – ❌ (gave the answer – “albatross”, 88 seconds)

32. A character in one novel believes that conquistadors once brought IT to the virgin jungles of the Indians. In Chagall's painting on a biblical subject, IT creates a visual rhyme with the heroine's breasts. Name IT.

Answer: apple
Credit: apple from the tree of the knowledge of good and evil; apple of knowledge.

  • Kavo and Sho – ❌

  • ChatGPT 4o – ❌ (answered – “wheel”)

  • ChatGPT o1-preview – ❌ (gave the answer – “violin”, 48 seconds)

33. Amusingly, in one fantasy novel a robber demands “your money or your life” from another character. According to an old tradition, the robber receives only two coins. What farming tool is this other character often depicted with?

Answer: with a scythe
Credit: the exact answer.

34. In a post-apocalyptic series, people have lost something and civilization has degenerated to the level of the Middle Ages. The troops use many dogs, and one piece of equipment is missing IT because it is no longer needed. Name IT in one word.

Answer: visor
Credit: the exact answer.

35. Gioachino Rossini took into account the wishes of the audience in different cities, so there is a Neapolitan version of the opera with a happy ending, featuring a love note and a knife. Another version, staged in Venice, is much closer to the original. Name this original work.

Answer: Othello
Credit: Venetian Moor.

36. In one series, a character called Birdperson is killed. When he reappears in the next season, he asks to be called… What name?

Answer: Phoenix
Credit: Phoenixperson.

Results

  1. ChatGPT o1-preview – 25 points

  2. Kavo and Sho (aka Leather Bags) – 23 points

  3. ChatGPT 4o – 11 points

What conclusions can be drawn here? It's scary, friends. I just don't know yet whether it's terribly interesting or simply terrifying. Artificial intelligence is leaping forward at a breakneck pace. The 4o model, which was state of the art only recently, scored less than half as many points as the new model. Hard-working students who trained for ChGK tournaments, believing that only humans could answer such tricky questions, have already been left behind by machines. And on erudition questions, unfortunately, we had no chance even against 4o.

I will not draw any moral conclusions, especially since I do not have any clearly formulated position on how we should live in this brave new world. What will happen to jobs, scientific work, exams, olympiads, intellectual games, and in general – to all of us? We will live and see. And, I hope, we will survive.
