Making a crossword puzzle about ML using Monte Carlo and ChatGPT
And this is how GPT-4 Omni copes with the task
Prompt> Please make a short crossword puzzle about machine learning.
GPT-4O> Sure! Here's an example of a 10-word machine learning crossword puzzle.
Crossword grid
| _ _ _ _ _ _ _
| _ _ _
| _ _ _ _ _ _ _ _ _ _ _ _ _
| _ _ _ _ _ _ _ _ _ _ _ _ _
| _ _ _ _ _ _ _ _ _ _
| _ _ _ _ _ _
| _ _ _ _ _ _ _ _ _
| _ _ _ _ _ _ _ _ _ _ _ _ _ _
| _ _ _ _ _ _ _ _
| _ _ _ _ _ _ _ _ _ _ _ _ _
Questions
An approach in machine learning where an algorithm is trained on labeled data.
A method for reducing the dimensionality of data.
A parameter that controls the complexity of the model and can prevent overfitting.
A graph used to evaluate the performance of classification models.
An algorithm that works on the principle of decision tree voting.
One of the most popular clustering algorithms.
A Python library for machine learning.
A technique used to evaluate the accuracy of a model by partitioning the data into parts.
A general name for methods that use multiple base models to improve accuracy.
The process of adjusting model parameters to achieve better results.
Answers
S U P E R V I S E D
P C A
R E G U L A R I Z A T I O N
R O C C U R V E
R A N D O M F O R E S T
K M E A N S
S C I K I T L E A R N
C R O S S V A L I D A T I O N
E N S E M B L E
H Y P E R P A R A M E T E R
Doesn't look much like a crossword puzzle, does it?
Things are no better in English
Prompt> Please compose small crossword-puzzle about ML.
GPT-4O> Certainly! Below is a small crossword puzzle focused on Machine Learning (ML) terms.
10 | O | V | E | R | F | I | T |
I | | | | D | | | S |
N | | P | I | | | | |
S | | R | - | H | V | A | R |
E | A | G | G | R | E | P | I |
T | | E | I | L | I | O | U |
4| E | L | M | A | I | N | C | V
|
Across
1. Essential component in programming used to store data. (8)
4. A type of neural network used mainly for NLP. (3)
6. The first stage in data processing. (4)
8. A loss function often used in regression tasks. (6, 3)
9. Technique to reduce overfitting. (8)
10. Model evaluation technique dividing data into training and testing sets multiple times. (10)
Down
2. Autonomous programs that replicate human decision-making. (5)
3. Technique used in SVM to create linearly separable classes. (9)
5. Algorithm paradigm inspired by the structure of the brain. (6, 8)
7. A statistical measure used to understand distances in clustering. (6)
Since GPT can't handle the task, we'll have to figure out how to help it. In this article, I'll tell you how to do it.
All code can be found in repositories.
TL;DR Crossword about ML
Here is a link to the crossword.
It will take you 20-30 minutes to guess. It is definitely solvable, tested on live programmers.
N.B. The crossword is in English, because all the professional jargon is nothing but Anglicisms and I decided it would be strange if the questions were in Russian and the answers in English.
Step #1 Collecting a word base
At first I wanted to write some ML terms off the top of my head, but my imagination was only enough for 30 words. I had to arm myself with a data-driven approach.
I found a reference book on ML terms on GitHub – https://github.com/bfortuner/ml-glossary
Written by script which parses unique words – there were about 3000 of them.
I looked through them manually and left only those that are suitable for use in the crossword: nouns or verbs in normal form, if a word appears in several spellings, then I leave the “most normal” one.
In the end, I got a list of 242 words.
It took me about half an hour of real time to review 3 thousand words.
In theory, the database could be filtered automatically using the distance between word embeddings and something like TF-IDF.
Step #2: Put them together in a crossword puzzle
There are many websites on the Internet that allow you to create your own crossword puzzle from a list of words, for example Crossword Labsbut the final crossword turns out to be very “sparse”, I would like to make it more compact.
I wrote a small program in C++ which randomly tries to assemble the most compact crossword puzzle, works like this:
Place the word in a random place on the field horizontally
We try to place the next word vertically in a random place so that there is at least one intersection with other words.
Next we place the word horizontally again with at least one intersection. GOTO p2
If the word cannot be placed, we look at the “occupancy” metric (the number of occupied cells divided by the number of free cells).
We repeat steps 1 through 4 several times, choosing the crossword with the maximum “business”.
Step #3: Make Questions with ChatGPT
Praise LLM, even earlier this stage was the most difficult to automate, now everything is done quite trivially. We run through every word via API from OpenAI (in total, for 242 words it cost me less than 10 cents).
System prompt
You are avid ML engineer working on a little puzzle (crossword) to check the knowledge of ML of your colleagues.
Prompt
Please compose a short ai/ml themed clue for word '{word}', do not mention number of letters in the clue.
Step #4 Putting it all together into a web page
I don't know anything about web development, so I tried to reverse engineer a few sites to “borrow” their crossword rendering engine, unfortunately I didn't succeed. But after about 3 hours of searching the Internet, I managed to find a great repository on GitHub that did what I needed https://github.com/jweisbeck/Crossword
Unfortunately, there were a lot of bugs, but somehow I managed to fix them. Another Python script to put it all together and…
Voila – the crossword is ready https://udalovilia.com/crossword/
Instead of a conclusion
If you liked it, please share it with your friends, here is a link to the crossword.
And you can give mine a star GitHub repository.