Hello! This is the story of how an indie developer built the most user-friendly app for learning English words. Another one?!
– Yes, but with GenAI features and algorithms!
My name is Sasha Ershov.
I have been a mobile developer since 2012, mainly on Android, and I work as a Lead.
Part 1. Background, theory and research
Last November I tried living in Dubai and decided to improve my vocabulary. I started going through study apps and settled on Anki, a legendary all-in-one tool for cramming anything: words, hieroglyphs, names of muscles and bones, anything that has to be memorized.
It is a flashcard application: you are shown a word and try to recall its translation or definition; if you cannot, you press a button and look.
Anki is built on spaced repetition and the Ebbinghaus forgetting curve, which dates back to the 19th century. The idea is to show you a memorized word again shortly before you forget it, gradually increasing the interval. This is how the word gets fixed in long-term memory.
The first programs based on this principle began to appear in the late 1980s.
In other words, you are shown the word you are learning and try to recall its translation. If you remember it, it comes up for review again after three days, then after eight, twenty-two, and so on. But if you forget it, the chain is reset and is built anew from the first day.
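The chain described above can be sketched in a few lines of Python. This is only an illustration, not Anki's actual scheduler: the intervals 3, 8 and 22 come from the text, while the 60-day step, the one-day delay after a miss, and the names `Card` and `review` are assumptions made for the example.

```python
from dataclasses import dataclass

# Review gaps in days. 3, 8 and 22 come from the text;
# 60 is an assumed continuation of the chain.
INTERVALS_DAYS = [3, 8, 22, 60]


@dataclass
class Card:
    word: str
    streak: int = 0  # number of successful recalls in a row


def review(card: Card, remembered: bool) -> int:
    """Update the card and return the number of days until its next review."""
    if not remembered:
        # Forgotten: the chain is reset and rebuilt from the start.
        # Assumption: "from the first day" means the word returns tomorrow.
        card.streak = 0
        return 1
    # Remembered: take the next interval, staying on the last one at the end.
    index = min(card.streak, len(INTERVALS_DAYS) - 1)
    card.streak += 1
    return INTERVALS_DAYS[index]


card = Card("vast")
for answer in (True, True, True, False, True):
    print(card.word, "next review in", review(card, answer), "days")
```

The only state kept per word is the length of the current streak, which is enough to express both "increase the interval" and "reset the chain".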
The algorithm built into Anki is called SuperMemo 2 (SM-2). Interestingly, the algorithm is already at version 18, and there are ongoing Anki vs. SuperMemo disputes about which cramming approach is more effective. I am on the practical side: people were not ready to practice words more than once a day, so Anki and many other applications keep using the older second version.
In any case, persistence is more important than the algorithm.
How to learn is clear, but what to learn? Words, but which ones?
I looked for tests that estimate vocabulary size. At the start my result was 4500 words. After that I retook such tests once a month, and my experience is that they have an error of ±500-1000 words.
Words are more important than grammar. You can string them together like tags and still be understood: “I’ll go for a walk girl tomorrow”, “I’ll lose my document”. It sounds wrong, but it is clear what it is about.
Word frequencies and efficiency
I found a list of words ranked by how often English speakers use them in speech.
You should start with everyday words such as he and she.
Then move on to harder words such as respect and ticket.
And then, for efficiency, to even rarer ones such as refusal and pave.
This way, a smaller number of learned words gives more understanding.
I scrolled through this list and found the place after which there were a lot of gaps in my knowledge. From that place I began to learn words, moving toward the rarer ones. Let’s call this place the starting point.
But I didn’t know the word vast, for example, and on my list it was in the first 1000 words.
There turned out to be quite a few frequent words before the starting point that I did not know. I went through them and added them to my study list as well.
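As a rough sketch, the search for a starting point can be automated. The code below is one possible interpretation, not the method actually used in the app: the window size, the 50% threshold and the function names are assumptions, and the word ranks in the toy list are only illustrative.

```python
def find_starting_point(ranked_words, known, window=100, threshold=0.5):
    """Return the index of the first window where the share of known words
    falls below the threshold, or len(ranked_words) if there is none."""
    for start in range(0, len(ranked_words), window):
        chunk = ranked_words[start:start + window]
        known_share = sum(word in known for word in chunk) / len(chunk)
        if known_share < threshold:
            return start
    return len(ranked_words)


def gaps_before(ranked_words, known, start):
    """Frequent words before the starting point that are still unknown."""
    return [word for word in ranked_words[:start] if word not in known]


# Toy frequency list, most frequent first (illustrative ranks only).
ranked = ["he", "she", "vast", "respect", "ticket", "disturb", "refusal", "pave"]
known = {"he", "she", "respect", "ticket"}

start = find_starting_point(ranked, known, window=2)
print(start, gaps_before(ranked, known, start))
```

With real data the window would be much larger, and the gaps found before the starting point would be added to the study list first, since they are the most frequent unknown words.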
I love saving time 🙂
Later, I went through all 16,000 words by hand, at a rate of 1000 words per hour, and excluded every word I already knew so as not to waste time on them.
On vocabulary test sites I found a study showing that such a starting point can generally be found. It is very rare for a person not to know popular words while knowing many rare ones.
Every student stops somewhere on the list: one at he, another at disturb. Beyond that point the student knows few words, because they came up rarely and did not stick.
The share of English learners at each level from A1 to C2, and the number of words they know, is roughly as follows:
Level A1, Beginner: about 10-20% of learners, 300-500 words
Level A2, Basic: about 20-30%, 600-1000 words
Level B1, Intermediate: about 30-40%, 1000-2750 words
Level B2, Upper Intermediate: about 15-25%, 2750-4500 words
At this level people often start talking and watching movies, and drop out of studying.
That is why only a small percentage of people move on to C1 and above.
Level C1, Advanced: about 5-10%, 5000-7000 words
Level C2, Fluency: about 1-5%, 8000+ words
People add about 2500 words per year in their native language.
Result: 10,000 words
That is +5500 new words, which is what the tests started showing after 6 months of study. I learned 30 new words a day, or about +900 a month, spending 30-60 minutes a day after breakfast.
More than 60 minutes is hard; after about 40 minutes I already felt annoyed at how long I had been sitting.
It turned out to be a very effective way, even more effective than immersion and living in an English-speaking country.
Anki is a fantastic all-in-one tool with a million features. Its advantage is also its problem. There is a lot to do and understand before you can start: take tests, find and install a word base, find your starting point, turn words on and off, and cope with a confusing design and strange statistics.
Yes, there are other applications; I looked at them and used them. But they all seemed complicated and unclear to me, or no better than Anki. Quizlet is also good, but it did not solve a problem like mine.
Marvel is a miracle
While I was learning the words, I shared my statistics and various interesting words on Instagram (banned in Russia). Subscribers asked what app it was. I explained and gave links.
I sent people the link and my initial frequency word base with audio. A lot of people really did ask. But I think that is where many of them stopped.
I even got upset when I realized that there is no simple app for memorizing words. I thought it might be a dead market. What people want is Duolingo or Lingualeo, and those are more about grammar. There are similar and also good apps such as reWords, but nothing simply worked, only the complex and old Anki. That is how the idea was born: a simple application, easy to get started with, for cramming words.
Part 2. Application
And I thought: I need to make my own new application with all of this. I sat down and chose a battle-tested technology, Flutter. I know Kotlin Multiplatform and have tried writing in it, but a lot there is still a matter of the future.
At work we had switched to Flutter, and it performed well. I had been working with it for a year, I know it very well and write quickly in it. As photographers say, the best camera is the one you have. Plus, Flutter let me release on iOS and Android at once.
In two months and 30,000 lines of code, I wrote the first version, and immediately began testing it on users and on myself.
Frequency list of words
First, I extracted the tried-and-true word list from Anki and tried that. But the frequencies were not quite right.
Then I started digging into word frequencies in books from Google Ngram and into an index base of 300k words that someone had collected for machine learning, and used them to improve my frequency base. I also looked at the frequencies from the Corpus of Contemporary American English (COCA). COCA, by the way, asks you to credit them if you use their frequencies, and warns about its “spider” that scans the Internet at night looking for its lists.
I re-sorted the word base a million times, improved my Excel skills with ChatGPT, and ran into my first database migrations for users. Everything is stored locally, and I could not just shrug and tell my still few users: sorry, you need to start all over again. They were already studying, and I did not want them to lose their progress.
The code changes in git add up to almost a book. But about 200k of those lines can be discounted: they are 12 full reloads of the 14,000-row word database.
In the process, the idea came up to generate images with a neural network. I tried running Stable Diffusion on Mac. It was a discovery for me that anyone can run it at home.
It worked, but on my M1 a single request took 2-4 minutes to generate; the RAM and the architecture were not up to it.
And I needed pictures for at least 4,000 words, which means at least 4,000 requests. That many words is a sufficient language core for comfortable conversations with other people.
I even thought about buying or renting a gaming PC on Windows for 6000 rubles for a month with a powerful video card.
But some former colleagues helped: they had set up their own Stable Diffusion server and had already retrained the model. A couple of times they gave me free API access to their server for a couple of days, and I generated 4500 pictures with scripts in a couple of nights.
At first I tried to generate from the word alone, but there is a problem with abstract words like “upset”: the network has nothing to latch onto when the word cannot be depicted. So I switched to putting an example sentence plus the word itself in the query.
Art styles and color schemes were picked at random from a set and substituted into the query. But I still had to select the results by hand, at about 1000 pictures a day.
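The query-building step described above might look roughly like this. It is a sketch under stated assumptions: the style and palette lists, the prompt layout and the name `build_prompt` are all hypothetical, and the call to the image server is deliberately left out because its API is not described in the text.

```python
import random

# Hypothetical sets; the real styles and color schemes are not listed in the text.
ART_STYLES = ["watercolor", "pixel art", "oil painting", "comic book"]
PALETTES = ["pastel colors", "neon colors", "black and white"]


def build_prompt(word: str, example: str, rng: random.Random) -> str:
    """Combine an example sentence, the word itself, and a random style and
    palette into one text query for an image generator."""
    style = rng.choice(ART_STYLES)
    palette = rng.choice(PALETTES)
    return f"{example} ({word}), {style}, {palette}"


rng = random.Random(42)  # fixed seed so a batch can be reproduced
print(build_prompt("upset", "She was upset about the lost ticket.", rng))
```

Passing the random generator in explicitly makes it possible to regenerate the same batch of queries later, which helps when pictures are being reviewed by hand.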
Generation examples. There were 2 stages and the first one is immediately visible 🙂
Sometimes these pictures turned out very strange. At first I thought that the word horse necessarily needed a horse in the picture. But after trying it out, I actually came to enjoy the fact that they are like that.
To remember, you need something unusual, so it is easier for the brain to connect new abstract words.
Plus, this approach involves more channels of perception, both visual and auditory.
Testing and product development
In parallel with the development, I kept testing and talked about the progress on Instagram (banned in Russia). Now there are about ten active users. These people are my gold and my consultants! Ten is not zero, and it means the idea works. They told me what was inconvenient, what did not work, and what came out cool.
One of my mistakes, for example: at first I made a classic onboarding tutorial. But nobody read it; people flipped through it and, naturally, understood nothing. So I quickly put together a tutorial video. I could have narrated it myself, but I found a voice and decided to voice it with a neural network. It sounds a little strange, but I even like it.
I uploaded it to Reels to check the reach and how interesting it is. Judging by the statistics, on people who are not “warmed up” at all it did well. But it needs to be cut to 45 seconds, with the most important part moved into the first 15 seconds.
So why another English app?
It is believed that a first startup is best launched not in a “blue”, innovative and uncontested market, where it is hard to explain new ideas to people. The advice is to go into understandable “red”, highly competitive markets, such as psychology, esotericism, weight loss and English. There is always demand in these areas, and you can take your share of the pie. Plus, there are no clear leaders in vocabulary learning.
I looked at almost all applications in this area and collected interesting, working ideas from them. I see a lot of apps with a complicated entry that you have to figure out. Moments like that cause micro-anger. So in mine I try to throw out everything that is heavy, unnecessary or not working.
For example, I threw out the initial test. It did not work: people did not know the words chair and blue, yet the test said level B2, so they thought they knew the language and deceived themselves. The test got the starting point wrong, although it estimates the number of known words well.
The plan now is to gather 1000 active free users and their feedback, and then move on to experiments with paid subscriptions. The functionality for that is already built.
I need to commission a UI redesign. If I showed mine to professional designers, they would tear it apart. But that is a matter of taste; right now UX is more important 🙂
I also want to add activities with friends: word discussions, competitions and challenges, like the activity rings on an Apple Watch. My wife once got so carried away by those that we went out together at 23:30 to close her rings so that she could beat her friends. It will be interesting to share statistics and watch the statistics of friends.
I would like to make unusual example sentences, generate them with ChatGPT, and redraw the pictures to match.
I would like to dig into the statistics on learning difficult words and, based on that analysis, suggest the best ways to memorize them.
A month ago, I received an investment proposal from Dubai for 10M₽ for 20% of the company. The guys are great. But they need retention and more users.
In June, I applied for a grant for 4M₽, I am waiting for a decision.
And now I am preparing a campaign on Boomstarter for 2M₽.
I have tried several times to launch other projects. This is the first one that I absolutely believe in and that I can see taking off. I use it myself, and it is great to see other people using it.
Yes, the money may run out, but I want to see this through to success.
There are difficulties with advertising in Russia and with disabled payments in our app stores. The latter, by the way, has been solved, thanks to a colleague from Montenegro.
There are difficulties with payments in the Russian Federation, users returning to the application, bugs, transfers. But everything looks solvable.
The main thing is that there is an unusual idea, and it has become clear to me what uniqueness looks like. I am confident I am doing the right thing, and there are first users. It is also clear how to look for such ideas in the future.
You get used to the first negative reviews. They are gold, because they show what to improve, even though inside you are angry at first.
Working on a startup is really not easy. But so far this is the most interesting thing I’ve done.
Learn English words from neuro-pictures
Download, learn, try
The application implements all the efficiency principles from the background story above:
Unusual GenAI pictures
Effective frequency word list with third forms and phrasal verbs
Super-simple Tinder-like interface
Detection of your entry level and starting point, with instant exclusion of known words
Notifications, themes, sounds and voice acting
Word manager, search, filtering
and breaks for quotes by Jobs and Rocky 4
Full story in the form of a selection of stories in Google Photos.
I am open to job and collaboration offers, we can discuss :)
Thank you for your attention
Like if you liked it 🙂