Text recognition and translation software AssistAnt

Is your English good? Mine is not. At least, not good enough to get by in games without a translator.
A short search for free programs on the Internet did not help. Perhaps I just didn’t look hard enough 🙂 But when I caught myself about to pick up my phone and translate the screen through its camera, I realized it was time to save myself. And the way to salvation is to write the translator yourself.
I found Tesseract OCR, a text recognition program well known in narrow circles, and a free API for Google Translate. The result is a program that translates a selected piece of on-screen text on the fly. It works like this: you hold down Win+Alt and drag out a rectangle around the part of the image with the incomprehensible text. The area is selected only while the keys are held. Voila: the translation appears in a tooltip in front of you!

You can also recognize text from an image in the clipboard through the tray icon menu.

The AssistAnt project itself https://github.com/AantCoder/AssistAnt/releases/latest
Tesseract OCR text recognition component https://github.com/tesseract-ocr
Google Translate REST API (free), used through GTranslatorAPI https://github.com/franck-gaspoz/GTranslatorAPI
In short, that’s all 🙂 Some nuances and alternative ways of using the program are covered in the application’s “About” section. Below, for those who are interested, I describe the technical difficulties I ran into during development.

Hot keys ruin everything

Whatever key on the keyboard you pick, it is either taken or awkward. If you think you have found one that is free and convenient, you simply have not yet met the program that already uses it. So I did not register a hotkey at all and decided to catch only the bare Win + Alt press. As far as I know, no program uses these two keys on their own, without a third. Of course, if you press anything else together with Win + Alt, my translator will not respond.
This combination does its job well: it lets you select an area on the screen while barely affecting the active program. There is one drawback, though, which I describe in the next section.
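The "only Win + Alt, nothing else" rule can be illustrated with a small state tracker. This is a sketch in Python, not the program's actual code: the key names and the `ModifierChord` class are hypothetical, and a real implementation would feed it events from an OS-level keyboard hook.

```python
# Hypothetical key names; a real hook would map OS key codes to these strings.
TRIGGER = frozenset({"win", "alt"})


class ModifierChord:
    """Tracks held keys and reports whether exactly the trigger keys are down."""

    def __init__(self, trigger=TRIGGER):
        self.trigger = frozenset(trigger)
        self.held = set()

    @property
    def active(self):
        # Selection mode is on only while Win and Alt are held and nothing else is.
        return self.held == self.trigger

    def key_down(self, key):
        self.held.add(key)
        return self.active

    def key_up(self, key):
        self.held.discard(key)
        return self.active
```

Pressing a third key makes `active` false, so combinations such as Win + Alt + something stay available to other programs.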

Translation from tooltips

When you try to select the text, you move the mouse, it leaves the interface element, and the tooltip disappears. This forced me to add the standard approach used by screenshot tools. If you press Win + Alt and release it without moving the mouse, the program takes a screenshot of the entire screen and opens it on top of all windows. You then select the area to translate in that screenshot, as with the standard Win + Shift + S combination (although mine is not as pretty). From there everything works as in the first method: a tooltip with the translation is displayed, except that the selected image is also placed on the clipboard (why? Because I can).
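The choice between the two modes comes down to whether the mouse moved between pressing and releasing the keys. A minimal Python sketch of that decision, with a hypothetical function name and a `tolerance` parameter that is my assumption (the text only says "without moving the mouse"):

```python
def selection_action(press_xy, release_xy, tolerance=0):
    """Decide what a Win+Alt press/release pair means.

    Returns ("fullscreen", None) if the mouse stayed put, otherwise
    ("translate", (left, top, width, height)) for the dragged rectangle.
    """
    (x0, y0), (x1, y1) = press_xy, release_xy
    if abs(x1 - x0) <= tolerance and abs(y1 - y0) <= tolerance:
        # No drag: fall back to the full-screen screenshot mode.
        return ("fullscreen", None)
    # Normalize so the rectangle is valid whichever direction the drag went.
    left, top = min(x0, x1), min(y0, y1)
    return ("translate", (left, top, abs(x1 - x0), abs(y1 - y0)))
```

For example, `selection_action((50, 40), (10, 20))` describes a drag up and to the left and still yields a rectangle anchored at its top-left corner.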

Poor parsing of small text

It turned out that Tesseract (and maybe every similar tool?) does not recognize text with a line height of less than 20 pixels, especially when it has a shadow or blur. Such effects certainly help a person read the text, but the neural network does not like them.
After suffering for several evenings, I put together a complex combination of simple image filters. With them, small text sometimes started to read even better than medium-sized text. Because of this, I added re-recognition without the full filter set whenever the recognition quality comes out below 90%. In the end it turned out like this:
First pass (good for the smallest text):

  • Enlarge the image 2x (nicely, with “high-quality bicubic interpolation”)

  • Convert to grayscale

  • Pad the image with a 7 px empty border and 200 px of white space on the right (short words are recognized better this way; apparently fewer lines are expected in a wide image)

  • Sharpen

  • Enlarge the image 2x again

  • Sharpen again (doing it in two steps slightly reduces artifacts)
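To get a feel for how much the first pass inflates an image, the size arithmetic can be sketched in Python. Two 2x enlargements alone multiply the pixel count by 16, and the padding adds more on top, which matters most for small selections. The function name is hypothetical, and I am assuming the 7 px border is added on every side.

```python
def first_pass_size(width, height, border=7, right_pad=200):
    """Return the (width, height) of an image after the first-pass resizing steps.

    Grayscale conversion and sharpening do not change the size, so they are skipped.
    Assumption: the border is applied on all four sides.
    """
    w, h = width * 2, height * 2                        # first 2x enlargement
    w, h = w + 2 * border + right_pad, h + 2 * border   # border plus white space on the right
    return w * 2, h * 2                                 # second 2x enlargement


for size in [(100, 20), (1000, 600)]:
    new_w, new_h = first_pass_size(*size)
    growth = (new_w * new_h) / (size[0] * size[1])
    print(size, "->", (new_w, new_h), f"({growth:.1f}x pixels)")
```

Under these assumptions, a 100 x 20 selection becomes 828 x 108, while a 1000 x 600 selection becomes 4428 x 2428, over ten million pixels to filter and recognize.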

Second pass (simpler processing, used if the recognition quality from the first pass is below 90%):

  • Enlarge the image 3x

  • Convert to grayscale

  • Sharpen
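The fall-through from one pass to the next can be sketched as a simple loop. This is an illustration in Python, not the program's code: `passes` is a list of hypothetical preprocessing functions, `recognize` is a stand-in for the OCR call returning text and a confidence percentage, and returning the best result when no pass reaches the threshold is my assumption.

```python
def recognize_with_fallback(image, passes, recognize, threshold=90.0):
    """Try preprocessing passes in order until one reaches the confidence threshold.

    passes:    list of functions image -> preprocessed image (hypothetical)
    recognize: function image -> (text, confidence_percent) (stand-in for OCR)
    Returns (text, confidence, number_of_passes_tried).
    """
    best_text, best_conf, tries = "", -1.0, 0
    for preprocess in passes:
        tries += 1
        text, conf = recognize(preprocess(image))
        if conf > best_conf:
            best_text, best_conf = text, conf
        if conf >= threshold:
            break  # good enough, skip the remaining passes
    # Assumption: if nothing reaches the threshold, keep the best attempt.
    return best_text, best_conf, tries
```

The third return value is the number of passes that were actually run, the kind of counter the tooltip could display.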

Third pass (in case grayscale conversion has made the text invisible, or sharpening is hurting recognition):

Works slowly

The filters still need a lot of work. They were written in a hurry and stayed that way. Predictably, if you select more than half of the screen, the program withdraws into itself while trying to apply all of them (the 16-fold increase in the number of pixels is especially good at hanging it).
For that reason, and because large selected areas usually contain large fonts anyway, the filters are now chosen based on image size:

  • If the image is larger than a million pixels (width * height), it is not processed at all and is sent for recognition as is.

  • If the image is larger than 20,000 pixels, it is only enlarged 3x. In this case, a * appears after the % in the tooltip.

  • If the image is smaller than that, all the filters described above are applied. In this case, a * appears after the % in the tooltip together with the number of passes it took to reach a recognition quality above 90%.
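The three size rules above map onto a small selection function. A Python sketch with hypothetical names and return values; how the exact boundary sizes (precisely 1,000,000 or 20,000 pixels) are treated is my assumption.

```python
NO_PROCESSING_ABOVE = 1_000_000   # pixels; larger images go to OCR untouched
ENLARGE_ONLY_ABOVE = 20_000       # pixels; larger images are only enlarged 3x


def choose_processing(width, height):
    """Pick a preprocessing strategy from the selection's pixel count."""
    pixels = width * height
    if pixels > NO_PROCESSING_ABOVE:
        return "as_is"
    if pixels > ENLARGE_ONLY_ABOVE:
        return "enlarge_3x"
    return "full_filters"
```

A full HD screenshot (1920 x 1080) goes through untouched, a 400 x 100 dialog line is only enlarged, and a 100 x 20 label gets the full multi-pass treatment.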

Memory leak

Sometimes it’s easier to kill than to feed. That is what I did, not wanting to chase memory leaks in other people’s libraries (are they really in other people’s? ..). Now, five minutes after the last call to the translator, the program restarts itself automatically, which is guaranteed to free all the memory. For intensive use on weak computers, there is also a restart after 20 translations: the program waits 30 seconds after the last activation (to let you read the text) and restarts. I hope the user will hardly notice.
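The restart rules can be sketched as a small policy object. This is an illustration in Python with hypothetical names, not the program's code; timestamps are passed in as seconds (for example from `time.monotonic()`), and not restarting before the first translation is my assumption.

```python
IDLE_RESTART_S = 5 * 60   # restart five minutes after the last translation
BUSY_RESTART_S = 30       # shorter wait once many translations have piled up
BUSY_THRESHOLD = 20       # number of translations that counts as intensive use


class RestartPolicy:
    """Decides when the process should restart itself to release memory."""

    def __init__(self):
        self.translations = 0
        self.last_used = None

    def record_translation(self, now):
        self.translations += 1
        self.last_used = now

    def should_restart(self, now):
        if self.last_used is None:
            return False  # assumption: nothing translated yet, nothing to free
        busy = self.translations >= BUSY_THRESHOLD
        wait = BUSY_RESTART_S if busy else IDLE_RESTART_S
        return now - self.last_used >= wait
```

A timer could call `should_restart` every few seconds and relaunch the process when it returns true; the counters reset naturally because the new process starts with a fresh policy.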

That seems to cover the most interesting parts. The project itself can be viewed on GitHub: https://github.com/AantCoder/AssistAnt
I will say it, with excessive boasting: the program is cool. It helps me a lot with my terrible English.

P.S. If you have any comments, ideas for improvements, or performance suggestions, write to me here or in Issues on GitHub.
