Computer vision in autotests: searching for an element by image

My name is Daniil Akhetov, and I have been automating web application testing in JavaScript for over five years. During this time I have tried all of the most popular UI test automation frameworks: Playwright, WebDriver.io, Cypress, and others. It is impossible to say that any one framework is better or worse; each has its own capabilities and features that need to be taken into account when choosing automation tools for a specific project. And although all the tools are different, they have one thing in common: they all search for elements on the page by selector. This requires that HTML elements be given unique attributes, be it a class or a data attribute.

Very often I ran into the situation where tests failed after the application code was refactored. On the one hand, this is correct, because the tests reacted to a change in the markup; on the other hand, there is no actual bug in the application, since it was not the interface that changed but the structure of the DOM tree, and analyzing such failures was painful every time. Dedicated data attributes partially solve this problem, but sometimes they are forgotten, and when a project grows large enough, the team starts thinking about reducing the size of index.html so that the user gets useful information as quickly as possible after following a link. This is where the race for every byte and every millisecond begins, and the need for any “extra” character in the HTML markup is questioned. At such moments, data attributes for autotests are the first candidates for removal from the final HTML.

For quite a long time I had been thinking about how to find elements on the page without being tied to the DOM tree. The result of that search is a plugin I wrote for Cypress. This article covers the plugin and the approach it uses.

Disclaimer

Let me say right away that the plugin is far from ideal, and there is a problem that greatly hinders its commercial use. It is rather an open-source prototype that may inspire the creation of something greater. I deliberately tried not to use complex constructions in the code and dropped TypeScript so that everyone could understand the working mechanisms and adapt the plugin to their specific needs. And it probably goes without saying that Cypress was chosen only as an example; a similar plugin can be written for any existing framework.

Solution

Once, in the Appium source code, I came across a method that was barely mentioned in the documentation; at the moment I have not even been able to find it again. It searched for the coordinates of a given template in a source image and returned the coordinates of its center, that is, by passing an image of an element to the method, you could find out where that element was on the page. Using this method, Appium let you check for the “presence” of an element and click on its coordinates. That was interesting, but I still wanted to get a full-fledged element, so that I could at least check its text, styles, sizes, and so on. The template search in this method was performed using OpenCV, a computer vision library.

I have always been inspired by research and development in the field of computer vision. In my personal opinion, the developers of the OpenCV library have achieved the greatest success in this area. Its algorithms for finding a template in an image and computing its coordinates formed, among other things, the basis of my solution.

For a long time I tried to use npm packages like opencv-js, but to put it mildly, it did not work out well: they simply did not work out of the box, and when I tried to build something from source, it took a long time and ended in failure. All the npm packages I tried had last been updated two or three years ago, which apparently had a lot to do with it. So I decided to use the Python version of OpenCV, and needless to say, everything installed in a matter of seconds. All that was left was to write the template search logic in Python and run the script from Node.js, so that the autotest could pass the necessary parameters to the script. Fortunately, such tasks are very easy to solve in Node.js. The Python script itself is launched using the python-shell package, and since its methods are asynchronous, they are wrapped in child_process.execSync() so that the sequence of actions in the autotest is not violated. The result is a rather crazy design: a Python script that is launched by a JS script via python-shell, which in turn is launched on the server in a process that blocks execution of the main context through child_process.execSync(). But the order of calls in the autotest itself is not violated.
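
To give a sense of the contract between the two sides, here is a minimal sketch of what the Python side might look like, assuming OpenCV for Python (cv2): the script reads the screenshot and template paths from its command-line arguments and prints the match result as JSON, so the JS wrapper can capture and parse stdout. The file name, argument order, and output format are illustrative assumptions, not the plugin's actual interface.

# search_template.py - a hypothetical sketch of the Python side of the plugin
import json
import sys

import cv2

def main():
    # The Node.js wrapper is assumed to pass two paths: the page screenshot
    # and the template image of the element we are looking for.
    screenshot_path, template_path = sys.argv[1], sys.argv[2]

    screenshot = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)

    # A single-scale match with TM_CCOEFF for brevity; the real script also
    # iterates over scales, as described later in the article.
    result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    h, w = template.shape[:2]
    center = {"x": max_loc[0] + w // 2, "y": max_loc[1] + h // 2, "score": float(max_val)}

    # Printing JSON to stdout lets the JS wrapper capture and parse the coordinates.
    print(json.dumps(center))

if __name__ == "__main__":
    main()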

At the time of writing, the @techstark/opencv-js project has come to life on npmjs, and I will definitely try it again in future versions of my plugin.

The logic of the plugin itself is quite simple. We “cut” the element we want to find out of a previously prepared screenshot of the page and pass it to the template search method. The screenshot on which we search for the template is created during the autotest, so the coordinate system of the page in the browser and of the screenshot is the same. The method returns the coordinates, and from them the standard JavaScript method document.elementFromPoint gives us a DOM node, which we can work with as a regular element. It probably goes without saying that if one element “overlaps” another, either the parent element or the element with the highest z-index will be selected. It is like event capturing, only with a call to event.stopPropagation that stops the descent. In most cases I have encountered, it is enough to use the framework itself to find a child element inside the found element, even by HTML tag. For example, in Cypress this can be done like this:

cy.get('@el').find('button').as('targetElement');
cy.get('@targetElement').should('be.visible');

Yes, we still write a selector, but since we are searching within the already found element, it will most likely be unique within such a small HTML fragment.

It is worth noting that TM_CCOEFF was chosen as the template comparison method. In this article I will not go into the details of the different template matching methods; that is a topic for a separate article, and they are all described in the OpenCV documentation. I will only add that instead of TM_CCOEFF it is probably better to use TM_CCOEFF_NORMED, but for my purposes this was not essential.
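
Both methods are called in exactly the same way; the practical difference is that TM_CCOEFF returns a raw, unbounded score that depends on image content, while TM_CCOEFF_NORMED keeps the score within roughly −1…1, which makes it easier to apply a fixed confidence threshold. A quick way to see the difference for yourself (the file names here are placeholders):

import cv2

screenshot = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("element.png", cv2.IMREAD_GRAYSCALE)

for method in (cv2.TM_CCOEFF, cv2.TM_CCOEFF_NORMED):
    result = cv2.matchTemplate(screenshot, template, method)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    # TM_CCOEFF yields an unbounded raw score; TM_CCOEFF_NORMED stays within about [-1, 1].
    print(method, round(max_val, 3), max_loc)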

Any template matching algorithm is based, in one form or another, on pixel-by-pixel comparison. This approach requires taking into account the difference in DPI between the template and the image. It can arise from various factors: for example, the template was prepared from an image of a higher or lower resolution than the screenshot on which we will perform the search. Device pixel ratio also has an effect: on the server, tests are most likely run under X11 with a ratio of 1, but on a local machine the monitor may be a Retina display, whose pixel ratio is at least 2. As a result, we can end up with tests that work on the server but constantly fail locally, or vice versa.

To handle this, the Python template search script implements a loop that goes through different image scales (from 20% to 100% of the original size), shrinking the screenshot and performing template matching at each step. In each iteration we remember the highest correlation value, and in the end we keep the coordinates and the scale of the iteration in which the correlation, that is, the template match, was at its maximum. The scale has to be saved so that the found coordinates can be converted back to the coordinate system of the original image. In this simple way, we get an algorithm that will find a template taken, for example, at a resolution of 1280×720 on a picture at 1920×1080, which lets us keep a single image of an element for the various screen resolutions that tests are run at; it no longer matters where the screenshot for the template was taken.
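
To make the loop more concrete, here is a minimal sketch of the idea, assuming OpenCV for Python (cv2) and NumPy. The function name, the number of scale steps, and the use of TM_CCOEFF_NORMED (so that scores are comparable across scales) are my own illustration choices, not the plugin's actual code.

import cv2
import numpy as np

def find_template_multiscale(screenshot_path, template_path):
    image = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    t_h, t_w = template.shape[:2]

    best = None  # (score, center_x, center_y)

    # Shrink the screenshot step by step from 100% down to 20% of its size,
    # so a template captured at a lower resolution can still line up with it.
    for scale in np.linspace(0.2, 1.0, 20)[::-1]:
        resized = cv2.resize(image, (int(image.shape[1] * scale), int(image.shape[0] * scale)))
        ratio = image.shape[1] / float(resized.shape[1])

        # Stop once the shrunken screenshot is smaller than the template itself.
        if resized.shape[0] < t_h or resized.shape[1] < t_w:
            break

        result = cv2.matchTemplate(resized, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)

        if best is None or max_val > best[0]:
            # Remember the best correlation and map the match center back
            # to the coordinate system of the original screenshot.
            center_x = int((max_loc[0] + t_w / 2) * ratio)
            center_y = int((max_loc[1] + t_h / 2) * ratio)
            best = (max_val, center_x, center_y)

    return best

The coordinates that come out of such a function are exactly what document.elementFromPoint can then consume on the browser side.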

That is, instead of a selector we can now pass a relative path to the image of the element, and while running, the autotest will be able to find this element on the page and return it to us. The code looks something like this:

// Find the element by image and save it under an alias
cy.searchByImage('cypress/imageSelectors/todoLogo.png').as('todoElem');
// Check that the element is visible on the page by accessing it through the alias
cy.get('@todoElem').should('be.visible');

What does this approach give?

  • No more long and painful searching for unique element selectors

  • There is no need to add data attributes or create special builds for automated tests

  • It becomes possible to test the positioning of elements and their appearance, although without color, because the images are converted to grayscale to speed up the search

  • Autotests depend on visual changes to the interface, not on the DOM structure of the document

How to get it?

The plugin is already published as an npm package; here is the link to it. In the README I tried to describe in as much detail as possible how to set it up and what requirements the system it runs in must meet, and I included a usage example.

What is the problem with commercial use?

Although this approach may seem like magic at first glance, it has one major disadvantage: speed. Where a standard CSS selector finds an element in a few milliseconds, this approach can take up to two seconds of your life. Naturally, when we are talking about a huge number of autotests, this approach simply becomes inapplicable, and the time we save on writing an autotest we will spend on running it. But the very possibility of using technologies close to computer vision in UI test automation brings us one step closer to systems that can perform acceptance testing almost like a real person.

P.S.

The Python script for finding a template in an image was found on Stack Overflow and adapted for this specific task; here is the original post with the original version of the script. In the plugin code, I deliberately did not remove the original comments from the script.

I know that SikuliX exists, but that is a separate large project; I wanted to show that similar functionality can be brought into any framework for automated testing of web applications.
