About testing mobile applications. Part 3. End-to-end (UI, e2e) testing


Earlier, we got acquainted with the testing pyramid and its base. In this article, I propose to go straight to the top of the pyramid.

At the top of the pyramid presented in Article 1 are end-to-end tests. Under this umbrella you will encounter the terms e2e (end-to-end), UI, system, and user interface tests. In other words, this article focuses on tests that exercise the system as a whole. The main task of this group of tests is to check whether the entire system satisfies the stated requirements.

In mobile development in general, and in Android development in particular, the term e2e tests is often used as a synonym for user interface (UI) tests and instrumented tests. However, keep in mind that in general this is not entirely accurate: these terms do not always mean the same thing, and much depends on the context.

Instrumented tests are tests that require a special environment to run: either a physically connected device (smartphone, tablet, etc.) or an emulator (simulator). This group is not limited to the usual UI tests; it can also include many other kinds of testing, for example tests of database or disk access, where the UI is not needed at all, screenshot tests, and so on.

E2E (end-to-end) tests, in turn, may include:

  • tests that exercise the entire system: the whole mobile application, without replacing individual layers (for example, depending on the chosen test strategy and the needs of the team, instrumented tests can be run against an application with simplified logic, without a real network layer, etc.). In other words, e2e tests in the context of the application (service) architecture (shaded areas indicate possible types of testing and the areas where they can be applied):

    E2E tests in the context of the architecture

  • tests of the main use cases, from start to finish. This term is often used when a user scenario includes many steps. For example, the main scenario of an online store application can be divided into several loosely coupled stages: authorization, product selection, payment, post-payment. Alternatively, you can combine all the steps and test the entire scenario from start to finish (end to end).

    E2E tests in the context of scenarios

Thus, when using these terms, always keep the context in mind. In the rest of this article we will focus mainly on instrumented UI tests.

So, UI tests, as a subset of instrumented tests, are executed on devices running the target platform's operating system (real or emulated), so they can use almost everything the platform API provides.

UI tests for mobile applications are automated tests that simulate user interaction with the application. This group of tests allows you to verify that the user interface elements work as expected by the scenario.

Most often, such tests are written using frameworks. For Android these are mainly Espresso and UI Automator. There are also proprietary solutions built on third-party technologies (for example, Jest). The framework APIs are not always convenient or readable, so teams often write their own wrappers and libraries on top of them (for example, Kakao and Kaspresso).

The most common practice when writing UI tests is the Page Object pattern. The main idea of the approach is to create an object-oriented representation of the screen/page (originally the pattern was used for testing web pages) that the test interacts with. This representation decouples the tests from page implementation details, which makes the tests much easier to write and maintain. An example of such a representation for the application described in Article 1:

class LocationSelectionScreen : Screen<LocationSelectionScreen>() {
    // The "Next" button in the action bar
    val nextButton = KView { withText(R.string.next_action_bar_button) }

    // The list of selectable locations
    val locationsList = KRecyclerView(
        builder = { withId(R.id.recycler_view) },
        itemTypeBuilder = { itemType(::LocationItem) }
    )

    class LocationItem(parent: Matcher<View>) : KRecyclerItem<LocationItem>(parent) {
        val label: KTextView = KTextView(parent) { withId(R.id.item_label) }
    }
}

An example of the test itself:

    @Test
    fun testLocationSelection() {
        onScreen<LocationSelectionScreen> {
            locationsList {
                isVisible()
                firstChild<LocationSelectionScreen.LocationItem> {
                    isVisible()
                    label { hasText("Sydney") }
                    click()
                }
            }
            nextButton.click()
        }
        onScreen<ForecastScreen> {
            screenView {
                isVisible()
            }
            locationName {
                hasText("Sydney")
            }
        }
    }

Run example:

UI test

The benefit of this approach is hard to overestimate when the screen interface changes: you only need to update one page object instead of rewriting all existing tests. As you can see, the test itself is not tied to a specific screen implementation: locationsList could be backed by a RecyclerView, a ListView, or a custom solution. In the case of a migration from one implementation to another, you will most likely only have to change LocationSelectionScreen and leave the tests unchanged.

In addition, if such representations are written in a similar way on other platforms (iOS/Web), you can resort to automatic code conversion, which allows a QA engineer to reuse tests between platforms and greatly increases their productivity.

As you can see, in contrast to unit tests, user interface tests, and even more so e2e tests, involve a large number of components and systems. On the one hand, this lets you test all of these systems and their interaction by writing just one test. On the other hand, this is also their main drawback: because so many components can affect the result of a run, such tests are not always stable.

Their instability is quite simple to explain with a classical engineering approach.

Such questions are the subject of reliability theory, which defines reliability as the property of an object (system) to maintain over time, within established limits, the values of all parameters that characterize its ability to perform the required functions under the specified conditions of use, maintenance, storage, and transportation.

If, for simplicity, we assume that each command in our system passes through a chain of serially connected components (in reality, of course, everything is much more complicated): screen tap emulation, tap handling by the operating system, tap handling by the application code, the network request/response, processing of the response by the application, generation of a command to update the user interface, handling of that command by the OS, the hardware interface update, and detection of the changed interface state by the test framework, then we can apply the probability multiplication formula [1]:

P=\prod_{i=1}^nP_{i}

That is, the reliability of our system is the product of the reliabilities of its component elements. There are no ideal components, and if we assume that our system consists of 10 elements, each with a reliability of 99%, we get

P=0.99^{10}\approx0.9

This can be interpreted as follows: roughly every tenth test run will fail. Because of this, the team will often have to check whether something really broke or whether the result is a false negative. Which, you will agree, is very bad: it slows development down enormously and demotivates the team (after all, we all want to write new features we can add to our performance review, not fix phantom bugs).
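The arithmetic above can be sketched in a few lines of Kotlin (the function name is mine; the numbers are the ones from the example):

```kotlin
import kotlin.math.pow

// Reliability of a serial chain of components: the product of the
// per-component reliabilities (here all components are equally reliable).
fun chainReliability(componentReliability: Double, components: Int): Double =
    componentReliability.pow(components)

fun main() {
    // 10 components at 99% each give roughly 90% overall,
    // i.e. about one failed run out of ten.
    println(chainReliability(0.99, 10))
}
```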

In addition to our code, the test result can also be strongly influenced by the environment: the state of the emulator or device and the state of the network (if one is used). Running UI tests is also slow and resource-intensive: for example, the test above took 2 seconds on my device, while running unit tests takes a fraction of a second.

Error Example

As you can see, this type of testing has its share of problems, but you should not rush to write it off: there are ways and solutions to minimize them.

Assume the stability of a test is 90%; then on average 1 run out of 10 will give a false negative result. That does not look so rosy: the quality of such a signal is quite low, and at some point the team may simply stop paying attention to it or turn it off altogether. Formula [1] can come to the rescue again. If the test fails in 10% of cases and these failures are independent, you can restart it after the first failure and take the result of the two runs into account. If it is enough for us that the test passes at least once, we will see a red result in the report quite rarely: 0.1 × 0.1 = 0.01, or 1% (with 95% stability the result is even more acceptable: 0.05 × 0.05 = 0.0025, or 0.25%). There are projects that solve this and a number of other problems (for example, Marathon).
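The retry arithmetic can be sketched the same way (assuming the attempts fail independently; the function name is mine):

```kotlin
import kotlin.math.pow

// Probability that a flaky test is reported red when every attempt must fail,
// i.e. we accept the run if at least one attempt passes.
fun falseRedProbability(failureRate: Double, attempts: Int): Double =
    failureRate.pow(attempts)

fun main() {
    println(falseRedProbability(0.10, 2)) // 10% flakiness, one retry: ~1%
    println(falseRedProbability(0.05, 2)) // 5% flakiness, one retry: ~0.25%
}
```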

As you can see, UI tests are quite a powerful tool, but they should be used very carefully. While working with them, I have accumulated the following list of observations:

  • Depending on the size of the team and the project, and on the number of tests, you can vary the strategy by which the tests are launched. For example, you can rely on impact analysis and check only those parts of the project that were affected by the changes (it is unlikely that something will break on the location selection screen if you change the temperature rounding method on the next screen).

  • With a large number of developers and a lot of activity in the project, it may be more profitable not to run the tests for every change, but to run them at certain time intervals or after a certain number of changes. If a regression appears, you then look retrospectively for the change that caused it (for example, using a bisect mechanism).

  • UI tests can help stabilize development in the early stages of a new project, when requirements may change rapidly or the project may even be cancelled. It also often happens that there is not enough time even for the minimum viable product, let alone for tests. At such moments, you can write a couple of UI tests that cover the main scenarios and give at least some idea of what is happening with the project. Of course, we can say "No!" to the product team, but external factors should always be taken into account as well. There are times when it is better to release a product without sufficient test coverage than to finish only half of the project, test it perfectly, and never finish the second half because new funding could not be obtained, or because the dynamics did not satisfy top management and the project was simply closed. (Unfortunately, we have observed such scenarios in practice.)

  • It is possible to develop a strategy that combines several types of testing, each responsible for the specific part it handles best. On one project, we concluded that UI tests should be responsible for testing navigation and the correct display of the UI (without checking specific values), while a small number of e2e tests made it possible to monitor the overall health of the project.
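The impact-analysis idea from the first point can be sketched as follows; the module names and the dependency map are invented for illustration (in a real project they would be derived from the build system's module graph):

```kotlin
// Maps a module to the set of modules whose UI test suites depend on it.
// These names are hypothetical, for illustration only.
val dependents = mapOf(
    "location-selection" to setOf("location-selection"),
    "forecast" to setOf("forecast"),
    "core-formatting" to setOf("forecast") // temperature rounding lives here
)

// Select only the UI test suites of modules affected by the change.
fun suitesToRun(changedModules: Set<String>): Set<String> =
    changedModules.flatMap { dependents[it].orEmpty() }.toSet()

fun main() {
    // Changing the rounding logic triggers only the forecast screen's suite.
    println(suitesToRun(setOf("core-formatting"))) // [forecast]
}
```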
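The retrospective search for a regression-introducing change mentioned in the second point is essentially a binary search over the ordered list of changes. A minimal sketch, where isBad stands in for checking out a change and running the failing test:

```kotlin
// Finds the index of the first "bad" change, assuming changes before the
// culprit are good and changes from the culprit onward are bad.
fun firstBadChange(changeCount: Int, isBad: (Int) -> Boolean): Int {
    var lo = 0               // lowest index that may still be the culprit
    var hi = changeCount - 1 // known bad change
    while (lo < hi) {
        val mid = (lo + hi) / 2
        if (isBad(mid)) hi = mid else lo = mid + 1
    }
    return lo
}

fun main() {
    // 100 changes; the regression was introduced by change #42.
    println(firstBadChange(100) { it >= 42 }) // 42
}
```

This is the same idea git bisect applies to commits; with n changes it takes about log2(n) test runs instead of n.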

To recap, UI tests:

  • allow you to make sure that the main functionality of the application works correctly;

  • allow you to identify errors in the operation of user interface elements;

  • allow you to save time (their launch is much faster than manual testing);

  • increase test coverage and allow you to numerically monitor the status of the project;

  • resource-intensive – their creation and maintenance may require special knowledge and tools;

  • time-consuming – creation, launch and debugging can require significant time costs (compared to other types of automated testing);

  • fragile – their stability depends on the stability of a large number of components and systems;

  • limited – they only allow you to test functionality that is reflected in the user interface.

I hope I have managed to remain objective and to show both the positive and negative aspects of the tests at the top of the pyramid. There are further observations related to UI tests, which I plan to cover in subsequent articles about integration and contract tests.
