Just say no to end-to-end tests

You've probably been there: a movie that you and your friends all wanted to see, and that you all regretted watching afterwards. Or maybe you remember the time your team thought it had found a "killer feature," only to discover its pitfalls after the product shipped.

Good ideas often fail in practice, and in the world of testing, a strategy built primarily on automated end-to-end tests is a prime example.

Testers can invest their time in writing many kinds of automated tests, including unit tests, integration tests, and end-to-end tests, but this strategy invests mainly in end-to-end tests, which exercise the product or service as a whole. Typically, these tests simulate real user scenarios.


End-to-end tests in theory

Although relying primarily on end-to-end tests is a bad idea, in theory one can come up with several arguments in its favor.

Number one on Google's list of ten things we know to be true is: "Focus on the user and all else will follow." Thus, end-to-end tests that focus on real user scenarios sound like a great idea. In addition, this strategy is broadly appealing to many participants in the process.

  • Developers like it because it offloads most, if not all, of the testing to others.
  • Managers and decision-makers like it because tests that simulate real user scenarios can help them easily determine how a failing test would affect the user.
  • Testers like it because they often worry about missing a bug or writing a test that does not verify real-world behavior; writing tests from the user's perspective often avoids both problems and gives the tester a sense of accomplishment.

End-to-end tests in practice

So if this testing strategy looks so good in theory, what goes wrong in practice? To demonstrate, I'll use a fictitious scenario below, based on real situations familiar to me and to other testers.

Suppose a team builds a service for editing documents online (for example, Google Docs), and assume the team already has some fantastic test infrastructure. Every night:

  • the latest version of the service is built,
  • that version is deployed to the team's test environment,
  • all end-to-end tests are run against that test environment,
  • the team receives an e-mail report summarizing the results.
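The nightly steps above can be sketched as a simple orchestration function. The stage names here are hypothetical stand-ins; a real setup would invoke a build system, a deployment tool, and a test runner.

```python
# A minimal sketch of the nightly pipeline: build, deploy, run the
# end-to-end suite, then e-mail a report. Each stage is injected as a
# callable so the orchestration can be exercised with stubs.

def run_nightly_pipeline(build, deploy, run_e2e_tests, send_report):
    """Run each stage in order and return the test results."""
    artifact = build()                    # build the latest version
    environment = deploy(artifact)        # deploy to the test environment
    results = run_e2e_tests(environment)  # run all end-to-end tests
    send_report(results)                  # e-mail the summary
    return results

# Example run with stub stages standing in for real infrastructure.
log = []
results = run_nightly_pipeline(
    build=lambda: log.append("build") or "build-42",
    deploy=lambda artifact: log.append("deploy") or "test-env",
    run_e2e_tests=lambda env: log.append("test") or {"passed": 178, "failed": 22},
    send_report=lambda res: log.append("report"),
)
```

Injecting the stages as callables is just a convenience for the sketch; the point is that the whole loop runs once a night, so feedback arrives at most once a day.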

The deadline is fast approaching. To maintain a high bar for product quality, let's say the team decides that at least 90% of the end-to-end tests must pass before a version is considered ready, and that the deadline is one day away.

Despite numerous problems along the way, the tests did eventually uncover real bugs.

What went well

Bugs that would have affected users were caught and fixed before they reached them.

What went wrong

  • The team finished its coding phase a week late (and put in a lot of overtime).
  • Finding the root cause of a failing end-to-end test is painful and can take a long time.
  • Failures in a partner team's service and hardware malfunctions delayed the test results by several days.
  • Many smaller bugs were hidden behind bigger ones.
  • End-to-end test results were at times unreliable.
  • Developers had to wait until the next day to find out whether a fix worked.

So now that we know what goes wrong with an end-to-end strategy, we need to change our approach to testing to avoid many of these problems. But what is the right approach?

The true value of tests

As a rule, a tester's job ends when a test fails: the bug is filed, and then it's the developer's job to fix it. However, to identify where the end-to-end strategy breaks down, we need to step outside this mindset and approach the problem from first principles. If we "focus on the user (and all else follows)," we have to ask ourselves: how does a failing test benefit the user?

Here is the answer: a failing test does not directly benefit the user.

Although this statement may seem shocking at first, it is true. If a product works, it works, whether or not a test says so. If a product is broken, it is broken, whether or not a test says so. So if failing tests do not benefit the user, what does?

Fixing bugs directly benefits the user.

The user is happy only when the unpredictable behavior (the bug) goes away. Obviously, to fix a bug you must know that it exists, and to know it exists, ideally you have a test that catches it (because otherwise the user will find it). But in the whole process, from test failure to bug fix, value is added only at the very last step.

Thus, to evaluate any testing strategy, you cannot just evaluate how well it finds bugs. You must also evaluate how well it enables developers to fix (and even prevent) them.

Building the right feedback loop

Tests create a feedback loop that tells the developer whether the product works. The ideal feedback loop has several properties.

  • It's fast. No developer wants to wait hours or days to find out whether a change works. Sometimes the change doesn't work (nobody's perfect), and the feedback loop has to run several times. A faster feedback loop means faster fixes. If the loop is fast enough, developers can even run tests before checking in a change.
  • It's reliable. No developer wants to spend hours debugging a test only to discover that the test itself was flaky. Flaky tests erode developers' trust in them, and as a result such tests are often ignored, even when they catch real product problems.
  • It isolates failures. To fix a bug, developers need to find the specific lines of code causing it. When a product contains millions of lines of code and the bug could be anywhere, it's like looking for a needle in a haystack.

Think Small, Not Big

So how do we build this ideal feedback loop? By thinking smaller, not larger.

Unit testing

Unit tests take a small piece of the product and test it in isolation from everything else. They come close to creating the ideal feedback loop:

  • Unit tests are fast. You only need to build a small unit of code to test it, and the tests themselves tend to be small. In fact, a tenth of a second is considered slow for a unit test.
  • Unit tests are reliable. Simple systems and small units of code are generally much less prone to flakiness. Furthermore, best practices for unit testing, in particular those related to hermetic tests, remove flakiness entirely.
  • Unit tests isolate failures. Even if a product contains millions of lines of code, when a unit test fails you only need to look at the small unit under test to quickly find the bug.

Writing effective unit tests requires skills in areas such as dependency management, mocking and stubbing, and hermetic testing. I won't cover these skills here, but as a starting point, a typical example shown to a new Googler (or, as they are called at Google, a Noogler) is how Google builds and tests a stopwatch.
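To make the idea concrete, here is a minimal sketch of a hermetic unit test in Python. The `SpellChecker` class and its dictionary dependency are hypothetical examples, not code from the article; the dependency is injected and replaced with a mock so the test is fast, isolated, and deterministic.

```python
import unittest
from unittest.mock import Mock

class SpellChecker:
    """A tiny unit with one injected dependency (a dictionary)."""

    def __init__(self, dictionary):
        self._dictionary = dictionary  # dependency injection keeps the test hermetic

    def misspelled_words(self, text):
        # Flag every word the dictionary does not recognize.
        return [w for w in text.split() if not self._dictionary.contains(w)]

class SpellCheckerTest(unittest.TestCase):
    def test_flags_unknown_words(self):
        dictionary = Mock()
        # Stub the dependency: only "hello" counts as a known word.
        dictionary.contains.side_effect = lambda w: w == "hello"
        checker = SpellChecker(dictionary)
        self.assertEqual(checker.misspelled_words("hello wrold"), ["wrold"])

suite = unittest.TestLoader().loadTestsFromTestCase(SpellCheckerTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the real dictionary never loads, the test runs in well under a tenth of a second and cannot flake on external state.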

Unit tests vs. end-to-end tests

With end-to-end tests, you have to wait: first for the entire product to be built, then for it to be deployed, and finally for all the end-to-end tests to run. When the tests do run, they will most likely fail intermittently. And even when a test catches a bug, that bug could be anywhere in the product.

Although end-to-end tests do a better job of simulating real user scenarios, this advantage is quickly outweighed by all the disadvantages of the end-to-end feedback loop.

Integration tests

Unit tests do have one significant drawback: even if the units work well in isolation, you don't know whether they work well together. But even then, you don't necessarily need end-to-end tests. For that, you can use an integration test. An integration test takes a small group of units, often just two, and tests their behavior as a whole, verifying that they cooperate correctly.

If two units don't integrate properly, why write an end-to-end test when you can write a much smaller, more focused integration test that detects the same bug? While you do need to think larger when testing integration, you only need to think a little larger to verify that two units work together.
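A minimal sketch of such a test, using two hypothetical units loosely inspired by the document-editing example: an `Editor` that produces revisions and a `RevisionLog` that stores them. Each unit is trivial alone; the integration test checks only that they work together, with no full product build or deployment required.

```python
import unittest

class RevisionLog:
    """Stores document snapshots in order."""

    def __init__(self):
        self._revisions = []

    def record(self, snapshot):
        self._revisions.append(snapshot)

    def latest(self):
        return self._revisions[-1]

class Editor:
    """Applies edits and hands each new revision to the log."""

    def __init__(self, log):
        self._log = log
        self._text = ""

    def type(self, text):
        self._text += text
        self._log.record(self._text)  # the integration point under test

class EditorIntegrationTest(unittest.TestCase):
    def test_edits_reach_the_log(self):
        log = RevisionLog()
        editor = Editor(log)
        editor.type("Hello")
        editor.type(", world")
        # Two real units, one small test: do they cooperate correctly?
        self.assertEqual(log.latest(), "Hello, world")

suite = unittest.TestLoader().loadTestsFromTestCase(EditorIntegrationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Unlike the unit test earlier, nothing is mocked at the boundary being tested: both real units participate, but the scope is still just those two units.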

Test pyramid

Even with both unit and integration tests, you will probably still want a small number of end-to-end tests to verify the system as a whole. To find the right balance between all three test types, the best visual aid is the test pyramid. Here is a simplified version of the test pyramid from the opening keynote of the Google Test Automation Conference 2014.

The bulk of your tests are unit tests at the bottom of the pyramid. As you move up the pyramid, your tests become larger in scope, but the number of tests (the width of the pyramid) shrinks.

As a rule of thumb, Google suggests a 70/20/10 split: 70% unit tests, 20% integration tests, and 10% end-to-end tests. The exact mix will differ from team to team, but in general it should retain the pyramid shape. Try to avoid the following anti-patterns:

  • Inverted pyramid / ice-cream cone. The team relies primarily on end-to-end tests, with a few integration tests and very few unit tests.
  • Hourglass. The team starts with plenty of unit tests, then uses end-to-end tests where integration tests should be used. The hourglass has many unit tests at the bottom and many end-to-end tests at the top, but few integration tests in the middle.
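Applied to a hypothetical suite of 500 tests (the size and the rounding are illustrative assumptions, and the exact ratio will differ from team to team), the 70/20/10 rule works out as follows:

```python
# Apply the suggested 70/20/10 split to a total test count.
def pyramid_split(total, ratios=(0.70, 0.20, 0.10)):
    unit, integration, e2e = (round(total * r) for r in ratios)
    return {"unit": unit, "integration": integration, "end-to-end": e2e}

split = pyramid_split(500)
# → {'unit': 350, 'integration': 100, 'end-to-end': 50}
```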

Just as a regular pyramid tends to be the most stable structure in real life, the test pyramid also tends to be the most stable testing strategy.


