Unit and integration testing to improve software reliability

Hello everyone, my name is Andrey Fedotov, and I am a backend developer at Digital Industrial Platform, where we build the industrial Internet of Things platform ZIIoT Oil&Gas. Our team develops a set of services that receive data from various sources, and this article is a kind of history of our project through the prism of unit and integration testing.

As Kent Beck said: “Many forces prevent us from getting clean code, and sometimes we can't even get code that just works.” His book on TDD (I'll note right away that we don't use TDD, but the book is very good) takes the approach of first writing code “that works” and only then producing “clean code”. This contradicts the architecture-driven development model, in which we first write “clean code” and then struggle to integrate code “that works” into the project. Our reality was even worse: we had code that didn't work well, and only through a lot of effort did it become code that worked. Perhaps one day we will arrive at something clean.

To understand what I'm talking about, let's look at the unit tests that existed on the project a year and a half ago, around the time our team took over the above-mentioned set of services.

The screenshot below shows the folder with the test files. There were about two hundred of them.

Yes, that is really what they were called: UnitTest001.cs, UnitTest020.cs, UnitTest120.cs.

The next screenshot shows one of these tests, located in the file UnitTest015.cs.

The test name, TestMethod02, is typical for these files.

In this test, requests to external sources are made twice: first a write, then a read – which already hints that this is not a unit test. And the results of those requests are checked only haphazardly.

Here is another example, TestMethod16. It contains a single line that calls the ominous method Run_142_04.

What could this mean? Let's dig in and see what this method is.

This is an extension method, and its code is too long to show in full, so only parts of it are shown here. There are hundreds of such methods. I suggest not even trying to fathom the authors' intentions.

You might think that these are some kind of automatically generated tests. But no, they were written by hand.

As an interim result, I will list the problems that the project had at that time:

  • many technical support tickets about bugs that require the involvement of the development team;

  • difficult to maintain code (not only in tests, but in general);

  • the purpose of existing tests is not clear (there is no protection against bugs);

  • the cost of corrections is high.

As Vladimir Khorikov said in his book “Unit Testing: Principles, Practices, and Patterns”: “It's better not to write a test at all than to write a bad test.” So we decided to write our own tests and get rid of the existing ones.

By the way, if you haven't read this book, I highly recommend it. It is a treasure trove of experience and recommendations. The author is a true professional and the book is excellent, although, of course, some points may not suit your team (for example, we did not adopt the test naming recommendations described in the book, but those are minor things).

Before we go further, let's touch on a little theory; then we'll look at the tests we wrote and the results they yielded. Here I rely on definitions from Vladimir Khorikov's book.

Purpose of unit and integration testing

Unit and integration testing is not just about writing tests. Their goal is to ensure the stable growth of a software project, and the key word here is “stable”. At the beginning of a project's life, development is quite easy; it is much harder to sustain that pace over time. The graph below shows how progress depends on time for projects with and without tests.

This slowdown in development is caused by software entropy.

In our project, we did not set writing tests as an end in itself. First of all, we wanted to solve the existing problems that I mentioned above, and also be able to grow the code base without reducing the reliability of the product as a whole.

Tests and stress

Moreover, there is a connection between tests and stress levels. Kent Beck writes about this in his TDD book (although the book is about TDD, it is a genuinely fascinating read with interesting stories from the author's practice and life, so I also highly recommend it, ideally with a cup of tea).

The more stress we feel, the less we test the code we develop. The less we test, the more mistakes we make. The more mistakes we make, the more stress we feel. It is a vicious circle with positive feedback: stress breeds more stress.

Tests, in turn, turn stress into boredom: “No, I didn't break anything – the tests still pass.” So having tests also reduces stress.

What is a unit test

Also known as a module test. The general definition is as follows: it is an automated test that:

  • checks that a small piece of code (also called a unit) is working correctly;

  • does it quickly;

  • supports isolation from other code.
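To make the definition concrete, here is a minimal sketch of such a test in C# with xUnit (the RequestValidator class and the rule it checks are hypothetical, purely for illustration):

```csharp
using Xunit;

// Hypothetical unit under test: a small piece of pure validation logic.
public class RequestValidator
{
    public bool IsValid(string tagName) =>
        !string.IsNullOrWhiteSpace(tagName) && tagName.Length <= 64;
}

public class RequestValidatorTests
{
    [Fact]
    public void IsValid_EmptyTagName_ReturnsFalse()
    {
        var validator = new RequestValidator();   // no shared or out-of-process dependencies
        Assert.False(validator.IsValid(""));      // fast, isolated, checks one small unit
    }
}
```

The test touches no database and no network, runs in microseconds, and cannot affect any other test – all three criteria are satisfied.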

The question may arise, what is a unit and what is isolation?

It must be said that there are two schools of unit testing: classical and London.

The classical school is so named because that is how unit testing was originally approached. The London school was formed by a community of programmers from London (unsurprisingly). The root of the difference between the two schools is precisely the question of isolation. The London school defines isolation at the class level, and the unit is usually the class itself. In the classical school, a unit is a unit of behavior. In our services we adhere to the classical school: it usually yields higher-quality tests and is better suited to the goal of sustainable project growth.
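The difference between the two schools can be sketched as follows (OrderService, IPriceCalculator, and the Moq-style mock are hypothetical examples, not code from our project):

```csharp
using Moq;
using Xunit;

public interface IPriceCalculator { decimal Total(int qty, decimal price); }

public class PriceCalculator : IPriceCalculator
{
    public decimal Total(int qty, decimal price) => qty * price;
}

public class OrderService
{
    private readonly IPriceCalculator _calc;
    public OrderService(IPriceCalculator calc) => _calc = calc;
    public decimal Checkout(int qty, decimal price) => _calc.Total(qty, price);
}

public class OrderServiceTests
{
    [Fact]
    public void Classical_style_uses_the_real_collaborator()
    {
        // Classical school: the unit is a unit of behavior, so the real
        // in-process collaborator is used; only shared or out-of-process
        // dependencies (DB, network) would be replaced with test doubles.
        var service = new OrderService(new PriceCalculator());
        Assert.Equal(20m, service.Checkout(2, 10m));
    }

    [Fact]
    public void London_style_mocks_every_collaborator()
    {
        // London school: the unit is the class, so every collaborator
        // is replaced with a mock, even an in-process one.
        var calc = new Mock<IPriceCalculator>();
        calc.Setup(c => c.Total(2, 10m)).Returns(20m);
        Assert.Equal(20m, new OrderService(calc.Object).Checkout(2, 10m));
    }
}
```

Both tests pass, but the classical one survives a refactoring of PriceCalculator unchanged, while the London one is coupled to the interaction between the two classes.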

Types of dependencies

It is also worth mentioning the types of dependencies that exist:

  • shared – accessible to more than one test, which allows tests to influence each other's results (for example, a database);

  • private – not shared;

  • out-of-process – dependencies that run outside the application process.

Integration test

An integration test is one that does not satisfy at least one criterion from the definition of unit tests. It can (and often does) test multiple behavioral units at once. An integration test also verifies that the code works in integration with shared dependencies, out-of-process dependencies, or code developed by other teams in the organization.

End-to-End Tests

Also known as API tests, these are a subset of integration tests. Like other integration tests, they verify how the code works with out-of-process dependencies, but they differ primarily in that they usually involve a larger number of such dependencies and typically check a full user path.

What should tests do?

Ideally, tests should test not units of code, but units of behavior—something that makes sense for the domain and whose usefulness will be clear to the business.

Testing pyramid

The concept of the classic testing pyramid prescribes a certain ratio of the different types of tests in a project. The types of tests in the pyramid make different tradeoffs between speed of feedback and protection against bugs: tests at the higher levels provide better protection against bugs but slower feedback, while tests at the lower levels run faster but catch fewer bugs.

Now let’s move on from theory to tests in our services.

In our services, the testing pyramid currently looks like this:

Our project is essentially a proxy: it contains little business logic but a lot of interaction with other services. That is why we have more integration tests. But this is not the only reason.

WebApplicationFactory

Yes, unit tests always run faster than integration tests. But the difference is not as scary as it might seem.

Using WebApplicationFactory lets you create many parallel in-memory instances of an application, isolated from each other, and do it “cheaply”. Thanks to WebApplicationFactory, everything happens quickly (not quite at the price of unit tests, but close). You can read more about WebApplicationFactory here.
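A minimal sketch of the pattern (the Program entry-point class and the /health endpoint are placeholders; your application's entry point and routes will differ):

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

// Each test class gets its own in-memory instance of the application;
// instances are isolated from each other and start in milliseconds.
public class HealthEndpointTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;

    public HealthEndpointTests(WebApplicationFactory<Program> factory) =>
        _factory = factory;

    [Fact]
    public async Task Get_Health_ReturnsOk()
    {
        // CreateClient spins up the app in memory and returns an HttpClient
        // wired directly to it: no sockets, no deployed environment needed.
        var client = _factory.CreateClient();

        var response = await client.GetAsync("/health");

        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
}
```

Because the whole application pipeline runs inside the test process, such an integration test exercises routing, middleware, and serialization at a cost close to that of a unit test.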

And for us, integration tests are a kind of dogfooding: while writing them, we ourselves discovered the weak and inconvenient parts of our API and took action.

Why do we use tests?

We use integration and end-to-end tests to test user behavior. These tests:

  • run locally on developer machines,

  • interact with real services on test stands,

  • can be debugged,

  • can be run against locally launched services.

We use unit tests to test query validation logic and check any other internal logic.

Examples:

This is what a typical unit test looks like now. It checks the data request mode.

This is what a simple integration test looks like.

We use the naming convention recommended by Microsoft. The AAA (Arrange-Act-Assert) pattern is also used here.
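For illustration, here is what that naming scheme (MethodName_Scenario_ExpectedBehavior, from Microsoft's unit testing guidelines) and the AAA pattern look like together; the StringCalculator class is a stand-in example, not our production code:

```csharp
using Xunit;

// Stand-in class under test.
public class StringCalculator
{
    public int Add(string numbers) =>
        string.IsNullOrEmpty(numbers) ? 0 : int.Parse(numbers);
}

public class StringCalculatorTests
{
    [Fact]
    public void Add_EmptyString_ReturnsZero()
    {
        // Arrange: create the object under test
        var calculator = new StringCalculator();

        // Act: perform the behavior being verified
        var result = calculator.Add("");

        // Assert: check the outcome
        Assert.Equal(0, result);
    }
}
```

The name alone tells a reader what is called, under what scenario, and what should happen, without opening the test body.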

Characteristics of Good Tests

I would like to note that our current tests are not just improved code with a few practices and patterns applied; they are tests written from scratch that meet the criteria of good tests:

  • bug protection,

  • resistance to refactoring,

  • fast feedback,

  • ease of support.

The first three are mutually exclusive: a test can maximize at most two of them. We chose protection against bugs and resistance to refactoring as our priorities. As for fast feedback, as I mentioned earlier, it is not critical for us, and WebApplicationFactory is exactly the remedy here. We have also released a typed client for our users and use it ourselves in tests. Another important point is that our tests are part of our DoD (Definition of Done).

There are also properties of a successful test suite:

  • integration into the development cycle. For now, this is informal and rests on developer discipline: our CI is not yet ready to run integration tests, but that is planned for the very near future,

  • checking the most important parts of the code,

  • maximum protection against bugs with minimal maintenance costs.

Some statistics

Two years ago we had 125 technical support tickets, and most of them required fixes. A year ago the situation improved: 75 tickets, but many of them still required developer involvement.

This year there have been far fewer tickets: only 24. Most importantly, most of them were requests for consultation or were resolved without reaching the developers.

The number of bugs found during the stabilization period before releases has also decreased.

Of course, many factors are at play here; it is not only due to our tests, since we also rewrote most of the service code. Nevertheless, a large number of defects are now caught at the development stage.

All of the above indicates an increase in the reliability of our software, which was stated in the title of the article.

Brief conclusions and recommendations

  • Tests should test units of behavior, not units of code,

  • do not neglect test writing practices: naming, AAA pattern, etc.,

  • the most revealing metric is the number of bugs,

  • in the modern world, integration tests are not much more “expensive” than unit tests.

Leave your questions, comments and advice in the comments – I will be glad to hear them. I also wrote about how we use HttpClient in our work; you can read about it here.
