Bad Test Classes – Restarting Through Pain

Hard times call for hard decisions. The architecture of applications and test environments is not always ideal; sometimes it is downright bad and inflexible. And to test such systems, you have to swallow your tester's pride and violate some of the basic principles of software testing.

A problem that doesn't exist

First of all, every self-respecting tester should know what the "software testing pyramid" is. Namely, the number of simple tests, starting with unit tests, should be significantly greater than the number of API tests, which in turn should outnumber UI and E2E tests. The reasoning is that such tests are cheaper to develop, run and maintain, they execute faster and, as a rule, catch most of the errors at the earliest stage of development and testing. At the same time, the longest tests – the E2E tests, which follow the business logic through a long sequence of actions and states – usually make up the smallest part of the suite. Firstly, because all of their components have already been checked (often repeatedly) by smaller and simpler tests. Secondly, because the truly critical business logic that E2E tests should cover usually amounts to just a couple of scenarios, which, it seems to me, will very likely end up as your "smoke set" for pre-release testing (acceptance regression).

Once again about the testing pyramid

To eat an elephant piece by piece, each piece must be testable independently. Usually this is the case: several unit tests are written for a function, iterating over its parameters (for example, using equivalence classes and boundary values). For integration tests (when we check several components/functions together), you will most likely use stubs and mocks. To check the state of the system and its business logic, response format and design (for example, to verify coverage using state transition graphs or a decision table), most likely already at the UI level, you will probably need, as a precondition, to call several API endpoints and feed the system the data required to reach the desired state. And only in very rare cases, for an E2E scenario, will we write a test that goes all the way from login to an almost physical result – a lit traffic light, an ATM dispensing money, or simply the message "burn completed successfully with an error".
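
For illustration, here is a minimal sketch of what such parameterised unit checks might look like in pytest; the discount function and its values are invented purely for this example:

"""Hypothetical check of a discount function using equivalence classes and boundary values"""

import pytest


def get_discount(age: int) -> int:
    """Imaginary function under test: children and seniors get a discount"""
    if age < 0:
        raise ValueError("age cannot be negative")
    return 50 if age < 7 or age >= 65 else 0


@pytest.mark.parametrize(
    "age, expected",
    [
        (0, 50),   # lower boundary
        (6, 50),   # last value of the "child" class
        (7, 0),    # first value of the "adult" class
        (64, 0),   # last value of the "adult" class
        (65, 50),  # first value of the "senior" class
    ],
)
def test_get_discount(age, expected):
    """One parameterised test covers several equivalence classes and their boundaries"""
    assert get_discount(age) == expected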

In this case, most likely, you will run the tests independently – automated ones, or even manual ones if you have several testers in the team. To make sure that state and data do not interfere with test execution, the tests must be isolated and atomic. In the worst case, that means several isolated test environments; in the best case, separate components that can receive independent states in their own context (functions, sessions, users, etc.). At the same time, the start of any test must not depend on the results of another – neither through the state of the system, nor, especially, through the logic of the test you are writing. That is essentially the principle of isolation. Then the tests will be small, simple, follow the "one test – one check" rule (that is, satisfy another principle – atomicity), and we will not have to check several times that we have moved from state A to B, in order to then get from it to state C, in which we will test the transition to state D.
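
A minimal sketch of what this isolation looks like in pytest: each test gets its own fresh state from a fixture instead of relying on whatever the previous test left behind (the "order" object here is made up for the example):

import pytest


@pytest.fixture
def fresh_order():
    """Every test receives its own independent precondition"""
    return {"status": "new", "items": []}


def test_add_item(fresh_order):
    """Does not care what any other test did before"""
    fresh_order["items"].append("book")
    assert fresh_order["status"] == "new"
    assert len(fresh_order["items"]) == 1


def test_cancel_order(fresh_order):
    """Starts from the same clean state, not from the result of test_add_item"""
    fresh_order["status"] = "cancelled"
    assert fresh_order["status"] == "cancelled"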

Somewhere in wonderful companies

So, this is a great theory that works in great companies and teams. Of course, even ideal teams have limitations in the code or in the system under development, but in great teams, great developers can help and write test "handles" – endpoints that reset or set the states needed for testing – and for environment isolation we have the ability to deploy a test lab on several servers, or even spin up an unlimited number of containers or devices in the cloud. I wish you always worked in such a place.
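
To make the idea concrete, here is a hypothetical sketch of such a test "handle": a test-only endpoint that puts the system into the required state in one call. Everything here – the URL, the payload, the function name – is invented for illustration:

import requests


def set_system_state(base_url: str, state: dict) -> None:
    """Hypothetical test-only endpoint that puts the system into the required state"""
    response = requests.post(f"{base_url}/test-api/state", json=state, timeout=10)
    response.raise_for_status()


# Usage as a test precondition (illustrative only):
# set_system_state("https://test-env.local", {"user": "demo", "screen": "checkout"})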

But, most likely, at some stage things will go wrong, and theory will diverge from practice in some part. Or even completely.

The problem that exists

Well, our world is not perfect. Companies and teams are not perfect, code is not perfect, and in general, to be honest, people are not perfect either – some of them, frankly, are utter crap. And we have to live and work with that.

Let's say you have a test environment with hundreds of tests that run in an isolated setup – behind several firewalls (the machine we run the tests from, the one where the system is deployed, and the one that actually executes the tests – our CI server) – which connects to one of the cloud testing platforms. If you've worked with such systems, you probably know that deploying a virtual machine takes some time, so it doesn't make much sense to tear down the connection and recreate it for each test. Moreover, even if you use only a few deployed virtual machines, their speed leaves much to be desired. And because of the layering of internal networks, as well as the different regions involved, this whole bundle, so to speak, starts to work at something far from the speed of light.

If we also broadcast video and record tests through fenced systems…

Well, the test launch becomes slower, but still, nothing stops us from resetting and setting the state for each test inside such a virtual machine. Yes, but… the system is written in such a way that it has no test handles that could be pulled to set the desired state. Moreover, for UI tests you cannot simply navigate to the desired form by URL – you have to click through several screens. All this takes time, and in the conditions of this test environment, preparing for a single test can take 1–2 minutes. That already sounds like a huge problem, especially if we are trying to write tests well – isolated and atomic – and the next check is, say, entering a single character in a field. A catastrophe that forces us either to throw out such tests or to wait hours for the results.

Here, it seems, it is time to step over your tester's pride and stick several checks into one test. Let's say the test will now fill in twenty fields and click a button to go to another screen. I think you understand that while entering text into a field or flipping a checkbox or switch, something can go wrong – an error can be displayed, the text can be lost, and when interacting with another component the state can change again, and we will certainly end up with a not-quite-valid result. So we still need to split our check into tests somehow, while leaving the system state for each test equal to the state left by the previous one. We could do this by resetting the context within the session. That is not bad, but then each time we would have to bring up a virtual machine per session uniting the sequence of tests: trying to restore atomicity, we would lose performance. Then there is another alternative – isolate state by module (file), or better yet, wrap the tests in test classes. So to speak, we move the single responsibility principle (SRP) from the level of the test function to the level of the test class. After all, nothing bad will happen, right? Oh, I forgot to say that I write in Python and use pytest, so that is what we will talk about from here on.
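
For context, a minimal sketch (with invented names) of what such class-level isolation could look like in pytest: an expensive precondition is prepared once per class with a class-scoped fixture, and the test methods then act as consecutive steps of one scenario:

import pytest


@pytest.fixture(scope="class")
def prepared_screen():
    """Imaginary expensive precondition: open the session and click through to the form once per class"""
    screen = {"fields": {}, "submitted": False}  # stand-in for a real driver/page object
    yield screen
    screen.clear()  # teardown once the whole class has finished


class TestLongFormScenario:
    """One scenario split into steps; the class shares the prepared state"""

    def test_fill_fields(self, prepared_screen):
        prepared_screen["fields"]["name"] = "Alice"
        assert prepared_screen["fields"]["name"] == "Alice"

    def test_submit(self, prepared_screen):
        prepared_screen["submitted"] = True
        assert prepared_screen["submitted"]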

The problem we created

So, a nice and beautiful atomic test that you would normally work with looks like this:

"""This module has no class, only test functions"""


def test_no_class_first():
    """This test always passes"""
    assert True


def test_no_class_second():
    """This test always fails"""
    assert False

However, in my particular case, with my imperfect system and team, we decided to use test classes. They look something like this:

"""This is a basic test"""


class TestBasic:
    """This is a basic test"""

    def test_basic(self):
        """This is basic test 1"""
        assert True

    def test_basic2(self):
        """This is basic test 2"""
        assert False

    def test_basic3(self):
        """This is basic test 3"""
        assert True

Personally, my eyes bleed when I look at this, because I can imagine what problems it will cause. Although at first glance it looks fine. But only at first glance. If you run the tests as usual:

pytest tests

Then the tests will be executed sequentially, and if there was any dependency between them, it will be preserved. However, if you try to run the tests in parallel (with pytest-xdist), the order gets mixed up.

pytest tests -n=auto

To avoid this, you need to tell pytest-xdist how exactly to group the tests, so do not forget to specify the distribution mode:

pytest tests -n=auto --dist loadscope
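
If you always run class-based tests in parallel, it may be convenient to bake these options into the configuration once – a minimal sketch, assuming pytest-xdist is installed:

[pytest]
addopts = -n auto --dist loadscope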

Now, let's imagine that we have twenty steps that depend on each other, and we would have to go through all of them even if one of the first tests has already failed. That no longer makes sense, since the state has already been violated. Therefore, we need to implement "fail fast" logic for this case – to fail quickly. For this we have to interfere a little with pytest's internals and implement:

"""conftest.py"""

import pytest


def pytest_runtest_makereport(item, call):
    """Remember the first failed test of a class marked as incremental"""
    if "incremental" in item.keywords:
        if call.excinfo is not None:
            parent = item.parent
            parent._previousfailed = item


def pytest_runtest_setup(item):
    """Fail fast: xfail the remaining tests of the class after the first failure"""
    if "incremental" in item.keywords:
        previousfailed = getattr(item.parent, "_previousfailed", None)
        if previousfailed is not None:
            pytest.xfail("previous test failed (%s)" % previousfailed.name)

Now the test classes that should fail fast need to be marked in advance:

"""This is incremental test class"""

import pytest


@pytest.mark.incremental
class TestMarkedFailFast:
    """"This test class will fail fast"""
    
    def test_will_pass(self):
        """"This test will pass"""
        assert True

    def test_will_fail(self):
        """"This test will fail"""
        assert False

    def test_wont_run(self):
        """"This test won’t run"""
        assert True

Another problem we have created is that the pytest-rerunfailures plugin becomes pointless: if we use it, a failed test will be rerun outside the class context, with a completely different state than we expected. We could argue endlessly about whether reruns should be used at all. In the end, we can survive a couple of flaky failures and rerun them manually before the release. If there are many failures, something critical has probably broken; we will quickly identify the error and will have to rerun most of the tests anyway – and then we can do just that. But if you have a lot of tests – thousands – instability still happens, time is precious, and the flaky test does not fail on a regular basis, then it is still better to delegate the rerun to the machine. But in our case… there are no such options, right?
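
For comparison, with pytest-rerunfailures a rerun is requested per failed test, something like:

pytest tests --reruns 2 --reruns-delay 1

and only the failed test function is retried on its own, which is exactly what breaks a class whose tests depend on each other's state.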

The problem I solved

So, to solve this problem I wrote the pytest-rerunclassfailures plugin.

To start using it, simply install it via pip:

pip install pytest-rerunclassfailures

To run tests with reruns, it is enough to pass the --rerun-class-max parameter with the desired number of reruns:

pytest tests --rerun-class-max=2

Or add it explicitly to pytest.ini:

[pytest]
plugins = pytest-rerunclassfailures
addopts = --rerun-class-max=3

You can also set additional parameters, such as the delay between reruns or how the results are reported:

--rerun-class-max=N – number of reruns of a failed test class (default: 0)
--rerun-delay=N – delay between reruns in seconds (default: 0.5)
--rerun-show-only-last – show only the results of the last rerun: no "RERUN" entries in the log, only the final run with the final result (not set by default)
--hide-rerun-details – hide rerun details (errors and tracebacks) in the terminal output (not set by default)

PYTHONPATH=. pytest -s tests -p pytest_rerunclassfailures --rerun-class-max=3 --rerun-delay=1 --rerun-show-only-last

Result of running tests with the plugin enabled

The plugin is compatible with pytest-xdist and can be used in parallel runs, but always specify --dist loadscope. After a failure, the class is reset to its initial state; however, if the class state was changed bypassing the constructor inside a function-scoped fixture, the next test will fail on the rerun. I hope you don't use this bad practice in your code, though.
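
For example, a parallel run with class reruns (assuming both pytest-xdist and the plugin are installed) could look like this:

pytest tests -n auto --dist loadscope --rerun-class-max=2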

Some more bad design

"""Test class with function (fixtures) attributes"""

from random import choice

import pytest

random_attribute_value = choice((42, "abc", None))


@pytest.fixture(scope="function")
def function_fixture(request):
    """Fixture to set function attribute"""
    request.cls.attribute = "initial"
    return "initial"


@pytest.fixture(scope="function")
def function_fixture_secondary(request):
    """Fixture to set function attribute"""
    request.cls.attribute = "secondary"
    return "secondary"


class TestFunctionFixturesAttributes:
    """Test class with function params attributes"""

    def test_function_fixtures_attribute_initial(self, function_fixture):  # pylint: disable=W0621
        """Test function fixture attribute at the beginning of the class"""
        assert self.attribute == "initial"
        assert function_fixture == "initial"

    def test_function_fixtures_attribute_recheck(self, function_fixture_secondary):  # pylint: disable=W0621
        """Test function fixture attribute after changing attribute value"""
        assert self.attribute == "secondary"  # type: ignore  # pylint: disable=E0203
        assert function_fixture_secondary == "secondary"
        self.attribute = random_attribute_value  # type: ignore  # pylint: disable=attribute-defined-outside-init
        # attribute is changed, but fixture is not
        assert self.attribute == random_attribute_value
        assert function_fixture_secondary == "secondary"

    def test_function_fixtures_attribute_forced_failure(self):
        """Test function fixture attribute to be forced failure"""
        assert False

Very briefly about the technical implementation

In order to be able to intercept and rerun the tests of a test class, I hook into pytest_runtest_protocol and take over control if the item belongs to a test class:

@pytest.hookimpl(tryfirst=True)
def pytest_runtest_protocol(
    self, item: _pytest.nodes.Item, nextitem: _pytest.nodes.Item  # pylint: disable=W0613
) -> bool:

Next we get the test class and collect its children – the test functions:

parent_class = item.getparent(pytest.Class)
for i in items[items.index(item) + 1 :]:
    if item.cls == i.cls:  # type: ignore
        siblings.append(i)

and then we run the standard testing protocol for each of these children sequentially:

for i in range(len(siblings) - 1):
    # Before run, we need to ensure that finalizers are not called (indicated by None in the stack)
    nextitem = siblings[i + 1] if siblings[i + 1] is not None else siblings[0]
    siblings[i].reports = runtestprotocol(siblings[i], nextitem=nextitem, log=False)

And finally, after determining the test status (how many times it had to be restarted, and whether to record the result or rerun again), we report the results back:

item.ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
for index, rerun in enumerate(test_class[item.nodeid]):
    self.logger.debug("Reporting node results %s (%s/%s)", item.nodeid, len(test_class[item.nodeid]), index)
    for report in rerun:
        item.ihook.pytest_runtest_logreport(report=report)
item.ihook.pytest_runtest_logfinish(nodeid=item.nodeid, location=item.location)

And if a test class needs to be rerun, it must first be properly cleaned up: the fixtures are reset and the test class is recreated in its original form.
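
Purely as an illustration of the idea (this is not the plugin's actual code, just a hypothetical sketch), resetting a class to its original form can be thought of as snapshotting its attributes before the first run and restoring them before a rerun:

import copy


def snapshot_class_state(cls):
    """Remember the original class attributes (illustrative helper, not part of the plugin)"""
    return {key: copy.copy(value) for key, value in vars(cls).items() if not key.startswith("__")}


def restore_class_state(cls, snapshot):
    """Drop attributes added during the run and restore the saved values"""
    for key in [key for key in vars(cls) if not key.startswith("__")]:
        if key not in snapshot:
            delattr(cls, key)
    for key, value in snapshot.items():
        setattr(cls, key, value)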

In conclusion

That's all. I hope my plugin helps you a little when you have to work with bad architectural decisions, bad code and bad tests. If this material was useful to you, don't forget to like the post, leave a comment and, if you feel inspired – toss a coin. Original article here.
