How we attached proxies to autotests

Hi! We at the online cinema Ivi love writing autotests, especially client ones (because client applications are the first, and sometimes the only, thing our users see). We have four main platforms – Android, Web, Smart TV, and iOS (Android and iOS are further divided into mobile and TV versions).

And a little about the autotests themselves. They are mostly integration tests: we run them against nearly complete copies of the backend, automatically deployed in k8s (more on that later). The total number is approaching 7 thousand, with about one and a half thousand per platform on average. The peculiarity of this whole structure is that we strive to use native frameworks as much as possible, or whatever stack is best suited to maintaining a given project. This forces us to aggressively extract common functionality, get rid of copy-paste, and keep the architecture and approaches as similar as possible from project to project.

With this approach, one of the main problems we ran into was working with the network stack. The first issue, of course, was mocks – maintaining mocks for every request can be quite difficult:

  • firstly, the number of requests in one scenario can exceed a hundred;

  • secondly, often one check differs from another by only 1-2 parameters, and then an interesting balancing act begins: how to manage the substitution of all these endless JSON fixtures and assemble the correct set from them;

  • thirdly, if we are checking something that depends on only part of one API method’s response, we don’t want to keep a huge blob in the code and update it in sync with the backend;

  • fourthly, and probably most importantly, when testing a large amount of functionality you don’t want to give up the “integration” approach: tests should go to “real” services with “real” data as much as possible. This requirement stems from the fact that our backend tests are mainly component tests – we test one service in isolation, which gives flexibility and speed when testing each microservice and also increases stability, but it shifts integration testing towards the client, which is exactly what we have to do.

The second big problem with client testing is that we can’t always verify the result of the client’s work on the backend “right now”. For a purchase or adding to favorites we can check that the changes happened and are correct (you can find the fresh purchase on the backend, or navigate through the client to the purchases section and find what you are looking for there), but besides these simple scenarios we also have statistics checks.

Statistics are a large number of requests that the application sends while it runs, and the biggest problem is that we cannot check on the backend that they were sent correctly during the test, or doing so is very labor-intensive. So all checks come down to going into the network log and seeing what the application actually sent; in 99% of cases not only the fact of sending matters, but also the data that was sent (a sketch of such a check follows the list below). And we cannot drop these checks, because:

  • a large number of business metrics depend on them, and therefore they need to be checked as often and as fully as possible;

  • checking them manually is incredibly difficult and, most importantly, time-consuming.
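
To make this concrete, a check of this kind usually boils down to something like the following sketch, which digs through a saved HAR file; the event path and the payload fields here are invented for the example:

import json


def find_har_entries(har_path: str, url_part: str) -> list:
    """Return HAR entries whose request URL contains the given substring."""
    with open(har_path, encoding="utf-8") as f:
        har = json.load(f)
    return [e for e in har["log"]["entries"] if url_part in e["request"]["url"]]


def test_content_start_event_is_sent():
    # the path "/logger/content_start" and the checked fields are made up for the example
    entries = find_har_entries("network.har", "/logger/content_start")
    assert entries, "the statistics request was not sent"
    payload = json.loads(entries[-1]["request"]["postData"]["text"])
    assert payload["content_id"] == 12345
    assert payload["watch_time"] >= 0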

First iteration

So, with all this baggage of problems in front of us, we started looking for a solution. For web platforms (web and Smart TV) you can try to manipulate network requests through devtools, but for mobile platforms we could not find such a tool. So we would have to bring in something third-party. Our requirements:

  • Independence from the stack (mocks and proxies embedded in the test process no longer suit us).

  • The ability to not only mock something, but also proxy requests if nothing needs to be done with them.

  • Recording a network log in a format that can be parsed programmatically and also viewed manually when triaging failed tests.

  • The ability to intercept HTTPS only for selected domains, so as not to interfere with third-party resources that the device may access during the test.

  • The ability to work in headless mode (so as not to suffer in CI).

Of all the variety of tools, one of the most popular is mitmproxy. It can do everything we need:

  • Selective HTTPS interception for chosen domains.

  • An addon system, within which we have full control over the request life cycle – this effectively lets us build any functionality that is missing out of the box.

  • All of this is written in Python, which the team has expertise in.

  • The ability to run in non-interactive mode and, in general, no rigid coupling to any particular tooling.

What we were missing to launch:

  • Network log (the most obvious format is HAR). At the time of development it was not supported natively (recent versions already have standard import and export).

  • Partial mocks – these we had to implement ourselves.

  • And the most interesting part – figuring out how to interact with all of this from the tests.

Extending mitmproxy

Broadly speaking, mitmproxy is a construction kit. There is a core responsible for low-level work with the network, and all other functionality is added by combining addons (you can look at the built-in addons in the project’s code). The combining happens in classes inherited from Master.

This means our first priority is to assemble a minimal working build from built-in and custom addons and learn how to manage all of this remotely.
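
For context, a custom mitmproxy addon is just a class whose method names follow mitmproxy’s event-hook convention, registered through the module-level addons list and loaded with mitmdump -s addon.py. A minimal sketch (the logging here is purely illustrative):

import logging

from mitmproxy import http


class FlowLogger:
    """A minimal addon: mitmproxy calls these hooks for every request and response."""

    def request(self, flow: http.HTTPFlow) -> None:
        logging.info("-> %s %s", flow.request.method, flow.request.pretty_url)

    def response(self, flow: http.HTTPFlow) -> None:
        logging.info("<- %s %s", flow.response.status_code, flow.request.pretty_url)


addons = [FlowLogger()]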

Partially inspired by how Mountebank and WireMock work, we decided that the simplest and most effective solution is to attach an API to the proxy and communicate with it through that API.

What the API should be able to do:

  • “Charge” and remove mocks for specific requests.

  • Control which hosts to “expose” and which ones to leave unchanged.

  • Redirect requests from one host to another. This helps avoid creating separate configurations for the applications under test where we can do without them: we simply build an application that points at production hosts and then redirect it through the proxy to wherever it is needed.

  • Receive request data in HAR format.

In the end, after a few laps of development hell and additional requirements, this turned into a list that looks something like this.

API

Yes, the scheme is not very pretty and could use some cleanup, but it doesn’t get in the way much. The most popular methods are adding a mock, getting the HAR, setting redirects by host, and enabling tracking of those same hosts. The rest are used very rarely.
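
To give a feel for how a test talks to mitm_api, here is a rough sketch; the endpoint paths, the address, and the payload shapes below are illustrative, not the real contract:

import requests

MITM_API = "http://proxy-host:9000"  # illustrative address of the mitm_api control API

# which hosts to intercept over https and where to redirect production traffic
requests.post(f"{MITM_API}/tracking/hosts", json=["api.example.com"])
requests.post(f"{MITM_API}/redirects",
              json={"api.example.com": "api.test-cluster.local"})

# "charge" a mock for a specific request: patch only the title in the real response
requests.post(f"{MITM_API}/mocks", json={
    "predicates": {"host": "api.example.com", "command": "/content/info", "method": "GET"},
    "response": {"modify": [{"selector": "$.result.title", "type": "JSONPATH",
                             "action": "PUT", "value": "Mocked title"}]},
})

# at the end of the test, pull the whole network log as HAR
har = requests.get(f"{MITM_API}/har").json()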

We called the resulting construct mitm_api (very creative, we know) and began attaching it to the tests.

What does WebSocket have to do with it?

Everything would be fine with the proxy, but there is one important nuance. We have a bunch of scenarios with steps like “after action n, request y was sent”.

The simplest option is to call the log-retrieval method and see whether anything new has appeared, BUT… that method is relatively resource-intensive, and delays pile up because you have to pause between repeated requests (the classic problem of explicit vs implicit waits).

How do we solve this? By adding a notification stream of some kind. The simplest and most proven solution is WebSocket. For us it has a lot of advantages:

  • There are clients on all used stacks.

  • There is no need to deploy and maintain additional infrastructure (as there would be if we built something on top of a message queue).

  • There are also server implementations for the stack we need.

And that’s basically it. We start a WebSocket server in the master and add a method for publishing a message there – now any addon can reach the master through the global ctx variable and broadcast messages to clients.
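
In code this boils down to something like the sketch below; broadcast() stands in for whatever publishing method our custom master actually exposes, so treat the name as hypothetical:

import json

from mitmproxy import ctx, http


class WsNotifier:
    """Pushes a short notification about every completed response to WebSocket clients."""

    def response(self, flow: http.HTTPFlow) -> None:
        message = json.dumps({
            "method": flow.request.method,
            "url": flow.request.pretty_url,
            "status": flow.response.status_code,
        })
        # ctx.master is our custom master; broadcast() is its (hypothetical) method
        # that forwards the message to all connected WebSocket clients
        ctx.master.broadcast(message)


addons = [WsNotifier()]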

This technique allowed us to implement reliable checks that network requests are sent after certain actions. Some teams even switched to a mode of operation where the full log is not requested at all: we simply connect to the WebSocket at the beginning of the test, keep the connection open, and save the incoming requests.

We spoil the traffic

So everything is fine: we monitor requests, install mocks, save request logs. And then the mobile clients and the player team come along and point out that we also have scenarios where we need to throttle the connection speed, introduce packet loss – in general, reproduce rush hour in the subway as accurately as possible.

The first thing we did was add per-request delays in mitmproxy (we simply wait for a specified time before starting to send the response to the client). This solved some of the cases (for example, scenarios where a loading spinner has to appear and we know exactly what happens underneath).
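
The delay itself is easy to express as an addon; a sketch, assuming a mitmproxy version where event hooks may be coroutines (the paths and timings are illustrative):

import asyncio

from mitmproxy import http

# path prefix -> delay in seconds; the values are illustrative
DELAYED_PATHS = {"/content/info": 5}


class DelayResponses:
    async def response(self, flow: http.HTTPFlow) -> None:
        # the response has already been received from the server;
        # we just hold it for a while before it is forwarded to the client
        for prefix, delay in DELAYED_PATHS.items():
            if flow.request.path.startswith(prefix):
                await asyncio.sleep(delay)
                break


addons = [DelayResponses()]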

But there are also scenarios where you need to slow down not one, but many requests – for example, during video playback. Setting delays on a bunch of requests is inconvenient, it doesn’t really work, and the delay is not entirely honest: the connection just hangs idle and then the data is sent at full speed. For video-related checks you need to throttle the actual bandwidth.

We didn’t find such capabilities in mitmproxy itself (and implementing them there looked unappealing – we would have to dig deep into the core, which we didn’t want). But we found an excellent tool from Shopify – Toxiproxy. It is exactly what allows you to “honestly” damage the network connection in various ways, which gives the desired result.

But how do we make all of this work together? The answer is simple: abandon the beautiful “1 container – 1 process” principle and run supervisor as the root process, with Toxiproxy and mitm_api under it. As a result, the number of handles sticking out of the container has grown even further (Toxiproxy’s own API also sticks out; we left it as is). The scheme now looks like this: the client uses the Toxiproxy address as its proxy, and Toxiproxy relays everything to mitm_api. There was an idea to put Toxiproxy between mitm_api and the backend, but we abandoned it – there is a chance that if the slowdown happens on the backend side of mitm, then in some scenarios it will simply buffer the response and hand it to the client at full speed, and we would be back to the problem we were trying to get away from.
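
For reference, Toxiproxy is driven through its own HTTP API; a rough sketch of wiring up the chain and throttling bandwidth (the addresses, ports, and rate are illustrative):

import requests

TOXIPROXY_API = "http://proxy-host:8474"  # Toxiproxy control API (8474 is its default port)

# client -> toxiproxy (listen) -> mitm_api proxy port (upstream)
requests.post(f"{TOXIPROXY_API}/proxies", json={
    "name": "mitm",
    "listen": "0.0.0.0:8888",
    "upstream": "127.0.0.1:8080",
})

# emulate rush hour in the subway: limit the downstream bandwidth
requests.post(f"{TOXIPROXY_API}/proxies/mitm/toxics", json={
    "type": "bandwidth",
    "stream": "downstream",
    "attributes": {"rate": 100},  # kilobytes per second
})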

Approximate scheme of interaction with the proxies

Now let’s talk about how we manage this proxy. To do that, let’s think about what we need:

  • Isolate a specific request by its URL, method, and parameters.

  • Make pointwise changes to the response. Why pointwise? Because we want to be able to build a mock for a given request dynamically. For example: there is a request returning content data, and in the test we need to change only the content name, or the tags, or both at once – while in code we want a single entity responsible for that request. The initial implementation could only replace the whole body, but as the number of tests grew it became clear that we would either end up with a pile of JSON fixtures or with home-grown JSON-modification mechanisms on every client; either way, keeping platforms in sync and supporting it would be hard.

  • Replace the entire body of the response (in contrast to the previous point, this is also sometimes necessary).

  • Change headers for request and response.

  • Add a response delay. Although we have a mechanism for emulating a “bad” connection, there are cases when we need to check a timeout for just one request (for example, how the app handles a long response time for a particular request).

These requirements were added gradually and we ended up with the following model:

Code
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ApplicableForRequests:
    before_index: Optional[int] = None
    after_index: Optional[int] = None
    with_index: Optional[List[int]] = None


@dataclass
class Predicates:
    """
    Описание запросов, к которым должен применяться мок
    Если есть несколько подходящих моков - будет выбран мок с наибольшим числом совпадений по params и json_params

    host: хост запроса
    command: путь в url запроса
    method: HTTP метод
    params: если ключ-значение есть в query или form_data - число совпадений повысится
    json_params: число совпадений повысится если по jsonpath ключу совпадет значение
    excluded_params: если query или form_data есть хотя бы один из этих параметров - мок не применится
    """
    host: Optional[str]
    command: Optional[str]
    method: str
    params: Dict[str, Any] = field(default_factory=dict)
    json_params: Dict[str, Any] = field(default_factory=dict)
    excluded_params: List[str] = field(default_factory=list)
    applicable_for_requests: Optional[ApplicableForRequests] = None


@dataclass
class Modification:
    """
    Атомарная модификация части запроса или ответа

    selector: в зависимости от типа - jsonpath или ключ
    type: KEY или JSONPATH
    action: PUT или DELETE
    value: значение для PUT
    """
    selector: str
    type: str
    action: str
    value: Optional[Any]


@dataclass
class HeaderModification:
    """
    Модификация заголовков

    action: PUT или DELETE
    key: заголовок
    value: значение заголовка для PUT
    """
    action: str
    key: str
    value: Optional[Any]


@dataclass
class Request:
    """
    Модификации пересылаемого запроса

    headers: заголовки запроса
    modify_query: модификация по ключу
    modify_form: модификация по ключу
    modify_json: модификация по jsonpath
    """
    headers: Optional[List[HeaderModification]] = field(default_factory=list)
    modify_query: List[Modification] = field(default_factory=list)
    modify_form: List[Modification] = field(default_factory=list)
    modify_json: List[Modification] = field(default_factory=list)


@dataclass
class ResponseContent:
    """
    Модификация контента

    text: полностью заменить text
    json: полностью заменить json
    """
    text: Optional[str] = None
    json: Optional[dict] = None


@dataclass
class Response:
    """
    Модификации пересылаемого ответа

    response: если не null, то modify не применится
    modify: модификация по jsonpath
    delay_sec: задержка ответа
    headers: заголовки ответа
    status: статус-код ответа
    """
    response: Optional[ResponseContent]
    modify: Optional[List[Modification]]
    delay_sec: Optional[int]
    headers: Optional[List[HeaderModification]] = field(default_factory=list)
    status: Optional[int] = None
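
For example, a mock for the case mentioned above – change only the content name in the response of some content method – assembled from this model could look roughly like this (the host, path, and values are illustrative):

from dataclasses import asdict

mock = {
    "predicates": asdict(Predicates(
        host="api.example.com",
        command="/content/info",   # illustrative path
        method="GET",
        params={"id": "12345"},
    )),
    "request": asdict(Request()),
    "response": asdict(Response(
        response=None,             # keep the real body and apply pointwise modifications
        modify=[Modification(selector="$.result.title", type="JSONPATH",
                             action="PUT", value="Mocked title")],
        delay_sec=None,
    )),
}
# the resulting dictionary is what gets sent to mitm_api as JSON when "charging" the mock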

Screwing a wheel to a bicycle

We’ve more or less sorted out the proxy (finished the addons and made an additional master based on WebMaster – Tornado is already wired in there, so we don’t have to invent much around the web server). Now we need to somehow make all this work with the tests.

The first approach was to introduce the concept of a “session” in the proxy addons and somehow (reliably and thoughtfully) transmit this session from the client. On web clients everything went relatively well (with simple manipulations of nginx and referrer headers you can pass some information to the proxy without changing the application code, which you really don’t want to do), but on mobile we immediately stumbled, fell, and decided we didn’t want to do that anymore. And the session-support code inside the proxy was not very easy to maintain (some scraps of it are still sticking out in the code).

The next mechanism was this: at the beginning of a run we know exactly how many threads we will have, so we can start the corresponding number of Docker containers, mapping ports with an offset, and then in each test, knowing the worker’s conditional “number”, compute these ports and connect to them. There are several ports per proxy – one for the API, a second for the proxy itself, and a few service ones – so there is logic for computing each of them.
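
The port arithmetic itself is trivial – something along these lines, with illustrative base ports and offsets:

BASE_API_PORT = 9000     # mitm_api HTTP API
BASE_PROXY_PORT = 8000   # the port the client uses as its proxy
PORTS_PER_WORKER = 10    # leave room for the service ports as well


def ports_for_worker(worker_index: int) -> dict:
    """Compute the port set of the proxy container assigned to a given test worker."""
    shift = worker_index * PORTS_PER_WORKER
    return {"api": BASE_API_PORT + shift, "proxy": BASE_PROXY_PORT + shift}


# worker number 3 connects to the container whose ports were mapped with the same shift
assert ports_for_worker(3) == {"api": 9030, "proxy": 8030}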

Scale the wheels

After living with this scheme for some time, we realized that:

  • It is still not very convenient – there are problems with mobile platforms, whose tests do not run on a single machine, so it is quite hard to reliably distribute worker numbers and avoid collisions.

  • There are also many problems with local development – you must not forget to launch the proxy before you start working, and if you need several threads locally, you have to restart it with different parameters.

  • And the resources of a single machine, although large, are not infinite; we need to learn how to distribute the load of the test processes and proxies.

This is how we lived during the first iteration.

With high-quality and reliable solutions like Selenoid around, the answer suggested itself: we need to build our own Selenoid, only for proxies.

What do we need from this service:

  • Issue a proxy via an API method: launch a container under the hood, wait until the proxy is up, and return the host and the list of ports it listens on.

  • Stop that same proxy on demand: the reverse operation – we stop the container and return its logs in the response, in case unexpected debugging is needed.

  • Provide a system of timeouts, since a test may end abnormally and never make the deletion call at the end.

  • Ideally, we want not one but several machines with proxies, so we also need a balancer that distributes the load between machines and serves as a single entry point for requests.

As a result another project, proxy-hive, was born. It can be launched in two modes: host mode (it launches and kills containers via the Docker API) and balancer mode (it round-robins a host from the list and proxies the request to it, adding extra data so that the next request can be routed to the same machine).

The host and proxy bookkeeping boils down to this: in host mode each proxy is given a random guid from which you can determine which “slot” (set of ports) the proxy is running in and extract the container id. In balancer mode the host name is encoded as a SHA-1-based (version 5) UUID, and all of this is concatenated into a single string id (the client doesn’t need to parse any of it, and we get a system that is easy to implement and understand).
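
The encoding itself is nothing exotic; a sketch of the idea (the host name and the way the parts are matched back are illustrative):

import uuid

# host mode: each started proxy gets a random guid that maps to a "slot" of ports
slot_guid = uuid.uuid4()

# balancer mode: the host name is folded into a SHA-1-based (version 5) UUID
host_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, "proxy-host-01.example.net")

# both parts are concatenated into one opaque string handed to the client
proxy_id = f"{host_uuid}.{slot_guid}"

# on the next request the balancer can recompute uuid5 for every host in its list
# and match it against the first part of the id to route to the same machine
host_part, slot_part = proxy_id.split(".", 1)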

It should be noted that we go to the proxies directly (unlike, for example, Selenoid), because implementing TCP proxying:

  • may make a project more complex without any apparent benefit;

  • can become a point of failure, since at this stage about 150 megabits per second pass through the whole proxy cluster at peak (not the largest load, but not the smallest either);

  • debugging a custom tcp proxy can be difficult.

After we implemented all this, we got the following picture: at the start, each test requests a proxy for itself, configures it in the client, and at the end kills it, attaching all the logs (both the HAR and the container’s own logs) to the allure report.
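
In test code this is typically wrapped in a fixture along these lines; the proxy-hive endpoints and the response fields are illustrative:

import allure
import pytest
import requests

PROXY_HIVE = "http://proxy-hive.example.net"  # the balancer, our single entry point


@pytest.fixture
def network_proxy():
    # ask proxy-hive for a fresh proxy container (endpoint name is illustrative)
    proxy = requests.post(f"{PROXY_HIVE}/proxy", timeout=60).json()
    # expected to contain something like {"id": ..., "host": ..., "ports": {...}}
    try:
        yield proxy
    finally:
        # stop the container and attach its logs to the allure report
        logs = requests.delete(f"{PROXY_HIVE}/proxy/{proxy['id']}", timeout=60)
        allure.attach(logs.text, name="proxy container logs",
                      attachment_type=allure.attachment_type.TEXT)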

The scheme turned out quite successful (in our opinion), and one sign of that success is that newcomers, or those who just want to start writing autotests, sometimes don’t even pay attention to how the work with the network is organized. They simply have a set of methods for retrieving requests and installing mocks.

This is how we live now.

Next steps

Have we implemented everything we wanted? No! The main wish is to learn to record and replay traffic for each test separately (we want to be able to drop the need to access the backend, or at least reduce backend calls to a minimum during some runs). mitmproxy can partially record and replay dumps, but there is a set of problems we are currently solving:

  • where to store data (currently we have implemented storage in S3);

  • what to do if the dump test fails the first time;

  • how to properly get rid of data tied to the current date and time;

  • how to implement work with application versions.

At the moment one of the clients is running on dumps in trial mode, with a success rate of about 80% versus 98-99% when using the real backend.

Conclusion

Does this design help us? Certainly. Thanks to it:

  • We can automate whole layers of scenarios that are hard for a human to go through. For example, the same statistics: to check them you need to watch content, ads, and other videos in different combinations, while simultaneously performing actions in the application and checking what ended up in the network log (and quite a lot ends up there).

  • We can do broad checks related to our favorite statistics (it would be unbearable for people to verify the presence of certain requests in every scenario, checking dozens of nested fields in parallel with walking through the product scenario itself).

  • We are close to significantly reducing the load on the test clusters and thereby speeding up some of the runs (especially those that have to run during the day, when ours are not the only tests hitting the CI and the test environments).

Do all autotest projects need such complex solutions, expensive in terms of support and infrastructure setup? No. If there aren’t too many tests, half of your requests aren’t fire-and-forget, and you don’t have to check requests from third-party libraries that can’t be reconfigured (they always go to a hardcoded URL), then a WireMock deployed next to the autotests will generally be enough.
