If you build an application that needs to scale (and we all hope our applications will grow), then at some point you have to figure out whether it actually can. That’s where load testing comes to the rescue: if you want to know whether your application can handle large scale, just generate that scale and check! Sounds simple enough.

But then we try to actually generate the load. This is only easy if your application is terribly simple, because then you can use something like Apache JMeter to fire off repeated requests. If that works for you, I envy you: every system I have had to work with was more complex and required a more sophisticated testing scheme.

If your application gets a little more complex, you move on to tools like Gatling. They let you simulate virtual users running various scenarios, which is much more useful than simply laying siege to one or more URLs. But even that is not enough if you are writing an application that uses both WebSockets and HTTP calls during one long session and also needs a timer to repeat certain actions. Perhaps I overlooked something important in the documentation, but I could not find a way to, for example, set up a periodic event that fires every 30 seconds, perform certain actions in response to a WebSocket message, and also perform actions over HTTP, all within a single HTTP session. I did not find this capability in any load testing tool (which is why I wrote my own tool at work, and I hope to open-source it if I can find time to clean up the code and separate it from the proprietary parts).
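To make that requirement concrete, here is a minimal sketch of a single virtual user in Python's `asyncio`. It is not the tool mentioned above. `FakeHttp`, `FakeWebSocket`, the `/heartbeat` and `/items` paths, and the `"refresh"` message are all hypothetical stand-ins so the example runs without a network; a real test would swap in actual clients and use a 30-second interval.

```python
import asyncio


class FakeHttp:
    """Hypothetical stand-in for an HTTP client bound to one session."""

    def __init__(self, log):
        self.log = log

    async def post(self, path):
        await asyncio.sleep(0)  # yield control, as real network I/O would
        self.log.append(("http", path))


class FakeWebSocket:
    """Hypothetical stand-in for a WebSocket that delivers canned messages."""

    def __init__(self, messages, delay):
        self.messages = messages
        self.delay = delay

    async def incoming(self):
        for message in self.messages:
            await asyncio.sleep(self.delay)
            yield message


async def heartbeat(http, interval):
    """Periodic action: fires every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await http.post("/heartbeat")


async def react_to_messages(ws, http):
    """Perform an HTTP action in response to selected WebSocket messages."""
    async for message in ws.incoming():
        if message == "refresh":
            await http.post("/items")


async def virtual_user(http, ws, interval):
    """One simulated session: timer and message handler share the HTTP client."""
    timer = asyncio.create_task(heartbeat(http, interval))
    try:
        await react_to_messages(ws, http)  # session ends when the socket closes
    finally:
        timer.cancel()
        await asyncio.gather(timer, return_exceptions=True)


def run_demo():
    log = []
    http = FakeHttp(log)
    ws = FakeWebSocket(["refresh", "ignore", "refresh"], delay=0.05)
    asyncio.run(virtual_user(http, ws, interval=0.03))
    return log


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the shape: the timer and the message handler are separate tasks that share one session object, which is exactly the combination that scenario-based tools make awkward.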

But suppose there is a standard tool like Gatling or Locust that works and suits your needs. Fine! Now let’s write a test. In my experience, this is the hardest task so far, because first we need to figure out what a realistic load looks like: expect one to three days of painstakingly studying logs and taking notes from the browser’s network tools while the web application is running. And once you know what a realistic load is, you have to write code that boils down to the following: a stripped-down version of your application that pretends to be a user, talks to the API, and performs the actions a user performs.

And we are not done yet! Great, we’ve written a load test, and it’s realistic. But the target keeps moving, because updates keep shipping. So now we have a maintenance problem: how do we keep the load test relevant as the application changes? There are no good tools for this, and almost nothing will help you. You have to make it part of your process and hope you don’t miss anything. That answer is not encouraging, and it is why this aspect is one of the hardest parts of load testing an application.

We can even skip all the worries about actually running the test, because honestly, if you have gotten this far with a load test, running it will not be the hardest part.

Where does the complexity come from?

Basically, the situation is as follows:

Most load testing tools support only the simplest loads, and even the most sophisticated ones do not let you perform all the steps needed to simulate real use of a web application.
The hardest part is writing a test that simulates real-world use, even when the tools support the capabilities you need.
The second hardest part is maintaining the test, and here the tools do not help at all.

Let’s take a closer look at each of these points and figure out how much of the complexity you can get rid of.

User simulation: is it really needed?

I would answer yes here, although this may vary from application to application. Here “user” means a user of the service: if you have a monolith, these are all your users, but with microservices the “user” can be one of the other services! In the applications I’ve worked on, I’ve had some success with targeted tests of individual endpoints. But in the end, this required such a complex setup that it turned out to be not much easier than the load test itself! Although we got some results and improvements, we could not cover everything (endpoints in an application may interact with each other), and we could not get a realistic load.

A better question is: “When is it not necessary to simulate users?” As I understand it, that is the case when you know that all endpoints are independent in terms of performance, there are no stateful requests, and the order of requests does not affect performance. These are pretty strong assumptions, and it is hard to be sure of them without testing that independence, so we are back to writing the test.

Probably the best thing you can do here happens during the API and system design phase, not the testing phase. If you simplify the API, there is much less surface area to test. If you design a system with more independent parts (for example, each service has its own database), they will be easier to test separately than a monolith would be. Plus, you can use a simpler tool: a double benefit!

Writing tests is a hard task. So is maintaining them.

Writing tests is difficult because there are several tasks to complete: you need to figure out what the API usage flow is, and then write a simulation of that usage. Understanding the flow requires understanding systems other than the one under test, and since your system is unlikely to be covered in detail in their documentation, there will be no clear diagram of what gets called and when; often you have to dig through logs until you understand what the true usage pattern is. The next step, writing the simulation, is also non-trivial, since you need to manage the state of a large number of actors that are users of your API!
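As a rough illustration of the log-digging step, the sketch below counts which endpoint tends to follow which within a session. It assumes the logs have already been parsed into `(timestamp, session_id, endpoint)` tuples; that record format and the function name `transition_counts` are hypothetical, and real access logs would need their own parsing first.

```python
from collections import Counter, defaultdict


def transition_counts(log_records):
    """Count endpoint-to-endpoint transitions within each session.

    log_records: iterable of (timestamp, session_id, endpoint) tuples
    (an assumed format, not one produced by any particular server).
    """
    by_session = defaultdict(list)
    for timestamp, session_id, endpoint in log_records:
        by_session[session_id].append((timestamp, endpoint))

    counts = Counter()
    for events in by_session.values():
        events.sort()  # order each session's requests by time
        for (_, current), (_, following) in zip(events, events[1:]):
            counts[(current, following)] += 1
    return counts


if __name__ == "__main__":
    records = [
        (1, "a", "/login"), (2, "a", "/items"), (3, "a", "/cart"),
        (1, "b", "/login"), (5, "b", "/items"), (2, "b", "/search"),
    ]
    for pair, count in transition_counts(records).most_common():
        print(pair, count)
```

The most frequent transitions give a first draft of the scenarios a simulated user should follow; they do not capture timing or payloads, which still have to be worked out by hand.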

Oh, yes, and now you need to write integration tests for all this.

There is research on how to simplify some of these tasks. For example, automated log analysis could work out what an initial test needs and detect regressions (new loads the test is missing). But as far as I know, there is no project on GitHub, let alone a commercial product, that can do this for me. So it seems such systems have not gained widespread adoption in our industry. A project like this would be too big to build independently, which may be why the idea has faded (or it is being built inside large companies and nobody talks about it).
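A full system like that is out of reach here, but one small piece of the idea fits in a few lines: compare the endpoints seen in production with the endpoints the load test actually calls, and flag the gaps. The function name `find_untested_endpoints`, the `min_share` threshold, and the flat list-of-endpoints input are all assumptions made for illustration.

```python
from collections import Counter


def find_untested_endpoints(production_requests, load_test_requests, min_share=0.01):
    """Return (endpoint, traffic_share) pairs that production uses but the test skips.

    Only endpoints carrying at least `min_share` of production traffic are
    reported, sorted from the largest share to the smallest.
    """
    production = Counter(production_requests)
    tested = set(load_test_requests)
    total = sum(production.values())
    if total == 0:
        return []

    gaps = [
        (endpoint, count / total)
        for endpoint, count in production.items()
        if endpoint not in tested and count / total >= min_share
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    production = ["/items"] * 90 + ["/export"] * 9 + ["/debug"] * 1
    load_test = ["/items"]
    print(find_untested_endpoints(production, load_test, min_share=0.05))
```

Run on a schedule against fresh logs, a check like this would at least raise a flag when a release introduces traffic the load test knows nothing about. It says nothing about whether the covered endpoints are exercised in realistic proportions.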

Maybe don’t load test everything?

Load tests are very complex, and there are not many tools that can help you cope with them. So the right answer is probably to write fewer tests of this kind and not expect them to give you all the answers about system performance.

To get a detailed picture of system performance, you can use one of the following methods:

Good old analysis. Sit down with a laptop, a pen, and an understanding of the system as a whole, set aside half a day, and you will come away with approximate calculations of the system’s main parameters and scaling limits. When you hit a bottleneck or an unknown variable (how many transactions per second can our database sustain? how many are we generating?), you can test that specifically!
Gradual feature rollout. If you can slowly roll features out to all users, you may not need load testing at all! You can measure performance experimentally and check whether it is sufficient. Enough? Roll out further. Not enough? Roll back.
Traffic replay. This will not help at all with new features (for those, use the previous item), but it will help you find the critical points of the system for existing features without requiring much development. You can take previously recorded traffic and replay it (over and over, even combining traffic from different time periods) while monitoring system performance.
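For the traffic replay option, the scheduling part can be sketched as follows: take several recorded windows, shift each so it starts at time zero, optionally compress time, and merge them into one timeline. The function name `build_replay_schedule`, the `(timestamp_seconds, request)` record format, and the `speedup` parameter are assumptions; actually sending the requests is left out.

```python
import heapq


def build_replay_schedule(recordings, speedup=1.0):
    """Merge recorded traffic windows into a single replay schedule.

    recordings: list of recordings, each a list of (timestamp_seconds, request)
                tuples already sorted by timestamp (an assumed format).
    speedup:    values above 1 compress time and therefore raise the load.

    Returns a list of (offset_seconds, request) sorted by offset.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")

    shifted = []
    for recording in recordings:
        if not recording:
            continue
        start = recording[0][0]  # align every window to start at zero
        shifted.append(
            [((timestamp - start) / speedup, request) for timestamp, request in recording]
        )

    # heapq.merge keeps the result sorted without re-sorting everything
    return list(heapq.merge(*shifted, key=lambda item: item[0]))


if __name__ == "__main__":
    monday = [(100.0, "GET /a"), (104.0, "GET /b")]
    friday = [(500.0, "GET /c"), (502.0, "GET /d")]
    for offset, request in build_replay_schedule([monday, friday], speedup=2.0):
        print(f"{offset:5.1f}s  {request}")
```

Overlaying windows from different periods multiplies the concurrency, and `speedup` raises the request rate, so the two knobs together let you push recorded traffic beyond what production has actually seen.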
