YuMoney has a large testing department of almost 80 people who check the quality of products and services every day. In this article we explain how we measure the effectiveness of testing, which metrics we collect, and what results this brings.
To communicate with 80 employees, the head of the testing department needs 40 hours
The manager must oversee the work of the department, monitor processes, and understand where resources are lacking and where there is a surplus. To do this, once per two-week sprint he needs to talk to each tester, and one such meeting lasts approximately 30 minutes. Accordingly, the larger the department, the more time the manager spends on this.
Previously, when the testing team had five people, such meetings took 2.5 hours. But now that the department has grown to 80 people, the manager needs to spend 40 hours to talk to everyone. That is a whole working week.
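The arithmetic behind these numbers is easy to check. Below is a minimal sketch; the function name `meeting_hours` is ours and is used only for illustration.

```python
def meeting_hours(testers: int, minutes_per_meeting: int = 30) -> float:
    """Hours per sprint the manager spends on one-on-one meetings."""
    return testers * minutes_per_meeting / 60


print(meeting_hours(5))   # 2.5
print(meeting_hours(80))  # 40.0
```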
How can the manager adequately assess the situation in teams and in the department with such a workload? By delegating tasks and automating processes.
Who are the curators in the YuMoney testing department and what do they do?
The manager delegates part of his powers to testers with leadership qualities – they become curators. A curator is a senior or leading department employee who accompanies testers from product teams:
holds meetings with them every sprint;
discusses current issues;
helps solve current problems;
organizes personal reviews;
controls the process so that there is a balance between testing time and quality.
Our curators, as a rule, come from related product teams, of which there are many at YuMoney. We have already written about this in detail.
To quickly quantify the state of testing, we began collecting metrics. They gave us a good starting point for discussing the sprint with the team.
What metrics do we collect?
At first, we were afraid that testers would not want their work to be evaluated using some common algorithms. But we managed to explain to the team that metrics are not needed for sanctions and fines. They are necessary to:
understand the current state of affairs;
identify and solve problems in a timely manner;
monitor trends – whether processes are improving or worsening over time;
improve the quality of testing of YuMoney products.
Testing is an important gate on a feature’s path to the user. We don’t want this gate to slow down the release of features to production, so testing time must be constantly reduced. The faster a feature reaches the user, the more money the company can earn. Therefore, first of all, we decided to focus on testing time: to reduce it without losing quality.
Each product team began collecting the following metrics:
Testing Speed (TS) — the ratio of the number of tested tasks to the number of tasks submitted for testing in one sprint. It is calculated by the formula:
TS = Count (tasks tested) / Count (tasks submitted for testing)
Tasks Without Testing (TWT) – the number of tasks that a developer pushed to production bypassing testing, which increases the risk of bugs in production.
Production Found Defects (PFD) — incidents in production, when users themselves find bugs in YuMoney products and bring them to technical support.
Integration Found Defects (IFD) — defects that were found at the stage of release acceptance testing by autotests. Three additional metrics have been identified here:
Test cases — number of test cases;
Autotests — number of autotests;
Coverage – the ratio of Autotests to Test cases.
Test Duration – the average time a task spends in testing. It is calculated by the formula:
Test Duration = Total time that all tasks spent in testing / Total number of tasks
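Here is a small sketch of the last two calculations: Coverage as the ratio of autotests to test cases, and the average testing time per task (total testing time divided by the number of tasks). The function names and sample values are ours, not part of any YuMoney tooling.

```python
from datetime import timedelta
from typing import Optional, Sequence


def coverage(autotests: int, test_cases: int) -> Optional[float]:
    """Share of test cases that are covered by autotests."""
    if test_cases == 0:
        return None
    return autotests / test_cases


def average_test_duration(durations: Sequence[timedelta]) -> Optional[timedelta]:
    """Total time all tasks spent in testing divided by the number of tasks."""
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical data: 150 autotests for 200 test cases, three tasks in testing.
print(coverage(150, 200))  # 0.75
print(average_test_duration(
    [timedelta(days=1), timedelta(days=2), timedelta(hours=12)]
))  # 1 day, 4:00:00
```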
For all metrics, we made a color indication – a traffic light.
Meaning of the colors:
Green – everything is fine.
Yellow – things will get bad soon, but for now it’s okay.
Red – we should have intervened yesterday.
We determined for ourselves that if it takes less than two days to test one feature, then this suits us. If the average Test Duration metric does not go beyond this range, it means that the team is doing well with testing time, Time to Market is not suffering, and everyone is happy with the speed.
But we do not evaluate all metrics using traffic lights. Sometimes it is difficult to say how many test cases are considered good for a team. Every team has a different number of business processes and applications that it develops and supports. We color such metrics grey and pay more attention to how they change over time.
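The traffic light logic can be sketched as a simple threshold function. Only the "less than two days is green" rule comes from the text above; the boundary between yellow and red (three days here) is a hypothetical value chosen for illustration, as are all the names.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Light(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    GREY = "grey"  # no threshold defined or no data: watch the trend instead


def duration_light(
    average: Optional[timedelta],
    green_below: timedelta = timedelta(days=2),  # from the article
    red_from: timedelta = timedelta(days=3),     # assumption, not from the article
) -> Light:
    """Map the average testing time of a task to a traffic light color."""
    if average is None:
        return Light.GREY
    if average < green_below:
        return Light.GREEN
    if average < red_from:
        return Light.YELLOW
    return Light.RED


print(duration_light(timedelta(hours=30)))          # Light.GREEN
print(duration_light(timedelta(days=2, hours=12)))  # Light.YELLOW
print(duration_light(timedelta(days=4)))            # Light.RED
```

Keeping the thresholds as parameters makes it easy for each team to agree on its own boundaries.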
The first results of the testing team after the implementation of the traffic light
For each product team we made the following table:
As you can see, most of the cells in the table for this particular team are red. We saw a similar picture in several testing department teams. Something had to be done about this.
How we worked through the red zones
Rotated testers in some product teams.
Started estimating testing effort during sprint planning.
Changed the task workflow so that a developer cannot push a task to production without testing.
Focused on writing automated tests. This is an expensive and time-consuming process. We invested a lot of resources in getting started with autotests, ran an internal automation school and a code review school, and motivated testers to write autotests in various ways.
As a result, we improved the speed of testing in the department, and the metrics turned green:
But there was still something to work on.
Why we revised the metrics: removed some and added others
Metrics need to be constantly monitored. You can’t just add a few and forget about them. And we must understand that a new metric will not always be useful. Sometimes, after a while, you realize that it does not provide the information you need. Such metrics must be mercilessly removed and replaced with an alternative. Otherwise, you end up with a lot of “junk” metrics that are impossible to work with.
We removed metrics that did not provide us with useful information. These are Testing Speed and Tasks Without Testing. Instead, we added others that we still focus on:
Unstable Tests (UT) — the number of unstable autotests (those whose Success Rate for two weeks was below 70%).
Bad Tests (BT) – “bad” autotests, that is, those that required manual intervention three times in a month at the stage of release acceptance testing by autotests. Such autotests significantly worsen the Time to Market that we are fighting to reduce.
Skipped Autotests (SA) — the number of blocked autotests, those that for some reason cannot pass release acceptance testing due to disabled business processes, bugs in the system, unready infrastructure, or because of the code of the autotest itself.
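To make the first two definitions concrete, here is a minimal sketch that counts unstable and bad autotests from per-test statistics. The `AutotestStats` structure, the test names, and the numbers are hypothetical; treating "three times in a month" as "three or more" is also our assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AutotestStats:
    name: str
    runs: int                  # runs during the last two weeks
    passed: int                # successful runs during the last two weeks
    manual_interventions: int  # manual fixes during release acceptance, last month


def success_rate(stats: AutotestStats) -> Optional[float]:
    return stats.passed / stats.runs if stats.runs else None


def unstable_tests(all_stats: List[AutotestStats], threshold: float = 0.70) -> List[str]:
    """Autotests whose two-week Success Rate is below the threshold."""
    result = []
    for stats in all_stats:
        rate = success_rate(stats)
        if rate is not None and rate < threshold:
            result.append(stats.name)
    return result


def bad_tests(all_stats: List[AutotestStats], min_interventions: int = 3) -> List[str]:
    """Autotests that needed manual intervention at least `min_interventions` times."""
    return [s.name for s in all_stats if s.manual_interventions >= min_interventions]


sample = [
    AutotestStats("test_login", runs=20, passed=19, manual_interventions=0),
    AutotestStats("test_payment", runs=20, passed=13, manual_interventions=1),
    AutotestStats("test_transfer", runs=10, passed=9, manual_interventions=3),
]
print(unstable_tests(sample))  # ['test_payment']
print(bad_tests(sample))       # ['test_transfer']
```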
We regularly review and update our metrics. Their development is an evolutionary process. Everyone can suggest improving an existing metric or adding a new one. For us, this is a living tool that can be influenced by anyone from QA or other departments.
But there may be too many metrics, and then people will stop paying attention to them. So when adding a new metric, always ask the following questions:
For what purpose are we adding it?
What do we plan to do if it is red?
Is there another similar metric that will duplicate the information?
And so on. Also, don’t forget to collect feedback from users of these metrics.
Metrics highlight problems in a department, but do not show their cause or suggest a solution.
Let’s consider one of YuMoney’s cases of working with metrics in testing. Here are the metrics for one QA team over eight sprints. The sprints are listed in order from top to bottom:
In the first sprint of the year, from January 7 to January 20, the metric value is green. This means that everything was fine: the tasks were tested quickly.
In the next sprint, from January 21 to February 3, the time for testing tasks increased sharply, the metric turned red and took a very long time to return to normal. This happened because the most experienced tester left the team for some time, and was replaced by a newcomer who took a long time to adapt. By mid-April the metric turned green.
This shows that metrics only highlight the problem, but do not answer the question of why it arose and how to solve it. Each problem must be dealt with individually; there is no universal solution for everyone.
How we automated the metrics collection process
When we launched metrics in the testing department, team curators manually collected them using filters from Jira and TMS and put them into a table in Confluence. After some time, when the process became established and everyone got used to it, we decided to automate it.
At first, automation was very simple: we set up a backend service with a single handler. When it was called, the service queried Jira, processed the data, and returned it as JSON. Once a sprint, the curator of each team opened Postman, called the handler, and copied the data from the JSON into Confluence.
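A service with a single handler like the one described above might look roughly like this. This is a sketch under stated assumptions, not the actual YuMoney service: it uses only the Python standard library, the `/metrics` path and the port are made up, and `fetch_sprint_tasks` is a hypothetical stand-in that returns hard-coded sample data instead of querying Jira.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_sprint_tasks():
    """Hypothetical stand-in for the Jira query; returns hard-coded sample data."""
    return [
        {"key": "DEMO-1", "tested": True, "hours_in_testing": 20},
        {"key": "DEMO-2", "tested": True, "hours_in_testing": 40},
        {"key": "DEMO-3", "tested": False, "hours_in_testing": 0},
    ]


def build_metrics(tasks):
    """Turn raw task data into the metrics that are returned as JSON."""
    tested = [t for t in tasks if t["tested"]]
    total_hours = sum(t["hours_in_testing"] for t in tested)
    return {
        "testing_speed": len(tested) / len(tasks) if tasks else None,
        "test_duration_hours": total_hours / len(tested) if tested else None,
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":  # the path is an assumption
            self.send_error(404)
            return
        body = json.dumps(build_metrics(fetch_sprint_tasks())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the service until interrupted; call this to start it locally."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

After calling `serve()`, a GET request to `/metrics` from Postman or any other HTTP client returns the JSON payload.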
This automation made life easier for the team, but we wanted to simplify the process even more. So we connected a database to the service, added a client for working with Confluence, and began publishing data to Confluence automatically on a schedule: once per sprint, that is, every two weeks.
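Storing one snapshot of each metric per sprint could be sketched as follows. The text does not say which database was used, so SQLite from the Python standard library is our assumption here, as are the table layout, the team name, the dates, and the values. The scheduler that would call `save_metric` once per sprint is left out.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the metrics store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sprint_metrics (
            team         TEXT NOT NULL,
            sprint_start TEXT NOT NULL,  -- ISO date, so text order is date order
            metric       TEXT NOT NULL,
            value        REAL,
            PRIMARY KEY (team, sprint_start, metric)
        )
        """
    )
    return conn


def save_metric(conn, team, sprint_start, metric, value):
    """Insert or overwrite one metric value for one team and sprint."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO sprint_metrics VALUES (?, ?, ?, ?)",
            (team, sprint_start, metric, value),
        )


def metric_history(conn, team, metric):
    """Return (sprint_start, value) pairs ordered by sprint."""
    cursor = conn.execute(
        "SELECT sprint_start, value FROM sprint_metrics "
        "WHERE team = ? AND metric = ? ORDER BY sprint_start",
        (team, metric),
    )
    return cursor.fetchall()


conn = open_store()
save_metric(conn, "team-a", "2024-01-21", "test_duration_days", 3.0)
save_metric(conn, "team-a", "2024-01-07", "test_duration_days", 1.5)
print(metric_history(conn, "team-a", "test_duration_days"))
# [('2024-01-07', 1.5), ('2024-01-21', 3.0)]
```

Because the primary key includes the sprint start date, rerunning the scheduled job for the same sprint overwrites the row instead of duplicating it.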
It would seem: everything is fine, we collect metrics automatically, they can be viewed at any time. But there was a problem – Confluence did not cover all our needs.
For example, it was difficult to compare multiple teams with each other. And when a lot of data accumulated, the pages took a long time to load and slowed down. We thought we should make our own visualization, because the graph in Confluence out of the box looked like this:
We already had experience writing front ends in the testing department, so we added a custom UI on top of the existing backend and chose a library for plotting charts. We called the new service Metric Reporter.
And our metrics began to look like this:
After some time, together with the designers, we redesigned the new UI and added the missing functionality.
We thought it would be nice to work with the team on metrics directly in the interface, so we added the ability to comment on individual metrics and sprints. We also added general metrics for the department from various specialized QA teams.
As a result, we ended up with a service that is used not only by the manager but by all colleagues in the testing department. We can now monitor the indicators and track the signals needed to analyze the situation in a team and throughout the department.
Why we built our own tool
It would be logical to use some ready-made metrics visualization tool, for example Grafana. But:
At YuMoney, the data retention period in Grafana is limited, and we need to look at metrics over long periods.
Let’s sum it up
What the metrics allow us to do:
Objectively assess what is happening in teams.
Identify and resolve problems within the department in a timely manner.
Identify dependencies. For example, if you compare metrics year by year, you can see how the load on testers increases before the new year, when teams want to complete their quarterly plans on time.
Monitor changes and trends, such as the introduction of new processes or technologies.
Find growth points. Once some metrics turn green after we have worked on them, we revise them and highlight others. We find new problems and solve them.
Distribute testing resources among teams.
Here is feedback from colleagues in the YuMoney testing department who use the metrics. Metrics help them:
Monitor the testing process and visually determine the balance between quality and quantity.
Improve processes in the QA department.
If you’re still not measuring your testing, then take action. Metrics can reveal problems you didn’t even know existed.
How to start collecting metrics
Identify the problem you want to monitor.
Collect the indicators you want to count.
Develop a calculation formula and indicators for a traffic light.
Work with the team, sell the idea that introducing metrics is cool.
Receive and analyze results.
Work through the red zones.
Look for new growth points – constantly update your metrics.
To start collecting metrics, you don’t have to create complex automation or build your own services. You can use any tool that can build graphs and tables. Start with plain Excel or Google Sheets. After some time, you will definitely notice that the testing process has improved.
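If you already have the numbers somewhere, a few lines of Python are enough to turn them into a CSV file that Excel or Google Sheets can open. This is only a sketch; the column names and values are invented for illustration.

```python
import csv
import io


def metrics_to_csv(rows, fieldnames):
    """Serialize per-sprint metrics to CSV text for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical values for two sprints.
rows = [
    {"sprint": 1, "tasks_submitted": 20, "tasks_tested": 18, "test_duration_days": 1.5},
    {"sprint": 2, "tasks_submitted": 22, "tasks_tested": 15, "test_duration_days": 3.0},
]
columns = ["sprint", "tasks_submitted", "tasks_tested", "test_duration_days"]
print(metrics_to_csv(rows, columns))
```

Save the returned text to a `.csv` file and import it into a spreadsheet to build the first charts.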