7 common A/B testing mistakes: how to avoid wrong conclusions

Author of the article: Kristina Kurdyumova

Product mentor, product manager at Avito

A/B testing is one of the key tools of product analytics, allowing you to make informed decisions based on data. But despite its effectiveness, many teams make common mistakes when running A/B tests and interpreting their results.

Read more about A/B test design: step-by-step instructions with theoretical foundations here.

In this article I will look at the 7 most common mistakes, with examples and ways to prevent them, so that your A/B test findings are accurate and reliable.

1. Incorrect definition of the purpose of the test

Problem: Teams often start A/B testing without a clearly defined goal. As a result, the test outcomes may be misinterpreted or fail to meet initial expectations.

Example: Let's say you want to increase the number of registrations on your site. If the test goal is stated as “increasing clicks on button X,” the team may focus on the click metric while ignoring the end goal: registrations.

Solution: Before starting the test, clearly define which metric you want to improve. This should be a specific and measurable goal, such as “increase registration conversion by 10%.”

2. Ignoring statistical significance

Error: Many teams draw conclusions before the test reaches statistical significance, which leads to premature or erroneous decisions.

Example: During the test, one version showed a 10% improvement after two days. The team stopped the test early and started rolling out the change, but in the long term there was no increase in conversions.

How to avoid: Keep an eye on the p-value: it must be below 0.05 (at the conventional significance level) for the results to be considered statistically significant. Do not stop the test before it has run its full planned duration.
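For illustration, here is a minimal sketch of such a significance check for a conversion test, using the two-proportion z-test from statsmodels (the conversion counts below are made up):

```python
# A minimal sketch of a significance check for a conversion A/B test.
# The conversion and visitor counts are illustrative assumptions.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 465]   # conversions in control (A) and variant (B)
visitors = [10000, 10000]  # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.3f}, p-value = {p_value:.4f}")

# Only treat the difference as statistically significant if p < 0.05,
# and only after the test has run its full planned duration.
if p_value < 0.05:
    print("Difference is statistically significant")
else:
    print("Not enough evidence yet - keep the test running")
```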

3. Insufficient sample size

Error: If the sample size is too small, the test results may not reflect the true behavioral changes of users. This increases the likelihood of random fluctuations and false conclusions.

Example: The test was conducted on a sample of 500 users, which is not enough to draw statistically significant conclusions. The team concluded that the button change increased conversions by 15%, although this was just a random fluctuation.

How to avoid: Use a sample size calculator before starting the test, and make sure the planned sample is large enough to detect the effect you care about and representative of your user segments.
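As a sketch of what such a calculation looks like in code, here is a sample size estimate using power analysis from statsmodels; the baseline conversion rate and minimum detectable effect below are assumptions you would replace with your own numbers:

```python
# A rough sketch of a pre-test sample size calculation with statsmodels.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05  # current conversion rate, e.g. 5% (assumption)
target = 0.06    # smallest uplift worth detecting, e.g. 6% (assumption)

effect_size = proportion_effectsize(baseline, target)
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # significance level
    power=0.8,    # probability of detecting the effect if it really exists
    ratio=1.0,    # equal group sizes
)
print(f"Required sample size: ~{int(round(n_per_group))} users per group")
```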


4. Neglecting the effect of seasonality or external factors

Error: Ignoring seasonality of data or events that influence user behavior.

Example: The test was conducted around the holidays, when traffic and conversions were abnormally high. The results turned out to be erroneous because they did not take into account the specific seasonal effect.

How to avoid: Ensure that the test covers a time period long enough to smooth out seasonal factors and special events; if this is not possible, account for seasonality in your conclusions. Also monitor external events and factors that may affect the test, take them into account when analyzing the results, and adjust your conclusions if necessary.
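A quick way to sanity-check for seasonal effects is to break the results down by day and by weekday before trusting the totals. The sketch below uses a small, hypothetical per-user event log; the column names and values are illustrative assumptions:

```python
import pandas as pd

# Hypothetical event log: exposure date, variant, and whether the user converted.
events = pd.DataFrame({
    "date":      pd.to_datetime(["2024-12-30", "2024-12-30", "2024-12-31", "2025-01-01",
                                 "2024-12-30", "2024-12-31", "2025-01-01", "2025-01-01"]),
    "variant":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "converted": [0, 1, 0, 1, 1, 1, 0, 1],
})

# Daily conversion rate per variant: look for days with abnormally high traffic or conversion.
daily = events.groupby(["date", "variant"])["converted"].mean().unstack("variant")
print(daily)

# Weekday breakdown: uneven coverage of weekdays hints at seasonal bias.
by_weekday = (
    events.assign(weekday=events["date"].dt.day_name())
    .groupby(["weekday", "variant"])["converted"]
    .mean()
    .unstack("variant")
)
print(by_weekday)
```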

5. Quitting the test early

Error: Ending the test without waiting for the results to stabilize, which may lead to incorrect conclusions.

Example: The test lasted only a few days, and although version B showed a significant increase in conversions in the early stages, this effect disappeared later on.

How to avoid: Run the test for at least the minimum full period and evaluate the results only after the data have stabilized.

Minimum duration of A/B test: 7 days = full week.
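To see why peeking at intermediate results is dangerous, here is a small simulation: variants A and B are identical, yet stopping at the first "significant" daily peek produces noticeably more false positives than the nominal 5% level. The traffic numbers are illustrative assumptions:

```python
# Simulation: repeated daily peeking under the null hypothesis (A and B identical).
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
false_positives = 0
n_experiments = 1000
daily_users, days, true_rate = 1000, 14, 0.05  # illustrative traffic and baseline

for _ in range(n_experiments):
    a = rng.binomial(1, true_rate, size=daily_users * days)
    b = rng.binomial(1, true_rate, size=daily_users * days)
    for day in range(1, days + 1):
        n = day * daily_users
        _, p = proportions_ztest([a[:n].sum(), b[:n].sum()], [n, n])
        if p < 0.05:  # stop at the first "significant" peek
            false_positives += 1
            break

print(f"False positive rate with daily peeking: {false_positives / n_experiments:.1%}")
```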

6. Parallel tests on overlapping audiences

Error: Conducting multiple A/B tests simultaneously on the same audience may skew the results due to overlapping effects.

Example: The company launches two parallel A/B tests on the same audience. The first team is testing a cart design change to improve conversions, and the second team is testing a change to the registration process. Users are faced with two different changes at the same time, making it difficult to determine which change led to an increase in conversions.

Ways to avoid this:

  1. Divide users into separate, non-overlapping groups for each test, so that each group participates in only one experiment (see the sketch after this list).

  2. Run tests one at a time: this eliminates the influence of one test on the results of another.

  3. Use advanced analysis tools: Some companies have their own analytics platforms that support parallel testing and can help isolate effects.
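As referenced in option 1, here is a minimal sketch of deterministic, non-overlapping group assignment based on a hash of the user ID; the experiment names and the 50/50 split are illustrative assumptions:

```python
# A minimal sketch of mutually exclusive assignment for parallel experiments.
import hashlib

EXPERIMENTS = ["cart_redesign", "signup_flow"]  # hypothetical parallel tests

def assign_experiment(user_id: str) -> str:
    """Route each user to exactly one experiment based on a stable hash."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % len(EXPERIMENTS)
    return EXPERIMENTS[bucket]

def assign_variant(user_id: str, experiment: str) -> str:
    """Inside an experiment, split users 50/50 into control (A) and variant (B)."""
    salted = f"{experiment}:{user_id}".encode()
    return "A" if int(hashlib.md5(salted).hexdigest(), 16) % 2 == 0 else "B"

user = "user_12345"
exp = assign_experiment(user)
print(exp, assign_variant(user, exp))  # the same user always gets the same assignment
```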

7. Selecting irrelevant or insensitive metrics

Note: an insensitive metric is a metric that does not capture sufficiently subtle or meaningful changes caused by the test. For example, it can show that everything has stayed the same even though minor improvements have occurred. A striking example of such a metric is retention.

Error: Choosing the wrong metrics can result in A/B test results that do not reflect the true impact of changes on the business. Insensitive metrics do not capture meaningful change, and irrelevant metrics are not connected to the ultimate business goals.

Example: Team A chose “pageviews” as the primary metric instead of “conversion” or “user retention,” which did not provide a real sense of the impact of the change being tested.

or

Team B implemented a new feature in the application and decided to measure its success through 30-day retention (user retention over 30 days). However, after a month retention remained almost the same, and the team concluded that the change did not lead to significant results. In fact, the feature improved short-term user sessions, increasing engagement within the first few days, but the long-term metric did not reflect these changes.

Conclusion: the retention metric turned out to be insensitive to the short-term effects of the new feature.

How to avoid: It is important to use not only the target metric but also proxy metrics, which may reflect intermediate steps toward the goal.

I'll give you an example: the target metric is conversion to payment. Proxy metrics such as adding an item to the cart, entering card details, and clicking the "pay" button will help you catch small changes. If the target metric doesn't show a meaningful change, proxy metrics can show where users run into problems or where the experience improves.
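As a sketch, this is how you might compare proxy metrics along the payment funnel in addition to the target metric; the step names, counts, and DataFrame layout are all illustrative assumptions:

```python
# Comparing proxy (funnel-step) metrics between variants, not just the target metric.
import pandas as pd

df = pd.DataFrame({
    "variant":   ["A", "A", "A", "B", "B", "B"],
    "step":      ["add_to_cart", "enter_card", "pay_click"] * 2,
    "users":     [10000, 10000, 10000, 10000, 10000, 10000],
    "completed": [3100, 1200, 950, 3400, 1350, 960],
})

funnel = df.assign(rate=df["completed"] / df["users"]).pivot(
    index="step", columns="variant", values="rate"
)
funnel["uplift"] = funnel["B"] - funnel["A"]
print(funnel)  # even if final payment barely moves, earlier steps may show the effect
```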

I talked even more about A/B testing here.

A/B testing is a powerful but complex tool that requires a competent approach and a deep understanding of analytics. The success of testing depends on the correct choice of metrics, taking into account statistical significance and correct interpretation of the results. While mistakes are inevitable, properly setting up tests and using proxy metrics can help minimize risks.

When done correctly, tests provide businesses with invaluable data to make informed decisions, allowing them to optimize their product and increase conversions. In this context, A/B tests become an important element in the product analytics arsenal.


You can gain more relevant analytics skills as part of practical online courses from industry experts.

In addition, on October 15, as part of the “Business Analyst in IT” course, an open lesson on “Precedents, use cases, and Use cases” will be held; it will be useful to anyone who wants to improve their process-description skills. If the topic is relevant to you, sign up for the lesson via the link.
