insight and practice. Lecture 2: Randomized controlled trials


Randomized controlled trials (RCTs) are considered the most objective, transparent and reliable way to run experiments. They are used in many fields, including science, medicine, marketing and technology, to test the effectiveness of new treatments, drugs, products or services by comparing outcomes between two or more groups. RCTs are far more common than they might seem at first glance: they are the standard method for studying cause-and-effect relationships. Although they are fairly simple to implement, they generally estimate the ATE (average treatment effect) much more accurately than other estimation methods.

There are several types of randomized controlled trials. The most commonly used ones are:

  • Simple randomization: each trial participant is randomly assigned to either the study intervention or a control intervention.

  • Stratified randomization: participants are first divided into strata based on certain characteristics, and then randomly assigned to study groups within each stratum.

  • Cluster randomization: randomization is done by groups, or “clusters”, rather than by individual participants.

  • Crossover trial: participants receive one intervention first and, after a certain period of time, the other; the order in which they receive them is randomized.

  • Factorial trial: each participant is randomly assigned to a group that receives a specific combination of interventions, which may include a placebo.
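The first three schemes differ only in how units are split into groups, so they can be sketched in a few lines of pandas. This is a minimal illustration on a hypothetical toy table: the columns segment (the stratification feature) and city (the cluster), as well as the helper stratified_assign, are made up for the example. Crossover and factorial designs change what each group receives rather than how units are split, so they are left out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy population: a stratification feature and a cluster id
users = pd.DataFrame({
    "user_id": np.arange(12),
    "segment": ["new", "old"] * 6,                   # stratum
    "city": np.repeat(["c1", "c2", "c3", "c4"], 3),  # cluster
})

# Simple randomization: each user independently gets 0 (control) or 1 (treatment)
users["simple"] = rng.integers(0, 2, size=len(users))

def stratified_assign(df, stratum_col, rng):
    """Shuffle users inside each stratum and alternate 0/1, so every stratum is split evenly."""
    groups = pd.Series(0, index=df.index)
    for _, idx in df.groupby(stratum_col).groups.items():
        shuffled = rng.permutation(idx.to_numpy())
        groups.loc[shuffled] = np.arange(len(shuffled)) % 2
    return groups

users["stratified"] = stratified_assign(users, "segment", rng)

# Cluster randomization: whole cities are assigned, every user inherits the city's group
cities = users["city"].unique()
city_group = dict(zip(cities, rng.permutation(np.arange(len(cities)) % 2)))
users["cluster"] = users["city"].map(city_group)

print(users)
```

Note that simple randomization does not guarantee equal group sizes or balance within a segment, whereas the stratified split gives each segment exactly half treated users.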

An analysis of 616 RCTs indexed in PubMed in December 2006 showed that 78% were simple randomization studies, 16% were crossover, 2% were stratified, 2% were cluster and 2% were factorial.

Bias: shift and prejudice

In the previous lecture we ran into the key obstacle to estimating the ATE as the difference in means between the treatment and control groups. This obstacle is called bias, a word that can mean both a systematic shift and prejudice. Although the two senses seem unrelated, together they describe the situation well.

Indeed, by the conclusion of the previous lecture, if BIAS(\mathcal{M}) is nonzero, then at least one of BIAS_0(\mathcal{M}) and BIAS_1(\mathcal{M}) is nonzero as well.

Suppose, for example, that BIAS_0(\mathcal{M}) is much greater than zero:

BIAS_0(\mathcal{M}) = \frac{1}{|\mathcal{M}_{(1)}|} \sum_{i\in\mathcal{M}_{(1)}} Y^i_{(0)} - \frac{1}{|\mathcal{M}_{(0)}|} \sum_{i\in\mathcal{M}_{(0)}} Y^i_{(0)} \gg 0

In other words, the two groups differed even before the experiment began: had the experiment not been carried out, neither group would have been exposed, yet their means of the target variable would still differ. This is bias. It is a property of the way objects were split into groups and has nothing to do with the treatment itself.
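The bias can be made tangible with a small simulation in which both potential outcomes are known for every object, something that is never possible with real data. The setup is entirely hypothetical: a user's activity raises both Y_{(0)} and the chance of being exposed, and the true effect is exactly 1 for everyone. The function bias0 computes the empirical BIAS_0(\mathcal{M}), the difference in mean Y_{(0)} between the treated (t == 1) and control (t == 0) groups.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical population: activity drives both the outcome and the chance of exposure
activity = rng.normal(0, 1, n)
y0 = 10 + 2 * activity + rng.normal(0, 1, n)  # potential outcome without treatment
y1 = y0 + 1.0                                  # true effect is exactly 1 for everyone

def bias0(t):
    """Difference in mean Y_(0) between the treated (t == 1) and control (t == 0) groups."""
    return y0[t == 1].mean() - y0[t == 0].mean()

def naive_ate(t):
    """Difference in observed means: Y_(1) is seen for the treated, Y_(0) for the controls."""
    return y1[t == 1].mean() - y0[t == 0].mean()

# Self-selection: more active users are more likely to end up treated
t_selected = (activity + rng.normal(0, 1, n) > 0).astype(int)
# Randomization: a fair coin, independent of activity
t_random = rng.integers(0, 2, n)

print(f"self-selection: BIAS_0 = {bias0(t_selected):.3f}, naive estimate = {naive_ate(t_selected):.3f}")
print(f"randomization:  BIAS_0 = {bias0(t_random):.3f}, naive estimate = {naive_ate(t_random):.3f}")
```

Under self-selection the naive difference in means overstates the effect substantially; with a fair coin both the bias and the error largely disappear. Because the effect here is constant, the naive estimate equals the true effect plus BIAS_0 exactly.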

What do we want from an A/B test?

One way to overcome this bias, widely used by causal inference practitioners, is to run a randomized controlled trial.

For our task of estimating the ATE with two groups, we will use RCTs of the first type (simple randomization) with two groups, control and treatment, denoted \mathcal{A} and \mathcal{B}; this is why such trials are also called A/B tests.

The idea is as follows. A sufficiently large subset of objects is drawn at random from the population \mathcal{U}, and the treatment is then also assigned at random among them, forming the groups \mathcal{A} and \mathcal{B}. Because the assignment T is random, the groups turn out to be similar or, more precisely, homogeneous.

What is homogeneity?

Homogeneity is the property of the studied quantities to have similar qualities or indicators. When comparing two samples, homogeneity refers to the degree to which they are similar along certain dimensions or characteristics. Homogeneous samples are very similar to each other in the characteristics being studied; heterogeneous samples differ in them.

In particular, as already mentioned, we want the group means to be equal for both Y_{(0)} and Y_{(1)}:

\frac{1}{|\mathcal{M}_{(1)}|} \sum_{i\in\mathcal{M}_{(1)}} Y^i_{(0)} = \frac{1}{|\mathcal{M}_{(0)}|} \sum_{i\in\mathcal{M}_{(0)}} Y^i_{(0)}

\frac{1}{|\mathcal{M}_{(1)}|} \sum_{i\in\mathcal{M}_{(1)}} Y^i_{(1)} = \frac{1}{|\mathcal{M}_{(0)}|} \sum_{i\in\mathcal{M}_{(0)}} Y^i_{(1)}

So all we need is a partition into groups that is homogeneous in both Y_{(0)} and Y_{(1)}; this is enough to eliminate the BIAS.

What does randomization do?

The homogeneity of groups in an A/B test rests on the law of large numbers and the Central Limit Theorem. The CLT says that the sum (and hence the mean) of a large number of independent random variables, for example the values of test participants, is approximately normally distributed regardless of their original distribution. For A/B tests this means that if the sample is split at random into control and test groups and each group is large enough, the difference between the groups' basic characteristics, such as their means, is approximately normal around zero with a spread that shrinks as the groups grow, so the groups end up similar.
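The effect of group size can be checked numerically. Under purely illustrative assumptions (an exponentially distributed, strongly skewed feature with standard deviation 10), the sketch below repeatedly splits a fresh sample in half at random and measures the gap between the two group means. For two halves of size n/2, the standard deviation of the gap should be close to 20/\sqrt{n}, i.e. it shrinks like 1/\sqrt{n}.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_gap_sd(n, n_splits=1_000):
    """Std. deviation, over many random splits, of the gap between group means of a skewed feature."""
    gaps = np.empty(n_splits)
    for k in range(n_splits):
        x = rng.exponential(scale=10.0, size=n)  # skewed feature, std. deviation 10
        in_a = rng.permutation(n) < n // 2        # a random half goes to group A
        gaps[k] = x[in_a].mean() - x[~in_a].mean()
    return gaps.std()

sds = {n: mean_gap_sd(n) for n in (100, 1_000, 10_000)}
for n, sd in sds.items():
    # Theory for two halves of size n/2: sqrt(2 * 100 / (n / 2)) = 20 / sqrt(n)
    print(f"n = {n:>6}: observed {sd:.3f}, theory {20 / np.sqrt(n):.3f}")
```

A tenfold increase in the sample shrinks the typical gap between the groups by a factor of about \sqrt{10} \approx 3.2.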

Auxiliary variables

Naturally, if we want to control the experiment (know how likely the test is to “lie”, how precisely we estimate the ATE, what sample size we need, and so on), we need some information about the distributions of the random variables Y_{(0)} and Y_{(1)}. Yet we know nothing about them so far, not least because their values usually have not even been measured before the experiment starts.
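To see why the distribution matters, consider a common textbook formula for the per-group sample size of a two-sided two-sample test of means with equal group sizes and equal variances: n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2 / \delta^2, where \delta is the minimum difference we want to detect and 1-\beta is the power. It depends on the standard deviation \sigma of the metric, which is exactly what we do not know before the experiment. The function below is a minimal sketch of that formula; the values of sigma and mde in the usage line are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sigma, mde, alpha=0.05, power=0.8):
    """Per-group size for a two-sided two-sample test of means (equal variances, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2)

# Illustrative values: metric std. deviation 10, minimum detectable difference 1
print(sample_size_per_group(sigma=10, mde=1))  # → 1570
```

Because n grows with \sigma^2, misjudging the spread of the metric by a factor of two changes the required sample size by a factor of four.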

What to do?

As proxies for Y_{(0)} you can use, for example, past values of the target variable (Y_{lag\, 1}). It is reasonable to assume that their distributions share common properties, since both quantities are measured in the absence of exposure, albeit at different points in time. For Y_{(1)}, the variation may differ and may be driven by auxiliary variables (X_1, X_2, X_3, etc.). Thus, to obtain homogeneous groups, we focus on the auxiliary variables on which the target variable may depend.
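One simple way to check homogeneity on pre-experiment data is to compare the groups on the lagged target and the auxiliary variables with the standardized mean difference (SMD): the difference in group means divided by the pooled standard deviation. This is a sketch on hypothetical data; the columns y_lag1, x1 and x2 stand in for Y_{lag\, 1}, X_1 and X_2, and the |SMD| < 0.1 threshold in the comment is a common rule of thumb, not a rule from this lecture.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 2_000

# Hypothetical pre-experiment data: lagged target and two auxiliary features
df = pd.DataFrame({
    "y_lag1": rng.normal(50, 10, n),  # stands in for Y_{lag 1}
    "x1": rng.exponential(5, n),      # stands in for X_1
    "x2": rng.integers(18, 70, n),    # stands in for X_2 (e.g. age)
})
# Random split into two equal halves
df["group"] = np.where(rng.permutation(n) < n // 2, "A", "B")

def standardized_mean_difference(df, col):
    """(mean_A - mean_B) divided by the pooled standard deviation of the two groups."""
    a = df.loc[df["group"] == "A", col]
    b = df.loc[df["group"] == "B", col]
    return (a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

smd = {col: standardized_mean_difference(df, col) for col in ["y_lag1", "x1", "x2"]}
# Values near 0 (a common rule of thumb is |SMD| < 0.1) suggest the groups are balanced
print({col: round(v, 3) for col, v in smd.items()})
```

Unlike a p-value, the SMD does not grow with the sample size, so it measures the size of an imbalance rather than its statistical significance.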

Stages of conducting an A/B test

Conducting an A/B test “by hand” requires a fairly good understanding of the topic: the algorithm is simple, but it has many pitfalls. Since a great many books and manuals already cover them, we will not repeat that material here. Instead, we will walk through the classic stages of an A/B test and then look at a Python example that uses ready-made tools to formalize and control concepts such as group homogeneity and its statistical significance for each variable.

Stage 1: Formulating a hypothesis

The first step is to formulate a hypothesis. It must be specific and verifiable. For example:

Changing the color of the “Buy” button on a web page from blue to green will increase the number of clicks by 10%.

Stage 2: Defining the target metric

Next, you need to determine the target metric that will be used to evaluate the effectiveness of the changes. In our example, the target metric is the number of clicks on the “Buy” button.

Stage 3: Formulating a mathematical hypothesis

Next, the hypothesis is restated mathematically. For example, the null hypothesis H_0 states that the mean number of clicks is the same for both versions of the button, while the alternative hypothesis H_1 states that it differs:

H_{0}: \mu_{A} = \mu_{B}

H_{1}: \mu_{A} \neq \mu_{B}
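A minimal sketch of testing these hypotheses, assuming the metric is a per-user click indicator (1 if the user clicked, 0 otherwise), so that \mu is the click-through rate. It uses Welch's standard error and a normal approximation for the p-value, which is reasonable for groups of thousands of users; for small samples the t-distribution would be needed. The function name two_sided_mean_test and the simulated rates (10% and 11%, a 10% relative uplift) are illustrative.

```python
import numpy as np
from math import erfc, sqrt

def two_sided_mean_test(a, b):
    """Test H0: mu_A == mu_B against H1: mu_A != mu_B.

    Uses Welch's standard error and a normal approximation for the p-value,
    which is adequate for large groups.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = (b.mean() - a.mean()) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical clicks: 10% of users click the blue button, 11% click the green one
rng = np.random.default_rng(4)
clicks_a = rng.binomial(1, 0.10, 20_000)
clicks_b = rng.binomial(1, 0.11, 20_000)
z, p = two_sided_mean_test(clicks_a, clicks_b)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```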

Stage 4: Grouping

At this stage, users are randomly divided into two groups. Randomization, as we remember, is necessary to ensure an even distribution of user characteristics between groups.

Stage 5: Checking the homogeneity of auxiliary variables (A/A test)

Before the A/B test itself, an A/A test is often run to check the homogeneity of the auxiliary variables. In an A/A test, two groups are formed and compared without introducing any change. The goal is to make sure that the split is truly random and that the groups are similar in the important characteristics. The procedure is as follows:

  1. Repeat a large number of A/B tests with the same scenario as the original one; the only difference is that the treatment is fictitious.

  2. Collect the target and auxiliary metrics of each run. Monitor and, if necessary, calibrate confidence intervals, p-value distributions, and so on.

  3. A successful A/A test helps us catch possible errors in the design of the original experiment and confirms that the testing mechanism we have built works.

  4. Sometimes the A/A test fails. In that case you should often change the splitting algorithm (for example, use stratification), revisit the hypothesis, or check the data for dependence between observations.
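The A/A procedure can be sketched as a simulation: take one fixed metric, split it at random many times without applying any treatment, and collect the p-values. If the split and the test are sound, about alpha of the runs should show a “significant” difference and the p-values should be roughly uniform on [0, 1]. The skewed historical metric, the number of runs and the normal-approximation test are assumptions of this sketch, not part of the lecture.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(5)

# Hypothetical historical metric for 5,000 users; no real treatment is applied
metric = rng.lognormal(mean=3.0, sigma=1.0, size=5_000)

def p_value_two_sided(a, b):
    """Normal-approximation p-value for the difference in means (Welch standard error)."""
    se = sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return erfc(abs(b.mean() - a.mean()) / se / sqrt(2))

n_runs, alpha = 2_000, 0.05
p_values = np.empty(n_runs)
for k in range(n_runs):
    in_a = rng.permutation(len(metric)) < len(metric) // 2  # fresh random A/A split
    p_values[k] = p_value_two_sided(metric[in_a], metric[~in_a])

false_positive_rate = (p_values < alpha).mean()
# For a healthy design this should be close to alpha, with p-values close to uniform
print(f"false positive rate: {false_positive_rate:.3f}")
print("p-value histogram (10 bins):", np.histogram(p_values, bins=10, range=(0, 1))[0])
```

A false positive rate well above alpha, or a histogram piled up near zero, is the kind of failure described in item 4.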

Stage 6: Decision making

Based on the analysis of the results, we decide whether the hypothesis is confirmed or refuted. If the p-value is below a significance level chosen in advance (for example, 0.05), the null hypothesis is rejected and the change is considered to have a statistically significant effect; otherwise, H_0 is not rejected.

Example of data analysis in Python

In this example, we will look at running an A/B test on synthetic data using the hypex library.

Step 1: Install and Import Required Libraries

First, let's install the hypex library, if it is not already installed, and import the necessary modules.

# Notebook (Jupyter/Colab) syntax; in a terminal run: pip install hypex
!pip install hypex
import numpy as np
import pandas as pd
import hypex as hx

Step 2: Generating Synthetic Data

Let's create synthetic data for two groups: control (A) and test (B).

# Fix the random seed for reproducibility
np.random.seed(42)

# Sample sizes
n_A = 1000
n_B = 1000

# Generate the data: group B's true mean is 5 higher than group A's
data_A = np.random.normal(loc=50, scale=10, size=n_A)
data_B = np.random.normal(loc=55, scale=10, size=n_B)

# Build a DataFrame with a group label and a metric value per row
df = pd.DataFrame({
    'group': ['A'] * n_A + ['B'] * n_B,
    'value': np.concatenate([data_A, data_B])
})

Step 3: Conducting an A/B test using hypex

We use the hypex library to conduct an A/B test.

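# Initialize the experiment (the call follows the original example; hypex's
# interface differs between versions, so check the docs of the installed version)
experiment = hx.Experiment(data=df, test_group='B', control_group='A', metric="value")

# Run the test
results = experiment.run()

# Print the results
print(results)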

Step 4: Interpretation of results

The A/B test results include a p-value, a confidence interval and other statistics that help decide whether the groups differ significantly.

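# Example of printing individual results (the key names follow the original
# example and may differ depending on the hypex version and result type)
print(f"P-value: {results['p-value']}")
print(f"Mean difference: {results['mean_difference']}")
print(f"Confidence interval: {results['confidence_interval']}")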
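Because the exact hypex interface and output format vary between versions, it can help to compute the same three quantities by hand as a cross-check. This is a library-independent sketch on the same synthetic data, using Welch's standard error and a normal approximation (adequate for 1,000 observations per group); its numbers should be close to, but need not exactly match, what hypex reports. The p-value is computed with math.erfc so that a very small value is not rounded down to zero.

```python
import numpy as np
import pandas as pd
from math import erfc, sqrt
from statistics import NormalDist

# Recreate the synthetic data exactly as in Step 2 (same seed, same call order)
np.random.seed(42)
data_A = np.random.normal(loc=50, scale=10, size=1000)
data_B = np.random.normal(loc=55, scale=10, size=1000)
df = pd.DataFrame({
    "group": ["A"] * 1000 + ["B"] * 1000,
    "value": np.concatenate([data_A, data_B]),
})

a = df.loc[df["group"] == "A", "value"]
b = df.loc[df["group"] == "B", "value"]

alpha = 0.05
diff = b.mean() - a.mean()
se = sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))  # Welch standard error
z = diff / se
p_value = erfc(abs(z) / sqrt(2))                            # two-sided, normal approximation
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (diff - z_crit * se, diff + z_crit * se)

print(f"Mean difference: {diff:.2f}")
print(f"P-value: {p_value:.3g}")
print(f"95% confidence interval: ({ci[0]:.2f}, {ci[1]:.2f})")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```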

Conclusion

Like any other method, randomized trials have their limitations and disadvantages. These include high cost, long duration and, sometimes, the practical impossibility of assigning objects to groups completely at random. For example, in an experiment studying the harm of smoking during pregnancy, participants cannot be forced into one group or the other.

In the next lecture we will look at a method that helps estimate the ATE in such cases.

Ivan Vyacheslavovich Yurashku
