How to conduct a safe experiment: guardrail metrics

Let's take a closer look.

Types of guardrail metrics

What are these metrics anyway?

Financial guardrail metrics

When it comes to money, mistakes are unacceptable. Improving UX, interfaces, or site loading speed means nothing if revenue is falling or the average order value is shrinking at the same time. That is why financial metrics come first on the list of guardrail metrics.

Example:

Let's say you're testing a new payment page interface that should speed up the ordering process. The test showed that checkout times were reduced by 20%, and everything seemed perfect… until it turned out that the percentage of completed transactions had dropped. This signals that users may be confused about the new payment step, and this directly impacts revenue.

Examples of financial metrics:

  • Percentage of completed transactions. If it falls, this is a bad sign – something in the UX is not working.

  • Average order value. A decrease in the average order value indicates that users have started buying less, or less often.

Example in Python:

from sqlalchemy import create_engine, func
from sqlalchemy.orm import sessionmaker

from models import Order  # your ORM model with `status` and `amount` columns

# Set up the database connection
engine = create_engine('sqlite:///ecommerce.db')
Session = sessionmaker(bind=engine)
session = Session()

# Count completed transactions and compute the average order value
completed_transactions = session.query(func.count(Order.id)).filter(Order.status == "completed").scalar()
total_transactions = session.query(func.count(Order.id)).scalar()
average_order_value = session.query(func.avg(Order.amount)).scalar()

completion_rate = (completed_transactions / total_transactions) * 100
print(f"Completion rate: {completion_rate:.1f}%")
print(f"Average Order Value: {average_order_value:.2f}")

User Experience Metrics

Improving the user experience is good, but if your changes result in more errors, slow loading times, or decreased usability, users will start to churn. At Netflix, for example, all the fancy recommendation systems become meaningless if users experience delays when watching videos.

Examples of such metrics:

  • Page loading speed. Long loading times cause users to become impatient and leave.

  • Number of errors and crashes. More errors mean less trust among users.

  • Time on site. A decrease in the time spent by the user on the site indicates that something is going wrong.

Example implementation of metrics monitoring in Python:

import time
import requests

def monitor_page_load(url):
    start_time = time.time()
    response = requests.get(url)
    load_time = time.time() - start_time
    if response.status_code == 200:
        print(f"Load time for {url}: {load_time:.2f} seconds")
    else:
        print(f"Error loading {url}, status code: {response.status_code}")

monitor_page_load("https://example.com")

Strategic Metrics

Now for the most interesting part – strategic metrics. These are the metrics that reflect the company's long-term goals. They may not be as obvious as money or site speed.

For example, user retention is a strategic metric. If users start leaving as a result of the test, it may mean the product is heading in the wrong direction.

Other examples:

  • Active user base growth — how quickly your user base is growing; this is especially important for services with a subscription model.

  • Number of active sessions — if you test an interface change and people start returning to the application less often, that is an alarming sign.
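As a rough illustration (the dates, group sizes, and the 30-day activity window here are all invented for the example), retention in a test and a control group can be compared like this:

```python
from datetime import date, timedelta

def retention_rate(last_active_dates, window_days=30, today=None):
    """Percentage of users who were active within the last `window_days`."""
    today = today or date.today()
    cutoff = today - timedelta(days=window_days)
    active = sum(1 for d in last_active_dates if d >= cutoff)
    return active / len(last_active_dates) * 100

# Last-activity dates for users in each experiment group (made-up data)
control = [date(2024, 10, 1), date(2024, 9, 1), date(2024, 10, 10)]
test_group = [date(2024, 8, 1), date(2024, 10, 12), date(2024, 7, 15)]

print(f"Control retention: {retention_rate(control, today=date(2024, 10, 15)):.1f}%")     # 66.7%
print(f"Test retention:    {retention_rate(test_group, today=date(2024, 10, 15)):.1f}%")  # 33.3%
```

If the test group's retention is noticeably below the control group's, that is exactly the kind of long-term signal a guardrail metric should catch, even when short-term numbers look good.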

Additional metrics

By the way, you can also add a few interesting metrics:

  • Return rate. Example: on an e‑commerce website you introduced new functionality that increased purchases, but the number of product returns also jumped. This is a clear sign that the product or UX is failing somewhere.

  • User satisfaction level. Here classic NPS surveys or something like ratings in the application will help you. Changed functionality? Check that users are still happy.
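Both of these metrics are easy to compute; here is a quick sketch (the order counts and survey scores are invented for illustration, and NPS is calculated in the classic way: percent of promoters minus percent of detractors):

```python
def return_rate(returned_orders, total_orders):
    """Share of orders that were returned, in percent."""
    return returned_orders / total_orders * 100

def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) / len(scores) * 100

print(return_rate(12, 200))                      # 6.0
print(round(nps([10, 9, 8, 7, 6, 3, 10]), 2))    # 3 promoters, 2 detractors -> 14.29
```

Track both before and after the experiment: a jump in the return rate or a drop in NPS after a "successful" change is a guardrail violation.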

Relationship between metrics

Frankly speaking, you are unlikely to run a test with only one metric. There is usually a whole orchestra of indicators: finances, user experience and strategic goals of the company.

Take financial metrics, for example. They show how much your product earns. User experience metrics, in turn, tell you how users interact with it – whether they stay on the site, whether they get worn down by long load times or a surge of errors. And strategic metrics help you stay afloat in the long run: are users retained, is the customer base growing? If you want not just to make quick money but to survive in the market, you need to watch all of these metrics at once.

Therefore, metrics never live separately. Here's how to proceed:

  • Check for side effects. Conversion went up? Amazing! But make sure page load times haven't ballooned – no one will wait for the site to finally load. Improved one metric? Make sure the others don't suffer.

  • Set up automated checks. In large systems you can automate verification of the relationships between metrics. For example, configure the system so that whenever conversion rises, user-experience and revenue metrics are checked automatically. This can be done via the API with PostHog, Grafana, etc.

  • Conditional metrics. Sometimes improving one metric makes another worse. For example, conversion went up, but the return rate rose with it? That is a warning sign: perhaps your product looks good on paper but does not live up to customer expectations.

  • Forecasting and control. Set thresholds for each metric and set up a system to monitor them. For example, if page load time has grown by 20%, the system should alert you immediately – even if conversions are growing by leaps and bounds. This is exactly the case where one metric hogs all the attention while the others start to suffer.
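The relative-threshold idea from the last point can be sketched as a simple degradation check (the baseline value, the 20% threshold, and the function name are assumptions for the example):

```python
def check_degradation(baseline, current, threshold_pct=20):
    """Alert if `current` is worse than `baseline` by more than threshold_pct percent.
    For load time, 'worse' means higher."""
    change_pct = (current - baseline) / baseline * 100
    if change_pct > threshold_pct:
        return f"ALERT: metric degraded by {change_pct:.0f}% (threshold {threshold_pct}%)"
    return f"OK: change {change_pct:+.0f}%"

# Page load time went from 1.0s to 1.5s: +50%, well past the 20% threshold
print(check_degradation(1.0, 1.5))
# A 10% slowdown stays within the threshold
print(check_degradation(1.0, 1.1))
```

A check like this runs against each guardrail metric independently, so a regression fires an alert regardless of how well the primary metric is doing.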

An example of automatic checking of relationships between metrics:

import time
import requests
from statistics import mean

def check_conversion_rate():
    # Stub: in a real system this would pull the current conversion
    # rate from your analytics backend
    return 12.5

# Check conversion rate and page load speed together
def monitor_metrics(urls):
    conversion_rate = check_conversion_rate()
    load_times = []

    for url in urls:
        start_time = time.time()
        requests.get(url)
        load_times.append(time.time() - start_time)

    average_load_time = mean(load_times)

    if conversion_rate > 10 and average_load_time > 2:  # the balance condition we are checking
        print("Conversion is good, but the site has started to lag! Check the user experience.")
    elif average_load_time < 2:
        print("Speed is fine – you can keep experimenting.")
    else:
        print("Problems with both conversion and speed. Time for a full review!")

monitor_metrics(["https://example.com"])
Thus, you can monitor several metrics simultaneously and respond to changes in each of them.

Conclusion

Guardrail metrics allow you to avoid unpleasant surprises, keeping your product afloat. Remember: a successful experiment is not just about improving one metric, but also about maintaining stability in all other aspects of the product.

Experiment, but be careful!


In conclusion, I'll tell you about the open lesson on October 24 – "Secrets of metamodels: how business analysts create successful information systems." What we will cover:

— The concept of a metamodel
— The metamodel at the heart of design
— The concept of views and viewpoints
— Selecting views and identifying stakeholders
— Analysis of a practical example of information system design from different points of view

You can sign up using the link.
