metrics and best practices

APIs play a large role in how modern applications are built, and they shape both how those applications are evaluated and the end-user experience. In this article, Ekaterina Sayapina, Product Owner of the MTS Exolve platform’s personal account, talks about sound approaches to monitoring API operation.

Revenue from APIs has grown, as confirmed by a survey of more than 40,000 developers in Postman’s State of the API report: 43% said APIs generate more than a quarter of their companies’ total revenue. That makes it important to control how software interfaces function and to monitor them throughout their life cycle to ensure that they are operating correctly. Such analysis helps identify problem areas and detect errors before the product enters the market.

In this article, we will look at industry best practices for API monitoring, the priority metrics to track, and how to troubleshoot when problems arise.

Why track APIs?

Monitoring an API is the practice of constantly checking the availability of its endpoints, as well as the correct operation of the transactions it performs. This makes the API’s functionality and its speed of response to requests of varying complexity visible. This type of analysis identifies erroneous or slow API transactions before the end user reports them. We use some of these practices when developing the SMS API.

Here are some benefits of this approach.

1. Ensuring successful API transactions

The API creates the necessary level of abstraction between microservices in modern programs. This architecture requires the use of complex, multi-step API interactions with third-party integrations.

For example, if the payment gateway integration in your e-commerce application fails, you will lose both customers and revenue. Detailed API monitoring helps catch such failures early and confirms that every step of a transaction in your application succeeds.

2. Validation of response data and error handling

Monitor API parameters to assess the reliability of transactions and to detect failures and failed authentication attempts. When they occur, send appropriate alerts to support or directly to developers.

3. Identify modified endpoints and third-party integration errors

API endpoints change quite frequently as the organizations that use them expand (adopting new product layouts). As a result, API versions are updated often, so frequent checks are necessary. This way you can make sure that the code doesn’t break when changes are made.

API timeouts, call delays, crashes, and downtime for endpoints that rely on third-party integrations can significantly impact program performance.

API monitoring helps identify and resolve these types of issues in real time. It is an effective tool for optimizing services in an organization or project.

Which key API metrics to track

1. Availability or Uptime

This metric is the standard measure of your product’s availability and is usually included in the SLA (service level agreement) signed between the service provider and the customer. API uptime is measured as a percentage or, in some cases, as the total downtime per year. Developers often describe this indicator in “nines”.

Let’s look at the table as an example:

Availability, %	Downtime per year
99% (“two nines”)	3.65 days
99.9% (“three nines”)	8.77 hours
99.99% (“four nines”)	52.60 minutes
99.999% (“five nines”)	5.26 minutes

Of course, going from four nines to five is much more difficult than from two to three, but you have to strive for this.
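
The figures in the table can be reproduced with a short calculation. The sketch below is only an illustration: the function name downtime_per_year is hypothetical, and it assumes an average year of 365.25 days, which matches the table values after rounding.

```python
def downtime_per_year(availability_percent: float, days_per_year: float = 365.25) -> float:
    """Return the allowed downtime in minutes per year for a given availability."""
    unavailable_fraction = (100 - availability_percent) / 100
    minutes_per_year = days_per_year * 24 * 60
    return unavailable_fraction * minutes_per_year


# Print the downtime budget for two to five nines.
for availability in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_per_year(availability)
    print(f"{availability}% -> {minutes:.2f} min/year ({minutes / 60:.2f} h, {minutes / 1440:.2f} days)")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the last step leaves only about five minutes per year for every deployment, failure and restart combined.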

2. CPU and memory usage

When debugging the API locally, you can see the system CPU usage in the Windows Task Manager (or Activity Monitor on macOS).

However, this is more problematic to do on the server. High CPU or memory usage on the API host server usually indicates that the virtual machine, container, or API Gateway host is overloaded, slowing down its performance.

You can track this metric, as well as the number of processes waiting to run, across the entire cluster hosting the API code. The memory used by the software interface is quantified as the proportion of available memory used.
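
As a rough sketch of how these two numbers can be collected on a single Linux host, the example below parses /proc/meminfo for the proportion of memory in use and uses the 1-minute load average as a proxy for the number of processes running or waiting to run. The function names are hypothetical, and a real cluster would normally rely on a monitoring agent rather than a script like this.

```python
import os


def memory_used_fraction(meminfo_text: str) -> float:
    """Return the share of memory in use, parsed from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the number (kB for memory rows)
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return (total - available) / total


def load_per_cpu() -> float:
    """1-minute load average divided by the CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


# Only read live values where the platform provides them (assumption: Linux host).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as meminfo:
        print(f"memory used: {memory_used_fraction(meminfo.read()):.1%}")
if hasattr(os, "getloadavg"):
    print(f"load per CPU: {load_per_cpu():.2f}")
```

A load per CPU that stays above 1 for a long time suggests that processes are queuing for CPU time.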

3. API Consumption

API demand is measured by the number of requests per second (RPS) or per minute (RPM). This performance metric is often used when comparing HTTP servers or databases. Knowing the number of concurrent users n, the average response time T_response and the average think time T_think (both in seconds), you can calculate the number of requests per second r using the formula:

r = n ÷ (T_response + T_think)

For example, you saw that after launching software for an online resource, the average number of concurrent users was 2,800. This depends on the number of people registered on the site, their behavior and the time they choose to send requests.

Using this information, we can apply the formula to calculate the request rate that your system must process for this user base:

n = 2,800 concurrent users
T_response = 1 (average response time to a request is one second)
T_think = 3 (average thinking time is three seconds)

Calculation of the number of requests per second:
r = 2,800 ÷ (1 + 3) = 700

Therefore, the number of requests per second is 700 and the number of requests per minute is 42,000.
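
The same calculation can be written as a small helper, which makes it easy to try other user counts or think times. The function name requests_per_second is hypothetical, and both times are assumed to be in seconds.

```python
def requests_per_second(concurrent_users: int, response_time_s: float, think_time_s: float) -> float:
    """Estimate throughput as users divided by the time one request cycle takes."""
    cycle_time_s = response_time_s + think_time_s
    if cycle_time_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return concurrent_users / cycle_time_s


rps = requests_per_second(concurrent_users=2800, response_time_s=1, think_time_s=3)
rpm = rps * 60
print(f"{rps:.0f} requests per second, {rpm:.0f} requests per minute")
```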

4. API Response Time

This metric measures the time it takes for an API endpoint to return a response. It is difficult to track when using third-party APIs, because latency can be the result of either slow endpoints or network issues.

APIs are considered high-performance if the average response time is between 0.1 and 1 second. At this speed, the person using your program will not see any interruptions in its operation. After one or two seconds the delay is already noticeable, and after five seconds you risk losing the application user.

For example, you might have a POST /checkout endpoint whose latency is gradually increasing because its SQL table keeps growing and is indexed incorrectly. However, because POST /checkout is called rarely, the problem is masked in the overall average by your GET /items endpoint, which is called much more often. For this reason, latency should be tracked per endpoint. Likewise, if you have GraphQL, you need to look at the average latency per GraphQL operation.
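
The masking effect is easy to see with numbers. In the sketch below, the sample data is invented for illustration: 98 fast GET /items calls and 2 slow POST /checkout calls. The overall average still looks healthy, while the per-endpoint average exposes the slow endpoint.

```python
from collections import defaultdict
from statistics import mean


def latency_by_endpoint(samples):
    """Group (endpoint, latency_seconds) pairs and return the mean latency per endpoint."""
    grouped = defaultdict(list)
    for endpoint, latency_s in samples:
        grouped[endpoint].append(latency_s)
    return {endpoint: mean(latencies) for endpoint, latencies in grouped.items()}


# Invented sample: many fast reads, a few slow checkouts.
samples = [("GET /items", 0.12)] * 98 + [("POST /checkout", 4.0)] * 2

overall = mean(latency_s for _, latency_s in samples)
per_endpoint = latency_by_endpoint(samples)

print(f"overall average: {overall:.3f} s")
for endpoint, latency_s in per_endpoint.items():
    print(f"{endpoint}: {latency_s:.2f} s")
```

In practice, percentiles such as p95 or p99 per endpoint are often tracked alongside the average, because a few very slow requests barely move the mean.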

5. Error rate

Error rate shows the number of errors per minute or second. It gives you accurate issue tracking information for specific API endpoints. It is the number of API calls per minute that return status codes outside the 2xx success range (typically 4xx and 5xx), and it is critical for measuring how broken and error-prone your API is. The lower the value, the better. All status codes can be viewed on Wikipedia.
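
A minimal sketch of this calculation follows. It assumes that an error is any response with a 4xx or 5xx status code, which is a common convention rather than something the article prescribes, and the function names and sample data are hypothetical.

```python
def is_error(status_code: int) -> bool:
    """Treat client (4xx) and server (5xx) responses as errors (assumption)."""
    return status_code >= 400


def errors_per_minute(status_codes, window_minutes: float) -> float:
    """Number of error responses divided by the length of the observation window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return sum(1 for code in status_codes if is_error(code)) / window_minutes


def error_ratio(status_codes) -> float:
    """Share of all calls that ended in an error."""
    codes = list(status_codes)
    return sum(1 for code in codes if is_error(code)) / len(codes) if codes else 0.0


# Invented sample: 100 calls observed over a two-minute window.
codes = [200] * 90 + [204] * 4 + [404] * 3 + [500] * 3
print(f"{errors_per_minute(codes, window_minutes=2):.1f} errors per minute")
print(f"{error_ratio(codes):.0%} of calls failed")
```

Splitting the count into 4xx and 5xx is often useful as well: the first group usually points to client mistakes, the second to problems on the server side.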

6. Number of unique API Consumers

This API metric helps the development team gain insight into overall product growth and new customer acquisition based on monthly active users. A rapid drop in this figure during peak hours may indicate a problem with the application platform.

It is important to measure API DAU (daily active users) and, if your team ships the product both as an API and as a web platform, web DAU as well. If the web DAU is growing much faster than the API DAU, this could indicate a leaky funnel during integration. This is especially true when the company’s main product is an API, like MTS Exolve.
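
Counting daily active consumers comes down to counting unique consumer IDs per day and per channel. The sketch below assumes that each request log entry carries a date, a channel ("api" or "web") and a consumer ID; these field names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_active_consumers(events):
    """Count unique consumer IDs per (day, channel) pair."""
    seen = defaultdict(set)
    for day, channel, consumer_id in events:
        seen[(day, channel)].add(consumer_id)
    return {key: len(consumer_ids) for key, consumer_ids in seen.items()}


# Invented sample log for a single day.
events = [
    (date(2024, 1, 1), "api", "client-a"),
    (date(2024, 1, 1), "api", "client-a"),  # repeated calls count once
    (date(2024, 1, 1), "api", "client-b"),
    (date(2024, 1, 1), "web", "client-a"),
    (date(2024, 1, 1), "web", "client-c"),
    (date(2024, 1, 1), "web", "client-d"),
]

dau = daily_active_consumers(events)
for (day, channel), count in sorted(dau.items()):
    print(f"{day} {channel}: {count} active consumers")
```

Comparing the two series over several weeks shows whether web sign-ups are turning into actual API integrations.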

7. API usage growth

Like the previous metric, this one measures how widely your public software interface has been adopted. Correct operation of the API is not enough; demand for it among users must also grow.

For that to happen, working with the API must not frustrate people. Track this metric to analyze the real growth trends of your product.

API Monitoring Practices

An API monitoring strategy is most successful when it is tailored to the unique needs of a specific business. Here is what to pay attention to in order to keep the API available:

Monitor the supporting infrastructure. Not all problems with the API relate to the health and availability of its endpoints; some arise in other parts of the stack. For example, an under-provisioned database or a network failure can lead to delays and errors.
Review monitoring settings periodically. Business needs are constantly changing, as are the technologies that support them. It is therefore important to regularly review your monitoring strategy to ensure it is effective and up to date.
Send automatic alerts via messengers or the SMS API from Exolve. An API monitoring strategy will not be effective if developers have to check the status manually. A team lead should therefore use alerting capabilities and integrations with communication tools so that teams are automatically notified of incidents and the actions they require.
Look for historical trends. Analyzing test results over time helps improve performance and reveals patterns in when problems arise.
Run monitoring at the right frequency. Determine how often you will perform tests. Is your API critical enough to require minute-by-minute monitoring? Or is it enough to run tests every 60 minutes or every 24 hours?
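
The last three practices (automatic alerts, check frequency and avoiding manual checks) can be combined in a simple polling loop. The sketch below is one possible shape, not a production monitor: http_probe and monitor are hypothetical names, the notify callback stands in for whatever messenger or SMS integration you use, and alerting after three consecutive failures is an arbitrary assumption.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, timeouts and connection errors
        return False


def monitor(
    probe: Callable[[], bool],
    notify: Callable[[str], None],
    interval_s: float,
    failures_before_alert: int = 3,
    max_checks: Optional[int] = None,
) -> None:
    """Run probe() every interval_s seconds and alert once per failure streak."""
    consecutive_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if probe():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert only when the threshold is reached, not on every later failure.
            if consecutive_failures == failures_before_alert:
                notify(f"API check failed {consecutive_failures} times in a row")
        checks += 1
        time.sleep(interval_s)
```

A call could look like monitor(lambda: http_probe("https://api.example.com/health"), print, interval_s=60), where the URL is a placeholder. Requiring several consecutive failures before alerting reduces noise from one-off network glitches, at the cost of a slower first alert.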

Monitoring tools

All of the indicators and metrics described above can be viewed and analyzed using any existing APM (Application Performance Monitoring) system.

Here are some solutions for monitoring and testing APIs:

Monitoring the health of your API may seem like a daunting task at first. But tracking the right metrics is an essential practice for anyone who creates and works with software interfaces.

Share your metrics, tools and practices in the comments.
