Testing Leadership: Service Testing

Welcome to the "Leadership in Testing" article series from software testing guru and consultant Paul Gerrard. This series is designed to help experienced testers, especially those working in Agile teams, succeed in test lead and management roles.

In the previous article, we looked at the changing role of testers and ways to improve collaboration with colleagues. This week, we'll dive into the intricacies of testing the performance, reliability, and manageability of a web application: in other words, service testing.

Let's begin.

What is service testing?

The quality of service provided by a web application could be defined broadly to include all of its attributes: functionality, performance, reliability, usability, security, and so on.

However, for our purposes, we identify three specific goals that will be the focus of testers conducting "service testing":

- Performance: the system responds quickly enough when realistic numbers of users are active.
- Reliability: the system does not fail, even under prolonged, heavy use.
- Manageability: the procedures that keep the service running can be performed safely while the service is under load.

In all three cases, it is necessary to simulate user load to conduct tests effectively. The goals of performance, reliability, and manageability exist in the context of real customers using the site to conduct business.

The response time of a site (in this case, the time it takes one system node to respond to a request from another) depends directly on the resources available within the technical architecture.

As more customers use the service, fewer technical resources become available to serve each user's requests, and response times will deteriorate.

Obviously, a service that is not heavily loaded is less likely to fail. Much of the complexity in software and hardware exists precisely to manage resources within the technical architecture under heavy load.

When a site is loaded (or overloaded), conflicting requests for resources must be managed by various infrastructure components, such as server and network operating systems, database management systems, web server products, object request brokers, middleware, etc.

These infrastructure components are generally more reliable than the custom code that consumes those resources, but failures can still occur.

By simulating typical and unusual production loads over a long period, testers can identify weaknesses in the design or implementation of the system. Once these weaknesses are corrected, the same tests will show that the system is stable. Testers can use load testing tools to perform many of the processes described below.
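
Commercial and open-source tools do this at scale, but the core idea is simple enough to sketch. The minimal example below (Python, standard library only) simulates a handful of virtual users hitting a hypothetical local endpoint and summarises the response times; the URL, user count, and think time are illustrative placeholders, not recommendations.

```python
import statistics
import threading
import time
import urllib.request

TARGET_URL = "http://localhost:8000/"  # hypothetical endpoint under test
VIRTUAL_USERS = 10                     # concurrent simulated users
REQUESTS_PER_USER = 20

response_times = []
lock = threading.Lock()

def virtual_user():
    """Repeatedly request the page and record each response time."""
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
                resp.read()
            with lock:
                response_times.append(time.perf_counter() - start)
        except OSError:
            pass  # a real harness would record failures as well
        time.sleep(1)  # think time between requests

threads = [threading.Thread(target=virtual_user) for _ in range(VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"requests completed: {len(response_times)}")
print(f"mean response time: {statistics.mean(response_times):.3f}s")
print(f"95th percentile:    {statistics.quantiles(response_times, n=20)[-1]:.3f}s")
```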

All services typically have a number of critical management processes that need to be performed to keep the service running smoothly. It may be possible to take a service down for scheduled maintenance outside of normal business hours, but most online services operate 24 hours a day.

The working day of a service never ends. Inevitably, management procedures must be performed while the service is running and users are logged in. These procedures must be tested while the system is under load to ensure that they do not degrade the live service.

What is performance testing?

Performance testing is a key component of service testing. It is a way to check how a system performs in terms of responsiveness and stability under a given workload. It works as follows: automated tools simulate increasing numbers of "virtual" users executing typical business transactions, and the response time for those transactions is measured at each level of load.

A graph of these changing loads is plotted against the response time experienced by our “virtual” users. When plotted, the graph looks something like the figure below.

At minimal load, when there is only a single user on the system, all of the resources are available to that user and response times are fast. As we increase the load and measure response times, they gradually deteriorate until we reach the point where the system operates at maximum capacity.

At this point, the response time for our test transactions is theoretically infinite, since one of the system's key resources is completely consumed and no more transactions can be processed.

As the load increases from zero to maximum, we also monitor the usage of various types of resources, such as server CPU usage, memory usage, network bandwidth, database locks, etc.

At maximum load, one of these resources reaches 100% utilization. This is the limiting resource, because it is the first to run out. Of course, by this point response times have usually degraded far beyond what users would find acceptable.
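
On the test servers, this monitoring can be as simple as a sampler that logs utilization figures for later plotting against the load levels. Below is a minimal sketch assuming the third-party psutil library is installed; application-level resources such as database locks would need their own monitors.

```python
# pip install psutil  (third-party library; all intervals here are illustrative)
import csv
import time

import psutil

SAMPLE_SECONDS = 5       # sampling interval
DURATION_SECONDS = 1800  # run for the duration of the load test

with open("resource_usage.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["elapsed_s", "cpu_percent", "memory_percent", "net_bytes_sent"])
    start = time.time()
    while time.time() - start < DURATION_SECONDS:
        # cpu_percent blocks for the interval and returns the average over it
        cpu = psutil.cpu_percent(interval=SAMPLE_SECONDS)
        mem = psutil.virtual_memory().percent
        net = psutil.net_io_counters().bytes_sent
        writer.writerow([round(time.time() - start), cpu, mem, net])
        f.flush()  # keep the file current if the test is cut short
```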

The graph below shows the usage/availability of several resources depending on the load.

As you've no doubt figured out, performance testing requires a team of people to support the testers. These include technical architects, server administrators, network administrators, developers, and database designers/administrators. These technical specialists are skilled at analyzing the statistics generated by resource monitoring tools and assessing how best to tune an application, configure a system, or upgrade a system.

If you are a tester and you are not an expert in these areas yourself, resist the temptation to pretend that you can interpret these statistics and make tuning and optimization decisions. You should involve experts early in the project to get their advice and support, and then, during testing, ensure that bottlenecks are identified and resolved.

Reliability/Fault Tolerance Testing

Ensuring continuous availability of a service is probably a key goal of your project. Reliability testing helps to expose hidden faults that cause unexpected failures, while fault tolerance testing helps ensure that the fault tolerance measures designed to handle anticipated failures actually work.

Fault tolerance testing

When sites are required to be resilient and/or reliable, they are typically designed using robust system components with built-in redundancy and failover features that kick in when failures occur.

These features may include resilient network routing, multiple servers configured as clusters, and middleware or distributed-server technology that handles load balancing and traffic redirection in failure scenarios.

Fault tolerance testing aims to study the behavior of the system under selected failure scenarios before deployment.

A technique called fault tree analysis (FTA) can help you understand the dependencies of a service on its underlying components. Fault tree analysis and fault tree diagrams are logical representations of a system or service and the ways in which it can fail.

The simple diagram below shows the relationship between the basic component failure events, the intermediate subsystem failure events, and the top-level service failure event. Of course, more than three levels of failure events can be defined.
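
The gate logic of a fault tree translates naturally into code, which can help when reasoning about redundancy. Here is a small sketch with invented component names and failure probabilities, assuming failures are independent: an AND gate models a redundant pair (both must fail), while an OR gate models a chain where any single failure takes the service down.

```python
# A minimal fault tree sketch. Event names and probabilities are invented
# for illustration; real figures would come from component reliability data.

def or_gate(*probs):
    """Subsystem fails if ANY input fails (assumes independent events)."""
    p_ok = 1.0
    for p in probs:
        p_ok *= (1.0 - p)
    return 1.0 - p_ok

def and_gate(*probs):
    """Subsystem fails only if ALL inputs fail (e.g. a redundant pair)."""
    p = 1.0
    for q in probs:
        p *= q
    return p

# Basic component failure probabilities (per year, hypothetical)
p_web_a, p_web_b = 0.05, 0.05  # two web servers behind a load balancer
p_database = 0.02
p_network = 0.01

# Intermediate event: the web tier is redundant, so both servers must fail
p_web_tier = and_gate(p_web_a, p_web_b)

# Top-level event: any subsystem failure brings the service down
p_service = or_gate(p_web_tier, p_database, p_network)
print(f"web tier: {p_web_tier:.4f}, service: {p_service:.4f}")
```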

These tests should be run against an automated load, both to examine the system's behavior in realistic production situations and to confirm that the fault tolerance measures actually take effect when failures occur.

Ultimately, the tests aim to determine whether end-user service is maintained and whether users even notice when a failure occurs.
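
One practical way to answer that question is to probe the service continuously while a failure is injected, for example by stopping one server in a cluster by hand. A minimal sketch, assuming a hypothetical /health endpoint; the gaps between failed probes show how long users would actually be affected.

```python
import time
import urllib.request

TARGET_URL = "http://localhost:8000/health"  # hypothetical health endpoint
POLL_SECONDS = 1

# Run while the load test is active and a failure is injected (for example,
# an operator stops one node in the cluster). Stop with Ctrl+C.
outage_start = None
while True:
    try:
        with urllib.request.urlopen(TARGET_URL, timeout=5) as resp:
            ok = resp.status == 200
    except OSError:
        ok = False
    now = time.strftime("%H:%M:%S")
    if not ok and outage_start is None:
        outage_start = time.time()
        print(f"{now} service DOWN")
    elif ok and outage_start is not None:
        print(f"{now} service restored after {time.time() - outage_start:.1f}s")
        outage_start = None
    time.sleep(POLL_SECONDS)
```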

Reliability (or endurance) testing

Reliability testing aims to verify that failures do not occur under load.

Most hardware components are so reliable that their mean time between failures is measured in years. Reliability testing requires the use (or reuse) of automated tests in two ways, to simulate:

- extreme loads on specific components or interfaces (stress testing);
- realistic loads sustained over an extended period (soak testing).

By focusing on specific components, we try to stress the component by subjecting it to an unreasonably large number of requests to perform its intended function. It is often easier to stress test critical components in isolation with a large number of simple requests before applying a much more complex test to the entire infrastructure. There are also specially designed stress testing tools to make the process easier for QA.

Soak tests stress a system for an extended period, perhaps 24 or 48 hours or longer, to find hidden problems. Latent faults often become apparent only after a long period of use.

An automated test doesn't necessarily have to be scaled up to extreme loads (stress testing covers that), but we're particularly interested in the system's ability to withstand continuous execution of a wide range of test transactions to detect any hidden memory leaks, deadlocks, or race conditions.
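
A soak test harness can be very simple; what matters is duration and trend. The sketch below (again with a hypothetical endpoint) runs transactions at a steady, modest pace and reports an hourly mean response time: a mean that creeps upward over the run is the classic symptom of a leak or slow resource exhaustion.

```python
import time
import urllib.request

TARGET_URL = "http://localhost:8000/"  # hypothetical endpoint under test
SOAK_HOURS = 24                        # 48 or more for longer soaks
REPORT_SECONDS = 3600                  # report once an hour

end = time.time() + SOAK_HOURS * 3600
window, window_start = [], time.time()
while time.time() < end:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
            resp.read()
        window.append(time.perf_counter() - start)
    except OSError:
        print(time.strftime("%H:%M:%S"), "request failed")
    if time.time() - window_start >= REPORT_SECONDS:
        if window:
            mean = sum(window) / len(window)
            print(time.strftime("%H:%M:%S"),
                  f"hourly mean: {mean:.3f}s over {len(window)} requests")
        window, window_start = [], time.time()
    time.sleep(2)  # steady, modest pacing: soak is about duration, not peak rate
```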

Service Management Testing

Finally, a few words about service management testing.

Once a service is deployed to production, it needs to be managed. To keep the service running, you need to monitor it, update it, make backups, and quickly fix it when something goes wrong.

The procedures that service managers use to perform upgrades, backups, releases, and failover are critical to delivering a reliable service, so they need to be tested, especially if the service will be subject to rapid changes after deployment.
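
A simple way to test such a procedure under load is to compare response times before and during the procedure. The sketch below assumes a hypothetical backup script and service endpoint; in a real test, the load would come from the full load testing harness rather than this single sampling loop.

```python
import statistics
import subprocess
import time
import urllib.request

TARGET_URL = "http://localhost:8000/"          # hypothetical service endpoint
BACKUP_CMD = ["/usr/local/bin/run-backup.sh"]  # hypothetical management procedure

def sample_latency(seconds):
    """Measure response times against the service for a fixed period."""
    samples = []
    end = time.time() + seconds
    while time.time() < end:
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
                resp.read()
            samples.append(time.perf_counter() - start)
        except OSError:
            pass  # a real harness would count failures too
        time.sleep(0.5)
    return samples

baseline = sample_latency(60)        # before the procedure starts
proc = subprocess.Popen(BACKUP_CMD)  # kick off the backup
during = sample_latency(60)          # while it runs
proc.wait()

print(f"baseline mean: {statistics.mean(baseline):.3f}s")
print(f"during backup: {statistics.mean(during):.3f}s")
```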

The specific issues to address vary from service to service, but in every case the tests should be conducted as realistically as possible.

Some food for thought

Some systems are subject to extreme loads when a particular event occurs. For example, an online business may expect peak loads immediately after an offer is advertised on TV, and a national news site may be overloaded when a major story breaks.

Think of a system you know well that has suffered an unplanned incident, whether within your own business or one reported on the national news.

What incidents or events could cause excess load on your system?

Can (or could) you collect data from system logs showing the number of transactions executed? Could you scale that data to predict a one-in-100-years or one-in-1,000-years critical event? (A sketch of this kind of log analysis appears after these questions.)

What measures could you take (or have you already taken) to reduce the likelihood of peaks, their scale, or eliminate them entirely?
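
As a starting point for the log question above, counting transactions per minute in a web server access log takes only a few lines. The sketch below assumes Common Log Format and a hypothetical file name; the observed peak gives a baseline that can then be scaled when estimating rarer, more extreme events.

```python
from collections import Counter

# Count transactions per minute in a web server access log.
# Assumes Common Log Format, e.g.
#   127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 2326
per_minute = Counter()
with open("access.log") as f:  # hypothetical log file name
    for line in f:
        try:
            # the timestamp sits between '[' and ']'; trim it to the minute
            stamp = line.split("[", 1)[1].split("]", 1)[0]
            per_minute[stamp[:17]] += 1  # "10/Oct/2024:13:55"
        except IndexError:
            continue  # skip malformed lines

peak_minute, peak_count = per_minute.most_common(1)[0]
mean = sum(per_minute.values()) / len(per_minute)
print(f"peak: {peak_count} requests during {peak_minute}")
print(f"mean: {mean:.1f} requests per minute")
```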
