5 Ways to Avoid Online Store Failures and Stop Counting Lost Profits

Hello! KISLOROD here. Websites of every kind are complex systems in which software errors and technical failures are common. For an ordinary website such an incident is merely unpleasant, but in e-commerce it always means losses and missed profit.

In this article I will explain what problems an online store outage can cause and how to protect yourself from them.

Back in 2014, Gartner conducted a study and found that one minute of website downtime costs a business $5.6 thousand on average. A 2016 study by the Ponemon Institute put the figure at $9 thousand per minute.

Examples of major failures

In 2022, Cloudflare made a planned configuration change that included adding a new routing layer. The change was flawed and ultimately took down some of the world's largest sites and services, including Amazon, Twitch, Steam, Coinbase, Telegram, Discord, DoorDash, GitLab, and many others.

Companies Affected by Cloudflare Outage

In 2019, traffic surges caused outages at major retailers:

  • J. Crew lost 323,000 customers and suffered $775,000 in losses in about 5 hours.

  • Walmart lost $9 million in just 150 minutes.

  • Costco lost $11 million because their website was down for more than 16 hours.
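
To compare these incidents, it can help to reduce each one to an approximate loss per minute. The sketch below only divides the figures from the list above. The durations are rounded ("about 5 hours", "more than 16 hours"), so the results are rough estimates, and `loss_per_minute` is a name chosen for this illustration.

```python
# Reported loss in USD and approximate downtime in minutes, taken from the list above.
incidents = {
    "J. Crew": (775_000, 5 * 60),      # "about 5 hours"
    "Walmart": (9_000_000, 150),
    "Costco": (11_000_000, 16 * 60),   # "more than 16 hours", so this is an upper bound per minute
}


def loss_per_minute(total_loss_usd: float, downtime_minutes: float) -> float:
    """Average loss per minute of downtime."""
    if downtime_minutes <= 0:
        raise ValueError("downtime_minutes must be positive")
    return total_loss_usd / downtime_minutes


for name, (loss, minutes) in incidents.items():
    print(f"{name}: ~${loss_per_minute(loss, minutes):,.0f} per minute")
```

With these inputs the script prints roughly $2,583 per minute for J. Crew, $60,000 for Walmart, and $11,458 for Costco.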

The sites were technically unprepared for the sudden increase in traffic, and their cloud infrastructure had not been tested under high load.

Another cause of failures is cyberattacks and security vulnerabilities. According to Statista, the average cost of all cyberattacks for a European firm in 2023 ranged from $9.6 thousand to $24.2 thousand, depending on the country.

Average cost of all cyber attacks for European firms in 2023 by country

Why Major Failures Are Dangerous

  • Sales stop. If the online store is unavailable to customers, all sales stop, and the longer the downtime, the greater the losses.

  • Financial losses. If the failure concerns payment systems, then in addition to stopping sales, there is a risk of unforeseen expenses.

  • Decreased audience loyalty. Users will not wait for the store to recover and will most likely go to other sites. And if the site crashes or slows down regularly, the store will lose customers.

  • Legal problems. Online stores handle users' personal data and are required to keep it confidential. A cyberattack can lead to a data leak, which in turn can lead to lawsuits.

Causes of failures

Most often, failures are caused by certain events:

  • New product launches, exclusive events and seasonal sales. The associated excitement can dramatically increase the amount of traffic, so it is worth preparing in advance for any mass promotions.

  • Low code quality and new functionality. Features that have not been tested for bugs or do not fit the architecture can bring down the entire site. Poor code quality hurts performance and causes problems when the project scales; these problems accumulate and can lead to failures.

  • Cyberattacks and security issues. Large online stores are often subject to cyber threats and attacks, so it is especially important for them to monitor the safety of user data in order to avoid becoming targets of blackmail.

How to prevent failures or eliminate them at an early stage

1. Load testing

Performance tests let you reproduce a critical situation and fix weak points in advance. This is an important step when preparing for large events, sales, and seasonal promotions.

Load tests let you simulate different levels of demand under controlled conditions and see how well the site copes with increasing load, how it reacts to a gradual or sudden increase in traffic, and whether it stays functional.
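
As a rough illustration of a stepped load test, here is a minimal Python sketch that raises the number of concurrent requests level by level and records the error rate and latency at each step. It is not the tooling we use on projects. The function names, concurrency levels, and number of requests per worker are assumptions made for this example, and a real test would normally use a dedicated load-testing tool against a staging copy of the site.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple
from urllib.request import urlopen


def fetch(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, HTTPException):
        # Connection errors, timeouts, and HTTP error statuses count as failures.
        return False


def timed(request: Callable[[], bool]) -> Tuple[bool, float]:
    """Run one request and return (success, elapsed seconds)."""
    start = time.perf_counter()
    ok = request()
    return ok, time.perf_counter() - start


def run_step(request: Callable[[], bool], concurrency: int,
             requests_per_worker: int = 5) -> Dict[str, float]:
    """Send requests at one fixed concurrency level and summarize the results."""
    total = concurrency * requests_per_worker
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(request), range(total)))
    latencies = [elapsed for _, elapsed in results]
    errors = sum(1 for ok, _ in results if not ok)
    return {
        "concurrency": concurrency,
        "requests": total,
        "error_rate": errors / total,
        "median_s": median(latencies),
        "max_s": max(latencies),
    }


def ramp(request: Callable[[], bool],
         levels: Sequence[int] = (1, 5, 10, 20)) -> List[Dict[str, float]]:
    """Gradually increase the load and collect one summary row per level."""
    return [run_step(request, level) for level in levels]
```

A call such as `ramp(lambda: fetch("https://staging.example.com/"))` (the URL is a placeholder) returns one row per level. The level at which `error_rate` or `median_s` starts to climb gives a first estimate of where the site begins to struggle.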

Example from practice

One of our clients was having problems during sales. To carry out the promotion, a special landing page was created within the current site, and a large volume of traffic was attracted from all channels.

It turned out that the previous contractor had not foreseen this. The architecture of the code and server environment was not designed for heavy loads, and the system regularly crashed during the peak periods of customer influx.

To fix the problem, we ran a series of load tests, which revealed the site's critical load limits. After that, we optimized the code and architecture for the corresponding loads, increasing the site's performance.

2. Monitoring website availability 24/7

Site health monitoring is a warning system that identifies problems early and allows you to fix them quickly. Monitoring involves regularly checking your site for performance and availability.

Monitoring is carried out in real time, which significantly reduces the time required to fix problems. Site administrators and everyone else responsible promptly receive important information about failures from the system logs.
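
A minimal sketch of such an availability check in Python is shown below. It is an illustration rather than a description of a specific monitoring product. The `check` and `FailureStreak` names, the 2-second slowness limit, and the rule "alert after three failed checks in a row" are all assumptions chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Optional
from urllib.error import HTTPError
from urllib.request import urlopen


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # None means no HTTP response was received at all
    elapsed_s: float


def check(url: str, timeout: float = 5.0, slow_after_s: float = 2.0) -> CheckResult:
    """Probe the URL once; a check fails on errors, bad statuses, or slow answers."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout) as response:
            status = response.status
    except HTTPError as exc:
        status = exc.code          # the server answered, but with 4xx/5xx
    except OSError:
        status = None              # DNS failure, refused connection, timeout
    elapsed = time.monotonic() - start
    ok = status is not None and 200 <= status < 400 and elapsed <= slow_after_s
    return CheckResult(ok, status, elapsed)


class FailureStreak:
    """Signal an alert once, when `threshold` checks in a row have failed."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        # Fire exactly at the threshold so a long outage does not spam alerts.
        return self.failures == self.threshold
```

A scheduler would call `check` at a fixed interval, pass `result.ok` to `FailureStreak.record`, and notify the responsible people when it returns `True`. Requiring several consecutive failures is a design choice that trades a slightly later alert for fewer false alarms.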

Example from practice

On one of our projects with monitoring connected, the number of busy web server worker processes increased. Thanks to this trigger, we detected a bot attack early and blocked it in time, preventing the site from crashing.
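
A trigger of this kind can be sketched as a comparison between the current number of busy workers and a recent baseline. The code below is a simplified illustration, not the configuration from that project. The window size, the 2x ratio, and the `floor` value are hypothetical and would need tuning for a real server.

```python
from collections import deque
from statistics import mean


class BusyWorkerTrigger:
    """Fire when busy workers jump well above the recent average."""

    def __init__(self, window: int = 10, ratio: float = 2.0,
                 min_samples: int = 5, floor: int = 5) -> None:
        self.samples = deque(maxlen=window)  # recent "normal" readings
        self.ratio = ratio
        self.min_samples = min_samples
        self.floor = floor                   # ignore spikes on a nearly idle server

    def observe(self, busy_workers: int) -> bool:
        fired = False
        if len(self.samples) >= self.min_samples:
            limit = max(self.ratio * mean(self.samples), self.floor)
            fired = busy_workers > limit
        if not fired:
            # Only normal readings update the baseline, so an ongoing
            # attack does not gradually become "the new normal".
            self.samples.append(busy_workers)
        return fired
```

Feeding it one reading per polling interval, for example `trigger.observe(current_busy_workers)`, returns `True` on a sudden spike, at which point the team can inspect the traffic and decide whether to block it.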

3. Monitoring code quality

Suboptimal code, vulnerabilities, and current or potential bugs can cause a website to malfunction, so in our projects we use dedicated software for analyzing and measuring code quality: SonarQube and a number of other services.

SonarQube performs continuous static analysis and warns about potential problems:

  • finds errors and bugs;

  • indicates duplication of code sections;

  • conducts searches for vulnerabilities and security issues;

  • shows code that has insufficient test coverage;

  • identifies violations of coding standards, poorly structured and confusing code, too many or too few comments.

In conjunction with SonarQube, we use Sentry, which automatically tracks and records errors that occur in the code and on the server and sends prompt notifications.

Example from practice

A large online store of women's clothing contacted us: during periods of marketing activity and an influx of visitors, the site experienced performance issues. When the number of orders increased significantly, the site's admin panel took several minutes to load, so the sales department could not work properly.

We conducted a deep technical audit and found a number of problems: incorrect settings of Bitrix modules and the server, problems in integrations and a non-optimized code structure.

Based on the analysis results, we compiled a list of shortcomings and consistently eliminated them: we optimized the server operation, corrected functional and code errors, and refactored the software part.

4. Automate critical tests

Automated tests are used to monitor the performance of e-commerce sites and quickly find errors. The service runs the test, enters data into the testing environment, receives the results, compares them with the standard, and creates reports on the results.

Automated tests save time and simplify the testing process. Most importantly, they remove the human factor, that is, they prevent unintentional errors by testers and developers.
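
The cycle described above (run the test, enter data, receive the result, compare it with the standard, create a report) can be sketched in a few lines of Python. Everything here is hypothetical: `cart_total` stands in for real checkout logic, and the test cases and expected values are invented for the illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Tuple


def cart_total(items: List[Tuple[str, int]], discount_percent: int = 0) -> Decimal:
    """Hypothetical checkout logic under test: sum of price * quantity minus a discount."""
    subtotal = sum((Decimal(price) * qty for price, qty in items), Decimal("0"))
    discount = subtotal * Decimal(discount_percent) / 100
    return (subtotal - discount).quantize(Decimal("0.01"))


@dataclass
class Case:
    name: str
    items: List[Tuple[str, int]]
    discount_percent: int
    expected: Decimal  # the "standard" the result is compared with


CASES = [
    Case("two products", [("19.99", 2), ("5.00", 1)], 0, Decimal("44.98")),
    Case("15% promo code", [("100.00", 1)], 15, Decimal("85.00")),
    Case("empty cart", [], 0, Decimal("0.00")),
]


def run_suite(cases: List[Case]) -> List[Dict[str, object]]:
    """Feed each case into the code under test and record pass/fail."""
    report = []
    for case in cases:
        actual = cart_total(case.items, case.discount_percent)
        report.append({
            "name": case.name,
            "passed": actual == case.expected,
            "expected": case.expected,
            "actual": actual,
        })
    return report


for row in run_suite(CASES):
    print(f"{'PASS' if row['passed'] else 'FAIL'}  {row['name']}")
```

In practice a suite like this runs automatically before every release, so a broken checkout calculation is caught before customers see it.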

Example from practice

Our client is a large online store of women's and men's underwear. Before working with us, the project did not run automated tests before launching new functionality.

Tests and releases were performed manually by the testing department. Since no one is immune to human error, one day a new feature was released with a critical bug in its checkout script, and the payment system stopped working. Fortunately, the bug was quickly found and fixed.

Had the failure lasted longer, the store would have suffered significant losses. The more orders per unit of time, the greater the losses, so automated testing should not be ignored.

5. A streamlined process for deploying changes to production

One way to prevent failures is to have rules for deploying changes to the production version of the site.

To catch errors and prevent failures, we use a three-stage verification system:

  • first, the code is checked by the developers themselves, each of whom has their own copy of the site;

  • then the implementations are tested by the testing department;

  • finally, the changes are deployed to the production site;

  • optionally, automated tests are run.

When particularly complex functionality is released, we also conduct functional testing on the production version of the site. In addition, we do not release changes at the end of the working day or week, and decisions on launching a release are made by team leads after additional verification.
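
Rules like these are easy to encode as an automatic pre-release check. The sketch below is one possible form, not our internal tooling. The 16:00 cutoff and the treatment of Friday as "end of the week" are assumptions, and `release_blockers` is a name invented for this example.

```python
from datetime import datetime
from typing import List


def release_blockers(now: datetime, qa_passed: bool, lead_approved: bool,
                     last_release_hour: int = 16) -> List[str]:
    """Return the reasons a release should wait; an empty list means go."""
    blockers = []
    if now.weekday() >= 4:  # Monday is 0, so 4..6 are Friday, Saturday, Sunday
        blockers.append("end of the working week")
    elif now.hour >= last_release_hour:
        blockers.append("end of the working day")
    if not qa_passed:
        blockers.append("testing department has not signed off")
    if not lead_approved:
        blockers.append("team lead has not approved the release")
    return blockers


# Example: a fully approved release requested on a Friday morning is still held back.
print(release_blockers(datetime(2024, 3, 15, 11, 0), qa_passed=True, lead_approved=True))
```

Returning a list of reasons instead of a single yes or no makes it clear to the team why a release was postponed.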

Summary

  1. The more orders per unit of time, the greater the losses due to downtime.

  2. Prevention and warning are much cheaper than losses due to failures.

  3. Technical support and monitoring are a must for any e-commerce website.

  4. Skimping on qualified development always ends up costing significantly more than paying for quality development.

  5. It is necessary to regularly monitor the quality of the code, look for errors, test the performance of the site and critical implementations.

To recover quickly from a failure, you also need automated systems and plans for backing up and restoring files, databases, and configuration.
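
As a small illustration of the file and configuration part, the following Python sketch creates a timestamped archive of a directory, keeps only the most recent copies, and restores an archive on demand. It covers files only. Database backups need the database's own dump tools. The `site-` archive prefix and the default of seven retained copies are assumptions made for this example.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def create_backup(source_dir: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Archive source_dir into backup_dir and delete all but the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # backup_dir should live outside source_dir, otherwise backups would contain backups.
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    archive = shutil.make_archive(
        str(backup_dir / f"site-{stamp}"), "gztar", root_dir=str(source_dir)
    )
    # Timestamped names sort chronologically, so the oldest archives come first.
    archives = sorted(backup_dir.glob("site-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return Path(archive)


def restore_backup(archive: Path, target_dir: Path) -> None:
    """Unpack an archive created by create_backup into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(target_dir))
```

A backup is only useful if it can be restored, so it is worth running `restore_backup` into a scratch directory on a regular schedule and checking the result.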
