How to solve the problem of business logic vulnerabilities? Break the application even before writing the code

Hi all. My name is Niyaz Kashapov, and I am the AppSec Lead at SberMarket. I have been improving secure development processes for more than 5 years. I started my career in fintech, where I worked on the security of code, features, and business processes in online banking. Now I'm continuing that work at one of the fastest-growing players in the e-commerce market.

I think many people have run into a vulnerability that simply cannot be fixed: it is embedded deep inside the solution, tangled up in dependencies, and fixing it would require rebuilding the solution itself. Most often, such vulnerabilities stay in the product forever, propped up by workarounds that keep everything from falling over. A reasonable question arises: how did they get there? The usual answer is "that's just how it happened historically," and the origins of the problem have long been forgotten. It's better to deal with this preventively, and in this article I'll try to explain how.

Let's talk about how to avoid situations where a vulnerability is built into the architecture or the business process itself and fixing it can cost many person-hours. We'll look at when a feature becomes a bug and how to design service architecture without creating security holes.

Try fixing the vulnerability without taking the product down

The pain and suffering of fixing architectural vulnerabilities

Early in my career, I came across such vulnerabilities more than once. They are hard to classify as classic code vulnerabilities (injection, missing validation, unsafe file upload): they lived directly in the business logic of the application.

Here are some examples of such problems. Any resemblance to reality is coincidental, and all examples are simplified for illustrative purposes.

Case 1. Payment amount set on the front end

This is a classic! From the very title of the case it is clear where the problem lies.

In one fairly common payment system used by online stores, a vulnerability was discovered that allowed an attacker to change the purchase amount when paying for a product or service. At the last stage of the purchase, when the payment request was sent, the fraudster only had to lower the value of the paymentSum parameter. The store did not validate the final amount in any way. As a result, goods worth several thousand rubles could be bought for 11 rubles.

How can this be avoided? Design the payment process so that the final purchase amount is either sent only by the store itself, server-to-server (back-to-back), or passed through the buyer's front end with the integrity of the data verified.
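For the second option, here is a minimal Python sketch, assuming the store and the payment gateway share a secret key that never reaches the browser. The store signs the order ID together with the amount it calculated on the back end, and the gateway rejects any request whose amount no longer matches the signature. The function names, the key, and the message format are hypothetical; real payment providers define their own signing schemes.

```python
import hashlib
import hmac

# Hypothetical secret shared only by the store back end and the payment gateway.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_payment(order_id: str, amount_kopecks: int) -> str:
    """Store side: sign the order ID and the amount calculated on the back end."""
    message = f"{order_id}:{amount_kopecks}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify_payment(order_id: str, amount_kopecks: int, signature: str) -> bool:
    """Gateway side: recompute the signature and compare in constant time."""
    expected = sign_payment(order_id, amount_kopecks)
    return hmac.compare_digest(expected, signature)


# The front end only relays order_id, amount and signature; it cannot re-sign them.
signature = sign_payment("order-42", 1_250_000)           # 12,500 rubles
print(verify_payment("order-42", 1_250_000, signature))  # True
print(verify_payment("order-42", 1_100, signature))      # False: amount lowered to 11 rubles
```

Amounts are kept as integers in kopecks to avoid floating-point rounding, and `hmac.compare_digest` is used instead of `==` so the comparison time does not depend on where the strings differ.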

Case 2. Inability to terminate all active sessions

I think many people have encountered this one. The authentication system did not allow a session to be terminated early at the request of the user or the company's technical support. An attacker could use social engineering to get a one-time access code from the victim and log in from a new device. The victim did receive an SMS notification, of course, but could do nothing about it, and technical support could only delete the session directly in the database.

A similar case is the vulnerability of refresh tokens that never expire. The developers expected users to click the Log out button and thereby deactivate the active token. However, users simply closed the tab, and the token remained in the browser's storage. For a long time afterwards, anyone with access to that browser could take the token and use it to obtain a new access token and start a session.

How can this be avoided? Don't rush an unfinished authentication service into production; account for session management requirements while developing the service.
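A minimal in-memory sketch of these session management requirements, assuming refresh tokens are opaque random strings stored on the server: tokens expire, each refresh rotates the token, and a single call revokes every session of a user (from a "log out everywhere" button or a support tool). The class and its methods are hypothetical; a real service would keep tokens in shared storage and issue an access token alongside each new refresh token.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RefreshToken:
    user_id: str
    expires_at: float


@dataclass
class SessionStore:
    """Hypothetical in-memory store; a real service would use shared storage."""

    refresh_ttl: float = 7 * 24 * 3600  # refresh tokens expire instead of living forever
    tokens: Dict[str, RefreshToken] = field(default_factory=dict)

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self.tokens[token] = RefreshToken(user_id, time.time() + self.refresh_ttl)
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one; the old token stops working."""
        record = self.tokens.pop(token, None)
        if record is None or record.expires_at < time.time():
            return None
        return self.issue(record.user_id)

    def revoke_all(self, user_id: str) -> int:
        """'Log out everywhere': available to the user and to technical support."""
        revoked = [t for t, r in self.tokens.items() if r.user_id == user_id]
        for t in revoked:
            del self.tokens[t]
        return len(revoked)


store = SessionStore()
phone = store.issue("alice")
attacker = store.issue("alice")   # session opened with a phished one-time code
print(store.revoke_all("alice"))  # 2
print(store.rotate(attacker))     # None
```

Rotation also limits the damage from a token left behind in a closed tab: once the legitimate client refreshes, the old token no longer works.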

Case 3. Refund for a delivered order

An unusual case that can arise in a microservice architecture.

Typically, the buyer follows the path "cart → funds held → waiting for the courier → order received → funds charged." The scheme works well, but only until the store starts delivering orders on its own. Then the "waiting for the courier" step is skipped, since there is no way to track the courier's location, and the path breaks. In this case, the order is confirmed only when the courier returns to the store. In the meantime, the buyer, who has already received the order, can submit a cancellation request. This is possible because of a mismatch between services.

How can this be avoided? Account for delivery by external parties (such as the store itself) and allow the order to be cancelled only before it has been fully assembled.
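One way to express this fix is an explicit order state machine owned by a single service, so every delivery channel, including the store's own couriers, goes through the same transitions. In this hypothetical sketch the statuses and transitions are assumptions; the point is that `CANCELLED` is reachable only before `ASSEMBLED`, whether or not courier tracking exists.

```python
from enum import Enum, auto


class OrderStatus(Enum):
    CREATED = auto()
    FUNDS_HELD = auto()
    ASSEMBLING = auto()
    ASSEMBLED = auto()  # from here on, the order can no longer be cancelled
    IN_DELIVERY = auto()
    DELIVERED = auto()
    CHARGED = auto()
    CANCELLED = auto()


# Allowed transitions; anything not listed here is rejected.
TRANSITIONS = {
    OrderStatus.CREATED: {OrderStatus.FUNDS_HELD, OrderStatus.CANCELLED},
    OrderStatus.FUNDS_HELD: {OrderStatus.ASSEMBLING, OrderStatus.CANCELLED},
    OrderStatus.ASSEMBLING: {OrderStatus.ASSEMBLED, OrderStatus.CANCELLED},
    OrderStatus.ASSEMBLED: {OrderStatus.IN_DELIVERY},
    OrderStatus.IN_DELIVERY: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: {OrderStatus.CHARGED},
    OrderStatus.CHARGED: set(),
    OrderStatus.CANCELLED: set(),
}


class Order:
    def __init__(self) -> None:
        self.status = OrderStatus.CREATED

    def transition(self, new_status: OrderStatus) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.name} -> {new_status.name} is not allowed")
        self.status = new_status


order = Order()
for step in (OrderStatus.FUNDS_HELD, OrderStatus.ASSEMBLING, OrderStatus.ASSEMBLED):
    order.transition(step)
try:
    order.transition(OrderStatus.CANCELLED)  # buyer tries to cancel after assembly
except ValueError as err:
    print(err)  # ASSEMBLED -> CANCELLED is not allowed
```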

Case 4. Authentication tokens in GET parameters: POST-as-GET and load balancers

Another example of how legacy code can become a headache for everyone.

Suppose that, for historical reasons, an application passes authentication tokens as GET request parameters. It doesn't seem like a big deal, but once load balancers, caches, and a large amount of functionality appear, controlling the leakage of authentication tokens becomes impossible, because full URLs, query string included, end up in logs and caches along the way.

Files can suddenly become a big problem. They are available only to authenticated users, so the address bar, of course, contains the authentication token. Unsuspecting users copy links and send them to friends, colleagues, and various services that store these links (hello, link shorteners!). Data leaks multiply, and you end up having to switch file delivery to a one-time link scheme.

How can this be avoided? When building services from scratch, don't pass authentication in GET parameters. More seriously, take the specifics of the system into account and serve objects through one-time links via a proxy or S3 storage from the very beginning.
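A minimal sketch of the one-time link idea, assuming a proxy sits in front of file storage. After checking the user's session on the server, the proxy issues a random, short-lived link ID that can be redeemed exactly once, so the authentication token never appears in the URL. `OneTimeLinks` and the `/files/` path are hypothetical.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class OneTimeLinks:
    """Hypothetical proxy-side registry of single-use, short-lived download links."""

    def __init__(self, ttl_seconds: float = 300) -> None:
        self.ttl = ttl_seconds
        self._links: Dict[str, Tuple[str, float]] = {}

    def create(self, object_key: str) -> str:
        # Call only after the user's session was checked on the server side;
        # the session token itself never appears in the link.
        link_id = secrets.token_urlsafe(24)
        self._links[link_id] = (object_key, time.monotonic() + self.ttl)
        return f"/files/{link_id}"

    def redeem(self, link_id: str) -> Optional[str]:
        # pop() makes the link single-use: a second request finds nothing.
        entry = self._links.pop(link_id, None)
        if entry is None:
            return None
        object_key, expires_at = entry
        return object_key if time.monotonic() <= expires_at else None


links = OneTimeLinks()
url = links.create("invoices/2023/42.pdf")
link_id = url.rsplit("/", 1)[1]
print(links.redeem(link_id))  # invoices/2023/42.pdf
print(links.redeem(link_id))  # None: the link has already been used
```

If you rely on S3 presigned URLs instead, keep in mind that they are time-limited but can be reused until they expire, so a short expiration time matters.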

Learning from mistakes

Almost all of these cases have one thing in common: when developing the service, the teams did not consider the security requirements of the feature itself or non-trivial ways their systems could be used.

All of the above cases show that it is easy to make mistakes in complex systems, especially when chasing time-to-market. Most often, analysts, architects, and developers do not treat security as a foundation, or forget that the world is full of fraudsters and hackers. This is how vulnerabilities are born deep inside the systems being developed.

If you can't control it, take charge

As a security professional, I can say that it's unpleasant to be the last to learn about changes and the problems they bring. Over the years, it has become clear to me that instead of drawing threat models for finished systems and copying diagrams from existing sources, you need to get diagrams of services while they still exist only as ideas.

If you help developers at the early stages of working on a task and formulate security requirements, you can close off a large set of potential problems right away.

Here's a short list of vulnerabilities that can be prevented at the requirements and architecture stage:

  • SQL injection: vulnerabilities caused by incorrect handling of input data, which let an attacker run an arbitrary SQL query against the database. At the architectural level, you can identify data sources, decide whether they are trusted, and possibly add intermediate adapter services that control incoming data.

  • Business Logic Errors: flaws in application logic that let you skip business process steps or call certain API methods while bypassing their execution conditions. Processes and execution conditions must be agreed upon at design time to rule out fraud schemes and incorrect method calls.

  • Broken Access Control: attacks in which an attacker gains access to someone else's data by requesting it by identifier or substituting display parameters. At the requirements level, you can configure access correctly and choose object identifiers with possible enumeration in mind.

  • DoS/DDoS attacks: attacks aimed at denial of service (or distributed denial of service) of the system. Architecture analysis helps find bottlenecks and missing protection mechanisms against targeted attacks.

  • Fraud: abuse of product mechanics for personal gain.
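To illustrate the SQL injection and broken access control items, here is a small sketch using Python's built-in `sqlite3` module with a hypothetical `orders` table. User input is passed only through query placeholders, and every lookup also filters by the owner taken from the authenticated session, not from the request.

```python
import sqlite3
import uuid
from typing import Optional


def create_order(conn: sqlite3.Connection, owner_id: str, address: str) -> str:
    # Random UUIDs instead of sequential integers make IDs hard to enumerate.
    order_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO orders (id, owner_id, address) VALUES (?, ?, ?)",
        (order_id, owner_id, address),
    )
    return order_id


def get_order(conn: sqlite3.Connection, order_id: str, current_user: str) -> Optional[str]:
    # current_user must come from the authenticated session, never from request parameters.
    # Placeholders keep user input out of the SQL text, and the owner check
    # runs on the server: knowing someone else's ID is not enough.
    row = conn.execute(
        "SELECT address FROM orders WHERE id = ? AND owner_id = ?",
        (order_id, current_user),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, owner_id TEXT, address TEXT)")
order_id = create_order(conn, "alice", "1 Example St.")
print(get_order(conn, order_id, "alice"))         # 1 Example St.
print(get_order(conn, order_id, "mallory"))       # None
print(get_order(conn, "' OR '1'='1", "mallory"))  # None
```

Random UUIDs make identifiers harder to guess, but they do not replace the ownership check: the `owner_id` condition is what actually enforces access.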

From my own experience, companies often develop new services like this:

  1. First, requirements are written by business analysts or product managers.

  2. Then, based on them, system analysts carry out their work and formulate technical requirements and conditions.

  3. After this, the development team begins to implement the service.

The question is: where does security come in?

Besides security problems, large teams developing microservices may run into inconsistency: teams sometimes don't bother to go to their "neighbors" and agree on changes.

To eliminate cases where services contradict each other or duplicate functionality, you need to have centralized change management to understand the full picture and identify dependencies and integration complexities at an early stage.

At SberMarket, we started with in-person architecture meetings where we discussed initiatives to improve services. Teams brought diagrams, service requirements, and business goals, and every week we reviewed one or two initiatives. This worked well until the number of services exceeded 100. After that, too many people were involved, and it was not always possible to gather a quorum.

This is how the idea of an asynchronous system design review process came about.

System design review is the process of analyzing and evaluating the architecture of a software product to identify potential flaws, including flaws in the security of the service itself. This stage is critical for the reliability and security of the software solution, since it prevents many problems at the design stage. The analysis is carried out by architects, team leads of the systems being changed, and representatives of the security and operations teams.

Architects and staff engineers discuss a new service

At SberMarket, system design review is a separate process that covers all new services and major system changes. When a new service is being created or an improvement is proposed, an architecture decision document is drawn up: an architecture decision record (ADR for short). It describes the changing landscape in as much detail as possible: the business process being changed and the list of use cases and contracts being added or modified. The changes themselves are tracked as tasks called RFCs (requests for comments).

The architecture itself is described as a C4 diagram, use cases are similarly represented as data flows or events, and contracts are described in proto or Swagger files. This makes it possible to generate services and clients from the contracts with code generation.

The structure of the document itself is standardized:

ADR structure
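Because the ADR structure is standardized, part of its review can be automated. Below is a hypothetical CI check, assuming ADRs are Markdown files with `##` section headings; the section names are placeholders and not the actual template shown in the figure.

```python
import re
from typing import List

# Placeholder section names; substitute the headings of your actual ADR template.
REQUIRED_SECTIONS = [
    "Context",
    "Business process",
    "Use cases",
    "Contracts",
    "Architecture",
    "Security",
]


def missing_sections(adr_markdown: str) -> List[str]:
    """Return the required sections that have no '## <name>' heading in the ADR."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##[ \t]+(.+)$", adr_markdown, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


draft = """# ADR-042: Refunds for self-delivered orders
## Context
...
## Use cases
...
"""
print(missing_sections(draft))  # ['Business process', 'Contracts', 'Architecture', 'Security']
```

A CI job could fail the merge request whenever this list is non-empty, so reviewers never start on an incomplete document.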

The document lives in Git, and changes are proposed for discussion as a merge request. This is very convenient: you can add CODEOWNERS and share responsibility. Approvers are assigned by domain, system, or functionality, and approvers from the information security and platform teams are designated in the same way.
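The approval rule can be sketched as a merge gate. This hypothetical example maps path prefixes to the groups whose approval is required, in the spirit of CODEOWNERS, and allows a merge only when all of them have approved. It deliberately combines every matching prefix so that the security and platform groups are always required; real CODEOWNERS implementations have their own matching and precedence rules, so treat this as an illustration of the policy rather than of any tool's behavior.

```python
from typing import Dict, Iterable, Set

# Hypothetical ownership rules: path prefix -> groups that must approve changes under it.
OWNERS: Dict[str, Set[str]] = {
    "adr/": {"appsec", "platform"},  # security and platform review every ADR
    "adr/payments/": {"payments-architects"},
    "adr/delivery/": {"delivery-architects"},
}


def required_groups(changed_files: Iterable[str]) -> Set[str]:
    """Union of the groups for every prefix that matches any changed file."""
    groups: Set[str] = set()
    for path in changed_files:
        for prefix, owners in OWNERS.items():
            if path.startswith(prefix):
                groups |= owners
    return groups


def can_merge(changed_files: Iterable[str], approved_by: Set[str]) -> bool:
    """Mergeable only when every required group has approved."""
    return required_groups(changed_files) <= approved_by


changes = ["adr/payments/0042-refunds.md"]
print(can_merge(changes, {"payments-architects", "appsec"}))              # False: platform missing
print(can_merge(changes, {"payments-architects", "appsec", "platform"}))  # True
```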

Note that the proposed solution can be merged only after all approvals have been received.

At the stage of collecting comments, architects, technical leads and security specialists can post questions and suggestions.

Security representatives highlight exactly the cases that could lead to vulnerabilities. They also set security requirements for new API endpoints and propose how to implement authorization for resources correctly. At this stage, you can start building full-fledged threat models for new services and specific functionality.

An example of a comment indicating the requirement for file upload validation (in the use-case description):

Typical example of a security question
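A requirement like the one in that comment could translate into a check along these lines. This is a minimal sketch with a hypothetical allow-list and size limit: the extension, the client-declared MIME type, and the file's leading magic bytes must all agree before the upload is accepted.

```python
from typing import Dict, Tuple

# Hypothetical allow-list: extension -> (expected MIME type, magic-byte prefix).
ALLOWED: Dict[str, Tuple[str, bytes]] = {
    ".png": ("image/png", b"\x89PNG\r\n\x1a\n"),
    ".jpg": ("image/jpeg", b"\xff\xd8\xff"),
    ".pdf": ("application/pdf", b"%PDF-"),
}
MAX_SIZE = 10 * 1024 * 1024  # hypothetical 10 MiB limit


def validate_upload(filename: str, declared_mime: str, data: bytes) -> bool:
    """Accept a file only if its extension, declared MIME type and content agree."""
    if len(data) > MAX_SIZE or "." not in filename:
        return False
    ext = filename[filename.rfind("."):].lower()
    if ext not in ALLOWED:
        return False
    expected_mime, magic = ALLOWED[ext]
    # The MIME type is declared by the client, so it is only a consistency check;
    # the magic bytes are read from the uploaded content itself.
    return declared_mime == expected_mime and data.startswith(magic)


png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
print(validate_upload("photo.png", "image/png", png))                  # True
print(validate_upload("photo.png", "image/png", b"<?php echo 1; ?>"))  # False
print(validate_upload("photo.php", "image/png", png))                  # False
```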

Once the required number of approvals is collected, the revision is considered approved and goes into work: code development begins.

Once development is complete, the changes come back for review, this time to check the security of the code and its compliance with the approved architecture.

Key security aspects we pay attention to during a system design review:

  • Security requirements. We make sure they are clearly defined and comply with current standards and regulations, checking for authentication, authorization, data encryption, and other security measures.

  • Use of best practices. We check that the proposed architecture follows security best practices such as HTTPS, data encryption, multi-factor authentication, and others.

  • Architecture assessment. We analyze the system architecture for potential vulnerabilities. For example, we check whether untrusted data can be processed in a way that leads to SQL injection, XSS, CSRF, and other common threats. We also evaluate the architecture's resistance to DoS and DDoS attacks (presence of protection mechanisms and absence of bottlenecks).

  • Risk analysis. We analyze the risks associated with possible system vulnerabilities within the business process and develop a plan to eliminate or minimize them.
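Rate limiting is one of the DoS protection mechanisms a reviewer can look for. A minimal token bucket sketch, with hypothetical capacity and refill rate, looks like this; in practice the limiter usually lives in the API gateway and keeps one bucket per client key.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


now = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
now[0] = 1.0
print(bucket.allow())                      # True: one token refilled after a second
```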

Creating a secure web application architecture requires a comprehensive approach, and during the review you need to focus on best security practices:

  • Privilege Separation. The principle of least privilege must be applied at all levels of the application, including database access, session management, and system resources: each component should have only the rights it needs to perform its functions.

  • Component Isolation. Building an application from clearly separated components helps minimize damage from potential attacks and makes it easier to update and patch vulnerabilities. Services are divided by business process, and services that should not interact at all should have no network access to each other. This is implemented through a service mesh or network policies.

  • Authentication and Authorization. Strict authentication and access control policies based on proven standards and frameworks (OAuth, OpenID Connect and JWT) ensure strong user identification and resource access control.

  • Attribute-based access control (ABAC). Allows you to define access rights using granular policies that are based on attributes of users, objects, actions and environment.

  • Encryption. When transmitting data over untrusted networks and infrastructure, it is important to use encryption. Depending on the case, choose symmetric or asymmetric encryption algorithms.

  • Working with files. When accepting files from clients, be very careful with the uploaded data: check not only the extension, but also the MIME type and magic bytes.

  • Resource limits. Every resource must have limits. This applies to incoming and outgoing traffic, resources consumed by services, memory usage, and so on.
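An ABAC check of the kind described above can be sketched as a set of policies evaluated over attributes of the subject, resource, action, and environment, with access denied by default. The attribute names and both policies are hypothetical examples, not the format of any specific ABAC engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AccessRequest:
    subject: Dict[str, object]      # e.g. {"id": "u1", "role": "customer"}
    resource: Dict[str, object]     # e.g. {"type": "order", "owner_id": "u1"}
    action: str                     # e.g. "read"
    environment: Dict[str, object]  # e.g. {"hour": 14}


def customer_reads_own_order(r: AccessRequest) -> bool:
    owner = r.resource.get("owner_id")
    return (r.action == "read" and r.resource.get("type") == "order"
            and owner is not None and owner == r.subject.get("id"))


def support_reads_orders_in_working_hours(r: AccessRequest) -> bool:
    hour = r.environment.get("hour")
    return (r.action == "read" and r.resource.get("type") == "order"
            and r.subject.get("role") == "support"
            and isinstance(hour, int) and 9 <= hour < 18)


POLICIES: List[Callable[[AccessRequest], bool]] = [
    customer_reads_own_order,
    support_reads_orders_in_working_hours,
]


def is_allowed(request: AccessRequest) -> bool:
    """Deny by default; allow only if at least one policy matches."""
    return any(policy(request) for policy in POLICIES)


order = {"type": "order", "owner_id": "u1"}
print(is_allowed(AccessRequest({"id": "u1", "role": "customer"}, order, "read", {"hour": 23})))  # True
print(is_allowed(AccessRequest({"id": "u2", "role": "customer"}, order, "read", {"hour": 14})))  # False
print(is_allowed(AccessRequest({"id": "s1", "role": "support"}, order, "read", {"hour": 23})))   # False
```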

By following these recommendations, we create a secure web application architecture, minimizing the risks of unauthorized access to data and ensuring reliable system protection.

There is no limit to perfection. What else can you do?

In practice, the process works well. From the information security team's point of view, the next step would be to generate standard and service-specific threat models directly from the architecture review and then turn them into tasks for development teams. This would let us monitor threats and risks and close problems proactively.

We have already implemented generation of up-to-date architecture diagrams from service descriptions and exports from the platform. Kirill Vetchinkin wrote about this in his article (which, by the way, won Technotext 2023 in the "Preparation of technical documentation" category at the Senior level). We plan to reuse this approach to generate threat models from gateway settings, data access, whether endpoints are exposed externally, and manual use-case analysis by Application Security or a risk manager.
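To make that plan more concrete, here is a hypothetical rule-based sketch of generating candidate threats from a service descriptor. The `Endpoint` fields (external exposure, authentication, personal data, gateway rate limiting) are assumptions standing in for gateway settings and data access information; this is not SberMarket's implementation, and the output would still need manual review by AppSec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Endpoint:
    path: str
    external: bool        # reachable from outside through the gateway
    auth_required: bool
    handles_pii: bool = False
    rate_limited: bool = False


@dataclass
class Service:
    name: str
    endpoints: List[Endpoint] = field(default_factory=list)


def generate_threats(service: Service) -> List[str]:
    """Turn descriptor facts into candidate threats (future tasks for the owning team)."""
    threats = []
    for ep in service.endpoints:
        where = f"{service.name} {ep.path}"
        if ep.external and not ep.auth_required:
            threats.append(f"{where}: public endpoint, confirm it is intentionally unauthenticated")
        if ep.external and not ep.rate_limited:
            threats.append(f"{where}: no rate limit at the gateway (DoS risk)")
        if ep.handles_pii and not ep.auth_required:
            threats.append(f"{where}: personal data without authentication")
    return threats


checkout = Service("checkout", [
    Endpoint("/v1/orders", external=True, auth_required=True, handles_pii=True, rate_limited=True),
    Endpoint("/v1/promo", external=True, auth_required=False),
])
for threat in generate_threats(checkout):
    print(threat)
```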

Summing up

I'd like to return to the examples from the beginning of the article and remind myself and the reader why I decided to write this text in the first place. Products have become complex and require careful, in-depth design. Even for an MVP or an experimental product, you need to take architecture design as seriously as possible. Large systems often grow out of unfinished MVPs and inherit a bunch of problems, including security ones.

We have implemented a large-scale process that helps catch 80% of critical security problems at a stage where fixing them costs minimal effort (no code → no fixes → no wasted developer time). I believe the practice is successful and worth adopting at every company, especially early on.

SberMarket's tech team runs social media channels with news and announcements. If you want to know what's under the hood of high-load e-commerce, follow us on Telegram and YouTube. You can also listen to the podcast "For tech and these" from our IT managers.
