AppSec Platform for Hundreds of Millions of Lines of Code

Prerequisites

The prerequisites for implementing an AppSec platform (and DevSecOps practices in general) tend to be similar in any company. For us, these were: active development of VK products and services, a growing code base with an increasing number of programming languages, and the use of heterogeneous solutions across the company's products. All of this required security control, yet analyzing it manually was impossible. So at some point we decided it was time to build our own centralized solution that would combine best practices and scale to all of the company's services and products.

We started, like many others, with the open-source platform DefectDojo: we built an MVP on top of it, connecting analyzers and the first build pipelines. But, like many others, we watched DefectDojo literally “die” under load. After several attempts to optimize it, we realized that fixing all the performance issues and implementing the functionality we needed would mean rewriting the project from scratch. So, having weighed the options, we decided to look for a different path.

To scope the requirements for a central tool, we measured the code base it would have to cover. We counted more than 1.2 billion lines of code across more than 25 thousand corporate source code repositories, and they turned out to be very heterogeneous: a huge spread across the technology stack, various code storage systems (both in-house and open source), and various CI/CD systems (also in-house and open source).

Having grasped the scale, we formulated a number of principles we wanted the new platform to follow:

  • simple integration of the entire variety of systems and solutions in use with the platform;

  • independence from a specific CI/CD system and code storage system;

  • the ability to dynamically add new tools and tune rules, so that the platform evolves smoothly and adapts to new developer requests;

  • convenient UX/UI (but that's not what we're talking about today);

  • speed of operation.

Platform architecture

Architecturally, VK Security Gate can be divided into three constituent blocks:

  • The first is the WebUI, a web application whose frontend uses VKUI (https://github.com/VKCOM/VKUI) and whose backend, written in Golang, is split into two parts: user and administrator.

  • The second block is the set of modules responsible for accepting the source code and initializing the scan, as well as for processing the scan report and placing it in the database (Orchestrator, Core, Processor).

  • The third block is the SAST Unit, which is responsible for orchestrating the SAST analyzers that directly analyze the source code, and for deduplicating their reports into a single scan report for storage in the database.

It is worth noting that the entire platform (including the code analyzers) runs cloud native: it is managed by Kubernetes and scales easily.

Scanning process

Given our heterogeneous environment, we chose an API as the main integration point. This is where the Security Gate platform begins: the Orchestrator module operates at this boundary. It is ready to accept source code from anywhere within the company and can be called from CI/CD both manually and automatically. But since the vast majority of products build through automated CI/CD pipelines, we also prepared a universal client: a binary that transfers an archive with the source code to Security Gate for analysis. The client encapsulates all the logic of interacting with the Security Gate API, so all any CI/CD system needs is a step that calls the Security Gate client. Add a few commands to the build script and forget about it forever; the client does the rest itself: it creates a new scan instance, strips unnecessary media files from the code base, packs it into an archive, and transfers it to the platform for scanning. A sketch of this flow is shown below.
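
To make this concrete, here is a minimal Go sketch of what such a client might do. The endpoint URL, the single-request protocol, and the media-file filter are assumptions for illustration, not the real Security Gate API.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// Media files are useless for SAST, so the client strips them
// before packing the archive (illustrative extension list).
var mediaExt = map[string]bool{".png": true, ".jpg": true, ".mp4": true, ".gif": true}

// packSources walks the source tree, skips media files, and produces
// a gzipped tar archive in memory.
func packSources(root string) (*bytes.Buffer, error) {
	buf := &bytes.Buffer{}
	gz := gzip.NewWriter(buf)
	tw := tar.NewWriter(gz)
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || mediaExt[strings.ToLower(filepath.Ext(path))] {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		hdr := &tar.Header{Name: path, Mode: 0600, Size: info.Size()}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		return nil, err
	}
	tw.Close()
	gz.Close()
	return buf, nil
}

func main() {
	archive, err := packSources(".")
	if err != nil {
		panic(err)
	}
	// Hypothetical endpoint: one POST creates a scan and uploads the code.
	resp, err := http.Post("https://secgate.example.com/api/v1/scans", "application/gzip", archive)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("scan accepted:", resp.Status)
}
```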

After receiving the archive with the source code and initializing the scan, SASTUnit gets down to business. It starts dynamically selecting tools using a service that analyzes what the transferred project consists of, what programming languages ​​and frameworks are used in it, and selects a set of tools that need to analyze this code base. After analyzing the composition of the project, independent (parallel) scanning is launched by several analyzers at once – that is, for example, three different SAST tools, a dependency analyzer and secret analyzers can be launched to scan the same code, and all of them will scan this code independently, each in its own container.
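
As an illustration, here is a simplified Go sketch of language-based tool selection. The language list, tool names, and mapping are hypothetical; the real service inspects the uploaded project itself and works with a much richer tool matrix.

```go
package main

import "fmt"

// toolsByLanguage maps a detected language to the analyzers that should
// scan it; each selected tool later runs in its own container.
var toolsByLanguage = map[string][]string{
	"go":     {"sast-a", "dependency-check", "secret-scanner"},
	"python": {"sast-b", "sast-c", "secret-scanner"},
}

// selectTools returns the deduplicated union of tools for all
// languages found in the project.
func selectTools(languages []string) []string {
	seen := map[string]bool{}
	var selected []string
	for _, lang := range languages {
		for _, tool := range toolsByLanguage[lang] {
			if !seen[tool] {
				seen[tool] = true
				selected = append(selected, tool)
			}
		}
	}
	return selected
}

func main() {
	// In reality the language list comes from analyzing the uploaded archive.
	fmt.Println(selectTools([]string{"go", "python"}))
}
```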

We chose this approach for several reasons. First, it yields higher-quality analysis, since we can use several tools with their own rule sets instead of relying on a single one. Second, it keeps us from depending on a single vendor.

Scanning thus produces several reports for the same code. What do we do with them next? Obviously, the results may overlap. The Resulter module merges the reports and deduplicates the results, generating a unique fingerprint for each detection that allows it to be unambiguously identified on the platform from then on.

Many tools generate their own detection IDs, which rarely match each other. To unify everything, we do not link tools to one another or try to map their IDs; instead, we correlate each detection directly with the code. For this purpose, we build our own ID using an abstract syntax tree (AST). It unambiguously points to the place in the code that triggers the different analyzers, and the resulting fingerprint is stable even if the flagged line “moves” up or down in the file, as long as it stays in the same function. This method of constructing detection IDs has proven itself over time and lets us successfully group and deduplicate detections across tools; a sketch of the idea follows.
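
Here is a toy Go sketch of the principle. Our real fingerprint is derived from a full AST walk; this sketch merely shows why hashing stable context (rule, file, enclosing function, normalized statement) instead of a line number survives code moving within a function. All names here are illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Detection struct {
	RuleID        string // analyzer rule that fired
	File          string // relative path in the repository
	EnclosingFunc string // name of the function containing the finding
	Snippet       string // normalized code of the flagged statement
}

// Fingerprint ignores line numbers on purpose: moving the statement
// within the same function does not change any of the hashed fields.
func (d Detection) Fingerprint() string {
	h := sha256.Sum256([]byte(d.RuleID + "|" + d.File + "|" + d.EnclosingFunc + "|" + d.Snippet))
	return hex.EncodeToString(h[:])
}

func main() {
	d := Detection{"G101", "internal/auth/token.go", "issueToken", `secret := "hardcoded"`}
	fmt.Println(d.Fingerprint())
}
```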

Code scanning is complete: we have a single resulting set of detections for the entire project. But what about the data from previous scans? After all, this scan may not be the project's first, but its second, tenth, or hundredth.

To handle this, we store the entire history of results for every project, product, and even branch.

At this level, the second stage of vulnerability deduplication occurs: by branch. A separate platform module, the Security Gate Processor, performs a new round of detection deduplication, now taking into account the results of previous scans and their triage by development teams and AppSec engineers. Merging with previous scans is performed in three stages (a sketch follows the list):

  1. Conformity fusion. Triggers detected in previous scans and still present in the final report of the current scan are updated, including their code fragments (±15 lines around the trigger location), after which a new status is determined. For example, if a previously closed vulnerability reappears, its status returns from “Fixed” to “Not processed”. This way we do not create duplicate records in the database but keep a single trigger status keyed by our own identifier. For example, if a trigger was previously marked as a false positive, it will not need to be analyzed again after the next scan.

  2. New triggers. Here everything is simple: an identifier that has not been seen in the branch before means a new record is created in the database.

  3. Closing triggers. If a vulnerability has been fixed in the code and a rescan confirms it (i.e., its identifier is missing from the new resulting report received from the SAST Unit), the trigger is assigned the status “Fixed”: the vulnerable code has either been removed or corrected, and the analyzers no longer detect it.
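
A condensed Go sketch of these three stages, with statuses and types simplified relative to our actual data model:

```go
package main

import "fmt"

type Status string

const (
	NotProcessed  Status = "Not processed"
	FalsePositive Status = "False positive"
	Fixed         Status = "Fixed"
)

type Trigger struct {
	Fingerprint string
	Status      Status
	Snippet     string // ±15 lines around the trigger location
}

// mergeBranch folds the current scan report into the stored branch state.
func mergeBranch(stored map[string]*Trigger, report []Trigger) {
	seen := map[string]bool{}
	for _, t := range report {
		seen[t.Fingerprint] = true
		if old, ok := stored[t.Fingerprint]; ok {
			// 1. Conformity fusion: refresh the snippet, keep triage results.
			old.Snippet = t.Snippet
			if old.Status == Fixed {
				old.Status = NotProcessed // the vulnerability reappeared
			}
		} else {
			// 2. New trigger: unknown fingerprint, create a new record.
			nt := t
			nt.Status = NotProcessed
			stored[t.Fingerprint] = &nt
		}
	}
	// 3. Closing triggers: anything absent from the new report is fixed.
	for fp, t := range stored {
		if !seen[fp] && t.Status != Fixed {
			t.Status = Fixed
		}
	}
}

func main() {
	stored := map[string]*Trigger{
		"aaa": {Fingerprint: "aaa", Status: FalsePositive},
		"bbb": {Fingerprint: "bbb", Status: NotProcessed},
	}
	// Current scan: "aaa" is still present, "bbb" is gone, "ccc" is new.
	mergeBranch(stored, []Trigger{{Fingerprint: "aaa"}, {Fingerprint: "ccc"}})
	for fp, t := range stored {
		fmt.Println(fp, t.Status)
	}
}
```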

Difficulties encountered

Behind every success story are the difficulties that made it possible. We share ours: perhaps our experience will be useful to anyone building similar products.

  1. Exponential load growth. Of course, we ran load tests, but we simply did not expect that within a few weeks the platform would see so many requests and new registrations that we would run out of hardware. Adding resources and some orchestration optimizations, combined with the scalability of Kubernetes, let us carry on without much trouble.

  2. Tools fail periodically. If a scan completes successfully but one of the tools crashes, the vulnerabilities that tool reported earlier would be closed automatically, because they are missing from the combined report. To handle this, we implemented a special mechanism that lets the platform survive a tool failure. It is based on a new relation in the DBMS between a trigger and a tool. It turned out to be rather cumbersome (that deserves a separate article), because a trigger can be linked to several tools at once. But with this mechanism in place, we can survive the failure of any number of tools without losing triggers in the branches (see the sketch after this list).

  3. Too many branches. We expected people to come to us with branches like “master”, “main”, “stage”, “test”, and “dev”, but it turned out that a single project can have tens or even hundreds of branches (and, at the time of writing, thousands). Even with migration and inheritance of statuses for previously analyzed triggers, inheriting statuses across four branches is workable; across hundreds it is almost unrealistic. A new approach was required.
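
To illustrate the failure-survival mechanism from point 2, here is a minimal Go sketch. The real implementation lives in the DBMS as a many-to-many relation; this in-memory version only shows the decision rule: a trigger may be closed only when none of the tools linked to it failed during the scan.

```go
package main

import "fmt"

type Trigger struct {
	Fingerprint string
	Tools       map[string]bool // many-to-many link: tools that reported it
}

// shouldClose returns true only when the absence of the trigger is
// trustworthy: it is not in the new report, and every tool linked to it
// completed this scan successfully.
func shouldClose(t Trigger, reportedNow map[string]bool, failedTools map[string]bool) bool {
	if reportedNow[t.Fingerprint] {
		return false
	}
	for tool := range t.Tools {
		if failedTools[tool] {
			return false // the reporting tool crashed, keep the trigger
		}
	}
	return true
}

func main() {
	t := Trigger{"aaa", map[string]bool{"sast-a": true}}
	fmt.Println(shouldClose(t, map[string]bool{}, map[string]bool{"sast-a": true})) // false: tool failed
	fmt.Println(shouldClose(t, map[string]bool{}, map[string]bool{}))               // true: clean run, not reported
}
```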

Cross-branch triage

When a development process operates with a large number of branches (feature branches, release branches, and others), the differences in the code base between branches tend to be small relative to the entire project. At the very start of development this may not hold, but the older the project, the smaller the percentage of code that differs between branches. For a module with 10-100 thousand lines of code, the differences between branches often do not exceed 5%; the main code base stays static. Consequently, the scanning results for that shared part do not differ either.

And since our AST-based detection identifiers are stable not only within one branch but also across branches, we decided to make life easier for all users of the Security Gate platform and perform one last, final merge of detections: a merge across all branches. A new entity appears in the data model, the cross-branch detection, assembled from the scan results across all branches of the project.

Let's see what this gives in practice. Say a project connected to the Security Gate platform contains 10 branches, and 8 of them share a code section in which the SAST analyzers detected a potential vulnerability (assume it is a false positive). With a single cross-branch record in the data model, we only need to triage this finding once and set its status. Even if the project gains 10 more branches containing the same false positive, no one has to spend resources analyzing it again: the status is inherited automatically (see the sketch below).
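
A minimal Go sketch of the inheritance idea, with names simplified relative to our data model: each branch-level finding attaches to a single cross-branch record, so a status set once is visible from every branch.

```go
package main

import "fmt"

type CrossBranchDetection struct {
	Fingerprint string
	Status      string // triaged once, shared by all branches
	Branches    map[string]bool
}

type Project struct {
	detections map[string]*CrossBranchDetection // keyed by fingerprint
}

// Record attaches a branch-level finding to its cross-branch record,
// creating the record on first sight and inheriting status afterwards.
func (p *Project) Record(branch, fingerprint string) *CrossBranchDetection {
	d, ok := p.detections[fingerprint]
	if !ok {
		d = &CrossBranchDetection{fingerprint, "Not processed", map[string]bool{}}
		p.detections[fingerprint] = d
	}
	d.Branches[branch] = true
	return d
}

func main() {
	p := &Project{detections: map[string]*CrossBranchDetection{}}
	d := p.Record("main", "aaa")
	d.Status = "False positive"                      // triaged once on main...
	fmt.Println(p.Record("feature-x", "aaa").Status) // ...inherited on feature-x
}
```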

We consider this solution the pearl of our platform, as it dramatically reduced the number of detections to triage. Diving into the numbers: over 7 months of the platform's operation, raw tool reports across all connected projects contained approximately 180 million SAST detections. Deduplication between analyzers (i.e., producing a combined scan report) brought that down to roughly 150 million unique detections. The next stage, deduplication against previous branch scans, reduced the figure to 26 million. And finally, cross-branch processing left only 770 thousand unique detections. That is a huge reduction in load for everyone: both Application Security engineers and developers.

It is worth noting here that we went even further, and the final figure became even smaller, but we will talk about this in the following articles.

This is a short story about our VK Security Gate platform. If you are interested in learning more about how we conduct static analysis and develop the platform, as well as specific aspects of our platform functioning, write in the comments, and we will cover these topics in future articles.

P.S.: This material was written based on my speech at the VK JT conference. The recording of my talk can be viewed at link.
