SCA in the language of security

Recently, several enthusiasts in our company have become interested in DevSecOps, each for a completely different reason: one was asked an uncomfortable question by a customer, another decided to take the path of self-development, and a third has someone close to them working in AppSec and wants to be able to hold a conversation about their work.

In this article, I would like to explain, in plain information security language, the core of the secure development process – for those who encounter it for the first time, or who have heard something somewhere but cannot tie it into a coherent picture. In fact, it is the closest thing to what you or your colleagues already know from daily work, even if you have never worked in development and have never touched source code.

After this, you will be able to try speaking the same language with customers who, besides wanting to build their own information security processes, are also full-fledged developers of their own software. Let's go!

What is SCA?

What is behind this abbreviation, and how has the term been adapted locally? In the original it is Software Composition Analysis. In translation, SCA most often sounds like "compositional code analysis"; occasionally people say "composite", which is not entirely correct. There is also an adaptation from one domestic SCA vendor – "component analysis". The latter, it seems to me, reflects the essence best.

What's the point? First of all, SCA is a process, like asset management or incident investigation: not a one-time action, but a continuously repeated check aimed at improving the security level of a product or application.

During this process, the product is decomposed to determine the dependencies, licenses, and vulnerabilities of the components used, as well as possible license compatibility issues (for example, if you have pulled in a GPL dependency, you may now be obliged to open the product's source code). Simply put, the result of regular checks is the highlighting of weak points both in your own source code and in the open-source projects you pull in. As a result, the owner gains knowledge of how well the components are covered by checks: the artifacts needed for making further decisions on handling the identified risks.

Who is this for and why?

There are three reasons why a company should implement the SCA process (provided, of course, that the company develops software):

· Licensing – in some cases you need to establish whether a product of your own development can be sold at all, if, for example, it contains a "not for commercial use" component.

· Certification – the regulator sets requirements for source code security; without verifying compliance with them, your software cannot simply be taken at its word.

· Security – even if you do not sell your product and do not want certification, it will be unpleasant, to say the least, if your brainchild gets hacked, or if your application becomes the reason your users get hacked.

The application can be anything – mobile apps, serious information security products, even games. Everything developed by you, by your customer, or even as a pet project – by and large, all of it requires verification and a timely response.

SCA as a process

Source code and binary builds

Let's start with the material SCA works on. It's pretty simple: source code and binaries. Anything written by humans (or by Copilot) is subject to checking for vulnerable dependencies.

In general, the result can be the dependency graph itself – showing what percentage of the product is closed versus open code, and how many vulnerable points the future build will contain. Some specialists start straight from the attack surface – the ability of a researcher or attacker to compromise the product and cause harm. To determine the strategy and the first steps, you need to understand how to set priorities.

How to choose what to check?

In fact, opinions differ here. Both in the SDL community and at various public talks, the picture is not entirely clear. Rather, there is a generally accepted template of actions, plus a number of factors that can adjust the strategy. But if the process is being implemented for the first time, it is better to start with the obvious and simple.

In terms of dependencies, the source code may contain OpenSource libraries as well as some proprietary ones. The latter are harder to check: instead of examining the code directly, you have to trust the vendor or vulnerability knowledge bases, essentially getting a "pig in a poke", with no way to confirm that the vulnerabilities found are all that lurk in the vendor's closed code.

Therefore, priority is often given to open source software. There is also a second reason: an attacker knows about it too, and if he wants to hack your product, he will start with the known vulnerabilities of open libraries.

As the second stage of checking, container builds are most often named.

If all these steps are already done and the process is established, you can afford to spend time checking the IDE software your developers work in, or the OS on which the product will be installed – there is a chance your product is very good and safe in itself, yet the operation of your game or system can still be disrupted by hacking the OS underneath it.

Dependencies. Important nuances

Once the scope of work is defined, the collection of information about dependencies begins: all imports and all additional libraries are searched for in the source code in various ways.

What are the methods? Listed from least automated to most automated, they are:

· maximum manual work – trying to launch the product on a "bare" system and searching for dependency packages by hand;

· manual search, with lookups in vulnerability knowledge bases;

· use of OpenSource or vendor products.

Where can we look for dependencies?

· the source code itself, for example, the “requirements.txt” file for Python;

· regular expressions for keywords such as import, scope, etc. (see the sketch after this list);

· project repositories – searched in the same ways as above;

· builds of the finished product;

· containers and virtual machines with the finished product.
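
To make the regex approach concrete, here is a minimal sketch that greps a Python source tree for import statements. It is illustrative only: a real SCA tool parses manifests and lockfiles rather than the code itself.

```python
import re
from pathlib import Path

# Naive dependency discovery: scan a source tree for top-level
# "import X" / "from X import ..." statements.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)

def find_imports(src_root: str) -> set[str]:
    found: set[str] = set()
    for path in Path(src_root).rglob("*.py"):
        found |= set(IMPORT_RE.findall(path.read_text(errors="ignore")))
    return found

print(sorted(find_imports(".")))  # e.g. ['json', 'requests', ...]
```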

The main question is how to capture all the dependencies – and how not to undo your own work with your own hands. For example, a useful piece of advice from practitioners: always record (pin) the versions of libraries and their dependencies, since the result of an SCA analysis loses its relevance if you do not manage updates and do not know which library versions the product uses at any given moment.
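
For a Python project, the simplest form of this advice is a fully pinned requirements.txt, for example one produced with pip freeze (the packages and versions below are purely illustrative):

```
# requirements.txt – exact pins keep SCA results reproducible:
# you always know which versions were analyzed.
requests==2.31.0
urllib3==2.0.7
certifi==2023.7.22
```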

It is worth noting that this process is very similar to the already familiar process of vulnerability search and management, the only difference being that here we deal with dependencies and libraries, and there with software. The processes will remain similar further on, too, when we move to mitigation and fixes.

Just as a vulnerability scanner checks files and their contents to find the names and versions of software on a server or workstation, an SCA analysis tool checks code and looks for library names and versions. Where does this information go next?

SBOM and other BOMs

Let's get acquainted: SBOM (Software Bill of Materials) is a machine-readable document format that lists libraries and their versions. Such a document can be handed to researchers, who will either search it for known vulnerable software or test a specific version manually.

Dependency information also lives there, so that when a vulnerability is discovered in one of the components, you can step back through the chain and understand exactly how that component gets into the product.

Using such a document, you can map out a presumed attack surface and understand where the weak points of the final product are.
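
For illustration, here is roughly what a minimal CycloneDX-style SBOM looks like, assembled by hand in Python. Real SBOMs are generated by dedicated tools and carry far more metadata (hashes, suppliers, dependency links); the field names follow the CycloneDX spec, but the content here is an assumption for the example.

```python
import json

# A hand-assembled, minimal CycloneDX-style SBOM with a single component.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "requests",
            "version": "2.31.0",
            "purl": "pkg:pypi/requests@2.31.0",  # package URL identifier
        }
    ],
}
print(json.dumps(sbom, indent=2))
```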

There is also ML-BOM for AI models, but that is mentioned purely for general awareness.

What else are we collecting?

In addition to what we can find in open code by consulting other researchers' published results, we can test both open and closed source code ourselves for the same purpose. Often, the results of static and dynamic code analysis feed into a common vulnerability database for a specific product, for subsequent processing.

Once dependencies have been identified and vulnerabilities discovered, they are fed into systems that automate the SCA process. Such systems let you configure the necessary risk-tracking policies – update and development freezes, direct component updates, and notifications to those responsible for the process and for patching.
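
As a sketch of what such automation does under the hood, here is a lookup of one pinned package against the public OSV vulnerability database (osv.dev). It assumes network access and the requests library; the package and version queried are just an example.

```python
import requests

def osv_vulns(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    """Return the IDs of known vulnerabilities for one pinned package."""
    resp = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"name": name, "ecosystem": ecosystem},
              "version": version},
        timeout=10,
    )
    resp.raise_for_status()
    return [v["id"] for v in resp.json().get("vulns", [])]

# A deliberately old version, so the list should be non-empty.
print(osv_vulns("urllib3", "1.26.0"))
```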

We found vulnerabilities. What next?

The analogy with the vulnerability management process holds here as well – just as in your regular work, you need to review the risks and decide what to do next.

Whose responsibility is this?

Most often, such problems are solved by the developers themselves, with help from colleagues in AppSec, who assist in choosing the right strategy.

Opinions differ on how best to eliminate the discovered vulnerabilities, how to prioritize them, and how to conduct initial triage – just as they differ at the stage of selecting the scope of testing. What unites the experts is that they start from the types of dependencies found in the product.

Types of dependencies

Dependencies can be direct or indirect (indirect ones are also called transitive). What does this mean?

For example, you have Python code that makes some API requests, and you decide to use the requests library. That is a direct dependency. It, in turn, pulls in the urllib3 library as one of its own dependencies – from your code's point of view, that is already a transitive dependency.
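
A quick way to see such a chain for yourself is the standard library's importlib.metadata. The sketch below walks the installed environment, which is a simplification – real tools resolve lockfiles instead of whatever happens to be installed.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

def walk(package: str, depth: int = 0, seen: set[str] | None = None) -> None:
    """Print a package and, recursively, the dependencies it pulls in."""
    seen = set() if seen is None else seen
    if package.lower() in seen:
        return
    seen.add(package.lower())
    print("  " * depth + package)
    try:
        reqs = requires(package) or []
    except PackageNotFoundError:
        return  # not installed here, cannot descend further
    for req in reqs:
        if "extra ==" in req:  # skip optional extras
            continue
        name = re.split(r"[\s;\[<>=!~]", req, maxsplit=1)[0]
        walk(name, depth + 1, seen)

walk("requests")  # the direct dependency; everything below it is transitive
```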

Transitive dependencies have a higher chance of having vulnerabilities found in them. This may be because large projects receive more contributions and more testing, so more vulnerabilities are found and fixed there. Or it may simply be statistics: every direct dependency brings in several transitive ones, so the probability of finding vulnerabilities among them is higher.

But transitive dependencies are also more likely to end up on the attack surface, so some experts recommend starting with them.

There is an opposing view that one should start with direct dependencies, since that gives the developer more leverage over the remediation process.

Interception plan

What options does a developer have? First of all, you need to confirm that the vulnerability is actually relevant for the product – to establish that, yes, someone could carry out such an attack and things would break. That covers direct dependencies.

For transitive dependencies, everything is a bit more complicated: you need to build a trace – the chain of calls in your application. This is very similar to an attack kill chain and the intruder's route from the incident investigation process. It often turns out that traces confirm only part of the indirect dependencies, which significantly reduces the amount of remediation work. However, transitive dependencies can be hard to fix from your own code. Say you filed an issue and the vulnerability was patched upstream, but the library you depend on directly has not been updated and still works with the old vulnerable version – you cannot patch the whole chain. In this case, experts suggest going another way: work with the calls to vulnerable methods in your own code, validating the call and putting a secure wrapper around it to protect your data.
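
A minimal sketch of such a wrapper (the vulnerable library and its flaw are hypothetical here): validate the input yourself before it ever reaches the vulnerable method.

```python
import re

import legacyparser  # hypothetical transitive dependency with an injection flaw

# Allow only a conservative character set before handing data to the
# vulnerable parse() call; adjust the pattern to your actual input format.
SAFE_INPUT = re.compile(r"^[\w.\-]{1,256}$")

def safe_parse(raw: str):
    if not SAFE_INPUT.fullmatch(raw):
        raise ValueError("rejected potentially unsafe input")
    return legacyparser.parse(raw)
```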

There are several popular strategies for dealing with discovered vulnerabilities:

· Update – testing the next version of an application or library, followed by the update itself. Note that update policies vary widely: some prefer to stay "a few versions" behind the latest, because the newest version often seems safer only because nothing critical has been discovered in it yet. Others are motivated by application continuity: a new version can both add and remove functionality the product relied on. The "just update everything" strategy, unfortunately, is not a panacea.

· Migration. It happens, for example, that you pulled an entire library with many dependencies into your product for the sake of a single function. Then you should weigh the risk of the application remaining vulnerable against the time and resources you would spend on your own implementation or on replacing one library with another.

· Mitigation. There is another way to protect the code. Many companies, either not waiting for a patch from the library's author or lacking the ability to move to a new version, patch the library themselves and carry this legacy from product to product. There are more convenient options – for example, wrapping the product in a security layer that handles all unwanted calls and protects sensitive data.

· Freeze everything. This applies to both updates and development. Most often this strategy is used when a system-critical component is involved. There have been cases when products were not released to market and customer rollout dates were postponed until the vulnerability was eliminated one way or another.

The most important thing

The main thing I would like you to take away is that SCA is a regular process. It should evolve along with the product, raising its security level. As soon as one stage is completed, move on to the next; do not be afraid of updates or of the frighteningly large dependency graph – everything is fixable.

It is important to know what the application uses and what it consists of. Only through a properly implemented and structured secure development process, with component analysis at its core, can you make something truly secure.
