Why look for your project's attack surface

Any software system includes both necessary and not-so-necessary packages. This adds up to a huge amount of code (for one simple site, npm list -a returns a list of 4256 dependencies). And since "all code is your code", such dependencies need to be tested. The regulator requires it, and you want to protect your own products from intrusions, leaks and other troubles anyway.

What's inside semver? Can it cause problems?


But what exactly should be analyzed? For example, if the list includes a Python interpreter, you could test it: there is a lot of code, and test coverage would grow nicely. But would such testing bring real benefit? Can an attacker force Python to execute an arbitrary script? Probably not. This means that not all of the interpreter's code needs to be tested.

To avoid having to deal with the entire code base, you need to identify the attack surface: the potentially vulnerable functions and modules that handle user input or sensitive data, as well as the interfaces through which this data arrives. The attack surface is usually found using static and dynamic code analysis tools, or with the help of a very smart expert.

If you don't analyze the attack surface in advance and start looking for bugs right away, you'll have to wade through static analyzer warnings for the entire code base (and there will be many thousands of them), or fuzz every function. You'll end up like the gendarmes who searched Lenin's library for banned literature: had they known the right shelf, they might have found something. Instead, they just got tired.

The gendarmes look for vulnerabilities without first defining the attack surface.


It turns out that it is impossible to find the exact attack surface.

But it all started so well. Why is it impossible? It's all about the fundamental limitations of analysis methods, static and dynamic.

Static analysis is the study of a program without running it, from its source or binary code. And here a problem immediately appears: the program's input data is unknown until it runs. This is true not just for the whole program, but for individual functions as well.

Here's a contrived example: loading a dynamic library whose name is formed from input data.

sprintf(name, "prefix_%s_suffix.so", input);
h = dlopen(name, RTLD_NOW);

It turns out that static analysis cannot determine this dependency. But wait: if you use dynamic analysis, where the program is actually run, you can get precise information about its behavior, right?

It turns out that this approach has its own shortcomings. First, not all code branches will be executed.

if (rand() == 42)
    recvfrom(sfd, buf, BUF_SIZE, 0, &addr, &addrlen);

Even if you come up with many different tests to cover all branches of the code, they may still behave differently from run to run, as in the earlier example with loading a dynamic library, since the program is not limited to its own code.

Second, data flow analysis is very important for finding the attack surface (and it, by the way, can be done statically). It lets you determine which function calls, outgoing network packets (and anything else) depend on the input data, that is, what the attacker can influence. Sometimes this can be determined reliably, sometimes not. Here's an example:

x = read();
y = x + 3;
z = y - x;

At first glance, the input data is used to compute the variable z. But look closely: z will always equal 3, no matter what was read.

Fortunately, only malware obfuscates its execution flow or data flow on purpose, so for our own programs under test the computed attack surface will be more or less plausible.

Interfaces

The most obvious part of the attack surface are interfaces, that is, the ways in which a program (or system) interacts with the outside world. Obvious because they are the easiest to detect. For example, opened files are easy to detect with strace, and open network ports with nmap.

Another popular type of interface is the web API. The program accepts external requests via HTTP(S) and sends responses. Entry points and their parameters (strings passed in the request, like /user/add?name=ada) are recognized by the web server and lead to the activation of the required functions.

To find all these entry points, there are many scanners: Pentest-Tools, Burp Suite, Nikto.

Pentest tools found entry point vulnerable to SQL injection


Tracking file operations or program launches is a bit harder; vulnerabilities will not be immediately visible. You can trace a single application with strace: collect the system call logs and then look for suspicious operations in them.

If we are talking about the whole system, we can use auditd. It's almost like strace, except it tracks all programs and can also immediately report access to the wrong files or programs. However, you will have to come up with the rules for catching such accesses yourself (or find them on the Internet).

Attack Surface Analyzer by Microsoft also looks for potential entry points. It examines the changes made to the system after the program was installed. For example, if a file unrelated to the installed application has its eXecutable flag set, that's a reason to take a closer look.

Full list of tracked entities

It turns out that finding an application's interfaces is not that difficult (as long as they do not appear and disappear over time). Since some of them will be part of the attack surface, something needs to be done with them, for example fuzzing. But if you fuzz a web API head-on, it will be difficult to get results (in the form of found bugs). After all, a user session can have state that changes with the requests received, for example after login.

It turns out that the input for the program under test must be an entire session, and the session's structure must stay valid too. This greatly reduces the probability of hitting some small internal error.

In practice, fuzzing targets specific functions or subsystems, so that generated data is fed straight in during testing, without the extra plumbing of sessions. And here the main problem arises: how to find the functions that lie on the attack surface, that is, the ones that process data coming from the external interface.

Manual attack surface analysis

To find the code responsible for the operation of the desired interface, you can dig into the program manually. This is not always easy. For example, the Keycloak repository holds 34 megabytes of source code. It's a good thing it is Java, so there will be no void pointers. However, it will still take a lot of time to figure out how the application works.

If we have already used strace, as suggested in the previous section, we can open the program's source code and find the functions that process a given file. And then, of course, look for errors in them.

But the most common way to study code is still the debugger. We find where the input operations are, and then either step through the program or set watchpoints to track where the received data is accessed.

Patience is always required for successful debugging.


The method is good but labor-intensive. Moreover, if the data is copied to different places, as when parsing a complex data structure, it will take a long time to untangle. And if you need to re-examine the attack surface for every release (what if new leaks appeared?), you will have to recheck everything by hand. Still, when you just need to poke around once in a specific version of a specific program, a debugger works fine.

Attack Surface Definition Tool

A debugger is already a dynamic analysis tool: the program runs, and we observe it. To make an analyst's or tester's work more productive, automation would not hurt, especially when it comes to automated testing of the attack surface.

This is where the Natch tool comes in handy. It is designed specifically to determine the attack surface. By tracking program input, Natch determines where that data went and how it was used. For example, a network packet passes through a web server, then hits a Python backend, and then part of the packet is written to a database.

A large number of programs exchange large amounts of data.


To reconstruct these connections, marked (tainted) data tracking is used. All programs run inside a virtual machine. For the data being processed, the emulator allocates shadow memory that stores marks. A mark means this data is interesting to the user: for example, it came from the network, and the user wants to track how it is processed.

Then every processor instruction that moves data copies the marks too (we are inside an emulator, so we can instrument the code however we want). As a result, whatever function or process this data reaches, Natch will know about it (and draw a nice picture of it).

By tracking data flows, an analyst can assess whether a program has unnecessary interfaces, since Natch sees everything that happens in the system. And if a program opens /etc/passwd, accesses "telemetry" servers, or launches background processes, this can be detected and fixed.

Tree of processes that processed the labeled data.


List of files and sockets of a separate program. You can look at the data exchange in detail for each of them.


But you can go deeper and look not at interfaces but at code. Without dynamic analysis you can't get a result like this: the entire tree of function calls responsible for processing the marked data. If that data is untrusted, the question becomes whether the tree contains the required validation functions.

Or, if sensitive data was marked, it's worth checking whether it is processed somewhere it shouldn't be. For example, in borrowed code whose inner workings are not yet clear. Then that code should either be tested or removed.

Call tree for a native application.


Natch analyzes not only native applications: it also supports Python and Java programs, for which you can likewise view the call tree.

Call tree for a Java application.


The last two reports help not only to see the attack surface, but also to pick targets for fuzzing. After all, if an untrusted user can "send" data to an internal function of the program, that function should be tested more thoroughly than the rest. Without testing, very unexpected bugs surface, which can at the very least lead to a DoS.

Conclusion

If you can cover 100% of your code with tests, that's very good: you don't need to look for an attack surface and can relax. Just don't forget about borrowed code, because it runs on a par with the code you wrote yourself.

Links

  1. Natch User Guide

  2. Natch support telegram channel

  3. Real-life Natch use cases
