Binary Coverage for Reverse Engineering
Code coverage is a technique that helps researchers understand which parts of an application’s algorithm take part in processing data. In security research, it is typically used to help find vulnerabilities in software. In this article, we will see in practice how this technique can simplify examining application code.
Toolset and test environment setup
To set up a test environment and choose tools for analyzing an application, you first need to understand how code coverage is determined. Code coverage is usually measured as the number of source code lines executed while the application processes the supplied data. That data can be a file, a region of memory, or simply a network packet, depending on what the application’s algorithm consumes. Here is an example of source code coverage that you can find online in a few seconds:
The idea is simple: a part of the code is highlighted only if that part of the algorithm was executed while the data was being processed. Such reports sometimes also include statistics on how many times each line was executed, but this is possible when coverage is collected with the source code at hand. In this article, we will concentrate on determining code coverage without the application’s source code.
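For comparison, here is what source-level line counting boils down to. This is a minimal sketch in Python (an illustration, not a binary tool) that uses the standard library hook sys.settrace to count how many times each line of one function runs; the function check and its inputs are hypothetical. Dedicated tools such as coverage.py do this far more robustly.

```python
import sys
from collections import Counter


def trace_lines(func, *args):
    """Run func(*args) and count executions of each of its source lines.

    Keys of the returned Counter are line offsets relative to the `def` line.
    """
    hits = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace lines of other functions
        if event == "line":
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer  # keep receiving "line" events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always stop tracing, even on exceptions
    return result, hits


def check(key):  # hypothetical "key check" used as the traced target
    if len(key) != 4:
        return False
    return key == "abcd"


result, hits = trace_lines(check, "xy")
print(result, dict(hits))
```

With the short key, only the length check and the early return are counted; the final comparison line never executes, which is exactly what a coverage report would leave unhighlighted.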
How should we research the application in this case? First, define the base unit that describes a fragment of the algorithm; then, using conditional debugger breakpoints, count how many times execution passed through each of these units. The result is the code coverage. Coverage is very often shown as the ratio of units executed during one application launch to the total number of units of the algorithm. An example of such a study is shown in the picture below:
In the picture, basic blocks of Windows DLLs are used as base units. Other possible base units are:
a line of the disassembly listing
a basic block of the algorithm: usually the disassembly lines between conditional jumps
the code between function call instructions (call)
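The counting step described above can be sketched in Python. The sketch assumes that you already have, for each module, the set of base-unit start addresses (for example, exported from a disassembler) and a trace of the units hit during one run (for example, collected with breakpoints). The module names and addresses in the usage lines are made up.

```python
from collections import Counter


def coverage_report(all_units, trace):
    """Count unit hits and compute per-module coverage.

    all_units: dict mapping module name -> set of unit start addresses
    trace: iterable of (module, unit_start) pairs, one per hit, in execution order
    Returns (hits, report), where report maps module -> (executed, total, ratio).
    """
    hits = Counter(trace)  # how many times each unit was passed through
    report = {}
    for module, units in all_units.items():
        executed = sum(1 for u in units if hits[(module, u)] > 0)
        total = len(units)
        report[module] = (executed, total, executed / total if total else 0.0)
    return hits, report


# Hypothetical data: four units in one module, one unit in another.
all_units = {"a.dll": {0x10, 0x20, 0x30, 0x40}, "b.dll": {0x100}}
trace = [("a.dll", 0x10), ("a.dll", 0x20), ("a.dll", 0x10)]
hits, report = coverage_report(all_units, trace)
for module, (executed, total, ratio) in report.items():
    print(f"{module}: {executed}/{total} units ({ratio:.0%})")
```

The same function works for any of the base units listed above; only the way the unit start addresses are obtained changes.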
For simplicity, in this article we will use the following base unit: disassembled code between conditional jumps. Now it is time to choose the tools.
Test environment and toolset
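To make the chosen unit concrete, here is a minimal Python sketch that splits a disassembly listing into basic blocks using the classic "leader" rule: a block starts at the first instruction, at every jump target, and right after every jump or return. The Insn record, the mnemonic set (only a subset of x86 conditional jumps) and the toy listing are hypothetical; in practice the listing would come from IDA Pro. The optional split_on_call flag corresponds to the third kind of base unit from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# A subset of x86 conditional jump mnemonics (not exhaustive).
CONDITIONAL_JUMPS = {"je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle",
                     "ja", "jae", "jb", "jbe"}
BLOCK_ENDERS = CONDITIONAL_JUMPS | {"jmp", "ret"}


@dataclass
class Insn:
    addr: int
    mnemonic: str
    target: Optional[int] = None  # branch target address, if any


def split_basic_blocks(insns: List[Insn], split_on_call: bool = False) -> List[List[Insn]]:
    """Split an address-ordered listing into basic blocks."""
    if not insns:
        return []
    enders = BLOCK_ENDERS | ({"call"} if split_on_call else set())
    leaders = {insns[0].addr}
    for i, ins in enumerate(insns):
        if ins.mnemonic in enders:
            # A jump target starts a new block (call targets lie in other functions).
            if ins.target is not None and ins.mnemonic != "call":
                leaders.add(ins.target)
            # So does the instruction that follows the jump, return or call.
            if i + 1 < len(insns):
                leaders.add(insns[i + 1].addr)
    blocks: List[List[Insn]] = []
    current: List[Insn] = []
    for ins in insns:
        if ins.addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    blocks.append(current)
    return blocks


listing = [
    Insn(0x00, "cmp"),
    Insn(0x03, "jne", 0x0A),
    Insn(0x05, "mov"),
    Insn(0x08, "jmp", 0x0C),
    Insn(0x0A, "xor"),
    Insn(0x0C, "ret"),
]
for block in split_basic_blocks(listing):
    print([hex(i.addr) for i in block])
```

Note that a textbook basic block also starts at jump targets, so the result is slightly finer than "code between conditional jumps"; this is what keeps each block single-entry.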
For application research we will use:
IDA Pro as a disassembly navigator
DynamoRIO as the code coverage collection tool
the Lighthouse plugin for IDA Pro as the code coverage visualization tool
Windows 10 virtual machine
VirtualBox as the virtualization environment
The research plan is very simple. We take an arbitrary application from a CrackMe collection. We will use this resource… To avoid working out how to pass data to the application, it is best to choose a CrackMe that takes its parameters from the command line: this makes it easier to run the code coverage tools. In general, this is not a limitation; if you wish, you can write a wrapper and simply pass the parameters to it. But since we want to test how well our research approach works, we will take the least labor-intensive route. As a test sample, we take this one (see the attachment).
Installing the tools should not cause any particular difficulties. Problems may arise when configuring the IDA Pro plugin. To install it, do the following:
Open the file under investigation in IDA Pro
In the IDAPython command line, run: idaapi.get_user_idadir()
Open the returned path in Explorer; if it does not contain a plugins directory, create one
Copy the contents of the plugins directory of the Lighthouse repository into the directory created in the previous step
Restart IDA Pro
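The directory-creation and copy steps can also be scripted. The helper below is a hypothetical sketch, not part of Lighthouse or IDA: it creates the plugins directory if it is missing and copies every entry of a local copy of the repository’s plugins directory into it. It assumes Python 3.8+ (for dirs_exist_ok), and the source path in the usage comment is a placeholder.

```python
import shutil
from pathlib import Path


def install_plugin(plugin_src_dir, user_idadir):
    """Copy every entry of plugin_src_dir into <user_idadir>/plugins."""
    plugins_dir = Path(user_idadir) / "plugins"
    plugins_dir.mkdir(parents=True, exist_ok=True)  # create "plugins" if it is missing
    for item in Path(plugin_src_dir).iterdir():     # copy the directory's contents
        dest = plugins_dir / item.name
        if item.is_dir():
            shutil.copytree(item, dest, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(item, dest)
    return plugins_dir


# Inside IDA's IDAPython console (the source path is a placeholder):
#   import idaapi
#   install_plugin(r"C:\path\to\lighthouse\plugins", idaapi.get_user_idadir())
```

Running it twice is harmless: existing files are simply overwritten, so it can also be used to update the plugin.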
After a successful installation, a new option should appear in the File menu:
Now you can start collecting data about the application. To collect code coverage information, we will use the drrun.exe tool with the drcov client. The resulting command line looks like this:
drrun -t drcov -logdir ./ -- KeygenMe.exe
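When the CrackMe has to be run many times with different keys, the same command can be launched from a script. The sketch below only builds the command line shown above (with arguments for the target placed after the executable) and runs it with subprocess. The DynamoRIO install path, the key "AAAA" and the assumption that drcov names its logs drcov.<app>.<pid>.<n>.proc.log are illustrative; check the naming against your DynamoRIO version.

```python
import subprocess
from pathlib import Path, PureWindowsPath


def drcov_command(dynamorio_root, target, args=(), logdir=".", is_64bit=False):
    """Build the drrun + drcov command line for a given target."""
    bindir = "bin64" if is_64bit else "bin32"  # must match the target's bitness
    drrun = str(PureWindowsPath(dynamorio_root) / bindir / "drrun.exe")
    return [drrun, "-t", "drcov", "-logdir", str(logdir), "--", str(target), *args]


def newest_drcov_log(logdir="."):
    """Return the most recently modified drcov log in logdir, or None."""
    logs = sorted(Path(logdir).glob("drcov.*.proc.log"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


def run_with_coverage(dynamorio_root, target, args=(), logdir=".", is_64bit=False):
    """Run the target under drcov and return (completed_process, log_path)."""
    cmd = drcov_command(dynamorio_root, target, args, logdir, is_64bit)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc, newest_drcov_log(logdir)


# Example on Windows (placeholder paths and key):
#   proc, log = run_with_coverage(r"C:\DynamoRIO", "KeygenMe.exe", ["AAAA"])
```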
Choose the drrun build that matches the bitness of the file being examined: the DynamoRIO release contains several versions of the tool (bin32 for 32-bit and bin64 for 64-bit applications). In our case, we take the one from bin32. As a result, a log file whose name ends with “.proc.log” is created in the log directory. This is the code coverage information file. Load it into IDA Pro through the option added by the plugin. The result is shown in the picture below:
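Lighthouse reads this log directly, but it can be handy to inspect it from a script too. The parser below is a sketch that assumes the layout drcov writes by default in recent versions: a text header, a module table preceded by a “Columns:” line, a “BB Table: N bbs” line, and then N binary 8-byte entries (32-bit offset from the module base, 16-bit block size, 16-bit module id, little-endian). This layout is an assumption on our part, so verify it against your DynamoRIO version; logs written in text mode or with an older header are not handled.

```python
import re
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed binary BB Table entry: uint32 start offset, uint16 size, uint16 module id.
BB_ENTRY = struct.Struct("<IHH")


@dataclass
class Module:
    mod_id: int
    base: int
    path: str


def parse_drcov(data: bytes) -> Tuple[Dict[int, Module], List[Tuple[int, int, int]]]:
    """Parse a drcov log into {module id: Module} and [(mod_id, offset, size)]."""
    pos = 0

    def readline() -> str:
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end].decode("utf-8", "replace").rstrip("\r")
        pos = end + 1
        return line

    line = readline()
    while not line.startswith("Module Table:"):
        line = readline()
    count = int(re.search(r"(\d+)\s*$", line).group(1))  # "... count N" or "... N"
    header = readline()
    if not header.startswith("Columns:"):
        raise ValueError("unsupported drcov module table (no Columns line)")
    columns = [c.strip() for c in header[len("Columns:"):].split(",")]

    modules = {}
    for _ in range(count):
        # The path is the last column, so limit splits in case it contains commas.
        values = [v.strip() for v in readline().split(",", len(columns) - 1)]
        row = dict(zip(columns, values))
        base = row.get("base", row.get("start"))  # column name differs by version
        mod_id = int(row["id"])
        modules[mod_id] = Module(mod_id, int(base, 16), row["path"])

    bb_count = int(re.match(r"BB Table: (\d+) bbs", readline()).group(1))
    blocks = []
    for i in range(bb_count):
        start, size, mod_id = BB_ENTRY.unpack_from(data, pos + i * BB_ENTRY.size)
        blocks.append((mod_id, start, size))
    return modules, blocks


def covered_addresses(modules, blocks):
    """Absolute start addresses of executed blocks (module base + offset)."""
    return sorted(modules[m].base + start for m, start, _ in blocks)


# Usage (the file name is a placeholder):
#   with open("drcov.KeygenMe.exe.01234.0000.proc.log", "rb") as f:
#       modules, blocks = parse_drcov(f.read())
```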
The color highlighting shows which parts of the code were executed while the algorithm processed the supplied data.
By the way, if you switch to the decompiler view, the highlighting is preserved. Now you can clearly see which conditions have already been satisfied and which branches have not been executed yet.
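One way to narrow down such conditions, shown here as an illustration rather than a feature of the tools above, is to compare the coverage of two runs with different inputs: blocks reached by only one run point at branches that react to the input. The block sets below are hypothetical (module, offset) pairs, such as those read from two drcov logs.

```python
from typing import Iterable, Tuple

Block = Tuple[str, int]  # (module name, block offset)


def coverage_diff(run_a: Iterable[Block], run_b: Iterable[Block]):
    """Return (only_in_a, only_in_b, common) as sorted lists of blocks."""
    a, b = set(run_a), set(run_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Hypothetical coverage of a key with the wrong length vs. a key with the right length.
wrong_len = [("KeygenMe.exe", 0x1000), ("KeygenMe.exe", 0x1020)]
right_len = [("KeygenMe.exe", 0x1000), ("KeygenMe.exe", 0x1040), ("KeygenMe.exe", 0x1060)]
only_a, only_b, common = coverage_diff(wrong_len, right_len)
for module, offset in only_b:
    print(f"reached only with the second key: {module}+{offset:#x}")
```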
Code coverage techniques speed up the exploration of an application: the collected profiling data helps examine the algorithm more fully and shows visually how it works.
The article was prepared by OTUS expert Alexander Kolesnikov ahead of the start of the course “Reverse-Engineering. Professional”…