Reverse engineering GDB to work with Pwndbg

GDB loses much of its functionality when it works with binaries whose debug symbols have been removed (so-called "stripped binaries"). Function and variable names turn into meaningless addresses. To set breakpoints, we have to track down the addresses of the functions we need from an external source. To inspect structured values, we have to print raw memory to the console and then pore over the dump, trying to work out where the field boundaries lie.

That's why this summer, while working at Trail of Bits, I extended Pwndbg, a plugin for GDB maintained by my mentor Dominic Czarnota. I added two features that bring debugging of stripped binaries closer to the experience we know from working with a debugger in an IDE. Pwndbg now integrates with Binary Ninja, which gives GDB+Pwndbg much richer information about the program being debugged, and it can dump Go structures, which makes debugging Go binaries more convenient.

Integration with Binary Ninja

To give GDB+Pwndbg more information to work with during debugging, I integrated Pwndbg with Binary Ninja, a popular decompiler with a feature-rich scripting API. To do this, I set up an XML-RPC server inside Binary Ninja and query it from Pwndbg. This gives Pwndbg access to Binary Ninja's analysis database, which is used to synchronize symbols, function signatures, stack variable offsets, and much more. In practice, debugging becomes much closer to what we are used to with symbols available.
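The request/response pattern described above can be sketched with Python's standard xmlrpc modules. This is only an illustration of the mechanism, not the integration's actual code: the FAKE_SYMBOLS table, the get_symbol method name, and the choice to pass addresses as hex strings are all assumptions made for this example (standard XML-RPC integers are limited to 32 bits, so a string is one simple way to carry 64-bit addresses).

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the decompiler's analysis database:
# address -> recovered function name.
FAKE_SYMBOLS = {0x401000: "main", 0x401120: "parse_config"}


def get_symbol(address_hex):
    """Return the name known for an address, or "" if there is none."""
    return FAKE_SYMBOLS.get(int(address_hex, 16), "")


def start_server():
    """Serve get_symbol on a free localhost port in a background thread."""
    # Port 0 asks the OS to pick any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_symbol)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def lookup(port, address_hex):
    """Debugger side: ask the server which symbol lives at an address."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.get_symbol(address_hex)


if __name__ == "__main__":
    server = start_server()
    print(lookup(server.server_address[1], hex(0x401000)))
    server.shutdown()
    server.server_close()
```

In this sketch the decompiler side owns the data and the debugger side stays a thin client, which is what makes it possible to swap in a different backend later.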

Fig. 1: Pwndbg displays symbols and argument names synchronized in a stripped-down binary using data from Binary Ninja

For decompilation, rather than having Binary Ninja serialize the decompiled code to plain text, I pull the tokens themselves from Binary Ninja. This makes it possible to show the decompilation with detailed syntax highlighting and to configure it to use any of the three intermediate language levels that Binary Ninja provides. The decompilation is shown directly in the Pwndbg context, and the line currently being executed is highlighted, just as in the assembly view.

Fig. 2: Decompilation information pulled from Binary Ninja and displayed in Pwndbg

I also implemented a feature that displays the current program counter (PC) as an arrow inside Binary Ninja. Another feature lets you set breakpoints directly from Binary Ninja, so that you don't have to switch between Binary Ninja and Pwndbg as often.

Fig. 3: Binary Ninja displays icons for the current PC and breakpoints

The most non-trivial part of the integration work was synchronizing the names of stack variables. Whenever a stack address appears in Pwndbg, for example in the register view, stack view, or function argument preview, the integration checks whether Binary Ninja has a named stack variable at that address. If so, the corresponding label is displayed. Parent stack frames are checked as well, so that variables belonging to the caller are also labeled correctly.

Fig. 4: This is how the stack variable label is displayed

The biggest challenge in implementing this feature was that Binary Ninja reports stack variables only as offsets relative to the base of the stack frame, so you also have to find the frame base and calculate absolute addresses from it. Most architectures, such as x86, have a frame pointer register that points to the frame base. However, on most architectures, including x86, the frame pointer is not actually required, so compilers are free to use it like any other register.

Fortunately, Binary Ninja has constant propagation, so you can check whether a register has a predictable offset from the frame base. My implementation first checks whether the frame pointer actually points to the frame base. If not, it checks whether the stack pointer is at a known, consistent offset from the frame base (which is usually true with modern compilers). Otherwise, it moves on to checking all the other general-purpose registers, trying to find one with a consistent offset. Strictly speaking, this approach can fail, but in practice it almost never does.
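The fallback order described above can be sketched as a small function. This is a simplified model under stated assumptions, not the actual Pwndbg code: the offsets_from_base dictionary is a hypothetical stand-in for whatever the decompiler's constant propagation reports at the current instruction, and the register names and values are illustrative.

```python
def resolve_frame_base(registers, offsets_from_base,
                       frame_reg="rbp", stack_reg="rsp"):
    """Return the absolute frame base address, or None if no register helps.

    registers: register name -> current value read from the debugger.
    offsets_from_base: register name -> constant offset of that register
        from the frame base at the current instruction (hypothetical input).
        Registers without a known constant offset are simply absent.
    """
    others = sorted(r for r in offsets_from_base
                    if r not in (frame_reg, stack_reg))
    # Preference order: frame pointer, then stack pointer, then any other
    # general-purpose register with a known offset.
    for reg in [frame_reg, stack_reg, *others]:
        if reg in offsets_from_base and reg in registers:
            return registers[reg] - offsets_from_base[reg]
    return None


def stack_variable_address(frame_base, variable_offset):
    """Turn a frame-relative variable offset into an absolute address."""
    return frame_base + variable_offset


if __name__ == "__main__":
    # rbp is being used as an ordinary register here, so only rsp is usable.
    registers = {"rbp": 0x1234, "rsp": 0x7FFE00000FD0}
    offsets = {"rsp": -0x30}
    base = resolve_frame_base(registers, offsets)
    print(hex(base))                                 # 0x7ffe00001000
    print(hex(stack_variable_address(base, -0x18)))  # 0x7ffe00000fe8
```

Once the frame base is known, any stack address the debugger shows can be compared against base plus each variable's offset to decide whether a label applies.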

Debugging Go

There is a common pain point when debugging executables compiled from languages other than C (and sometimes even from C): the memory layout of their values is often too complex, which makes the values difficult to dump. A relatively simple example is dumping a slice in Go. It takes one command to print the slice's pointer and length, and another to examine its contents. Dumping a map, on the other hand, can take more than ten commands even for a small map, and hundreds of commands for a large one. Doing this by hand is simply impractical.
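To make the two manual steps for a slice concrete, here is a minimal sketch that decodes a slice from a raw memory snapshot. It assumes a 64-bit little-endian target, where a Go slice header is three machine words (data pointer, length, capacity) and the elements are int64 values. The memory contents, addresses, and helper names are made up for illustration.

```python
import struct


def read_slice_header(memory, base, address):
    """Decode (data pointer, length, capacity) of a slice header.

    memory is a bytes snapshot whose first byte lives at virtual address base.
    """
    return struct.unpack_from("<Qqq", memory, address - base)


def read_int64_slice(memory, base, address):
    """Follow the header's data pointer and decode the int64 elements."""
    data_ptr, length, _capacity = read_slice_header(memory, base, address)
    return list(struct.unpack_from(f"<{length}q", memory, data_ptr - base))


# Fake snapshot: header at 0x1000, backing array at 0x1020.
BASE = 0x1000
MEMORY = (
    struct.pack("<Qqq", 0x1020, 3, 4)       # header: ptr, len=3, cap=4
    + b"\x00" * 8                           # padding up to 0x1020
    + struct.pack("<4q", 10, 20, 30, 0)     # three elements plus spare slot
)

if __name__ == "__main__":
    print(read_slice_header(MEMORY, BASE, 0x1000))  # (4128, 3, 4)
    print(read_int64_slice(MEMORY, BASE, 0x1000))   # [10, 20, 30]
```

A slice needs only one pointer to be followed. A map involves buckets and overflow chains, which is where the number of manual steps grows quickly.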

That's why I created the go-dump command. Using the Go compiler source code as a reference, I implemented dumping for all of Go's built-in types, including integers, complex numbers, strings, pointers, slices, arrays, and maps. Built-in types are written exactly as they are in Go, so you don't need to learn any new syntax to use the command correctly.

Fig. 5: Dumping a simple map using the go-dump command

The go-dump command also parses and dumps any nested types, so that a single command is enough to output information on any type.

Fig. 6: Dumping a more complex slice containing map types using the go-dump command

Parsing Go types at runtime

While the Go-specific approach to dumping is much nicer than dumping memory manually, there are still some things that are awkward to do. You need to know the full type of the value you are dumping, and determining that type can sometimes be difficult, so you have to guess one way or another. This is especially true if you are working with structures that contain many nested structures as fields. Even if you can infer the full type, there are some things that are still impossible to figure out because they do not affect compilation. This applies, for example, to the names of structure fields and the names of user-defined types.

Conveniently, the Go compiler emits a runtime type object for every type used in the program (these objects are intended for use with the reflect package). Each one describes the layout of arbitrarily nested structures, along with type names, size, alignment, and so on. These type objects can also be matched to values of those types, because an interface value stores a pointer to the type object alongside the pointer to the data. In addition, when an object is allocated on the heap, a pointer to its type object is passed to the allocation function (usually runtime.newobject).
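As a rough illustration of how a type pointer can be recovered from an interface value, the sketch below decodes the two-word representation from a fake memory snapshot. It assumes a 64-bit little-endian target and the layout used by recent Go runtimes, which is an assumption on my part and not something stated above: an empty interface holds (type pointer, data pointer), while a non-empty interface holds (itab pointer, data pointer), with the type pointer stored in the second word of the itab. All addresses are invented.

```python
import struct

WORD = 8  # pointer size on the assumed 64-bit target


def read_words(memory, base, address, count):
    """Read count little-endian machine words starting at address."""
    return struct.unpack_from(f"<{count}Q", memory, address - base)


def read_empty_interface(memory, base, address):
    """Empty interface (any): returns (type pointer, data pointer)."""
    return read_words(memory, base, address, 2)


def read_nonempty_interface(memory, base, address):
    """Non-empty interface: follow the itab to find the type pointer."""
    itab_ptr, data_ptr = read_words(memory, base, address, 2)
    # Assumed itab layout: word 0 = interface type, word 1 = concrete type.
    (type_ptr,) = read_words(memory, base, itab_ptr + WORD, 1)
    return type_ptr, data_ptr


# Fake snapshot starting at 0x2000.
BASE = 0x2000
MEMORY = struct.pack(
    "<6Q",
    0x5000, 0x6000,  # 0x2000: empty interface (type, data)
    0x2020, 0x6008,  # 0x2010: non-empty interface (itab, data)
    0x4000, 0x5040,  # 0x2020: itab (interface type, concrete type)
)

if __name__ == "__main__":
    print([hex(v) for v in read_empty_interface(MEMORY, BASE, 0x2000)])
    print([hex(v) for v in read_nonempty_interface(MEMORY, BASE, 0x2010)])
```

In both cases the result is a pair of a type object address and a data address, which is all that is needed to dump the value without the user naming its type.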

I wrote a parser that extracts this information recursively, so that it can handle arbitrarily nested types. The parser is exposed through the go-type command, which displays information about a runtime type given its address. For structures, this information includes the type, name, and offset of each field.
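The recursive walk over nested type information can be illustrated with a small model. This sketch does not read real runtime type objects. It starts from already-parsed records, and the GoType class, the describe function, and the example types main.Inner and main.Outer are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GoType:
    """Simplified, already-parsed description of a Go type."""
    name: str
    size: int
    # Each entry is (field name, offset within this type, field type).
    fields: list = field(default_factory=list)


def describe(go_type, indent=0, base_offset=0):
    """Return one line per field, recursing into nested struct fields."""
    lines = []
    for field_name, offset, field_type in go_type.fields:
        absolute = base_offset + offset
        lines.append(
            f"{'  ' * indent}+{absolute:#x} {field_name} {field_type.name}"
        )
        lines.extend(describe(field_type, indent + 1, absolute))
    return lines


# On a 64-bit target an int takes 8 bytes and a string header 16 bytes.
INT = GoType("int", 8)
STRING = GoType("string", 16)
INNER = GoType("main.Inner", 24, [("id", 0, INT), ("label", 8, STRING)])
OUTER = GoType("main.Outer", 32, [("count", 0, INT), ("inner", 8, INNER)])

if __name__ == "__main__":
    print("\n".join(describe(OUTER)))
```

Carrying the accumulated offset through the recursion means every nested field is reported relative to the start of the outermost value, which is the address a user actually has in hand when dumping memory.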

Fig. 7: Exploring the type of structure consisting of an integer and a string

This opens up two ways to dump values. The first, simpler one works only with interface values: because the type pointer is stored along with the data pointer, extracting both is easy to automate. Such values can be dumped by using the Go type any to denote empty interfaces (those that have no methods) and the type interface to denote non-empty interfaces. When an interface value is dumped, its type is automatically extracted and parsed, so the dump is seamless and you don't need to enter any further type information.

Fig. 8: Dump of interface value without any type information

The second method works with all values. However, to use it, you need to find and specify a pointer to the type of the value in question. Often this is not difficult at all: just look at which pointer was passed to the function that allocated the value. But for global variables, or for variables whose allocation is hard to find, you sometimes need to do a little guessing to figure out the type. Even so, this method is usually easier than trying to work out the type layout manually, and it can dump even the most complex types. I tested it on several large structure types in a stripped build of the Go compiler, which is one of the largest and most complex open source Go codebases. The dumps went through without any problems.

Fig. 9: A dump of a complex structure in the Go compiler. Here, only the address of the type is specified, and the -p flag is used for pretty printing.

Summary and Prospects

This summer I was able to improve Pwndbg so that it now integrates with Binary Ninja and provides access to detailed debug information. I also added the go-dump command to dump Go values. These features are already present in the Pwndbg development branch and in the latest version of the tool (2024.08.29).

The work of improving the debugging process doesn't end here. I implemented the integration with Binary Ninja in a modular way, so that it will be easy to add support for other decompilers in the future. I think it would be interesting to add full support for Ghidra (currently, the Ghidra integration only synchronizes decompilation), since Ghidra is a free and open source decompiler that anyone can use.

As for debugging Go, there is room for improvement in making goroutines more visible and easier to work with. This is currently an area where Delve, a debugger specialized for Go, compares favorably with GDB/Pwndbg. For example, Delve can print a list of all goroutines along with the instruction that created each of them. It also has a command to switch between goroutines.
