Parsing binary formats. Part 1


Article author: Alexander Kolesnikov

Virus Analyst

One of the most difficult and interesting tasks a reverse engineer faces is analyzing binary files. Often the file is in a well-known format, and the analysis becomes interesting when you need to write your own viewer or make the file structure easier to see. In this article, we will try to analyze several binary files.

Basic concepts for beginners

Any piece of information in a computer is stored in binary form, so to save anything useful, you have to convert it into that form. To understand what is hidden inside a binary file, it is critical:

  • not to be afraid of reading large amounts of numbers;

  • to understand why and how the numbers inside the file are structured.
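Why structure matters: the same bytes produce completely different values depending on what you assume about their layout. Here is a minimal Python sketch using the standard struct module. The four sample bytes are chosen for illustration. They happen to be how a typical PE file begins: "MZ" followed by 0x90 0x00.

```python
import struct

# Four bytes as they might appear at the start of a file (illustrative sample).
raw = bytes([0x4D, 0x5A, 0x90, 0x00])

# The same bytes decoded under different assumptions about their structure.
as_le_u32 = struct.unpack("<I", raw)[0]      # one little-endian 32-bit unsigned int
as_be_u32 = struct.unpack(">I", raw)[0]      # one big-endian 32-bit unsigned int
as_le_u16_pair = struct.unpack("<HH", raw)   # two little-endian 16-bit unsigned ints
as_text = raw[:2].decode("ascii")            # the first two bytes as ASCII text

print(hex(as_le_u32))    # 0x905a4d
print(hex(as_be_u32))    # 0x4d5a9000
print(as_le_u16_pair)    # (23117, 144)
print(as_text)           # MZ
```

Only the file format specification tells you which of these interpretations is the right one at a given offset.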

The byte is the basic unit used to show information in its “raw” form, even if only approximately. A raw view is usually a grid of hexadecimal byte values, each row labeled with its offset in the file.
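Hex editors build this kind of view with a simple loop. Here is a minimal Python sketch of a classic offset / hex / ASCII dump. The `hexdump` function and its 16-byte row width are illustrative choices and do not come from any particular tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a classic offset / hex / ASCII dump of `data`."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Hex column, padded so the ASCII column stays aligned on a short last row.
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        # Printable ASCII bytes are shown as characters, everything else as '.'.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = b"MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00Hello"
    print(hexdump(sample))
```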

Such data is almost impossible to read as is. That is why rules were invented that specify which data must be at which position in the file. The contents are still just zeros and ones, but they are arranged so that they can be interpreted.

What rules describe a file? They are called file format specifications, and they can be found online. A file format is usually developed either for an operating system or for a specific piece of software, and in both cases the developers’ resources should contain the specification. Let’s try to find the specification for the PE file format, which Windows uses to run applications: Microsoft publishes it in its documentation. Such a specification is usually a very large document that gives at least a general comment on each field, describing what can or should be stored there. However, these rules are not strictly enforced, since software developers can use individual fields for their own purposes. That is why you can often find independent research on common file formats online that describes some feature not recorded in the specification.
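To show how a specification turns into code, here is a minimal Python sketch that follows the first few steps of the PE format. It checks the "MZ" magic, reads `e_lfanew` at offset 0x3C to find the "PE\0\0" signature, and then decodes the 20-byte COFF file header that follows. The function name and the returned dictionary are illustrative. A real parser would also check the optional header, which this sketch does not.

```python
import struct

def read_pe_file_header(data: bytes) -> dict:
    """Locate the PE signature and decode the COFF file header that follows it."""
    # The DOS header starts with the "MZ" magic.
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew, a 32-bit little-endian value at offset 0x3C, is the file offset
    # of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header immediately follows the 4-byte signature.
    (machine, number_of_sections, time_date_stamp,
     pointer_to_symbol_table, number_of_symbols,
     size_of_optional_header, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    return {
        "e_lfanew": e_lfanew,
        "Machine": machine,                    # e.g. 0x14C = x86, 0x8664 = x64
        "NumberOfSections": number_of_sections,
        "TimeDateStamp": time_date_stamp,
        "PointerToSymbolTable": pointer_to_symbol_table,
        "NumberOfSymbols": number_of_symbols,
        "SizeOfOptionalHeader": size_of_optional_header,
        "Characteristics": characteristics,
    }
```

Usage, with a hypothetical file name: `with open("sample.exe", "rb") as f: print(read_pe_file_header(f.read()))`.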

It is also worth noting that the developers themselves often present a binary file format as a table to make it easier to grasp. This is only a general picture, but it adds a visual component, or some level of abstraction, that makes it easier to write parsers for such files. Microsoft, for example, provides a schematic of a PE file both as it is stored on disk and as it looks once loaded into memory.
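The difference between the on-disk and in-memory layouts matters in practice. Many addresses inside a PE file are RVAs, which are offsets relative to where the image is loaded in memory. To read the bytes an RVA points to, a parser has to translate it to a file offset using the section table. That table starts right after the optional header, at `e_lfanew + 24 + SizeOfOptionalHeader`. The Python sketch below is simplified. It ignores FileAlignment/SectionAlignment rounding and RVAs that fall inside the headers, and the helper names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int      # RVA where the section is mapped in memory
    size_of_raw_data: int     # how many bytes of the section are stored in the file
    pointer_to_raw_data: int  # file offset of those bytes

def parse_section_table(data: bytes, offset: int, count: int) -> List[Section]:
    """Decode `count` 40-byte section headers starting at file offset `offset`."""
    sections = []
    for i in range(count):
        # Only the first 24 bytes of each 40-byte header are needed here.
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, offset + i * 40)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vsize, vaddr, raw_size, raw_ptr))
    return sections

def rva_to_offset(rva: int, sections: List[Section]) -> Optional[int]:
    """Translate an RVA to a file offset, or return None if it has no file backing."""
    for s in sections:
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            # Bytes past the raw data exist only in memory (zero-filled at load time).
            if delta >= s.size_of_raw_data:
                return None
            return s.pointer_to_raw_data + delta
    return None
```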

Let’s look at the tools available for quickly viewing files.

Tools for viewing structure and data

A universal way to view binary files is to open them in a hex editor. Some of the most popular ones are:

  • 010 Editor

  • Hiew

  • WinHex

These tools are useful in two situations. The first is when you need to understand exactly what is written in a file before writing your own parser. The second is that some of these applications can mark up the file structure themselves. Let’s try the second option.

Let’s take a test file and open it in Hiew.

Hiew lets you view raw data, and it also includes a disassembler, so if a binary file contains processor instructions, you can see them too. To switch to that view, just press Enter.

That completes the basic review of the file, but it is far from everything Hiew can do. Hiew also supports plugins that extend its functionality. There are a lot of them, but for this article the most interesting one is a plugin built on Kaitai Struct, a framework for parsing file formats and network protocols. What can this plugin do? Let’s look at the file through it. No additional configuration is needed: just copy the .hem file from the plugin’s repository into the Hiew directory and open the file. To make sure everything works as it should, press F11. A list of plugins should appear.

Now select the plugin and see what it shows. The plugin comes with a whole list of prepared file formats whose structure you can examine. It works like this: Kaitai lets you describe a file format the way its developer sees it, and the plugin then highlights the individual parts of the file, labeling them and converting their values into a readable form. It can mark up an ELF executable this way, for example.
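The core idea behind this approach is that the format is a data description, and a generic engine walks that description over the bytes. This is easy to illustrate in a few lines of Python. The sketch below is only loosely in the spirit of Kaitai Struct. It is not Kaitai’s .ksy syntax or its API, and the names `ELF_HEADER_START`, `interpret` and `parse_elf_start` are hypothetical. The field list covers the 16-byte `e_ident` block of an ELF header plus the three fields after it, whose byte order depends on `EI_DATA`.

```python
import struct
from typing import List, Tuple

# Declarative description: (field name, struct format code), in file order.
ELF_HEADER_START = [
    ("magic", "4s"),        # b"\x7fELF"
    ("ei_class", "B"),      # 1 = 32-bit, 2 = 64-bit
    ("ei_data", "B"),       # 1 = little-endian, 2 = big-endian
    ("ei_version", "B"),
    ("ei_osabi", "B"),
    ("ei_abiversion", "B"),
    ("ei_pad", "7s"),       # padding up to 16 bytes of e_ident
    ("e_type", "H"),        # e.g. 2 = executable, 3 = shared object
    ("e_machine", "H"),     # e.g. 0x3E = x86-64
    ("e_version", "I"),
]

def interpret(spec, data: bytes, endian: str) -> List[Tuple[int, int, str, object]]:
    """Walk `spec` over `data`, returning (start, end, name, value) per field."""
    regions = []
    offset = 0
    for name, fmt in spec:
        size = struct.calcsize(endian + fmt)  # "<" / ">" prefixes disable padding
        (value,) = struct.unpack_from(endian + fmt, data, offset)
        regions.append((offset, offset + size, name, value))
        offset += size
    return regions

def parse_elf_start(data: bytes):
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) decides the byte order of every multi-byte field.
    endian = "<" if data[5] == 1 else ">"
    return interpret(ELF_HEADER_START, data, endian)
```

The (start, end, name) ranges that `interpret` returns are exactly what a viewer needs to highlight and label regions of the file.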

Now, when you point at a particular region of the file, its name is displayed. This is convenient when you need to understand which part of the file an application is reading, or simply what is written in the file.
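Showing a field name for the current position comes down to a range lookup. Below is a minimal Python sketch that uses binary search over sorted, non-overlapping (start, end, name) ranges. The `FieldMap` class is hypothetical. Nested regions, which real format descriptions often have, would need a tree instead.

```python
import bisect
from typing import List, Optional, Tuple

class FieldMap:
    """Answer 'which named field covers this offset?' for non-overlapping ranges."""

    def __init__(self, regions: List[Tuple[int, int, str]]):
        # Sort by start offset so that lookups can use binary search.
        self.regions = sorted(regions)
        self.starts = [start for start, _, _ in self.regions]

    def name_at(self, offset: int) -> Optional[str]:
        # Index of the last region that starts at or before `offset`.
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0:
            start, end, name = self.regions[i]
            if start <= offset < end:  # `end` is exclusive
                return name
        return None  # offset falls in a gap between described fields


fields = FieldMap([(0, 4, "magic"), (4, 5, "ei_class"), (16, 18, "e_type")])
print(fields.name_at(2))    # magic
print(fields.name_at(10))   # None
```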

The plugin works the same way for other formats, such as PNG and PE files.
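PNG is a good example of a format that is easy to walk by hand. After an 8-byte signature, the file is a sequence of chunks, and each chunk consists of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data. Here is a minimal Python sketch that lists the chunks and checks each CRC. The function name is illustrative.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_png_chunks(data: bytes) -> Iterator[Tuple[int, str, bytes, bool]]:
    """Yield (offset, chunk type, chunk data, crc_ok) for each chunk of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    offset = len(PNG_SIGNATURE)
    while offset + 12 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack_from(">I4s", data, offset)
        if offset + 12 + length > len(data):
            raise ValueError(f"truncated chunk at offset {offset:#x}")
        body = data[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack_from(">I", data, offset + 8 + length)
        # The CRC covers the type and data fields, not the length field.
        crc_ok = zlib.crc32(ctype + body) == crc
        yield offset, ctype.decode("ascii", "replace"), body, crc_ok
        offset += 12 + length
        if ctype == b"IEND":  # IEND marks the end of the PNG stream
            break
```

Usage, with a hypothetical file name: `for off, kind, body, ok in iter_png_chunks(open("image.png", "rb").read()): print(hex(off), kind, len(body), ok)`.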

Something similar can be done in 010 Editor, which uses so-called templates to describe a file format. Templates are written in a C-like syntax. You can think of a template as a header file that defines the file’s structure, which 010 Editor then fills with data from the file.
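The same idea of declaring a C-style struct and letting the tool fill it with bytes from the file can be tried in Python with the standard ctypes module. This is only an analogue: 010 Editor templates have their own syntax and features. The sketch below overlays the 13-byte body of a PNG IHDR chunk, which begins at file offset 16 (8-byte signature, 4-byte length, 4-byte type). The `IHDR` class and the `read_ihdr` helper are illustrative names.

```python
import ctypes

class IHDR(ctypes.BigEndianStructure):
    """C-style layout of a PNG IHDR chunk body; PNG stores integers big-endian."""
    _pack_ = 1  # no padding between fields, exactly as the bytes appear in the file
    _fields_ = [
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("bit_depth", ctypes.c_uint8),
        ("color_type", ctypes.c_uint8),
        ("compression_method", ctypes.c_uint8),
        ("filter_method", ctypes.c_uint8),
        ("interlace_method", ctypes.c_uint8),
    ]

def read_ihdr(png: bytes) -> IHDR:
    # The first chunk of a valid PNG must be IHDR; its type sits at bytes 12..15.
    if png[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # Copy sizeof(IHDR) bytes starting at offset 16 into the structure.
    return IHDR.from_buffer_copy(png, 16)
```

Without `_pack_ = 1`, ctypes would pad the structure to 16 bytes for alignment. That is the kind of detail a template has to get exactly right for the fields to line up with the file.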

This way, you can outline the structure of a future application that displays individual characteristics of a file or tries to read and render its contents. In the next article, we will look at network protocols and methods for parsing them.

And now I invite you to a free lesson in which we will look at ways to intercept API functions.
