Unpacking executable files

Hello, Khabrovites. As part of the course “Reverse-Engineering. Basic”, Alexander Kolesnikov (a specialist in comprehensive protection of information systems) prepared this original article.

We also invite everyone to the open webinar “Exploiting driver vulnerabilities. Part 1”. Together with an expert, participants will analyze overflow vulnerabilities in drivers and the specifics of developing exploits in kernel mode.


This article discusses approaches to analyzing packed executable files with simple reverse engineering tools and looks at some of the packers used to produce such files. All examples are run on Windows; however, the approaches can be carried over to other operating systems.

Toolkit and OS setup

For tests, we will use a virtual machine running Windows. The toolkit will contain the following applications:

  • the x64dbg debugger;

  • the Scylla plugin for x64dbg, installed by default;

  • hiew Demo.

The fastest and easiest way to unpack an executable file is usually to use a debugger. But since we will also look at executables built from Python scripts, we may need an additional project:

General methods of unpacking

Let’s figure out what packing is. In most cases, executable files produced by modern programming languages are quite large even when they implement only a minimal set of functions. Packing, or compressing, the file can be used to reduce its size. The most common packer today is UPX. Below is an example of how the packer compresses an executable file.

In the picture, it may seem that the file has become larger, but this is not always the case. As a result of packing, most files shrink by a factor of up to 1.5.
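
The effect on size can be illustrated with a minimal sketch. It uses Python's standard zlib module as a stand-in, since UPX relies on its own compression algorithms, so the numbers are only indicative. The sample data and the helper name compression_ratio are made up for this illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size (zlib, level 9)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical sample data: a repeated byte pattern stands in for typical
# code and padding, random bytes stand in for already compressed content.
repetitive = b"\x55\x8b\xec\x83\xec\x10" * 4096
random_like = os.urandom(len(repetitive))

print(f"repetitive data shrinks {compression_ratio(repetitive):.1f} times")
print(f"random data ratio: {compression_ratio(random_like):.3f}")
```

A ratio below 1.0 for the random sample means the compressed form is slightly larger than the input, which is why a packed file does not always come out smaller.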

What does this mean for a reverse engineer? Why is it useful to be able to tell that a file is packed? Here is an illustrative example. Below is a view of a file that is not packed:

And a file that has been packed with UPX:

In this case, the changes affected two main parts of the executable file:

  1. Entry point: in a packed file, this is the beginning of the unpacking algorithm; the real program code runs only after the original file has been unpacked;

  2. Original file code: it no longer contains patterns that can be immediately parsed as instructions.
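
The second point can be made measurable. A common heuristic, not taken from the original article, is to compute the Shannon entropy of a code region: compressed data looks close to random, while ordinary code does not. The sketch below uses synthetic data and zlib as a stand-in for a real packer, so treat the exact values as illustrative only.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Synthetic stand-in for unpacked code: a small set of byte values with a
# skewed distribution (this is NOT real machine code).
rng = random.Random(0)
alphabet = b"\x55\x8b\xec\x83\xc3\x90\x00\x00\x00\xff"
plain = bytes(rng.choice(alphabet) for _ in range(50_000))

# zlib plays the role of the packer's compressor here.
packed = zlib.compress(plain, 9)

print(f"plain:  {shannon_entropy(plain):.2f} bits/byte")
print(f"packed: {shannon_entropy(packed):.2f} bits/byte")
```

High entropy alone does not prove that a file is packed (encrypted or media resources look the same), so it is best used as a hint before opening the file in a debugger.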

So, to analyze the original file again, you need to find the real, or original, entry point. To do this, break the execution of a packed file down into its main stages:

  1. Preparing the file for execution: the OS loader sets up the environment and loads the file into RAM;

  2. Saving the context: the packer saves the file execution context (the values of the general-purpose registers that were set by the OS loader);

  3. Unpacking the original file;

  4. Transferring control to the original file.

All the stages described above can be traced in the debugger. The context-saving step is particularly easy to spot: depending on the architecture, it is done with the pushad/popad instructions or with a series of push instructions. Therefore, the application is traced up to the first change of the ESP/RSP register, and a hardware breakpoint on access is set on the stack address that the register holds after that change. The second access to this address happens when the packer restores the context that was set up by the OS loader. Without this restoration, the application would fail.
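
The logic of this trick can be modelled without a debugger. The following toy simulation is only a sketch: ToyStack, run_toy_packer, the start address and the register values are all hypothetical, and a real packer stub is of course machine code rather than Python. It shows why a breakpoint on the saved-context address stays silent during unpacking and fires when the context is restored.

```python
PUSHAD_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


class ToyStack:
    """Toy 32-bit stack that records accesses to one watched address."""

    def __init__(self, esp=0x0019FF74):
        self.esp = esp
        self.memory = {}
        self.watch = None   # address guarded by the simulated breakpoint
        self.hits = []      # (stage, access kind) for every breakpoint hit

    def _touch(self, kind, stage):
        if self.esp == self.watch:
            self.hits.append((stage, kind))

    def push(self, value, stage):
        self.esp -= 4
        self.memory[self.esp] = value
        self._touch("write", stage)

    def pop(self, stage):
        self._touch("read", stage)
        value = self.memory[self.esp]
        self.esp += 4
        return value


def run_toy_packer():
    stack = ToyStack()
    # Placeholder register values; a real context comes from the OS loader.
    context = {name: index for index, name in enumerate(PUSHAD_ORDER, 1)}

    # Stage 2: the stub saves the context (modelled on pushad).
    for name in PUSHAD_ORDER:
        stack.push(context[name], "save context")

    # Analyst: ESP has just changed for the first time, so watch [ESP].
    stack.watch = stack.esp

    # Stage 3: unpacking only uses stack space below the saved context.
    for scratch in (0x11, 0x22, 0x33):
        stack.push(scratch, "unpack")
    for _ in range(3):
        stack.pop("unpack")

    # Context restore (modelled on popad): the first pop reads the
    # watched slot, which is where the breakpoint fires.
    restored = {}
    for name in reversed(PUSHAD_ORDER):
        restored[name] = stack.pop("restore context")

    # Stage 4 would follow here: a jump to the original entry point.
    return stack.hits, restored == context


hits, context_intact = run_toy_packer()
print(hits)            # [('restore context', 'read')]
print(context_intact)  # True
```

The single recorded hit falls in the restore stage, which in a real session is the point just before control is transferred to the original code.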

UPX example

Let’s try using the debugger to find the original entry point of the application. First, let’s record what the original entry point looks like before packing with UPX:

And here is the entry point after packing:

Let’s start the debugger and try to find the place where the context is saved:

We wait for the first change of ESP; in the debugger, the changed register value is highlighted in red. Then we set a hardware breakpoint on the address it holds and simply run the application:

As a result, we get to the original entry point:

That is all there is to it. Now, using the Scylla plugin, you can dump the resulting file to your hard drive and continue analyzing it.

A similar technique can be applied to any packer that stores context on the stack.
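
To compare the entry point stored in the header of the packed file and of the dumped file, the relevant PE fields can also be read with a short script. This is a minimal sketch based on the documented PE layout; the function names are hypothetical, it does no bounds checking, and the UPX0/UPX1 check relies on the default UPX section names, which can be renamed.

```python
import struct


def read_pe_summary(data: bytes):
    """Return (entry point RVA, list of section names) of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    coff = pe_offset + 4                      # 20-byte COFF file header
    (section_count,) = struct.unpack_from("<H", data, coff + 2)
    (optional_size,) = struct.unpack_from("<H", data, coff + 16)

    optional = coff + 20                      # optional header start
    # AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+.
    (entry_rva,) = struct.unpack_from("<I", data, optional + 16)

    names = []
    table = optional + optional_size          # section table start
    for index in range(section_count):
        raw = data[table + index * 40:table + index * 40 + 8]
        names.append(raw.rstrip(b"\0").decode("ascii", "replace"))
    return entry_rva, names


def has_default_upx_sections(names) -> bool:
    """Heuristic: UPX names its sections UPX0 and UPX1 by default."""
    return "UPX0" in names and "UPX1" in names
```

For a file on disk, the function can be called as read_pe_summary(open(path, "rb").read()), where path is whatever file you are examining.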

PyInstaller example

This approach does not always work for applications with a more complex executable file structure. Consider a file created with PyInstaller, a package that converts a Python script into an executable file. When the executable is generated, an archive is created that contains the Python virtual machine and all the required libraries. The application source code itself is converted into bytecode, which a native-code disassembler cannot disassemble.
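
Before unpacking, it can help to confirm that an executable really carries such an archive. The sketch below looks for the magic bytes that, to the best of my knowledge, PyInstaller writes into the archive cookie near the end of the file. Treat the constant as an assumption and check it against the PyInstaller version you are dealing with; the function name is hypothetical.

```python
# Assumed PyInstaller archive magic ("MEI" followed by five control bytes).
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def find_pyinstaller_cookie(data: bytes) -> int:
    """Return the offset of the last magic occurrence, or -1 if absent."""
    return data.rfind(PYINSTALLER_MAGIC)


# Synthetic demonstration data, not a real executable.
fake_exe = b"MZ" + bytes(100) + PYINSTALLER_MAGIC + bytes(80)
print(find_pyinstaller_cookie(fake_exe))              # 102
print(find_pyinstaller_cookie(b"MZ" + bytes(50)))     # -1
```
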
Let’s try to get something readable. Let’s create a simple Python application and package it using PyInstaller. Application source code:

def main():
    print("Hello World!")

if __name__ == '__main__':
    main()

Install the PyInstaller package and create the exe file:

pip install pyinstaller
pyinstaller -F test.py  # -F: build a single file

So, let’s take stock of what we ended up with. We have an archive that starts the virtual machine, and the code that we wrote as a script. Let’s try to restore the source and read it without even running it.
After executing the commands above, you should have the file ./dist/test.exe. Let’s process it first with PyInstaller Extractor (pyinstxtractor) and then with uncompyle6:

Our script is located in the directory created by the unpacking step. Its file name matches the name of the exe file; in our case it is test.pyc. Let’s open it in hiew:

Decompiling it with the standard tools is not possible, since they simply do not know how to work with Python bytecode. Let’s apply a specialized tool, uncompyle6.

This way you can get the source code again.
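
To see what such a tool starts from, the bytecode can be inspected with the standard library alone. The sketch below is not a decompiler: it only unmarshals the code object and disassembles it with dis. It assumes the 16-byte .pyc header used since Python 3.7 and works only when the file was produced by the same Python version that runs the script. The helper name and the in-memory sample are made up for illustration.

```python
import dis
import importlib.util
import marshal


def load_code_from_pyc(data: bytes):
    """Unmarshal the code object from .pyc bytes of *this* Python version."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode comes from a different Python version")
    # Header: magic (4), flags (4), then 8 bytes of mtime/size or a hash.
    return marshal.loads(data[16:])


# Build a .pyc image in memory from the article's script instead of
# reading an extracted file, so that the sketch is self-contained.
source = (
    "def main():\n"
    '    print("Hello World!")\n'
    "\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)
code = compile(source, "test.py", "exec")
pyc_image = importlib.util.MAGIC_NUMBER + bytes(12) + marshal.dumps(code)

recovered = load_code_from_pyc(pyc_image)
dis.dis(recovered)  # prints the bytecode instructions of the module
```

The listing shows instructions of the Python virtual machine, not source code; turning them back into readable Python is the part that a decompiler such as uncompyle6 adds.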


