“It’s complicated, nobody will figure it out”

As events showed, firstly, it is not that complicated (passing parameters to a PgSQL stored procedure, getting the result and returning it to the user is far from a masterpiece of obfuscation; it rather follows the principle “it’s easy, anyone can figure it out, they just don’t want to”), and secondly, someone will understand the code and find a way to change its behavior to suit their needs.

So are there any ways to protect Python source code after all, and which (relatively sane) methods can be used to solve this problem?

A .pyo, a .pyc, and a .pyd walk into a bar…

… and the bartender says to them: “What’s up in the bytecode, boys?”

Before looking at ways to protect source code, let’s recall how Python executes scripts.

A Python implementation (classically, CPython) consists of a compiler and a virtual machine. The compiler converts a Python script into bytecode, which is then executed by the virtual machine.

Bytecode consists of the virtual machine’s operation codes (opcodes) and the arguments that accompany them.
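To see the compiler and the virtual machine as separate steps, here is a minimal sketch using only the standard library: `compile()` produces a code object, `dis.get_instructions()` lists its opcodes together with their arguments, and `exec()` hands the code object to the virtual machine. Exact opcode names vary between CPython versions, so treat the printed listing as illustrative.

```python
import dis

source = "result = 6 * 7"

# Step 1: the compiler turns source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Step 2: each instruction is an opcode plus an (optional) argument.
opnames = []
for instr in dis.get_instructions(code):
    opnames.append(instr.opname)
    print(instr.opname, instr.argrepr)

# Step 3: the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 42
```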

.py files contain source code; before the virtual machine can execute it, the source is compiled to bytecode.

However, there is no need to recompile .py files into bytecode every time, especially if they have not changed. .pyc files are used to store and execute bytecode (you can find them, for example, in __pycache__ directories).

.pyc files contain ready-made bytecode that the virtual machine executes directly.
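A small sketch, assuming CPython’s standard caching layout: `importlib.util.cache_from_source()` reports where the import system would cache the bytecode for a source file, and `py_compile.compile()` writes that .pyc ahead of time. The module name `greeting.py` is only an example.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp, "greeting.py")  # hypothetical example module
    source.write_text("MESSAGE = 'hello'\n")

    # Where the import system would cache this file's bytecode, e.g.
    # __pycache__/greeting.cpython-312.pyc (the tag depends on the interpreter).
    expected = importlib.util.cache_from_source(str(source))

    # Compile ahead of time; with no cfile argument the .pyc goes to that same path.
    written = py_compile.compile(str(source))

    same_path = written == expected
    # Every .pyc starts with the magic number of the interpreter that wrote it.
    starts_with_magic = pathlib.Path(written).read_bytes().startswith(
        importlib.util.MAGIC_NUMBER
    )
    print(pathlib.Path(written).parent.name, pathlib.Path(written).name)

print(same_path, starts_with_magic)
```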

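.pyo files, like .pyc files, contain ready-made bytecode and differ from them only in that the code has been optimized in advance (asserts are removed with -O, and docstrings are additionally removed with -OO). Since Python 3.5 (PEP 488), .pyo files are no longer used; optimized bytecode is stored in .pyc files tagged .opt-1 or .opt-2 instead.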
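The effect of these optimizations can be reproduced without writing any files. As a minimal sketch: `compile()` accepts an `optimize` argument whose levels 1 and 2 correspond to the interpreter’s -O and -OO flags. The function `check` is a made-up example.

```python
source = '''
def check(x):
    """This docstring is dropped at optimization level 2."""
    assert x > 0, "dropped at optimization level 1 and above"
    return x
'''


def load(optimize):
    """Compile the example at the given optimization level and return check()."""
    namespace = {}
    exec(compile(source, "<example>", "exec", optimize=optimize), namespace)
    return namespace["check"]


plain = load(0)      # no optimization: docstring and assert are kept
optimized = load(2)  # like -OO: both are stripped

print(plain.__doc__)      # the docstring text
print(optimized.__doc__)  # None
print(optimized(-1))      # -1, because the assert was compiled out
# plain(-1) would raise AssertionError
```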

Besides storing bytecode for the virtual machine, there is another scenario: the Python source is converted to C source code, which is then compiled into a .pyd file (on Windows) or a .so file (on Linux).
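Which file ending applies depends on the platform. As a quick check, the standard `importlib.machinery.EXTENSION_SUFFIXES` list shows the suffixes that the running interpreter accepts for compiled extension modules (typically entries ending in .so on Linux and macOS, or .pyd on Windows).

```python
import importlib.machinery
import sys

# File suffixes this interpreter will try when importing a compiled extension.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(sys.platform, suffixes)
```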

But in our time…

Have you ever wondered why nobody writes crackmes in Python? It seems you can take a script, compile it to bytecode and… just as easily convert it back to source code with some uncompyle-like tool. Debugging a Python script is not particularly difficult either. Still, Python crackmes do exist, even though they are few and far between.

They are roughly divided into the following groups:

  1. Python code bundled with an interpreter into a program written in another, compiled language, possibly converted to .pyd

  2. .pyc files created with a modified interpreter, which the standard interpreter cannot execute

  3. Crackmes in pure Python, the only kind I found examples of. They demonstrate that code can be obfuscated. “Unraveling” them comes down to reformatting (inserting line breaks), removing characters that are syntactically valid but carry no semantic load, and possibly some brief debugging with on-the-fly modifications.
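The reformatting part of “unraveling” can often be automated. A minimal sketch, assuming Python 3.9+ for `ast.unparse()`: parsing a cramped snippet and unparsing it puts one statement per line and drops comments and redundant parentheses. The snippet is a made-up example, not taken from a real crackme.

```python
import ast

# A deliberately cramped snippet: several statements per line, odd spacing,
# redundant parentheses and a comment that carries no meaning.
obfuscated = "a=(1);b  =2\nif(a<b):print( (a+b) );c=a*b # noise\n"

# Parsing and unparsing lays the code out again, one statement per line.
readable = ast.unparse(ast.parse(obfuscated))
print(readable)
```

This only restores layout; renamed identifiers or encoded strings still have to be worked out by hand or with a debugger.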

As a measure of how far one can complicate the task of “unraveling” code and answering the question “what did the author mean?”, Python crackmes show that bare Python code will not scare an experienced reverser (though it may well make them laugh).

Modus operandi

With this basic theory in mind, we can put forward several hypotheses about where protective mechanisms could be introduced:

  1. .py

When it comes to converting Python code into other Python code that does the same job but is less readable, or not readable at all, the word “obfuscation” comes to mind.

However, obfuscation, being a process that changes the source code, can also change how the algorithms behave, for example their speed, which is undesirable.

The only protection available at this stage may be a recommendation to write code of such poor quality that modifying it counts as a dangerous BDSM practice.
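As an illustration of what such a transformation might look like, here is a deliberately simple sketch (not a production obfuscator) that renames a function’s parameters and local variables to meaningless names using the standard `ast` module. It assumes no nested functions, no `global`/`nonlocal` statements, no callers passing arguments by keyword, and no clash with existing names such as `_0`; `RenameLocals` and `average` are hypothetical names chosen for the example, and `ast.unparse()` requires Python 3.9+.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Rename a function's parameters and local variables.

    Simplifying assumptions: no nested functions, no global/nonlocal
    statements, no keyword-argument calls, and no existing names like _0.
    """

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        # Names bound inside the function: positional parameters plus
        # every name that is assigned to (Store context).
        local_names = {arg.arg for arg in node.args.args}
        for sub in ast.walk(node):
            if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                local_names.add(sub.id)

        # Map each local name to a meaningless one: _0, _1, _2, ...
        mapping = {name: f"_{i:x}" for i, name in enumerate(sorted(local_names))}

        # Apply the mapping to parameter declarations and every use of the names.
        for sub in ast.walk(node):
            if isinstance(sub, ast.arg) and sub.arg in mapping:
                sub.arg = mapping[sub.arg]
            elif isinstance(sub, ast.Name) and sub.id in mapping:
                sub.id = mapping[sub.id]
        return node


SOURCE = """
def average(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)
"""

obfuscated = ast.unparse(RenameLocals().visit(ast.parse(SOURCE)))
print(obfuscated)

namespace = {}
exec(obfuscated, namespace)
print(namespace["average"]([1, 2, 3]))  # 2.0
```

The renamed function still returns the same result: behavior stays, readability goes.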

  2. .pyc / .pyo

Bytecode can be decompiled back into readable source code, and that is the main danger. So the goal in this case is to make parsing and decompiling the bytecode harder.

The structure of the .pyc file is quite simple:

  1. The first 2 bytes are the magic_number indicating the version of the interpreter.

  2. 2 fixed bytes 0x0D 0x0A

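  3. 4 bytes: the timestamp of the source file’s last modification

  4. Next comes the bytecode itself, stored as a marshalled code object

This is the layout used by older versions of Python (before 3.3). Since Python 3.3 the header also stores the size of the source file, and since Python 3.7 (PEP 552) it is 16 bytes long: the 4-byte magic number (which already ends in 0x0D 0x0A), a 4-byte flags field, and then either the modification time and source size or a hash of the source.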
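To make the layout concrete, here is a sketch of a reader for the modern header (Python 3.7+, PEP 552), followed by the standard-library steps that show why bytecode is so easy to inspect: `marshal.loads()` turns the rest of the file back into a code object, and `dis.dis()` prints its instructions. The helper `read_pyc` and the file `demo.py` are hypothetical names used only for this example; files written by other Python versions may use a different header layout.

```python
import dis
import importlib.util
import marshal
import pathlib
import py_compile
import struct
import tempfile


def read_pyc(path):
    """Parse a .pyc file in the 16-byte header layout used by CPython 3.7+."""
    data = pathlib.Path(path).read_bytes()
    magic = data[:4]                             # version-specific; ends in b"\r\n"
    flags = int.from_bytes(data[4:8], "little")  # bit 0 set: hash-based .pyc
    if flags & 0b1:
        header = {"source_hash": data[8:16].hex()}
    else:
        mtime, size = struct.unpack("<II", data[8:16])  # two little-endian uint32
        header = {"mtime": mtime, "source_size": size}
    code = marshal.loads(data[16:])              # the rest is a marshalled code object
    return magic, flags, header, code


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo.py")
    src.write_text("def add(a, b):\n    return a + b\n")
    pyc = py_compile.compile(str(src), cfile=str(pathlib.Path(tmp, "demo.pyc")))
    magic, flags, header, code = read_pyc(pyc)

print(magic == importlib.util.MAGIC_NUMBER)  # written by this interpreter
print(header)
dis.dis(code)  # the instructions, ready for a decompiler
```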

To make the bytecode impossible to decompile, you will, one way or another, have to modify the Python implementation in use, essentially creating a customized virtual machine that can turn the protected bytecode back into an executable form.

You can obfuscate the bytecode using the following techniques:

  1. Encryption. At compile time, additionally encrypt the entire bytecode so that it cannot be executed without a decryption step.

  2. Opcode obfuscation. The idea is to transform the opcodes themselves, “shifting” each opcode from its original position in the table so that it turns into a different opcode. As a result, such bytecode cannot be executed by an unmodified virtual machine.
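A pure-Python sketch of the encryption idea, for illustration only: the marshalled code object is scrambled with a repeating-key XOR, which stands in for a real cipher and is not secure, and a loader reverses it before calling `exec()`. In a real scheme the decryption would happen inside the modified interpreter; here the key `KEY` and the helpers `protect`/`run_protected` are hypothetical and sit in plain sight next to the loader, which is exactly the key-storage problem listed among the disadvantages.

```python
import marshal

# Hypothetical key. In a real scheme it would live inside the modified
# interpreter or be obtained through a key exchange; here it is in plain sight.
KEY = b"hypothetical-key"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: a placeholder for a real cipher, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def protect(source: str) -> bytes:
    """Compile the source, serialize the code object, and scramble the bytes."""
    code = compile(source, "<protected>", "exec")
    return xor_bytes(marshal.dumps(code), KEY)


def run_protected(blob: bytes, namespace: dict) -> None:
    """Reverse the scrambling and hand the code object to the VM."""
    code = marshal.loads(xor_bytes(blob, KEY))
    exec(code, namespace)


SOURCE = "def greet(name):\n    return 'Hello, ' + name\n"
blob = protect(SOURCE)
namespace = {}
run_protected(blob, namespace)
print(namespace["greet"]("world"))  # Hello, world
```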

The disadvantages of such methods are:

  1. The need to maintain the modifications when migrating to new versions of Python

  2. For encryption, the need to either store the decryption key on the machine that runs the code or implement a key exchange protocol

  3. For opcode obfuscation, the shift can be recovered by analysis if a simple algorithm like the Caesar cipher is used; otherwise, more complex algorithms are needed to hinder static analysis
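The weakness of a Caesar-style shift is easy to demonstrate. The sketch below assumes CPython 3.6+ “wordcode” (every instruction is an opcode byte followed by an argument byte) and a hypothetical known-plaintext scenario in which the attacker has both the original and the shifted bytecode of some function, for example from a standard-library module shipped with the application. A single pair of opcode bytes then reveals the shift. `shift_opcodes` and `recover_shift` are illustrative names, not part of any real tool.

```python
def shift_opcodes(bytecode: bytes, k: int) -> bytes:
    """Caesar-style opcode obfuscation: add k (mod 256) to every opcode byte.

    Assumes CPython 3.6+ wordcode, where opcodes sit at even offsets and
    each is followed by one argument byte.
    """
    out = bytearray(bytecode)
    for i in range(0, len(out), 2):
        out[i] = (out[i] + k) % 256
    return bytes(out)


def recover_shift(obfuscated: bytes, known_plain: bytes) -> set:
    """Known-plaintext shift analysis: the difference between matching opcode
    bytes is the key, so every pair yields the same value."""
    return {(o - p) % 256 for o, p in zip(obfuscated[::2], known_plain[::2])}


def sample(a, b):  # hypothetical function standing in for a known module
    return a * b + 1


original = sample.__code__.co_code
protected = shift_opcodes(original, 42)
print(recover_shift(protected, original))         # {42}
print(shift_opcodes(protected, -42) == original)  # True
```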

  3. .pyd / .so

In terms of protection, the most effective method is compilation into executable code (not bytecode for the Python virtual machine, but machine code), followed by obfuscation.

The .py file itself is built into a library using Cython, which is a rather non-trivial process. The result is code that has nothing in common with the original source and cannot be decompiled: it can only be disassembled and examined, which lets one draw conclusions about the algorithms and features of the code.
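A Cython build needs extra tooling (with Cython installed, something like `cythonize -i module.py` typically builds an extension in place), so as a stand-in this sketch uses the standard `math` module, whose functions CPython implements in C. It shows the difference from the attacker’s side: a pure-Python function carries bytecode that `dis` can disassemble, while a C-implemented function has no code object at all and has to be studied with a native disassembler instead. `py_hypot` is a hypothetical example function.

```python
import dis
import math


def py_hypot(x, y):
    """Hypothetical pure-Python function: compiled to CPython bytecode."""
    return math.sqrt(x * x + y * y)


# The pure-Python function carries a code object that dis can disassemble.
print(hasattr(py_hypot, "__code__"))  # True
dis.dis(py_hypot)

# math.sqrt is implemented in C: there is no code object and no bytecode.
print(hasattr(math.sqrt, "__code__"))  # False
try:
    dis.dis(math.sqrt)
    c_function_disassembled = True
except TypeError as exc:
    c_function_disassembled = False
    print("dis cannot help here:", exc)
```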

In conclusion, I would like to recommend Python courses from my friends at OTUS. Learn more about the courses at the links below:
