how to decrypt the source code

Hi all!
Today we have a new author: Nikita Sinkevich, head of the analysis and response group at the Angara Security Engineering Center. So let’s begin!

Sometimes, during the investigation of an information security incident, you need to find out whether a particular program does anything malicious to the system. If you don’t have the source code for that program, you have to resort to reverse engineering. I will show what reverse engineering looks like using task “RE-101” from CyberDefenders as an example.

CyberDefenders is a platform for developing blue team skills in detecting and countering computer threats. This article will analyze the last, sixth subtask from task “RE-101”. In my opinion, it reveals the reverse engineering process better than the others: we have to restore a custom encryption algorithm from an executable file, write a decryptor script and decrypt the flag.

Tools used:

1. Detect it Easy

2. IDA

3. Python3

4. Notepad++

5. CyberChef

Exercise

The description hints that someone has implemented their own cryptographic algorithm in the proposed malware201 file.

Let’s launch DiE and see information about the file from the task:

We see that this is a 64-bit executable file for Linux; DiE did not find any protector or packer – this is good news.
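As a cross-check of what DiE reports, the same basic facts can be read straight from the ELF header. The sketch below is only an illustration: the helper name describe_elf and the file path in the usage comment are my own assumptions, and it decodes just three header fields (class, byte order, machine type).

```python
import struct

def describe_elf(header: bytes) -> dict:
    """Decode a few fields from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS, e_ident[5] is EI_DATA.
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    ei_data = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown")
    byte_order = ">" if header[5] == 2 else "<"
    # e_machine is a 16-bit field at offset 18; 0x3e means x86-64.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return {"class": ei_class, "data": ei_data, "machine": hex(e_machine)}

# Hypothetical usage, assuming the task file is saved as ./malware201:
# with open("malware201", "rb") as f:
#     print(describe_elf(f.read(20)))
```

This does not replace DiE: it says nothing about packers or protectors, it only confirms the file type.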

Now let’s run the program:

The program does not ask for any input. It simply prints two lines: the encrypted flag and a hint, which is most likely a line from the source code of this program.

Let’s load the file into the 64-bit version of IDA. We are looking at the main() function; let’s break it into blocks as follows:

Let’s analyze each block separately.

Block 1

This block defines the memory offsets of the local variables; the code that follows uses them to decide where data is stored while the program runs.

Block 2 Function prologue

This is the prologue, which sets up the function’s environment. The first instruction saves rbp (also called the frame pointer) on the stack so that it can be restored when the function returns. The next instruction copies the value of rsp into rbp; as a result, both rsp and rbp point to the top of the stack.

From now on rbp holds a fixed address, and the program uses it to refer to function arguments and local variables. sub rsp, 40h decreases the value of rsp, thereby allocating 64 bytes of space for local variables.

Block 3 Line output

The instruction “mov rdi, offset aTheEncryptedFl” places a pointer to the string <The encrypted flag is: \"> in the rdi register, which is used to pass the first argument to a function. The next instruction works similarly: the address of the label unk_40082B is placed in rax, after which the value of rax is stored in memory at rbp+var_8. Then “mov al, 0” writes 0 to al so that the following call to printf works correctly: for variadic functions, the System V AMD64 calling convention expects al to hold the number of vector registers used for arguments. The last instruction in this block calls printf, passing it the address of the string aTheEncryptedFl in rdi.

This block can be described by a single line in a higher-level programming language. C++ would suit well here, but I am not skilled enough in it, so I will write C-like pseudocode, which will later be converted, as if by magic, into a normal C++ program with the help of special tools.
So, the pseudocode: printf("The encrypted flag is: \"");

Block 4 Calling a function sub_4005B0

The first two instructions place the value 20h (32 in decimal) in the registers ecx and esi. The rsi register is used to pass the second argument to a function. Then the value stored in memory at rbp+var_8 is placed in rdi. Recall that this is the address of the label unk_40082B, stored there by an instruction in the previous block. The penultimate instruction saves the value of eax in memory at rbp+var_24; this is necessary because the return value of the subsequent call to sub_4005B0 will overwrite eax, and the previous data would be lost.

Let’s write this block as another line of pseudocode: sub_4005B0(&unk_40082B, 32);
Now let’s see what is stored at unk_40082B.

We see a null-terminated array of 32 bytes, the same one that is printed when the program starts; this is the flag that needs to be decrypted. Let’s name this array flag.

To convert this array to characters, copy the first line of the program’s output into Notepad++ and remove the “\x” sequences.

You will get a line like this.

Now, using the resource gchq.github.io/CyberChef/ and the “From Hex” operation, we convert it into characters.

The resulting character string will be needed to restore the source code of the program.
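The same conversion can be done without Notepad++ and CyberChef. The following is a minimal sketch, not part of the original workflow: the helper name escaped_to_bytes is my own, and the input is the escaped byte string printed by the program.

```python
def escaped_to_bytes(escaped: str) -> bytes:
    r"""Turn text like '\x6d\x78' into the raw bytes b'mx'."""
    # Dropping every "\x" leaves a plain hex string, which bytes.fromhex can parse.
    return bytes.fromhex(escaped.replace("\\x", ""))

# First line of the program output, split in two for readability.
program_output = (
    r"\x6d\x78\x61\x6c\xdd\x7e\x65\x7e\x47\x6a\x4f\xcc\xf7\xca\x73\x68"
    r"\x55\x42\x53\xdc\xd7\xd4\x6b\xec\xdb\xd2\xe1\x1c\x6d\xde\xd1\xc2"
)

flag_bytes = escaped_to_bytes(program_output)
print(len(flag_bytes))   # number of bytes in the encrypted flag
print(flag_bytes)        # bytes repr; non-ASCII bytes stay escaped
```

Working with a bytes object also sidesteps encoding problems: one of the 32 bytes (0x1c) is a control character and gets lost easily when the text is copied around.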

At this stage we can assume that the function sub_4005B0 is needed to output a string in a specific format. Let’s name it printing: select the function sub_4005B0, press N and enter the name.

Pseudocode: printing(flag, 32);

Block 5 Line output

Just like block 3, this block outputs a string: <\"\n\n>
Pseudocode: printf("\"\n\n");

Block 6 Calling a function strlen

The first two instructions write the addresses of the strings <encrypt(\"%s\", %ld) == \"> and <my message> to the registers rdi and rsi respectively.

The next three instructions store the address of the string <my message> in memory at rbp+s and the address of the string <encrypt(\"%s\", %ld) == \"> in memory at rbp+format. Then the address of the string <my message> is written to rdi, the value of eax is stored in memory at rbp+var_34, and strlen is called; it takes the string in rdi as input and returns the result in rax. The final instruction in this block stores the value of rax in memory at rbp+var_18.

Pseudocode: var_18 = strlen("my message");
Let’s rename var_18 to len_message.

Block 7 Output a string in a specific format

The first three instructions place the address of the string <my message>, the length of that string, and the address of the string <encrypt(\"%s\", %ld) == \"> in the registers rsi, rdx and rdi respectively, after which printf is called. As a reminder, rdi is used to pass the first argument to a function, rsi the second, and rdx the third.

Pseudocode: printf("encrypt(\"%s\", %ld) == \"", "my message", len_message);
This code outputs the following portion of the line:

Block 8 Calling a function sub_400620

The first two instructions load the address of the string <my message> and its length into rdi and rsi respectively, after which sub_400620 is called and its result is saved in memory at rbp+ptr. Let’s name this function encrypt, the same as in the program output.

Pseudocode: ptr = encrypt ("my message", len_message);

Block 9 Calling a function printing (sub_4005B0)

In this block the function sub_4005B0 is called again, but with different arguments.

The first argument is the result of the previous function, stored at rbp+ptr. The second argument is the length of the string <my message>.
Pseudocode: printing (ptr, len_message);

Block 10 Line output

Just like blocks 3 and 5, this block outputs a string: <\"\n>
Pseudocode: printf("\"\n");

Block 11 Freeing allocated memory

Here the result of the encrypt function (most likely a pointer) is placed in rdi and passed as an argument to the function free.
Pseudocode: free(ptr);

Block 12 Epilogue of the function

The epilogue does the opposite of the prologue. xor eax, eax sets eax to 0.

This is the return value (return 0). add rsp, 40h returns rsp to the position it had before the local variables were allocated, which clears the stack frame.

pop rbp restores the old value of rbp from the stack.

Pseudocode: return 0;

Let’s combine all previously written lines into a single program:
int main() {
int len_message;

int *ptr; // most likely this is a pointer, let the data type be int

char flag[32] = "mxalÝ~e~GjOÌ÷ÊshUBSÜ×ÔkìÛÒámÞÑÂ"; // the byte 0x1c between á and m is unprintable
printf("The encrypted flag is: \"");
printing (flag, 32);
printf("\"\n\n");
len_message = strlen("my message");
printf("encrypt(\"%s\", %ld) == \"", "my message", len_message);
ptr = encrypt ("my message", len_message);
printing (ptr, len_message);
printf("\"\n");
free(ptr);
return 0;
}

From the program listing you can see that main calls two as yet unknown functions: printing and encrypt.

Analysis of the printing function

IDA split the code into blocks because the instructions in this function do not execute linearly. If you look closely, you can see a loop. I renamed the variables to make the code easier to navigate.

Block 1. Prolog, memory allocation for variables, storing arguments

In the first block, memory is allocated for the loop variable and for the variables that hold the arguments passed to this function. The first argument is passed in the rdi register: it is a pointer to the string message. The second argument is passed in the rsi register: it is a number equal to the length of the string.

Block 2. Precondition

The second block represents the loop precondition: here it compares (cmp) the loop variable i and the variable len_message, which contains the length of the passed string.

The JNB instruction (Jump if Not Below) jumps if, in the preceding cmp, the first operand was NOT below the second (that is, greater than or equal to it, with both compared as unsigned values). Thus, if i is greater than or equal to len_message, the condition is satisfied and the loop exits (follow the green arrow). Otherwise the jump is not taken and execution continues in the loop body.

Block 3. Loop Body
Let’s break it down into two parts as follows:

The first part calls the function _printf with the arguments listed below.

1. rdi: pointer to the string <\\x%02x>;

2. esi: the value of the i-th byte of the string message. It is obtained by:

2.1. moving the pointer to the string message into rax: mov rax, [rbp+message]

2.2. moving the value of the variable i into rcx: mov rcx, [rbp+i]

2.3. adding the value of the loop variable to the pointer address: rax+rcx

2.4. moving into edx the value of the one byte located at address rax+rcx: movsx edx, byte ptr [rax+rcx]

2.5. moving the value of edx into esi

Thus, this part outputs the i-th byte of the message string in the format \x00.
The second part of this block increments the variable i by 1 and moves to the previous block with the condition, starting the next iteration of the loop.

Block 4. Epilogue of the function

The fourth block represents the function epilogue and return from it.
Thus, we understand that this function is designed to sequentially output the bytes of the passed string in a specific format.

Pseudocode:

void printing (message, len_message){
	for (int i = 0; i < len_message; i++){
		printf("\\x%02x", *(message + i));
	}
}
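For readers who find Python easier to follow, here is a rough equivalent of the recovered function. It is only a sketch under my own assumptions: it returns the formatted text instead of printing it, and it treats every byte as an unsigned value.

```python
def printing(message: bytes, len_message: int) -> str:
    # Format the first len_message bytes the way the "\\x%02x" format string does.
    out = []
    i = 0
    while i < len_message:            # the cmp/jnb check: stop once i >= len_message
        out.append("\\x%02x" % message[i])
        i += 1                        # second part of the loop body
    return "".join(out)

print(printing(b"mxal", 4))
```

The while loop deliberately mirrors the block layout in IDA: condition first, then the body, then the increment and the jump back to the condition.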

Analysis of the encrypt function

This function is similar in structure to the previous one: it also contains a loop. Again, I renamed the variables to make the code easier to navigate.

Block 1

In this block calloc is called with the arguments rdi = len_message and rsi = 1. This allocates space for an array of len_message elements, each 1 byte in size and initialized to zero. A pointer to this array is returned in rax and then stored in the variable encrypted_message. The last instruction in this block initializes the loop variable i = 0.

Block 2. Precondition

This block, just like the corresponding block of the previous function, jumps out of the loop if the first operand is greater than or equal to the second. That is, if i is less than len_message, the loop body is executed (follow the red arrow).

Block 3. Loop Body

And now we come to the most interesting part: this block implements an encryption algorithm.
Having understood the logic of this block, we can write a decoder program.

Let’s look at this block in more detail:

1. 0FFh is written to ecx;

2. The value of the i-th element of the message array is placed in eax;

3. This value is shifted left by 1 bit (shl instruction);

4. A bitwise OR with 1 is applied to the same value;

5. The result from eax is stored in the variable part;

6. The loop variable i is placed in rax and divided by rcx = 0FFh (assigned in step 1); the quotient is written to rax and the remainder to rdx;

7. A bitwise OR with 0A0h is applied to rdx;

8. The variable part is XORed with rdx, and the result is written to rcx;

9. The result of the previous step is written to the i-th byte of the encrypted_message array;

10. The loop variable is incremented by one;

11. Jump to Block 2.
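The steps above can be sketched in Python as follows. This is my own illustration of the loop, not code taken from the binary; the names part and key are mine, and the final mask to one byte is an assumption that models the store into a one-byte array element.

```python
def encrypt(message: bytes) -> bytes:
    encrypted_message = bytearray(len(message))     # like calloc(len_message, 1)
    for i in range(len(message)):
        part = (message[i] << 1) | 1                # steps 2-5: shift left, set lowest bit
        key = (i % 0xFF) | 0xA0                     # steps 6-7: remainder of i / 0xFF, OR 0xA0
        encrypted_message[i] = (part ^ key) & 0xFF  # steps 8-9: XOR, keep one byte
    return bytes(encrypted_message)

print(encrypt(b"flag").hex())
```

Encrypting the four characters "flag" gives 6d78616c, which matches the first four bytes of the encrypted flag printed by the program.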

Block 4. Epilogue of the function

In this block, a pointer to the encrypted_message array is placed in rax, which allows this value to be returned when the function exits; it is followed by the standard instructions that clear the stack frame and restore the old value of rbp from the stack.

Pseudocode:

char *encrypt(message, len_message){
	char *encrypted_message = calloc(len_message, 1);
	for (int i = 0; i < len_message; i++){
		encrypted_message[i] = ( ( ( i % 0xFF ) | 0xA0) ^ ( ( message[i] << 1) | 1 ) );
	}
	return encrypted_message;
}

Writing a decoder script

Decryption will succeed if, for any i, we can recover the value of message[i] from block 3 of the encrypt function. To do this, we need to understand what information we have to begin with. So, what do we know:

1. The loop variable i. Initially it is zero, and it increases by 1 with each iteration of the loop;

2. Each element of the encrypted message, encrypted_message[i], and the length of this message, len_message, since we have the message itself.

Now look at this line, keeping in mind which variables are known and that message[i] is the one we want:

encrypted_message[i] = ( ( ( i % 0xFF ) | 0xA0) ^ ( ( message[i] << 1) | 1 ) )

We get a kind of equation with one unknown, X = ( message[i] << 1 ) | 1, where a = encrypted_message[i] and b = ( i % 0xFF ) | 0xA0:

a = b ^ X. Thanks to the XOR property A XOR (A XOR B) = B, it can be rewritten as:

X = a ^ b, which corresponds to ( message[i] << 1 ) | 1 = encrypted_message[i] ^ ( ( i % 0xFF ) | 0xA0 )
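The XOR property used here is easy to confirm by brute force. The short check below is only an illustration: it tries every pair of byte values, then solves the equation for the first byte of the encrypted flag (i = 0, a = 0x6d).

```python
# Exhaustive check over all byte values: XOR with the same value undoes itself.
for b in range(256):
    for x in range(256):
        a = b ^ x            # "encryption": a = b ^ X
        assert a ^ b == x    # "decryption": X = a ^ b

# One concrete position: i = 0, first encrypted byte 0x6d.
i, a = 0, 0x6D
b = (i % 0xFF) | 0xA0
x = a ^ b
print(hex(b), hex(x))
```

For this position b is 0xa0 and X is 0xcd, so the remaining work is to get message[0] back out of 0xcd.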

It remains to deal with ( message[i] << 1 ) | 1. Here two operations are applied to message[i]:

1. Shift left by one bit

2. Set the least significant bit to 1

Let’s take the character J as an example. In the ASCII table it has the code 4Ah, which corresponds to the binary value 1001010.
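The arithmetic for J can be traced in a few lines of Python. This is just a worked example with variable names of my own choosing; it prints each value as 8 binary digits next to its hex form.

```python
original = ord("J")              # code of the character J
shifted = (original << 1) | 1    # what encrypt does before the XOR
restored = shifted >> 1          # dropping the lowest bit again

for name, value in [("original", original), ("shifted", shifted), ("restored", restored)]:
    print(f"{name:9}{value:08b}  0x{value:02X}")
```

The shifted value is 10010101 (0x95), and dropping its lowest bit gives back 01001010 (0x4A), the code of J.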

It turns out that to get the original value back, it is enough to shift right by one bit or to divide by 2 (10 in binary), discarding the remainder. Both actions have the same effect: they remove the least significant bit. It remains to implement this logic in Python. The code is below:

Python3:
# Encrypted flag: the 32 bytes printed by the program
encrypted_flag = b"\x6d\x78\x61\x6c\xdd\x7e\x65\x7e\x47\x6a\x4f\xcc\xf7\xca\x73\x68\x55\x42\x53\xdc\xd7\xd4\x6b\xec\xdb\xd2\xe1\x1c\x6d\xde\xd1\xc2"

# Create a variable for the decrypted flag
flag = ""

# Main loop: indexing a bytes object yields integers in the range 0..255
for i in range(len(encrypted_flag)):

    # Perform XOR to select the desired part
    part = ((i % 0xFF) | 0xA0) ^ encrypted_flag[i]

    # Remove last bit, convert number to character
    flag += chr(part // 2)

# Output decrypted flag
print(flag)

After executing this script, the following flag will be displayed:
flag<malwar3-3ncryp710n-15-Sh17>
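One more sanity check is a round trip: encrypt a message with a reimplementation of the recovered algorithm and feed the result to the decryption logic. The sketch below assumes my own compact versions of both functions; they are illustrations, not code from the binary.

```python
def encrypt(message: bytes) -> bytes:
    # Recovered algorithm: ((i % 0xFF) | 0xA0) ^ ((byte << 1) | 1), truncated to one byte.
    return bytes((((i % 0xFF) | 0xA0) ^ ((c << 1) | 1)) & 0xFF
                 for i, c in enumerate(message))

def decrypt(encrypted: bytes) -> bytes:
    # Inverse: XOR with the same key, then drop the lowest bit.
    return bytes((((i % 0xFF) | 0xA0) ^ c) >> 1 for i, c in enumerate(encrypted))

message = b"my message"
assert decrypt(encrypt(message)) == message

# The left shift discards the top bit, so bytes >= 0x80 do not survive the round trip.
print(decrypt(encrypt(b"\xe9")))
```

The last line prints b'i' (0x69) instead of 0xe9: the scheme is only reversible for 7-bit ASCII input, which is enough for this flag.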

Restoring the program’s source code:

Still remember the pseudocode? It can easily be rewritten in C++ with the help of ChatGPT. I used the following query:

#include <cstdio>
#include <string>
using namespace std;
void printing (message, len_message){
  for (int i = 0; i< len_message; i++){
    printf("\\x%02", *(message + i));
  }
}
int encrypt(message, len_message){
  char *encrypted_message;
  for (int i = 0; i < len_message; i++){
    encrypted_message[i] = ( ( ( i % 0xFF ) | 0xA0) ^ ( (2 * message[i]) | 1 ) )
  }
  return encrypted_message;
}
int main() {
int len_message;
int *ptr;
string flag[32] = "mxalÝ~e~GjOÌ÷ÊshUBSÜ×ÔkìÛÒá mÞÑÂ";
printf("The encrypted flag is: \"");
printing (flag, 32);
printf("\"\n\n");
len_message = strlen("my message");
printf("encrypt(\"%s\", %ld) == \"", "my message", len_message);
ptr = encrypt ("my message", len_message);
printing (ptr, len_message);
printf("\"\n");
free(ptr);
return 0;
}
Convert this into normal code

After ChatGPT’s answer and minor corrections, this is the code I got:

C++:

#include <cstdio>
#include <cstring>
using namespace std;
void printing(const char* message, int len_message) {
    for (int i = 0; i < len_message; i++) {
        printf("\\x%02x", static_cast<unsigned char>(message[i]));
    }
}
char* encrypt(const char* message, int len_message) {
    char* encrypted_message = new char[len_message];
    for (int i = 0; i < len_message; i++) {
        encrypted_message[i] = (((i % 0xFF) | 0xA0) ^ ((message[i] << 1) | 1));
    }
    return encrypted_message;
}
int main() {
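    // The 32 encrypted bytes are written as hex escapes so they do not depend on the source file encoding
    const char flag[] = "\x6d\x78\x61\x6c\xdd\x7e\x65\x7e\x47\x6a\x4f\xcc\xf7\xca\x73\x68\x55\x42\x53\xdc\xd7\xd4\x6b\xec\xdb\xd2\xe1\x1c\x6d\xde\xd1\xc2";
    printf("The encrypted flag is: \"");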
    printing(flag, 32);
    printf("\"\n\n");
    const char* message = "my message";
    int len_message = strlen(message);
    printf("encrypt(\"%s\", %d) == \"", message, len_message);
    char* encrypted_message = encrypt(message, len_message);
    printing(encrypted_message, len_message);
    printf("\"\n");
    delete[] encrypted_message;
    return 0;
}

It compiled and launched normally. The output of the resulting program is identical to the output of the original program, so we can assume that we have successfully restored the source code.

Conclusion:

We successfully reverse engineered the program, restored its source code, and based on the data obtained we were able to write a decryptor script.
I would also like to note that artificial intelligence can help with source code restoration tasks, which will help the reverser save a little (or maybe a lot) of precious time.

Thank you for your interest in this article!
If you have any comments or questions, I will be glad to answer in the comments. See you again!
