Analyzing a Virtual Machine Using VMProtect as an Example. Part 2
In the first part of the article, we looked at the overall structure of the virtual machine pipeline and briefly touched on possible approaches to analyzing the virtual machine.
Following the sequence of steps outlined at the end of the first part, I will describe what I tried at each stage, what difficulties arose, which of them I decided to work around, and what came of it in the end.
Let me remind you that this article does not provide a guaranteed method for removing virtualization. I just want to share my experience of analyzing a VM, which gives a rough understanding of how such machines work and may be useful when analyzing similar implementations.
Application of Triton
Using a symbolic representation in Miasm to deobfuscate the VM instruction handlers (as described in the ESET article) did not help me: various jumps got in the way, and I would have had to record a trace of each handler to pass it on for analysis. So I decided to try removing the obfuscation with Triton.
Let me remind you that at this stage of the analysis I wanted to find the fastest and most accessible way to analyze the obfuscated virtual instruction handlers, in order to describe (disassemble) the VM bytecode and thus understand the functionality embedded in the virtualized section of code.
I started looking for the best way to approach Triton and came across my favorite examples directory, specifically the dead_store_elimination example. The obfuscated fragment used in this example is even taken from code protected by VMProtect, which is very cool.
Rather than reinventing the wheel, I decided to use this example, which essentially boils down to calling the simplify method of TritonContext.
This raises the same problem as with Miasm: I need to somehow get a full listing of the VM instruction handler. The most accessible way seemed to be a trace recorded while the handler code executes. At that time I already had a trace recorded for Tenet (a trace analysis tool, which I will describe in the next stage). Unfortunately, this trace does not contain the disassembled instructions inside the handler, but it does contain the eip values I needed. In text form, the Tenet trace looks like this:
So we have the eip values, that is, the addresses of every instruction executed while the analyzed program ran. To get the trace of a specific VM handler, we read the eip values starting from the point where eip equals the handler's address. We found the array of handler addresses at the beginning of the first part of the article. In the sample I analyzed, there is a handler at address 0x50260c. As the last instruction of the handler's trace, we take the instruction preceding MAIN_LOOP (mentioned in the first part; in my case MAIN_LOOP is located at 0x50281a). Thus, we extract the following fragment from the Tenet trace:
Trace fragment
esp=0x2f1f51c,eip=0x50260c,mr=0x2f1f4f0:0c265000 // start of the trace of handler 0x50260c
edx=0x502621,eax=0x2d,eip=0x50260f
eip=0x502610
eax=0xf4,eip=0x502612,mr=0x4ccbdf:f4
edx=0x502608,eip=0x502615
eax=0xc2,eip=0x502617
esp=0x2f1f518,eip=0x502618,mw=0x2f1f518:82020000
esp=0x2f1f514,eip=0x50261d,mw=0x2f1f514:a2a1d06c
eip=0x503a1b
esi=0x4ccbe0,eip=0x503a1c
edx=0x5000c2,eip=0x503a20
edx=0x500000,eip=0x503a23
eax=0xc3,eip=0x503a25
eip=0x503a26
eax=0x3d,eip=0x503a28
edx=0x5000ff,eip=0x503a2a
esp=0x2f1f510,eip=0x503a2e,mr=0x2f1f518:82020000,mw=0x2f1f510:82020000
esp=0x2f1f4f0,eip=0x503a2f,mw=0x2f1f4f0:1cf5f102e0cb4c00e0f5f10210f5f10236fe7f08ff0050002ac44c003d000000
edx=0x5000fe,eip=0x503a31
eax=0x3c,eip=0x503a33
edx=0x50c4fe,eip=0x503a35
esp=0x2f1f4ec,eip=0x5030c5,mw=0x2f1f4ec:3a3a5000
edx=0x5031fe,eip=0x5030c7
edx=0x500001,eip=0x5030cb
ebx=0x87ffe0a,eip=0x5030cd
eip=0x5030cf
edx=0x500002,eip=0x5030d0
edx=0x500200,eip=0x5030d4
edx=0xf7ccccd8,eip=0x5030d7,mr=0x2f1f5e0:d8ccccf7
esp=0x2f1f4e8,eip=0x5030da,mr=0x2f1f4ec:3a3a5000,mw=0x2f1f4e8:3a3a5000
esp=0x2f1f4e4,eip=0x5034f9,mw=0x2f1f4e4:df305000
eip=0x5034fa
eip=0x5034fe
ebp=0x2f1f5e4,eip=0x503501
esp=0x2f1f4e0,eip=0x502759,mw=0x2f1f4e0:06355000
esp=0x2f1f4dc,eip=0x50275a,mw=0x2f1f4dc:06020000
eip=0x502760,mw=0x2f1f4dc:d610
eip=0x502763,mw=0x2f1f558:d8ccccf7
esp=0x2f1f4d8,eip=0x502764,mw=0x2f1f4d8:d8ccccf7
eip=0x502767,mw=0x2f1f4d8:d8ccccf7
esp=0x2f1f4d4,eip=0x50276c,mw=0x2f1f4d4:3d582af5
eip=0x502770,mw=0x2f1f4d4:d4f4
esp=0x2f1f51c,eip=0x502774 // end of the trace of handler 0x50260c
eip=0x50281a // eip points to MAIN_LOOP
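The extraction step described above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the function name and the way the trace is passed in (a list of text lines) are hypothetical, and the sample lines are an abridged copy of the fragment above.

```python
import re

# Matches the "eip=0x..." field of a Tenet text trace line
EIP_RE = re.compile(r"\beip=0x([0-9a-fA-F]+)")

def extract_handler_eips(trace_lines, handler_ea, main_loop_ea):
    """Collect eip values from the first hit of handler_ea
    up to (but not including) the first return to main_loop_ea."""
    eips = []
    inside = False
    for line in trace_lines:
        match = EIP_RE.search(line)
        if not match:
            continue
        eip = int(match.group(1), 16)
        if not inside:
            if eip != handler_ea:
                continue          # still before the handler
            inside = True
        elif eip == main_loop_ea:
            break                 # back in MAIN_LOOP: the handler is done
        eips.append(eip)
    return eips

# Abridged excerpt of the trace fragment shown above
sample_trace = [
    "esp=0x2f1f51c,eip=0x50260c,mr=0x2f1f4f0:0c265000",
    "edx=0x502621,eax=0x2d,eip=0x50260f",
    "eip=0x502610",
    "esp=0x2f1f51c,eip=0x502774",
    "eip=0x50281a",
]

handler_eips = extract_handler_eips(sample_trace, 0x50260c, 0x50281a)
print([hex(ea) for ea in handler_eips])
```

The resulting list of addresses is exactly what the IDA script needs as input.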
To get the executed machine instructions as an array of byte sequences (the format that the Triton simplification example works with), we will use IDAPython:
Script
import ida_ua
import idc

def GetInsnLen(ea):
    # Decode the instruction at ea; decode_insn returns its length, or 0 on failure
    insn = ida_ua.insn_t()
    return ida_ua.decode_insn(insn, ea)

# eip values copied from the Tenet trace of handler 0x50260c
eips = '''0x50260c
0x50260f
0x502610
0x502612
0x502615
0x502617
0x502618
0x50261d
0x503a1b
0x503a1c
0x503a20
0x503a23
0x503a25
0x503a26
0x503a28
0x503a2a
0x503a2e
0x503a2f
0x503a31
0x503a33
0x503a35
0x5030c5
0x5030c7
0x5030cb
0x5030cd
0x5030cf
0x5030d0
0x5030d4
0x5030d7
0x5030da
0x5034f9
0x5034fa
0x5034fe
0x503501
0x502759
0x50275a
0x502760
0x502763
0x502764
0x502767
0x50276c
0x502770
0x502774'''
# Print the raw bytes of every traced instruction, in execution order
for ea in (int(line, 16) for line in eips.splitlines() if line.strip()):
    print(idc.get_bytes(ea, GetInsnLen(ea)))
As output, we get the bytes of each instruction from the VM handler trace, which we then pass to the Triton script. All jump instructions (JMP, JZ, JNE, etc.) should be removed, and call instructions (CALL) should be replaced with PUSH (as if the return address were pushed onto the stack).
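The filtering rule can be illustrated on a textual trace. This is a simplified sketch, not the script I actually ran: the tuple layout (address, size, mnemonic, operands) is hypothetical, the sample instructions and their sizes are my reading of the trace fragment above, and I assume that every mnemonic starting with "j" is a jump.

```python
def linearize(trace):
    """Drop jumps and replace each CALL with a PUSH of its return address.

    trace: iterable of (ea, size, mnemonic, operands) tuples (hypothetical layout).
    """
    result = []
    for ea, size, mnemonic, operands in trace:
        name = mnemonic.lower()
        if name.startswith("j"):
            continue                              # jmp, jz, jne, ...: control flow only
        if name == "call":
            result.append(f"push {ea + size:#x}")  # return address = next instruction
        else:
            result.append(f"{name} {operands}".strip())
    return result

# Illustrative instructions inferred from the trace fragment above
sample = [
    (0x502618, 5, "push", "0x6cd0a1a2"),
    (0x50261d, 5, "jmp", "0x503a1b"),
    (0x503a35, 5, "call", "0x5030c5"),
]

linear = linearize(sample)
print(linear)
```

The pushed value 0x503a3a agrees with the bytes 3a3a5000 written to the stack in the trace, which is a handy sanity check for this substitution.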
The result is far from a success
Clearly, the attempt to remove the junk code did not lead to the desired result: at the very least, stack instructions remain that look like junk. This is also indicated by the LEA instruction, which, as you can see, cleans up the stack by loading the saved pointer.
Although we still have not obtained a clean, readable version of the VM instruction handler, we now at least have its listing and can try to see what happens in it (we will do this on the slightly "simplified" version; we did not use the script from the Triton examples for nothing :)).
Let me remind you that in the first part of the article we learned that esi is VM_EIP, that is, a pointer to a byte in the bytecode. Focusing on the key points of the listing, one can notice that a value from [ESI] is loaded into AL (1), then some transformations are applied to it (2). After a series of transformations, EAX is used as an offset (3) at which the value read from [EBP] (4) is written, while EBP is also increased by 4 (5):
Listing
0x50260c: xadd al, dl
0x50260f: mov al, byte ptr [esi] // 1
0x502611: shl dl, 3
0x502614: xor al, bl // 2
0x502616: pushfd
0x502617: push 0x6cd0a1a2
0x50261c: pushal
0x50261d: movzx dx, al
0x502621: sete dl
0x502624: inc al
0x502626: neg al
0x502628: dec dl
0x50262a: push dword ptr [esp + 4]
0x50262e: pushal
0x50262f: dec al
0x502631: push dword ptr [esp + 4]
0x502635: push 0x6cd0a1a2
0x50263a: xor bl, al
0x50263c: mov edx, dword ptr [ebp] // 4
0x50263f: push dword ptr [esp]
0x502642: push dword ptr [esp + 4]
0x502646: push 0x6cd0a1a2
0x50264b: add ebp, 4 // 5
0x50264e: push dword ptr [esp + 4]
0x502652: push 0x6cd0a1a2
0x502657: pushfd
0x502658: mov word ptr [esp], 0x10d6
0x50265e: mov dword ptr [eax + edi], edx // 3
0x502661: push edx
0x502662: mov dword ptr [esp], edx
0x502665: push 0xf52a583d
0x50266a: mov word ptr [esp], sp
0x50266e: lea esp, [esp + 0x48]
Looking at the marked places, we can assume that this handler reads one 1-byte operand, which is used as a pointer on the virtual stack (VM_ESP), and the value from [EBP] is stored at that location. The instruction can be written roughly like this:
MOV VM_EIP:OP1, [EBP]
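To check this reading of the handler, its key steps can be replayed in plain Python. This is only a model under my own assumptions: memory is a dictionary, the function and register names are hypothetical, and the initial values (including bl = 0x36 and edi = 0x2f1f51c) are my reading of the trace fragment above.

```python
MASK8 = 0xFF

def decode_operand(byte, key):
    """Replay the byte transformations seen in the listing."""
    al = (byte ^ key) & MASK8      # xor al, bl
    al = (al + 1) & MASK8          # inc al
    al = (-al) & MASK8             # neg al
    al = (al - 1) & MASK8          # dec al
    return al, (key ^ al) & MASK8  # xor bl, al (rolling key update)

def run_handler(regs, mem):
    """Model of handler 0x50260c: store the value at [EBP] to [EDI + decoded operand]."""
    offset, regs["bl"] = decode_operand(mem[regs["esi"]], regs["bl"])
    regs["esi"] += 1               # VM_EIP moves past the operand (seen in the trace)
    value = mem[regs["ebp"]]       # mov edx, dword ptr [ebp]
    regs["ebp"] += 4               # add ebp, 4
    mem[regs["edi"] + offset] = value  # mov dword ptr [eax + edi], edx
    return offset

# Initial state as read from the trace fragment
regs = {"esi": 0x4ccbdf, "ebp": 0x2f1f5e0, "edi": 0x2f1f51c, "bl": 0x36}
mem = {0x4ccbdf: 0xf4, 0x2f1f5e0: 0xf7ccccd8}

offset = run_handler(regs, mem)
print(hex(offset), hex(regs["bl"]), hex(regs["edi"] + offset))
```

With these inputs the model reproduces the values visible in the trace: the decoded operand 0x3c, the updated low byte of ebx, and the write of d8ccccf7 to 0x2f1f558.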
So, the key components involved in the operation of a VM instruction handler have become roughly clear. It is now easier to see what exactly to look for in order to understand what the other handlers do, so I decided not to try to remove the obfuscation completely with Triton and instead to analyze each handler manually using Tenet.
Trace Analysis with and without Tenet
As mentioned above, Tenet is a tool for exploring an execution trace in IDA. Its repository describes in detail how to record a trace of the analyzed program. I used Intel Pin, for which the required module is already built.
After recording the trace in the required format and loading it into the Tenet plugin, we get a very convenient way to move along the trace in both directions while closely watching the changes that interest us:
Now that we have some understanding of how VM instruction handlers work, and can also analyze the trace conveniently, we can begin to analyze each handler to understand the bytecode being executed.
Beforehand, I also recorded all the instructions executed by the virtual machine, that is, I recorded a trace of the VM itself. This can be done using the x64dbg debugger or other means (for example, the same Pin). If you do this through x64dbg, do not forget to install the ScyllaHide plugin and select the VMProtect protection profile. The trace can be recorded by setting a breakpoint on the RET instruction that transfers control to the VM instruction handler (this approach was discussed in the first part) and logging the value on top of the stack (the address of the handler for the next VM bytecode instruction):
Looking back, I would also have logged the value of ESI, because it is VM_EIP and makes it easier to follow movement through the bytecode, although it was possible to understand what was happening anyway.
As a result, we get a VM trace whose addresses need to be byte-reversed (they are logged in little-endian order):
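The byte reversal is a one-liner in Python. A minimal sketch, assuming each log entry is the hex dump of a 4-byte stack slot; the function name is hypothetical, and the sample value is the one read by RET at the start of the trace fragment above.

```python
def stack_bytes_to_address(hex_bytes):
    """Interpret a hex dump of a stack slot as a little-endian address."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

handler = stack_bytes_to_address("0c265000")
print(hex(handler))
```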
We also need the number of unique addresses, i.e. the number of distinct handlers used in the trace, so that later we can mark which ones have been analyzed and what exactly happens in them. There turned out to be 31, which is not many.
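Counting the handlers and keeping notes on them can be done with a small helper. This is a sketch with made-up input: the short trace below only reuses handler addresses mentioned in this article and does not reflect the real order of the VM trace.

```python
from collections import Counter

def summarize_handlers(vm_trace):
    """Return how many times each handler address occurs in the VM trace."""
    return Counter(vm_trace)

# Hypothetical short VM trace built from handler addresses named in the article
vm_trace = [0x50260c, 0x502b1c, 0x50260c, 0x50288e, 0x502b1c, 0x502b1c]

usage = summarize_handlers(vm_trace)
notes = {ea: "" for ea in usage}          # to be filled in during analysis
notes[0x502b1c] = "memory write"

for ea, count in usage.most_common():
    print(f"{ea:#x}: {count} calls  {notes[ea] or 'not analyzed yet'}")
print("unique handlers:", len(usage))
```

Sorting by frequency is a convenient way to pick which handlers to analyze first.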
Gradually analyzing the handlers and understanding what happens in them, we begin to annotate the trace, bringing it to a form like this:
Of course, I would also like to see the specific operand values, but that would require implementing the decoding performed in each handler, so I decided to work with the trace in this form for now.
While analyzing the VM instruction handlers, I found the most interesting ones to be those that implement memory writes, calls and jumps, as well as the checksum calculation.
In my example, writing to memory is implemented by handler 0x502b1c. Here is a section of the trace where a write is performed 4 times in a row:
Looking inside the handler, you can see that the memory write itself is performed by MOV [EAX], EDX. The values of the two operands are taken from [EBP+0] and [EBP+4] respectively, which are filled in by the preceding VM instructions (visible in the trace).
Tenet lets us step through each execution of the write instruction, so we can see exactly what is being written. In the example below, after four calls the string "kernel32.dll" has been written:
Other strings with function names are written in the same way, for example CreateFileA or MapViewOfFile. Evidently they are decrypted somewhere earlier. So we already understand which WinAPI functions the malware will use and for what purpose.
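If the (address, value) pair of every write is logged, such strings can be reassembled automatically instead of being read in Tenet. A minimal sketch, assuming each write stores a 32-bit value; the base address 0x1000 and the function names are made up for illustration.

```python
import struct

def rebuild_memory(writes):
    """Apply logged (address, dword) writes to a sparse byte-level memory map."""
    mem = {}
    for addr, value in writes:
        for i, byte in enumerate(struct.pack("<I", value)):
            mem[addr + i] = byte
    return mem

def read_c_string(mem, addr):
    """Read bytes from addr until a zero byte or an unwritten address."""
    out = bytearray()
    while mem.get(addr, 0) != 0:
        out.append(mem[addr])
        addr += 1
    return out.decode("ascii", errors="replace")

# Four consecutive writes, as MOV [EAX], EDX would perform them (addresses hypothetical)
writes = [
    (0x1000, 0x6e72656b),  # "kern"
    (0x1004, 0x32336c65),  # "el32"
    (0x1008, 0x6c6c642e),  # ".dll"
    (0x100c, 0x00000000),  # terminator
]

text = read_c_string(rebuild_memory(writes), 0x1000)
print(text)
```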
The jump handler is recognizable simply by the fact that it writes the value from [EBP+0] into ESI; that value, as before, is prepared by earlier VM instructions. Jumps help us find loops, i.e. repeated operations such as checks, writes, decryption, etc.:
The checksum is calculated over the selected memory area in a loop inside a VM instruction handler using arithmetic operations (for example, XOR or SHL); in the sample I analyzed, this is the handler at address 0x50288e.
If you take a closer look at which WinAPI functions are called and what the integrity check is used for, it becomes clear that the malware maps its own image into memory and then compares the checksums of the image sections with the contents of its loaded sections to detect breakpoints or other modifications.
If the check succeeds, the unpacked part of the malware that implements the main functionality is written to the .text section. This can be seen by looking at when the VM write instructions were called. The screenshot shows a gap of approximately 28 thousand VM instructions between the first VM write instructions (which dealt with the function name strings) and the subsequent ones:
That is, during this gap the code with the main functionality was verified and unpacked, after which the unpacked code began to be written to the .text and .data sections (write instructions are now called every 4-5 steps and in much larger numbers).
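Such a gap can also be located without a screenshot by measuring the distance between consecutive calls of the write handler in the VM trace. A rough sketch: the trace below is synthetic (only the handler address 0x502b1c and the order of magnitude of the gap come from the analysis), and the function names are hypothetical.

```python
WRITE_HANDLER = 0x502b1c

def write_gaps(vm_trace, handler=WRITE_HANDLER):
    """Return (index, distance to the next call) for each call of the write handler."""
    positions = [i for i, ea in enumerate(vm_trace) if ea == handler]
    return [(a, b - a) for a, b in zip(positions, positions[1:])]

def largest_gap(vm_trace, handler=WRITE_HANDLER):
    """Return the (index, distance) pair with the longest pause between writes."""
    gaps = write_gaps(vm_trace, handler)
    return max(gaps, key=lambda gap: gap[1]) if gaps else None

# Synthetic trace: string writes, a long check/unpack phase, then dense writes
head = [0x50260c, 0x50260c, WRITE_HANDLER] * 4
middle = [0x50288e] * 28000
tail = ([0x50260c] * 4 + [WRITE_HANDLER]) * 3
vm_trace = head + middle + tail

print(largest_gap(vm_trace))
```

The index in the result points at the last write before the pause, which is a good place to start looking at what the VM does next.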
So, having studied what each (or most) of the VM instruction handlers do, we figured out what the virtualized code does: it maps its own image, checks its integrity, and unpacks the main section. If you take a dump after unpacking, you can see readable strings as a sign that unpacking has completed (in particular, the names of functions that indicate work with the kernel):
It is also noticeable that the disassembly and decompilation around the OEP and beyond have become much more readable:
Now we can continue analyzing the sample under study.
Conclusion
You'll laugh, but it turns out that in this case it was enough to simply set a breakpoint on execution of the code section and enable the VMProtect profile in ScyllaHide to get the unpacked version :). But the purpose of the analysis was to understand how the virtual machine works, in order to know how to approach samples that use VM protection in a more complex way (with virtualization of the main functionality). In addition, it is possible that some functions of the unpacked code remained virtualized.
Thus, in analyzing the VM we used a little of several tools, although with some effort the problem could most likely have been solved with just one of them.