I have wanted to write an article for a long time and was waiting for a relatively simple piece of malware to demonstrate on, and perhaps to learn something interesting from the comments.
Now that has happened. Using malware attributed to the Kimsuky group in 2021 as an example, I will show how IDAPython can simplify string decoding during static analysis of a sample (or, if not simplify it, at least make it more elegant :)).
The sample in question is available on Malware Bazaar, so everyone can explore this malware independently.
You can also learn more about the group itself on Malpedia.
So let's move on to the analysis. I will not describe the sample's full functionality, since that is not the purpose of this article; besides, most of it becomes clear once a few strings are decoded.
Load the sample into IDA Pro and make sure that the necessary signatures are applied (Shift+F5). This is a good habit that can sometimes save the analyst from analyzing many library functions. Standard WinAPI applications compiled with MS Visual Studio usually have the following signature sets:
or their 64-bit counterparts.
Since the analyzed sample is a library, let's look at its exports, where we see the function Run. Following the control flow graph and/or the decompiled view, we arrive at the function sub_10002BB0 (the image base is 0x10000000). In this function there are several interesting calls to a single function that receives an encoded string as an argument (the string does not look encrypted).
Examples of calls to this function:
What we can say about this function so far:
the first argument is passed through ECX, that is, it looks like __fastcall for x86 (the disassembler decided it was __thiscall, but the passed string does not look like a pointer to a class instance);
12 cross-references point to the function, that is, it is used several times while the sample runs, which is typical both for library functions and for handlers embedded in the logic of the sample;
immediately after returning from this function, a call through EAX is made, that is, the function returns a pointer to another function;
in many of the calls, something is first pushed onto the stack, and you can see interesting strings, constants (such as 0x80000001), and something very similar to WinAPI function prototypes.
Thus, we can tentatively conclude that this function most likely decodes the name of a WinAPI function, resolves it, and returns its address. In other words, it is an anti-detection technique from the Kimsuky developers: the addresses of functions typical for malware are obtained dynamically, and the strings with the names of these functions are hidden. This is not a very effective approach as of 2023, but in 2021 it was probably more relevant.
Let's rename this function for convenience (for example, to resolve_func) and continue the analysis. In resolve_func you can see that the string with the encoded function name, together with another encoded string (for example, "5WquWMKf.LMM"), is passed as an argument to the function sub_10003B40.
Assuming that the first string ("5WquWMKf.LMM") is the library name and the second string is the name of the desired function, this even slightly resembles GetProcAddress.
In the called function sub_10003B40 you can see that both strings are passed in turn to the function sub_10003C80, after which its result is passed to the WinAPI functions LoadLibraryA and GetProcAddress. It looks like this function performs the string decoding, so for convenience let's rename it to decode.
For a clearer presentation, I tried to collapse the uninteresting parts of the control flow graph so that the picture shows the decoded string being passed as an argument to LoadLibraryA. I hope you can see something:
In the decode function we see the following logic:
a string of 64 random, non-repeating characters is used (I think it can be called a substitution string);
the original encoded string is compared character by character, in a loop, with the characters of the substitution string; if a matching character is found, it is replaced;
the index of the replacement character in the substitution string is computed by subtracting 22 (decimal) from the index of the matched character and applying a bitwise AND with 0x3F to the result.
The decompiled representation of this function:
This is how the string is decoded. The description is probably not very clear, but the IDAPython script example should make the decoding easier to follow.
So, after such a long preface, we finally get to writing the script. The first thing to do is to express the decoding logic itself in Python.
The array byte_1002AC08 is used as the substitution string:
To make it look more like an array, convert it accordingly (Numpad+*); then select byte_1002AC08, press Shift+E, and choose the "string literal" representation to copy it into our script.
Now let's move on to the script that describes the decoding logic. I apologize in advance to everyone who notices shortcomings and/or errors in the code:
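To make the index arithmetic concrete, here is a minimal sketch in Python. It assumes the 64-character alphabet taken from byte_1002AC08 (the same string used in the script further on); the helper name replacement_for is hypothetical and exists only for illustration.

```python
# Substitution alphabet: the first 64 characters of the string stored in
# byte_1002AC08 (assumption: only indices 0..63 take part in the lookup).
ALPHABET = "zcgXlSWkj314CwaYLvyh0U_odZH8OReKiNIr-JM2G7QAxpnmEVbqP5TuB9Ds6fFt"


def replacement_for(ch):
    """Hypothetical helper: show how one encoded character is replaced."""
    i = ALPHABET.index(ch)   # position of the encoded character
    j = (i - 22) & 0x3F      # subtract 22, then wrap the result into 0..63
    return i, j, ALPHABET[j]


for ch in "5W":
    print(ch, replacement_for(ch))
# 5 (53, 31, 'K')
# W (6, 48, 'E')
```

The second line of output shows why the AND with 0x3F matters: for an index below 22 the subtraction goes negative (6 - 22 = -16), and the mask wraps it around to 48.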
# substitution string
subst = "zcgXlSWkj314CwaYLvyh0U_odZH8OReKiNIr-JM2G7QAxpnmEVbqP5TuB9Ds6fFt%"
coded = 'CP9STl-UP19poPvv'  # sample encoded string
decoded = ''
coded_counter = 0  # index of the current character in the encoded string
len_coded = len(coded)

# As in the decompiled representation, the construct consists of two
# while loops. In the sample itself the outer loop runs until the string
# ends with a null byte; in this implementation a counter of processed
# characters is used instead.
while len_coded != coded_counter:
    counter = 0
    skip = False  # flag to leave the inner loop and go to the next outer step
    while coded[coded_counter] != subst[counter]:
        counter += 1
        # stop once all 64 characters of the substitution string are checked
        if counter >= 0x40:
            coded_counter += 1
            skip = True
            break
    if skip:
        continue
    # the character was found in the substitution string: compute the
    # replacement index and append the character to the new string
    decoded += subst[(counter - 22) & 0x3f]
    coded_counter += 1
print(decoded)
This version of the decoder does not preserve characters that are missing from the substitution string: they are simply skipped. The meaning of the decoded string is still clear without them (the string "KERNEL32DLL" obviously names a library despite the missing dot).
I could not find the proper name for this decoding algorithm; if someone knows it, I would be interested to hear about it.
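If the dropped characters matter, one possible variant is to build a translation table and leave everything outside the alphabet untouched. This is only a sketch: whether the sample itself copies such characters unchanged is an assumption, and the name decode_keep_unknown is hypothetical.

```python
# The first 64 characters of the substitution string from byte_1002AC08.
ALPHABET = "zcgXlSWkj314CwaYLvyh0U_odZH8OReKiNIr-JM2G7QAxpnmEVbqP5TuB9Ds6fFt"

# Character at index i decodes to the character at index (i - 22) & 0x3F.
DECODE_TABLE = str.maketrans(
    ALPHABET,
    "".join(ALPHABET[(i - 22) & 0x3F] for i in range(len(ALPHABET))),
)


def decode_keep_unknown(coded):
    """Decode a string; characters outside the alphabet pass through as-is."""
    return coded.translate(DECODE_TABLE)


print(decode_keep_unknown("5WquWMKf.LMM"))  # KERNEL32.DLL
```

With str.translate, any character that has no entry in the table (here the dot) is copied to the output unchanged, so the library name keeps its extension separator.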
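Structurally, the scheme behaves like a monoalphabetic substitution: a fixed rotation by 22 positions (a Caesar-style shift) over a custom 64-character alphabet. One way to check this reading is to write the inverse operation and confirm that it reproduces an encoded string seen in the sample. The sketch below is an illustration, not code recovered from the malware; the name encode is hypothetical.

```python
# The first 64 characters of the substitution string from byte_1002AC08.
ALPHABET = "zcgXlSWkj314CwaYLvyh0U_odZH8OReKiNIr-JM2G7QAxpnmEVbqP5TuB9Ds6fFt"
SHIFT = 22  # the decoder subtracts 22, so the encoder must add 22


def encode(plain):
    """Inverse of the decoder: rotate forward by SHIFT inside the alphabet."""
    out = []
    for ch in plain:
        i = ALPHABET.find(ch)
        # characters outside the alphabet are left unchanged (assumption)
        out.append(ALPHABET[(i + SHIFT) & 0x3F] if i != -1 else ch)
    return "".join(out)


print(encode("TerminateProcess"))  # CP9STl-UP19poPvv
```

The result matches the encoded example string used in the decoding script, which supports the rotation interpretation.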
Writing the final script
So, we have rewritten the decoding logic. Now we need a handler for all calls to resolve_func that adds the decoded string as a comment next to each call, so that we can see what the malware calls at that stage of execution.
The logic of the script is as follows:
find all cross-references to the function resolve_func (using the XrefsTo method);
find the argument passed in each call, that is, the encoded string (walk up from the call using prev_head and inspect each instruction using print_insn_mnem, get_operand_type, and get_operand_value);
decode the string and attach it as a comment to the function call (the string itself can be obtained with get_strlit_contents, then decoded and added as a comment with set_cmt).
In my opinion, this is a good guide to IDAPython, although not everything that can be useful in malware analysis can be found there.
Thus, the final script will take the following form:
import idautils
import idaapi
import idc

# the string decoding prepared above
def decode(coded):
    subst = "zcgXlSWkj314CwaYLvyh0U_odZH8OReKiNIr-JM2G7QAxpnmEVbqP5TuB9Ds6fFt%"
    decoded = ''
    coded_counter = 0
    len_coded = len(coded)
    while len_coded != coded_counter:
        counter = 0
        skip = False
        while coded[coded_counter] != subst[counter]:
            counter += 1
            if counter >= 0x40:
                coded_counter += 1
                skip = True
                break
        if skip:
            continue
        decoded += subst[(counter - 22) & 0x3f]
        coded_counter += 1
    return decoded

# in my case resolve_func is located at this address;
# get the cross-references to it
xrefs = idautils.XrefsTo(0x10003CD0)
# collect the addresses the function is called from
funcs_list = []
for i in xrefs:
    funcs_list.append(i.frm)

for ea in funcs_list:
    # before each call we need to find the instruction that passes the
    # argument, to get the pointer to the encoded string from it;
    # prev_head is used to walk upwards from the call instruction
    instr = idc.prev_head(ea)
    while instr != idc.BADADDR:
        # print_insn_mnem returns the instruction mnemonic;
        # get_operand_type returns the operand type: the first operand
        # must be a register (type 1), the second must not be one.
        # I did not find how to properly tell in IDAPython which register
        # is used as the operand, so I simply check that ecx appears
        # in the disassembled representation
        if (idc.print_insn_mnem(instr) == "mov"
                and idc.get_operand_type(instr, 0) == 1
                and idc.get_operand_type(instr, 1) != 1
                and "ecx" in idc.generate_disasm_line(instr, 0)):
            # the second operand holds the pointer to the encoded string
            string_address = idc.get_operand_value(instr, 1)
            coded_string = idc.get_strlit_contents(string_address)
            if coded_string:
                # decode the bytes as UTF-8, pass them to our function and
                # add the decoded string as a comment
                idc.set_cmt(ea, decode(coded_string.decode("utf-8")), 1)
            break
        instr = idc.prev_head(instr)
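The check for "ecx" in the disassembly text can probably be replaced by comparing register numbers. As far as the IDAPython documentation describes it, idc.get_operand_value returns the register number when the operand is a register, and in IDA's x86 processor module the general registers are numbered eax=0, ecx=1, edx=2, ebx=3. Treat both points as assumptions to verify in your IDA version. The helper name loads_ecx and its api parameter (which lets the function run outside IDA with a stand-in object) are hypothetical.

```python
try:
    import idc  # only available inside IDA
except ImportError:
    idc = None  # lets this sketch be imported outside IDA

O_REG = 1    # operand type "register" (the value of idc.o_reg)
REG_ECX = 1  # assumed x86 register number shared by cx/ecx/rcx


def loads_ecx(ea, api=None):
    """Return True if the instruction at ea is `mov ecx, <non-register>`.

    `api` is a hypothetical hook: any object exposing the same three
    functions as the idc module; by default idc itself is used.
    """
    api = api if api is not None else idc
    return (
        api.print_insn_mnem(ea) == "mov"
        and api.get_operand_type(ea, 0) == O_REG
        and api.get_operand_value(ea, 0) == REG_ECX
        and api.get_operand_type(ea, 1) != O_REG
    )
```

Inside IDA, loads_ecx(instr) could stand in for the mnemonic, operand type, and disassembly text checks in the loop that walks back from each call.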
After the script is executed, each call to resolve_func is accompanied by a comment that shows which function is called next.
Examples of script results:
The first, second, and several subsequent times, the effort spent on writing a script may seem disproportionate to the result. But, as in any craft, working with IDAPython gets faster with practice, which in the long run simplifies and speeds up malware analysis.
I also tested my plugin on this sample. It helps to find implicit usage patterns of WinAPI functions, but its logic breaks down on a construction such as:
call resolve_func
call eax
although it still sometimes works quite well, especially when analyzing shellcode for Windows.
So we have looked at another example of using IDAPython during static analysis of malware. I hope the article was useful to those who are taking their first steps in malware analysis or are simply interested in the topic.