Google CTF 2024 Quals – auxin2

Rustam Guseinov

Chairman of the cooperative RAD KOP

Well, we are finishing the experimental series of materials on CTF with the last article based on the recent Google event. Even the largest companies do not disdain to organize such events, and this is not only a great opportunity for training and professional growth, but also a good way of networking, career development or onboarding into the desired company. The case when the pleasant is combined with the useful and forms a dialectical whole, removing any private contradictions. Let's start analyzing the case?

Links to solutions to CTF tournament problems

CTF website

Task materials

The auxin2, which was placed in the “bowls” by the organizers, turned out to be a curious task from the category of esoteric programming – in this case, a virtual system Varvaraworking on the machine uxn. During the competition I managed to solve it first (as part of MSLC), although it took most of the first day – the conditions were fairly straightforward, but understanding the peculiarities of an unfamiliar machine's instruction set and working around the limitations was more than interesting for a detailed description of the solution.

Terms and Conditions

In the task archive we are given an image for varvara and a simple service that passes it a line with a length limit of 112 bytes and gives its output to the console – in addition, execution cannot take more than 0.5 seconds. Scrolling through the description uxnwe find out that this is a stack machine with two cyclic stacks, which does not use registers as such, and communicates with virtual peripheral devices (including the console and file system we need, in which – in the file flag – the flag is located) through a special page of memory, inaccessible to standard instructions (special input-output instructions are used for this DEI And DEO). The instruction set is quite minimalistic – there is not even a decrement, but most have variants that work on two-byte words (suffix 2), an alternative stack used to return from functions (suffix r), and leaving used instructions in memory instead of removing them from the stack (suffix k) – this is determined by the flags in the top three bits of the opcode byte (details).

Analysis

The easiest way to disassemble an image is to assemble it in addition to the image itself. uxn (from what is specified in the description commit) utility image uxndis from repository uxn-utils and using it on the file:

$ uxncli uxndis.rom auxin2.rom

You don't have to look too deeply into the short output to notice the byte checks in the loop for equality of the lower nibbles (four bits) 0, 2, 3, 6, 7 or e:

...
0069:   06         DUP
006a:   80 0f      LIT 0f
006c:   1c         AND
006d:   06         DUP
006e:   80 00      LIT 00
0070:   08         EQU
0071:   20 00 34   JCI +52
0074:   06         DUP
0075:   80 02      LIT 02
0077:   08         EQU
0078:   20 00 2d   JCI +45
007b:   06         DUP
007c:   80 03      LIT 03
007e:   08         EQU
007f:   20 00 26   JCI +38
0082:   06         DUP
0083:   80 06      LIT 06
0085:   08         EQU
0086:   20 00 1f   JCI +31
0089:   06         DUP
008a:   80 07      LIT 07
008c:   08         EQU
008d:   20 00 18   JCI +24
0090:   06         DUP
0091:   80 0e      LIT 0e
0093:   08         EQU
0094:   20 00 11   JCI +17
0097:   02         POP
...

The guess that the bytes of the sent code are checked in this way turns out to be correct:

$ nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: f1
timeout!
$ nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: f2
b'No.\n'

This means that our task is reduced to writing code without opcodes ending with the above nibbles. Let's look at uxn opcode table:

As we can see, opcodes are not available to us. LDZ, JCI, JMI, JCI, LIT, POP, LDR, NIP, STR, DUP, DEI, OVR, DEO, JSR And EOR. Since according to the task conditions we need to read the flag from the file (and output it to the console), and access to devices requires use DEI/DEOit is clear that one way or another we will have to somehow hide these instructions, which means the code must be self-modifying – first we need to write a loader for the actual combat, second stage of the exploit.

The first stage is the decoding loader.

Having thought through possible methods of coding the second stage, for the duration of the competition I decided that the most suitable in this case might be a simple decomposition of the instructions into pairs of terms – fortunately, we have access to the opcode ADDand we can select valid values ​​for any instruction we need. Let's think about the most obvious problems first.

Review of limitations and initial state

The only jumps allowed are unconditional JMP and conditional JCN; since the instructions have been removed LIT (effectively pushing the subsequent literal onto the stack) and POPas well as copying DUP And OVRstack manipulation appears to be limited to the use of SWPswapping the first and second elements of the stack (and SWP2 for two-byte words), ROTwhich rotates the first three elements cyclically, and -k variants of other possible opcodes that allow the instruction bytes not to be removed from the stack after execution. Since LDZ And LDRloading from memory remains possible only with the help of LDAby absolute address; in addition, due to the absence of LIT it won't be so easy for us to put arbitrary values ​​on the stack.

Having generously supplied the instruction handler in src/uxn.c debug prints to be aware of the state of the stack and the executable code, we find out that the sent code gets into memory at the address 0x1d0 and starts to be executed from there (for testing we will send it to the input 01INC):

$ uxncli auxin2.rom 01
...
pc=1d0, i=37 T=2 N=c2 L=1 X=4 Y=2 Z=2
DEO2 1d29aad0, 2, c2, 1
pc=1d1, i=1 T=4 N=2 L=2 X=0 Y=0 Z=0
INC
pc=1d2, i=0 T=5 N=2 L=2 X=0 Y=0 Z=0
BRK

(i here shows the numerical value of the opcode, “registers” T/N/L/X/Y/Z actually represent the first six bytes of the stack, and the counter pc due to technical reasons it is lagging behind by a byte)

Location of the second stage

Since from the instructions that read memory, we only have access to the one that works with absolute values LDAwe must determine in advance the address at which the second stage will be located, and be able to easily put it on the stack to work with forked bytes.

Seeing the initial state of the stack above L=2 X=0for convenience, let's imagine that we can fit the bootloader into 0x2000x1d0that is, 48 ​​bytes – such a task looks feasible, and if we fail, we can adjust this offset later. With “registers” in such values, we can easily get the address we need for the future jump 0x200 – for example, opcode EQU (0x08) can compare 0x4 And 0x2removing them from the stack and putting the result of the comparison on it 0x0having formed the ones we need for LDA T=0 N=2:

$ uxncli auxin2.rom 08
...
pc=1d1, i=8 T=4 N=2 L=2 X=0 Y=0 Z=0
EQU
pc=1d2, i=0 T=0 N=2 L=0 X=0 Y=0 Z=0
BRK

Preparing the required variables

It is clear that to decode the second stage we will need a cycle for sequential data processing one way or another; the only conditional jump instruction available to us for this is JCNand to use it we need either an absolute address in two-byte mode JCN2or a later calculated relative value in one byte. Let's assume the second – we can place the desired byte, whatever it may be (remembering, however, that it must also pass the lower nibbles check), at the beginning of the second stage, load it and save it in a convenient (i.e. easily formed on the stack without using LIT) to a memory address for future use – it's clear that with our limitations, keeping it available on the stack without this will be very difficult.

For this we could simply use 0x0but, remembering that we will need one more variable, and a two-byte one at that – the incremented address of the result of adding two bytes of the second stage – we will take 0x2and we will reserve 0x0 for the address of the decoded byte.

How to get a guaranteed and fairly concise stack 0x0 And 0x2? For example, by placing two obviously unequal bytes on it using INCk INCkand then comparing them – EQU will give us 0x0A NEQ0x1which, of course, can be turned into 0x2 with the help of one INC.

Well, let's download it from 0x200 jump variable (while the default zero will be there) and save it at the address 0x2using the most obvious instruction for this – STZdesigned specifically for writing to the first 256 bytes of memory:

EQU
LDAk (k оставит адрес 0x200 на стеке)
INCk
INCk
NEQ
INC
STZ

Let's check the result:

$ uxncli auxin2.rom 08948181090111
...
pc=1d1, i=8 T=4 N=2 L=2 X=2 Y=2 Z=2
EQU
pc=1d2, i=94 T=0 N=2 L=2 X=2 Y=2 Z=2
LDA
LDA 0 = ram[200]
pc=1d3, i=81 T=0 N=0 L=2 X=2 Y=2 Z=2
INC
pc=1d4, i=81 T=1 N=0 L=0 X=2 Y=2 Z=2
INC
pc=1d5, i=9 T=2 N=1 L=0 X=0 Y=2 Z=2
NEQ
pc=1d6, i=1 T=1 N=0 L=0 X=2 Y=2 Z=2
INC
pc=1d7, i=11 T=2 N=0 L=0 X=2 Y=2 Z=2
STZ
STZ ram[2] = 0
pc=1d8, i=0 T=0 N=2 L=2 X=2 Y=2 Z=2
BRK

Everything looks correct, and the address of the second stage is on top of the stack, allowing us to immediately load it into 0x0 – the same, because we can easily write decoded bytes directly over the already read bytes of the second stage – with the help of

INCk
INCk
EQU
STZ2k (2 - сохраняем два байта, и k - оставляем их на стеке)

Moreover, since, unlike LDAk, STZk will leave the save address on top of the stack – 0x0we need to “unfold” it in such a way that it is replaced by the decoder address counter lying behind it – for this, the double will work ROT:

...
STZ2
STZ2 ram[0,1] = 2,200
pc=1dc, i=5 T=0 N=0 L=2 X=2 Y=2 Z=2
ROT
pc=1dd, i=5 T=2 N=0 L=0 X=2 Y=2 Z=2
ROT
pc=1de, i=0 T=0 N=2 L=0 X=2 Y=2 Z=2
BRK

Decoder cycle

The next steps are pretty obvious – we need to shift the decoder counter, load a byte onto the stack, and repeat this operation so that the loaded bytes end up on top of the stack, ready to be added using ADD. Looking at the state of the stack in debugging, we get the following code:

INC2
LDAk
ROT
ROT
INC2
LDAk
ROT
ROT
SWP2
ADD

It looks like (for now on empty values ​​of the second stage) this actually works:

...
LDA
LDA 0 = ram[201]
pc=1e2, i=5 T=0 N=1 L=2 X=0 Y=2 Z=2
ROT
pc=1e3, i=5 T=2 N=0 L=1 X=0 Y=2 Z=2
ROT
pc=1e4, i=21 T=1 N=2 L=0 X=0 Y=2 Z=2
INC2
pc=1e5, i=94 T=2 N=2 L=0 X=0 Y=2 Z=2
LDA
LDA 0 = ram[202]
pc=1e6, i=5 T=0 N=2 L=2 X=0 Y=0 Z=2
ROT
pc=1e7, i=5 T=2 N=0 L=2 X=0 Y=0 Z=2
ROT
pc=1e8, i=24 T=2 N=2 L=0 X=0 Y=0 Z=2
SWP2
pc=1e9, i=18 T=0 N=0 L=2 X=2 Y=0 Z=2
ADD
pc=1ea, i=0 T=0 N=2 L=2 X=0 Y=2 Z=2
BRK

It's time to load our decoded byte dump address variable from 0x0put the result at this address, increment it and return it to its place. LDZ is not available to us, so we will have to form the full address on the stack for LDA2that is 0x0 0x0. For this you can use here LTH in addition to the already familiar sequence:

INCk INCk EQU LTH LDA2 - загрузка адреса 
STAk - выгрузка декодированного байта
INC2 - инкрементируем переменную
INCk INCk EQU STZ2 - выгрузка адреса

Let's return the jump variable from the stack 0x2in the same way:

INCk INCk EQU LTH INC INC LDA

And we use it for JCN – after our manipulations, the decoded byte ends up exactly in the place of the condition, therefore, if the bytes of the second stage are added up in 0x0 (For example, 0xff01), the cycle will be interrupted and execution will go beyond the first stage.

JCN

Final amendments

Having looked at the result, however, we note that after the jump, we need to jump to the beginning of the cycle INC2 LDAk ROT ROT – we need to return the decoder address counter to its place using SWP2. Because this needs to be done only after the jump, we can place in front INC2 two swaps – SWP2 SWP2which will keep the stack as is in the first iteration, but will allow us to do what we intended by jumping to the second SWP2.

So, let's assemble the final version of the first stage in Python:

code =  "0894" # EQU (prepare 0200), LDAk - load jump back value
code += "8181090111" # INCk INCk NEQ INC STZ, store jump back value to 0x02                                                                         
code += "818108b1" # INCk INCk EQU STZ2k, store start address to 0x00
code += "0505" # ROT ROT

code += "24" # SWP2
code += "24" # SWP2 - not doing anything first go, swaps when we jump in between the instructions

code += "21" # INC2 addr
code += "94" # LDAk
code += "0505" # ROT ROT

code += "21" # INC2 addr
code += "94" # LDAk

code += "0505" # ROT ROT
code += "24" # SWP2
code += "18" # ADD

code += "8181088b34" # INCk INCk EQU LTH LDA2, load start address from 0x00

code += "95" # STAk, store the decoded byte
code += "21" # INC2 addr
code += "81810831" # INCk INCk EQU STZ2, store start address to 0x00

code += "8181088b010114" # INCk INCk EQU LTH INC INC LDA, load jump back value from 0x02

code += "0d" # JCN - if new value is 00, continue (need to mark the end of the payload with e.g. ff01)

Having checked the size of the code, we find that we guessed right with the size – the bootloader fits into 44 bytes. Fill the rest up to 0x200 we can, for example, padding the place from SWP0x04.

Having calculated and checked the jump offset, we find out that we need a value that suits our conditions. 0xe1 – that's nice too! Besides, the instruction corresponding to this byte doesn't do anything terrible – it's INC2krwithout even touching the regular stack. Let's add padding to the code and 0xe1:

pad_len = 0x200 - 0x1d0 - len(code)//2
full = code + ("04" * pad_len) + "e1"  

Adding to this temporarily 0xff01 and looking at the debug output, we make sure that the loop is interrupted correctly:

$ uxncli auxin2.rom 08948181090111818108b105052424219405052194050524188181088b349521818108318181088b0101140d04040404e1ff01
...
pc=1fc, i=d T=e1 N=0 L=2 X=2 Y=0 Z=2
JCN
pc=1fd, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=1fe, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=1ff, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=200, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=201, i=0 T=2 N=2 L=0 X=2 Y=2 Z=2
BRK

Time to move on to stage two!

The second stage is reading and outputting the flag.

Given the total limit of 112 bytes, the 51 of which have already been used, and the fact that all subsequent instructions will take up twice as much space due to our decoding method, we are left with 30 bytes of possible payload. Let's think about what is required of us: in order to access the file system, we need to store the file name somewhere flag; it is necessary, judging by her descriptiontransfer to the correct addresses of the device pages via DEO the address of this name, the desired buffer length and the output address; you need to output to the loop console bytes of the read flag.

File name

The most logical thing would be to place the line flag at the end; we'll know its absolute address when we're done writing the code, so for now we'll load a dummy onto the stack using LIT2:

LIT2 ff ff
...
66 6c 61 67 - flag

Working with a file

Let's upload the received address to the address file name0xa8:

LIT a8
DEO2k - оставим аргументы на стеке

By increasing the call address to 0xaalet's load the length – the same string address will do for us as the length flagsince the meaning 0x2xx will give us more than enough bytes:

INC INC - 0xaa и адрес строки на стеке
DEO2k

Finally, we force the system to write the read flag over the same line (0xac):

INC INC - 0xac и адрес строки на стеке
DEO2k

Let's return the address to the top of the stack – now we don't need it for this ROT ROT:

POP

Output to console

All we have to do is set up a simple loop, calculating (or selecting) the relative jump offset. We can leave it running, since the service will still give us the console output after a half-second timeout:

LDAk - загрузим байт флага на стек
LIT 18 - адрес вывода байта на консоль
DEO - вывод
INC2 - следующий байт
LIT f8 - отрицательный оффсет
JMP

Finalization

Having calculated the address of the string flagwe'll put it in its place and finish building the exploit, breaking the bytes into suitable pairs – they're easy to pick up manually:

payload = ""

payload += "9f01" # a0 LIT2
payload += "0101" # 02 
payload += "1401" # 15

payload += "7f01" # 80 LIT
payload += "9f09" # a8 
payload += "af08" # b7 DEO2k # a8 file name

payload += "fd04" # 01 INC
payload += "fd04" # 01 INC
payload += "af08" # b7 DEO2k # aa file length

payload += "fd04" # 01 INC
payload += "fd04" # 01 INC
payload += "af08" # b7 DEO2k # ac file read

payload += "0101" # 02 POP

payload += "8f05" # 94 LDAk
payload += "7f01" # 80 LIT
payload += "1404" # 18 
payload += "0f08" # 17 DEO
payload += "1d04" # 21 INC2 - next byte
payload += "7f01" # 80 LIT
payload += "ef09" # f8
payload += "010b" # 0c JMP

payload += "6105" # f
payload += "610b" # l
payload += "5d04" # a
payload += "5d0a" # g

full += payload + "ff01"

print(full)

Let's send the generated line to the server:

$ echo $(python3 sol.py) | nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: timeout!
b'CTF{Sorry__n0_Music_thi5_t1m3}\n\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

All that remains is to hand over the received flag and enjoy the victory.

Conclusion

A successful mix with a clear goal and a real (albeit rather toy) machine at its core, forcing you to tinker with the choice of payload packing strategy and requiring some codegolfing in an unfamiliar assembler; however, I didn't feel particularly constrained – both stages fit into the limit without any special reductions relative to the original idea (in the comments to the service it is indicated that the author's solution took up 91 bytes – ours takes up 101, although in places it can be reduced without changing the idea due to some tricks and additional stack massage).

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *