Detailed analysis of the 64b intro Radar
Long live the demoscene! Hello to you too, dear reader :)
I became acquainted with the demoscene about 25 years ago (or a little more). But back then it was limited to watching 128-256-byte intros (and demos, of course) with amazement à la “How is that even possible?” I think many people begin their acquaintance with this cyberculture in a similar way :). If these words don’t mean much to you, read the (rather meager) Wikipedia article about the demoscene and/or listen to the podcast, and also look at what people manage to fit into, for example, just 256 bytes of code (most works have a link to a YouTube video on the right).
I started writing full-fledged intros in my favorite x86 assembler only 5 years ago, in 2018. That’s when I sent two prods (short for “productions”) to the famous Chaos Constructions festival (which, by the way, the organizers promise to revive in 2024): the 256b intro StarLine (took 1st place) and the 64b intro Radar (took 6th place in the same compo). After that, the demoscene sucked me in and became a part of my life that I enthusiastically immerse myself in from time to time.
Don’t be confused by the fact that I start my analysis with the work that took sixth place, not first. I chose it because its code is simpler and shorter. At the same time, it contains quite a few tricks that a novice scener (and maybe not only a novice) might be interested in. So don’t expect easy reading: you won’t be able to relax :)
Let’s go!
Let’s not beat around the bush (or pull the cat by the… whiskers) and look at the code (download fasm 1).
; ------------------------------------------------------
; Radar 64-byte intro [main variant] (c) 2018-2019 Jin X
; ------------------------------------------------------
video_shift = 10h ; we use 9FFFh segment for video output instead of 0A000h
radar_radius = 64 ; should be multiple of circles_step for the best effect
radbg_color = 11h ; [radar background color]
arrow_color = 21h ; should be more than radbg_color
circles_color = 2 ; should be less than radbg_color
circles_step = 10h ; should be power of 2 for the best effect
use16
org 100h
; Assume: ax=bx=0 (if no cmd line params), cx=0FFh, dx=cs=ds=es=ss, si=100h, di=sp=0FFFEh (as a rule), bp=91Xh, flags=7202h or 0202h (all base flags including cf=0; if=1)
; Init
mov al,13h
int 10h
fild qword [si]
lds ax,[bx] ; ds=9FFFh (as a rule), ax=020CDh
fptan ; st0=1=angle, st1=delta (we need about 0.02 - this order of first instructions allows to get near value)
; Main cycle ; di=sp=0FFFEh, ds=9FFFh
.repeat:
; Fadeout
@@: add [di],dh
sbb [di],dh ; for source [di]=0 result=0, cf=0
inc di
jnz @B ; di=0, cf=0 (because [0FFFFh]=0)
; Radar
mov cl,radar_radius
.next: fld st
fsincos ; st0=cos(angle), st1=sin(angle), st2=angle, st3=delta
@@: mov [di],cx
fimul word [di] ; cx*cos then cx*sin
fistp word [di]
mov ax,[di]
xchg bx,ax
out 61h,al ; sound
cmc
jc @B ; second pass (cos then sin)
; st0=angle, st1=delta
imul si,ax,320
test cl,circles_step-1 ; cf=0
mov dx,arrow_color + (not radbg_color)*256
jnz @F
mov dl,circles_color
@@: mov byte [bx+si+(160+100*320)+video_shift],dl
loop .next
; dh=not radbg_color
fadd st,st1 ; increase angle a tiny bit
jmp .repeat ; dh=not radbg_color
; in al,60h
; dec ax
; jnz .repeat
; ret
Above is a slightly modified version of the code: the two instructions fild + fmulp were replaced with a single equivalent fimul, which reduced the binary to 61 bytes instead of the original 63.
I’m not a seasoned writer here, so if anyone knows how to enable syntax highlighting for this code, please write in the comments, and your personal karma will definitely double (probably).
Let me start by noting that we are creating a COM program for DOS, so at startup, almost always (except in exotic DOS types/versions), the registers have the following values:
AX=BX=0 (unless the command-line parameters contain an invalid drive name);
CX = 0FFh;
DX=CS=DS=ES=SS;
SI = 100h;
DI = SP = 0FFFEh (in some exotic cases SP, but not DI, may be equal to 0FFFCh, but I personally have not seen this);
BP=9??h (usually even 91?h, where “?” depends on the DOS type/version);
basic arithmetic flags (ZF, CF, SF, OF, PF, AF) are cleared, and so is DF (as after cld); the IF flag is set (as after sti), i.e. interrupts are enabled.
You can read more about this here. Sizecoders (people who strictly optimize code for the size of the executable file) use these values very often.
The first thing we do is set graphics video mode 13h (320×200, 256 colors), the most popular one in the DOS demoscene, since each pixel is encoded in one byte and the entire screen (64,000 bytes) fits into a single 64 KB segment. Here we use our sacred knowledge that AH = 0 at startup, so we just write the video mode number into AL and call int 10h. Whoosh, graphics mode is set!
Video memory is mapped to segment 0A000h, so to access the pixels we need to write this value into one of the segment registers. Well… or approximately this value. Approximately, you say? The instruction lds ax,[bx] loads the register pair DS:AX with the dword at address BX. What do we have at this address (at DS:0)? The PSP, whose first word contains the value 20CDh (the opcode of int 20h); it goes into AX (we don’t need it), and the second word is the segment just past our program’s memory area. Since a COM program is allocated all available memory, the value here will be 9FFFh (and this 16-byte block holds information that the memory has run out… remember “640 KB ought to be enough for anybody”?). In some DOS versions there is no such block, and 0A000h is written there. Other DOS versions reserve a few kilobytes, and there might be, say, 9F80h. Ugh! Yes, this is a somewhat dangerous trick, since in the last two cases the picture will be shifted. But it is a fairly popular trick among 32- and 64-byte intros. I’ll tell you a secret (just between us!): the intro is slightly tailored for DOSBox, and everything will be fine there :). Now, with 9FFFh in DS instead of 0A000h, we just need to add 10h (video_shift) to the address (offset), and everything will work as if DS = 0A000h.
Generally speaking, since I managed to shrink the intro by a couple of bytes (see the note after the code), and there is a whole 3 bytes of room left before the 64-byte limit, we could replace lds with the pair push 0A000h + pop ds. But don’t rush to rejoice. I’ll explain the reasons for possible sadness shortly (and yes, I haven’t forgotten about fild qword [si] + fptan either).
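The segment arithmetic behind video_shift fits in a few lines. A minimal Python sketch, assuming plain real-mode translation (segment × 16 + offset, no A20 wraparound modeled); the helper names linear_address and required_shift are made up for illustration:

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


def required_shift(segment: int, video_segment: int = 0xA000) -> int:
    """Offset that makes segment:offset point at video_segment:0 (hypothetical helper)."""
    return (video_segment - segment) << 4


# DS = 9FFFh, as loaded by lds ax,[bx] under DOSBox
print(hex(required_shift(0x9FFF)))                                # 0x10, the listing's video_shift
print(linear_address(0x9FFF, 0x10) == linear_address(0xA000, 0))  # True
```

For the 9F80h case mentioned above, required_shift(0x9F80) gives 800h rather than 10h, which is why a hard-coded video_shift = 10h moves the picture there.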
I don’t remember what motivated me to place the “Fadeout” block before the main “Radar” drawing block, because now, looking at this disgrace, I don’t see any particular reason for it. Usually in such cases people say it happened this way for historical reasons. Well, if so, I’ll talk about it a little later, since to explain even such a small piece of code you need to understand what (which picture) we receive at the input of the loop.
I twist and turn, I want to confuse
We write the radar radius in pixels, radar_radius = 64, into CX (or rather into CL: we know that CH = 0 at startup). Then we duplicate ST0 with fld st. Stop! What do we have in ST0? Let’s go back to the beginning of the code and look at the instruction fild qword [si], which loads into ST0 the qword value from memory at address SI = 100h. At DS(CS):100h our code begins (this is also the entry point). Namely, the following block:
mov al,13h
int 10h
fild qword [si]
lds ax,[bx]
These 4 instructions occupy exactly 8 bytes (2 each), and those bytes are loaded into ST0. WTF? Yes, bro, we use our code as data (quite a normal thing in sizecoding, get used to it). The qword is 07C52CDF10CD13B0h. By scientific poking in a debugger we find out that we loaded the number ≈ 5.599E+17. Beautiful! And now watch my hands: after lds comes fptan (you don’t see it, but it is there). It takes one number as input, and at the output “now there are two”: 1.0 (in ST0) and ≈ 0.0294 (in ST1). It’s magic! Learn your hardware. The first number (1.0) is the initial/current angle of the radar needle (any value suits us, we are not picky), and the second (0.0294) is the angle increment in radians after each rendered frame (1.684 degrees, also fine). So, if we had something else instead of lds, it would not have worked, since the qword would have been different, and the angle increment would have been anyone’s guess. However, British scientists have discovered that if you really want to write that very push 0A000h + pop ds, then with a bit of trickery (rearranging some instructions) the result will be approximately the same:
fild qword [si]
mov al,13h
push 0A000h
fptan
int 10h
pop ds
This construction takes 2 more bytes and gives us the values 1.0 and ≈ 0.0337, which suit us completely (and if you can’t see the difference, why pay more? Well, except that in some DOS versions the picture will no longer be shifted; just remember to set video_shift = 0 then). In general, this is what I would do now, since 63 bytes, as you understand, does not exceed 64 bytes, but I needed to introduce you to the trick with lds.
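The “code as data” trick can be checked without a debugger: fild qword [si] simply reinterprets the first eight bytes at CS:100h as a little-endian signed 64-bit integer. The sketch below assumes the usual encodings (B0 13, CD 10, DF 2C, C5 07 for the original order; 68 00 A0 for push 0A000h and D9 F2 for fptan in the variant) and a made-up helper name, fild_qword. It deliberately stops at the integer: Python floats have a 53-bit mantissa and the x87 performs its own argument reduction for fptan, so math.tan is unlikely to reproduce the 0.0294 and 0.0337 quoted above.

```python
def fild_qword(code: bytes) -> int:
    """Integer that fild qword [si] loads when SI points at the first byte of code."""
    return int.from_bytes(code[:8], "little", signed=True)


# mov al,13h / int 10h / fild qword [si] / lds ax,[bx]
ORIGINAL = bytes([0xB0, 0x13, 0xCD, 0x10, 0xDF, 0x2C, 0xC5, 0x07])
# fild qword [si] / mov al,13h / push 0A000h / first byte of fptan
VARIANT = bytes([0xDF, 0x2C, 0xB0, 0x13, 0x68, 0x00, 0xA0, 0xD9])

for name, code in (("original", ORIGINAL), ("variant", VARIANT)):
    value = fild_qword(code)
    # print the raw 64-bit pattern and the (signed) number the FPU sees
    print(f"{name}: {value & 0xFFFFFFFFFFFFFFFF:016X}h = {value:.4g}")
```

The variant loads a completely different (negative) integer, which is why the angle step changes when the instructions are rearranged.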
How did I guess that this would all work like that and give the desired values? The answer is simple: I really wanted it to :))). And then by experimentation: permuting instructions, trying different load instructions (fld, fild) and sizes of the loaded value (dword, qword, tbyte), and believing in success, I found the right combination (it’s no accident that fild stands in such a strange place). Well… magic, of course! If you think I’m joking, I have to disappoint you.
Okay, let’s get back to our code. After duplicating ST0, we have the following numbers in the FPU stack (from ST0 to ST2): 1.0 (angle), 1.0 (angle), 0.0294 (delta). The next instruction, fsincos, calculates the sine and cosine of ST0 (in the first frame, of 1.0), writing the results to ST1 and ST0, respectively: 0.54 (cos), 0.841 (sin), 1.0, 0.0294 (see the picture above). At address DS:DI we write the value of CX (radius). Let me remind you that DS = 9FFFh and DI = 0 (after the “Fadeout” loop, see below), which means we write to an invisible area just below video memory; this will be our temporary variable (let’s call it temp). The instruction fimul word [di] multiplies ST0 = 0.54 (the cosine of the angle, initially 1.0) by the integer temp, i.e. by CX, writing the result back to ST0. This way we get the X coordinate (for the first iteration of the CX loop in the first frame: round(64 * 0.54) = 35). We store it back to temp (fistp word [di]), popping it from ST0, and from there load it into AX (yes, values can only be exchanged with the FPU through memory, which is sad). Swap AX and BX (why will become clear later). Writing to port 61h (out 61h,al) outputs sound (a light crackling… do you hear it? If you hear it right now, your fried eggs might be burning). Yes, this strange 2-byte instruction generates an interesting sound. Sometimes placing it in different random places gives quite an interesting audio effect. Use it freely, no registration or SMS required.
Let’s move on. The instruction cmc inverts the CF flag. The last instruction that affected this flag was sbb (still in the same “Fadeout” loop), which cleared CF (just believe me); on later iterations the test instruction will do the same. So, rest assured: on the first pass of the inner loop (from the first @@ label to jc @B) CF = 0, and after cmc CF = 1, so on the first pass jc @B jumps back to the previous @@ label. This is a fairly common trick: organize 2 or 3 iterations like this by toggling some flag (sometimes CF, but more often even PF with a one-byte inc or dec): there is no need to initialize CX and use loop, which matters especially when CX is already in use (exactly our case).
Let’s see what we have left in the FPU stack: 0.841 (sin), 1.0, 0.0294. Again we write CX into temp, then multiply it by 0.841 (now the sine of the angle). We get the Y coordinate (for the first iteration of the CX loop in the first frame: round(0.841 * 64) = 54). We store it in temp (popping ST0), then load it into AX. We swap AX and BX again (still keeping up?). As a result, the FPU stack contains 1.0 (angle) and 0.0294 (delta). Registers: AX = X, BX = Y. The next cmc sets CF = 0 again, and jc @B does not jump.
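The two-pass cmc/jc loop and the FPU stack juggling described above can be modeled in a few lines of Python. This is a hypothetical sketch, not a translation of the binary: the FPU stack is a list with ST0 first, fimul + fistp become round() (Python rounds halves to even, like the FPU’s default rounding mode), and the registers are plain integers without 16-bit wraparound. DELTA is the approximate value quoted in the text.

```python
import math

DELTA = 0.0294  # approximate angle step left in ST1 by fptan (value quoted in the text)


def radar_point(angle: float, radius: int) -> tuple:
    """Model one pass through .next: returns (AX, BX) after the inner @@ loop."""
    fpu = [angle, DELTA]                   # ST0 = angle, ST1 = delta
    fpu.insert(0, fpu[0])                  # fld st: duplicate ST0
    a = fpu.pop(0)                         # fsincos: sin replaces ST0...
    fpu[0:0] = [math.cos(a), math.sin(a)]  # ...and cos is pushed on top
    ax = bx = 0
    cf = False                             # CF = 0 on entry (cleared by sbb or test)
    while True:
        ax = round(fpu.pop(0) * radius)    # mov [di],cx / fimul / fistp / mov ax,[di]
        ax, bx = bx, ax                    # xchg bx,ax
        cf = not cf                        # cmc
        if not cf:                         # jc @B: repeat only while CF = 1
            break
    assert fpu == [angle, DELTA]           # only angle and delta remain on the stack
    return ax, bx                          # AX = X (from cos), BX = Y (from sin)


print(radar_point(1.0, 64))  # (35, 54), matching the values computed in the text
```

Note how the first xchg parks X in BX and the second one brings it back to AX, which is exactly why the article ends with AX = X and BX = Y.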
The hardest part is over. You can breathe out, drink a cup of tea or coffee. In the meantime, while you are heating the kettle, I will entertain you with new pictures.
It’s time to grab the canvas, palette and brush
I already have the palette. Hands too. The canvas will be video memory. So the rest is a trifle. The instruction imul si,ax,320 multiplies AX (i.e. X) by 320 (the number of pixels per line, i.e. the screen width), writing the result to SI. Next, test cl,15 (test cl,circles_step-1 in the listing) checks whether the current radius in CX is a multiple of 16 (in that case its lower 4 bits are zero, so the result is zero and ZF = 1). mov dx,arrow_color + (not radbg_color)*256 puts 0EE21h into DX. Here the low byte is the color of the radar arrow (looking at the palette: blue-violet; we could specify 20h, blue, and then the arrow would be thinner). Don’t worry about the high byte for now. The jnz @F instruction jumps to the next @@ label if the radius is not a multiple of 16 (remember that mov does not affect flags). Otherwise, DL is changed to circles_color = 2 (green). It is important here that arrow_color (the arrow color) is greater than radbg_color = 11h (the gray radar background color), and circles_color is less (you will soon understand why). And finally, with a simple mov [bx+si+(160+100*320)+video_shift],dl (where 160+100*320 is the screen center coordinate, plus the video_shift offset since we have DS = 9FFFh, not 0A000h), we put a piece of our radar (one pixel) on the screen :). As Ivan Vasilyevich said in the famous film: “And that’s all there is to it!”
An attentive reader will notice a catch: “Wait a minute! We have AX = X, BX = Y, yet we multiply X by 320 and add Y; shouldn’t it be the other way around?” Well, we mixed up X and Y; what does that change? Only the direction of rotation :). If you want everything to be right, move xchg bx,ax up, right after the first @@ label.
The loop instruction closes the .next rendering loop, which runs with CX (radius) from 64 down to 1. Then fadd st,st1 adds ST1 to the angle in ST0 (0.0294, a little more than a degree, let me remind you). Another attentive reader (are there two of you already?) may come up with another pedantic remark: “As the angle increases, the rotation should be counterclockwise; you say that swapping X and Y changes the direction, yet the radar needle still rotates counterclockwise. You’re clearly not telling us something!” Everything is correct, because our ordinate axis (Y) points from top to bottom, not from bottom to top as in the classics, so the “correct” version would actually turn clockwise on screen, and the swap turns it back. More questions? “Uh, wait a minute, what about Fadeout…” “Take him away!” Good!
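The address arithmetic of the pixel store can be checked the same way. In this sketch (hypothetical helper names; 16-bit wraparound modeled with & 0xFFFF), pixel_offset mirrors imul si,ax,320 plus the [bx+si+(160+100*320)+video_shift] operand, and screen_row_col converts the result back to a row and column in the 0A000h segment. It confirms the catch discussed above: X (from AX) selects the row and Y (from BX) the column.

```python
WIDTH = 320
CENTER = 160 + 100 * WIDTH  # the (160+100*320) displacement from the listing
VIDEO_SHIFT = 0x10          # compensates for DS = 9FFFh instead of 0A000h


def pixel_offset(ax: int, bx: int) -> int:
    """Offset written by mov [bx+si+(160+100*320)+video_shift],dl."""
    si = (ax * WIDTH) & 0xFFFF                        # imul si,ax,320 (low 16 bits)
    return (bx + si + CENTER + VIDEO_SHIFT) & 0xFFFF  # 16-bit effective address


def screen_row_col(offset: int) -> tuple:
    """Row and column in mode 13h video memory for an offset relative to DS = 9FFFh."""
    return divmod(offset - VIDEO_SHIFT, WIDTH)


print(screen_row_col(pixel_offset(35, 54)))  # (135, 214): row 100 + 35, column 160 + 54
```

Negative coordinates work too, because the 16-bit wraparound cancels out: pixel_offset(-35, -54) lands on row 65, column 106.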
So, the radar drawing algorithm:
FPU stack: angle, delta.
1. radius = 64.
2. Calculate the sine and cosine, preserving the angle: cos(angle), sin(angle), angle, delta.
3. Multiply: X = cos(a) * radius; Y = sin(a) * radius.
4. If radius and 15 == 0 (can be read as radius % 16 == 0), color = circles_color, otherwise color = arrow_color.
5. Draw a point of color color at coordinates X, Y.
6. if (--radius > 0) goto 2.
It’s simple, isn’t it? :))
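For readers who prefer a high-level language, here is one way the algorithm above might look in Python, drawing a single frame into a 64,000-byte buffer that stands for the 0A000h segment (so video_shift is not needed). It is a sketch under simplifying assumptions: sine and cosine are computed once per frame (the intro recomputes them for every radius with fld st + fsincos, but the angle does not change inside the loop), and round() stands in for fistp. The function name draw_radar is made up.

```python
import math

WIDTH, HEIGHT = 320, 200
ARROW_COLOR, CIRCLES_COLOR = 0x21, 2
RADAR_RADIUS, CIRCLES_STEP = 64, 16


def draw_radar(screen: bytearray, angle: float) -> None:
    """Draw one radar needle with its ring dots, as the .next loop does."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    for radius in range(RADAR_RADIUS, 0, -1):  # radius = 64, then --radius until 0
        x = round(cos_a * radius)
        y = round(sin_a * radius)
        if radius & (CIRCLES_STEP - 1) == 0:   # radius % 16 == 0
            color = CIRCLES_COLOR
        else:
            color = ARROW_COLOR
        # like the intro, X selects the row and Y the column
        screen[(100 + x) * WIDTH + 160 + y] = color


screen = bytearray(WIDTH * HEIGHT)
draw_radar(screen, 1.0)
```

Between frames the intro also fades every byte of the segment and advances the angle by the step from ST1; those parts are left out of this sketch.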
Bring in the curtain!
“Bring him back!” I promised to tell you about “Fadeout”. The boy said it, the boy did it.
The construction add [di],dh + sbb [di],dh, strange at first glance, produces the effect of “fading” the arrow into the background color, leaving a white trail. It works like this: if the color is greater than not DH (not 0EEh = 11h = radbg_color), it is decreased by 1, otherwise it does not change. If you look at the color palette, you can see that from 21h to 11h the colors change sharply from blue to white and then smoothly to dark gray (arrow – trail – background).
Let me explain slowly, at 0.5x speed. First, add adds 0EEh to the color, and then sbb, surprisingly, subtracts this value back, also subtracting the CF flag. We are interested in initial values of 11h and above (for example, 12h).
Situation #1: 11h + 0EEh = 0FFh, no carry, CF = 0. Then: 0FFh − 0EEh − CF(0) = 11h. I didn’t cheat: the value hasn’t changed.
Situation #2: 12h + 0EEh = 100h, which does not fit in a byte, so the result is 0 and a carry occurred, CF = 1. Then: 0 − 0EEh − CF(1) = 11h (modulo 256). The value has decreased by 1.
Now do you understand why arrow_color = 21h must be greater than radbg_color = 11h (hint: otherwise the arrow would not fade out), and circles_color = 2 must be less (hint: otherwise the rings would fade out too)?
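Both situations (and every other byte value) can be checked exhaustively with a small emulation of the two instructions. The function name fade_byte is made up; the arithmetic follows the 8-bit add/sbb semantics described above, with DH = 0EEh.

```python
def fade_byte(value: int, dh: int = 0xEE) -> int:
    """Emulate add [di],dh followed by sbb [di],dh on one byte."""
    total = value + dh
    cf = 1 if total > 0xFF else 0     # carry out of the 8-bit add
    value = total & 0xFF
    return (value - dh - cf) & 0xFF   # sbb subtracts the operand and CF, mod 256


# every byte above 11h loses 1, everything else is left alone
assert all(fade_byte(v) == (v - 1 if v > 0x11 else v) for v in range(256))

# a freshly drawn arrow pixel fades down to the background color and stops there
trail = [0x21]
while fade_byte(trail[-1]) != trail[-1]:
    trail.append(fade_byte(trail[-1]))
print(" ".join(f"{c:02X}h" for c in trail))
```

The printed trail runs from 21h down to 11h in 16 steps, while the ring color 2 never changes.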
Magic again! Want more tricks? I have them!
This whole farce (starting from the second frame) is repeated 65536 times, shoveling byte by byte through the entire segment addressed by DS, since DI keeps increasing until it wraps around to 0 (jnz @B).
By the way, if, while watching the intro, you get the feeling that someone let flies in here, know this: the flickering dots in random places on the screen are the result of this loop, when a byte has already been increased by add and got displayed before sbb had time to decrease it back. You ask: “Is this a bug or a feature?” Of course, it is a specially designed effect, how could it be otherwise?! :))
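The “flies” are easy to quantify: between add and sbb the byte briefly holds value + 0EEh (mod 256), and that is what appears on screen if the video card happens to scan the byte out at that moment. A tiny sketch (hypothetical helper name) of the transient values for the three colors used in the intro:

```python
def transient_color(value: int, dh: int = 0xEE) -> int:
    """Byte value visible between add [di],dh and sbb [di],dh."""
    return (value + dh) & 0xFF


for name, color in (("background", 0x11), ("arrow", 0x21), ("circles", 0x02)):
    print(f"{name}: {color:02X}h -> {transient_color(color):02X}h")
```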
Exit without a gypsy
It is worth paying special attention to the commented ending:
; in al,60h
; dec ax
; jnz .repeat
; ret
This is the standard construction for exiting an intro (unfortunately, with it the intro would be 65 bytes, which is a fail). We read a value from keyboard port 60h; it holds the scan code of the last key event. If this value = 1, the Esc key is being held down. AH = 0 here (our radius is far below 256, so there are no other options, sorry), so when Esc is pressed, dec ax sets ZF = 1 (and AX = 0). Strictly speaking, AX at this point holds the X computed for radius 1, which can also be −1 (AH = 0FFh); on such frames the check simply does not fire. Well, then, I think everything is clear. The ret instruction exits, since at program start the top of the stack always holds 0 (this is even documented, unlike the values of most registers), and what is at address CS:0, remember? That’s right, the PSP, whose first word contains the int 20h instruction, which terminates the program. To be honest, in this case the idea is only good when running in DOSBox, because by writing to segment 9FFFh we overwrite system data describing the memory structure, and this can lead to undefined behavior (as amateurs and professionals of the C language say).
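The exit condition can be modeled as pure 16-bit arithmetic. This hypothetical helper mirrors in al,60h (only AL is replaced), dec ax and jnz: it returns True when jnz falls through to ret. The last case shows why the value of AH matters: if AH is nonzero, the check does not fire even with Esc held.

```python
def exits_to_ret(ax: int, port60: int) -> bool:
    """True if in al,60h / dec ax / jnz .repeat falls through to ret."""
    ax = (ax & 0xFF00) | (port60 & 0xFF)  # in al,60h: AH is kept, AL = port value
    ax = (ax - 1) & 0xFFFF                # dec ax with 16-bit wraparound
    zf = ax == 0                          # dec sets ZF when the result is zero
    return zf                             # jnz jumps back to .repeat only if ZF = 0


print(exits_to_ret(0x0023, 0x01))  # Esc held, AH = 0: exits
print(exits_to_ret(0x0023, 0x81))  # Esc released (break code): keeps running
print(exits_to_ret(0xFF23, 0x01))  # Esc held, but AH = 0FFh: keeps running
```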
Above I wrote that the intro is slightly tailored for DOSBox. The second DOSBox-specific point is that there is no delay in the program, so when launched on real hardware the radar will take off, taking the monitor with it (do you need that?). There is another trick for this (no, not tying the monitor down with ropes). You can make a delay with the single-byte instruction hlt (placed before jmp .repeat), which waits for any hardware interrupt. Usually “any hardware interrupt” is the timer interrupt (int 8), which occurs every 55 milliseconds. Unless you or your cat decide to dance on the keyboard (in which case, until the keyboard breaks, there will also be int 9). For this intro, however, this delay is too long, but in general, by replacing lds with you-know-what and adding hlt, we get an honest 64 bytes.
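The “too long” verdict is easy to estimate. Assuming the standard PIT input clock of 1,193,182 Hz with the default divisor of 65536 (which gives the familiar ~55 ms tick, about 18.2 per second), exactly one hlt per frame, and the angle step of about 0.0294 rad quoted earlier, a full revolution of the needle takes roughly 12 seconds. This ignores the time spent drawing the frame itself, so on slow hardware it would be even longer.

```python
import math

PIT_HZ = 1_193_182         # 8253/8254 PIT input clock, Hz
TICK_S = 65536 / PIT_HZ    # period of int 8 with the default divisor
DELTA = 0.0294             # angle step per frame in radians (approximate)


def seconds_per_turn(delta: float = DELTA, frame_s: float = TICK_S) -> float:
    """Time for one full radar revolution if each frame lasts exactly one timer tick."""
    return 2 * math.pi / delta * frame_s


print(f"tick: {TICK_S * 1000:.1f} ms, step: {math.degrees(DELTA):.3f} degrees")
print(f"one revolution: {seconds_per_turn():.1f} s")
```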
P.S. For reference: in the value read from port 60h, the most significant bit is the key release flag: if the bit is clear, the key is pressed and held right now, and if it is set, the key has been released (i.e. if you pressed and released Enter, scan code 1Ch, you will keep reading 9Ch). Note that the value changes not only for significant keys but also for modifiers: Ctrl, Alt, Shift, Win and even Caps Lock, etc.
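A minimal sketch of that decoding (the helper decode_port60 is hypothetical; it assumes the single-byte scan codes discussed here, not the E0h-prefixed ones that extended keys send):

```python
def decode_port60(value: int) -> tuple:
    """Split a byte read from port 60h into (scan code, released?)."""
    return value & 0x7F, bool(value & 0x80)


ENTER, ESC = 0x1C, 0x01
print(decode_port60(ENTER | 0x80))  # Enter released: break code 9Ch
print(decode_port60(ESC))           # Esc pressed: what the exit check looks for
```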
Thank you
…for reading to the end! I hope my humor did not bore you, and that the material turned out to be interesting and, no less important, useful.
You can continue the conversation about sizecoding in the cozy themed Telegram chat (where you can also get an invite to the international Discord server about sizecoding, and to many other places).
Be healthy, live richly! :)