Detailed analysis of 64b intro: radar

Long live the demoscene! Hello to you too, dear reader 😉

I became acquainted with the demoscene about 25 years ago (or a little more). Back then, though, it amounted to watching 128-256-byte intros (and demos, of course) in amazement, along the lines of “How is that even possible?” I think many people begin their acquaintance with this cyberculture in a similar way :). If these words don’t mean much to you, read the rather meager Wiki article about the demoscene and/or listen to the podcast, and also look at what people manage to fit into, for example, just 256 bytes of code (most works have a link to a YouTube video on the right).

I started writing full-fledged intros in my favorite x86 assembly only 5 years ago, in 2018. That’s when I sent two prods (short for “productions”) to the famous festival Chaos Constructions (which, by the way, the organizers promise to revive in 2024): the 256b intro StarLine (took 1st place) and the 64b intro radar (took 6th place in the same compo). After that the demoscene sucked me in and became a part of my life that I enthusiastically immerse myself in from time to time.

radar, 64b intro (twitchy GIF version, see the normal version on YouTube at the link above)

Don’t be confused by the fact that I start my analysis with the work that took sixth place, not first. I chose it because its code is simpler and shorter. At the same time, it contains quite a few tricks that a novice scener (and maybe not only a novice) might be interested in. So don’t expect easy reading, you won’t be able to relax 🙂

Let’s go!

Let’s not beat around the bush and look at the code (download fasm 1).

; ------------------------------------------------------
; Radar 64-byte intro [main variant] (c) 2018-2019 Jin X
; ------------------------------------------------------

video_shift     =       10h     ; we use 9FFFh segment for video output instead of 0A000h
radar_radius    =       64      ; should be multiple of circles_step for the best effect
radbg_color     =       11h     ; [radar background color]
arrow_color     =       21h     ; should be more than radbg_color
circles_color   =       2       ; should be less than radbg_color
circles_step    =       10h     ; should be power of 2 for the best effect

use16
org     100h

        ; Assume: ax=bx=0 (if no cmd line params), cx=0FFh, dx=cs=ds=es=ss, si=100h, di=sp=0FFFEh (as a rule), bp=91Xh, flags=7202h or 0202h (all base flags including cf=0; if=1)

        ; Init
        mov     al,13h
        int     10h
        fild    qword [si]
        lds     ax,[bx]         ; ds=9FFFh (as a rule), ax=020CDh
        fptan                   ; st0=1=angle, st1=delta (we need about 0.02 - this order of first instructions allows to get near value)

        ; Main cycle            ; di=sp=0FFFEh, ds=9FFFh
.repeat:
        ; Fadeout
@@:     add     [di],dh
        sbb     [di],dh         ; for source [di]=0 result=0, cf=0
        inc     di
        jnz     @B              ; di=0, cf=0 (because [0FFFFh]=0)

        ; Radar
        mov     cl,radar_radius
.next:  fld     st
        fsincos                 ; st0=cos(angle), st1=sin(angle), st2=angle, st3=delta
@@:     mov     [di],cx
        fimul   word [di]       ; cx*cos then cx*sin
        fistp   word [di]
        mov     ax,[di]
        xchg    bx,ax
        out     61h,al          ; sound
        cmc
        jc      @B              ; second pass (cos then sin)
        ; st0=angle, st1=delta
        imul    si,ax,320
        test    cl,circles_step-1 ; cf=0
        mov     dx,arrow_color + (not radbg_color)*256
        jnz     @F
        mov     dl,circles_color
@@:     mov     byte [bx+si+(160+100*320)+video_shift],dl
        loop    .next
        ; dh=not radbg_color

        fadd    st,st1          ; increase angle a tiny bit

        jmp     .repeat         ; dh=not radbg_color

;       in      al,60h
;       dec     ax
;       jnz     .repeat                 
;       ret

This is slightly modified code: the two instructions fild + fmulp are replaced with the single equivalent fimul, which reduced the binary to 61 bytes instead of the original 63.

I’m not an avid writer here; if anyone knows how to enable syntax highlighting for the code, please write in the comments, and your personal karma will definitely double (probably).

Let me start with the fact that we are creating a program in COM format for DOS, so at startup the registers almost always (except when started under exotic DOS types/versions) have the following values:

  • AX=BX=0 (if there are no parameters on the command line with an incorrect drive name);

  • CX = 0FFh;

  • DX=CS=DS=ES=SS;

  • SI = 100h;

  • DI = SP = 0FFFEh (in some exotic cases SP, but not DI, may be equal to 0FFFCh, but I personally have not seen this);

  • BP=9??h (usually even 91?h, where “?” depends on the DOS type/version);

  • the basic arithmetic flags (ZF, CF, SF, OF, PF, AF) are cleared, as is DF (as after cld); the IF flag is set (as after sti), i.e. interrupts are enabled.

You can read about it here. Sizecoders (people who strictly optimize code for the size of the executable file) use these values very often.

The first thing we do is set graphics video mode 13h (320×200, 256 colors), the most popular mode on the DOS demoscene, since each pixel is encoded in one byte and the entire screen (64,000 bytes) fits into a single 64 KB segment. Here we use our sacred knowledge that AH = 0 at startup, so we just write the video mode number into AL and call int 10h. Whoosh, and the graphics mode is set!

Video memory is mapped at segment 0A000h, so to access the pixels we need to load this value into one of the segment registers. Well… or approximately this value. What do I mean by approximately? The instruction lds ax,[bx] loads the register pair DS:AX with the dword at address BX. What do we have at this address (at DS:0)? The PSP, whose first word contains the value 20CDh (the opcode of int 20h), which goes into AX (we don’t need it), and whose second word is the segment just past our program’s memory area. Since a COM program is allocated all available memory, the value here will be 9FFFh (this 16-byte block holds the information that memory has run out… remember: “640 KB is enough for everyone”?). In some DOSes there is no such block, and 0A000h is written there. Some other DOSes reserve a few kilobytes, and there might be, say, 9F80h. Ugh! Yes, this is a somewhat dangerous trick, since in the last 2 cases the picture will be shifted. But it is a fairly popular trick in 32- or 64-byte intros. I’ll tell you a secret (just don’t tell anyone!): the intro is slightly tailored to DOSBox, and everything will be fine there :). Now, having 9FFFh in DS instead of 0A000h, we just need to add 10h (video_shift) to the address (offset), and everything will work as if DS = 0A000h. Generally speaking, since I managed to reduce the size of the intro by a couple of bytes (see the note after the code), and there are a whole 3 bytes left before the 64-byte limit, we could replace lds with the pair push 0A000h + pop ds. But don’t rush to rejoice. I will tell you about the reasons for possible sadness a paragraph from now (and no, I haven’t forgotten about fild qword [si] + fptan either).
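To see why adding 10h to the offset compensates for DS = 9FFFh, recall that a real-mode linear address is segment*16 + offset. Here is a minimal Python sketch of that arithmetic and of what lds ax,[bx] picks up from the first four PSP bytes (CD 20 FF 9F); the helper name linear is my own, purely for illustration:

```python
import struct

def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset (wrap-around ignored)."""
    return segment * 16 + offset

# First four bytes of the PSP in the typical case described in the article
psp_head = bytes([0xCD, 0x20, 0xFF, 0x9F])
ax, ds = struct.unpack("<HH", psp_head)   # lds ax,[bx]: AX = word 0, DS = word 1

VIDEO_SHIFT = 0x10                        # same value as video_shift in the intro
center = 160 + 100 * 320                  # offset of the screen center in mode 13h

# The same byte of video memory, addressed through two different segments
via_9fff = linear(ds, center + VIDEO_SHIFT)
via_a000 = linear(0xA000, center)
print(hex(ax), hex(ds), hex(via_9fff), hex(via_a000))
```

Both addresses come out identical, which is the whole point of video_shift: one paragraph (16 bytes) of segment difference is paid back in the offset.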

I don’t remember what motivated me to place the “Fadeout” block before the main “Radar” drawing block, because now, looking at this disgrace, I don’t see any particular reason for it. Usually in such cases they say that it happened for historical reasons. Well, if so, I’ll talk about it a little later, since to explain even such a small piece of code you need to understand what picture the loop receives as its input.

I twist and turn, I want to confuse

We write the radar radius in pixels into CX: radar_radius = 64 (or rather, into CL, since we know that CH = 0 at startup). We duplicate ST0 using fld st. Stop! What do we have in ST0? Let’s go back to the beginning of the code and look at the instruction fild qword [si], which loads ST0 with the qword from memory at address SI = 100h. Our code begins at DS(CS):100h (this is the entry point). Namely, the following block:

        mov     al,13h
        int     10h
        fild    qword [si]
        lds     ax,[bx]

These 4 instructions occupy exactly 8 bytes (2 each), which are loaded into ST0. WTF? Yes, bro, we use our code as data (quite a normal thing for sizecoding, get used to it). This is the qword 07C52CDF10CD13B0h. By poking around in the debugger we find out that we loaded the number ≈ 5.599E+17. Beautiful! And now watch my hands: after lds comes fptan (you don’t see it in the block above, but it is there). It takes one number as input and outputs two: 1.0 (in ST0) and ≈ 0.0294 (in ST1). It’s magic! Read the manuals. The first number (1.0) is the initial/current angle of the radar needle (any value will suit us, we are not picky), and the second (0.0294) is the angle increment in radians after each frame is rendered (1.684 degrees, also fine). So if we had something else instead of lds, it would not have worked, since the qword would have been different and the value of the angle increment would have been unpredictable. However, British scientists have discovered that if you really want to write that push 0A000h + pop ds, then by rearranging some instructions you can get approximately the same result:

        fild    qword [si]
        mov     al,13h
        push    0A000h
        fptan
        int     10h
        pop     ds

This construction takes 2 bytes more and gives us the values 1.0 and ≈ 0.0337, which suit us perfectly (and if you can’t see the difference, why pay more? Well, except so that the picture doesn’t shift in some versions of DOS, but then you need to set video_shift = 0). In general, I would do just that now, since the result of 63 bytes, as you understand, does not exceed 64 bytes, but I needed to introduce you to the trick with lds.

How did I guess that all this would work like that and give the desired values? The answer is simple: I really wanted it to :))). And then, by experimenting (permuting instructions, trying different load types (fld, fild) and sizes of the loaded value (dword, qword, tbyte)) and believing in success, I found the right combination (it’s no accident that fild sits in such a strange place). Well… magic, of course! If you think I’m joking, then I have to disappoint you.
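The “code as data” value can be checked without a debugger. The sketch below decodes the first 8 bytes of the program the way fild qword does, as a signed 64-bit little-endian integer. The byte encodings are my own hand assembly of the four instructions, so treat them as an assumption and compare them against a fasm listing if in doubt. It deliberately does not try to reproduce the fptan result: the x87 reduces huge arguments with its own internal approximation of π, so a library tan() will not necessarily agree with the 0.0294 seen in the debugger.

```python
import struct

# Machine code of the first four instructions (2 bytes each), assumed encodings:
code = bytes([
    0xB0, 0x13,   # mov  al,13h
    0xCD, 0x10,   # int  10h
    0xDF, 0x2C,   # fild qword [si]
    0xC5, 0x07,   # lds  ax,[bx]
])

# fild qword reads a signed 64-bit little-endian integer from memory
(value,) = struct.unpack("<q", code)

print(f"{value:016X}h")        # the qword as the debugger would show it
print(f"{float(value):.3E}")   # the number that ends up in ST0
```

Swapping in any other 2-byte instruction for lds changes the top two bytes of the qword, and with them the whole number, which is why the order of the first instructions matters so much here.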

Turbo Debugger, pay attention to the PSP area (CD 20 FF 9F)

Okay, let’s get back to our code. After duplicating ST0, we have the following numbers on the FPU stack (from ST0 to ST2): 1.0 (angle), 1.0 (angle), 0.0294 (delta). Next, the fsincos instruction calculates the sine and cosine of ST0 (in the first frame, of 1.0), writing the results to ST1 and ST0, respectively: 0.54 (cos), 0.841 (sin), 1.0, 0.0294 (see the picture above). At address DS:DI we write the value of CX (the radius). Let me remind you that DS = 9FFFh and DI = 0 (after the “Fadeout” loop, see below), which means we write to an invisible area just before the video memory; this will be our temporary variable (let’s call it temp). The instruction fimul word [di] multiplies ST0 = 0.54 (the cosine of the angle, initially equal to 1.0) by the integer temp, i.e. by CX, writing the result back to ST0. This way we get the X coordinate (for the first iteration of the CX loop of the first frame: round(64 * 0.54) = 35). We write it back to temp (fistp word [di]), popping it from ST0, and load it from there into AX (yes, exchanging values with the FPU is only possible through memory, which is sad). We swap AX and BX (why will become clear later). By writing to port 61h (out 61h,al) we output sound (a light crackling… do you hear it? If you’re hearing it right now, your fried eggs might be burning). Yes, this one 2-byte instruction generates an interesting sound. Sometimes placing it in different random places gives quite an interesting audio effect. Use it, no registration or SMS required.

Let’s move on. The cmc instruction inverts the CF flag. The last instruction that affected this flag was sbb (back in “Fadeout”), which reset CF to 0 (just trust me); later there will also be test, which does the same. So rest assured: on the first iteration of the inner loop (from the first @@ label to jc @B) CF=0, and after cmc CF=1, so on the first pass jc @B jumps back to the preceding @@ label. This is a fairly common trick: organize 2 or 3 iterations by toggling some flag (sometimes CF, but more often PF via a one-byte inc or dec), so there is no need to initialize CX and use loop, especially when CX is already in use (exactly our case).

Let’s see what we have left on the FPU stack: 0.841 (sin), 1.0, 0.0294. We write CX into temp again, then multiply it by 0.841 (this time the sine of the angle). We get the Y coordinate (for the first iteration of the CX loop of the first frame: round(0.841 * 64) = 54). We write it to temp (popping it from ST0), then load it into AX. We swap AX and BX again (are you keeping up?). As a result, our FPU stack contains 1.0 (angle), 0.0294 (delta). Registers: AX = X, BX = Y. The next cmc brings CF back to 0, and jc @B does not jump.
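If the double pass with xchg and cmc is hard to follow on paper, here is a rough Python model of the inner @@ loop. It only mimics the register shuffling; the function name and the use of Python’s round() in place of fistp are my assumptions (both round to nearest, ties to even, and no ties occur here).

```python
import math

def two_pass(cx: int, angle: float, ax: int = 0, bx: int = 0):
    """Model the inner @@ loop: two passes driven by toggling CF."""
    fpu = [math.cos(angle), math.sin(angle)]   # ST0=cos, ST1=sin after fsincos
    cf = 0                                     # CF=0 on entry (left by sbb/test)
    while True:
        temp = round(fpu.pop(0) * cx)          # mov [di],cx / fimul / fistp
        ax = temp                              # mov ax,[di]
        bx, ax = ax, bx                        # xchg bx,ax
        cf ^= 1                                # cmc
        if not cf:                             # jc @B falls through when CF=0
            break
    return ax, bx

print(two_pass(64, 1.0))   # AX = X (from cos), BX = Y (from sin)
```

After the first pass X sits in BX; the second xchg moves it into AX and parks Y in BX, so no extra mov is needed to keep both coordinates.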

The hardest part is over. You can breathe out, drink a cup of tea or coffee. In the meantime, while you are heating the kettle, I will entertain you with new pictures.

It’s time to grab the canvas, palette and brush

Default color palette for 13h video mode

I already have the palette. Hands too. The canvas will be video memory. So only a small matter remains. The instruction imul si,ax,320 multiplies AX (i.e. X) by 320 (the number of pixels per line, i.e. the screen width), writing the result to SI. Next, with test cl,15 we check whether the current radius in CX is a multiple of 16 (in that case the lower 4 bits are zero and ZF = 1). With mov dx,arrow_color + (not radbg_color)*256 we put the value 0EE21h into DX. Here the low byte is the color of the radar needle (look at the palette: blue-violet; we could specify 20h, blue, and then the needle would be thinner). Don’t worry about the high byte for now. The instruction jnz @F jumps to the next @@ label if the radius is not a multiple of 16 (remember that mov does not affect flags). Otherwise, we change the color in DL to circles_color = 2 (green). It is important here that arrow_color (the needle color) is greater than radbg_color = 11h (the gray radar background color), and circles_color is less (you will soon understand why).

And finally, with the simple instruction mov [bx+si+(160+100*320)+video_shift],dl (where 160+100*320 is the coordinate of the screen center, plus the offset video_shift, since we have DS = 9FFFh, not 0A000h) we put a piece of our radar (one pixel) on the screen :). As Ivan Vasilyevich from the famous film said: “And that’s all there is to it!” An attentive reader will notice the catch: “Wait a minute! We have AX = X, BX = Y, and we multiply X by 320 and add Y, so why is it the other way around?” Well, we mixed up X and Y, so what does that change? Only the direction of rotation :). If you want everything to be right, move xchg bx,ax up, right after the first @@ label. The loop .next instruction completes the rendering loop, repeating the outer loop with CX (radius) from 64 down to 1. And fadd st,st1 adds the value of ST1 (0.0294, a little more than a degree, let me remind you) to the angle in ST0.

Another attentive reader (are there two of you already?) may come up with another nitpick: “As the angle increases, the rotation should be counterclockwise, but you say that swapping X and Y changes the direction, and yet the radar needle still rotates counterclockwise. You’re clearly not telling us something!” Everything is correct, because our ordinate axis (that is, Y) points from top to bottom, not from bottom to top as in the classics. More questions? “Uh, wait a minute, what about Fadeout…” “Take him away!” Good!

So, the radar drawing algorithm:

  1. FPU stack: angle, delta.

  2. radius = 64.

  3. We calculate the sine and cosine, preserving the angle: cos(angle), sin(angle), angle, delta.

  4. Multiply: X = cos(a) * radius; Y = sin(a) * radius.

  5. If radius and 15 == 0 (can be read as radius % 16 == 0), color = circles_color, otherwise color = arrow_color.

  6. Draw a point of color color at coordinates X, Y.

  7. if (--radius > 0) goto 3.

It’s simple, isn’t it? :))
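For those who read Python more easily than x87, the steps above can be sketched roughly as follows. This is a model of one frame, not a port of the intro: the constant names mirror the fasm listing, the function name is mine, and the offsets are given relative to segment 0A000h (i.e. without video_shift).

```python
import math

RADAR_RADIUS, CIRCLES_STEP = 64, 0x10
ARROW_COLOR, CIRCLES_COLOR = 0x21, 2
CENTER = 160 + 100 * 320        # screen center in mode 13h

def radar_points(angle: float):
    """Return (offset, color) pairs for one needle, radius 64 down to 1."""
    points = []
    for radius in range(RADAR_RADIUS, 0, -1):
        x = round(math.cos(angle) * radius)
        y = round(math.sin(angle) * radius)
        on_ring = (radius & (CIRCLES_STEP - 1)) == 0
        color = CIRCLES_COLOR if on_ring else ARROW_COLOR
        # X and Y are deliberately swapped, exactly as in the intro
        points.append((CENTER + y + x * 320, color))
    return points

frame = radar_points(1.0)       # first frame: angle = 1.0
print(len(frame), frame[0])
```

Calling radar_points with angle, angle + delta, angle + 2*delta and so on gives the needle positions of consecutive frames; the fading of old needles is a separate step.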

Bring in the curtain!

“Bring him back!” I promised to tell you about “Fadeout”. A promise made is a promise kept.

The construction add [di],dh + sbb [di],dh, strange at first glance, produces the effect of the needle “fading” to the background color, leaving a white trail. It works like this: if the color is greater than not DH (not 0EEh = 11h = radbg_color), it is decreased by 1, otherwise it does not change. If you look at the color palette, you can see that from 21h down to 11h the colors change sharply from blue to white and then smoothly to dark gray (needle, trail, background).

I’ll explain slowly, at 0.5x. First, add adds the value 0EEh to the color, and then sbb, surprise, subtracts this value back, also subtracting the value of the CF flag. We are interested in initial values of 11h and more (for example, 12h).

  • Situation #1: 11h + 0EEh = 0FFh, there was no carry, CF=0. Then: 0FFh – 0EEh – CF(0) = 11h. I didn’t cheat, the value hasn’t changed.

  • Situation #2: 12h + 0EEh = 0, a carry has occurred, CF=1. Then: 0 – 0EEh – CF(1) = 11h. The value has decreased by 1.

Now you understand why the color arrow_color = 21h must be greater than radbg_color = 11h (hint: otherwise the needle will not fade out), and circles_color = 2 must be less (hint: otherwise the rings would fade out too)?
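The two situations are easy to replay in Python. The sketch below models the add/sbb pair on a single byte, including the carry flag; the function name fade is mine, and DH is assumed to hold 0EEh as in the listing.

```python
def fade(color: int, dh: int = 0xEE):
    """Model add [di],dh / sbb [di],dh on one byte; returns (new_color, CF)."""
    total = color + dh
    cf = total >> 8              # carry out of the 8-bit addition
    value = total & 0xFF         # what add leaves in memory
    diff = value - dh - cf       # sbb subtracts the operand and the carry
    cf = 1 if diff < 0 else 0    # a borrow becomes the new CF
    return diff & 0xFF, cf

# Follow a needle pixel (21h) until it stops changing
color, frames = 0x21, 0
while fade(color)[0] != color:
    color = fade(color)[0]
    frames += 1
print(hex(color), frames)
```

The needle color settles at the background value, while ring pixels (color 2) and empty pixels (color 0) pass through unchanged and leave CF = 0, which is what the rest of the code relies on.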

Magic again! Want more tricks? I have them!

This whole farce (starting from the second frame) is repeated 65536 times, shoveling byte by byte through the entire segment addressed by the DS register, since the DI register keeps increasing until it wraps around to 0 (jnz @B).

By the way, if, while watching the intro, you get the feeling that someone has let flies in, know this: the flickering dots in random places on the screen are the result of this loop, when a value has already been increased by add and got onto the display before sbb had time to decrease it. You ask: “Is this a bug or a feature?” Of course it is a specially designed effect, how could it be otherwise?! :))

Exit without a gypsy

It is worth paying special attention to the commented ending:

;       in      al,60h
;       dec     ax
;       jnz     .repeat                 
;       ret

This is a standard construction for exiting an intro (unfortunately, with it the size of the intro becomes 65 bytes, and that is a fail). We read a value from keyboard port 60h. The value holds the scan code of the last key pressed. If it equals 1, the Esc key is being held down. AH = 0 (our radius is significantly less than 256, so there are no other options here, sorry), so when Esc is pressed dec ax sets the ZF flag (and makes AX = 0). Well, after that I think everything is clear. The ret instruction exits, since at program start there is always a 0 on the stack (this is even documented, unlike the values of most registers), and what do we have at address CS:0, remember? That’s right, the PSP, whose first word contains the int 20h instruction, i.e. program termination. To be honest, in this case the idea is only good when running in DOSBox, because by writing to segment 9FFFh we overwrite system data describing the memory structure, which can lead to undefined behavior (as amateurs and professionals of the C language say).

Above I wrote that the intro is slightly tailored to DOSBox. The second aspect of this tailoring is that there is no delay in the program, and when launched on real hardware the radar will take off, taking the monitor with it (do you need that?). There is another trick for this case (no, not tying the monitor down with ropes). You can make a delay using the single-byte instruction hlt (placing it before jmp .repeat), which waits for any hardware interrupt. Usually “any hardware interrupt” is the timer interrupt (int 8), which occurs every 55 milliseconds. Unless you or your cat decide to dance on the keyboard (in which case, until the keyboard breaks, there will also be int 9). For this intro, however, that delay is too long, but in general, by replacing lds with you-know-what and adding hlt we get an honest 64 bytes.
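To get a feeling for why one frame per timer tick is too slow here, a back-of-the-envelope estimate helps. The sketch assumes the standard PC timer setup (1,193,182 Hz input clock, default divisor 65536), exactly one frame per tick with no other interrupts, and the delta ≈ 0.0294 mentioned earlier; all variable names are mine.

```python
import math

PIT_HZ = 1_193_182             # PC timer input clock, Hz (standard value)
tick_hz = PIT_HZ / 65536       # default divisor: about 18.2 int 8 ticks per second
tick_ms = 1000 / tick_hz       # about 55 ms between ticks

delta = 0.0294                 # angle step per frame, radians (from the article)
frames_per_turn = 2 * math.pi / delta
seconds_per_turn = frames_per_turn / tick_hz

print(f"{tick_ms:.2f} ms per tick, {seconds_per_turn:.1f} s per revolution")
```

Under these assumptions a full sweep of the needle takes more than ten seconds, which matches the impression that hlt alone makes the radar too lazy.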

P.S. For reference: in the value read from port 60h, the most significant bit stores the key state flag. If the bit is cleared, the key is pressed and held right now; if it is set, the key has been released (i.e. if you pressed and released the Enter key, scan code 1Ch, you will keep reading the value 9Ch). Note that the scan code is changed not only by ordinary keys but also by the modifiers Ctrl, Alt, Shift, Win and even Caps Lock, etc.
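The same rule in code form, as a tiny sketch (the function name is hypothetical, and it only covers the plain one-byte scan codes described above):

```python
def decode_port_60h(value: int):
    """Split a byte read from port 60h into (scan_code, is_pressed)."""
    scan_code = value & 0x7F            # low 7 bits identify the key
    is_pressed = (value & 0x80) == 0    # bit 7 clear: key is held right now
    return scan_code, is_pressed

print(decode_port_60h(0x1C))   # Enter pressed
print(decode_port_60h(0x9C))   # Enter released
print(decode_port_60h(0x01))   # Esc pressed, the case the exit code checks for
```

This also shows why the dec ax test reacts only while Esc is actually held: once it is released, the port reads 81h, not 1.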

Some kind of hare (or rabbit) from an old kinescope

Thank you

…for reading to the end! I hope that my humor did not bore you, and that the material turned out to be interesting and, no less important, useful.

You can continue the conversation about sizecoding in the cozy thematic Telegram chat (there you can also get an invite to the international Discord server about sizecoding and to many other places).

Be healthy, live richly! 😉
