Assemblers, 5 pieces – a quick introduction for those who are not familiar
This article is for those who are not familiar with assemblers but want to take a “sneak peek”. We won't make you an assembly language guru in 15 minutes, but we will show assemblers for several popular microcontroller architectures (ARM32, AVR, MSP430, 8051) and for our desktop computers (x86 under Linux and DOS), so that you can see their differences and similarities and not be afraid to dive deeper if any of it turns out to be useful to you.
Our goal is not to encourage everyone to write in assembly language: it is not that difficult, but for most tasks it is not very practical. The goal is to introduce you to it, so that it is no longer scary to look into the disassembly now and then while debugging, or to optimize something with an inline assembly insert. Or maybe you are going to write a compiler or something like that.
A bonus for the curious is an assembler for the Intel-4004, a 4-bit processor that is already more than 50 years old. There will also be a small interactive exercise for it.
General notes
Assembly language is a language that lets you write processor commands, only not as hexadecimal codes (it could have been that way!) but as human-readable “mnemonics”.
There are different processors (of different architectures and types), and they have different commands. Therefore assemblers differ at least in these command mnemonics. In addition, different assembler authors have come up with slightly different notations for writing the same commands.
Does this mean that assembly language programs are completely “non-portable”, unlike C? Not necessarily. Most assemblers have tools (defines, macros and so on) that let you write some parts in a unified form. However, this is rarely needed.
About processors and system
Getting to know any assembler usually begins with looking at the processor architecture: what tools and capabilities it has. The first thing we talk about is registers, the cells of the processor's own tiny memory. There are usually few of them, somewhere from 1 to 32, not counting special registers such as PC (program counter), which holds the address of the instruction being executed (and increases as execution proceeds).
Some processor commands manipulate registers in various ways, for example arithmetic operations. Others are needed too, for example for jumps within the program (for conditions and loops). If you think about it, though, a “jump” is simply a forced write of the required value into the PC register, so that execution continues not in order but from the required place.
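To make the idea of the program counter concrete, here is a minimal sketch of a toy machine in Python. The instruction names (set, dec, jump_if_nonzero) and the single register are invented for this illustration and do not belong to any real processor; the only point is that a jump is nothing more than a write to pc.

```python
def run(program, max_steps=1000):
    """Execute a list of (operation, argument) pairs on a toy one-register machine."""
    r0 = 0
    pc = 0          # program counter: index of the next instruction
    trace = []      # which instruction indexes were executed, in order
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        trace.append(pc)
        pc += 1                     # by default execution simply goes on in order
        if op == "set":
            r0 = arg
        elif op == "dec":
            r0 -= 1
        elif op == "jump_if_nonzero" and r0 != 0:
            pc = arg                # a jump is just a forced write to the PC
        steps += 1
    return r0, trace


program = [
    ("set", 3),               # 0: r0 = 3
    ("dec", None),            # 1: r0 = r0 - 1
    ("jump_if_nonzero", 1),   # 2: go back to instruction 1 while r0 != 0
]
final_r0, trace = run(program)
print(final_r0, trace)
```

The trace shows instructions 1 and 2 repeating until the register reaches zero, after which execution falls off the end of the program.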
Commands for communicating with memory (or better said, with “address space”) are a very important thing. We simply say “at address 0x1020BEDA write the number 0x1F” – and the processor goes and writes – but what is at this address? This “address space” is physically represented by address and data bus conductors – and you can connect both the system’s RAM itself and some additional devices to it.
If we write a number to an address where memory is located, we can read it back from there later. But what if an input-output port driving the microcontroller's pins (its “legs”), or an audio synthesizer, is connected at this address? Then it is quite possible that the voltages on the pins will change, or that some sound will come out of the speaker. This is exactly how the processor communicates with the rest of the system!
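The same idea as a rough Python model: a bus that routes each address either to RAM or to a device. All class names, the address 0x8000 and the region sizes are made up for this sketch; real hardware does the routing with address decoding logic rather than a loop.

```python
class Ram:
    """Plain memory: what you write is what you later read."""
    def __init__(self, size):
        self.cells = bytearray(size)

    def read(self, offset):
        return self.cells[offset]

    def write(self, offset, value):
        self.cells[offset] = value & 0xFF


class LedPort:
    """Hypothetical output port: each bit of the written value drives one pin."""
    def __init__(self):
        self.pins = 0

    def read(self, offset):
        return self.pins

    def write(self, offset, value):
        self.pins = value & 0xFF


class Bus:
    """Routes an address to whichever device is attached at that range."""
    def __init__(self):
        self.regions = []   # list of (base, size, device)

    def attach(self, base, size, device):
        self.regions.append((base, size, device))

    def _find(self, address):
        for base, size, device in self.regions:
            if base <= address < base + size:
                return device, address - base
        raise ValueError(f"nothing is connected at {address:#x}")

    def read(self, address):
        device, offset = self._find(address)
        return device.read(offset)

    def write(self, address, value):
        device, offset = self._find(address)
        device.write(offset, value)


bus = Bus()
ram = Ram(1024)
leds = LedPort()
bus.attach(0x0000, 1024, ram)    # RAM occupies addresses 0x0000..0x03FF
bus.attach(0x8000, 1, leds)      # the port occupies a single address

bus.write(0x0010, 0x1F)          # lands in memory, can be read back
bus.write(0x8000, 0b0101)        # same kind of write, but pins change state
print(bus.read(0x0010), bin(leds.pins))
```

From the program's point of view both writes look identical; only the address decides whether a memory cell or a pin changes.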
This leads to another complicating factor – even for identical processors (cores), the system of connected devices (peripherals) can be different. This can add nuance to portability – and material to learn during development.
However, let's get down to business, that is, to assemblers! First, let's look at several microcontroller architectures. Some of these chips (like the ATtiny15) have such tiny resources (memory for 512 instructions and exactly zero bytes of RAM) that assembler is a very natural fit there.
ARM32 – the world of microcontrollers
We will start with the “middleweight” category: electronics and DIY enthusiasts know that 32-bit ARM controllers give basic Arduinos good competition. What is a “microcontroller” anyway? Let's put it this way: a kind of processor with useful built-in devices (interfaces, timers, some RAM and so on) that make it convenient to use in all sorts of electronic gadgets. Here is one of the first projects I made on one of them, an LPC1110 if I remember correctly: a robot controlled by the sounds of a flute.
Well, ARM32 now probably accounts for the largest share of all processors in the world, because they go into all sorts of devices, from “smart toilet brushes” to phones and tablets. The simplest models cost about a dollar apiece, sometimes less than more primitive 8-bit controllers, so it is tempting to embed them anywhere.
ARM32, as the name suggests, works primarily with 32-bit data. It has sixteen 32-bit registers, R0 to R15, of which R0 to R12 are general purpose. A 32-bit number can address as much as 4 gigabytes of memory. Clearly most systems do not have that much (the simplest controllers have several kilobytes of real RAM), so all the “free” space is available for addressing peripheral devices. Let's look at an example program that blinks an LED (on an LPC1114 controller from NXP):
.syntax unified
.equ GPIO0DATA, 0x50003FFC
.equ GPIO0DIR, 0x50008000
.text
Reset_Handler:
ldr r6, =GPIO0DIR
ldr r0, =0x0C
str r0, [r6]
ldr r0, =0x04
ldr r6, =GPIO0DATA
ldr r1, =0x0C
blink:
str r0, [r6]
eors r0, r1
ldr r2, =0x300000
loop:
subs r2, 1
bne loop
b blink
Let's sort out this gibberish! 🙂 First, note that lines starting with a dot are not commands for the processor but directives for the assembler itself. For example, .equ is something like #define in C: a way to give a constant a name. The .text directive says that what follows is the section with the actual code. And .syntax ... at the very top selects one of several popular syntaxes for writing commands. We will see the difference later.
Where are the commands? Look at the first three. The first is ldr r6, =GPIO0DIR, where “ldr” is short for “load register”, that is, load such and such a value into register R6. The equal sign before the constant asks the assembler to load the full 32-bit value, and the assembler itself decides how to encode that. As a result of this command, 0x50008000 is written to r6 (this is the address of the peripheral register that configures the controller's pins, or “legs”, as inputs or outputs, with one bit per pin; this information, of course, relates not to the assembler but to the LPC1114 itself and comes from its 400-page manual).
Next comes ldr r0, =0x0C, which we can now decipher easily: it writes to register R0 a number in which two bits are set to 1 (bits 2 and 3, counting from zero). The third command, str r0, [r6], is short for “store register”: we write the number from R0 to the address held in R6. In this syntax the brackets indicate that we are writing not to R6 itself but to the address-space cell that R6 points to. Presumably this was done by analogy with C.
As a result of these manipulations, a number was written to the GPIO0DIR address that switches some of the controller's pins to output mode. Now, if you write ones and zeros to the corresponding bits at another address (the one in GPIO0DATA), voltages close to either the plus or the minus of the power supply (3.3 V and 0 V, in other words) will appear on these pins.
The next three instructions prepare values in registers R0, R1 and R6; let's skip them and see what happens in the main loop. It starts with the label blink: (labels, too, are not processor commands but a way to mark a place in the program). From now on the name “blink” stands for the address of the command that follows it, and we can use it in a jump instruction at the end to loop execution.
At the beginning of the loop we again see a write from R0 to the cell whose address is in R6, only now it is the address GPIO0DATA = 0x50003FFC, which controls the voltage on the selected pins. The number in R0 changes on every iteration of the main loop thanks to the next command, eors r0, r1. This unusual name denotes the familiar XOR (exclusive or): register R0 is “xored” with R1, and the result is written back to R0. Since the value in R1 does not change, the bits of R0 that are set in R1 flip from 0 to 1 and back every time this “xor” is executed. And since at the top of the loop we write R0 to the cell that controls the pin voltages, the voltages on these pins flip as well.
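A quick way to see the toggling is to replay the two loop instructions in Python with the values from the listing (R0 starts at 0x04, R1 holds 0x0C). This is only a model of the register arithmetic, with no hardware involved, and the variable names are chosen here for the sketch.

```python
r0 = 0x04        # initial output value, as loaded before the blink label
r1 = 0x0C        # mask: bits 2 and 3
written = []     # what would be stored to GPIO0DATA on each pass

for _ in range(4):
    written.append(r0)   # str r0, [r6]
    r0 ^= r1             # eors r0, r1: flip exactly the bits set in r1

print([f"{value:04b}" for value in written])
```

The stored values alternate between 0100 and 1000, so with this starting value the two pins are never on together: they blink in turn.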
At the very end of the program we see b blink
– in this assembler “b” means “branch” – jump to a given address. This causes the program to loop.
It remains to consider the piece that creates the delay: a small inner loop. Before the label loop: we load a fairly large number into R2, and then decrease it by one with the instruction subs (subtract). The command bne means “branch if not equal”. Not equal to what? And why? 🙂 Processors usually have a special register of one-bit “flags” that are set or cleared depending on the result of an operation. Many arithmetic operations set the Z (zero) flag if their result is 0, and bne checks whether the zero flag is set or not. It is called an “equality” test because it is often used after a comparison command rather than a subtraction. Comparison is identical to subtraction, except that the result is not written anywhere (only the flags are set). So if the two numbers were equal (here, if r2 held one before the subtraction), the flag is set afterwards, and bne does not jump but lets execution fall through to the next command.
Thus this small loop creates a delay of roughly a second (if the controller runs at several megahertz), enough to see the LED connected to the pin blink with the naked eye.
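The delay loop and the “roughly a second” estimate can be sketched in Python. The model of subs and bne follows the description above; the two timing constants are assumptions made for this sketch (about 4 clock cycles per iteration and a 12 MHz clock), not figures taken from the article, so check them against the manual of your own chip.

```python
def delay_loop(start):
    """Model of: loop: subs r2, 1 / bne loop. Returns the number of iterations."""
    r2 = start
    iterations = 0
    while True:
        r2 = (r2 - 1) & 0xFFFFFFFF   # subs: 32-bit subtraction that updates flags
        zero_flag = (r2 == 0)        # Z is set when the result is zero
        iterations += 1
        if zero_flag:                # bne jumps back only while Z is clear
            return iterations


# Assumed numbers, for the estimate only:
CYCLES_PER_ITERATION = 4     # assumption: 1 cycle for subs, 3 for a taken branch
CLOCK_HZ = 12_000_000        # assumption: a 12 MHz core clock

seconds = 0x300000 * CYCLES_PER_ITERATION / CLOCK_HZ
print(delay_loop(5), round(seconds, 3))
```

Under these assumptions the constant 0x300000 from the listing gives a pause of about one second per half-period of the blink.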
Perhaps that's enough about ARM32. Let's just mention that the GNU cross-toolchain was used here; the full text and additional details are on github, and the build script shows that the arm-linux-gnueabi package was used. These controllers can be flashed via UART (like STM32, they have a ready-made bootloader inside).
AVR and Avrasm
This is an architecture of 8-bit processors from Atmel (long since bought by a competitor), known for being widely used in Arduino. There are 32 registers, but they are small, 8-bit ones. The area of the address space associated with peripheral devices is strictly fixed (addresses from 32 to 95), so special commands IN and OUT are used to communicate with peripherals (read from there or write there), although the regular memory access commands work as well. Even the registers themselves are mapped into the address space, at the lowest 32 addresses.
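The layout described here can be written down as a small lookup. This sketch assumes the classic AVR arrangement (registers first, then the I/O ports from address 32 to 95); the function names are invented, and 0x17 is just an example port number.

```python
IO_BASE = 0x20   # I/O ports start right after the 32 registers


def io_to_data_address(io_address):
    """Data-space address of a port numbered the way IN/OUT see it (0..63)."""
    if not 0 <= io_address <= 0x3F:
        raise ValueError("IN/OUT can only reach ports 0..63")
    return io_address + IO_BASE


def describe(data_address):
    """Say what lives at a given address in the lowest part of the data space."""
    if 0 <= data_address < 0x20:
        return f"register r{data_address}"
    if 0x20 <= data_address < 0x60:
        return f"I/O port {data_address - IO_BASE:#04x}"
    return "ordinary memory (or nothing)"


print(hex(io_to_data_address(0x17)))
print(describe(5), "|", describe(0x37))
```

So a port that OUT reaches as number 0x17 is visible to ordinary memory commands at address 0x37, which is why the two kinds of access must not be mixed up.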
A popular assembler is AvrAsm, or its third-party counterpart AvrA, and it has a slightly different syntax. You can find as many “LED blinking” examples as you like, so let's not get distracted by a detailed analysis; just compare a fragment with what we saw earlier:
#define DDRB 0x17
#define PORTD 0x12
setup:
ldi r16, (0xF << 0)
out DDRB, r16
ldi r16, (1 << 2) | (1 << 3)
out PORTD, r16
rjmp again
We see what was promised: a different syntax. #define instead of .equ (although most likely both are supported). The command that loads a number into a register is called ldi (load immediate), and the OUT command writes to the peripheral cell at the address DDRB. The jump command here is called RJMP (relative jump), although conditional jumps, as in ARM, usually start with the letter B, from the word branch.
An obvious inconvenience compared with ARM shows up, for example, when implementing the LED blinker: a delay loop of a million iterations cannot be built on a single register (a register cannot hold a value above 255). You need either nested loops or a single loop that works with a number spread over several registers (there are operations with carry and borrow from the neighbouring register). Another important difference from ARM is that program memory is separate from the general address space. This makes life a little harder when using constants stored in flash, and when developing compilers (like Arduino's).
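Here is a sketch of the “number spread over several registers” approach, modelled in Python with three 8-bit values. The helper imitates subtraction with borrow (on AVR this role is played by instructions such as subi and sbci); the function names and the 24-bit width are choices made for this illustration.

```python
def sub_with_borrow(value, subtrahend, borrow_in):
    """8-bit subtraction: returns (result, borrow_out) like a register would."""
    result = value - subtrahend - borrow_in
    return result & 0xFF, 1 if result < 0 else 0


def decrement24(low, mid, high):
    """Decrement a 24-bit number held in three 8-bit registers."""
    low, borrow = sub_with_borrow(low, 1, 0)         # subtract 1 from the low byte
    mid, borrow = sub_with_borrow(mid, 0, borrow)    # pass the borrow upwards
    high, borrow = sub_with_borrow(high, 0, borrow)
    return low, mid, high


def count_down(start):
    """Run the delay loop until all three registers are zero; return the step count."""
    low, mid, high = start & 0xFF, (start >> 8) & 0xFF, (start >> 16) & 0xFF
    steps = 0
    while (low | mid | high) != 0:
        low, mid, high = decrement24(low, mid, high)
        steps += 1
    return steps


print(decrement24(0x00, 0x00, 0x01), count_down(1000))
```

Decrementing 0x010000 yields 0x00FFFF, with the borrow rippling through all three bytes, which is exactly what no single 8-bit register could do alone.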
Note the operands written as expressions, such as (1 << 2) | (1 << 3). This does not mean, of course, that the processor can evaluate such complex C-style constructs. These are just constants computed at assembly time, so register values cannot be used here in place of numbers. We could have written 0xC, but that is a little less clear, whereas this form shows at a glance that bits 2 and 3 are set.
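The same constant folding can be checked in Python, whose shift and OR operators behave like the assembler's for such small values. The helper set_bits is a name invented for this sketch.

```python
# The assembler folds this expression into a single byte before any code runs.
value = (1 << 2) | (1 << 3)


def set_bits(number):
    """Return the positions of the bits that are 1, counting from zero."""
    return [bit for bit in range(number.bit_length()) if (number >> bit) & 1]


print(hex(value), set_bits(value))
```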
As an example of an ATtiny2313 project written with this assembler, you can look at this simple radio. Building with AvrAsm/Avra was covered above; for flashing you need a typical Atmel programmer, and an Arduino with the ISP sketch from the standard examples will do (although I once used 5 wires on the LPT port).
MSP430 – 16-bit controllers
I used these controllers from Texas Instruments mainly because they have a built-in bootloader, so you don't need to buy a separate programmer and can flash the compiled code directly via UART (the same applies to LPC and STM32 from the ARM32 family). They are no cheaper than ARMs and have more modest capabilities, but they come in DIP and SOIC packages. Of course, many of us know how to design and etch boards for tiny ARM chips, but for simple crafts you sometimes want a simpler package.
And of course, a 16-bit processor sits somewhat apart, between AVR and ARM, and the assembler I found for it is rather reminiscent of the assemblers for the x86 architecture, discussed further on. Let's look at a code fragment for yet another blinker:
.org 0xf800
start:
mov.w #WDTPW|WDTHOLD, &WDTCTL
mov.b #2, &P1DIR
repeat:
mov.w #2, r8
mov.b r8, &P1OUT
mov.w #60000, r9
wait1:
dec r9
jnz wait1
The .org directive specifies the address from which the following commands are placed; for these processors, oddly enough, program memory does not start at 0. Constants are preceded by the # symbol, and when you need to write to the memory cell at a given address, the address is preceded by an ampersand &. All these WDTPW and other constants are declared in a separate file, but otherwise the principle is the same.
All kinds of commands for writing to registers and memory are simply called mov, with a suffix indicating the operand size (word or byte). The conditional jump command jnz, an analogue of bne, this time stands for “jump if not zero”.
This is the syntax of the naken_asm assembler, the first one I came across at the time that worked well for these controllers. As you can see, mnemonics for different processors usually mean similar things, but every assembler author tends to come up with their own names. Another characteristic feature is that the destination operand (where the result is written) comes second, not first.
Examples of MSP430 projects can be found on my github (search for the prefix “msp430”), in particular the blinker itself.
Family 8051
These controllers are older than many of us 🙂 Intel released them, it seems, in the early 80s – modern versions on this architecture are much more advanced (probably the top controllers are produced by SiLabs) – nevertheless, the core remains the same. A little strange, a little archaic. The controllers are 8-bit, the address spaces for the RAM (if any) and peripherals are separated. They have many features that it’s probably better not to go into for now, but let’s look at a small fragment to note some similarities.
CHANNEL EQU 5Fh ; 0 is FM, 1 - MW/LW, 2 and further - SW
COMMAND_AREA EQU 60h
RESPONSE_AREA EQU 70h
ORG 0
START:
MOV SCON, #01000000b ; UART mode 1 (8bit, variable baud, receive disabled)
MOV PCON, #80h
MOV TMOD, #00100000b ; T1 mode 2 (autoreload)
MOV TH1, #0FFh ; T1 autoreload value, output frq = 24mhz/24/(256-TH1)/16 (X2=0, SCON1=1)
MOV TCON, #01000000b ; T1 on
MOV IE, #90h ; enable interrupts, enable uart interrupt
MOV SBUF, #55h
SJMP MAIN
This is again code for a radio receiver, on a different chip (project here), and the processor is from the same Atmel that made AVR. These AT89S chips do not have a UART bootloader, so I had to write a loader on Arduino (it is also somewhere on github). Otherwise they are not bad, even somewhat amusing. An interesting feature: the pins have no “direction” register, and a pin can be read as an input while a one is written to it (similar to the AVR pull-up mode).
So, we see the familiar directives for defining constants (albeit without dots) and for setting the program address (in this case ORG 0), as well as the usual labels with a colon.
We have also already seen constants with a hash sign. But here is an unusual feature: MOV commands can send these constants directly to the cells that control the peripherals! All these SCON, PCON, TMOD are predefined addresses of peripheral cells. Naturally, this shortens the program: in ARM and AVR, as we saw, the constant must first be written to a register, and then the register is sent to the address space. The SJMP command is easy to decipher as “short jump”. Many processors have several kinds of jumps, and a short jump needs fewer bits to encode the command itself.
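The comment next to TH1 in the fragment gives a formula for the UART speed. Taking that formula at face value (it is specific to this chip and mode, and the function name below is invented for the sketch), the resulting speed is easy to compute:

```python
def baud_rate(clock_hz, th1):
    """Formula from the listing's comment: clock / 24 / (256 - TH1) / 16."""
    return clock_hz / 24 / (256 - th1) / 16


print(baud_rate(24_000_000, 0xFF))   # the TH1 value used in the fragment
print(baud_rate(24_000_000, 0xFE))   # one step lower halves the speed
```

Because the timer counts up from TH1 to overflow, values close to 0xFF give the highest speeds, and each reload value divides the rate by (256 - TH1).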
One of the popular assemblers, and the one I used, is asem-51.
Finally x86 – then (TASM, DOS)
Assembly language for our ordinary computers, mainly x86 architecture – this was the second language with which I began to experiment after Turbo Pascal – since the book on Pascal mentioned somewhere magical “assembler inserts” – the code of which then seemed absolutely incomprehensible.
The possibility of writing a program in the .COM format under DOS, only a few dozen bytes long, looked very interesting. If you want to play with the assembler of those times now, use, for example, DOSBox and TASM (it was included in the Borland Pascal / C++ package, if I remember correctly).
The x86 processor has eight registers, originally 16-bit (the 8086 grew out of the 8-bit 8080), with names like AX, BX, CX, DX, BP, SP and so on (that is, not the Rnn names usual for controllers). Moreover, each of the first four is divided into two 8-bit halves, for example AH and AL, the high and low half. Here is an entire program that prints a line:
assume cs:code,ds:code
code segment public
org 100h
main proc near
lea dx, message
mov ah, 9
int 21h
int 20h
main endp
message db 'Hi, Peoplez!', 13, 10, '$'
code ends
end main
Here again there are assembler directives without dots, which is confusing at first. In particular, “code segment public” refers to the organization of memory in 64 KB segments, and “assume …” at the top tells the assembler that the segment registers will hold the address of the same segment in which both the program and the data live. Segment registers indicate which memory segment (almost a megabyte was available in “real mode”) the program and data currently use. A sort of two-stage addressing.
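The two-stage addressing can be written as one line of arithmetic: in real mode the segment value is multiplied by 16 and the offset is added. A minimal sketch, where the function name and the sample values are chosen only for illustration:

```python
def physical_address(segment, offset):
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits as on the 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


# A .COM program starts at offset 0x100 inside whatever segment DOS picked.
print(hex(physical_address(0x1234, 0x0100)))

# Different segment:offset pairs can name the same physical byte.
print(physical_address(0x1234, 0x0100) == physical_address(0x1000, 0x2440))
```

This is why a 16-bit offset alone reaches only 64 KB, while the pair of 16-bit values covers about a megabyte.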
For a .COM file the program always starts at an offset of 256 bytes, which is what the ORG directive specifies. Then comes the program itself, starting with a “procedure” label; a usual label with a colon would have worked too.
The LEA (load effective address) command loads the memory address at which the string we need is located into the register. You see it further, with the label “message” – it consists of text, bytes 13 and 10 (carriage return and line feed) – and the dollar symbol (more on that later).
The command MOV AH, 9 needs no explanation by now: it writes 9 to the AH register. Then something interesting happens: what is this INT 21h?
We skipped a lot in describing processors and architectures, and one important feature is “interrupts”. At the very beginning of memory there is a table of addresses of procedures to be executed when some (usually sudden) event occurs: a signal arrives at a pin of the controller, or a timer fires. The user program is interrupted, the interrupt service routine runs, and then execution returns to the user code at the place where it was interrupted.
The x86 architecture also has a special command to trigger an interrupt “manually”: this is INT (from the word interrupt), followed by the interrupt number. Interrupt 21h gave access to many small procedures of the DOS operating system. The required function is selected by the number in the AH register; in particular, function 9 prints a string, and the address of the string must be in the DX register. In general, besides the processor instruction manual, a desktop programmer needed a large reference on DOS and BIOS functions (BIOS ones were called via other interrupts, such as INT 10h).
The string for this function must end with a dollar symbol, that's all.
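What function 9 does with the dollar sign can be imitated in a few lines of Python. This is a model of the behaviour described above, not DOS code; the memory size and the offset 0x10A where the message is placed are hypothetical values picked for the sketch.

```python
def dos_print_string(memory, dx):
    """Collect bytes starting at offset dx until the '$' terminator is met."""
    output = bytearray()
    while memory[dx] != ord("$"):
        output.append(memory[dx])
        dx += 1
    return bytes(output)


memory = bytearray(0x200)        # a small stand-in for the program's segment
MESSAGE_OFFSET = 0x10A           # hypothetical: somewhere after the code at 0x100
message = b"Hi, Peoplez!" + bytes([13, 10]) + b"$"
memory[MESSAGE_OFFSET:MESSAGE_OFFSET + len(message)] = message

printed = dos_print_string(memory, MESSAGE_OFFSET)
print(printed)
```

The terminator itself is not printed, and a string that happens to contain a dollar sign would be cut short, which is the price of this convention.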
Next, INT 20h is called. This is also a DOS interrupt, but with only one function: it exits the program back to the OS. We have not seen anything like it on microcontrollers, which have nowhere to “exit” to (and this is not the only way to exit, by the way).
The DB directive after the “message” label is not a processor command, but it affects the generated code: thanks to it, the data bytes that follow (DB stands for “define byte”) are written into the executable file at this location. After all, the string must be present in the program in order to be printed.
x86 – now (GAS, Linux)
Recently I had to look for a bug in the compiler (more precisely, the library) of the archaic language BCPL (I recently wrote about it) – and it turned out that part of it, of course, was written in assembly language. Naturally, it seemed quite familiar, although already for a 32-bit system. Let's look at the same program in a “modern version”.
.section .data
msg: .ascii "Hi, Peoplez!\n"
len = . - msg
.section .text
.global _start
_start:
movl $4, %eax
movl $1, %ebx
movl $msg, %ecx
movl $len, %edx
int $0x80
movl $1, %eax
movl $0, %ebx
int $0x80
As you can see, the 32-bit registers are called EAX, EBX and so on. The syntax here is the default one for the GNU assembler (AT&T syntax), although, as you remember from the ARM example, it can be switched. In this syntax, registers are preceded by percent signs and constants by dollar signs. The program has two separate sections: data, which holds our data (the string to print), and text, which holds the code itself.
The commands themselves look familiar! Note that instead of DOS functions we now call Linux functions, but this too is done through a “manual” interrupt, this time number 0x80. The function number goes in EAX; 4 means writing data. The 1 in EBX is the number of the channel to write to (remember stdout / stderr and file “handles” in C? This is the same thing, and 1 corresponds to stdout). ECX holds the address of the string and EDX its length, which is computed above by a directive that subtracts the address of the “msg” label from the current address (the dot).
The second function, with code 1 in EAX, exits the program. It is easy to guess that its parameter (0 in EBX) is the return code. One more int $0x80, and voilà.
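For comparison, the same write request can be made from Python, where os.write passes the descriptor, the bytes and their length to the operating system in much the same way. This is a sketch of the correspondence only; the function name write_message is invented here, and the exit call is left out so that the snippet can be pasted into a larger script.

```python
import os

msg = b"Hi, Peoplez!\n"
length = len(msg)            # plays the role of: len = . - msg


def write_message(descriptor):
    """Like eax=4 (write), ebx=descriptor, ecx=address of msg, edx=length."""
    return os.write(descriptor, msg)


written = write_message(1)   # 1 is stdout, exactly as in the assembly version
print(length, written)
```

The return value is the number of bytes actually written, which the assembly version simply ignores.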
If you have GCC installed, you can most likely try this code on the spot. Save the code to a file test.s and run the commands:
as test.s
ld a.out -o test
./test
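The first of them assembles the source into an object file (a.out), and the second links it into a finished binary, which we launch as ./test. If this fails on a 64-bit system, try requesting 32-bit output explicitly with as --32 and ld -m elf_i386.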
Intel-4004 – as a bonus
This was the first commercially sold microprocessor, but no popular computers were built around it. It was used in desktop business calculators and soda machines, that is, it acted more like a microcontroller, although it had none of a microcontroller's built-in features. So it is hard to speak of any standard system or peripherals for it (whatever you solder on is what you get).
The practical value of studying its assembler is zero, but it is a curious subject for exercises. For that reason I once wrote a small Python emulator for it, plus several tasks on my site (with a small form for running the programs). Among them you can find Bresenham's algorithms for graphics and a micro-version of the game “Life”.
Although the Intel-4004 is the ancestor of the 8008, and through it of the 8080, 8086, 80386 and eventually our modern computers, its registers are arranged somewhat differently from what we saw before. There are 16 registers (all 4-bit), R0 to R15, but there is also a dedicated accumulator register, Acc, and many operations (especially arithmetic and logical ones) can only work with it. The same feature, incidentally, is present in the 8051 mentioned above.
We will not consider any extensive programs (if you wish, try solving problems using the instructions included with them) – but let’s look at an example of several commands:
ldm 5 ; load 5 into Acc
xch r2 ; exchange the values of R2 and Acc
Because the accumulator is mandatory, many commands have only one argument, or none at all. There are also longer and more complex commands:
fim r4 $57 ; load an 8-bit number into the register pair R5:R4
add r10 ; compute Acc + R10 + Carry (the carry flag)
It turns out that before doing arithmetic you always need to clear or set Carry, depending on the desired operation.
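Why the carry flag matters is easiest to see by modelling a few commands. The class below is a tiny sketch written for this article's examples, not the emulator mentioned above: it implements only ldm, xch, clc (clear carry) and add, with 4-bit values throughout.

```python
class Cpu4004:
    """Very small model: an accumulator, a carry flag and 16 four-bit registers."""

    def __init__(self):
        self.acc = 0
        self.carry = 0
        self.regs = [0] * 16

    def ldm(self, value):
        self.acc = value & 0xF                    # load a constant into Acc

    def xch(self, r):
        self.acc, self.regs[r] = self.regs[r], self.acc   # swap Acc and Rr

    def clc(self):
        self.carry = 0                            # clear the carry flag

    def add(self, r):
        total = self.acc + self.regs[r] + self.carry
        self.acc = total & 0xF                    # only 4 bits fit in Acc
        self.carry = 1 if total > 0xF else 0      # the overflow goes to Carry


cpu = Cpu4004()
cpu.ldm(9)
cpu.xch(2)          # R2 = 9
cpu.ldm(8)
cpu.clc()
cpu.add(2)          # 8 + 9 + 0 = 17: Acc = 1, Carry = 1
first = (cpu.acc, cpu.carry)

cpu.ldm(3)
cpu.add(2)          # Carry was left set: 3 + 9 + 1 = 13
second = (cpu.acc, cpu.carry)
print(first, second)
```

The second addition gives 13 rather than 12 because the carry left over from the first one was not cleared, which is useful when chaining digits of a long number and a bug otherwise.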
Here you can get acquainted with the call stack and subroutines in practice – we deliberately skipped this part when considering other architectures. This is an extremely important and actively used feature – but still it is not absolutely necessary in small (or test) programs – so you don’t have to worry about it at first.
Conclusion
As mentioned above, the purpose of this article is not to teach you to write in assembly language or to make you a fan of it, but rather to show what it consists of, and what similarities and differences we usually run into across different architectures and assemblers.
Nevertheless, if you try to program something this way, I think you will agree that it is at least interesting – and in a certain way “exercises the brain” 🙂
One shortcoming worth admitting is that we did not touch the AMD64 and ARM64 architectures. On the other hand, your eyes are probably already glazing over from all these mnemonics, and as you might guess, they bear a certain similarity to x86 and ARM32. I also left out the once popular Z80 assembler (so much was written in it for the ZX Spectrum): first, the Z80 is a derivative of the 8080 (one of the ancestors of the x86 architecture), and second, it is unlikely to come in handy now, unlike the five architectures covered here.