Stack Organization on Intel386
Introduction
The stack is a specially designated region of memory for storing temporary data. It follows these rules:
LIFO (Last In, First Out): the element pushed onto the stack last is the first one removed from it.
The stack grows toward the beginning of the address space (in other words, the stack grows down).
The minimum unit that can be pushed onto or popped off the stack is 16 bits (2 bytes).
The maximum unit that can be pushed onto or popped off the stack is 32 bits (4 bytes).
Stack organization

As Figure 1 shows, the stack grows downward when something is pushed onto it. The stack pointer register (ESP) points to the element that was pushed last; this location is also called the top of the stack (TOS).
When something is pushed onto the stack, the processor decrements the value in the ESP register and writes the pushed value to the new top of the stack. When something is removed from the stack, the processor first copies the value from the top of the stack and then increments the value in the ESP register.
To tell the processor to store a value on the stack, assembly code uses the push instruction; to remove a value, it uses pop.
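The order of operations described above (decrement ESP and then store on a push, load and then increment ESP on a pop) can be sketched in C. This is only a model under stated assumptions: the names stack_model, stack_init, stack_push32 and stack_pop32 are hypothetical, the stack is a small byte array, and ESP is represented as a byte offset into that array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 64u  /* bytes; arbitrary size chosen for the sketch */

/* Hypothetical model of a descending stack: esp is a byte offset into mem. */
struct stack_model {
    uint8_t  mem[STACK_SIZE];
    uint32_t esp;           /* offset of the current top of the stack */
};

static void stack_init(struct stack_model *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_SIZE;    /* empty stack: ESP sits just past the highest byte */
}

/* push: decrement ESP first, then store the value at the new top. */
static void stack_push32(struct stack_model *s, uint32_t value)
{
    assert(s->esp >= 4);                /* room for one more 32-bit slot */
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* pop: read the value at the top first, then increment ESP. */
static uint32_t stack_pop32(struct stack_model *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_SIZE);   /* the stack must not be empty */
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order, and the offset in esp ends where it started, which is the LIFO rule in miniature.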
push instruction
Its syntax may vary from one assembler to another, but its essence remains the same: it pushes a value onto the stack.
PUSH r/m16
PUSH r/m32
PUSH r16
PUSH r32
PUSH imm8
PUSH imm16
PUSH imm32
The prefix r/m (register/memory) means that the operand is either a register or a memory location. In the memory case the address is typically held in a register: for example, if the register contains 0x87654321, the memory address at which the value 0x11223344 is stored, then 0x11223344 is pushed onto the stack.
The prefix r (register) means that the value to be pushed onto the stack is in a register.
The prefix imm (immediate) means that the value is passed directly to the instruction as a parameter.
The suffixes 8, 16 and 32 give the number of bits in the value passed to the instruction, which is usually called the operand.
At this point a question may arise: as stated above, the minimum unit that can be placed on the stack is 16 bits, yet the push syntax includes imm8, which indicates that the operand can be an 8-bit value. In fact, an 8-bit immediate is sign-extended to the operand size (16 or 32 bits) before it is pushed. This keeps the stack aligned and preserves the value of negative numbers stored in two's complement.
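The widening of an 8-bit immediate can be illustrated with a short C sketch. The function names below are hypothetical, and the code only models the arithmetic of sign extension; it does not execute a push instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Widen an 8-bit immediate to 32 bits by copying its sign bit (bit 7)
 * into all higher bits, as two's complement requires. */
static uint32_t sign_extend_8_to_32(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : imm8;
}

/* The same for a 16-bit operand size. */
static uint16_t sign_extend_8_to_16(uint8_t imm8)
{
    return (uint16_t)((imm8 & 0x80u) ? (0xFF00u | imm8) : imm8);
}

/* A few worked values: 0xFF is -1 and must stay -1 after widening. */
static void sign_extension_examples(void)
{
    assert(sign_extend_8_to_32(0x7Fu) == 0x0000007Fu);  /* +127 stays +127 */
    assert(sign_extend_8_to_32(0xFFu) == 0xFFFFFFFFu);  /* -1 stays -1 */
    assert(sign_extend_8_to_16(0x80u) == 0xFF80u);      /* -128 stays -128 */
}
```

Padding with zeros instead would turn 0xFF (that is, -1) into 255, which is why the sign bit has to be replicated.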
pop instruction
The syntax of the pop instruction is the same as that of push, except that the value is popped off the stack and stored in the instruction's operand.
POP r/m16
POP r/m32
POP r16
POP r32
For obvious reasons, the operand of pop cannot be an immediate value, because nothing can be stored there.
Stack control registers
One register used to manage the stack has already been mentioned: ESP (Stack Pointer), perhaps the most important one, holds the pointer to the top of the stack. Several other registers are also associated with the stack:
SS (Stack Segment): the segment register that selects the memory segment containing the current stack. Only one stack can be used at any given time, and the processor uses this register for all stack operations.
EBP (Base Pointer): a register pointing to the current frame, i.e. to the base of the stack frame of a particular procedure/function. It is usually used to address local variables and procedure/function arguments.
Function Calling Convention in System V Intel386
In System V Intel386 (hereafter System V) there are several rules for calling functions and passing arguments to them. These rules apply only to global functions; local functions need not follow them (although that is not considered the best choice).
The arguments of the called function are pushed onto the stack by the calling function in reverse order: the calling function (the caller) must push the last argument first, then the penultimate one, and so on down to the first. The called function (the callee) then finds the arguments on the stack in their usual order.
The registers EBP, EBX, EDI, ESI and ESP belong to the calling function: the called function must preserve their values, so if it uses one of them it must first save it on the stack and restore it before returning. The remaining registers (EAX, ECX and EDX) may be changed by the called function, so a caller that keeps a value in one of them across a call must save it itself. This is why the called function pushes EBP at the beginning of its execution and restores it at the end, so that after the return EBP again points to the frame of the calling function; the same applies to ESP.
A function call is made with the call instruction, which pushes onto the stack the address of the instruction following call, commonly referred to as the return address.
If the function returns a value, it must place it in the EAX register; in either case it must restore the preserved registers listed above before returning.
Thus, after the function has been called (after the call instruction has executed), the stack looks like this:

As shown in Figure 2, the first item in the current frame is the return address (the address of the instruction following call), followed by the saved value of the EBP register, which points to the previous frame, and then by the local variables of the function.
Everything above the return address belongs to the previous frame, including the arguments passed to the called function. Hence the first argument is at EBP+8, the second at EBP+12, and so on, except when an argument is a 16-bit value.
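The frame layout can be checked with a small C model of the calling sequence. This is a sketch under stated assumptions: the struct machine and the helpers push32, load32, call_with_two_args and prologue are hypothetical names, memory is a 128-byte array, and ESP and EBP are byte offsets into it rather than real addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 128u   /* arbitrary size chosen for the sketch */

/* Hypothetical machine state: a small memory plus ESP and EBP as offsets. */
struct machine {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;
    uint32_t ebp;
};

static void push32(struct machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &v, sizeof v);
}

static uint32_t load32(const struct machine *m, uint32_t addr)
{
    uint32_t v;
    assert(addr + 4 <= MEM_SIZE);
    memcpy(&v, &m->mem[addr], sizeof v);
    return v;
}

/* Caller side: push two arguments in reverse order, then the return
 * address (which the real call instruction pushes by itself). */
static void call_with_two_args(struct machine *m, uint32_t arg1,
                               uint32_t arg2, uint32_t return_address)
{
    push32(m, arg2);            /* last argument first */
    push32(m, arg1);            /* first argument last */
    push32(m, return_address);
}

/* Callee prologue: pushl %ebp; movl %esp, %ebp; subl $local_bytes, %esp */
static void prologue(struct machine *m, uint32_t local_bytes)
{
    push32(m, m->ebp);          /* save the caller's frame pointer */
    m->ebp = m->esp;            /* EBP now marks the new frame */
    assert(m->esp >= local_bytes);
    m->esp -= local_bytes;      /* reserve space for locals */
}
```

After call_with_two_args and prologue have run, load32 at ebp returns the saved EBP, at ebp + 4 the return address, at ebp + 8 the first argument and at ebp + 12 the second, matching the offsets given in the text.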
An example of working with the stack in GNU Assembler for x86
As an example of working with the stack, we will print the command line arguments, indexed with a local variable, and then the environment variables.
The example uses GNU Assembler (GAS), which uses AT&T syntax, and is assembled with GCC.
Let's start with what the stack of the main function looks like:
Stack
envp <-- EBP + 16
argv <-- EBP + 12
argc <-- EBP + 8
return address <-- EBP + 4
saved EBP <-- EBP
local argv index <-- EBP - 4
argc, argv and envp are passed as arguments and are located at EBP+8, EBP+12 and EBP+16 respectively. The return address, as already mentioned, is the address of the instruction following call. local argv index is a local variable used for a (nearly) pretty output of the argv array.
To start, in the .rodata (read-only data) section we create three variables holding the format strings for printing argc, argv and envp:
.section .rodata
argc_str:
.string "argc = %d\n\n"
argv_str:
.string "argv[%d] = %s\n"
envp_str:
.string "%s\n"
Then we declare the main function in the .text section, which holds the program code itself, and mark it as global:
.section .text
.globl main
main:
At the very beginning of the function, we put the EBP register on the stack and make it a local pointer to the current frame
pushl %ebp
movl %esp, %ebp
We place argc and a pointer to the argv array in EDI and ESI respectively, allocate 4 bytes on the stack for a local variable and initialize it to zero:
/* move argc into edi */
movl 8(%ebp), %edi
/* move argv base address into esi */
movl 12(%ebp), %esi
/* allocate 4 bytes on the stack */
subl $4, %esp
/* initialize local variable by 0 */
movl $0, -4(%ebp)
Now we can print argc. To do so, we push all the arguments of the function (printf in this case) in reverse order and clean up the stack after the call, because in System V the stack is cleaned up by the caller.
The function arguments are the format string and the value of argc, which is held in the EDI register:
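pushl %edi /* second argument: argc */
pushl $argc_str /* first argument: format string */
call printf
addl $4, %esp /* discard the format string */
popl %edi /* pop the remaining argument back into EDI */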
Note that we push not the value stored at argc_str but its address, because printf expects a pointer to the format string.
The next step is to print the argv array in a loop. Before that, note that argv holds the address of an array of pointers, each of which is the address of a character array (the string we need). So the address of the first argument is stored at argv, the address of the second at argv+4, and so on; we add 4 bytes because an address is a 32-bit (4-byte) number.
_pr_args:
cmpl -4(%ebp), %edi /* if index == argc: */
je _pr_args_out /* goto pr_args_out */
pushl (%esi) /* element in argv */
pushl -4(%ebp) /* index */
pushl $argv_str /* format string */
call printf
addl $4, %esp
popl -4(%ebp)
popl (%esi)
incl -4(%ebp) /* increment index */
/* point to the next arg */
addl $4, %esi
jmp _pr_args
_pr_args_out:
First of all, the loop compares the index with argc. If they are equal (remember that array indexing starts from zero here, so the last element of argv is argv + (argc - 1)), we simply exit the loop. Otherwise we call printf (the ESI register holds the address of the current argv element), clean up the stack, increment the index, move the pointer to the next element of argv and return to the beginning of the loop.
After that, to separate the argv array from the envp array, we print a line break (the newline character \n, which has the value 10 in decimal and 0xA in hexadecimal) using the putchar function:
pushl $0xA
call putchar
addl $4, %esp
Next, we reuse the same register, ESI, to hold pointers to the environment variables in the envp array, as we did with argv:
movl 16(%ebp), %esi
Now, since we have no index for the envp array and do not know in advance how many elements it contains, we rely on the fact that envp is a null-terminated array: to detect that there are no more elements, it is enough to check whether the current element is zero (the zero entry follows the last element).
_pr_envp:
cmpl $0, (%esi)
je _out
pushl (%esi) /* environment variable */
pushl $envp_str /* format string */
call printf
addl $4, %esp
popl (%esi)
/* point to the next element in envp */
addl $4, %esi
jmp _pr_envp
_out:
The loop that prints the environment variables differs little from the one used to print the command line arguments, apart from the exit check and the absence of an index.
Finally, at the very end, we set the return value of the function to zero and clean up the stack:
/* set up return value */
movl $0, %eax
movl %ebp, %esp
popl %ebp
ret
The stack is cleaned up by restoring the ESP and EBP registers to their original values, so everything that this function kept on the stack can be overwritten and used by other functions/procedures. The ret instruction loads the return address into EIP (Instruction Pointer), so control returns to the caller.
It is worth mentioning that there is a more convenient instruction for cleaning up the stack, leave. It does exactly these two things, restoring the ESP and EBP registers, so the last part of the code can be rewritten as follows:
/* set up return value */
movl $0, %eax
leave
ret
However, this instruction has a less attractive companion, the enter instruction. It takes two operands: the first is the number of bytes to allocate on the stack, and the second is the nesting level. Because of the nesting level, its implementation is rather complicated and does more than these three instructions:
pushl %ebp
movl %esp, %ebp
subl $N, %esp
As a result, enter is many times slower, which is why most compilers avoid it. For demonstration, however, those three lines can be replaced with one:
enter $4, $0
Now it is time to test the program. We use gcc to assemble and link the code:
$ gcc -o args args.S
Or, in case the host is x64
$ gcc -m32 -o args args.S
And finally, you can run the program
$ ./args
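For comparison, the behavior of the assembly program can be expressed in C. This is only a sketch of equivalent logic, not output of any compiler: the helper print_args and its FILE * parameter are hypothetical additions made so that the logic can be called and checked in isolation, while the format strings are the ones from the .rodata section.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the assembly main. Returns the number of
 * environment entries printed. */
static int print_args(FILE *out, int argc, char **argv, char **envp)
{
    int index;              /* plays the role of the local at EBP-4 */
    int env_count = 0;

    assert(out != NULL && argv != NULL && envp != NULL);

    fprintf(out, "argc = %d\n\n", argc);

    /* Same exit test as the assembly loop: stop when index == argc. */
    for (index = 0; index != argc; index++)
        fprintf(out, "argv[%d] = %s\n", index, argv[index]);

    fputc('\n', out);       /* the putchar(0xA) step */

    /* envp is NULL-terminated, so no counter is needed to find its end. */
    for (; *envp != NULL; envp++) {
        fprintf(out, "%s\n", *envp);
        env_count++;
    }
    return env_count;
}
```

Calling print_args(stdout, argc, argv, envp) from a main declared as int main(int argc, char **argv, char **envp) and returning 0 would reproduce the program above; the three-argument form of main is a common extension on Unix-like systems rather than part of standard C.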
Conclusion
Working with the stack is an important part of any programming language, especially a low-level one. Knowing how it works is also useful for high-level languages, although there it can differ significantly because of the concepts of the language itself.