How the Stack Works on Intel386

Introduction

A stack is a dedicated region of memory for storing temporary data. It obeys the following rules:

  • LIFO (Last In, First Out): the element pushed onto the stack last is the first one to be removed from it.

  • The stack grows toward the beginning of the address space, i.e. toward lower addresses (this is why people say the stack grows down).

  • The smallest unit that can be pushed onto or popped from the stack is 16 bits (2 bytes).

  • The largest unit that can be pushed onto or popped from the stack is 32 bits (4 bytes).

Stack structure

Figure 1. Stack structure on Intel386

As Figure 1 shows, the stack grows downward as values are pushed onto it. The stack pointer ESP (Stack Pointer) points to the element that was pushed last; this position is called the top of the stack (TOS, Top Of Stack).

When a value is pushed onto the stack, the processor first decrements ESP by the size of the value (2 or 4 bytes) and then writes the value at the new top of the stack. When a value is popped, the processor first copies it from the top of the stack and then increments ESP by the same amount.
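A toy C model can make the push/pop order concrete. It is a sketch under simplifying assumptions, not a description of the hardware: the stack is a 64-byte array, ESP is an offset into that array rather than a real address, and the helper names push_n and pop_n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit stack: 64 bytes of "memory" and an ESP that holds an
   offset into it (a real ESP holds an address; this is an assumption). */
#define STACK_BYTES 64

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp;                  /* offset of the current top of stack */
} Stack;

static void stack_init(Stack *s) {
    s->esp = STACK_BYTES;          /* empty stack: ESP points just past the end */
}

/* PUSH: decrement ESP by the operand size first, then store the value
   at the new top of the stack, lowest byte first (little-endian). */
static void push_n(Stack *s, uint32_t value, unsigned size) {
    assert(size == 2 || size == 4);
    assert(s->esp >= size);                    /* overflow check */
    s->esp -= size;
    for (unsigned i = 0; i < size; i++)
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i));
}

/* POP: read the value at the top of the stack first, then increment ESP. */
static uint32_t pop_n(Stack *s, unsigned size) {
    assert(size == 2 || size == 4);
    assert(s->esp + size <= STACK_BYTES);      /* underflow check */
    uint32_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint32_t)s->mem[s->esp + i] << (8 * i);
    s->esp += size;
    return value;
}
```

Pushing 0x11223344 moves ESP from 64 to 60 and stores the bytes 44 33 22 11 from the lowest address up, which is the little-endian order x86 uses; popping reverses both steps.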

In assembly code, the push instruction places a value on the stack and the pop instruction removes it.

push instruction

Its syntax varies from one assembler to another, but its essence stays the same: pushing a value onto the stack.

PUSH r/m16
PUSH r/m32
PUSH r16
PUSH r32
PUSH imm8
PUSH imm16
PUSH imm32
  • r/m (register/memory) means the operand may be either a register or a memory location. In the memory case the address is typically held in a register: for example, if the register contains 0x87654321 and the value stored at that memory address is 0x11223344, then 0x11223344 is pushed onto the stack.

  • r (register) means the value to be pushed is held in a register.

  • imm (immediate) means the value is encoded directly in the instruction as its parameter.

  • The suffixes 8, 16 and 32 give the size in bits of the value passed to the instruction, which is usually called the operand.
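The difference between the r, r/m and imm forms is only where the value comes from, which a few hypothetical C helpers can sketch. The slot-array stack, the Stack32 type and the helper names are assumptions for illustration; in the memory case a C pointer stands in for the address held in a register.

```c
#include <assert.h>
#include <stdint.h>

/* Stack of 4-byte slots; index 16 means empty, and pushes move toward 0. */
typedef struct {
    uint32_t slot[16];
    int top;                       /* index of the current top of stack */
} Stack32;

static void push32(Stack32 *s, uint32_t v) {
    assert(s->top > 0);            /* overflow check */
    s->slot[--s->top] = v;
}

/* PUSH imm32: the value is encoded in the instruction itself. */
static void push_imm(Stack32 *s, uint32_t imm) { push32(s, imm); }

/* PUSH r32: the value is the contents of the register. */
static void push_reg(Stack32 *s, uint32_t reg) { push32(s, reg); }

/* PUSH m32 (memory form of r/m32): the register holds an address, and
   the value stored at that address is what gets pushed. */
static void push_mem(Stack32 *s, const uint32_t *addr) { push32(s, *addr); }
```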

You may wonder why push accepts imm8, an 8-bit operand, when the smallest unit that can be placed on the stack is 16 bits. The answer is that an 8-bit immediate is sign-extended to the operand size (16 or 32 bits) before it is pushed. Sign extension, rather than zero padding, matters for signed values in two's complement: an 8-bit -1 (0xFF) becomes 0xFFFF or 0xFFFFFFFF and is still -1.
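The sign extension can be checked in C. The helper below is a hypothetical sketch that spells out the extension with bit operations instead of relying on signed casts.

```c
#include <assert.h>
#include <stdint.h>

/* Extend an 8-bit immediate to 16 or 32 bits the way PUSH imm8 does:
   if the sign bit (bit 7) is set, fill all higher bits with ones. */
static uint32_t sign_extend8(uint8_t imm8, unsigned bits) {
    assert(bits == 16 || bits == 32);
    uint32_t v = imm8;
    if (v & 0x80u)
        v |= (bits == 32) ? 0xFFFFFF00u : 0xFF00u;
    return v;
}
```

Zero padding would turn 0xFF into 0x000000FF (255), while sign extension yields 0xFFFFFFFF, which is still -1 in two's complement; positive values such as 0x7F are unchanged either way.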

pop instruction

The syntax of pop is the same as that of push, except that the value is removed from the stack and stored in the instruction's operand.

POP r/m16
POP r/m32
POP r16
POP r32

Naturally, the operand of pop cannot be an immediate value, since there would be nowhere to store the result.

Stack control registers

One register for managing the stack has already been mentioned: ESP (stack pointer), probably the most important one, holds the address of the top of the stack. Several other registers are also associated with the stack:

  • SS (Stack Segment) holds the segment selector of the current stack segment. A system may define several stacks, but only one of them is active at a time, and the processor uses SS for all stack operations.

  • EBP (Base Pointer) points to the base of the current stack frame, i.e. a fixed reference point within the frame of a particular procedure or function; it is usually used to address local variables and procedure/function arguments.

Function Calling Convention in System V Intel386

The System V Intel386 ABI (hereafter System V) defines rules for calling functions and passing arguments to them. These rules are mandatory only for global functions; local functions may deviate from them, although this is not considered good practice.

  • The caller pushes the arguments of the called function in reverse order: the last argument first, then the penultimate one, and so on down to the first. As a result, the called function (callee) finds its arguments in their natural order at increasing addresses.

  • The registers EBP, EBX, EDI, ESI and ESP belong to the caller: if the callee uses any of them, it must save their values (usually on the stack) and restore them before returning. EAX, ECX and EDX are scratch registers that the callee may overwrite, so the caller must save them itself if it needs their values after the call. EBP is handled in a fixed way: on entry it still points to the caller's frame, so the callee pushes it at the start of its execution and restores it at the end. ESP must likewise point at the return address again when the callee executes ret.

  • A function is called with the call instruction, which pushes onto the stack the address of the instruction following the call, commonly referred to as the return address.

  • If the function returns an integer or pointer value, it places it in EAX; whatever the return type, it must restore the caller's registers listed above before returning.
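The register rules of the i386 System V ABI can be summarized as a small lookup: EBX, ESI, EDI, EBP and ESP must be preserved by the callee, while EAX, ECX and EDX may be overwritten by any call. The Reg enum and the callee_saved function are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* The i386 general-purpose registers (prefixed to avoid clashing with
   any system header macros). */
typedef enum { R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EBP, R_ESP } Reg;

/* true:  the callee must save and restore the register if it uses it.
   false: a scratch register; the caller saves it if it needs the value
          after the call (EAX also carries the integer return value). */
static bool callee_saved(Reg r) {
    switch (r) {
    case R_EBX: case R_ESI: case R_EDI: case R_EBP: case R_ESP:
        return true;
    default:
        return false;
    }
}
```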

Thus, once the function has been called (after the call instruction has executed) and has set up its frame, the stack looks like this:

Figure 2. Stack after function call

As Figure 2 shows, the current frame begins with the return address (the address of the instruction following call), followed by the saved value of EBP, which points to the previous frame, and then by the local variables of the function.

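Everything above the return address belongs to the previous frame, including the arguments passed to the called function. Hence the first argument is at EBP+8, the second at EBP+12, and so on. This holds as long as each argument occupies one 4-byte slot: arguments smaller than 4 bytes (such as a 16-bit value) are widened to 4 bytes, while 8-byte types such as long long or double take two slots and shift the offsets of the arguments after them.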
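The offset rule can be written as a small calculation. This sketch assumes, as the i386 System V ABI specifies, that each argument's size is rounded up to a multiple of 4 bytes and that the standard prologue (pushl %ebp; movl %esp, %ebp) has run; the function name ebp_offset_of_arg is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* EBP+0 holds the saved EBP and EBP+4 the return address, so the first
   argument starts at EBP+8. Each earlier argument shifts the offset by
   its size rounded up to a multiple of 4 bytes. */
static int ebp_offset_of_arg(const size_t *sizes, size_t index) {
    int offset = 8;
    for (size_t i = 0; i < index; i++)
        offset += (int)((sizes[i] + 3) / 4 * 4);
    return offset;
}
```

For three 4-byte arguments it gives 8, 12 and 16; for arguments of 2, 8 and 4 bytes it gives 8, 12 and 20.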

An example of working with the stack in GNU Assembler (x86)

As an example of working with the stack, we will write a program that prints the command-line arguments, indexed with a local variable, and then the environment variables.

The example uses GNU Assembler (GAS), which uses AT&T syntax, and GCC for assembling and linking.
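For comparison, here is roughly what the program does, written in C. It is a sketch, not the author's code: the function format_args and its buffer-based interface are hypothetical, chosen so the output can be inspected, whereas the assembly version prints directly to stdout with printf and putchar. It uses the same three format strings.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Output buffer that the hypothetical format_args() appends to. */
typedef struct {
    char *buf;
    size_t cap, len;
} Out;

/* Append formatted text; stops quietly once the buffer is full. */
static void out_append(Out *o, const char *fmt, ...) {
    if (o->len >= o->cap)
        return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n > 0)
        o->len += (size_t)n;
    if (o->len >= o->cap)
        o->len = o->cap - 1;                 /* output was truncated */
}

/* Mirrors the assembly program: argc, then argv with an index,
   a blank line (putchar('\n')), then every entry of envp. */
static size_t format_args(char *buf, size_t cap, int argc, char **argv, char **envp) {
    assert(buf != NULL || cap == 0);
    Out o = { buf, cap, 0 };
    if (cap > 0)
        buf[0] = '\0';
    out_append(&o, "argc = %d\n\n", argc);
    for (int i = 0; i != argc; i++)          /* loop ends when index == argc */
        out_append(&o, "argv[%d] = %s\n", i, argv[i]);
    out_append(&o, "\n");
    for (char **p = envp; *p != NULL; p++)   /* envp ends with a NULL pointer */
        out_append(&o, "%s\n", *p);
    return o.len;
}
```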

Let’s start with the layout of the stack frame of the main function:

Stack (higher addresses at the top)
envp                  <-- EBP + 16
argv                  <-- EBP + 12
argc                  <-- EBP + 8
return address        <-- EBP + 4
saved EBP             <-- EBP
local argv index      <-- EBP - 4

argc, argv and envp are passed as arguments and are located at EBP+8, EBP+12 and EBP+16 respectively; the return address, as already mentioned, is the address of the instruction following call; local argv index is a local variable used for a nicer (well, almost) output of the argv array.

First, in the .rodata (Read-Only Data) section, define three format strings for printing argc, argv and envp:


.section .rodata
argc_str:
	.string "argc = %d\n\n"
argv_str:
	.string "argv[%d] = %s\n"
envp_str:
	.string "%s\n"

Then declare the main function in the .text section, which holds the program code, and mark it as global:


.section .text
	.globl main
  
main:

At the very beginning of the function, push EBP onto the stack and make it point to the current frame:

  pushl %ebp
  movl %esp, %ebp

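Load argc and the address of the argv array into EDI and ESI respectively, then allocate 4 bytes on the stack for the local variable and initialize it to zero. (Strictly speaking, EDI and ESI are callee-saved, so a fully ABI-compliant main would save them first; the example omits this for brevity.)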

	/* move argc into edi */
	movl 8(%ebp), %edi
	/* move argv base address into esi */
	movl 12(%ebp), %esi

	/* allocate 4 bytes on the stack */
	subl $4, %esp
  /* initialize local variable by 0 */
	movl $0, -4(%ebp)

Now we can print argc. To do this, push the arguments of the function being called (here printf) in reverse order, and clean up the stack after the call, because in System V the caller is responsible for that.

The arguments are the format string and the value of argc, which is stored in EDI:

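  pushl %edi          /* 2nd argument: argc */
  pushl $argc_str     /* 1st argument: address of the format string */
  call printf
  addl $8, %esp       /* the caller removes both arguments */
  /* EDI is callee-saved, so it still holds argc after printf returns */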

Note that we push not the contents of argc_str but its address, because printf expects a pointer to the format string.
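The caller and callee sides of such a call can be simulated: arguments are pushed last to first, call pushes the return address, the callee sees its arguments in natural order just above the return address, and after ret the caller removes the arguments itself. The slot-based Stack32 model, the made-up return address and all helper names are assumptions for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stack of 4-byte slots; a smaller index means a lower address. */
#define SLOTS 32
typedef struct {
    uint32_t slot[SLOTS];
    int esp;                       /* index of the top slot; SLOTS when empty */
} Stack32;

static void push(Stack32 *s, uint32_t v) { assert(s->esp > 0); s->slot[--s->esp] = v; }

/* Caller: push args[n-1] ... args[0], then the return address (as CALL does). */
static void caller_push_args(Stack32 *s, const uint32_t *args, int n, uint32_t ret_addr) {
    for (int i = n - 1; i >= 0; i--)
        push(s, args[i]);
    push(s, ret_addr);
}

/* Callee: [ESP] is the return address, argument i is at ESP + 4 + 4*i. */
static uint32_t callee_arg(const Stack32 *s, int i) {
    return s->slot[s->esp + 1 + i];
}

/* RET: pop the return address (it would be loaded into EIP). */
static uint32_t do_ret(Stack32 *s) { assert(s->esp < SLOTS); return s->slot[s->esp++]; }

/* Caller cleanup, like addl $4*n, %esp. */
static void caller_cleanup(Stack32 *s, int n) { s->esp += n; }
```

Pushing in reverse is what lets a variadic callee such as printf find its first argument (the format string) at a fixed place, ESP+4 on entry, no matter how many arguments follow it.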

The next step is to print the argv array in a loop. Keep in mind that each element of argv is the address of a character array (the string we need): the first element is at the address held in argv, the second at argv+4, and so on. We add 4 bytes because an address is a 32-bit (4-byte) number.

_pr_args:
  cmpl -4(%ebp), %edi   /* if index == argc: */
  je _pr_args_out       /*     goto _pr_args_out */

  pushl (%esi)          /* element in argv */
  pushl -4(%ebp)        /* index */
  pushl $argv_str       /* format string */
  call printf
  addl $12, %esp        /* remove the three arguments */

  incl -4(%ebp)         /* increment index */
  /* point to the next arg */
  addl $4, %esi

  jmp _pr_args

_pr_args_out:

The loop first compares the index with argc. If they are equal, it exits (recall that indexing starts at zero, so the last element of argv is at argv + (argc - 1)). Otherwise it calls printf (ESI holds the address of the current argv element), cleans up the stack, increments the index, advances the pointer to the next element of argv and jumps back to the start of the loop.

After that, to separate the argv output from the envp output, print a line break with the putchar function (the newline character \n has the value 10 in decimal, or 0xA in hexadecimal):

  pushl $0xA
  call putchar
  addl $4, %esp

Next, we use the same register ESI to hold pointers into the envp array, as we did for argv:

movl 16(%ebp), %esi

Since there is no index for envp and we do not know in advance how many elements it has, we rely on the fact that envp is a null-terminated array: the element following the last environment variable is a null pointer, so it is enough to check whether the current element is zero.
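The same termination rule in C: a hypothetical count_env helper walks the array until it meets the null pointer, which plays the role of the cmpl $0, (%esi) check.

```c
#include <assert.h>
#include <stddef.h>

/* envp carries no element count; the array ends with a NULL pointer. */
static size_t count_env(char *const *envp) {
    size_t n = 0;
    while (envp[n] != NULL)        /* stop at the terminating NULL */
        n++;
    return n;
}
```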

_pr_envp:
  cmpl $0, (%esi)       /* a NULL element marks the end of envp */
  je _out

  pushl (%esi)          /* environment variable */
  pushl $envp_str       /* format string */
  call printf
  addl $8, %esp         /* remove both arguments */

  /* point to the next element in envp */
  addl $4, %esi
  jmp _pr_envp

_out:

The environment variable loop differs little from the command-line argument loop, apart from the exit check and the absence of an index.

Finally, set the function's return value to zero and clean up the stack:

  /* set up return value */
  movl $0, %eax

  movl %ebp, %esp       /* discard the local variable */
  popl %ebp             /* restore the caller's frame pointer */

  ret

The stack is cleaned up by restoring ESP and EBP to the values they had on entry; everything the function left below that point may then be overwritten by other functions. The ret instruction pops the return address into EIP (Instruction Pointer), so control returns to the caller.

There is a more convenient instruction for this cleanup, leave, which does exactly these two things (movl %ebp, %esp followed by popl %ebp), so the end of the code can be rewritten as follows:

  /* set up return value */
  movl $0, %eax
  
  leave
  ret
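A toy model can show why the order of the epilogue matters. It is a sketch with hypothetical names: the stack is an array of 4-byte slots, ESP and EBP are slot indices (a smaller index means a lower address), and only 4-byte values are modelled. leave restores ESP before popping EBP; doing it the other way round pops the local variable into EBP.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16
typedef struct {
    uint32_t slot[SLOTS];
    int esp, ebp;                  /* slot indices */
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->slot[--c->esp] = v; }
static uint32_t pop(Cpu *c) { assert(c->esp < SLOTS); return c->slot[c->esp++]; }

/* pushl %ebp; movl %esp, %ebp; subl $4*locals, %esp */
static void prologue(Cpu *c, int locals) {
    push(c, (uint32_t)c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= locals);
    c->esp -= locals;
}

/* leave == movl %ebp, %esp; popl %ebp */
static void leave(Cpu *c) {
    c->esp = c->ebp;               /* discard the locals */
    c->ebp = (int)pop(c);          /* restore the caller's frame pointer */
}

/* Wrong order: popl %ebp first loads whatever is on top (a local),
   then movl %ebp, %esp points ESP at a bogus location. */
static void wrong_epilogue(Cpu *c) {
    c->ebp = (int)pop(c);
    c->esp = c->ebp;
}
```

After prologue and leave, the next pop yields the return address, which is exactly what ret needs.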

However, leave has a less attractive counterpart, the enter instruction. It takes two operands: the first is the number of bytes to allocate on the stack, and the second is the lexical nesting level. With a non-zero nesting level it also copies the frame pointers of enclosing procedures, so its implementation is rather complicated and is not limited to these three instructions:

  pushl %ebp
  movl %esp, %ebp
  subl $N, %esp

As a result, enter is considerably slower than the equivalent three instructions on most processors, which is why compilers generally avoid it. Still, for demonstration purposes, those three lines can be replaced with one:

  enter $4, $0
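With a nesting level of zero, enter N, 0 behaves like the three-instruction prologue, which the following sketch models. The Cpu type, the byte-offset addresses and enter_level0 are hypothetical, and non-zero nesting levels are deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Fake 64-byte stack; ESP and EBP are byte offsets into it. */
#define STACK_BYTES 64
typedef struct {
    uint32_t word[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Cpu;

/* enter $size, $0 modelled as pushl %ebp; movl %esp, %ebp; subl $size, %esp.
   Sketch limitation: size must be a multiple of 4 to keep slots aligned. */
static void enter_level0(Cpu *c, uint32_t size) {
    assert(size % 4 == 0 && c->esp >= 4 + size);
    c->esp -= 4;                   /* pushl %ebp */
    c->word[c->esp / 4] = c->ebp;
    c->ebp = c->esp;               /* movl %esp, %ebp */
    c->esp -= size;                /* subl $size, %esp */
}
```

For enter $4, $0 with ESP at 60, the saved EBP lands at offset 56, EBP becomes 56 and ESP ends at 52, leaving one 4-byte local at EBP-4 as in the frame layout of main.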

Now it’s time to test the program: use gcc to assemble and link the code:

$ gcc -o args args.S

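Or, if the host is 64-bit (x86-64), build a 32-bit binary (this requires 32-bit libraries, e.g. the gcc-multilib package on Debian or Ubuntu):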

$ gcc -m32 -o args args.S

And finally, you can run the program

$ ./args

Conclusion

Working with the stack is an important part of any programming language, especially a low-level one. Knowing how it works is useful for high-level languages too, although there the details may differ significantly depending on the concepts of the language itself.

