Interrupts and system calls

Note. The authors recommend reading the book alongside the xv6 source code. The authors have also prepared lab assignments for xv6.

Xv6 runs on RISC-V, so building it requires RISC-V versions of the tools: QEMU 5.1+, GDB 8.3+, GCC, and Binutils. The instructions will help you install the tools.

The processor suspends the running program and transfers control to the kernel when:

The program executes a system call.
An instruction causes an error, such as division by zero. This kind of event is called an exception.
A device requires the processor’s attention, for example, when the disk has finished reading data that the program requested.

The program does not notice that it has been interrupted: its state is saved, the interrupt is handled, and the program continues executing.

Xv6 handles interrupts in kernel mode. The kernel executes system call code, operates on devices, and handles exceptions.

Xv6 contains code for three scenarios:

Interrupts from user mode
Kernel mode interrupts
Timer interrupts

Interrupts on RISC-V

Each RISC-V processor has a set of control registers that determine how the processor responds to interrupts. The RISC-V documentation describes them in detail. Xv6 uses the following registers:

stvec stores the address of the interrupt handler – the processor will transfer control to this address.
sepc stores the program counter: the processor saves pc in sepc before passing control to the interrupt handler. The sret instruction, which returns from the interrupt handler, restores pc from sepc. The kernel can make sret transfer control to other code by changing sepc.
scause stores a number that identifies the reason for the interrupt.
sscratch helps the handler save registers. The interrupt handler stores the processor registers in memory. A store instruction needs the memory address in a processor register, but all registers still hold program values. The handler saves register a0 in sscratch and then uses a0 as the address register in store instructions.
sstatus determines the state of the processor when handling an interrupt:
- The SIE bit of sstatus determines whether the processor responds to device interrupts while in kernel mode.
- The SPP bit of sstatus records whether the interrupt occurred in user or kernel mode. The sret instruction returns the processor to the mode that SPP specifies.

The processor works with these registers only in kernel mode – the registers are not accessible from user mode.

The kernel uses a similar set of registers (mtvec, mepc, mcause, mscratch, mstatus) to handle timer interrupts in machine mode.

Each processor has its own set of these registers and handles interrupts independently of other processors.

The processor acts as follows when responding to an interrupt other than a timer interrupt:

Checks the SIE flag of sstatus if the interrupt came from a device. Does nothing if the flag is cleared; otherwise goes to the next step.
Clears the SIE flag of sstatus.
Assigns sepc = pc.
Sets the SPP flag of sstatus to 1 if the processor was running in kernel mode; otherwise clears it to 0.
Writes the reason for the interrupt to scause.
Switches to kernel mode.
Assigns pc = stvec.
Continues execution from the address in pc.
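The steps above can be modeled in ordinary C. This is a host-side simulation under stated assumptions, not xv6 or hardware code: the names struct hart, enum mode, and trap_enter are hypothetical, and only the bit positions of SIE (bit 1) and SPP (bit 8) come from the RISC-V privileged specification. The real hardware also remembers the old SIE value in the SPIE bit, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SSTATUS_SIE (1ULL << 1) /* supervisor interrupt enable */
#define SSTATUS_SPP (1ULL << 8) /* previous mode: 1 = kernel, 0 = user */

enum mode { MODE_USER, MODE_SUPERVISOR };

/* Hypothetical model of the per-processor state used during trap entry. */
struct hart {
  uint64_t pc, stvec, sepc, scause, sstatus;
  enum mode mode;
};

/* Models what the processor does on a trap. Returns false when a device
   interrupt is ignored because SIE is clear, true when the trap is taken. */
bool trap_enter(struct hart *h, uint64_t cause, bool is_device_interrupt)
{
  assert(h != NULL);
  if (is_device_interrupt && !(h->sstatus & SSTATUS_SIE))
    return false;                 /* interrupts disabled: do nothing */
  h->sstatus &= ~SSTATUS_SIE;     /* disable further interrupts */
  h->sepc = h->pc;                /* remember where to resume */
  if (h->mode == MODE_SUPERVISOR) /* record the mode we came from */
    h->sstatus |= SSTATUS_SPP;
  else
    h->sstatus &= ~SSTATUS_SPP;
  h->scause = cause;              /* reason for the trap */
  h->mode = MODE_SUPERVISOR;      /* switch to kernel mode */
  h->pc = h->stvec;               /* jump to the handler */
  return true;
}
```

Note that nothing in trap_enter touches a page table, a stack pointer, or the general-purpose registers; in the model, as on the hardware, that work is left to the handler at stvec.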

The processor does not switch to the kernel page table or the kernel stack, and it saves no registers other than pc. The processor leaves these tasks to the interrupt handler. Other operating systems use this freedom to optimize interrupt handling; for example, some do not switch to the kernel page table.

Think about which steps could be skipped without compromising security. For example, a program could break the kernel if the processor did not assign pc = stvec and continued executing program instructions in kernel mode.

User Mode Interrupts

Xv6 handles user mode interrupts differently than kernel mode interrupts. This section will cover user mode interrupts.

The processor leaves user mode if:

The program executes a system call
The program makes an error and raises an exception
A device requires the processor’s attention

The kernel switches to the kernel page table and the kernel stack in the assembly routine uservec before calling the C interrupt handler usertrap. The kernel then executes usertrapret and returns to the user mode stack and the process page table in the assembly routine userret.

The processor does not switch to the kernel page table when responding to an interrupt, so the process’s page table contains the trampoline page with the uservec code. The uservec procedure switches the processor to the kernel page table. The kernel page table therefore maps the trampoline page at the same virtual address, so that pc still points to the correct virtual address of the next instruction and uservec continues executing after the switch. The trampoline page also contains the userret procedure, which switches the processor back to the process page table.

The PTE_U flag of the trampoline page is cleared, so uservec and userret run only in kernel mode.

The uservec procedure saves the program state (32 registers) in the trapframe page, switches to the kernel page table and the kernel stack, and transfers control to usertrap. The process keeps the addresses of the kernel stack and the kernel page table in the trapframe page.

The usertrap procedure determines the cause of the interrupt, handles it, and transfers control to usertrapret. First, usertrap sets kernelvec as the interrupt handler. It then saves sepc in trapframe, because a timer interrupt may switch the thread of execution by calling yield, and another thread will overwrite sepc when it returns from kernel mode to user mode. The usertrap procedure calls syscall if the interrupt is a system call, calls devintr if the interrupt came from a device, and otherwise terminates the process in response to the exception. When handling a system call, usertrap adds 4 to the saved pc so that the process continues with the instruction that follows ecall.
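The decision that usertrap makes can be sketched as a small host-side model. This is a simplification rather than the xv6 source: struct trapframe_sim, enum trap_action, and classify_user_trap are hypothetical names, and the sketch treats every interrupt as a device interrupt where xv6 would ask devintr. The value 8 (environment call from user mode) and the use of the top bit of scause to mark interrupts come from the RISC-V privileged specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCAUSE_INTERRUPT    (1ULL << 63) /* top bit set: interrupt, not exception */
#define SCAUSE_ECALL_FROM_U 8ULL         /* system call from user mode */

enum trap_action { ACT_SYSCALL, ACT_DEVICE, ACT_KILL };

/* Hypothetical stand-in for the saved-pc field of the trapframe. */
struct trapframe_sim { uint64_t epc; };

enum trap_action classify_user_trap(uint64_t scause, uint64_t sepc,
                                    struct trapframe_sim *tf)
{
  assert(tf != NULL);
  tf->epc = sepc;          /* save sepc before another thread can change it */

  if (scause == SCAUSE_ECALL_FROM_U) {
    tf->epc += 4;          /* ecall is 4 bytes long: resume after it */
    return ACT_SYSCALL;
  }
  if (scause & SCAUSE_INTERRUPT)
    return ACT_DEVICE;     /* simplification: xv6 lets devintr decide */
  return ACT_KILL;         /* any other exception terminates the process */
}
```

Saving sepc first and adjusting only the saved copy mirrors the order described above: the hardware register may be overwritten at any time after interrupts are enabled, while the trapframe copy belongs to the process.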

The usertrapret procedure writes the address of the user mode interrupt handler uservec to stvec, writes to trapframe the addresses of the kernel page table and the kernel stack that uservec needs, and restores sepc from trapframe. Then usertrapret transfers control to userret, passing the address of the process’s page table.

The userret procedure switches to the process page table and the user mode stack, restores the processor registers from trapframe, and executes the sret instruction to return to user mode and continue program execution.

Code: system calls

Chapter 2 covered how xv6 makes its first system call, exec. This section explains how the kernel executes the exec code.

The program initcode.S puts the system call number in register a7 and the call arguments in registers a0 and a1. The system call number is an index into syscalls, an array of function pointers. The ecall instruction interrupts the processor: it forces a switch to kernel mode and runs the interrupt handler uservec, then the functions usertrap and syscall.

The syscall function gets the system call number from the copy of register a7 saved in trapframe. The constant SYS_exec defines the number of the exec system call, and the array element syscalls[SYS_exec] points to the function sys_exec.

The syscall function writes the value returned by sys_exec to p->trapframe->a0, so that the exec call in the program returns this value. The RISC-V C calling convention says that functions place their return value in register a0. System calls return zero or a positive number on success, or a negative number to report an error. The syscall function prints an error message and returns -1 if the program passes an invalid system call number.
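The dispatch through an array of function pointers can be illustrated with a host-side model. Everything here is hypothetical and chosen for the example: struct trapframe_sim, syscalls_sim, syscall_sim, and the two demo system calls are not xv6 names, and the numbers 1 and 2 are arbitrary. The sketch only shows the pattern: read the number from the saved a7, bounds-check it, call through the table, and store the result in the saved a0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the saved registers. */
struct trapframe_sim { uint64_t a0, a7; };

#define SYS_demo_answer 1 /* arbitrary numbers for this sketch */
#define SYS_demo_fail   2

static uint64_t sys_demo_answer(void) { return 42; }
static uint64_t sys_demo_fail(void)   { return (uint64_t)-1; }

/* The system call number is the index into this table. Slot 0 stays NULL. */
static uint64_t (*syscalls_sim[])(void) = {
  [SYS_demo_answer] = sys_demo_answer,
  [SYS_demo_fail]   = sys_demo_fail,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

void syscall_sim(struct trapframe_sim *tf)
{
  assert(tf != NULL);
  uint64_t num = tf->a7;
  if (num > 0 && num < NELEM(syscalls_sim) && syscalls_sim[num] != NULL) {
    tf->a0 = syscalls_sim[num](); /* return value goes back in a0 */
  } else {
    fprintf(stderr, "unknown sys call %llu\n", (unsigned long long)num);
    tf->a0 = (uint64_t)-1;        /* the program sees -1 */
  }
}
```

Writing the result into the saved a0 rather than returning it is the key design point: the value reaches the program only when the saved registers are restored on the way back to user mode.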

Code: System Call Arguments

System call code uses the functions argint, argaddr, and argfd to reach the arguments saved in trapframe. The program passes arguments in processor registers, and the interrupt handler uservec saves the registers in trapframe. The functions argint, argaddr, and argfd call argraw to extract the nth argument from trapframe and return it as an integer, an address, or a file descriptor, respectively.
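A minimal sketch of this argument lookup, written as a host-side model: the _sim names and struct trapframe_sim are hypothetical, and the sketch assumes that arguments 0 through 5 live in the saved registers a0 through a5. The file descriptor variant is omitted because it depends on the process’s open file table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the saved registers that carry arguments. */
struct trapframe_sim { uint64_t a0, a1, a2, a3, a4, a5; };

/* Returns the raw 64-bit value of the nth argument register. */
uint64_t argraw_sim(const struct trapframe_sim *tf, int n)
{
  assert(tf != NULL);
  switch (n) {
  case 0: return tf->a0;
  case 1: return tf->a1;
  case 2: return tf->a2;
  case 3: return tf->a3;
  case 4: return tf->a4;
  case 5: return tf->a5;
  default:
    assert(0 && "argraw_sim: argument index out of range");
    return 0;
  }
}

/* Interprets the nth argument as a C int. */
void argint_sim(const struct trapframe_sim *tf, int n, int *ip)
{
  *ip = (int)argraw_sim(tf, n);
}

/* Interprets the nth argument as a user virtual address. The address is
   not validated here; the copy functions do that when they use it. */
void argaddr_sim(const struct trapframe_sim *tf, int n, uint64_t *ap)
{
  *ap = argraw_sim(tf, n);
}
```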

The kernel cannot dereference a process memory address directly, since it runs with the kernel page table, so it implements functions that copy data from process memory to kernel memory.

The fetchstr function copies a string from process memory to kernel memory. It calls copyinstr, which copies up to max bytes to the buffer dst from the virtual address srcva in the process page table pagetable. The copyinstr function uses walkaddr to find the physical address pa0 that corresponds to the virtual address srcva in pagetable. The kernel page table maps these virtual addresses to identical physical addresses, so copyinstr copies bytes directly from pa0 to dst. The walkaddr function checks that the virtual address belongs to the process’s memory, so the program cannot trick the kernel with a forged address. The similar function copyout copies bytes from kernel memory to process memory.
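The page-by-page copy can be sketched as a host-side model. The page table here is a toy: a flat array indexed by virtual page number, with a host pointer standing in for the physical address. The names pte_sim, pagetable_sim, walkaddr_sim, and copyinstr_sim are hypothetical, and the limit of four pages is arbitrary. The point of the sketch is that every page is translated and checked separately, so a string that crosses a page boundary is still copied safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u
#define NPAGES 4 /* arbitrary size of the toy address space */

struct pte_sim { char *pa; int user; }; /* pa == NULL: page not mapped */
struct pagetable_sim { struct pte_sim pte[NPAGES]; };

/* Translates va to the start of its page, or NULL if the page is
   unmapped or not accessible from user mode. */
static char *walkaddr_sim(struct pagetable_sim *pt, uint64_t va)
{
  uint64_t vpn = va / PGSIZE;
  if (vpn >= NPAGES || pt->pte[vpn].pa == NULL || !pt->pte[vpn].user)
    return NULL;
  return pt->pte[vpn].pa;
}

/* Copies a NUL-terminated string of at most max bytes from srcva to dst.
   Returns 0 on success, -1 on a bad address or a missing terminator. */
int copyinstr_sim(struct pagetable_sim *pt, char *dst, uint64_t srcva,
                  size_t max)
{
  assert(pt != NULL && dst != NULL);
  while (max > 0) {
    char *pa0 = walkaddr_sim(pt, srcva);
    if (pa0 == NULL)
      return -1;                 /* address is not the process's memory */
    size_t off = srcva % PGSIZE;
    size_t n = PGSIZE - off;     /* bytes left in this page */
    if (n > max)
      n = max;
    for (size_t i = 0; i < n; i++) {
      dst[i] = pa0[off + i];
      if (dst[i] == '\0')
        return 0;                /* copied the terminator: done */
    }
    dst += n;
    srcva += n;
    max -= n;
  }
  return -1;                     /* no terminator within max bytes */
}
```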

Kernel mode interrupts

Xv6 sets kernelvec as the interrupt handler when it enters kernel mode. The kernelvec procedure can rely on the kernel page table and the kernel stack already being in use. A timer interrupt may switch the processor to another thread, so kernelvec saves the processor registers on the kernel stack.

The kernelvec procedure calls kerneltrap, which handles device interrupts and exceptions. The kerneltrap procedure calls devintr to recognize and handle a device interrupt. The kernel calls panic and stops running if an exception occurs in the kernel.

A timer interrupt makes kerneltrap call yield to give up the processor to another thread. Every thread calls yield on a timer interrupt, so kerneltrap will eventually resume. Chapter 7 covers process scheduling and how yield works.

The kerneltrap procedure saves the sepc register in a local variable on the kernel stack to protect it against thread switches.

The kerneltrap procedure returns control to kernelvec, which restores the processor registers and returns control to the kernel code that was running before the interrupt.

The processor disables interrupts (clears the SIE bit) when it responds to an interrupt. The user mode interrupt handler sets kernelvec as the handler before it enables interrupts, so the processor will not call uservec twice. The kernelvec procedure does not enable interrupts, so the processor will not call kernelvec twice. The sret instruction restores the SIE flag to its value before the interrupt, SIE = SPIE, that is, it enables interrupts again.
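The round trip of the interrupt-enable flag can be shown with plain bit operations on an sstatus value. This is a sketch, not kernel code: the function names are hypothetical, and the bit positions (SIE = bit 1, SPIE = bit 5, SPP = bit 8) and the sret behavior (SIE takes the value of SPIE, SPIE is set, SPP is cleared) follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

#define SSTATUS_SIE  (1ULL << 1) /* interrupts enabled now */
#define SSTATUS_SPIE (1ULL << 5) /* interrupts enabled before the trap */
#define SSTATUS_SPP  (1ULL << 8) /* mode before the trap: 1 = kernel */

/* What the processor does to sstatus when it takes a trap. */
uint64_t sstatus_on_trap(uint64_t s, int from_kernel)
{
  if (s & SSTATUS_SIE) s |= SSTATUS_SPIE; /* remember the old SIE */
  else                 s &= ~SSTATUS_SPIE;
  s &= ~SSTATUS_SIE;                      /* disable interrupts */
  if (from_kernel) s |= SSTATUS_SPP;
  else             s &= ~SSTATUS_SPP;
  return s;
}

/* What the sret instruction does to sstatus. */
uint64_t sstatus_on_sret(uint64_t s)
{
  if (s & SSTATUS_SPIE) s |= SSTATUS_SIE; /* SIE = SPIE */
  else                  s &= ~SSTATUS_SIE;
  s |= SSTATUS_SPIE;
  s &= ~SSTATUS_SPP;
  return s;
}

/* Small self-check: a trap followed by sret restores the old SIE value. */
void sstatus_roundtrip_check(void)
{
  uint64_t enabled = SSTATUS_SIE, disabled = 0;
  assert(sstatus_on_sret(sstatus_on_trap(enabled, 0)) & SSTATUS_SIE);
  assert(!(sstatus_on_sret(sstatus_on_trap(disabled, 1)) & SSTATUS_SIE));
}
```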

Page access errors

Xv6 terminates a process that raises an exception, and panics (stops running) if the exception occurs in the kernel.

Other OSes use page access errors to implement the following techniques:

Copy on Write
Lazy memory allocation
Demand paging
Paging to disk

The processor will report a page access error if:

The processor does not find the virtual address in the process page table: the PTE_V flag of the entry is cleared
The instruction performs an action prohibited for the page: reading, writing, executing code on the page, or accessing from user mode

RISC-V distinguishes between three types of page access errors:

A load instruction cannot access the virtual address
A store instruction cannot access the virtual address
Register pc contains an inaccessible virtual address
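A handler tells the three cases apart by the value in scause. The sketch below uses the exception codes from the RISC-V privileged specification (12 for an instruction page fault, 13 for a load page fault, 15 for a store page fault); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum fault_kind { FAULT_NONE, FAULT_INSTRUCTION, FAULT_LOAD, FAULT_STORE };

/* Maps an scause value to the kind of page access error, if any. */
enum fault_kind page_fault_kind(uint64_t scause)
{
  switch (scause) {
  case 12: return FAULT_INSTRUCTION; /* pc points to an inaccessible page */
  case 13: return FAULT_LOAD;        /* load could not translate address */
  case 15: return FAULT_STORE;       /* store could not translate address */
  default: return FAULT_NONE;        /* some other trap */
  }
}

/* Small self-check of the mapping. */
void page_fault_kind_check(void)
{
  assert(page_fault_kind(13) == FAULT_LOAD);
  assert(page_fault_kind(8) == FAULT_NONE); /* 8 is a system call */
}
```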

Copy on Write

With copy-on-write, the fork system call does not copy the parent process’s memory into the child until the parent or the child writes to that memory. Such a fork removes write permission from the parent’s pages and gives the child a copy of the page table. Writing to a page raises an exception. The kernel then allocates a new page, copies the contents, maps the new page into the page table of the process that wrote, and restores write permission to both pages.

The OS must keep track of fork, exec, and exit calls as well as page access errors when it implements copy-on-write. The same physical page ends up in multiple page tables after fork calls, and exec and exit calls free virtual pages that refer to this physical page.
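One common way to do this bookkeeping is a reference count per physical page. The host-side sketch below assumes that design; it is not taken from xv6, and page_sim, pte_sim, cow_share, and cow_fault are hypothetical names. In this variant the process that faults gets a private writable copy, and the other process regains write permission on its own next fault, once it is the only user of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

/* A physical page plus the number of page tables that map it. */
struct page_sim { int refcount; unsigned char data[PGSIZE]; };

/* One page table entry: which page it maps and whether writes are allowed. */
struct pte_sim { struct page_sim *page; int writable; };

/* fork: the child maps the same page; both mappings become read-only. */
void cow_share(struct pte_sim *parent, struct pte_sim *child)
{
  assert(parent != NULL && child != NULL && parent->page != NULL);
  parent->writable = 0;
  *child = *parent;
  parent->page->refcount++;
}

/* Store fault on a read-only copy-on-write page.
   Returns 0 on success, -1 if no memory is available. */
int cow_fault(struct pte_sim *pte)
{
  assert(pte != NULL && pte->page != NULL);
  if (pte->page->refcount == 1) { /* sole user: just allow writes again */
    pte->writable = 1;
    return 0;
  }
  struct page_sim *copy = malloc(sizeof *copy);
  if (copy == NULL)
    return -1;
  memcpy(copy->data, pte->page->data, PGSIZE);
  copy->refcount = 1;
  pte->page->refcount--;          /* this mapping leaves the shared page */
  pte->page = copy;
  pte->writable = 1;
  return 0;
}
```

In a full implementation, exec and exit would also decrement the count for every page they unmap and free a page when its count reaches zero.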

Copy-on-write speeds up programs that call exec right after fork: fork does not copy a single byte, and exec replaces the memory with a program loaded from a file.

Lazy memory allocation

The kernel does not allocate memory pages when a program calls sbrk. Instead it records the increase in size and waits for the program to access the new memory. The kernel allocates a page when it handles the resulting page access error.

This approach saves memory when a program requests more memory than it uses. The kernel never allocates pages that the program does not access.

Thanks to lazy allocation, a program does not stall when it requests a large amount of memory. Without it, an sbrk call for a gigabyte of memory would make the program wait while the kernel allocates 262144 pages of 4096 bytes each. Lazy allocation spreads this cost over time. The kernel can reduce the overhead further if it allocates a run of consecutive pages, not just one, on each page access error.
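The idea fits in a few lines of host-side C. This sketch is a model under stated assumptions: proc_sim, sbrk_lazy, lazy_fault, and pages_for are hypothetical names, the address space is capped at 16 pages for simplicity, and calloc stands in for the kernel’s page allocator. It also checks the page count quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE   4096ULL
#define MAXPAGES 16 /* arbitrary cap for the toy address space */

struct proc_sim {
  uint64_t sz;          /* size the program believes it has */
  void *page[MAXPAGES]; /* NULL until the page is first touched */
  int allocated;        /* number of pages actually allocated */
};

/* Grows the process by n bytes without allocating anything.
   Returns the old size, or (uint64_t)-1 if the request is too large. */
uint64_t sbrk_lazy(struct proc_sim *p, uint64_t n)
{
  assert(p != NULL);
  uint64_t old = p->sz;
  if (n > MAXPAGES * PGSIZE - old)
    return (uint64_t)-1;
  p->sz = old + n;
  return old;
}

/* Page access error handler: allocates the page that contains va.
   Returns 0 on success, -1 for an address outside the process. */
int lazy_fault(struct proc_sim *p, uint64_t va)
{
  assert(p != NULL);
  if (va >= p->sz)
    return -1;                   /* not memory the program asked for */
  uint64_t vpn = va / PGSIZE;
  if (p->page[vpn] != NULL)
    return 0;                    /* already present */
  p->page[vpn] = calloc(1, PGSIZE);
  if (p->page[vpn] == NULL)
    return -1;                   /* out of memory */
  p->allocated++;
  return 0;
}

/* Number of pages needed to hold the given number of bytes. */
uint64_t pages_for(uint64_t bytes)
{
  return (bytes + PGSIZE - 1) / PGSIZE;
}
```

With this model, pages_for(1ULL << 30) evaluates to 262144, the number of 4096-byte pages in one gigabyte.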

Demand paging

Demand paging speeds up program startup. An exec call for a large program takes a long time if exec loads the whole program at once. The kernel speeds up startup if it creates an empty page table and loads each page from the file on first access.

Paging to disk

Paging to disk lets programs run even when their virtual memory is larger than physical memory. The OS keeps some pages in RAM and the rest on disk. The kernel clears the PTE_V flag for pages written out to disk and loads a page back into memory when a page access error occurs.

When free memory runs out, the kernel evicts a page from RAM to make room for the one it reads from disk. Disk is slower than RAM, so the less often the OS replaces pages, the faster programs run. Pages rarely need replacing if each program works with a subset of pages that fits in RAM.

Cloud providers use as much of each machine’s memory as possible to recoup costs. Dozens of applications run simultaneously on smartphones and do not fit into RAM. Paging to disk helps in both cases.

Lazy memory allocation and demand paging help when free memory is scarce: without them, a greedy program occupies free memory and forces other programs to page to disk constantly.

Operating systems also use page access errors to grow the program stack automatically and to map files into memory.

Reality

Interrupt handling seems unnecessarily complex because the RISC-V processor does only the bare minimum automatically, but other operating systems take advantage of this to speed up interrupt handling.

Other OSes map kernel memory pages into every process page table, so they need no trampoline page, do not switch to the kernel page table, and let system call code use process memory addresses directly. Xv6 avoids this optimization to prevent kernel security bugs caused by mishandled user memory addresses.

Modern operating systems implement copy-on-write, lazy memory allocation, demand paging, paging to disk, memory-mapped files, and so on. Unlike xv6, they tend to use as much free memory as possible, for example for the disk cache and the file system cache. Xv6 does not page to disk and terminates a program if no free memory is available for it.

Chapter 8 covers the disk cache.

Exercises

The functions copyin and copyinstr walk the process’s page table in software. Map process memory into the kernel page table so that copyin and copyinstr can call memcpy to copy system call arguments into kernel memory, leaving the page table work to the processor.
Implement lazy memory allocation.
Implement copy-on-write fork.
Is it possible to get rid of the trapframe page in process page tables? Can uservec save the 32 processor registers on the kernel stack or in the proc structure?
How can xv6 get rid of the trampoline page in process page tables?