Interrupts and system calls
Note. The authors recommend reading the book alongside the xv6 source code. The authors have also prepared lab assignments for xv6.
Xv6 runs on RISC-V, so building it requires RISC-V versions of the tools: QEMU 5.1+, GDB 8.3+, GCC, and Binutils. The instructions will help you install the tools.
The processor suspends the running program and transfers control to the kernel when:
The program executes a system call.
An instruction causes an error, such as division by zero. Such an error is called an exception.
A device requires the processor's attention, for example, the disk has finished reading data that the program requested.
The program does not notice that it has been interrupted: the kernel saves the program state, handles the interrupt, and resumes the program.
Xv6 handles interrupts in kernel mode. The kernel executes system call code, operates on devices, and handles exceptions.
Xv6 contains code for three scenarios:
Interrupts from user mode
Kernel mode interrupts
Timer interrupts
Interrupts on RISC-V
Each RISC-V processor has a set of control registers that determine how the processor responds to interrupts. The RISC-V documentation describes them in detail. Xv6 uses the following registers:
stvec stores the address of the interrupt handler. The processor transfers control to this address.
sepc holds the saved program counter. The processor saves pc in sepc before transferring control to the interrupt handler. The sret instruction, which returns from the interrupt handler, restores pc from sepc. The kernel can make sret transfer control to different code by changing sepc.
scause stores the interrupt number, that is, the reason for the interrupt.
sscratch gives the handler a free register. The interrupt handler saves the processor registers in memory. The store instruction requires a memory address in a processor register, but all registers are in use. The handler saves a0 in sscratch and then uses a0 in the store instructions.
sstatus determines the processor state when handling an interrupt:
The SIE bit of sstatus determines whether the processor responds to device interrupts while in kernel mode.
The SPP bit of sstatus records whether the interrupt came from user mode or kernel mode. The sret instruction returns the processor to the mode that SPP specifies.
The processor works with these registers only in kernel mode – the registers are not accessible from user mode.
The kernel uses a similar set of registers (mtvec, mepc, mcause, mscratch, mstatus) to handle timer interrupts in machine mode.
Each processor has its own set of these registers and handles interrupts independently of the other processors.
The processor acts as follows when responding to an interrupt other than a timer interrupt:
Checks the SIE flag of sstatus if the interrupt came from a device. Does nothing if the flag is cleared, otherwise proceeds to the next step.
Clears the SIE flag of sstatus.
Assigns sepc = pc.
Sets the SPP flag of sstatus to 1 if the processor was running in kernel mode, otherwise clears it to 0.
Writes the reason for the interrupt to scause.
Switches to kernel mode.
Assigns pc = stvec.
Continues execution from the address in pc.
The processor does not switch to the kernel page table or to the kernel stack, and it saves no registers other than pc. The processor leaves these tasks to the interrupt handler. This lets operating systems optimize interrupt handling; for example, some do not switch to the kernel page table at all.
Consider which of these steps could be skipped without compromising security. For example, if the processor switched to kernel mode without assigning pc = stvec, it would continue executing the program's instructions in kernel mode, and the program could break the kernel.
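The hardware steps listed above can be modeled in a few lines of C. This is a minimal sketch, not xv6 code: struct hart and trap_enter are hypothetical names, and the SIE check is simplified exactly as in the list. The bit positions come from the RISC-V privileged specification, which also says the old SIE value is kept in the SPIE bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* sstatus bit positions from the RISC-V privileged specification. */
#define SSTATUS_SIE  (UINT64_C(1) << 1)  /* interrupts enabled */
#define SSTATUS_SPIE (UINT64_C(1) << 5)  /* SIE value before the trap */
#define SSTATUS_SPP  (UINT64_C(1) << 8)  /* mode before the trap: 1 = kernel */

/* Hypothetical model of one processor; not an xv6 structure. */
struct hart {
    uint64_t pc, stvec, sepc, scause, sstatus;
    bool kernel_mode;
};

/* Model of what the hardware does on a trap.
 * Returns true if the trap was taken, false if it was ignored. */
bool trap_enter(struct hart *h, uint64_t cause, bool from_device)
{
    uint64_t s = h->sstatus;

    /* A device interrupt is ignored while SIE is clear. */
    if (from_device && !(s & SSTATUS_SIE))
        return false;

    /* Remember the old SIE in SPIE, then disable interrupts. */
    if (s & SSTATUS_SIE)
        s |= SSTATUS_SPIE;
    else
        s &= ~SSTATUS_SPIE;
    s &= ~SSTATUS_SIE;

    h->sepc = h->pc;                 /* save the interrupted pc */

    if (h->kernel_mode)              /* record the previous mode */
        s |= SSTATUS_SPP;
    else
        s &= ~SSTATUS_SPP;

    h->sstatus = s;
    h->scause = cause;               /* record the reason */
    h->kernel_mode = true;           /* switch to kernel mode */
    h->pc = h->stvec;                /* jump to the handler */
    return true;
}
```

Note that the model touches no page table, no stack pointer, and no general-purpose register: everything else is left to the handler that stvec points to.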
User mode interrupts
Xv6 handles user mode interrupts differently than kernel mode interrupts. This section will cover user mode interrupts.
The processor leaves user mode when:
The program executes a system call
The program performs an illegal operation and raises an exception
A device requires the processor's attention
The assembly routine uservec switches the processor to the kernel page table and kernel stack before calling the C interrupt handler usertrap. On the way back, the processor executes usertrapret and then the assembly routine userret, which returns to the process page table and the user mode stack.
The processor does not switch to the kernel page table when responding to an interrupt, so the process's page table must map the trampoline page, which contains the uservec code. The uservec routine switches the processor to the kernel page table, so the kernel page table maps the trampoline page at the same virtual address. That way pc still points to the correct virtual address of the next instruction, and uservec continues executing after the switch. The trampoline page also contains the userret routine, which returns the processor to the process page table.
The PTE_U flag of the trampoline page is cleared, so uservec and userret run only in kernel mode.
The uservec routine saves the program state (the 32 registers) in the trapframe page, switches to the kernel page table and kernel stack, and transfers control to usertrap. The kernel keeps the addresses of the process's kernel stack and of the kernel page table in the trapframe page.
The usertrap function determines the cause of the interrupt, handles it, and transfers control to usertrapret. First, usertrap installs kernelvec as the interrupt handler. It then saves sepc in the trapframe, because a timer interrupt may switch the thread of execution by calling yield, and another thread may overwrite sepc when it returns from kernel mode to user mode. The function calls syscall if the interrupt is a system call and devintr if it comes from a device; otherwise it terminates the process in response to the exception. When handling a system call, usertrap adds 4 to the saved pc so that the process resumes at the instruction that follows ecall.
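The dispatch that usertrap performs can be sketched as a pure function. This is an illustration under stated assumptions, not the xv6 source: classify_user_trap, struct trapframe_model, and enum action are hypothetical names. The scause encoding is from the RISC-V privileged specification: the top bit marks an interrupt, and exception code 8 is an ecall from user mode.

```c
#include <stdint.h>

#define SCAUSE_INTERRUPT (UINT64_C(1) << 63) /* set for interrupts, clear for exceptions */
#define SCAUSE_ECALL_U   UINT64_C(8)         /* environment call from user mode */

/* What the handler decides to do next (hypothetical names). */
enum action { DO_SYSCALL, DO_DEVICE, DO_KILL };

/* Only the field this sketch needs. */
struct trapframe_model {
    uint64_t epc;  /* saved user program counter */
};

enum action classify_user_trap(uint64_t scause, uint64_t sepc,
                               struct trapframe_model *tf)
{
    /* Save sepc before anything can overwrite the register. */
    tf->epc = sepc;

    if (scause == SCAUSE_ECALL_U) {
        /* ecall is 4 bytes long; resume at the instruction after it. */
        tf->epc += 4;
        return DO_SYSCALL;
    }
    if (scause & SCAUSE_INTERRUPT)
        return DO_DEVICE;

    /* Any other exception from user mode ends the process. */
    return DO_KILL;
}
```

Without the increment, sret would return to the ecall itself and the process would repeat the same system call forever.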
The usertrapret function writes the address of the user mode interrupt handler uservec to stvec, writes to the trapframe the addresses of the kernel page table and kernel stack that uservec will need, and restores sepc from the trapframe. Then usertrapret transfers control to userret, passing it the address of the process's page table.
The userret routine switches to the process page table and the user mode stack, restores the processor registers from the trapframe, and executes the sret instruction to return to user mode and resume the program.
Code: system calls
Chapter 2 covered how xv6 makes its first system call, exec. This section explains how the kernel gets to the code that implements exec.
The program initcode.S puts the system call number in register a7 and the call arguments in registers a0 and a1. The system call number is an index into syscalls, an array of function pointers. The ecall instruction traps the processor: it forces a switch to kernel mode and runs the interrupt handler uservec, followed by the functions usertrap and syscall.
The syscall function reads the system call number from register a7 as saved in the trapframe. The constant SYS_exec defines the number of the exec system call, and the array element syscalls[SYS_exec] points to the function sys_exec.
The syscall function writes the value returned by sys_exec to p->trapframe->a0, so that the exec call returns this value in the program. The RISC-V C calling convention says that functions place their return value in register a0. System calls return zero or a positive number on success and a negative number to report an error. If the program passes an invalid system call number, syscall prints an error message and returns -1.
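A table-driven dispatcher of this kind can be sketched as follows. The code is a simplified stand-in rather than the xv6 source: the trapframe holds only three registers, the handler takes the trapframe as a parameter, sys_exec_stub is a hypothetical placeholder that always succeeds, the value chosen for SYS_exec is illustrative, and the dispatcher is named dispatch_syscall to avoid clashing with the C library's syscall.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Only the saved registers this sketch needs. */
struct trapframe {
    uint64_t a0, a1, a7;
};

#define SYS_exec 7  /* illustrative system call number */

/* Hypothetical placeholder for the real sys_exec: always reports success. */
static uint64_t sys_exec_stub(struct trapframe *tf)
{
    (void)tf;
    return 0;
}

/* System call number -> handler. Unlisted slots are NULL. */
static uint64_t (*syscalls[])(struct trapframe *) = {
    [SYS_exec] = sys_exec_stub,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

void dispatch_syscall(struct trapframe *tf)
{
    uint64_t num = tf->a7;  /* number placed in a7 by the program */

    if (num > 0 && num < NELEM(syscalls) && syscalls[num] != NULL) {
        /* The return value travels back to the program in a0. */
        tf->a0 = syscalls[num](tf);
    } else {
        fprintf(stderr, "unknown sys call %llu\n", (unsigned long long)num);
        tf->a0 = (uint64_t)-1;
    }
}
```

Because the result is written into the saved a0, the user program sees it as an ordinary function return value once userret restores the registers.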
Code: system call arguments
System call code uses the functions argint, argaddr, and argfd to reach the arguments saved in the trapframe. The program passes arguments in processor registers, and the interrupt handler uservec saves the registers in the trapframe. The functions argint, argaddr, and argfd call argraw to extract the n-th argument from the trapframe and return it as an integer, an address, and a file descriptor, respectively.
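The relationship between these helpers can be sketched as below. The signatures are simplified assumptions and differ from the xv6 source: here each helper takes the trapframe explicitly and returns its result, the open-file table is a plain array of flags, and an out-of-range argument index yields 0 instead of a kernel panic. NOFILE is an illustrative limit.

```c
#include <stdint.h>

#define NOFILE 16  /* illustrative limit on open files per process */

/* Argument registers as saved by the interrupt handler. */
struct trapframe {
    uint64_t a0, a1, a2, a3, a4, a5;
};

/* Fetch the raw n-th system call argument. */
uint64_t argraw(const struct trapframe *tf, int n)
{
    switch (n) {
    case 0: return tf->a0;
    case 1: return tf->a1;
    case 2: return tf->a2;
    case 3: return tf->a3;
    case 4: return tf->a4;
    case 5: return tf->a5;
    }
    return 0;  /* simplification: a kernel would treat this as a bug */
}

/* Interpret the argument as an integer. */
int argint(const struct trapframe *tf, int n)
{
    return (int)argraw(tf, n);
}

/* Interpret the argument as a user virtual address (not yet validated). */
uint64_t argaddr(const struct trapframe *tf, int n)
{
    return argraw(tf, n);
}

/* Interpret the argument as a file descriptor; -1 if it is not open. */
int argfd(const struct trapframe *tf, int n, const int *is_open)
{
    int fd = argint(tf, n);
    if (fd < 0 || fd >= NOFILE || !is_open[fd])
        return -1;
    return fd;
}
```

Only argfd validates its result at this stage. An address returned by argaddr is still untrusted and has to go through the copy functions before the kernel uses it.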
The kernel cannot dereference a process's virtual addresses directly, since it runs with the kernel page table, so it implements functions that copy data from process memory to kernel memory.
The fetchstr function copies a string from process memory to kernel memory. It calls copyinstr, which copies up to max bytes into the buffer at address dst from the buffer at virtual address srcva in the process page table pagetable. The copyinstr function uses walkaddr to find the physical address pa0 that corresponds to the virtual address srcva in pagetable. The kernel page table maps each physical RAM address to the identical virtual address, so copyinstr copies bytes directly from pa0 to dst. The walkaddr function checks that the virtual address belongs to the process's memory, so a program cannot trick the kernel with a forged address. The similar function copyout copies bytes from kernel memory to process memory.
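The page-by-page copy can be sketched with a toy page table. Everything here is a simplified model rather than the xv6 implementation: struct pagetable is a hypothetical one-level table with four entries, and walkaddr returns a pointer to the backing page instead of a physical address.

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096
#define NPAGES 4

/* Hypothetical one-level page table: page number -> backing memory. */
struct pagetable {
    unsigned char *page[NPAGES];  /* NULL if the page is not mapped */
    int user[NPAGES];             /* nonzero if user mode may access it */
};

/* Translate a user virtual address to its backing page.
 * Returns NULL if the page is unmapped or not accessible to the user. */
static unsigned char *walkaddr(const struct pagetable *pt, uint64_t va)
{
    uint64_t vpn = va / PGSIZE;
    if (vpn >= NPAGES || pt->page[vpn] == NULL || !pt->user[vpn])
        return NULL;
    return pt->page[vpn];
}

/* Copy a NUL-terminated string from user address srcva into dst,
 * writing at most max bytes including the terminator.
 * Returns 0 on success, -1 on a bad address or a missing terminator. */
int copyinstr(const struct pagetable *pt, char *dst, uint64_t srcva, size_t max)
{
    while (max > 0) {
        unsigned char *pa0 = walkaddr(pt, srcva);
        if (pa0 == NULL)
            return -1;

        size_t off = (size_t)(srcva % PGSIZE);
        size_t n = PGSIZE - off;      /* bytes left in this page */
        if (n > max)
            n = max;

        for (size_t i = 0; i < n; i++) {
            dst[i] = (char)pa0[off + i];
            if (dst[i] == '\0')
                return 0;
        }
        dst += n;
        srcva += n;
        max -= n;
    }
    return -1;
}
```

The translation is repeated for every page the string touches, because consecutive virtual pages need not be backed by consecutive physical memory.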
Kernel mode interrupts
Xv6 installs kernelvec as the interrupt handler when it enters kernel mode. The kernelvec routine knows that it is running with the kernel page table and the kernel stack. A timer interrupt may switch the processor to another thread, so kernelvec saves the processor registers on the kernel stack of the interrupted thread.
The kernelvec routine calls kerneltrap, which handles device interrupts and exceptions. The kerneltrap function calls devintr to recognize and handle a device interrupt. If an exception occurs in the kernel, the kernel calls panic and stops running.
On a timer interrupt, kerneltrap calls yield to give up the processor to another thread. Every thread calls yield on a timer interrupt, so at some point another thread yields back and kerneltrap resumes. Chapter 7 covers process scheduling and how yield works.
The kerneltrap function saves the sepc register in a local variable on the kernel stack to protect it against a thread switch.
The kerneltrap function returns control to kernelvec, which restores the processor registers and returns control to the kernel code that was running before the interrupt.
The processor disables interrupts (clears the SIE bit) when it responds to an interrupt. The user mode interrupt handler installs kernelvec as the handler and only then enables interrupts, so the processor never invokes uservec twice. The kernelvec routine does not enable interrupts, so the processor never invokes kernelvec twice. The sret instruction restores the SIE flag to its value before the interrupt (SIE = SPIE), which enables interrupts again.
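The effect of sret on the status bits can be modeled in C. This is a sketch under stated assumptions: struct hart and the function sret are hypothetical model names, while the bit positions and the update rules (SIE takes the value of SPIE, the mode is taken from SPP, then SPIE is set and SPP is cleared) follow the RISC-V privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* sstatus bit positions from the RISC-V privileged specification. */
#define SSTATUS_SIE  (UINT64_C(1) << 1)  /* interrupts enabled */
#define SSTATUS_SPIE (UINT64_C(1) << 5)  /* SIE value before the trap */
#define SSTATUS_SPP  (UINT64_C(1) << 8)  /* mode before the trap: 1 = kernel */

/* Hypothetical model of one processor; not an xv6 structure. */
struct hart {
    uint64_t pc, sepc, sstatus;
    bool kernel_mode;
};

/* Model of the sret instruction. */
void sret(struct hart *h)
{
    uint64_t s = h->sstatus;

    /* Return to the mode recorded at trap time. */
    h->kernel_mode = (s & SSTATUS_SPP) != 0;

    /* SIE = SPIE: interrupts return to their state before the trap. */
    if (s & SSTATUS_SPIE)
        s |= SSTATUS_SIE;
    else
        s &= ~SSTATUS_SIE;

    s |= SSTATUS_SPIE;   /* the specification sets SPIE to 1 */
    s &= ~SSTATUS_SPP;   /* and resets SPP to user mode */

    h->sstatus = s;
    h->pc = h->sepc;     /* resume the interrupted code */
}
```

For example, a processor that trapped from user mode with interrupts enabled has SPIE set and SPP clear, so after sret it is back in user mode with SIE set.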
Page access errors
Xv6 responds to exceptions in the simplest way: it terminates a process that raises an exception, and it stops running if the kernel itself raises one.
Other operating systems use page access errors (page faults) to implement the following techniques:
Copy-on-write
Lazy memory allocation
Demand paging (loading pages as needed)
Paging to disk (flushing pages to disk)
The processor reports a page access error if:
The processor does not find the virtual address in the process page table, that is, the PTE_V flag of the entry is cleared.
The instruction performs an action that the page does not permit: reading, writing, executing code on the page, or accessing it from user mode.
RISC-V distinguishes three types of page access errors:
A load instruction cannot access the virtual address.
A store instruction cannot access the virtual address.
The pc register contains an inaccessible virtual address.
The scause register indicates the type of error, and stval contains the virtual address that could not be accessed.
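The rules above can be expressed as a small check that returns the scause value the processor would report. This is a sketch: fault_cause and enum access are hypothetical names, and kernel accesses to user pages are not modeled. The PTE flag bits and the exception codes (12 for instruction fetch, 13 for load, 15 for store) come from the RISC-V privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page table entry flags from the RISC-V privileged specification. */
#define PTE_V (UINT64_C(1) << 0)  /* entry is valid */
#define PTE_R (UINT64_C(1) << 1)  /* readable */
#define PTE_W (UINT64_C(1) << 2)  /* writable */
#define PTE_X (UINT64_C(1) << 3)  /* executable */
#define PTE_U (UINT64_C(1) << 4)  /* accessible from user mode */

enum access { ACCESS_FETCH, ACCESS_LOAD, ACCESS_STORE };

/* scause exception codes for the three kinds of page fault. */
static const uint64_t cause_of[] = {
    [ACCESS_FETCH] = 12,
    [ACCESS_LOAD]  = 13,
    [ACCESS_STORE] = 15,
};

/* Returns 0 if the access is allowed, otherwise the scause code. */
uint64_t fault_cause(uint64_t pte, enum access a, bool user_mode)
{
    bool ok = (pte & PTE_V) != 0;

    if (ok && user_mode)
        ok = (pte & PTE_U) != 0;

    if (ok) {
        switch (a) {
        case ACCESS_FETCH: ok = (pte & PTE_X) != 0; break;
        case ACCESS_LOAD:  ok = (pte & PTE_R) != 0; break;
        case ACCESS_STORE: ok = (pte & PTE_W) != 0; break;
        }
    }
    return ok ? 0 : cause_of[a];
}
```

A store to a valid, readable, user page without PTE_W yields code 15, which is the case that copy-on-write relies on.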
Copy-on-write
With copy-on-write, the fork system call does not copy the parent's memory into the child until the parent or the child writes to memory. Such a fork removes write permission from the parent's pages and gives the child a copy of the page table. Writing to a page then raises an exception: the kernel allocates a new page, copies the contents into it, maps the copy into the page table of the process that caused the error, and restores write permission on both pages.
An OS that implements copy-on-write has to keep track of fork, exec, exit, and page access errors. After fork, the same physical page appears in several page tables, and calls to exec and exit release virtual pages that refer to it, so the kernel must know when the last reference to a physical page is gone.
Copy-on-write speeds up programs that call fork and then exec: fork copies not a single byte, and exec replaces the memory with a program from a file.
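The bookkeeping can be illustrated with a reference count per physical page. This is a minimal sketch under assumptions, not an xv6 feature: struct page, struct pte, cow_share, and cow_fault are hypothetical names, a single page stands in for a whole address space, and the other sharer regains write access lazily on its own next write instead of immediately.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

/* A physical page with a count of page tables that map it. */
struct page {
    unsigned char data[PGSIZE];
    int refs;
};

/* A simplified page table entry. */
struct pte {
    struct page *pg;
    bool writable;
};

/* fork: map the parent's page into the child, read-only in both. */
void cow_share(struct pte *parent, struct pte *child)
{
    parent->writable = false;
    *child = *parent;
    parent->pg->refs++;
}

/* Handle a write to a read-only shared page.
 * Returns 0 on success, -1 if no memory is available. */
int cow_fault(struct pte *pte)
{
    if (pte->pg->refs == 1) {
        /* Last sharer: no copy needed, just allow writes again. */
        pte->writable = true;
        return 0;
    }

    struct page *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;

    memcpy(copy->data, pte->pg->data, PGSIZE);
    copy->refs = 1;

    pte->pg->refs--;      /* this table no longer maps the shared page */
    pte->pg = copy;
    pte->writable = true;
    return 0;
}
```

If the child calls exec before writing anything, its mapping is simply dropped and the count decremented, so no page is ever copied.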
Lazy memory allocation
The kernel does not allocate memory pages when a program calls sbrk. It records the increase in size and waits for the program to access the new memory. The kernel allocates a page when it handles the resulting page access error.
This approach saves memory when a program requests more memory than it uses. The kernel never allocates pages that the program does not access.
Thanks to lazy allocation, a program does not stall when it requests a large amount of memory. An eager sbrk call for a gigabyte of memory would force the program to wait while the kernel allocates 262144 pages of 4096 bytes each. Lazy allocation spreads this cost over time. The kernel can reduce the overhead further if, on a page access error, it allocates not just one page but a batch of consecutive pages.
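A lazy sbrk and its fault handler can be sketched as follows. The model is hypothetical and much smaller than a real kernel: struct proc_model, lazy_sbrk, lazy_fault, and the MAXPAGES limit are assumptions made for the example. The helper pages_for reproduces the arithmetic from the text.

```c
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE   4096
#define MAXPAGES 1024  /* hypothetical limit on the model's address space */

/* Hypothetical per-process state. */
struct proc_model {
    uint64_t sz;            /* size the program believes it has */
    void *page[MAXPAGES];   /* NULL until the page is first touched */
    int allocated;          /* pages actually handed out */
};

/* Number of pages needed to hold the given number of bytes. */
uint64_t pages_for(uint64_t bytes)
{
    return (bytes + PGSIZE - 1) / PGSIZE;
}

/* Grow the process by n bytes without allocating anything.
 * Returns the old size, or (uint64_t)-1 if the request is too large. */
uint64_t lazy_sbrk(struct proc_model *p, uint64_t n)
{
    uint64_t old = p->sz;
    if (n > (uint64_t)MAXPAGES * PGSIZE - old)
        return (uint64_t)-1;
    p->sz = old + n;
    return old;
}

/* Page fault handler: allocate the page on first access.
 * Returns 0 on success, -1 for an address outside the process. */
int lazy_fault(struct proc_model *p, uint64_t va)
{
    if (va >= p->sz)
        return -1;

    uint64_t vpn = va / PGSIZE;
    if (p->page[vpn] == NULL) {
        p->page[vpn] = calloc(1, PGSIZE);  /* zero-filled, like fresh memory */
        if (p->page[vpn] == NULL)
            return -1;
        p->allocated++;
    }
    return 0;
}
```

Here lazy_sbrk does a constant amount of work regardless of the request size, and a program that touches a single page of a large region pays for a single allocation.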
Demand paging
Demand paging, that is, loading pages as needed, speeds up program startup. An exec call for a large program takes a long time if exec loads the whole program up front. The kernel speeds up startup if it creates an empty page table and loads each page from the file on first access.
Paging to disk
Paging to disk, that is, flushing pages to disk, lets programs run even when they need more memory than the machine has RAM. The OS keeps some pages in RAM and the rest on disk. The kernel clears the PTE_V flag for pages that are on disk and loads such a page into memory when a page access error occurs.
When free memory runs out, the kernel evicts a page from RAM to make room for the one it reads from disk. Disk is slower than RAM, so the less often the OS replaces pages, the faster programs run. Hardly any replacement is needed if each program works with a subset of its pages that fits in RAM.
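The effect of the working set on replacement can be seen in a small simulation. The text does not say which eviction policy an OS uses; this sketch assumes first-in, first-out replacement purely for brevity, and count_faults_fifo and MAXFRAMES are hypothetical names.

```c
#include <stddef.h>

#define MAXFRAMES 16  /* hypothetical upper bound on RAM frames in the model */

/* Count page access errors for a sequence of page numbers when RAM
 * holds nframes pages and the oldest loaded page is evicted first.
 * Returns -1 if nframes is out of range. */
int count_faults_fifo(const int *refs, size_t nrefs, int nframes)
{
    int frames[MAXFRAMES];
    int used = 0;    /* frames currently holding a page */
    int next = 0;    /* index of the oldest page, evicted next */
    int faults = 0;

    if (nframes < 1 || nframes > MAXFRAMES)
        return -1;

    for (size_t i = 0; i < nrefs; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;

        faults++;                         /* page must come from disk */
        if (used < nframes) {
            frames[used++] = refs[i];     /* a free frame is available */
        } else {
            frames[next] = refs[i];       /* replace the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}
```

A program that cycles through pages 0, 1, 2 three times causes 3 errors with three frames, one per page, but 9 errors with two frames, because every access then evicts a page that is needed again shortly afterwards.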
Cloud providers pack as many customers as possible onto each machine to recoup hardware costs, so free memory is scarce. Smartphones run dozens of applications at once, and they do not all fit in RAM. Paging to disk helps in both cases.
Lazy memory allocation and demand paging help when free memory is scarce: without them, a greedy program would occupy the free memory and force other programs to page to disk constantly.
Operating systems also use page access errors to grow the program stack automatically and to map files into memory.
Reality
Interrupt handling seems unnecessarily complex because the RISC-V processor does only the bare minimum in hardware, but other operating systems take advantage of this to speed up interrupt handling.
Other operating systems map kernel memory into every process page table, so they need no trampoline page and no switch to the kernel page table, and their system call code can use process memory addresses directly. Xv6 forgoes this optimization to avoid kernel security bugs caused by mishandled user memory addresses.
Modern operating systems implement copy-on-write, lazy memory allocation, demand paging, paging to disk, memory-mapped files, and so on. Unlike xv6, they tend to use as much free memory as possible: they keep the disk cache, the file system cache, and similar data in memory. Xv6 does not page to disk and terminates a program if no free memory is available for it.
Exercises
The functions copyin and copyinstr walk the process's page table in software. Map process memory into the kernel page table so that copyin and copyinstr can call memcpy to copy system call arguments into kernel memory, leaving the page table work to the processor.
Implement lazy memory allocation.
Implement copy-on-write fork.
Is it possible to get rid of the trapframe page in process page tables? Could uservec save the 32 processor registers on the kernel stack or in the proc structure?
How could xv6 get rid of the trampoline page in page tables?