Let's increase the size of virtual memory pages from 4 KB to 2 MB
Xv6 is an educational OS that demonstrates the ideas underlying operating systems.
Virtual memory lets programs run simultaneously without interfering with each other. The OS loads a program into memory to execute it. Each program needs its own address space that other programs cannot touch. The program works with virtual memory addresses, which the OS maps to physical ones. Two programs can access the same virtual address, and the OS will map it to one physical address for the first program and to a different one for the second. Chapter 3 will tell you more about virtual memory and page tables.
The OS divides memory into pages, which are contiguous areas of equal size. Example: 4 KB pages occupy the address ranges 0:0xFFF, 0x1000:0x1FFF, and so on, while 2 MB pages occupy 0:0x1FFFFF, 0x200000:0x3FFFFF, and so on.
The OS adds entries to the page table when a program requests memory. The OS places the program's code, data, and stack in memory – the more memory the program occupies, the more memory pages it requires.
Example: Suppose the program takes up 4 MB of memory and the page size is 4 KB. The program will take up (1024 * 1024 * 4) / (1024 * 4) = 1024 pages of memory.
If the OS increases the page size to 2 MB, the same program takes up only 2 pages. The page table then needs far fewer entries, which relieves the processor of unnecessary work.
We will teach xv6 to work with virtual pages of 2 MB, learn how the linker creates a program memory image, and teach the xv6 file system to handle large files.
RISC-V Virtual Memory
The processor looks into the page table every time it accesses memory to translate a virtual address into a physical address. A virtual address consists of a virtual page number and an offset within the page. The page table maps virtual memory pages to physical pages.
Xv6 runs under the QEMU virtual machine on a 64-bit RISC-V architecture. RISC-V offers a hierarchy of page tables. The processor descends the page table hierarchy in search of a physical page. A page table entry points to the next page table when its R, W, and X bits are all clear; otherwise it points to a page of physical memory.
Xv6 uses the Sv39 mode of RISC-V virtual addressing. The processor parses the 64-bit virtual address as shown in the figure:
It discards the most significant bits 63-39.
It uses three groups of 9 bits, one group per level, to find the next page table entry.
It uses the remaining least significant bits as an offset within the physical page.
Example (levels are counted from the root page table):
The level 0 page table entry points to a page of size 2^30 = 1073741824 bytes = 1 GB when bits RWX != 0, that is, when they store permissions to access the physical page.
The level 1 page table entry points to a page of size 2^21 = 2097152 bytes = 2 MB when bits RWX != 0.
The level 2 page table entry points to a page of size 2^12 = 4096 bytes = 4 KB when bits RWX != 0.
Object file, executable file, program memory image
The compiler creates a relocatable object file from the program source text. The object file contains code sections, data sections, and relocation entries, which describe references between sections. Program instructions refer to the data section by storing absolute or relative addresses of memory cells.
The linker creates a program memory image, an executable file, from object files: it declares memory pages with permissions for code and data, places the code and data sections on those pages, and calculates the addresses for the relocation entries. The OS can now load the program memory image from the file and run it.
Note: The size of the file on disk grows along with the size of the virtual memory pages. The linker sacrifices disk space so that the OS does not have to resolve relocation entries each time the program is started.
Program segments, section alignment by page size. kernel.ld linker script
The xv6 OS requires that program segments start at addresses that are multiples of the page size, or it will refuse to execute the program. The linker script describes how program sections are placed on memory pages.
The linker script kernel.ld describes how the kernel memory image is structured. The SECTIONS command determines how the linker arranges sections from the input files into the output file. The command .text : { *(.text) } tells the linker to place the .text sections from the input files into the .text section of the output file. In the notation *(...), the * matches any input file name.
The command ENTRY(_entry) defines the entry point into the kernel, the procedure _entry from the file entry.S.
The assignment . = 0x80000000; places the kernel's .text section at virtual address 0x80000000, the address at which QEMU loads the kernel. The symbol . denotes the current position in the output file. The statement . = ALIGN(0x1000); advances the position to the next multiple of 0x1000, that is, it aligns the position to the 4 KB page size. ALIGN helps to place the next section on a new page.
The file trampoline.S declares the section trampsec, which holds the code of the trampoline page. The command ASSERT(. - _trampoline == 0x1000, "error: trampoline larger than one page"); checks that the section trampsec fits into one 4 KB page.
Note: The trampoline page contains instructions that switch the processor from user mode to kernel mode and back when xv6 processes device interrupts and services system calls. Chapter 4 covers trampoline in more detail.
PROVIDE(etext = .) adds the symbol etext, which holds the end address of the .text section, to the program's symbol table. The function kvmmake uses this address when creating the kernel page table.
The .rodata section starts on a new page because the ALIGN(0x1000) commands frame the trampoline page.
The .data and .bss sections require write permission, so they are placed on a new page, separate from the .text and .rodata sections.
The ld linker command-line option -z max-page-size= sets the memory page size. The expression CONSTANT(MAXPAGESIZE) returns that page size, so use . = ALIGN(CONSTANT(MAXPAGESIZE)); instead of ALIGN(0x1000) to align sections.
Linker script user.ld
The linker script user.ld tells the linker how to build user programs: init, sh, cat, ls, etc.
The command ENTRY(_main) sets the entry point, the function main. The compiler prefixes the names of symbols (functions, variables) with an underscore _.
The assignment . = 0x0; places the code section at virtual address 0.
The .rodata section follows .text without moving to a new page. The .rodata section does not require write permission, so it shares pages with the code.
Note: The linker extends the section up to the beginning of the next page if you place ALIGN inside the section description. This slows down program startup because the OS spends more time reading the section from the file. The section size stays the same if you place ALIGN after the section description: the next section starts on a new page and the linker adds alignment bytes to the file, but those bytes do not become part of the section and the OS does not load them from the file.
Code: increase the size of xv6 pages
The file kernel/riscv.h defines the constant PGSIZE, the virtual page size. Set PGSIZE = (1024 * 1024 * 2) = 2 MB.
The constant PGSHIFT specifies the number of address bits that hold the offset from the beginning of the page. The page size has increased from 2^12 = 4096 to 2^21 = 2097152, so 21 bits now define the offset.
The function kvmmake creates the kernel page table: it maps device registers, kernel code and data, and the trampoline page into memory, and prepares pages for process stacks. The file kernel/memlayout.h defines the constants UART0, VIRTIO0, PLIC, KERNBASE; make sure that these addresses are multiples of the new page size.
#include <stdio.h>

#define MAXVA (1L << (9 + 9 + 9 + 12 - 1))
// Parenthesized so that (addr) % PGSIZE is evaluated as intended.
#define PGSIZE (1024L * 1024 * 2)
#define UART0 0x10000000L
#define VIRTIO0 0x10001000L
#define PLIC 0x0c000000L
#define KERNBASE 0x80000000L
#define TRAMPOLINE (MAXVA - PGSIZE)
#define PHYSTOP (KERNBASE + 128*1024*1024)

// Print whether addr is a multiple of the page size.
#define CHECK_ALIGNMENT(addr, name) \
  printf("Address %s is %saligned to page size\n", #name, 0 == ((addr) % PGSIZE) ? "" : "NOT ")

int main() {
  CHECK_ALIGNMENT(UART0, UART0);
  CHECK_ALIGNMENT(VIRTIO0, VIRTIO0);
  CHECK_ALIGNMENT(PLIC, PLIC);
  CHECK_ALIGNMENT(KERNBASE, KERNBASE);
  CHECK_ALIGNMENT(TRAMPOLINE, TRAMPOLINE);
  return 0;
}
Now VIRTIO0 - UART0 = 0x1000 < PGSIZE, so VIRTIO0 lands on the same page as UART0. Remove the call kvmmap(kpgtbl, VIRTIO0, VIRTIO0, PGSIZE, PTE_R | PTE_W); otherwise we will get the error panic: mappages: remap.
The function walk(pagetable, va, alloc) searches the page table pagetable for the entry that corresponds to the virtual address va. If an intermediate page table is missing and the flag alloc equals 1, the function allocates it. Now that the OS works with 2 MB pages, walk does not descend below the second level of the page table hierarchy.
In the Makefile, let's change the page size for the linker in LDFLAGS, using the -z max-page-size= option.
Xv6 runs on a QEMU virtual machine with 128 MB of memory.
QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 128M -smp $(CPUS) -nographic
The function proc_mapstacks allocates memory for process stacks. It requires 2 pages per process, a stack page and a guard page, so NPROC = 64 processes need 64 * 2 = 128 pages. The processes used to require 128 * 4 = 512 KB of memory; now they require 128 * 2 = 256 MB. Let's reduce the number of processes NPROC or increase the machine's memory capacity.
“Both, and you can do without bread,” said Winnie the Pooh.
The constant MAXFILE limits the maximum file size. It equals the largest number of file system blocks that a file can occupy: MAXFILE = NDIRECT + NINDIRECT. The structure dinode describes how a file is stored on disk. It holds the numbers of NDIRECT direct content blocks and of one indirect block, which in turn holds the disk block numbers for the rest of the file content.
#define BSIZE 1024 // block size
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint)) // 1024 / 4 = 256
#define MAXFILE (NDIRECT + NINDIRECT) // 268
The file system limits the file size to 268 blocks of 1024 bytes, so a file cannot exceed 268 KB. Program files take up more space after the page size is increased, so the mkfs command ends with the error:
make qemu // or make fs.img
...
Assertion failed: fbn < MAXFILE
A program contains at least two pages of memory, code and data, so we aim for a maximum file size of at least 4 MB.
The block size is a multiple of the size of the structure dinode, so one block holds a whole number of dinode structures; this makes it easier for the file system to find a dinode by number. Let's increase the number of direct blocks so that the structure dinode occupies 1024 bytes:
NDIRECT = (1024 - 4 * sizeof(short) - sizeof(uint)) / 4 - 1 = (1024 - 12) / 4 - 1 = 252
This will increase the maximum file size to 252 + 256 = 508 KB. Let's also increase the block size to 4096 bytes, which gives a largest file size of ~4.98 MB.
MAXFILE = 252 + 4096 / 4 = 252 + 1024 = 1276
max_size = 1276 * 4096 = 5226496
Chapter 8 will cover the file system in more detail.
Run usertests
Programmers write tests so that tests can be run – run tests before you change the code, run tests after.
The program usertests stress-tests the kernel: it passes invalid memory addresses to system calls, writes to a file at an invalid offset, calls fork, exit, and wait concurrently to provoke deadlocks between processes, and so on.
The program usertests runs slower after the page size increase because fork takes longer to copy memory. It also requires more memory now, so let's increase the machine's memory to 1024 MB and raise the constant PHYSTOP so that xv6 uses this memory.
# Makefile
QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 1024M -smp $(CPUS) -nographic
// kernel/memlayout.h
#define PHYSTOP (KERNBASE + 1024*1024*1024)
Conclusion
We've had enough practice 🙂 Xv6 became slower when the page size increased: the usertests program takes 10 to 15 times longer to run. The fork system call takes longer because it copies more memory.
Maybe we'll see the benefit of large pages when we write an archiver, compiler, video compressor, or another program for xv6 that works with large amounts of data in memory.