Let's increase the size of virtual memory pages from 4 KB to 2 MB

Xv6 is an educational OS that teaches the ideas underlying operating systems.

Virtual memory lets programs run simultaneously without interfering with each other. The OS loads a program into memory to execute it. Each program needs its own address space that other programs cannot touch. The program works with virtual memory addresses, which the OS maps to physical ones. Two programs can access the same virtual address, but the OS maps it to one physical address for the first program and to a different one for the second. Chapter 3 covers virtual memory and page tables in more detail.

The OS divides memory into pages, contiguous areas of a fixed size. Example: 4 KB pages occupy the address ranges 0:0xFFF, 0x1000:0x1FFF, and so on, while 2 MB pages occupy 0:0x1FFFFF, 0x200000:0x3FFFFF, and so on.

The OS adds entries to the page table when a program requests memory. The OS places the program's code, data, and stack in memory – the more memory the program occupies, the more memory pages it requires.

Example: Let's say the program takes up 4 MB of memory. The page size is 4 KB. The program will take up (1024 * 1024 * 4) / (1024 * 4) = 1024 pages of memory.

If the OS increases the page size to 2 MB, the program takes up only 2 pages, so the page table holds fewer entries and the processor does less work translating addresses.

We will teach xv6 to work with virtual pages of 2 MB, learn how the linker creates a program memory image, and teach the xv6 file system to handle large files.

RISC-V Virtual Memory

The processor looks into the page table every time it accesses memory to translate a virtual address into a physical address. A virtual address consists of a virtual page number and an offset within the page. The page table maps virtual memory pages to physical pages.

Xv6 runs under the QEMU virtual machine on a 64-bit RISC-V architecture. RISC-V offers a hierarchy of page tables. The processor descends the page table hierarchy in search of a physical page. A page table entry points to the next page table when its R, W, and X bits are all clear; otherwise it points to a page of physical memory.

Xv6 uses the RISC-V Sv39 virtual addressing mode. The processor parses the 64-bit virtual address as shown in the figure.

Virtual address Sv39


  1. Discards the most significant bits 63-39.

  2. Uses groups of 9 bits to find the next page table entry.

  3. Uses the remaining least significant bits as an offset within the physical page.

Example:

  • The level 0 page table entry points to a page of size 2^30 = 1073741824 bytes = 1 GB when bits RWX != 0, that is, when they store permissions to access the physical page.

  • The level 1 page table entry points to a page of size 2^21 = 2097152 bytes = 2 MB when bits RWX != 0.

  • The level 2 page table entry points to a page of size 2^12 = 4096 bytes = 4 KB when bits RWX != 0.

Object file, executable file, program memory image

The compiler creates a relocatable object file from the program source text. The object file contains code sections, data sections, and information about the links between sections, the relocatable elements. Program instructions refer to the data section: they store absolute or relative addresses of memory cells.

The linker creates a program memory image, an executable file, from an object file: it declares memory pages with permissions for code and data, places the code and data sections on those pages, and calculates the addresses of relocatable elements. The OS can then load the program memory image from the file and run it.

Relocatable and executable files


Note: The size of the file on disk will increase along with the size of the virtual memory pages. The linker sacrifices disk space so that the OS does not have to calculate the addresses of relocatable elements each time the program is started.

Program segments, section alignment by page size. kernel.ld linker script

The xv6 OS requires that program segments start at addresses that are multiples of the page size, or it will refuse to execute the program. The linker script describes how program sections are placed on memory pages.

The linker script kernel.ld describes how the kernel memory image is laid out. The SECTIONS command determines how the linker arranges sections from the input files in the output file. The command .text : { *(.text) } tells the linker to place the .text sections of the input files into the output section .text. In the notation *(...), the * matches any input file name.

The command ENTRY(_entry) defines the entry point into the kernel: the procedure _entry from the file entry.S.

The assignment . = 0x80000000; means that the kernel's .text section is located at virtual address 0x80000000, which is where QEMU loads the kernel. The symbol . denotes the current position in the output file. The assignment . = ALIGN(0x1000); advances the position to the next multiple of 0x1000, that is, aligns it to the 4 KB page size. ALIGN helps to place the next section on a new page.

The file trampoline.S declares the section trampsec and contains the code of the trampoline page. The command ASSERT(. - _trampoline == 0x1000, "error: trampoline larger than one page"); checks that the section trampsec fits into one 4 KB page.

Note: The trampoline page contains instructions that switch the processor from user mode to kernel mode and back when xv6 handles device interrupts and services system calls. Chapter 4 covers the trampoline in more detail.

PROVIDE(etext = .) adds the symbol etext, which holds the end address of the .text section, to the program's symbol table. The function kvmmake uses this address when it creates the kernel page table.

The .rodata section starts on a new page because the ALIGN(0x1000) commands surround the trampoline page.

The .data and .bss sections require write permission, so they are placed on a new page, separate from the .text and .rodata sections.

The linker command line option ld -z max-page-size= sets the memory page size. The expression CONSTANT(MAXPAGESIZE) returns this size, so use . = ALIGN(CONSTANT(MAXPAGESIZE)); instead of ALIGN(0x1000) to align sections.

Linker script user.ld

The linker script user.ld tells the linker how to build the user programs: init, sh, cat, ls, etc.

The command ENTRY(_main) sets the entry point, the function main. The compiler prefixes the names of symbols (functions, variables) with an underscore _.

The assignment . = 0x0; places the code section at virtual address 0.

The .rodata section follows .text without moving to a new page. The .rodata section does not require write permission, so it is placed on the code pages.

Note: The linker extends the section to the beginning of the next page if you place ALIGN inside the section description. This slows down program startup, because the OS spends more time reading the section from the file.

The section size stays the same if you place ALIGN after the section description. The next section starts on a new page and the linker adds alignment bytes to the file, but these bytes do not become part of the section and the OS does not load them from the file.

ALIGN within section


ALIGN outside section


Code: increase the size of xv6 pages

The file kernel/riscv.h defines the constant PGSIZE, the virtual page size. Set PGSIZE = (1024 * 1024 * 2) = 2 MB.

The constant PGSHIFT specifies the number of address bits that hold the offset from the beginning of the page. The page size has increased from 2^12 = 4096 to 2^21 = 2097152, so now 21 bits define the offset.

kernel/riscv.h changes


The function kvmmake creates the kernel page table: it maps device registers, kernel code and data, and the trampoline page, and prepares pages for process stacks. The file kernel/memlayout.h defines the constants UART0, VIRTIO0, PLIC, and KERNBASE; make sure that these addresses are multiples of the new page size.

#include <stdio.h>

#define MAXVA (1L << (9 + 9 + 9 + 12 - 1))
// Parenthesized so that (addr) % PGSIZE divides by the whole product.
#define PGSIZE (1024L * 1024 * 2)

#define UART0 0x10000000L
#define VIRTIO0 0x10001000L
#define PLIC 0x0c000000L
#define KERNBASE 0x80000000L
#define TRAMPOLINE (MAXVA - PGSIZE)

#define PHYSTOP (KERNBASE + 128*1024*1024)

// Print whether addr is a multiple of PGSIZE.
#define CHECK_ALIGNMENT(addr, name) \
    printf("Address %s is %saligned to page size\n", #name, 0 == ((addr) % PGSIZE) ? "" : "NOT ")

int main(void) {
    CHECK_ALIGNMENT(UART0, UART0);
    CHECK_ALIGNMENT(VIRTIO0, VIRTIO0);
    CHECK_ALIGNMENT(PLIC, PLIC);
    CHECK_ALIGNMENT(KERNBASE, KERNBASE);
    CHECK_ALIGNMENT(TRAMPOLINE, TRAMPOLINE);
    return 0;
}

Now VIRTIO0 - UART0 = 0x1000 < PGSIZE, so VIRTIO0 falls on the same page as UART0. Remove the call kvmmap(kpgtbl, VIRTIO0, VIRTIO0, PGSIZE, PTE_R | PTE_W); otherwise we will get the error panic: mappages: remap.

The function walk(pagetable, va, alloc) looks up the entry for the virtual address va in the page table pagetable. If a page table on the way is missing and the flag alloc equals 1, the function allocates it. Now the OS works with 2 MB pages, so walk does not descend below the second level of the page table hierarchy.

Now walk works with 2 levels of page table hierarchy


In the Makefile, let's change the page size for the linker in LDFLAGS.

Now the virtual page size is 2 MB


Xv6 runs on a QEMU virtual machine with 128 MB of memory.

QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 128M -smp $(CPUS) -nographic

The function proc_mapstacks allocates memory for process stacks. It reserves 2 pages of address space per process, a stack page and a guard page, so NPROC = 64 processes need 64 * 2 = 128 pages. With 4 KB pages that was 128 * 4 = 512 KB; with 2 MB pages it is 128 * 2 = 256 MB. Stock xv6 leaves the guard pages unmapped and allocates physical memory only for the stack pages, but even those now take 64 * 2 = 128 MB, which is all the memory the machine has. Let's reduce the number of processes NPROC or increase the machine's memory capacity.

“Both, and you can do without bread,” said Winnie the Pooh.

The constant MAXFILE limits the maximum file size. It equals the largest number of file system blocks that a file can occupy: MAXFILE = NDIRECT + NINDIRECT. The structure dinode describes how a file is stored on disk. It contains the numbers of NDIRECT blocks of file content and the number of one indirect block, which holds the disk block numbers for the rest of the file content.

kernel/fs.h

#define BSIZE 1024 // block size
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint)) // 1024 / 4 = 256
#define MAXFILE (NDIRECT + NINDIRECT) // 268

The file system limits the file size to 268 blocks of 1024 bytes, so a file cannot exceed 268 KB. Program files take up more space after the page size is increased, so mkfs fails with the error:

make qemu // or make fs.img
...
Assertion failed: fbn < MAXFILE

A program occupies at least two memory pages, one for code and one for data, so we aim for a maximum file size of at least 4 MB.

The block size is a multiple of the size of the structure dinode, so that one block holds a whole number of dinode structures; this makes it easier for the file system to find a dinode by number. Let's increase the number of direct blocks so that the structure dinode occupies 1024 bytes:

NDIRECT = (1024 - 4 * sizeof(short) - sizeof(uint)) / 4 - 1 = (1024 - 12) / 4 - 1 = 252

This increases the maximum file size to 252 + 256 = 508 KB. Let's also increase the block size to 4096 bytes, which gives a maximum file size of about 4.98 MB.

MAXFILE = 252 + 4096 / 4 = 252 + 1024 = 1276
max_size = 1276 * 4096 = 5226496

Chapter 8 will cover the file system in more detail.

Run usertests

Programmers write tests so that tests can be run – run tests before you change the code, run tests after.

The program usertests stress-tests the kernel: it passes invalid memory addresses to system calls, writes to a file at an invalid offset, calls fork, exit, and wait concurrently to provoke deadlocks, and so on.

The program usertests runs slower after the page size is increased because fork takes longer to copy memory. It also requires more memory now, so let's increase the machine's memory to 1024 MB and raise the constant PHYSTOP so that xv6 uses this memory.

# Makefile
QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 1024M -smp $(CPUS) -nographic

// kernel/memlayout.h
#define PHYSTOP (KERNBASE + 1024*1024*1024)

Conclusion

We've had enough practice 🙂 Xv6 became slower when the page size increased: the program usertests takes 10-15 times longer to run. The system call fork takes longer because it copies more memory.

Maybe we'll see the benefit of large pages when we write an archiver, compiler, video compressor, or other program for xv6 that works with large amounts of data in memory.
