Operating system interfaces
Xv6 Chapter 1: Operating System Interfaces
The operating system divides the computer between several programs so that the programs run simultaneously. The operating system abstracts the work with the hardware – the program does not care what type of disk the computer is working with. The operating system allows programs to communicate and work together.
The operating system defines the interface with which programs operate. A good interface is difficult to design. A primitive interface is easier to implement, but often programs want complex functions. An interface must combine primitive mechanisms to create complex functions.
This book talks about the principles of operation of operating systems using xv6 as an example. The xv6 operating system implements the basic interface that Ken Thompson and Dennis Ritchie proposed in the Unix operating system and emulates the internal workings of Unix. Combinations of the simplest Unix mechanisms give amazing freedom of action. Modern operating systems have recognized the success of Unix and implement similar interfaces – BSD, Linux, macOS, Solaris, and even Microsoft Windows. Learning xv6 will help you understand how other operating systems work.
The xv6 kernel is a special program that provides operating system services to other programs. A program, when running, is called a process. Each process owns its own memory, which stores instructions, data, and the process stack. Instructions are calculations that the program performs. Data are the variables that instructions operate on. The stack stores the procedure calls that the program executes. There are several processes running on a computer, but only one core.
A process makes a system call to use kernel services. A system call passes control to the kernel, the kernel runs and returns control – this is how the process switches between user and kernel spaces.
The kernel protects itself from user processes using CPU mechanisms. The kernel works with the hardware, but user processes do not. The user process executes the system call, the processor escalates privileges, and executes the kernel code.
A set of system calls is a kernel interface accessible to user programs. The xv6 kernel provides some of the same system calls as the Unix kernel.
System call | Description |
---|---|
int fork() | Creates a new process, returns the PID of the child process. |
int exit(int status) | Terminates the current process, passes status to the wait() call, and does not return control to the program. |
int wait(int *status) | Waits for the child process to terminate, writes the termination code to *status, returns the PID of the terminated process. |
int kill(int pid) | Terminates the process with the specified PID. Returns 0 on success and -1 on error. |
int getpid() | Returns the PID of the current process. |
int sleep(int n) | Pauses the process for n processor cycles. |
int exec(char *file, char *argv[]) | Loads the program from file and executes it with arguments argv. Returns control only in case of error. |
char *sbrk(int n) | Expands process memory by n bytes. Returns the address of the start of new memory. |
int open(char *file, int flags) | Opens a file. flags means read and write permissions. Returns a file descriptor. |
int write(int fd, char *buf, int n) | Writes n bytes from buf to file descriptor fd. Returns the number of bytes written. |
int read(int fd, char *buf, int n) | Reads n bytes from a file into buf. Returns the number of bytes read, or 0 if the end of the file is reached. |
int close(int fd) | Releases the open file descriptor fd. |
int dup(int fd) | Returns a new file descriptor that refers to the same file as fd. |
int pipe(int p[]) | Creates a pipe, places read and write file descriptors in p[0] and p[1]. |
int chdir(char *dir) | Changes the current directory. |
int mkdir(char *dir) | Creates a new directory. |
int mknod(char *file, int, int) | Creates a device file. |
int fstat(int fd, struct stat *st) | Writes file information to *st. |
int stat(char *file, struct stat *st) | Writes file information to *st. |
int link(char *file1, char *file2) | Creates a new name file2 for file file1. |
int unlink(char *file) | Deletes a file. |
The following text describes xv6 services – processes, memory, file handles, pipes and the file system. Code examples show how the program shell
uses kernel services, and how carefully system calls are designed.
Program shell
– Unix command line interface. Shell
reads and executes user commands. Shell
– a user program, not part of the kernel – this is the power of system calls. Shell
easy to replace with another program – modern Unix systems offer several such programs with different interfaces and work automation capabilities. Shell
xv6 implements the basic ideas of the Bourne shell. Program code shell
– V user/sh.c
.
Processes and memory
Each xv6 process owns memory in user space. Process memory consists of code, data, and stack. The xv6 kernel divides processor time between processes. Xv6 saves processor registers while the process is not running and restores the registers when the process starts executing. The kernel assigns a PID to each process and manages the state of each process.
A process starts a new process by calling fork
. Call fork
copies process memory – code, data and stack – then returns control. Call fork
will return to the original process the PID of the new process, and to the new process – 0
. The original process is called the parent process, and the new process is called the child process.
int pid = fork();
if (0 < pid) {
printf("parent: child=%d\n", pid);
pid = wait((int *) 0);
printf("child %d is done\n", pid);
} else if (0 == pid) {
printf("child: exiting\n");
exit(0);
} else {
printf("fork error\n");
}
System call exit
stops the current process and frees its associated resources, such as memory and open files. Argument exit
– exit code – 0
if successful and 1
in case of error.
System call wait
returns the PID of the completed child process to the current process and writes the exit code to the passed address. Call wait
authorizes the transfer 0
, if the exit code is not needed. Call wait
waits for the child process to terminate if no child process has terminated yet, and will return immediately -1
if the current process has no children.
Line order
parent: child=1234
child: exiting
depends on which process calls first printf
. After the child process terminates, wait
will return control to the parent and the code will print
parent: child 1234 is done
Although after the call fork
The memory contents of both processes are the same, one process does not affect the operation of the other. Each process operates with its own processor variables and registers. For example, variable pid
child process will remain unchanged when the parent process executes
pid = wait((int *) 0);
Call exec
will replace the memory image of the current process with the one it loads from the file. In case of success exec
does not return control to the calling program, but begins execution of the program from the file. exec
takes 2 arguments: the name of the executable file and an array of program argument strings.
An executable file is the result of compiling the source text of a program. The executable file format determines where the code, data, first instruction of the program, etc. are. Xv6 uses the format ELF
which is discussed in detail in Chapter 3. The header of the ELF file determines the entry point into the program – the address of the first instruction.
char* argv[3];
argv[0] = "echo";
argv[1] = "hello";
argv[2] = 0;
exec("/bin/echo", argv);
printf("exec error\n");
This code snippet will replace the current program with the program /bin/echo
, launched with the arguments “echo”, “hello”. The first argument is the program name.
Program shell
xv6 uses calls fork
And exec
to execute programs at the user’s request. Cycle in main
receives a string from the user by calling getcmd
then creates a copy of the process shell
challenge fork
. Parental shell
causes wait
and waits for the child to complete, and the child shell
executes the command entered by the user. For example, the user enters “echo hello” parsecmd
parses the input and runcmd
executes a command by calling exec
. So shell
will execute the code echo
instead of further code runcmd
. The echo code will then call exit
after which the parent shell
will return from wait
V main
.
// Read and run input commands.
while(getcmd(buf, sizeof(buf)) >= 0){
if(buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' '){
// Chdir must be called by the parent, not the child.
buf[strlen(buf)-1] = 0; // chop \n
if(chdir(buf+3) < 0)
fprintf(2, "cannot cd %s\n", buf+3);
continue;
}
if(fork1() == 0)
runcmd(parsecmd(buf));
wait(0);
}
The value of individual calls fork
And exec
we’ll see later when we study the code shell
which redirects I/O.
The operating system optimizes the code fork
and does not copy memory until processes write to memory. Copying memory is a waste of time when fork
should exec
which will again replace the memory.
Xv6 allocates most memory implicitly – fork
requests the necessary memory to copy the process, and exec
– to load a program from a file. The process calls sbrk
, when additional memory is required to expand the memory by n bytes. Call sbrk
returns the address of the new memory.
I/O and File Descriptors
A file handle is an integer that represents an object in the kernel that can be read and written by a process. A process obtains a file handle when it opens a file, directory, device, creates a pipe, or copies another handle. The file descriptor abstracts the work with these objects. Instead of “file descriptor” we will say “file”.
The kernel maintains a table of open files for each process, and a file descriptor is an index into that table. It was agreed that three descriptors for each process were predefined:
0
– standard input orstdin
1
– standard output orstdout
2
– error output orstderr
Program shell
follows this convention to redirect I/O and run the command pipeline. Code shell
ensures that each process opens these handles immediately upon startup and associates with a terminal.
// Ensure that three file descriptors are open.
while((fd = open("console", O_RDWR)) >= 0){
if(fd >= 3){
close(fd);
break;
}
}
Call read
reads bytes from the file associated with the handle, and the call write
– writes bytes to a file. read(fd, buf, n)
reads up n
bytes, writes to buf
and returns the number of bytes read. Each handle remembers a position in the file. Call read
reads bytes from the current position and shifts the position by the number of bytes read. Next read
will read the next bytes of the file. The call to read will return 0
when the position reaches the end of the file.
Call write(fd, buf, n)
writes n
bytes from buf
to a file and returns the number of bytes written. Refund less n
bytes indicates a write error. Call write
writes from the current position in the file and shifts the position by the number of bytes written. Next write
continues recording where the previous one left off.
This is how the program works cat
– copies bytes from standard input to standard output:
char buf[512];
int n;
for (;;) {
n = read(0, buf, sizeof buf);
if (0 == n)
break;
if (n < 0) {
fprintf(2, "read error\n");
exit(1);
}
if (write(1, buf, n) != n) {
fprintf("write error\n");
exit(1);
}
}
Program cat
does not know whether it is reading input from a terminal, a file, or a pipe. Also cat
doesn’t know where it prints the output. File Descriptor Convention 0
, 1
And 2
simplifies the program code.
Call close
releases the file descriptor. Next challenge open
, pipe
, dup
uses this handle. The kernel selects the smallest free handle among those owned by the process.
Call fork
also copies the process’s file descriptors: the child process receives the same open files owned by the parent process. Call exec
overwrites process memory, but does not touch file descriptors. Program shell
uses this to redirect command input/output: shell
causes fork
closes and reopens file handles stdin
, stdout
, stderr
child process, then calls exec
and executes the specified program. Here’s how shell
could run the command cat < input.txt
:
char *argv[2];
argv[0] = "cat";
argv[1] = 0;
if (0 == fork()) {
close(0);
open("input.txt", O_RDONLY);
exec("cat", argv);
}
Call open
uses handle 0
after close(0)
. This way the cat program will work with input from the file input.txt
. The parent process’s file descriptors will remain untouched.
Program shell
redirects command input/output the same way:
// Execute cmd. Never returns.
void runcmd(struct cmd *cmd) {
/*...*/
switch (cmd->type) {
/*...*/
case REDIR:
rcmd = (struct redircmd*)cmd;
close(rcmd->fd);
if (open(rcmd->file, rcmd->mode) < 0) {
fprintf(2, "open %s failed\n", rcmd->file);
exit(1);
}
runcmd(rcmd->cmd);
break;
The code has already called fork
and the code operates on the child process’s file descriptors.
The second argument to open, a combination of bit flags, specifies the open action. File kernel/fcntl.h
lists the available flags:
O_RDONLY
– open file read-onlyO_WRONLY
– open file for writing onlyO_RDWR
– open the file for reading and writingO_CREATE
– create the file if it does not existO_TRUNC
– trim file length to 0 bytes
Between calls fork
And exec
program shell
is capable of redirecting a command’s I/O while leaving its own I/O the same. The code will become more complicated if instead of separate fork
And exec
announce one forkexec
: The shell program will have to redirect its own handles, execute the command, and then restore the handles, or will have to teach the commands to redirect I/O themselves.
Call fork
copies file handles, but the parent and child processes share the same position within the file.
if (0 == fork) {
write(1, "hello ", 6);
exit(0);
} else {
wait(0);
write(1, "world\n", 6);
}
This code snippet will print “hello world” in stdout
. The parent process will wait for the child process to complete and continue printing. This behavior helps to get sequential output from multiple commands, for example
(echo hello; echo world) >output.txt
Call dup
copies a file descriptor – returns a new file descriptor that is associated with the same object. Both handles refer to the same position in the file. Here’s another way to write the string “hello world” to a file:
int fd = dup(1);
write(1, "hello ", 6);
write(fd, "world\n", 6);
And like this bash
on Unix combines standard output and error output:
ls existing-file non-existing-file > tmp1 2>&1
2>&1
tells the program shell
that descriptor 2
– a copy of the descriptor 1
. This will direct the error output to the same place as the standard output. Shell
xv6 does not know how to redirect error output, but now it is clear how to implement it.
Recall open
for the same file name will return a new descriptor whose position in the file is independent of other descriptors. This open
differs from dup
And fork
.
File descriptors are a useful abstraction: a process writes to standard output and does not care whether the output is directed to the terminal, to a file, or to another process’s input.
Channels
A channel is a small buffer in the kernel. A pipe provides processes with two file descriptors: one for reading and one for writing. One process writes to the channel, and the other reads.
Sample code runs the program wc
which reads from the channel:
int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;
pipe(p);
if (0 == fork()) {
close(0);
dup(p[0]);
close(p[0]);
close(p[1]);
exec("/bin/wc", argv);
} else {
close(p[0]);
write(p[1], "hello world\n");
close(p[1]);
}
The code creates a channel by calling pipe
and stores the channel’s read and write handles in array elements p[0]
And p[1]
respectively. Both parent and child processes own channel handles after calling fork
.
The child process calls close
And dup
to direct the output of a channel to the standard input handle 0
. The child process then closes unnecessary handles in p
and starts the program wc
. Now wc
reads from the channel.
The parent process closes the pipe’s unneeded read handle, writes the string “hello world\n” to the pipe, and closes the pipe’s write handle.
Call read
will cause the process to wait to write to a pipe if the pipe is empty. Call read
will return 0
only when the code has closed the last write handle to the pipe, so it is important to close the pipe write handle in the child process before reading the pipe. Program wc
will hang if the handle is not closed.
Program shell
in xv6 implements command pipelines using pipes, for example
grep fork sh.c | wc -l
The child process creates a pipe to connect the output of the left command with the input of the right one. The process then calls fork
And runcmd
for the left command, calls fork
And runcmd
for the right command and waits for both commands to complete. The right team is also a conveyor, for example
cat wordlist.txt | sort | uniq
________________ ___________
left right
The right command will call twice fork
. Thus the program shell
builds a process tree. The leaves of the tree are commands, and the nodes of the tree are processes that wait for the completion of the left and right child processes.
The pipeline also works using temporary files instead of pipes:
# pipe
echo hello world | wc
# temporary file
echo hello world >/tmp/xyz; wc </tmp/xyz
Channels are better than temporary files for three reasons:
Channels | Temporary files |
---|---|
Channels leave no trace – the kernel automatically destroys channels after closing. | The file remains on the disk. |
The amount of data transferred is unlimited | Data volume is limited by available disk space. |
Commands are executed in parallel | The first command must complete its work before the second begins. |
The second command waits for the first to complete before starting work.
File system
The xv6 file system stores files and directories. A file is an arbitrary byte array. The file name is a link to this array. The number of file names is not limited. Directories contain the names of files and subdirectories.
The xv6 file system is a directory tree. Root directory name /
. Path /a/b/c
points to a file or directory c
what lies in the directory named b
which is nested in a directory named a
which is nested in the root directory /
.
Path that starts in the root directory /
– absolute. Relative path does not start at /
. Searching for a file using a relative path starts in the current directory of the process. Calling chdir changes the current directory of the process.
chdir("/a");
chdir("b");
// open using relative path
open("c", O_RDONLY);
// open using absolute path
open("/a/b/c", O_RDONLY);
Both code snippets open the file /a/b/c
. The first fragment changes the current directory of the process twice and opens the file at a relative path. The second fragment does not change the current directory of the process and opens the file using an absolute path.
These system calls create files and directories:
mkdir
creates a new directoryopen
with a flagO_CREATE
creates a new data filemknod
creates a new device file
mkdir("/dir");
fd = open("/dir/file", O_CREATE|O_WRONLY);
close(fd);
mknod("/console", 1, 1);
The kernel identifies the device by two numbers that are sent mknod
. The process operates on the device file and the kernel routes calls read
And write
device driver.
The file name and the file itself are not the same thing. A file is a block of data that has one or more names. File name – link to the file. The file system stores a structure for each file inode
. Structure inode
stores file metadata – type, size, disk location and number of links to the file. Directory entry – file name and link to inode
.
Call fstat
fills the structure stat
information from inode
.
// kernel/stat.h
#define T_DIR 1 // Directory
#define T_FILE 2 // File
#define T_DEVICE 3 // Device
struct stat {
int dev; // File system's disk device
uint ino; // Inode number
short type; // Type of file
short nlink; // Number of links to file
uint64 size; // Size of file in bytes
};
Call link
creates a filename that references the same inode
as the specified file.
open("a", O_CREATE|O_WRONLY);
link("a", "b");
This code snippet creates one file with two names – a
And b
. Read or write to a
– same as reading or writing to b
.
The file system assigns each inode
identifier. Call fstat
will write down the identifier inode
in field ino
structures stat
– this way you can find out what the names are a
And b
point to the same file. Field nlink
stores the number of names that reference this file.
Call unlink
will remove the file name. The kernel will remove inode
file and will free up disk space only when the number of links to the file is nlink
turns out to be equal 0
and no file descriptor references the file.
Code
open("a", O_CREATE|O_WRONLY);
link("a", "b");
unlink("a");
will leave the file accessible by name b
.
Code
fd = open("/tmp/xyz", O_CREATE|O_RDWR);
unlink("/tmp/xyz");
will create a temporary file that the kernel will delete after the handle is closed fd
or process completion.
Unix implements file utilities as user programs that can be called by a program shell
– mkdir
, ln
, rm
etc. Users add new programs and expand capabilities shell
. This solution seems obvious, but other systems – contemporaries of Unix – built such commands into shell
and herself shell
– to the core.
The only exception is shell
implements the command itself cd
. Team cd
changes the current directory of the process. The user program runs in a separate process after the call fork
and is not able to change the current process directory shell
.
Real world
Standard file descriptors, pipes and syntax shell
to work with them – the main advantage of Unix – they help write common programs that are easy to combine to solve new problems. Unix ideas gave rise to the culture of tool programs and made Unix so powerful and popular. Shell
became the first scripting language. The Unix system call interface remains to this day in BSD, Linux, and macOS.
The Unix system call interface became a standard called the Portable Operating System Interface, or POSIX. The xv6 system is not POSIX compliant: xv6 implements only a subset of POSIX system calls and does not implement it as required by the standard. The authors made xv6 simple and clear, similar to Unix.
Enthusiasts have extended xv6 by implementing more system calls and a C library to run simple Unix programs, but xv6 is a long way from modern operating systems. Such systems support networking, graphical window systems, user threads, drivers for a wide variety of devices, etc. Modern systems are evolving rapidly and offer more than POSIX.
Unix unified access to files, directories, and devices using an access interface based on names and file descriptors. Plan 9 went further and applied this idea to networking and graphics resources, but many Unix adopters did not go that route.
Multics – the predecessor of Unix – worked with files in the same way as with RAM. The Unix developers took the ideas from Multics, but simplified the system.
Unix allows multiple users to work on the system simultaneously. Unix runs each process as a specific user. Xv6 runs processes on behalf of a single user.
This book shows how xv6 implements a Unix-like interface, but the ideas are not unique to Unix. An operating system runs multiple processes simultaneously on the same computer, protects processes from each other, but allows processes to communicate. Xv6 will teach you to see these ideas in complex operating systems.
Exercises
Write a program using Unix system calls that transfers a single byte back and forth between two processes over a pair of pipes. Estimate the speed of the program in the number of transfers per second.