Operating system interfaces

Xv6 Chapter 1: Operating System Interfaces

The operating system divides the computer between several programs so that the programs run simultaneously. The operating system abstracts the work with the hardware – the program does not care what type of disk the computer is working with. The operating system allows programs to communicate and work together.

The operating system defines the interface with which programs operate. A good interface is difficult to design. A primitive interface is easier to implement, but often programs want complex functions. An interface must combine primitive mechanisms to create complex functions.

This book talks about the principles of operation of operating systems using xv6 as an example. The xv6 operating system implements the basic interface that Ken Thompson and Dennis Ritchie proposed in the Unix operating system and emulates the internal workings of Unix. Combinations of the simplest Unix mechanisms give amazing freedom of action. Modern operating systems have recognized the success of Unix and implement similar interfaces – BSD, Linux, macOS, Solaris, and even Microsoft Windows. Learning xv6 will help you understand how other operating systems work.

The xv6 kernel is a special program that provides operating system services to other programs. A program, when running, is called a process. Each process owns its own memory, which stores instructions, data, and the process stack. Instructions are calculations that the program performs. Data are the variables that instructions operate on. The stack stores the procedure calls that the program executes. There are several processes running on a computer, but only one core.

A process makes a system call to use kernel services. A system call passes control to the kernel, the kernel runs and returns control – this is how the process switches between user and kernel spaces.

The kernel protects itself from user processes using CPU mechanisms. The kernel works with the hardware, but user processes do not. The user process executes the system call, the processor escalates privileges, and executes the kernel code.

A set of system calls is a kernel interface accessible to user programs. The xv6 kernel provides some of the same system calls as the Unix kernel.

System call

Description

int fork()

Creates a new process, returns the PID of the child process.

int exit(int status)

Terminates the current process, passes status to the wait() call, and does not return control to the program.

int wait(int *status)

Waits for the child process to terminate, writes the termination code to *status, returns the PID of the terminated process.

int kill(int pid)

Terminates the process with the specified PID. Returns 0 on success and -1 on error.

int getpid()

Returns the PID of the current process.

int sleep(int n)

Pauses the process for n processor cycles.

int exec(char *file, char *argv[])

Loads the program from file and executes it with arguments argv. Returns control only in case of error.

char *sbrk(int n)

Expands process memory by n bytes. Returns the address of the start of new memory.

int open(char *file, int flags)

Opens a file. flags means read and write permissions. Returns a file descriptor.

int write(int fd, char *buf, int n)

Writes n bytes from buf to file descriptor fd. Returns the number of bytes written.

int read(int fd, char *buf, int n)

Reads n bytes from a file into buf. Returns the number of bytes read, or 0 if the end of the file is reached.

int close(int fd)

Releases the open file descriptor fd.

int dup(int fd)

Returns a new file descriptor that refers to the same file as fd.

int pipe(int p[])

Creates a pipe, places read and write file descriptors in p[0] and p[1].

int chdir(char *dir)

Changes the current directory.

int mkdir(char *dir)

Creates a new directory.

int mknod(char *file, int, int)

Creates a device file.

int fstat(int fd, struct stat *st)

Writes file information to *st.

int stat(char *file, struct stat *st)

Writes file information to *st.

int link(char *file1, char *file2)

Creates a new name file2 for file file1.

int unlink(char *file)

Deletes a file.

The following text describes xv6 services – processes, memory, file handles, pipes and the file system. Code examples show how the program shell uses kernel services, and how carefully system calls are designed.

Program shell – Unix command line interface. Shell reads and executes user commands. Shell – a user program, not part of the kernel – this is the power of system calls. Shell easy to replace with another program – modern Unix systems offer several such programs with different interfaces and work automation capabilities. Shell xv6 implements the basic ideas of the Bourne shell. Program code shell – V user/sh.c .

Processes and memory

Each xv6 process owns memory in user space. Process memory consists of code, data, and stack. The xv6 kernel divides processor time between processes. Xv6 saves processor registers while the process is not running and restores the registers when the process starts executing. The kernel assigns a PID to each process and manages the state of each process.

A process starts a new process by calling fork. Call fork copies process memory – code, data and stack – then returns control. Call fork will return to the original process the PID of the new process, and to the new process – 0. The original process is called the parent process, and the new process is called the child process.

int pid = fork();
if (0 < pid) {
  printf("parent: child=%d\n", pid);
  pid = wait((int *) 0);
  printf("child %d is done\n", pid);
} else if (0 == pid) {
  printf("child: exiting\n");
  exit(0);
} else {
  printf("fork error\n");
}

System call exit stops the current process and frees its associated resources, such as memory and open files. Argument exit – exit code – 0 if successful and 1 in case of error.

System call wait returns the PID of the completed child process to the current process and writes the exit code to the passed address. Call wait authorizes the transfer 0, if the exit code is not needed. Call wait waits for the child process to terminate if no child process has terminated yet, and will return immediately -1if the current process has no children.

Line order

parent: child=1234
child: exiting

depends on which process calls first printf. After the child process terminates, wait will return control to the parent and the code will print

parent: child 1234 is done

Although after the call fork The memory contents of both processes are the same, one process does not affect the operation of the other. Each process operates with its own processor variables and registers. For example, variable pid child process will remain unchanged when the parent process executes

pid = wait((int *) 0);

Call exec will replace the memory image of the current process with the one it loads from the file. In case of success exec does not return control to the calling program, but begins execution of the program from the file. exec takes 2 arguments: the name of the executable file and an array of program argument strings.

An executable file is the result of compiling the source text of a program. The executable file format determines where the code, data, first instruction of the program, etc. are. Xv6 uses the format ELFwhich is discussed in detail in Chapter 3. The header of the ELF file determines the entry point into the program – the address of the first instruction.

char* argv[3];
argv[0] = "echo";
argv[1] = "hello";
argv[2] = 0;
exec("/bin/echo", argv);
printf("exec error\n");

This code snippet will replace the current program with the program /bin/echo , launched with the arguments “echo”, “hello”. The first argument is the program name.

Program shell xv6 uses calls fork And execto execute programs at the user’s request. Cycle in main receives a string from the user by calling getcmdthen creates a copy of the process shell challenge fork. Parental shell causes wait and waits for the child to complete, and the child shell executes the command entered by the user. For example, the user enters “echo hello” parsecmd parses the input and runcmd executes a command by calling exec. So shell will execute the code echo instead of further code runcmd. The echo code will then call exitafter which the parent shell will return from wait V main.

  // Read and run input commands.
  while(getcmd(buf, sizeof(buf)) >= 0){
    if(buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' '){
      // Chdir must be called by the parent, not the child.
      buf[strlen(buf)-1] = 0;  // chop \n
      if(chdir(buf+3) < 0)
        fprintf(2, "cannot cd %s\n", buf+3);
      continue;
    }
    if(fork1() == 0)
      runcmd(parsecmd(buf));
    wait(0);
  }

The value of individual calls fork And exec we’ll see later when we study the code shellwhich redirects I/O.

The operating system optimizes the code fork and does not copy memory until processes write to memory. Copying memory is a waste of time when fork should execwhich will again replace the memory.

Xv6 allocates most memory implicitly – fork requests the necessary memory to copy the process, and exec – to load a program from a file. The process calls sbrk, when additional memory is required to expand the memory by n bytes. Call sbrk returns the address of the new memory.

I/O and File Descriptors

A file handle is an integer that represents an object in the kernel that can be read and written by a process. A process obtains a file handle when it opens a file, directory, device, creates a pipe, or copies another handle. The file descriptor abstracts the work with these objects. Instead of “file descriptor” we will say “file”.

The kernel maintains a table of open files for each process, and a file descriptor is an index into that table. It was agreed that three descriptors for each process were predefined:

  • 0 – standard input or stdin

  • 1 – standard output or stdout

  • 2 – error output or stderr

Program shell follows this convention to redirect I/O and run the command pipeline. Code shell ensures that each process opens these handles immediately upon startup and associates with a terminal.

  // Ensure that three file descriptors are open.
  while((fd = open("console", O_RDWR)) >= 0){
    if(fd >= 3){
      close(fd);
      break;
    }
  }

Call read reads bytes from the file associated with the handle, and the call write – writes bytes to a file. read(fd, buf, n) reads up n bytes, writes to buf and returns the number of bytes read. Each handle remembers a position in the file. Call read reads bytes from the current position and shifts the position by the number of bytes read. Next read will read the next bytes of the file. The call to read will return 0when the position reaches the end of the file.

Call write(fd, buf, n) writes n bytes from buf to a file and returns the number of bytes written. Refund less n bytes indicates a write error. Call write writes from the current position in the file and shifts the position by the number of bytes written. Next write continues recording where the previous one left off.

This is how the program works cat – copies bytes from standard input to standard output:

char buf[512];
int n;

for (;;) {
  n = read(0, buf, sizeof buf);
  if (0 == n)
    break;
  if (n < 0) {
    fprintf(2, "read error\n");
    exit(1);
  }
  if (write(1, buf, n) != n) {
    fprintf("write error\n");
    exit(1);
  }
}

Program cat does not know whether it is reading input from a terminal, a file, or a pipe. Also cat doesn’t know where it prints the output. File Descriptor Convention 0, 1 And 2 simplifies the program code.

Call close releases the file descriptor. Next challenge open, pipe, dup uses this handle. The kernel selects the smallest free handle among those owned by the process.

Call fork also copies the process’s file descriptors: the child process receives the same open files owned by the parent process. Call exec overwrites process memory, but does not touch file descriptors. Program shell uses this to redirect command input/output: shell causes forkcloses and reopens file handles stdin, stdout, stderr child process, then calls exec and executes the specified program. Here’s how shell could run the command cat < input.txt:

char *argv[2];
argv[0] = "cat";
argv[1] = 0;
if (0 == fork()) {
  close(0);
  open("input.txt", O_RDONLY);
  exec("cat", argv);
}

Call open uses handle 0 after close(0). This way the cat program will work with input from the file input.txt. The parent process’s file descriptors will remain untouched.

Program shell redirects command input/output the same way:

// Execute cmd.  Never returns.
void runcmd(struct cmd *cmd) {
  /*...*/
  switch (cmd->type) {
    /*...*/
    case REDIR:
      rcmd = (struct redircmd*)cmd;
      close(rcmd->fd);
      if (open(rcmd->file, rcmd->mode) < 0) {
        fprintf(2, "open %s failed\n", rcmd->file);
        exit(1);
      }
      runcmd(rcmd->cmd);
      break;

The code has already called fork and the code operates on the child process’s file descriptors.

The second argument to open, a combination of bit flags, specifies the open action. File kernel/fcntl.h lists the available flags:

  • O_RDONLY – open file read-only

  • O_WRONLY – open file for writing only

  • O_RDWR – open the file for reading and writing

  • O_CREATE – create the file if it does not exist

  • O_TRUNC – trim file length to 0 bytes

Between calls fork And exec program shell is capable of redirecting a command’s I/O while leaving its own I/O the same. The code will become more complicated if instead of separate fork And exec announce one forkexec: The shell program will have to redirect its own handles, execute the command, and then restore the handles, or will have to teach the commands to redirect I/O themselves.

Call fork copies file handles, but the parent and child processes share the same position within the file.

if (0 == fork) {
  write(1, "hello ", 6);
  exit(0);
} else {
  wait(0);
  write(1, "world\n", 6);
}

This code snippet will print “hello world” in stdout. The parent process will wait for the child process to complete and continue printing. This behavior helps to get sequential output from multiple commands, for example

(echo hello; echo world) >output.txt

Call dup copies a file descriptor – returns a new file descriptor that is associated with the same object. Both handles refer to the same position in the file. Here’s another way to write the string “hello world” to a file:

int fd = dup(1);
write(1, "hello ", 6);
write(fd, "world\n", 6);

And like this bash on Unix combines standard output and error output:

ls existing-file non-existing-file > tmp1 2>&1

2>&1 tells the program shellthat descriptor 2 – a copy of the descriptor 1. This will direct the error output to the same place as the standard output. Shell xv6 does not know how to redirect error output, but now it is clear how to implement it.

Recall open for the same file name will return a new descriptor whose position in the file is independent of other descriptors. This open differs from dup And fork.

File descriptors are a useful abstraction: a process writes to standard output and does not care whether the output is directed to the terminal, to a file, or to another process’s input.

Channels

A channel is a small buffer in the kernel. A pipe provides processes with two file descriptors: one for reading and one for writing. One process writes to the channel, and the other reads.

Sample code runs the program wcwhich reads from the channel:

int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;

pipe(p);
if (0 == fork()) {
  close(0);
  dup(p[0]);
  close(p[0]);
  close(p[1]);
  exec("/bin/wc", argv);
} else {
  close(p[0]);
  write(p[1], "hello world\n");
  close(p[1]);
}

The code creates a channel by calling pipe and stores the channel’s read and write handles in array elements p[0] And p[1] respectively. Both parent and child processes own channel handles after calling fork.

The child process calls close And dupto direct the output of a channel to the standard input handle 0. The child process then closes unnecessary handles in p and starts the program wc. Now wc reads from the channel.

The parent process closes the pipe’s unneeded read handle, writes the string “hello world\n” to the pipe, and closes the pipe’s write handle.

Call read will cause the process to wait to write to a pipe if the pipe is empty. Call read will return 0 only when the code has closed the last write handle to the pipe, so it is important to close the pipe write handle in the child process before reading the pipe. Program wc will hang if the handle is not closed.

Program shell in xv6 implements command pipelines using pipes, for example

grep fork sh.c | wc -l

The child process creates a pipe to connect the output of the left command with the input of the right one. The process then calls fork And runcmd for the left command, calls fork And runcmd for the right command and waits for both commands to complete. The right team is also a conveyor, for example

cat wordlist.txt | sort | uniq
________________   ___________
      left            right

The right command will call twice fork. Thus the program shell builds a process tree. The leaves of the tree are commands, and the nodes of the tree are processes that wait for the completion of the left and right child processes.

The pipeline also works using temporary files instead of pipes:

# pipe
echo hello world | wc

# temporary file
echo hello world >/tmp/xyz; wc </tmp/xyz

Channels are better than temporary files for three reasons:

Channels

Temporary files

Channels leave no trace – the kernel automatically destroys channels after closing.

The file remains on the disk.

The amount of data transferred is unlimited

Data volume is limited by available disk space.

Commands are executed in parallel

The first command must complete its work before the second begins.

The second command waits for the first to complete before starting work.

File system

The xv6 file system stores files and directories. A file is an arbitrary byte array. The file name is a link to this array. The number of file names is not limited. Directories contain the names of files and subdirectories.

The xv6 file system is a directory tree. Root directory name /. Path /a/b/c points to a file or directory cwhat lies in the directory named bwhich is nested in a directory named awhich is nested in the root directory /.

Path that starts in the root directory / – absolute. Relative path does not start at /. Searching for a file using a relative path starts in the current directory of the process. Calling chdir changes the current directory of the process.

chdir("/a");
chdir("b");
// open using relative path
open("c", O_RDONLY);

// open using absolute path
open("/a/b/c", O_RDONLY);

Both code snippets open the file /a/b/c. The first fragment changes the current directory of the process twice and opens the file at a relative path. The second fragment does not change the current directory of the process and opens the file using an absolute path.

These system calls create files and directories:

  • mkdir creates a new directory

  • open with a flag O_CREATE creates a new data file

  • mknod creates a new device file

mkdir("/dir");
fd = open("/dir/file", O_CREATE|O_WRONLY);
close(fd);
mknod("/console", 1, 1);

The kernel identifies the device by two numbers that are sent mknod. The process operates on the device file and the kernel routes calls read And write device driver.

The file name and the file itself are not the same thing. A file is a block of data that has one or more names. File name – link to the file. The file system stores a structure for each file inode. Structure inode stores file metadata – type, size, disk location and number of links to the file. Directory entry – file name and link to inode.

Call fstat fills the structure stat information from inode.

// kernel/stat.h
#define T_DIR     1   // Directory
#define T_FILE    2   // File
#define T_DEVICE  3   // Device

struct stat {
  int dev;     // File system's disk device
  uint ino;    // Inode number
  short type;  // Type of file
  short nlink; // Number of links to file
  uint64 size; // Size of file in bytes
};

Call link creates a filename that references the same inodeas the specified file.

open("a", O_CREATE|O_WRONLY);
link("a", "b");

This code snippet creates one file with two names – a And b. Read or write to a – same as reading or writing to b.

The file system assigns each inode identifier. Call fstat will write down the identifier inode in field ino structures stat – this way you can find out what the names are a And b point to the same file. Field nlink stores the number of names that reference this file.

Call unlink will remove the file name. The kernel will remove inode file and will free up disk space only when the number of links to the file is nlink turns out to be equal 0 and no file descriptor references the file.

Code

open("a", O_CREATE|O_WRONLY);
link("a", "b");
unlink("a");

will leave the file accessible by name b.

Code

fd = open("/tmp/xyz", O_CREATE|O_RDWR);
unlink("/tmp/xyz");

will create a temporary file that the kernel will delete after the handle is closed fd or process completion.

Unix implements file utilities as user programs that can be called by a program shellmkdir, ln, rm etc. Users add new programs and expand capabilities shell. This solution seems obvious, but other systems – contemporaries of Unix – built such commands into shelland herself shell – to the core.

The only exception is shell implements the command itself cd. Team cd changes the current directory of the process. The user program runs in a separate process after the call fork and is not able to change the current process directory shell.

Real world

Standard file descriptors, pipes and syntax shell to work with them – the main advantage of Unix – they help write common programs that are easy to combine to solve new problems. Unix ideas gave rise to the culture of tool programs and made Unix so powerful and popular. Shell became the first scripting language. The Unix system call interface remains to this day in BSD, Linux, and macOS.

The Unix system call interface became a standard called the Portable Operating System Interface, or POSIX. The xv6 system is not POSIX compliant: xv6 implements only a subset of POSIX system calls and does not implement it as required by the standard. The authors made xv6 simple and clear, similar to Unix.

Enthusiasts have extended xv6 by implementing more system calls and a C library to run simple Unix programs, but xv6 is a long way from modern operating systems. Such systems support networking, graphical window systems, user threads, drivers for a wide variety of devices, etc. Modern systems are evolving rapidly and offer more than POSIX.

Unix unified access to files, directories, and devices using an access interface based on names and file descriptors. Plan 9 went further and applied this idea to networking and graphics resources, but many Unix adopters did not go that route.

Multics – the predecessor of Unix – worked with files in the same way as with RAM. The Unix developers took the ideas from Multics, but simplified the system.

Unix allows multiple users to work on the system simultaneously. Unix runs each process as a specific user. Xv6 runs processes on behalf of a single user.

This book shows how xv6 implements a Unix-like interface, but the ideas are not unique to Unix. An operating system runs multiple processes simultaneously on the same computer, protects processes from each other, but allows processes to communicate. Xv6 will teach you to see these ideas in complex operating systems.

Exercises

  • Write a program using Unix system calls that transfers a single byte back and forth between two processes over a pair of pipes. Estimate the speed of the program in the number of transfers per second.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *