Steal Time. What is it and how does it work?

Steal Time and Load Average are metrics that a Linux admin sees every day, but few dig under the hood to understand how they work.

Kirill Kazarin, Senior DevOps and SRE manager and speaker in our course “Linux Administration.Mega”, explains what Steal Time is: what kind of processor metric it is, how it works, and how to interpret it. Let's figure it out:

What is Steal Time

Steal time is a term used in the context of virtualization and operating systems to describe the amount of time a virtual machine (VM) waits for access to a physical processor because that processor is currently being used by another VM.

In virtualization systems, multiple virtual machines can run on a single physical server, sharing its resources – processors (CPU), memory, and input/output devices.

The hypervisor (the software that manages virtual machines) allocates resources between virtual machines by determining which VM to grant access to a physical processor and when.

Steal time occurs when a virtual machine is ready to run and requests access to the processor, but the hypervisor postpones execution because the physical processor is busy running tasks of another virtual machine. Simply put, this is the time during which the processor is “stolen” from the virtual machine by another virtual environment.

Why you should know about this

Because in 2024, most servers are virtual machines. If you use a public cloud, even more so. For example, the last time I worked with a bare-metal server was about five years ago. Since then, only virtual machines.

We understand what this time is, but how is it calculated? How can a VM know anything about what happens outside of it?

The virtual machine (VM) itself does not track the time it waits for access to the physical processor. Instead, the hypervisor (the virtualization software that runs virtual machines) monitors CPU usage and provides steal time information to the VM.

How does this work

1. The role of the hypervisor

The hypervisor manages the allocation of physical resources, such as the processor (CPU), between multiple virtual machines running on the same physical server. When a virtual machine wants to perform a task, it requests access to the CPU. The hypervisor decides when and to which virtual machine to grant this access.

  1. Waiting for access:

    If all available CPUs are currently busy executing tasks of other virtual machines, the hypervisor places the current virtual machine's request in a waiting queue. During this time, the virtual machine does not have access to a physical processor and, accordingly, cannot execute its tasks. The hypervisor “steals” time from this virtual machine, providing resources to another.

  2. Steal time tracking:

    The hypervisor keeps track of how much time each virtual machine spends waiting because the CPU is busy running tasks for other VMs. When a virtual machine gains control of the CPU, the hypervisor can tell it how much time it has spent waiting for access to the CPU. This time is called steal time.

  3. Transferring information to a virtual machine:

    The hypervisor provides information about steal time to the operating system running inside the virtual machine. This information is transmitted through a special interface – paravirtualization (paravirtualized CPU). In the virtual machine operating system, steal time can be accessed through standard monitoring tools such as top, vmstat and others, which simply interpret the data provided by the hypervisor.

    Do you feel how deep the rabbit hole goes?
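
The waiting, tracking, and reporting steps above can be sketched as a toy model. This is not how any real hypervisor schedules: it assumes a single physical CPU, a naive round-robin policy, and a hypothetical `VCpu` class invented for this illustration. It only shows where the "stolen" ticks come from.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Hypothetical virtual CPU used only for this illustration."""
    name: str
    demand: int           # ticks of work the guest still wants to run
    run_ticks: int = 0    # ticks actually executed on the physical CPU
    steal_ticks: int = 0  # ticks spent runnable but waiting for the CPU


def simulate(vcpus, total_ticks):
    """Share one physical CPU between vCPUs, one tick at a time."""
    turn = 0
    for _ in range(total_ticks):
        runnable = [v for v in vcpus if v.demand > 0]
        if not runnable:
            break
        # Naive round-robin: pick the next runnable vCPU.
        current = runnable[turn % len(runnable)]
        turn += 1
        current.demand -= 1
        current.run_ticks += 1
        # Every other runnable vCPU wanted the CPU and did not get it:
        # the "hypervisor" records that tick as steal time for it.
        for other in runnable:
            if other is not current:
                other.steal_ticks += 1
    return vcpus
```

Running `simulate([VCpu("vm-a", 5), VCpu("vm-b", 5)], 10)` lets both guests finish their work, but each of them also accumulates steal ticks for the moments when it was ready and the other one held the CPU. An idle guest (`demand == 0`) accumulates none, which matches the definition: steal time counts only while the VM actually wants to run.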

What is Paravirtualization

Paravirtualization is a special method of interaction between a virtual machine and a hypervisor that allows the transfer of performance and resource information directly to the operating system inside the virtual machine.

1. Paravirtualization (Paravirt):

Paravirtualization is a technology that allows the hypervisor to provide virtual machines with a more accurate view of the state of their virtualized environment, including metrics such as steal time. In the case of steal time, the hypervisor provides the Linux kernel inside the virtual machine with information about how much time the processor was unavailable due to the hypervisor performing tasks for other virtual machines.

2. Transferring steal time via KVM (or other hypervisors):

The most common hypervisors, such as KVM (Kernel-based Virtual Machine), VMware, Xen or Hyper-V, use special interfaces to communicate steal time information. In the case of KVM, steal time information is communicated via a virtualized processor (vCPU), which informs the Linux kernel about the time that was “stolen” from the virtual machine.

3. Integration with the Linux kernel:

There is code in the Linux kernel that integrates with the paravirtualization mechanism and processes steal time information. This code was added to the Linux kernel as part of the implementation of paravirtualized CPU support. When the Linux kernel receives steal time data, it updates the corresponding value, which can then be used by system monitoring tools.

4. User-space monitoring tools:

Steal time information is then available through standard monitoring utilities such as top, htop, vmstat, and others. These utilities read steal time from /proc/stat, a virtual file whose contents the Linux kernel generates from its current counters on each read. For example, top shows steal time as the st value in its %Cpu(s) summary line.
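
To make the last step concrete, here is a minimal sketch of what a monitoring tool does with /proc/stat. It assumes the standard column order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the function names are made up for this example and are not taken from top or vmstat.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    names = ["user", "nice", "system", "idle", "iowait",
             "irq", "softirq", "steal", "guest", "guest_nice"]
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in fields[1:]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(names) - len(values))
    return dict(zip(names, values))


def steal_percent(before, after):
    """Share of elapsed CPU time that was stolen between two samples."""
    # guest and guest_nice are already included in user and nice, so skip them.
    keys = ["user", "nice", "system", "idle", "iowait",
            "irq", "softirq", "steal"]
    total = sum(after[k] - before[k] for k in keys)
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total


def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

The counters are cumulative since boot, so a single read says little. Calling `read_cpu_sample()` twice, about a second apart, and passing both results to `steal_percent()` gives roughly the figure that top reports as st for that interval.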

How it works in the kernel

The Linux kernel code that handles steal time is located in kernel/sched/cputime.c, the file responsible for accounting processor time, including steal time.

For example:
https://github.com/torvalds/linux/blob/master/kernel/sched/cputime.c#L211
https://github.com/torvalds/linux/blob/master/kernel/sched/cputime.c#L253
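
The general idea behind that accounting can be modeled in a few lines. The sketch below is a simplified Python model, not kernel code: it assumes the hypervisor exposes a cumulative steal clock per CPU, and that the guest kernel periodically takes the difference from what it has already accounted, capped at the interval being accounted. The class and attribute names are illustrative assumptions.

```python
class StealAccounting:
    """Toy model of per-CPU steal accounting driven by a cumulative clock."""

    def __init__(self):
        self.prev_steal_time = 0  # steal already accounted, in nanoseconds
        self.cputime_steal = 0    # total that ends up in the 'steal' column

    def account(self, steal_clock_ns, max_ns):
        """Account newly stolen time, capped at the interval being accounted."""
        delta = steal_clock_ns - self.prev_steal_time
        delta = min(delta, max_ns)
        self.cputime_steal += delta
        # Advance only by what was accounted, so any excess is carried
        # over and picked up on the next call.
        self.prev_steal_time += delta
        return delta
```

The cap is the part that is easy to miss: if the steal clock jumps by more than the interval being accounted, the model charges only that interval now and leaves the remainder for the next call, so no stolen time is lost and no interval is over-charged.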


More subtleties and details on Linux knowledge and administration are in our course “Linux Administration.Mega”. You can join the current stream until September 15 inclusive. We are giving away the video course “Ansible: Infrastructure as Code” as a gift for this stream. More useful materials from Kirill personally are in his Telegram channel.
