Extreme Raspberry Pi Boot Speed Optimization

Some time ago I created the SolarCamPi project: an autonomous, solar-powered camera with Wi-Fi.

In this project, a Raspberry Pi Zero 2 W boots into Linux, takes a photo, connects to Wi-Fi, and then powers off (to save power). The cycle is repeated every few minutes to continually send up-to-date images to the cloud.

Every second the Pi Zero operates consumes valuable electrical energy, a resource that is always in short supply for solar-powered devices (at least in the winter in Western Europe). The user software (connecting to the server, uploading images, etc.) has already been optimized to the maximum. The electronics have also been specially designed to consume minimal power in sleep mode.

There are two possible ways to further reduce overall energy consumption:

  1. Reduce current consumption.

  2. Reduce runtime (the time the Pi is switched on).

However, in some situations a balance between the two has to be found. For example, disabling CPU boost lowers the current draw but may increase the runtime, which can offset the savings.

Preparing equipment

Short cycle times for making and verifying changes are critical when optimizing the boot process of an embedded system. Swapping SD cards, fiddling with card readers, and unplugging power supplies for every iteration is distracting and annoying.

There are several useful tools to make this process easier:

Power Profiler Kit

The Power Profiler Kit II (PPK) can power a device under test (DUT) and accurately measure its power consumption. You can turn the DUT on and off, monitor power consumption at any time, and see the status of 8 digital inputs. We will connect one of the digital inputs to a GPIO pin on the Raspberry Pi.

So the first action of our application (i.e. the finish line) will be toggling the GPIO pin. All we have to do is measure the time between power on and GPIO toggling.
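If you prefer to evaluate a recording offline instead of reading values off the Power Profiler GUI, the charge up to the finish line can be computed by integrating the current samples until the digital input changes. The sketch below assumes a hypothetical CSV export with one header line and the columns `time_ms,current_uA,d0`; the real export format of your Power Profiler version may differ, so adapt the column numbers. The function name `charge_until_edge` is illustrative.

```bash
# charge_until_edge FILE: time and charge from the first sample to the first change of d0.
# Assumed (hypothetical) CSV layout with one header line: time_ms,current_uA,d0
charge_until_edge() {
    awk -F, '
        NR == 1 { next }                                  # skip the header line
        NR == 2 { t0 = $1; d0 = $3; prev_t = $1; prev_i = $2; next }
        {
            q += prev_i * 1e-6 * ($1 - prev_t) * 1e-3     # uA -> A, ms -> s (left rectangle rule)
            prev_t = $1; prev_i = $2
            if ($3 != d0) { printf "%.3f s, %.4f A s\n", ($1 - t0) / 1000, q; found = 1; exit }
        }
        END { if (!found) exit 1 }                        # no edge found in the trace
    ' "$1"
}
```

`charge_until_edge trace.csv` prints the time to the first edge on `d0` and the charge consumed up to that point. The left rectangle rule is a simple approximation that works well for densely sampled traces.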

USB-SD-Mux

USB-SD-Mux is a very useful tool for hardware enthusiasts. It is an adapter with a USB-C interface that sits between a microSD card and the device under test (DUT). The computer can “take” the microSD card from the DUT, rewrite its contents, and then hand the card back to the DUT without anyone touching the device.

This greatly simplifies and speeds up the process of testing changes, eliminating the need to remove the card, insert it into a card reader, flash it, and then reconnect the card to the DUT, etc. In addition, it can be used to automate resetting or powering on the DUT using the onboard GPIOs.
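A reflash cycle can then be scripted. This sketch assumes the `usbsdmux` command-line tool from the USB-SD-Mux project (invoked as `usbsdmux <device> host` or `usbsdmux <device> dut`); the control device (e.g. `/dev/sg0`), the card's block device on the host, and the image name are placeholders for your own setup, and `reflash_dut` is a hypothetical helper name. Double-check the block device, since `dd` overwrites it without asking.

```bash
# reflash_dut MUX_DEV CARD_DEV IMAGE: give the card to the host, write IMAGE, hand it back.
# MUX_DEV (e.g. /dev/sg0) and CARD_DEV (e.g. /dev/sdb) are setup-specific assumptions.
reflash_dut() {
    local mux=$1 card=$2 image=$3

    usbsdmux "$mux" host                # switch the card over to the host PC
    for _ in $(seq 1 20); do            # wait up to ~10 s for the card's device node
        [ -e "$card" ] && break
        sleep 0.5
    done
    [ -e "$card" ] || { echo "card not found: $card" >&2; return 1; }

    dd if="$image" of="$card" bs=4M conv=fsync status=none   # write the image
    usbsdmux "$mux" dut                 # hand the card back to the Pi
}
```

For example, `reflash_dut /dev/sg0 /dev/sdb sdcard.img`, followed by power-cycling the DUT (for instance from the Power Profiler).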

USB-UART adapter

We'll also need some form of UART interface. At some point our changes may break the boot, the Wi-Fi connection, etc., and without a UART console we'd be flying blind. Standard adapters based on the CP2102, FTDI chips, etc. are great for this task.

Preparing tests

We start with a clean Debian 12 Lite image for arm64. The only change is in /boot/firmware/cmdline.txt: the parameter init=/init.sh is added so that the kernel starts the script /init.sh first (before systemd or anything else).
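The edit can be done with a small helper. This is a sketch assuming the Debian 12 path `/boot/firmware/cmdline.txt`; `add_init_param` is a hypothetical name. Since the kernel command line must stay on a single line, the parameter is appended to line 1 rather than added as a new line.

```bash
# add_init_param [FILE]: append init=/init.sh to the single-line kernel command line (once)
add_init_param() {
    local cmdline=${1:-/boot/firmware/cmdline.txt}
    grep -q 'init=/init.sh' "$cmdline" && return 0    # already present, nothing to do
    sed -i '1 s|$| init=/init.sh|' "$cmdline"          # append to line 1, never add a new line
}
```

Remember to make the script executable (`chmod +x /init.sh`), otherwise the kernel cannot start it.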

The script init.sh may look something like this:

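#!/bin/bash
# Started by the kernel as PID 1 (init=/init.sh), before systemd.

# Toggle GPIO4 on gpiochip0 (libgpiod v1 syntax); the PPK's digital input sees the edges
gpioset 0 4=0
sleep 1
gpioset 0 4=1
sleep 1
gpioset 0 4=0

# Replace this shell with the regular init (systemd), keeping PID 1
exec /sbin/init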

The script toggles GPIO4 and then continues the normal boot by starting /sbin/init (that is, systemd).

This screenshot from Nordic's Power Profiler shows the current draw of the Raspberry Pi (at 5 V) during boot. After about 12 seconds, digital input 0 goes low, indicating that init.sh has run.

A total of 1.90 coulombs was consumed (one coulomb equals one ampere-second). Multiplying 1.90 A s by 5.0 V gives 9.5 W s of energy consumed during boot.
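Because the PPK supplies a constant voltage, the energy is simply the measured charge times the supply voltage, E = Q · V (1 A s · 1 V = 1 W s = 1 J). A small helper for this calculation (the name `energy_ws` is illustrative):

```bash
# energy_ws CHARGE_AS VOLTAGE_V: energy in watt-seconds (joules), assuming a constant supply voltage
energy_ws() {
    awk -v q="$1" -v v="$2" 'BEGIN { printf "%.2f\n", q * v }'
}

energy_ws 1.90 5.0    # prints 9.50
```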

For reference, one AA alkaline battery can provide about 13,500 W s (roughly 3.75 Wh) of energy.
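Put differently, here is an idealized estimate of how many stock-Debian boots a single AA cell could power. It ignores converter losses, sleep current, and everything the Pi does after booting, and the helper name `boots_per_cell` is hypothetical.

```bash
# boots_per_cell ENERGY_PER_BOOT_WS [CELL_WS]: whole boots one cell could power (idealized)
boots_per_cell() {
    awk -v e="$1" -v cell="${2:-13500}" 'BEGIN { printf "%d\n", int(cell / e) }'
}

boots_per_cell 9.5    # prints 1421
```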

Reducing current consumption

Let's start with the easy part and reduce the current consumption as much as possible.

Disabling HDMI

We can completely disable the HDMI encoder. The GPU itself cannot be disabled, since it is used to process data from the camera (if your software does not need the GPU, you can try disabling it as well). Disabling HDMI reduces the current consumption from 136.7 mA to 122.6 mA (more than 10%).

The corresponding parameters are in config.txt:

# disable HDMI (saves power)
dtoverlay=vc4-kms-v3d,nohdmi
max_framebuffers=1
disable_fw_kms_setup=1
disable_overscan=1

# disable composite video output
enable_tvout=0
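The relative savings quoted in this section follow from (old - new) / old. A one-line helper to check them (the name `pct_saving` is illustrative):

```bash
# pct_saving OLD NEW: relative reduction from OLD to NEW, in percent
pct_saving() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f%%\n", (before - after) / before * 100 }'
}

pct_saving 136.7 122.6    # HDMI disabled: prints 10.3%
```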

Disabling the activity indicator

Simply turning off the activity indicator saves 2 mA (from 122.6 mA to 120.6 mA).

dtparam=act_led_trigger=none
dtparam=act_led_activelow=on

Disabling the camera indicator

Repeat the same for the camera indicator (if present). This also reduces the chance of the indicator's light being reflected in the image.

disable_camera_led=1

Setting up CPU boost

As mentioned earlier, the energy saved by disabling CPU boost may be offset by a longer boot time.

With the changes above and CPU boost enabled, the Pi consumes 1.62 A s while booting:

force_turbo=0
initial_turbo=10
arm_boost=0

With CPU boost disabled, the consumption drops to 1.58 A s.

For some unknown reason disabling CPU boost also changes the initial state of GPIO4 (that's why I changed the polarity in init.sh).

Reducing boot time

Reducing current consumption by ~13% is, of course, good, but still far from ideal.

The Pi takes 8 seconds (at about 1A) before the first line of Linux output appears on the console. Fortunately, there are a few ways to get more information about those 8 seconds.

Debugging the boot process

At power-on, the Raspberry Pi first starts the GPU, which reads the SD card and looks for the file bootcode.bin (Pi 4 and newer load the bootloader from EEPROM instead).

We can modify bootcode.bin to enable detailed UART logging:

sed -i -e "s/BOOT_UART=0/BOOT_UART=1/" /boot/firmware/bootcode.bin

Make a backup copy of the original bootcode.bin first, since a corrupted file can break the bootloader.
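The backup and the patch can be combined into one step. This sketch keeps the pristine file as `bootcode.bin.orig` (a name chosen here for illustration; `enable_boot_uart` is likewise hypothetical) and relies on `BOOT_UART=0` and `BOOT_UART=1` having the same length, so the layout of the binary does not change. It also relies on GNU sed and grep, which handle the NUL bytes in a binary file.

```bash
# enable_boot_uart [FILE]: back up bootcode.bin once, then switch BOOT_UART=0 to BOOT_UART=1.
# Both strings have the same length, so the binary's layout is unchanged.
enable_boot_uart() {
    local bin=${1:-/boot/firmware/bootcode.bin}
    [ -f "$bin.orig" ] || cp -p "$bin" "$bin.orig"       # keep only the first, pristine copy
    LC_ALL=C sed -i -e 's/BOOT_UART=0/BOOT_UART=1/' "$bin" # byte-wise, locale-independent
    LC_ALL=C grep -aq 'BOOT_UART=1' "$bin"               # status 0 only if the flag is now set
}
```

To undo the change, copy `bootcode.bin.orig` back over `bootcode.bin`.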

Rebooting with BOOT_UART enabled gives us a lot of useful information:

Raspberry Pi Bootcode

Found SD card, config.txt = 1, start.elf = 1, recovery.elf = 0, timeout = 0
Read File: config.txt, 1322 (bytes)

Raspberry Pi Bootcode
Read File: config.txt, 1322
Read File: start.elf, 2981376 (bytes)
Read File: fixup.dat, 7303 (bytes)
MESS:00:00:01.295242:0: brfs: File read: /mfs/sd/config.txt
MESS:00:00:01.300131:0: brfs: File read: 1322 bytes
MESS:00:00:01.335680:0: HDMI0:EDID error reading EDID block 0 attempt 0
[..]
MESS:00:00:01.392537:0: HDMI0:EDID error reading EDID block 0 attempt 9
MESS:00:00:01.398632:0: HDMI0:EDID giving up on reading EDID block 0
MESS:00:00:01.406335:0: brfs: File read: /mfs/sd/config.txt
MESS:00:00:01.411272:0: gpioman: gpioman_get_pin_num: pin LEDS_PWR_OK not defined
MESS:00:00:01.918176:0: gpioman: gpioman_get_pin_num: pin LEDS_PWR_OK not defined
MESS:00:00:01.923999:0: *** Restart logging
MESS:00:00:01.927872:0: brfs: File read: 1322 bytes
MESS:00:00:01.933328:0: hdmi: HDMI0:EDID error reading EDID block 0 attempt 0
[..]
MESS:00:00:01.995436:0: hdmi: HDMI0:EDID error reading EDID block 0 attempt 9
MESS:00:00:02.002052:0: hdmi: HDMI0:EDID giving up on reading EDID block 0
MESS:00:00:02.007955:0: hdmi: HDMI0:EDID error reading EDID block 0 attempt 0
[..]
MESS:00:00:02.070610:0: hdmi: HDMI0:EDID error reading EDID block 0 attempt 9
MESS:00:00:02.077225:0: hdmi: HDMI0:EDID giving up on reading EDID block 0
MESS:00:00:02.082840:0: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
MESS:00:00:02.091586:0: HDMI0: hdmi_pixel_encoding: 162000000
MESS:00:00:02.799203:0: brfs: File read: /mfs/sd/initramfs8
MESS:00:00:02.803082:0: Loaded 'initramfs8' to 0x0 size 0xb0898e
MESS:00:00:02.821799:0: initramfs loaded to 0x1b4e7000 (size 0xb0898e)
MESS:00:00:02.836318:0: dtb_file 'bcm2710-rpi-zero-2-w.dtb'
MESS:00:00:02.840194:0: brfs: File read: 11569550 bytes
MESS:00:00:02.849171:0: brfs: File read: /mfs/sd/bcm2710-rpi-zero-2-w.dtb
MESS:00:00:02.854262:0: Loaded 'bcm2710-rpi-zero-2-w.dtb' to 0x100 size 0x8258
MESS:00:00:02.876038:0: brfs: File read: 33368 bytes
MESS:00:00:02.892755:0: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
MESS:00:00:02.927145:0: brfs: File read: 5255 bytes
MESS:00:00:02.933541:0: brfs: File read: /mfs/sd/config.txt
MESS:00:00:02.937568:0: dtparam: audio=on
MESS:00:00:02.948005:0: brfs: File read: 1322 bytes
MESS:00:00:02.971952:0: brfs: File read: /mfs/sd/overlays/vc4-kms-v3d.dtbo
MESS:00:00:03.023016:0: Loaded overlay 'vc4-kms-v3d'
MESS:00:00:03.026278:0: dtparam: nohdmi=true
MESS:00:00:03.031105:0: dtparam: act_led_trigger=none
MESS:00:00:03.048180:0: dtparam: act_led_activelow=on
MESS:00:00:03.149316:0: brfs: File read: 2760 bytes
MESS:00:00:03.154502:0: brfs: File read: /mfs/sd/cmdline.txt
MESS:00:00:03.158504:0: Read command line from file 'cmdline.txt':
MESS:00:00:03.164369:0: 'console=serial0,115200 console=tty1 root=PARTUUID=26bbce6b-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=DE init=/init.sh'
MESS:00:00:03.195926:0: gpioman: gpioman_get_pin_num: pin EMMC_ENABLE not defined
MESS:00:00:03.269361:0: brfs: File read: 146 bytes
MESS:00:00:03.812401:0: brfs: File read: /mfs/sd/kernel8.img
MESS:00:00:03.816343:0: Loaded 'kernel8.img' to 0x200000 size 0x8d8bd7
MESS:00:00:05.364579:0: Device tree loaded to 0x1b4de900 (size 0x8605)
MESS:00:00:05.370571:0: uart: Set PL011 baud rate to 103448.300000 Hz
MESS:00:00:05.377080:0: uart: Baud rate change done...
MESS:00:00:05.380495:0: uart: Baud rate[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
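Logs like this are easier to read when you look at the gaps between consecutive timestamps. The following sketch assumes the `MESS:HH:MM:SS.ffffff:0:` prefix shown above and prints the largest gaps, each with the message logged just before it (`slowest_steps` is a hypothetical name). In the log above, the roughly 1.5 s after `Loaded 'kernel8.img'` is the largest gap.

```bash
# slowest_steps LOGFILE [N]: the N largest gaps between consecutive MESS: lines,
# each followed by the message that was logged before the gap
slowest_steps() {
    awk -F: '/^MESS:/ {
        t = $2 * 3600 + $3 * 60 + $4               # HH:MM:SS.ffffff -> seconds
        msg = substr($0, index($0, ": ") + 2)      # text after the "MESS:...:0: " prefix
        if (seen) printf "%.3f s after: %s\n", t - prev_t, prev_msg
        prev_t = t; prev_msg = msg; seen = 1
    }' "$1" | LC_ALL=C sort -rn | head -n "${2:-5}"
}
```

Lines without the `MESS:` prefix (such as the `[..]` elisions or the `Read File:` lines) are skipped.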

Disable HDMI Detection

The bootloader spends a lot of time trying to auto-detect video settings for a possibly connected HDMI monitor. We don't have HDMI (it's already disabled, remember?), so there's no point in waiting for an I2C response with EDID information (resolution, refresh rate, etc.).

By telling the firmware to ignore the EDID (and CEC), we can skip display detection:

# don't try to read HDMI eeprom
hdmi_blanking=2
hdmi_ignore_edid=0xa5000080
hdmi_ignore_cec_init=1
hdmi_ignore_cec=1

Disable HAT, PoE and LCD detection

The firmware will also try to detect a HAT EEPROM, a PoE HAT (with its fan), etc. We can safely disable these features:

# all these options cause a wait for an I2C bus response, we don't need any of them, so let's disable them.
force_eeprom_read=0
disable_poe_fan=1
ignore_lcd=1
disable_touchscreen=1
disable_fw_kms_setup=1

Disable camera and display detection

Detecting a connected MIPI camera or display also takes some time. We know which camera is connected (HQ Camera, IMX477), so let's just hardcode it:

# no autodetection for anything (will wait for I2C answers)
camera_auto_detect=0
display_auto_detect=0

# load HQ camera IMX477 sensor manually
dtoverlay=imx477
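Instead of editing config.txt by hand on every test card, the settings from the last three sections can be applied with a small idempotent helper. This is a sketch with hypothetical function names: it replaces an existing `key=` line or appends a new one, and it assumes that appended lines end up in an unconditional section (e.g. a trailing `[all]`), not inside a model-specific filter such as `[pi4]`.

```bash
# set_option FILE KEY VALUE: replace existing KEY= lines, or append KEY=VALUE if missing
set_option() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# disable_autodetection [FILE]: the HDMI, HAT/PoE/LCD and camera/display settings above
disable_autodetection() {
    local cfg=${1:-/boot/firmware/config.txt} kv
    for kv in hdmi_blanking=2 hdmi_ignore_edid=0xa5000080 hdmi_ignore_cec_init=1 \
              hdmi_ignore_cec=1 force_eeprom_read=0 disable_poe_fan=1 ignore_lcd=1 \
              disable_touchscreen=1 disable_fw_kms_setup=1 \
              camera_auto_detect=0 display_auto_detect=0; do
        set_option "$cfg" "${kv%%=*}" "${kv#*=}"
    done
    # dtoverlay= may legitimately appear several times, so only append if missing
    grep -qx 'dtoverlay=imx477' "$cfg" || echo 'dtoverlay=imx477' >> "$cfg"
}
```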

Disabling initramfs

These changes reduced the (self-reported) boot time from 5.38 seconds to 4.75 seconds. We can also disable the initramfs completely by removing the parameter auto_initramfs=1.

The savings depend on the size of the initramfs; in our case, the time dropped to 4.47 seconds.
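Commenting the line out is equivalent to removing it and easier to undo. This sketch (hypothetical helper name) assumes that everything needed to mount the root file system, such as the SD controller and ext4 drivers, is built into the kernel, so nothing has to be loaded from the initramfs.

```bash
# disable_initramfs [FILE]: comment out auto_initramfs=1 in config.txt
disable_initramfs() {
    local cfg=${1:-/boot/firmware/config.txt}
    sed -i 's/^auto_initramfs=1/#&/' "$cfg"    # & is the matched text
}
```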

Tested, but no effect on boot speed

The Internet often recommends overclocking the SD interface to 100 MHz, but in our case this did not speed up booting at all.

# not recommended! data corruption risk!
dtoverlay=sdtweak,overclock_50=100

Running the SD interface at such high speeds also carries the risk of data corruption (during write operations), which is highly undesirable for remote IoT devices.

Loading the kernel

At this stage, one of the slowest operations is loading the kernel:

MESS:00:00:03.816343:0: Loaded 'kernel8.img' to 0x200000 size 0x8d8bd7
MESS:00:00:05.364579:0: Device tree loaded to 0x1b4de900 (size 0x8605)

Loading 9,276,375 bytes takes about 1.55 seconds, which corresponds to a transfer rate of about 6 MB/s (5.7 MiB/s). This loading is performed by the GPU (!) using its built-in proprietary VideoCore IV processor. It is possible that the bootloader code is simply inefficient, or that it uses very conservative settings. Unfortunately, we do not know how it works internally, and we cannot change its parameters by poking registers or in any other way.
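The throughput figure can be recomputed from the two log timestamps and the hexadecimal size printed by the firmware (the helper name `load_rate` is illustrative):

```bash
# load_rate START_S END_S SIZE: throughput between two firmware log timestamps.
# SIZE may be given in hex as printed by the firmware; shell arithmetic accepts 0x... constants.
load_rate() {
    local bytes=$(( $3 ))
    awk -v t0="$1" -v t1="$2" -v b="$bytes" 'BEGIN {
        dt = t1 - t0
        printf "%d bytes in %.3f s = %.2f MB/s (%.2f MiB/s)\n", b, dt, b / dt / 1e6, b / dt / 1048576
    }'
}

load_rate 3.816343 5.364579 0x8d8bd7
# prints: 9276375 bytes in 1.548 s = 5.99 MB/s (5.71 MiB/s)
```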

I haven't found a good way to speed up this step yet, so the remaining option is to shrink the kernel itself.

In theory, the GPU's VideoCore IV core can be overclocked by setting these parameters:

# Overclock GPU VideoCore IV processor (not recommended!)
core_freq_min=500
core_freq=550

This reduces the kernel load time by 20%, but the side effects (on reliability, etc.) are unknown.

Buildroot and Custom Kernel

It was time to switch from Raspbian/Debian to a custom Buildroot build (primarily to build a custom kernel). Using Buildroot 2024.02.1, a very minimal system was set up:

  • Native toolchain aarch64

  • Full glibc

  • Raspberry Pi tools (e.g. camera utilities)

The kernel was configured as follows:

  • No sound support

  • Without most block device drivers and file systems (excluding SD/MMC and ext4)

  • No RAID support

  • No USB support

  • Without HID support

  • No DVB support

  • No video support and framebuffer (HDMI is still disabled)

  • Without advanced networking features (tunnels, bridges, firewalls, etc.)

  • Without compression

  • Modules are not compressed

In tests, an uncompressed kernel and uncompressed modules turned out to be better in terms of energy consumption (even though the GPU then spends more time loading the larger kernel). Gzip decompression costs a lot of energy (and adds an extra processing step).

A security feature called KASLR (kernel address space layout randomization) has also been disabled. KASLR places the kernel at a random address in memory, which makes exploits harder to write (since the kernel's location in memory is unknown). However, this requires relocating the kernel after it has been loaded by the GPU.

In our case, the network attack surface is very limited, so KASLR can be disabled (all applications run with root privileges anyway). Protection against speculative execution vulnerabilities such as Spectre is also disabled.
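A kernel configured along these lines can be expressed as a config fragment. The sketch below writes one; the symbol names are based on mainline 6.x kernels and are assumptions for your exact Raspberry Pi kernel version (check them in `make menuconfig`), and the Pi's SD host-controller driver must also be built in. In Buildroot, such a file can be referenced from `BR2_LINUX_KERNEL_CONFIG_FRAGMENT_FILES`. The function name is hypothetical.

```bash
# write_kernel_fragment FILE: kernel config fragment for a trimmed-down camera kernel.
# Symbol names are assumptions (mainline 6.x); verify them for your kernel version.
write_kernel_fragment() {
    cat > "$1" <<'EOF'
# CONFIG_SOUND is not set
# CONFIG_MD is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_HID is not set
# CONFIG_MEDIA_DIGITAL_TV_SUPPORT is not set
# CONFIG_DRM is not set
# CONFIG_FB is not set
# CONFIG_NETFILTER is not set
# CONFIG_BRIDGE is not set
# CONFIG_NET_IPIP is not set
# CONFIG_TUN is not set
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_MODULE_COMPRESS_NONE=y
CONFIG_MMC=y
CONFIG_EXT4_FS=y
EOF
}
```

Selecting `BR2_LINUX_KERNEL_IMAGE` (the plain arm64 `Image`) instead of `BR2_LINUX_KERNEL_IMAGEGZ` keeps the kernel itself uncompressed. For the speculative-execution mitigations, the `mitigations=off` kernel parameter is one possible way to switch them off (an assumption; other mechanisms exist).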

The resulting kernel is 8.5 MiB (uncompressed), 4.1 MiB after Gzip compression (which is not used here, just for comparison).

The original Raspbian kernel was 25 MiB (uncompressed), 8.9 MiB after Gzip compression.

Final result

We can now boot into a Linux user space program in less than 3.5 seconds! About 400ms is spent in the Linux kernel (the difference between pin 0 and pin 1).

The total energy consumption is 0.364 A s * 5.0 V = 1.82 W s. We reduced the boot energy by a factor of about 5 (compared to stock Debian, which needed 9.5 W s to reach user space).
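The improvement factor follows directly from the two charge measurements at the same 5.0 V supply (the helper name is illustrative):

```bash
# boot_energy_factor Q_BEFORE_AS Q_AFTER_AS VOLTAGE_V: energies and their ratio
boot_energy_factor() {
    awk -v q0="$1" -v q1="$2" -v v="$3" 'BEGIN {
        e0 = q0 * v; e1 = q1 * v                  # W s = A s * V
        printf "%.2f W s -> %.2f W s (%.1fx less energy)\n", e0, e1, e0 / e1
    }'
}

boot_energy_factor 1.90 0.364 5.0
# prints: 9.50 W s -> 1.82 W s (5.2x less energy)
```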

Reducing input voltage

After this article was published, Graham Sutherland / Polynomial noted that the voltage regulators on the Pi Zero are not very efficient at a 5.0 V input. This will not always be a suitable solution, but in our test scenario, and also in the finished product, we can simply reduce the input voltage to 4.0 V.

At 5.0V:

Note the units: the charge in millicoulombs (mC, i.e. mA s) increases at 4.0 V (because the current is higher), but the energy consumed decreases significantly!

350.94 mAs * 5.0 V = 1.755 W s

At 4.0V:

390.77 mAs * 4.0 V = 1.563 W s

Let's try to reduce the voltage even more:

At 3.6V:

399.60 mAs * 3.6 V = 1.439 W s

We just reduced energy consumption by roughly another 18% simply by lowering the input voltage of the switching regulators! Of course, this requires further testing for stability and reliability (since it is technically out of spec), but the result is very impressive.
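The three measurements above can be compared in one go. This sketch (hypothetical helper name) reads `voltage charge_mAs` pairs, converts them to energy (V · mA s / 1000 = W s), and reports the saving relative to the first row:

```bash
# energy_table: "VOLTAGE CHARGE_mAs" lines on stdin -> energy and saving vs. the first line
energy_table() {
    awk 'NR == 1 { base_v = $1; base = $1 * $2 / 1000 }
         {
             e = $1 * $2 / 1000                  # V * mA s / 1000 = W s
             printf "%.1f V: %.3f W s (%.1f%% less than at %.1f V)\n", $1, e, (base - e) / base * 100, base_v
         }'
}

printf '5.0 350.94\n4.0 390.77\n3.6 399.60\n' | energy_table
# prints:
# 5.0 V: 1.755 W s (0.0% less than at 5.0 V)
# 4.0 V: 1.563 W s (10.9% less than at 5.0 V)
# 3.6 V: 1.439 W s (18.0% less than at 5.0 V)
```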
