Dereferencing NULL on the RISC-V core scr1
While working at an ASIC design center, I spent a lot of time debugging errors and core crashes, looking at timing diagrams of the AXI buses between the processor and memory. Sometimes it turned out that the memory read address was 0x00000000: a classic case of null pointer dereference in C. On systems with an OS, this leads to the segfault known to all C programmers. On bare-metal systems, dereferencing NULL can lead to interesting situations. In this article, we'll look at what happens when NULL is dereferenced, using the open-source RISC-V core scr1 and the open-source RTL simulation tool Verilator as an example.
Preparation
All manipulations with core simulation in this article were performed on a machine running Ubuntu 22.04. The following programs and projects were used:
Verilator – software for simulating the core sources. It converts HDL code to C++ code and an executable file. Installation instructions are on the project website: https://verilator.org/guide/latest/install.html . You must install version 4.102 or later to support the `|=>` operator used in the scr1 source code. The Ubuntu 22.04 repository has version 4.038, so I built a stable release from the sources on GitHub.
scr1 – the source code of the core itself, written in SystemVerilog. It also includes sample C programs so you have something to run on the processor for testing.
git clone https://github.com/syntacore/scr1.git
RISC-V GCC toolchain – a set of programs for compiling C code and building firmware for the RISC-V core. Toolchain binaries can be downloaded from the Syntacore website https://syntacore.com/tools/development-tools , after which you add their location to `PATH`:
export PATH=<GCC_INSTALL_PATH>/bin:$PATH
GTKWave – software for visualizing the timing diagrams produced by the simulation: https://gtkwave.sourceforge.net/
Running Hello World Simulation
The `sim/tests` directory of the scr1 Git repository contains sample programs to run on the simulated core. Let's start with the simple test `hello`. In this example, the core “prints” the string `"Hello from SCR1!\n"`. Printing is done by writing to a special memory address, `0xF0000000` (`SCR1_SIM_PRINT_ADDR`): when an 8-bit value is written to this address, the corresponding character is printed in the testbench console. To make the timing diagrams and assembly listings easier to analyze, I added the `-fno-inline` flag to the compilation flags in the file `common.mk`, so functions are not inlined.
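The print mechanism described above can be sketched in C roughly as follows. This is a minimal illustration and not the actual `sc_puts` from `sc_print.c`: the function name `puts_to_port` and its `port` parameter are hypothetical, and the port is passed in as an argument so that the sketch can also be tried on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the real sc_puts): writes each byte of buf to a
 * memory-mapped print port. The volatile qualifier makes the compiler emit
 * one 8-bit store per character instead of merging or dropping the writes. */
size_t puts_to_port(volatile uint8_t *port, const char *buf, size_t len)
{
    assert(port != NULL && buf != NULL);

    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        *port = (uint8_t)buf[i];   /* one byte store to the print address */
        written++;
    }
    return written;
}
```

On the simulated core, the call would presumably look like `puts_to_port((volatile uint8_t *)0xF0000000u, "Hello from SCR1!\n", 17);`, with the testbench turning each store into a console character.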
To run the `hello` test in Verilator and save the waveforms, you can use the following command:
make run_verilator_wf CFG=MAX BUS=AXI TARGETS="hello" TRACE=1
The console will display the “Hello World” line:
---Test: hello.hex
Hello from SCR1!
Test passed
#--------------------------------------
# Summary: 1/1 tests passed
#--------------------------------------
The results of compilation and simulation are placed in a subdirectory of `build/` whose name corresponds to the command used (in this case `verilator_wf_AXI_MAX_imc_IPIC_1_TCM_1_VIRQ_1_TRACE_1`).
The timing diagrams are saved in the file `simx.vcd`. We are now interested in how bytes from a buffer are printed to the testbench console through the special memory address. To see what this printing looks like, let's open the file in GTKWave and display the signals on a diagram.
The diagram shows the AXI4 interface signals for writing to data memory. According to the scr1 specification (SCR1 External Architecture Specification), the prefix `io_axi_dmem` refers to the data memory access signals. The signal `awaddr` is the address being written to, and `wdata` is the data being written, shown in the figure in ASCII character format. When `wvalid` and `wready` are both 1 at the same time, the data from `wdata` is actually written to the address from `awaddr`. These signals are located at the top of the testbench hierarchy, in `scr1_top_tb_axi`.
Each time `wvalid` is set to 1, you can see the characters `H`, `e`, `l`, and so on from the string `"Hello from SCR1!\n"`, which the example prints, being written to memory sequentially.
The screenshot of the timing diagram also shows the value of the current program counter register, `curr_pc`. From its value you can roughly tell where in program memory the processor currently is, which instructions it is executing, and in which function. This signal is located deeper in the hierarchy, in the processor pipeline (core pipeline), at `scr1_top_tb_axi/i_top/i_core_top/i_pipe_top`.
The assembly listing of the program is saved in the file `hello.dump`. The actual printing of the string (writing bytes from a buffer to the special memory address) is performed by the function `sc_puts` from the file `sc_print.c`. In my case this function was placed at `0x00480000` in RAM. This is where `-fno-inline` comes in handy: calls to these functions and jumps to the addresses of their code are explicit.
00480000 <sc_puts>:
480000: 87aa mv a5,a0
480002: 0ab05f63 blez a1,4800c0 <sc_puts+0xc0>
480006: 0075f693 andi a3,a1,7
48000a: f0000737 lui a4,0xf0000
The actual write to the address `SCR1_SIM_PRINT_ADDR` happens here, at `0x0048006E`, as was evident from the timing diagram:
480068: 0007c503 lbu a0,0(a5)
48006c: 0785 addi a5,a5,1
48006e: 00a70023 sb a0,0(a4)
480072: 04b78e63 beq a5,a1,4800ce <sc_puts+0xce>
The instruction `sb a0,0(a4)` (Store Byte) writes the least significant byte of register `a0` to memory at the address held in register `a4`. The value `0xF0000000` is placed in register `a4` at the beginning of the function by the instruction `lui a4,0xf0000`. The value in register `a0` is loaded from the buffer passed to the function by pointer, using `lbu a0,0(a5)`.
The testbench determines whether the test passed or failed by checking the values in the registers of the Multi-Port Register File, which is written to if any exceptions occur. During normal operation, the processor exits `main` (returning to `_start`) and proceeds to the function `sc_exit`, which ends the simulation by writing to a special address.
NULL Dereference
Theory
First, let's look at what the C standard says about the null pointer and dereferencing it. We'll use the C17 standard, or rather its draft N2310: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2310.pdf
Paragraph 3 of section 6.3.2.3 “Pointers” defines a null pointer constant as the integer value `0`, optionally cast to `void *`. The footnote states that the macro `NULL`, defined in the file `stddef.h`, expands to a null pointer constant.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.67) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Footnote 67: The macro NULL is defined in `<stddef.h>` (and other headers) as a null pointer constant; see 7.19.
In the header file `stddef.h` (located in `riscv-gcc/lib/gcc/riscv64-unknown-elf/13.2.0/include/`), `NULL` is defined as `#define NULL ((void *)0)`, i.e. a pointer to data of any type at address `0`.
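The definitions quoted above can be checked with a few lines of C. This is only a sketch of what the standard guarantees; the function name `null_pointer_forms_agree` is made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the three ways of spelling a null pointer compare equal. */
int null_pointer_forms_agree(void)
{
    int object = 42;
    int *from_zero = 0;             /* integer constant expression 0 */
    int *from_cast = (void *)0;     /* the same constant cast to void * */
    int *from_macro = NULL;         /* the macro from stddef.h */

    /* A null pointer never compares equal to a pointer to a real object. */
    assert(from_macro != &object);

    return from_zero == from_cast && from_cast == from_macro;
}
```

Note that the standard talks about a null pointer constant and about comparisons; the fact that the resulting pointer holds address `0` is a property of the platform, which happens to be true for this toolchain.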
Dereferencing a null pointer is undefined behavior, as follows from section 6.5.3.2 “Address and indirection operators”, which describes the unary dereference operator `*`, and from its footnotes.
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
…
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, …
From the compiler's point of view, undefined behavior means a situation that never occurs, which means the compiler can do anything, including deleting code, based on this assumption.
For us, this means that actually dereferencing `NULL` is not so easy. We need a way to force the compiler to keep code that is incorrect or that appears to do nothing. The tests are compiled with optimization enabled through the `-O3` compiler flag, so the compiler will use all its capabilities.
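A common illustration of this reasoning is a null check placed after the dereference. The sketch below uses hypothetical function names; whether a particular compiler really deletes the check depends on its version and flags, so treat the comments as a description of what the standard permits rather than of guaranteed output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The load comes first, so the compiler may assume p is not NULL from here
 * on and is allowed to delete the check that follows as dead code. */
int32_t load_then_check(const int32_t *p)
{
    int32_t value = *p;     /* undefined behavior if p is NULL */
    if (p == NULL) {        /* may be optimized away */
        return -1;
    }
    return value;
}

/* Checking before the load keeps the guard meaningful. */
int32_t check_then_load(const int32_t *p)
{
    if (p == NULL) {
        return -1;
    }
    return *p;
}

/* Both variants agree on valid input; only the second is safe for NULL. */
void demo_null_checks(void)
{
    int32_t v = 7;
    assert(load_then_check(&v) == 7);
    assert(check_then_load(&v) == 7);
    assert(check_then_load(NULL) == -1);
}
```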
Practice
Let's start with a simple dereference directly in `main`. To combat compiler optimizations, we will use the `volatile` type qualifier, which prevents the compiler from optimizing away reads and writes of such objects. From section 6.7.3 “Type qualifiers”, paragraph 8:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3.
Let's try to do this naively, right in `main`. We include the necessary header files, replace the example code with a `NULL` dereference, and add `volatile` to both the pointer type and the variable type:
#include <stddef.h>
#include <stdint.h>

int main()
{
    volatile uint32_t a = *(volatile uint32_t*) NULL;
    return 0;
}
The assembly listing of the function `main` looks like this:
00480006 <main>:
480006: 00002783 lw a5,0(zero)
48000a: 9002 ebreak
The listing contains a real read from memory at address `0`: the instruction `lw a5,0(zero)`. It reads a 4-byte word from memory at address `0` and puts it in register `a5`.
The `ebreak` instruction transfers control to the debugger; it raises a software exception. That is, the compiler saw that we were trying to dereference `NULL`, said “you can’t do that”, and inserted a trap. The fact that gcc inserts `__builtin_trap` after a `NULL` dereference is stated explicitly in the description of the corresponding optimization option and can be seen in the gcc source code. On RISC-V, `__builtin_trap` is implemented as the `ebreak` instruction.
On the timing diagrams, we will look at the read signals on the AXI4 bus from data memory, the program counter register `pc`, and the control and status registers `mepc` and `mcause`, which hold the address of the instruction where the exception occurred and information about the exception. They can be found in `scr1_top_tb_axi/i_top/i_core_top/i_pipe_top/i_pipe_csr`.
The diagram shows an interesting feature. The read from address zero does not itself raise an exception. The value of `mepc` matches the address of the `ebreak` instruction in the listing of `main` (the value from the diagram must be shifted left by 1 bit). `mcause` holds the code `3`, which means Breakpoint. So the `ebreak` instruction is executed, and a software exception is raised and handled. The hardware let us read from `NULL` without any problems.
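Reading these two registers off a waveform can be expressed as a small C helper. This is a sketch: the function names are hypothetical, the exception codes are the standard ones from the RISC-V privileged specification, and the one-bit shift of `mepc` is taken from the observation above about this particular trace. The value `0x240005` is a made-up waveform reading, chosen so that the result equals the `ebreak` address `0x48000a` from the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names a few synchronous exception codes found in mcause. Only the codes
 * that matter for this article are listed. */
const char *mcause_exception_name(uint32_t mcause)
{
    if (mcause & 0x80000000u) {
        return "interrupt";     /* top bit set: not a synchronous exception */
    }
    switch (mcause) {
    case 2:  return "Illegal instruction";
    case 3:  return "Breakpoint";
    case 5:  return "Load access fault";
    case 7:  return "Store/AMO access fault";
    default: return "other exception";
    }
}

/* Assumption from the text: the mepc value shown on the diagram must be
 * shifted left by one bit to obtain the instruction address. */
uint32_t mepc_trace_to_address(uint32_t traced_value)
{
    return traced_value << 1;
}

void demo_decode(void)
{
    assert(strcmp(mcause_exception_name(3), "Breakpoint") == 0);
    assert(mepc_trace_to_address(0x240005u) == 0x48000Au);  /* hypothetical reading */
}
```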
Let's try to persuade the compiler not to insert the trap. One way is to compile with the `-fno-delete-null-pointer-checks` flag; then the optimizer will not remove `NULL` dereferences and will not add trap calls.
We will do without this option. Let's write an example close to real code in bare-metal projects: a function that accepts a `volatile` pointer to one of the microcontroller's memory-mapped devices. We will “forget” to check for `NULL` and accidentally pass `NULL` to this function.
By the way, if you put the naive `NULL` dereference inside a function and try to call that function, nothing will work. The compiler removes the call, because the function contains UB, and UB cannot happen, which means the function is never actually called.
void simple() {
    volatile uint32_t a = *(volatile uint32_t*) NULL;
}

void foo(volatile uint32_t* p) {
    volatile uint32_t a = *p;
}

int main()
{
    simple();
    foo(NULL);
    return 0;
}
In the functions `main` and `foo`, the compiler could not prove that a `NULL` dereference occurs and compiled everything honestly. `main` puts the number `0` into register `a0` with the instruction `li a0,0`; under the RISC-V calling convention, `a0` carries the argument to the called function. The instruction `jal 480006 <foo>` calls `foo`, which therefore receives a null pointer as its argument. `foo` reads from the address passed to it: `lw a5,0(a0)`.
00480000 <simple>:
480000: 00002783 lw a5,0(zero) # 0 <CL_SIZE-0x20>
480004: 9002 ebreak
00480006 <foo>:
480006: 411c lw a5,0(a0)
480008: 1141 addi sp,sp,-16
48000a: c63e sw a5,12(sp)
48000c: 0141 addi sp,sp,16
48000e: 8082 ret
00480010 <main>:
480010: 1141 addi sp,sp,-16
480012: 4501 li a0,0
480014: c606 sw ra,12(sp)
480016: 3fc5 jal 480006 <foo>
480018: 40b2 lw ra,12(sp)
48001a: 4501 li a0,0
48001c: 0141 addi sp,sp,16
48001e: 8082 ret
The timing diagrams show a successful `NULL` dereference: a read from address `0`, continued program execution without any exceptions or breakpoints, and successful completion of the simulation. A “forgotten” `NULL` check will most likely go undetected.
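For comparison, here is what the check that was “forgotten” in `foo` might look like. This is a sketch with hypothetical names and status codes, not code from the scr1 repository; it only makes sense on platforms where address 0 is never a valid device register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch. */
enum { REG_OK = 0, REG_ERR_NULL = -1 };

/* Reads a 32-bit memory-mapped register, refusing a null pointer instead of
 * silently loading from address 0. */
int reg_read_checked(const volatile uint32_t *reg, uint32_t *out)
{
    if (reg == NULL || out == NULL) {
        return REG_ERR_NULL;
    }
    *out = *reg;    /* single volatile load from the device register */
    return REG_OK;
}

void demo_reg_read(void)
{
    volatile uint32_t fake_reg = 0xCAFEu;   /* stands in for a device register */
    uint32_t value = 0;

    assert(reg_read_checked(&fake_reg, &value) == REG_OK);
    assert(value == 0xCAFEu);
    assert(reg_read_checked(NULL, &value) == REG_ERR_NULL);
}
```

A software check like this costs a branch on every access and still depends on every caller inspecting the return value, which is one reason to want the hardware to report the fault.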
Load and store access fault
The RISC-V specification provides mechanisms for reporting memory access problems: the load access fault and store access fault exceptions. In the scr1 core, the LSU (load-store unit) module is responsible for memory accesses.
The LSU sends a load/store access fault error code to the EXU (Execution Unit) if the signal `dmem_resp_er` is set to 1.
// src/core/pipeline/scr1_pipe_lsu.sv:212
always_comb begin
    case (1'b1)
        dmem_resp_er : lsu2exu_exc_code_o = lsu_cmd_ff_load  ? SCR1_EXC_CODE_LD_ACCESS_FAULT
                                          : lsu_cmd_ff_store ? SCR1_EXC_CODE_ST_ACCESS_FAULT
                                                             : SCR1_EXC_CODE_INSTR_MISALIGN;
The signal `dmem_resp_er` is set to 1 when the memory responds to a request with an error code.
// src/core/pipeline/scr1_pipe_lsu.sv:115
assign dmem_resp_er = (dmem2lsu_resp_i == SCR1_MEM_RESP_RDY_ER);
The DMEM router distributes requests from the processor to memory and routes the memory's response signal back. The router sends each request to one of 3 ports depending on the address in the request. It checks whether the address falls within the range of the TCM memory or of the memory-mapped timer. All addresses that do not pass these checks are sent to the AXI4 interface.
Let's make the address `0x00000000` neither readable nor writable. To do this, we add one more port to the router, `SCR1_PORT_INVALID`, to which requests for address `0` will be sent and which will return an error. The full code with changes can be found here: https://github.com/DuzaBF/scr1/pull/1/files
In the router source code, `src/top/scr1_dmem_router.sv`, let's add an enum value for the invalid port and a check for address 0 when selecting a port:
// src/top/scr1_dmem_router.sv:89
always_comb begin
    port_sel = SCR1_SEL_PORT0;
    if ((dmem_addr & SCR1_PORT1_ADDR_MASK) == SCR1_PORT1_ADDR_PATTERN) begin
        port_sel = SCR1_SEL_PORT1;
    end else if ((dmem_addr & SCR1_PORT2_ADDR_MASK) == SCR1_PORT2_ADDR_PATTERN) begin
        port_sel = SCR1_SEL_PORT2;
    end else if (dmem_addr == 32'b0) begin
        port_sel = SCR1_SEL_INVALID;
    end
end
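The same mask-and-pattern decoding can be modelled in C to experiment with addresses without running a simulation. This is a software sketch of the selection logic only; the mask and pattern values below are illustrative assumptions and are not taken from the scr1 configuration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SEL_PORT0, SEL_PORT1, SEL_PORT2, SEL_INVALID } port_sel_t;

/* Illustrative values (assumptions): port 1 covers 0x00480000-0x0048FFFF,
 * port 2 covers 0x00490000-0x0049001F. */
#define PORT1_ADDR_MASK    0xFFFF0000u
#define PORT1_ADDR_PATTERN 0x00480000u
#define PORT2_ADDR_MASK    0xFFFFFFE0u
#define PORT2_ADDR_PATTERN 0x00490000u

/* Mirrors the if / else-if chain of the router: first match wins,
 * everything else falls through to port 0. */
port_sel_t select_port(uint32_t addr)
{
    if ((addr & PORT1_ADDR_MASK) == PORT1_ADDR_PATTERN) {
        return SEL_PORT1;
    }
    if ((addr & PORT2_ADDR_MASK) == PORT2_ADDR_PATTERN) {
        return SEL_PORT2;
    }
    if (addr == 0u) {
        return SEL_INVALID;
    }
    return SEL_PORT0;
}

void demo_select_port(void)
{
    assert(select_port(0x00480000u) == SEL_PORT1);
    assert(select_port(0x00490004u) == SEL_PORT2);
    assert(select_port(0xF0000000u) == SEL_PORT0);
    assert(select_port(0x00000000u) == SEL_INVALID);
    assert(select_port(0x00000004u) == SEL_PORT0);  /* only exact 0 is caught */
}
```

The last assertion points at a limitation shared with the router check: only the exact address 0 is treated as invalid, so an access to a structure field through a null pointer, at a small non-zero offset, would still be routed to port 0.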
Although the port is invalid, it must still acknowledge to the processor that it has received the request. Otherwise the processor will hang, waiting forever for the read to complete. So the invalid port always acknowledges receipt of the request.
// src/top/scr1_dmem_router.sv:142
SCR1_SEL_PORT2   : sel_req_ack = port2_req_ack;
SCR1_SEL_INVALID : sel_req_ack = 1'b1;
An attempt to read from or write to address `0`, and therefore to the invalid port, results in an error. For clarity on the timing diagrams, the value returned on a read is set to `0xbadbadba`:
// src/top/scr1_dmem_router.sv:165
default : begin
    sel_rdata = 32'hBADBADBA;
    sel_resp  = SCR1_MEM_RESP_RDY_ER;
end
The processor receives the error message from the router via the signal `sel_resp` (it is connected to `dmem_resp`), and this signal is updated only when the processor issues a memory access request, that is, when `dmem_req` is set to 1. This leads to unpleasant behavior: once a memory access returns an error, the error signal stays asserted. The processor then assumes that every subsequent instruction causes a memory access exception. It jumps to the exception handler, the very first instruction of the handler raises an exception because of the stuck error signal, and the processor jumps to the handler again, to the same instruction. To prevent this, the signal has to be reset somehow, for example by making the router default to port 0 until the next request arrives:
// src/top/scr1_dmem_router.sv:110
case (fsm)
    SCR1_FSM_ADDR : begin
        if (dmem_req & sel_req_ack) begin
            fsm        <= SCR1_FSM_DATA;
            port_sel_r <= port_sel;
        end else begin
            port_sel_r <= SCR1_SEL_PORT0;
        end
    end
These changes are also enough to cause a Store Access Fault when trying to write to address `0`.
With these modifications to the router, the simulation ends with a failure, and the timing diagrams show that a load access fault exception occurs when reading from `NULL`.
Great! Now our core will not silently read through a `NULL` pointer; instead, the program crashes with a Load Access Fault exception. By looking at the registers `mepc` and `mcause`, the programmer can tell that there is a `NULL` dereference error and where it occurs.