Dereferencing NULL on the RISC-V core scr1

While working at an ASIC design center, I spent a lot of time debugging errors and core crashes by studying timing diagrams of the AXI buses between the processor and memory. Sometimes the memory read address turned out to be 0x00000000: a classic null pointer dereference in C. On systems with an operating system, this leads to the segfault familiar to every C programmer. On bare-metal systems, dereferencing NULL can lead to more interesting situations. In this article, we'll look at what happens when NULL is dereferenced, using the open-source RISC-V core scr1 and the open-source RTL simulator Verilator as an example.

Preparation

All core simulations in this article were run on a machine with Ubuntu 22.04. The following programs and projects were used:

Verilator – software for simulating the core sources. It converts HDL code into C++ code and an executable file. Installation instructions are on the project website (https://verilator.org/guide/latest/install.html). You need version 4.102 or later to support the `|=>` operator used in the scr1 source code. The Ubuntu 22.04 repository has version 4.038, so I built a stable release from the sources on GitHub.

scr1 – the source code of the core itself, written in SystemVerilog. It also includes sample C programs, so there is something to run on the processor for testing.

git clone https://github.com/syntacore/scr1.git

RISC-V GCC toolchain – a set of programs for compiling C code and building firmware for the RISC-V core. Toolchain binaries can be downloaded from the Syntacore website, https://syntacore.com/tools/development-tools, after which you add them to PATH:

export PATH=<GCC_INSTALL_PATH>/bin:$PATH

GTKWave – software for viewing the timing diagrams produced by the simulation: https://gtkwave.sourceforge.net/

Running Hello World Simulation

The `sim/tests` directory of the scr1 Git repository contains example test programs that run on the simulated core. Let's start with the simple test `hello`. In this example, the core “prints” the string `"Hello from SCR1!\n"`. Printing is done by writing to a special memory address, 0xF0000000 (`SCR1_SIM_PRINT_ADDR`): when an 8-bit value is written to this address, the corresponding character is printed in the testbench console. To make the timing diagrams and assembly listings easier to analyze, I added the `-fno-inline` flag to the compilation flags in `common.mk`, so functions are not inlined.

To run the `hello` test in Verilator and save the waveforms, use the following command:

make run_verilator_wf CFG=MAX BUS=AXI TARGETS="hello" TRACE=1

The console will display this “Hello World!” line:

---Test:                    	hello.hex
Hello from SCR1!
Test passed

#--------------------------------------
# Summary: 1/1 tests passed
#--------------------------------------

The results of compilation and simulation end up in a subdirectory of `build/` whose name corresponds to the command used (in this case `verilator_wf_AXI_MAX_imc_IPIC_1_TCM_1_VIRQ_1_TRACE_1`).

The timing diagrams are saved in the file `simx.vcd`. What interests us now is how bytes from the buffer are printed to the testbench console through the special memory address. To see what this print looks like, let's open the file in GTKWave and display the signals on a diagram.

Memory write timing diagrams from the hello example.

The diagram shows the AXI4 interface signals for writing to data memory. According to the scr1 specification (SCR1 External Architecture Specification), the prefix `io_axi_dmem` denotes data memory access signals. The signal `awaddr` is the address being written to, and `wdata` is the data being written, shown in the figure in ASCII character format. When `wvalid` and `wready` are both 1 at the same time, the data from `wdata` is actually written to the address from `awaddr`. In the hierarchy, these signals sit at the top level of the testbench, `scr1_top_tb_axi`.
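
The valid/ready rule described above can be sketched as a small C model that scans sampled cycles of a waveform. This is only an illustration: the struct `axi_sample_t` and the function `collect_printed_bytes` are hypothetical names, and the model assumes that `awaddr` is still valid in the cycle of the data transfer and that the printed character sits in the low byte of `wdata`.

```c
#include <stddef.h>
#include <stdint.h>

/* One sampled clock cycle of the write channel (hypothetical struct). */
typedef struct {
    uint8_t  wvalid;  /* master presents valid data on wdata */
    uint8_t  wready;  /* slave is ready to accept the data */
    uint32_t awaddr;  /* write address, assumed still valid in this cycle */
    uint32_t wdata;   /* write data */
} axi_sample_t;

/*
 * Collects the low byte of wdata for every cycle in which wvalid and
 * wready are both 1 and awaddr equals print_addr.
 * Returns the number of bytes stored in out (at most out_cap).
 */
size_t collect_printed_bytes(const axi_sample_t *samples, size_t n,
                             uint32_t print_addr, char *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < out_cap; ++i) {
        int transfer = samples[i].wvalid && samples[i].wready;
        if (transfer && samples[i].awaddr == print_addr) {
            out[count++] = (char)(samples[i].wdata & 0xFFu);
        }
    }
    return count;
}
```

A cycle where only one of the two signals is high transfers nothing, which is why the same character can sit on `wdata` for several cycles but is printed once.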

Wherever `wvalid` is set to 1, you can see the characters `H`, `e`, `l`, and so on from the string `"Hello from SCR1!\n"` printed in the example being written to memory one after another.

The screenshot of the timing diagram also shows the current value of the program counter, `curr_pc`. From its value you can roughly tell where in program memory the processor currently is, which instructions it is executing, and in which function. This signal sits deeper in the hierarchy, in the core pipeline: `scr1_top_tb_axi/i_top/i_core_top/i_pipe_top`.

The assembly listing of the program is saved in the file `hello.dump`. The actual printing of the string (writing bytes from a buffer to the special memory address) is done by the function `sc_puts` from `sc_print.c`. In my case this function was placed at 0x00480000 in RAM. This is where `-fno-inline` comes in handy: calls to such functions, and jumps to the addresses where their code is located, are explicit.

00480000 <sc_puts>:
  480000:	87aa                	mv	a5,a0
  480002:	0ab05f63          	blez	a1,4800c0 <sc_puts+0xc0>
  480006:	0075f693          	andi	a3,a1,7
  48000a:	f0000737          	lui	a4,0xf0000

The actual write to `SCR1_SIM_PRINT_ADDR` happens here, at 0x0048006E, as seen on the timing diagram:

 480068: 0007c503            lbu a0,0(a5)
 48006c: 0785                  addi  a5,a5,1
 48006e: 00a70023            sb  a0,0(a4)
 480072: 04b78e63            beq a5,a1,4800ce <sc_puts+0xce>

The instruction `sb a0,0(a4)` (Store Byte) writes the least significant byte of register `a0` to memory at the address held in register `a4`. Register `a4` was loaded with the value 0xF0000000 at the beginning of the function by the instruction `lui a4,0xf0000`. The value in register `a0` was loaded by `lbu a0,0(a5)` from the buffer passed to the function by pointer.
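
In C terms, the `lbu`/`sb` pair corresponds roughly to the loop below. This is a minimal sketch rather than the actual source of `sc_puts`: the name `puts_sketch` and its signature are assumptions made for illustration, and the port address is passed as a parameter so the function can also be tried on a host machine.

```c
#include <stdint.h>

/*
 * Writes len bytes of buf, one at a time, to a memory-mapped print port.
 * The volatile qualifier keeps the compiler from merging or dropping
 * the individual byte stores.
 */
void puts_sketch(volatile uint8_t *print_port, const char *buf, int len)
{
    for (int i = 0; i < len; ++i) {
        uint8_t c = (uint8_t)buf[i];  /* lbu a0,0(a5) ; addi a5,a5,1 */
        *print_port = c;              /* sb  a0,0(a4)                */
    }
}
```

On the simulated core the call would look like `puts_sketch((volatile uint8_t *)0xF0000000u, "Hi\n", 3)`; on a host, a pointer to an ordinary `volatile uint8_t` variable can stand in for the port.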

The testbench determines whether the test passed or failed by checking the values in the registers of the Multi-Port Register File, which are written if any exception occurs. During normal operation, the processor exits `main` (returning to `_start`) and goes on to execute the function `sc_exit`, which ends the simulation by writing to a special address.

NULL Dereference

Theory

First, let's look at what the C standard says about the null pointer and dereferencing it. We'll use the C17 standard, or rather its draft N2310: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2310.pdf

Paragraph 3 of section “6.3.2.3 Pointers” defines a null pointer constant as an integer constant expression with the value 0, or such an expression cast to `void *`. The footnote adds that the header `stddef.h` defines the macro `NULL`, which expands to a null pointer constant.

6.3.2.3 Pointers

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.67) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

67) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.

On footnote 67: in the header file `stddef.h` (located in `riscv-gcc/lib/gcc/riscv64-unknown-elf/13.2.0/include/`), `NULL` is defined as `#define NULL ((void *)0)`, i.e. a pointer to data of any type at address 0.

Dereferencing a null pointer is undefined behavior, as follows from section “6.5.3.2 Address and indirection operators”, which covers the unary dereference operator `*`, and from its footnote:

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, …

From the compiler's point of view, undefined behavior means a situation that never occurs, which means the compiler can do anything, including deleting code, based on this assumption.

For us, this means that actually dereferencing `NULL` is not so easy. We need a way to force the compiler not to remove code that is incorrect and/or does nothing. The code is compiled with the `-O3` optimization flag, so the compiler will use everything it has.

Practice

Let's start with a simple dereference directly in `main`. To fight compiler optimizations, we will use the `volatile` type qualifier. `volatile` prevents the compiler from optimizing away reads and writes of such objects. From paragraph 8 of section “6.7.3 Type qualifiers”:

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3.

Let's try this naively, right in `main`. We include the necessary header files and change the example code to dereference `NULL`, adding `volatile` to both the pointer type and the variable type:

#include <stddef.h>   /* NULL */
#include <stdint.h>   /* uint32_t */

int main()
{
   volatile uint32_t a = *(volatile uint32_t*) NULL;
   return 0;
}

The assembly of `main` looks like this:

00480006 <main>:
 480006: 00002783            lw  a5,0(zero)
 48000a: 9002                  ebreak

The assembly contains a real memory read from address 0: the instruction `lw  a5,0(zero)`. It reads a 4-byte word from memory at address 0 and places it in register `a5`.

The instruction `ebreak` transfers control to the debugger; it is a software exception. In other words, the compiler saw that we were trying to dereference `NULL`, said “you can't do that”, and inserted a software exception. The fact that GCC places `__builtin_trap` after a `NULL` dereference is stated explicitly in the description of this optimization option and in the source code. On RISC-V, `__builtin_trap` is the `ebreak` instruction, which raises an exception.

On the timing diagrams, we will look at the read signals of the AXI4 bus from data memory, the program counter register `pc`, and the control and status registers `mepc` and `mcause`, which hold the address of the instruction where the exception occurred and information about the exception. They can be found in `scr1_top_tb_axi/i_top/i_core_top/i_pipe_top/i_pipe_csr`.

Timing diagrams of NULL dereferences with software exception.

The diagram shows an interesting feature: the read from address zero does not itself raise an exception. The value of `mepc` matches the address of the `ebreak` instruction in the listing of `main` (the value from the diagram must be shifted left by 1 bit). `mcause` holds the code 3, which means Breakpoint. So the `ebreak` instruction is executed, a software exception is raised, and it is handled. The hardware let us read from `NULL` without any problems.
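
When reading such waveforms it helps to have the exception codes at hand. The sketch below maps the synchronous exception codes from the RISC-V privileged specification to their names and applies the 1-bit left shift mentioned above to a `mepc` value read from the diagram. The function names `mcause_name` and `wave_mepc_to_addr` are hypothetical helpers, not part of scr1.

```c
#include <stdint.h>

/* Returns a human-readable name for a 32-bit mcause value. */
const char *mcause_name(uint32_t mcause)
{
    if (mcause & 0x80000000u) {
        return "Interrupt";  /* top bit set: not a synchronous exception */
    }
    switch (mcause) {
    case 0:  return "Instruction address misaligned";
    case 1:  return "Instruction access fault";
    case 2:  return "Illegal instruction";
    case 3:  return "Breakpoint";
    case 4:  return "Load address misaligned";
    case 5:  return "Load access fault";
    case 6:  return "Store/AMO address misaligned";
    case 7:  return "Store/AMO access fault";
    case 11: return "Environment call from M-mode";
    default: return "Other or reserved";
    }
}

/* Converts the mepc value as displayed on the diagram to a byte address. */
uint32_t wave_mepc_to_addr(uint32_t wave_value)
{
    return wave_value << 1;
}
```

For example, a displayed value of 0x240005 would correspond to the address 0x48000a, where `ebreak` sits in the listing of `main`.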

Let's try to persuade the compiler not to insert it. One option is the `-fno-delete-null-pointer-checks` compilation flag: with it, `NULL` dereferences are not removed during optimization and no exception calls are added.

We will do without this option. Let's write an example close to real code in bare-metal projects: a function that takes a `volatile` pointer to one of the microcontroller's memory-mapped devices. We “forget” to check for `NULL` and accidentally pass `NULL` to this function.

By the way, if the naive `NULL` dereference is placed inside a function and that function is called, it will not work. The compiler removes the call to the function because it contains UB, and since UB cannot happen, the function is never actually called.

void simple() {
   volatile uint32_t a = *(volatile uint32_t*) NULL;
}


void foo(volatile uint32_t* p) {
   volatile uint32_t a = *p;
}


int main()
{
   simple();
   foo(NULL);
   return 0;
}

In `main` and `foo`, the compiler could not prove that a `NULL` dereference occurs and compiled everything faithfully. `main` places the number 0 in register `a0` with the instruction `li  a0,0`; under the RISC-V calling convention, this register carries the argument to the called function. The instruction `jal 480006 <foo>` calls `foo`, which therefore receives a null pointer as its argument. `foo` reads from the address passed to it: `lw  a5,0(a0)`.

00480000 <simple>:
 480000: 00002783            lw  a5,0(zero) # 0 <CL_SIZE-0x20>
 480004: 9002                  ebreak


00480006 <foo>:
 480006: 411c                  lw  a5,0(a0)
 480008: 1141                  addi  sp,sp,-16
 48000a: c63e                  sw  a5,12(sp)
 48000c: 0141                  addi  sp,sp,16
 48000e: 8082                  ret


00480010 <main>:
 480010: 1141                  addi  sp,sp,-16
 480012: 4501                  li  a0,0
 480014: c606                  sw  ra,12(sp)
 480016: 3fc5                  jal 480006 <foo>
 480018: 40b2                  lw  ra,12(sp)
 48001a: 4501                  li  a0,0
 48001c: 0141                  addi  sp,sp,16
 48001e: 8082                  ret

The timing diagrams show a successful `NULL` dereference: a read from address 0, continued program execution without any exceptions or breakpoints, and successful completion of the simulation. The “forgotten” `NULL` check will most likely go undetected.

Timing diagrams of successful NULL dereference.
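
For comparison, here is a minimal sketch of what the “forgotten” check could look like in software. The name `read_reg_checked` and the boolean return convention are assumptions made for illustration; they are not taken from the scr1 test sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reads a 32-bit memory-mapped register through p.
 * Returns false without touching memory if either pointer is NULL,
 * otherwise stores the value in *out and returns true.
 */
bool read_reg_checked(volatile const uint32_t *p, uint32_t *out)
{
    if (p == NULL || out == NULL) {
        return false;
    }
    *out = *p;  /* single volatile read of the register */
    return true;
}
```

The caller then has to handle the `false` case explicitly, which is exactly the step that is easy to skip in practice.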

Load and store access fault

The RISC-V specification provides mechanisms for reporting memory access problems: the load access fault and store access fault exceptions. In the scr1 core, the LSU (load-store unit) module is responsible for memory accesses.

The LSU sends a load/store access fault error code to the EXU (Execution Unit) if the signal `dmem_resp_er` is set to 1.

-- src/core/pipeline/scr1_pipe_lsu.sv:212
always_comb begin
   case (1'b1)
       dmem_resp_er     : lsu2exu_exc_code_o = lsu_cmd_ff_load  ? SCR1_EXC_CODE_LD_ACCESS_FAULT
                                             : lsu_cmd_ff_store ? SCR1_EXC_CODE_ST_ACCESS_FAULT
                                                                : SCR1_EXC_CODE_INSTR_MISALIGN;

The signal `dmem_resp_er` is set to 1 when the memory responds to a request with an error code.

-- src/core/pipeline/scr1_pipe_lsu.sv:115
assign dmem_resp_er       = (dmem2lsu_resp_i == SCR1_MEM_RESP_RDY_ER);

The DMEM router distributes requests from the processor to memory and forwards the response signal from memory back. The router sends each request to one of 3 ports depending on the address in the request. It checks whether the address falls within the range of the TCM memory or of the memory-mapped timer. All addresses that do not pass these checks are sent to the AXI4 interface.

Let's make address 0x00000000 neither readable nor writable. To do this, we add one more port to the router, `SCR1_SEL_INVALID`, to which address 0 is routed and which returns an error. The full code with the changes can be found here: https://github.com/DuzaBF/scr1/pull/1/files

In the router source code, `src/top/scr1_dmem_router.sv`, let's add an enum value for the invalid port and a check for address 0 when selecting a port:

-- src/top/scr1_dmem_router.sv:89
always_comb begin
   port_sel    = SCR1_SEL_PORT0;
   if ((dmem_addr & SCR1_PORT1_ADDR_MASK) == SCR1_PORT1_ADDR_PATTERN) begin
       port_sel    = SCR1_SEL_PORT1;
   end else if ((dmem_addr & SCR1_PORT2_ADDR_MASK) == SCR1_PORT2_ADDR_PATTERN) begin
       port_sel    = SCR1_SEL_PORT2;
   end else if (dmem_addr == 32'b0) begin
       port_sel    = SCR1_SEL_INVALID;
   end
end
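
The same decode logic can be mirrored in C to experiment with addresses on a host machine. This is a sketch under assumptions: the mask and pattern values below are illustrative stand-ins for the TCM and timer ranges and should be checked against the actual scr1 configuration, and the names `port_sel_t` and `select_port` are hypothetical.

```c
#include <stdint.h>

typedef enum { SEL_PORT0, SEL_PORT1, SEL_PORT2, SEL_INVALID } port_sel_t;

/* Assumed example values, not read from the scr1 sources. */
#define PORT1_ADDR_MASK    0xFFFF0000u  /* TCM range (assumed) */
#define PORT1_ADDR_PATTERN 0x00480000u
#define PORT2_ADDR_MASK    0xFFFFFFE0u  /* timer range (assumed) */
#define PORT2_ADDR_PATTERN 0x00490000u

/* Mirrors the if / else-if chain of the router's port selection. */
port_sel_t select_port(uint32_t dmem_addr)
{
    if ((dmem_addr & PORT1_ADDR_MASK) == PORT1_ADDR_PATTERN) {
        return SEL_PORT1;
    } else if ((dmem_addr & PORT2_ADDR_MASK) == PORT2_ADDR_PATTERN) {
        return SEL_PORT2;
    } else if (dmem_addr == 0u) {
        return SEL_INVALID;
    }
    return SEL_PORT0;  /* default: everything else goes to AXI4 */
}
```

Note that only the exact address 0 is caught: `select_port(0x4)` still returns `SEL_PORT0`, so an access at a non-zero offset from a null pointer, such as a struct field, would pass through to AXI4.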

Although the port is invalid, it must still tell the processor that it has received the request. Otherwise the processor will hang, waiting forever for the read to complete. The invalid port always acknowledges receipt of the request.

-- src/top/scr1_dmem_router.sv:142
           SCR1_SEL_PORT2    : sel_req_ack   = port2_req_ack;
           SCR1_SEL_INVALID  : sel_req_ack   = 1'b1;

An attempt to read from or write to address 0, and therefore to the invalid port, results in an error. For clarity on the timing diagrams, the value returned by the read is set to 0xbadbadba:

-- src/top/scr1_dmem_router.sv:165
           default         : begin
               sel_rdata   = 32'hBADBADBA;
               sel_resp    = SCR1_MEM_RESP_RDY_ER;
           end

The processor receives the error from the router via the signal `sel_resp` (it is connected to `dmem_resp`), and this signal is updated only when the processor issues a memory access request, that is, when `dmem_req` is set to 1. This leads to unpleasant behavior: once an error has occurred on a memory access, the error signal stays asserted. The processor then assumes that every subsequent instruction causes a memory access exception. It jumps to the exception handler, the very first instruction of the handler raises an exception because of the stuck error, and the processor jumps to the handler again, to the same instruction. To prevent this, the signal has to be reset somehow, for example by making the router select port 0 by default until a request arrives:

-- src/top/scr1_dmem_router.sv:110
       case (fsm)
           SCR1_FSM_ADDR : begin
               if (dmem_req & sel_req_ack) begin
                   fsm         <= SCR1_FSM_DATA;
                   port_sel_r  <= port_sel;
               end else begin
                   port_sel_r  <= SCR1_SEL_PORT0;
               end
           end
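
The stuck error and its fix can be reproduced with a heavily simplified C model of the router state. The model is an assumption-laden sketch, not a translation of the RTL: it has only two ports, treats `sel_req_ack` as always 1, and assumes port 0 answers OK only during the data phase. All names (`router_t`, `router_resp`, `router_clock`) are hypothetical.

```c
#include <stdbool.h>

typedef enum { SEL_PORT0, SEL_INVALID } port_sel_t;
typedef enum { RESP_NOTRDY, RESP_RDY_OK, RESP_RDY_ER } resp_t;
typedef enum { FSM_ADDR, FSM_DATA } fsm_t;

typedef struct {
    fsm_t      fsm;
    port_sel_t port_sel_r;           /* registered port selection */
    bool       reset_sel_when_idle;  /* true = apply the fix */
} router_t;

/* Combinational response seen by the core, derived from port_sel_r. */
resp_t router_resp(const router_t *r)
{
    if (r->port_sel_r == SEL_INVALID) {
        return RESP_RDY_ER;          /* the invalid port always reports an error */
    }
    /* Assumption: port 0 is ready only while a transfer is in its data phase. */
    return (r->fsm == FSM_DATA) ? RESP_RDY_OK : RESP_NOTRDY;
}

/* Advances the model by one clock edge. */
void router_clock(router_t *r, bool dmem_req, port_sel_t port_sel)
{
    resp_t resp = router_resp(r);    /* sample before the registers update */

    if (r->fsm == FSM_ADDR) {
        if (dmem_req) {
            r->fsm = FSM_DATA;
            r->port_sel_r = port_sel;
        } else if (r->reset_sel_when_idle) {
            r->port_sel_r = SEL_PORT0;   /* the fix: fall back to port 0 */
        }
    } else if (resp != RESP_NOTRDY) {
        r->fsm = FSM_ADDR;           /* data phase finished */
    }
}
```

Without the fix, `port_sel_r` keeps pointing at the invalid port after the access completes, so `router_resp` returns the error on every idle cycle; with the fix, the first idle clock in the address state clears it.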

These changes are also enough to cause a Store Access Fault on an attempt to write to address 0.

With these modifications to the router, the simulation ends with a failed test, and the timing diagrams show that reading from `NULL` raises a load access fault exception.

Timing diagrams of load access fault.

Great! Now our core will not silently read through a `NULL` pointer; the program will instead crash with a Load Access Fault exception. By looking at the registers `mepc` and `mcause`, the programmer will see that a `NULL` dereference occurred and where.
