Combinational logic in SystemVerilog

Preface

Hi all!

This article covers hardware design in SystemVerilog from the perspective of someone who is only beginning to study it in depth. It is meant to help other beginners find their way in an unfamiliar environment, so some topics are treated superficially and in simplified form. One of my university lab assignments serves as the example.

What are hardware description languages in general? In short, they were originally used to describe integrated circuits. They let you state in text which components a system should contain and how they should be connected to each other. These descriptions were then handed over to production, where the described integrated circuit (ASIC) was manufactured on a piece of silicon. Over time, FPGAs appeared: devices whose structure can be reconfigured at the hardware level again and again. You can read more about them, for example, right here.

Synthesis tools have been created for hardware description languages. They automatically convert a description into a bitstream that is loaded into an FPGA and gives it the required structure.

SystemVerilog is a hardware description language. Code written in it usually goes through the following stages:

  1. Run in simulation for debugging.

  2. Synthesis into an electrical circuit.

  3. Placement and routing of this circuit on the FPGA selected in the project.

  4. Bitstream generation.

  5. Connecting to the FPGA and loading this stream into it.

This article covers steps 1 and 2. I will probably write another article about the rest, with the emphasis on running the circuit developed here on real hardware. Now let's move on to the task.

The essence of the problem

The text of the task reads as follows: “Design, verify and implement a combinational implementation of an application function accelerator on an FPGA.”

A combinational circuit is a circuit whose outputs depend only on its current inputs. It has no internal state or memory: whenever the inputs change, the output follows after the propagation delay of the logic, without waiting for a clock edge. In a clocked system this means the result is available within a single clock cycle.
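To make the difference between "no memory" and "memory" concrete, here is a minimal sketch that puts a combinational output next to a registered one. The module and signal names are hypothetical and are not part of the lab assignment.

```systemverilog
// Illustrative sketch: module and signal names are hypothetical.
module comb_vs_reg_sketch(
    input  logic clk,
    input  logic a,
    input  logic b,
    output logic y_comb, // follows the inputs after a propagation delay
    output logic y_reg   // holds the value sampled at the last clock edge
);
    // Combinational: no clock and no memory, the output is a pure function of a and b.
    assign y_comb = a & b;

    // Sequential (shown only for contrast): a flip-flop stores the value between clock edges.
    always_ff @(posedge clk) begin
        y_reg <= a & b;
    end
endmodule
```

Everything built in this article is of the first kind: there is no clock input anywhere in the adder.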

The function I had to implement was an adder of two 32-bit IEEE-754 floating-point numbers.

Drawing up a diagram

When describing hardware (and in programming in general, too), it is very important to first think through the structure of the system, clearly imagine what exactly it will do and what steps should be performed, and only after that start writing code, so as not to redo everything on the fly.

According to the IEEE-754 standard, a 32-bit real number looks like this:

IEEE-754 32-bit float

The first bit holds the sign of the number, the next 8 hold the exponent (the power of two by which the mantissa is multiplied, stored with 127 added so that negative powers can be represented), and the remaining 23 hold the mantissa.

A real number from this representation is obtained as follows:

  1. A leading 1 and a binary point are prepended to the mantissa on the left; all 23 of its bits form the fractional part.

  2. The resulting number is multiplied by 2^(exponent - 127).

  3. The result is multiplied by (-1)^(sign bit): 1 means a negative number, 0 a positive one.

For example, consider the number in the picture above:

  1. Prepend 1 to the mantissa: 1.11010100000000000000000. In decimal this equals 1 * 2^0 + 1 * 2^-1 + 1 * 2^-2 + 1 * 2^-4 + 1 * 2^-6 = 1 + 1/2 + 1/4 + 1/16 + 1/64 = 1 + 53/64 = 1.828125.

  2. The exponent bits contain the number 134; subtract 127 to get 7.

  3. The sign bit is 0, so the number is positive: 1.828125 * 2^7 = 234.0.
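The same decoding can be checked in simulation. The sketch below is only an illustration: the module name is made up, and the bit pattern is the one I reconstructed from the worked example above (sign 0, exponent 134, mantissa bits 1101010 followed by zeros).

```systemverilog
module ieee754_decode_sketch;
    // 234.0 from the worked example: sign 0, exponent 134, fraction 1101010 followed by zeros
    localparam logic [31:0] BITS = 32'b0_10000110_1101010_0000_0000_0000_0000;

    logic        sign;
    logic [7:0]  exponent;
    logic [22:0] fraction;
    real         value;

    initial begin
        sign     = BITS[31];
        exponent = BITS[30:23];
        fraction = BITS[22:0];

        // (1 + fraction / 2^23) * 2^(exponent - 127), negated if the sign bit is set
        value = (1.0 + real'(fraction) / 2.0**23) * 2.0**(int'(exponent) - 127);
        if (sign) value = -value;

        $display("sign = %b, exponent = %0d, value = %f", sign, exponent, value);
        assert (value == 234.0) else $error("decoded value is not 234.0");
    end
endmodule
```

The cast int'(exponent) matters: without it the subtraction would be done on unsigned values and exponents below 127 would wrap around.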

Obviously, to add such numbers it is not enough to simply sum the bits, as we do when performing operations on integers. For operations on real numbers there is the following algorithm:

Adding numbers with the same sign.

Operands

  1. A leading 1 is prepended to each mantissa, so the mantissas are now 24 bits long.

Operand mantissas with the leading 1 prepended

  2. The exponents of the two numbers are compared. The mantissa of the number with the smaller exponent is shifted right by the difference between the exponents.

Calculating the exponents

Shifting the mantissa of the number with the smaller exponent

  3. The mantissas are added like ordinary unsigned integers (it is very important to remember that the sum can overflow by 1 bit, and that bit must never be lost).

Addition of mantissas

  4. The larger of the operands' exponents (the one whose mantissa was not shifted) is taken as the exponent of the result. In our case it is 134.

  5. Normalization of the number (needed so that we do not lose precision over time). The mantissa is shifted so that its leading 1 lands in the 24th bit, exactly where we prepended the 1 to the operand mantissas in step 1. As the mantissa is shifted, the exponent is incremented or decremented depending on the direction.

    • If the resulting mantissa has a 1 in the 25th bit (a carry out of the addition), the mantissa is shifted right by 1 and the exponent of the result is increased by 1.

    • If the resulting mantissa starts with 0, it is shifted left until the 24th bit equals 1. Each shift decrements the exponent of the result by 1.

Normalization of the sum of mantissas

  6. The sign of the result is taken from either operand; the exponent was calculated in step 5; the 23 least significant bits are taken from the mantissa obtained there. The answer is assembled from these parts.

Addition result
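As a sanity check of the same-sign procedure, here is a small simulation-only sketch that walks through it on concrete numbers. The operands (234.0 and 3.0) and all names are my own choice for illustration; the operands shown in the pictures may differ.

```systemverilog
module same_sign_add_sketch;
    // Hypothetical operands, both positive
    localparam logic [31:0] A = 32'h436A0000; // 234.0
    localparam logic [31:0] B = 32'h40400000; // 3.0

    logic [7:0]  exp_a, exp_b, exp_r;
    logic [24:0] man_a, man_b, man_sum; // 25 bits so that the carry is not lost
    logic [31:0] result;

    initial begin
        exp_a = A[30:23];
        exp_b = B[30:23];

        // Prepend the implicit leading 1 (plus a spare bit for the carry)
        man_a = {2'b01, A[22:0]};
        man_b = {2'b01, B[22:0]};

        // Shift the mantissa with the smaller exponent, keep the larger exponent
        if (exp_a >= exp_b) begin
            man_b = man_b >> (exp_a - exp_b);
            exp_r = exp_a;
        end else begin
            man_a = man_a >> (exp_b - exp_a);
            exp_r = exp_b;
        end

        // Add as unsigned integers
        man_sum = man_a + man_b;

        // Normalize: for same-sign operands only the carry case can occur
        if (man_sum[24]) begin
            man_sum = man_sum >> 1;
            exp_r   = exp_r + 1;
        end

        // Assemble sign, exponent and the 23 low bits of the mantissa
        result = {A[31], exp_r, man_sum[22:0]};
        $display("result = %h", result);
        assert (result == 32'h436D0000) else $error("expected the bit pattern of 237.0");
    end
endmodule
```

Here the exponents are 134 and 128, so the mantissa of 3.0 is shifted right by 6 before the addition, and no carry occurs.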

Addition of numbers with different signs.

This works the same way as for identical signs, except for step 3: the smaller mantissa is always subtracted from the larger one. If the mantissas had to be swapped for the subtraction to be correct, the sign of the result, initially equal to the sign of the first number, is flipped as well.

Scheme

If we go through the algorithms and see what we need, we can depict our adder. It will look like this:

Example circuit of a floating point adder

Writing code

Well, the scheme has been described, it’s time to implement it in SystemVerilog. I will be using Vivado 2019.1 as my environment.

First, we create a project for the required hardware (or take a template, as is the case with my task).

Next we will work with two main types of files: Design Source and Simulation Source. The first will describe all the logic, and the second will be used to test the first. Let's look at the diagram made earlier and get started.

For a better understanding, I will give here two options on how this can be done.

First implementation option

I jokingly call it “low-level Verilog” because in it we literally say which wire goes where, manually recreating the circuit described above. This option follows the diagram as closely as possible.

It may seem very complicated, but once you understand it, the second implementation will be clearer. With this approach we start from the smallest parts. In our case we can start with a half adder: a combinational circuit that takes one-bit arguments A and B as input and outputs a sum bit and a carry bit.

Create a new Design Source.

Create project files button

Selecting the type of file to be created (Design Source)

Creating a new file

Specifying the language, file name, and location

The half adder code looks like this:

`timescale 1ns / 1ps

module half_adder(
    input in_a,
    input in_b,
    output out_sum,
    output out_carry
);

xor(out_sum, in_a, in_b);
and(out_carry, in_a, in_b); 

endmodule

Here we have defined the half_adder module, saying that it has one-bit inputs in_a and in_b and one-bit outputs out_sum and out_carry.

Next we write that in_a xor in_b will go to out_sum, and in_a and in_b will go to out_carry. This is how we achieve the necessary truth table:

| A | B | Carry | Sum |
|---|---|-------|-----|
| 0 | 0 | 0     | 0   |
| 0 | 1 | 0     | 1   |
| 1 | 0 | 0     | 1   |
| 1 | 1 | 1     | 0   |

After the module is ready, we will write a test for it (here I will show an example of how this is done, and in the following modules I will omit this part).

Create Simulation Source half_adder_tb:

Selecting the type of file to be created (Simulation Source)

Create a new file, specifying the language, name and location

`timescale 1ns / 1ps

module half_adder_tb;

    reg in_a, in_b;
    wire sum, carry;

    half_adder adder_l(
        .in_a(in_a),
        .in_b(in_b),
        .out_sum(sum),
        .out_carry(carry)
    );   

    integer i;
    reg [1:0] test_val;

    initial begin
      
        for (i = 0; i < 4; i = i + 1) begin
            test_val = i;
            in_a = test_val[1]; // high bit of the counter
            in_b = test_val[0]; // low bit of the counter

            #10 $display("in_a = %b, in_b = %b, out_carry = %b, out_sum = %b", in_a, in_b, carry, sum);
        end 

        #10 $stop;

    end

endmodule

In this test, we will simply walk through the truth table and see what our module tells us.

In the initial block we use # to advance the simulation time by the amount we need, $display is needed to print text to the console, and $stop is needed to stop the simulation.
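Reading the truth table from the console works for four rows, but a test can also check itself. Below is a minimal sketch of that idea; it carries its own copy of the half adder under a hypothetical name (half_adder_sketch) so that it compiles on its own, and the testbench name is made up as well.

```systemverilog
`timescale 1ns / 1ps

// Local copy of the half adder so the sketch is self-contained (hypothetical name).
module half_adder_sketch(input in_a, input in_b, output out_sum, output out_carry);
    xor(out_sum, in_a, in_b);
    and(out_carry, in_a, in_b);
endmodule

module half_adder_selfcheck_tb;
    logic in_a, in_b;
    wire  sum, carry;

    half_adder_sketch dut(.in_a(in_a), .in_b(in_b), .out_sum(sum), .out_carry(carry));

    initial begin
        for (int i = 0; i < 4; i++) begin
            {in_a, in_b} = i[1:0]; // walk through all four input combinations
            #10;
            // {carry, sum} must equal the 2-bit arithmetic sum of the inputs
            assert ({carry, sum} == {1'b0, in_a} + {1'b0, in_b})
                else $error("mismatch for in_a=%b in_b=%b", in_a, in_b);
        end
        $display("all four input combinations checked");
        $finish;
    end
endmodule
```

With this style a failing combination is reported by the simulator itself, which scales better once modules have more than a handful of inputs.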

In the file panel, find the just created test and click “Set as top”. After this, the simulation will run specifically for this file.

IMPORTANT POINT! This action must be done every time we want to change the file being executed in the simulation.

Setting the top-level Simulation Source in the project

Let's start the simulation:

Running a simulation in Vivado

Running simulation. The result is displayed in Tcl Console

We look at the Tcl Console at the bottom of the screen and see that the correct truth table has been printed there. Great, everything works as it should. Now we close the simulation and move on to the full adder.

Closing the simulation

We create a new Design Source, choosing SystemVerilog as the language and full_adder as the name, and write the following in it:

`timescale 1ns / 1ps

module full_adder(
    input in_a,
    input in_b,
    input in_carry,
    output out_sum,
    output out_carry
);

wire op_sum_result, carry_0, carry_1;

half_adder op_sum(
    .in_a(in_a),
    .in_b(in_b),
    .out_sum(op_sum_result),
    .out_carry(carry_0)
);

half_adder carry_sum(
    .in_a(op_sum_result),
    .in_b(in_carry),
    .out_sum(out_sum),
    .out_carry(carry_1)
);

or(out_carry, carry_0, carry_1);

endmodule

There is a little more code here. This module uses instances of the half adder described above. A new keyword, wire, has also appeared. A wire is exactly what it sounds like: in the circuit that this code describes, it connects the specified outputs and inputs. Ports that are not given an explicit type, such as the inputs and outputs above, default to wire.

Testing our adder looks similar to the previous example, so I will omit it here. More details can be found on GitHub; I will leave the link below.

Have you forgotten why we are here yet? That's right, we are building a floating-point adder. Recall what it consists of: we need to add mantissas with the leading 1 prepended, that is, 24-bit numbers. Let's create a module for this.

`timescale 1ns / 1ps

module mantissa_adder(
    input [23:0] in_a,
    input [23:0] in_b,
    output [23:0] out,
    output out_carry
);

wire [24:0] carry;
assign carry[0] = 0;

generate
    for (genvar i = 0; i < 24; i = i + 1) begin
        full_adder adder(
            .in_a(in_a[i]),
            .in_b(in_b[i]),
            .in_carry(carry[i]),
            .out_sum(out[i]),
            .out_carry(carry[i + 1])
        );
    end
endgenerate

assign out_carry = carry[24];

endmodule

Everything here is extremely simple. The module has two inputs, the 24-bit buses in_a and in_b, and drives a 24-bit output out and a carry bit.

IMPORTANT POINT! Arrays in SystemVerilog are declared in a rather unusual way. A declaration of the form [<highest index>:<lowest index>] creates an array that can be indexed with any value from the lowest to the highest, INCLUSIVE.
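A short simulation-only sketch of how such a declaration behaves; the module and signal names are hypothetical.

```systemverilog
module vector_index_sketch;
    logic [7:0] bus; // 8 bits, indices 7 (most significant) down to 0

    initial begin
        bus = 8'b1010_0001;
        assert (bus[7] == 1'b1);      // highest index, leftmost bit
        assert (bus[0] == 1'b1);      // lowest index, rightmost bit
        assert (bus[7:4] == 4'b1010); // part-select, both bounds inclusive
        assert ($bits(bus) == 8);     // [7:0] means eight bits, not seven
        $display("bus[7:4] = %b, bus[3:0] = %b", bus[7:4], bus[3:0]);
    end
endmodule
```

This is why the 24-bit mantissa buses are declared as [23:0] and the carry chain, which needs one extra element, as [24:0].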

The code above also contains the unfamiliar keywords assign and generate. Let's figure out what they do.

  • The assign statement lets us connect a wire to something. For example, the line assign out_carry = carry[24] says that the out_carry wire takes the value of carry[24], and the line assign carry[0] = 0 says that the wire with index 0 on the carry bus is tied to 0.

  • The generate block is needed to create several similar module instances automatically. Here I use it to avoid instantiating 24 full adders by hand.

Now we can add 24-bit numbers. The exponent adder looks similar (it will be needed for the comparison). The only difference is that it is 8 bits wide instead of 24.

Here it is:

`timescale 1ns / 1ps

module exponent_adder(
    input [7:0] in_a,
    input [7:0] in_b,
    output [7:0] out
);

wire [8:0] carry;
assign carry[0] = 0;

generate
    for (genvar i = 0; i < 8; i = i + 1) begin
        full_adder adder(
            .in_a(in_a[i]),
            .in_b(in_b[i]),
            .in_carry(carry[i]),
            .out_sum(out[i]),
            .out_carry(carry[i + 1])
        );
    end
endgenerate

endmodule

There is no carry output here (the bit carried out into the 9th position), since we do not need it anywhere.

Let's take a look at the diagram. We can now add mantissas and exponents. What else do we need? An exponent comparison module: a circuit that takes two 8-bit exponents as input and tells us at the output which one is larger and by how much. To implement it we need subtraction. Subtraction can be done with an adder, but for it to be subtraction rather than addition, the second operand must first be converted to two's complement. You can read more about what that is on Wikipedia.
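In short, the two's complement of an 8-bit number is obtained by inverting all of its bits and adding 1, and adding that value gives the same 8-bit result as subtracting the original number. A minimal sketch with hypothetical names and example values:

```systemverilog
module twos_complement_sketch;
    logic [7:0] a, b, neg_b, diff;

    initial begin
        a = 8'd134;
        b = 8'd128;
        neg_b = ~b + 8'd1; // invert all bits, then add one
        diff  = a + neg_b; // same 8-bit result as a - b
        assert (diff == 8'd6);
        assert (diff == a - b);
        $display("%0d - %0d = %0d", a, b, diff);
    end
endmodule
```

The carry out of the top bit is simply discarded, which is exactly what the adder without a carry output does.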

Let's write our eight-bit inverter, which takes a number as input and outputs the same number with the opposite sign (converted to two's complement).

`timescale 1ns / 1ps

module exponent_inverter(
    input [7:0] in,
    output [7:0] out
);

wire [7:0] neg;

generate
    for (genvar i = 0; i < 8; i = i + 1) begin
        not(neg[i], in[i]);
    end
endgenerate

exponent_adder adder(
    .in_a(neg),
    .in_b(8'b0000_0001),
    .out(out)
);

endmodule

Nothing new here: the already familiar wire and generate, plus an instance of the adder that adds 1 to the inverted bits of the number to obtain its two's complement.

Let's write a similar thing for the mantissa, so that later we can do subtraction:

`timescale 1ns / 1ps

module mantissa_inverter(
    input [23:0] in,
    output [23:0] out
);

wire [23:0] neg;

generate
    for (genvar i = 0; i < 24; i = i + 1) begin
        not(neg[i], in[i]);
    end
endgenerate

mantissa_adder adder(
    .in_a(neg),
    .in_b(24'b0000_0000_0000_0000_0000_0001),
    .out(out)
);

endmodule

Here is an example of how SystemVerilog specifies numeric values. You can, of course, just write decimal numbers, but other number systems are available too. In this notation the number of bits comes first, then the apostrophe “'”, then the base (b for binary, h for hexadecimal, and so on), and then the value itself. You can use “_” inside the number to make the code more readable.
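A few literals written in different bases, as a small sketch (the module name is hypothetical). All assertions compare different spellings of the same value.

```systemverilog
module literal_sketch;
    initial begin
        assert (8'b0000_0001 == 8'd1);   // binary and decimal
        assert (8'hFF == 8'd255);        // hexadecimal and decimal
        assert (8'b1111_1111 == 8'hFF);  // underscores do not change the value
        assert (24'h00_0001 == 24'b0000_0000_0000_0000_0000_0001);
        // The same constant printed as binary, hexadecimal and decimal
        $display("%b %h %0d", 8'b1010_0001, 8'b1010_0001, 8'b1010_0001);
    end
endmodule
```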

With all our modules in hand, we can write something more complex: a new module that will compare orders and return the difference.

`timescale 1ns / 1ps

module exponent_aligner(
    input [7:0] in_a,
    input [7:0] in_b,
    output out_a_or_b,
    output [7:0] out_dist
);

wire [7:0] inverted_b;
wire [7:0] comparison_result, inverted_comparison_result;

exponent_inverter b_inverter(
    .in(in_b),
    .out(inverted_b)
);

exponent_adder final_adder(
    .in_a(in_a),
    .in_b(inverted_b),
    .out(comparison_result)
);

exponent_inverter final_inverter(
    .in(comparison_result),
    .out(inverted_comparison_result)
);

assign out_a_or_b = (comparison_result[7] == 0) ? 1 : 0; // 1 if a >= b, 0 if a < b
assign out_dist = (comparison_result[7] == 0) ? comparison_result : inverted_comparison_result;

endmodule

It already looks more impressive. It's worth paying attention to how wires are used to transfer data from the output of one module to the input of another, literally connecting them on an electrical circuit.

Here's what happens:

  1. The inputs in_a and in_b receive the 8-bit exponents.

  2. Next, the value of b is negated in the b_inverter module.

  3. final_adder adds a and the negated b (essentially subtracting b from a).

  4. final_inverter returns the difference obtained in step 3 (comparison_result) with its sign flipped.

  5. assign out_a_or_b = (comparison_result[7] == 0) ? 1 : 0 looks at the sign bit of comparison_result. It tells us which of the input numbers a and b is larger.

  6. Depending on the sign of the result, we send either the result or its negated version to the out_dist output, so that the value is always the absolute value of the difference of the inputs (we want the number of shifts, and it cannot be negative).

With this module we can tell exactly which mantissa needs to be shifted (from the output out_a_or_b) and by how many bits (from the output out_dist).
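One property of the sign-bit test is worth seeing in numbers. The sketch below is my own behavioural stand-in for the check (the function name is hypothetical), compared against a direct >= comparison. As far as I can tell, the two agree only while the exponents are less than 128 apart, because the 8-bit difference wraps around beyond that.

```systemverilog
module exponent_compare_sketch;
    // Behavioural stand-in for the sign-bit test (hypothetical helper).
    function automatic logic sign_bit_says_a_ge_b(input logic [7:0] a, input logic [7:0] b);
        logic [7:0] diff;
        diff = a + (~b + 8'd1); // a - b in two's complement, truncated to 8 bits
        return diff[7] == 1'b0;
    endfunction

    initial begin
        // Close exponents: the sign bit agrees with a direct comparison
        assert (sign_bit_says_a_ge_b(8'd134, 8'd128) == (8'd134 >= 8'd128));
        assert (sign_bit_says_a_ge_b(8'd128, 8'd134) == (8'd128 >= 8'd134));
        // Exponents 128 or more apart: the 8-bit difference wraps around
        assert (sign_bit_says_a_ge_b(8'd200, 8'd10) != (8'd200 >= 8'd10));
        $display("sign-bit comparison agrees with >= only for close exponents");
    end
endmodule
```

For the test values used in this article the exponents are close, so the limitation does not show up, but it is something to keep in mind when feeding the adder very large and very small numbers together.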

What else do we need? A normalization module that takes a 25-bit number (the result of adding the mantissas) and outputs the same number shifted so that its leading 1 lands in the 24th bit. The module should also return the number of shifts performed so that we can adjust the exponent of the result.

`timescale 1ns / 1ps

module mantissa_normalizer(
    input [23:0] in,
    input in_carry,
    output [22:0] out,
    output [7:0] exponent_shift
);

// Priority chain: the first set bit (from the top) decides the shift amount.
// Bit k needs a left shift of 23 - k to bring the leading 1 to bit 23.
assign out[22:0] = in_carry ? in[23:0] >> 1 :
    in[23] ? in[22:0] :
    in[22] ? in[22:0] << 1 :
    in[21] ? in[22:0] << 2 :
    in[20] ? in[22:0] << 3 :
    in[19] ? in[22:0] << 4 :
    in[18] ? in[22:0] << 5 :
    in[17] ? in[22:0] << 6 :
    in[16] ? in[22:0] << 7 :
    in[15] ? in[22:0] << 8 :
    in[14] ? in[22:0] << 9 :
    in[13] ? in[22:0] << 10 :
    in[12] ? in[22:0] << 11 :
    in[11] ? in[22:0] << 12 :
    in[10] ? in[22:0] << 13 :
    in[9] ? in[22:0] << 14 :
    in[8] ? in[22:0] << 15 :
    in[7] ? in[22:0] << 16 :
    in[6] ? in[22:0] << 17 :
    in[5] ? in[22:0] << 18 :
    in[4] ? in[22:0] << 19 :
    in[3] ? in[22:0] << 20 :
    in[2] ? in[22:0] << 21 :
    in[1] ? in[22:0] << 22 :
    in[0] ? in[22:0] << 23 :
    0;

// Same priority chain for the exponent correction; -1 marks the carry case.
assign exponent_shift = in_carry ? -1 :
    in[23] ? 0 :
    in[22] ? 1 :
    in[21] ? 2 :
    in[20] ? 3 :
    in[19] ? 4 :
    in[18] ? 5 :
    in[17] ? 6 :
    in[16] ? 7 :
    in[15] ? 8 :
    in[14] ? 9 :
    in[13] ? 10 :
    in[12] ? 11 :
    in[11] ? 12 :
    in[10] ? 13 :
    in[9] ? 14 :
    in[8] ? 15 :
    in[7] ? 16 :
    in[6] ? 17 :
    in[5] ? 18 :
    in[4] ? 19 :
    in[3] ? 20 :
    in[2] ? 21 :
    in[1] ? 22 :
    in[0] ? 23 :
    0;

endmodule

From a code point of view, this is perhaps the ugliest part so far. I deliberately did not use loops here, to emphasize that this option is implemented at a “low level”. The circuit takes a number and returns a result that depends on the position of its leading 1. The mantissa, shifted and trimmed to 23 bits after normalization, goes to the out output, and the amount by which the exponent must be changed goes to exponent_shift.

And finally we have everything we need: adders, inverters, the exponent comparator and the mantissa normalization module. All that remains is to assemble them into a real adder according to our diagram. It will look something like this:

`timescale 1ns / 1ps

module float_adder(
    input [31:0] in_a,
    input [31:0] in_b,
    output [31:0] out
);

wire a_or_b;
wire [7:0] exp_shift_dist, estimated_result_exponent; 
wire [23:0] mantissa_a, mantissa_b, aligned_mantissa_a, aligned_mantissa_b;
wire [23:0] inv_aligned_mantissa_a, inv_aligned_mantissa_b, aligned_mantissa_diff;
wire [23:0] aligned_mantissa_sub_ab, aligned_mantissa_sub_ba;
wire [23:0] aligned_mantissa_sum, aligned_mantissa_sub;
wire aligned_mantissa_sub_carry_ab, aligned_mantissa_sub_carry_ba;
wire aligned_mantissa_sum_carry, aligned_mantissa_sub_carry, invert_result_sign;
wire [7:0] sum_exp_shift, sub_exp_shift, sum_exponent, sub_exponent;
wire [22:0] normalized_sum, normalized_sub;

// 1. a_or_b, dist = exponent_aligner(E1, E2)
exponent_aligner exp_aligner(
    .in_a(in_a[30:23]),
    .in_b(in_b[30:23]),
    .out_a_or_b(a_or_b),
    .out_dist(exp_shift_dist)
);

// 2. prepend 1 to both mantissas
assign mantissa_a[23] = 1, mantissa_b[23] = 1;
assign mantissa_a[22:0] = in_a[22:0];
assign mantissa_b[22:0] = in_b[22:0];

// 3. align mantissas for the exponents to match
assign estimated_result_exponent = a_or_b ? in_a[30:23] : in_b[30:23]; // largest of the two
assign aligned_mantissa_a = a_or_b ? mantissa_a : mantissa_a >> exp_shift_dist; // exp_a >= exp_b: keep mantissa_a, otherwise shift it
assign aligned_mantissa_b = a_or_b ? mantissa_b >> exp_shift_dist : mantissa_b; // exp_a >= exp_b: shift mantissa_b, otherwise keep it

// 4.1 add mantissas
mantissa_adder aligned_adder(
    .in_a(aligned_mantissa_a),
    .in_b(aligned_mantissa_b),
    .out(aligned_mantissa_sum),
    .out_carry(aligned_mantissa_sum_carry)
);

// 4.2 subtract mantissas and set result sign
mantissa_adder aligned_comparator(
    .in_a(aligned_mantissa_a),
    .in_b(inv_aligned_mantissa_b),
    .out(aligned_mantissa_diff)
);

mantissa_inverter aligned_inverter_a(
    .in(aligned_mantissa_a),
    .out(inv_aligned_mantissa_a)
);

mantissa_inverter aligned_inverter_b(
    .in(aligned_mantissa_b),
    .out(inv_aligned_mantissa_b)
);

mantissa_adder aligned_subtractor_a_b(
    .in_a(aligned_mantissa_a),
    .in_b(inv_aligned_mantissa_b),
    .out(aligned_mantissa_sub_ab),
    .out_carry(aligned_mantissa_sub_carry_ab)
);

mantissa_adder aligned_subtractor_b_a(
    .in_a(aligned_mantissa_b),
    .in_b(inv_aligned_mantissa_a),
    .out(aligned_mantissa_sub_ba),
    .out_carry(aligned_mantissa_sub_carry_ba)
);

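// a + (-b) carries out of bit 23 exactly when a >= b (for b != 0), so the carry,
// not bit 23 of the difference, tells us whether the mantissas must be swapped
assign invert_result_sign = (aligned_mantissa_b != 0) && !aligned_mantissa_sub_carry_ab;
assign aligned_mantissa_sub = invert_result_sign ? aligned_mantissa_sub_ba : aligned_mantissa_sub_ab;
// larger minus smaller never overflows into a 25th bit, so the normalizer must not see the adder's carry
assign aligned_mantissa_sub_carry = 1'b0;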

// 5. normalize result
mantissa_normalizer sum_normalizer(
    .in(aligned_mantissa_sum),
    .in_carry(aligned_mantissa_sum_carry),
    .out(normalized_sum),
    .exponent_shift(sum_exp_shift) // number of shifts to normalize
);

mantissa_normalizer sub_normalizer(
    .in(aligned_mantissa_sub),
    .in_carry(aligned_mantissa_sub_carry),
    .out(normalized_sub),
    .exponent_shift(sub_exp_shift) // number of shifts to normalize
);

// 6. shift exponent according to the normalization result
// exponent_shift is an 8-bit bus, so the normalizer's -1 arrives here as 8'hFF
assign sum_exponent = (sum_exp_shift == 8'hFF) ? estimated_result_exponent + 1 : estimated_result_exponent - sum_exp_shift;
assign sub_exponent = (sub_exp_shift == 8'hFF) ? estimated_result_exponent + 1 : estimated_result_exponent - sub_exp_shift;

// 7. select sign mode (S1 = S2 (add) or S1 != S2 (subtract)). check a and b for 0
assign out[30:23] = (in_a == 0) ? in_b[30:23] : (in_b == 0) ? in_a[30:23] : (in_a[31] == in_b[31]) ? sum_exponent : sub_exponent;
assign out[22:0] = (in_a == 0) ? in_b[22:0] : (in_b == 0) ? in_a[22:0] : (in_a[31] == in_b[31]) ? normalized_sum : normalized_sub;
assign out[31] = (in_a[31] == in_b[31]) ? in_a[31] : invert_result_sign ? ~in_a[31] : in_a[31];

endmodule

And to understand this, let's go in order:

  1. The module receives 32-bit numbers in_a and in_b as input.

  2. exp_aligner takes the exponents of A and B, located in bits 30 down to 23 inclusive. Its results appear on the wires a_or_b and exp_shift_dist.

  3. Using assign, we drive the mantissa_a and mantissa_b buses with the mantissas of A and B, with a 1 placed in bit 23.

  4. estimated_result_exponent receives the larger of the two exponents. The wire a_or_b tells us which one to take.

  5. Depending on the value of a_or_b, we shift the mantissa of A or B by the required number of bits. The shifted mantissas go onto the aligned_mantissa_a and aligned_mantissa_b buses.

  6. We pass the shifted mantissas to the aligned_adder adder, the aligned_inverter_a and aligned_inverter_b inverters, and the aligned_subtractor_a_b and aligned_subtractor_b_a subtractors.

  7. Once the subtraction shows which of the aligned mantissas is larger, we know whether the mantissas have to be swapped and the sign of the result inverted. Depending on this, we take the output of one of the two subtractors as the resulting mantissa.

  8. We pass the resulting sum and difference of the mantissas through a normalizer, which also tells us by how much the exponent of the result must be adjusted.

  9. We adjust the exponent of the sum and of the difference according to the data obtained in the previous step.

  10. We assemble the result from the sign, exponent and mantissa. To choose the right exponent and mantissa (we performed addition and subtraction in parallel, so it remains to decide which one we need), we check the signs of the operands in_a and in_b.

Done. Our module implements the addition algorithm described earlier. Now let's write a simple test for it and see how it behaves in practice.

`timescale 1ns / 1ps

module float_adder_tb;

    wire [31:0] sum; // driven by the adder's output port, so a net is the safe choice

    float_adder adder(
        .in_a(32'b0_10000011_10110010111000010100100), // 27.18
        .in_b(32'b0_10000000_10000000000000000000000), // 3
        .out(sum)
    );

    initial begin
        #10 $display("sum = %b", sum);
        #10 $stop;
    end

endmodule

We launch and look at the console and at the states of the variables:

Testing the floating-point adder (version 1)

Converting to a human-readable format

The answer is correct, everything worked out!
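The bit patterns for such tests do not have to be converted by hand. The sketch below assumes the simulator supports the shortreal type and the standard $shortrealtobits and $bitstoshortreal system functions; the module name is hypothetical. Note that the simulator rounds to nearest, while the adder built here simply drops the bits that are shifted out, so for other operands the last bit of the mantissa may differ.

```systemverilog
module float_bits_sketch;
    shortreal    a, b;
    logic [31:0] a_bits, b_bits, sum_bits;

    initial begin
        a = 27.18;
        b = 3.0;
        a_bits   = $shortrealtobits(a);
        b_bits   = $shortrealtobits(b);
        sum_bits = $shortrealtobits(a + b); // reference value computed by the simulator

        $display("a   = %b", a_bits);
        $display("b   = %b", b_bits);
        $display("sum = %b", sum_bits);
        $display("sum as a number = %f", $bitstoshortreal(sum_bits));

        // 3.0 has an exact representation, the same pattern as in the test above
        assert (b_bits == 32'b0_10000000_10000000000000000000000);
    end
endmodule
```

The printed sum pattern can then be compared with the value of sum from the adder's testbench.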

This circuit can now be synthesized to produce a Netlist and see what our code would look like at the logic gate level:

  1. Click “Set as top” for the file with the adder logic:

Setting the top-level Design Source of the project

  2. On the left side of the screen, find the “SYNTHESIS” tab and start the synthesis.

Synthesis tab

  3. The synthesis process may take some time. We wait…

Status indicator in the upper right corner of the screen

  4. When a selection window appears on the screen, select “Open Synthesized Design” and click OK.

Window that appears after synthesis

  5. A “Netlist” tab has appeared next to the file list. Select it, and then click the “Schematic” button.

After synthesis, a Netlist tab appeared next to the Sources tab

We enjoy a huge and detailed combinational circuit that does exactly what we described.

Synthesized circuit

There is no need to understand the scheme in detail, since it will take too much time.

At this point, the “low-level verilog” is finished, you can exhale and move on to the second implementation – much less labor-intensive.

Second implementation option

The second option is ordinary SystemVerilog, written the way it is meant to be written, without descending into circuit details as far as we did in the previous version. Here we will use SystemVerilog features such as the always_comb block (a signal to the synthesizer that everything inside it must be synthesized into a combinational circuit) and the logic type: variables that, unlike wires, can be assigned several times (the synthesizer will work out on its own where and how to place the wires) and can be used in different contexts.
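Before the adder itself, a tiny sketch of these two features in isolation. The module (a hypothetical "larger of two bytes" selector) is not part of the lab; it only shows that a logic variable may be assigned more than once inside always_comb, with the last assignment winning.

```systemverilog
// Illustrative sketch: module and port names are hypothetical.
module max_of_two_sketch(
    input  logic [7:0] a,
    input  logic [7:0] b,
    output logic [7:0] larger
);
    always_comb begin
        larger = a;          // default value
        if (b > a) begin
            larger = b;      // a later assignment overrides the earlier one
        end
    end
endmodule
```

The synthesizer turns this into a comparator and a multiplexer; no storage element appears because every path through the block assigns larger.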

Let's define the inputs and outputs of our module:

module high_level_float_adder(
    input logic[31:0] in_a,
    input logic[31:0] in_b,
    output logic[31:0] out
);

Everything is the same as before (only now the keyword logic has been added for convenience).

Let's describe the variables needed along the way:

logic [24:0] mantissa_a, mantissa_b, mantissa_sum;
logic [7:0] exponent_a, exponent_b;
logic result_sign;
logic [7:0] result_exponent;
logic [22:0] result_mantissa;

Let's state that the concatenation of the resulting sign, exponent and mantissa is fed to the out bus.

assign out = {result_sign, result_exponent, result_mantissa};

Essentially, in the code above we simply connected the wires so that result_sign, result_exponent and result_mantissa are in the correct out bits.
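The curly braces are the concatenation operator, and it works in both directions. A small simulation-only sketch with hypothetical names and example values (234.0 and -3.0):

```systemverilog
module concat_sketch;
    logic        sign;
    logic [7:0]  exponent;
    logic [22:0] mantissa;
    logic [31:0] word;

    initial begin
        // Packing: the leftmost item lands in the most significant bits
        sign     = 1'b0;
        exponent = 8'd134;
        mantissa = 23'h6A0000;
        word = {sign, exponent, mantissa};
        assert (word == 32'h436A0000); // bit pattern of 234.0
        assert (word[31] == sign && word[30:23] == exponent && word[22:0] == mantissa);

        // Unpacking: a concatenation may also appear on the left-hand side
        {sign, exponent, mantissa} = 32'hC0400000; // bit pattern of -3.0
        assert (sign == 1'b1 && exponent == 8'd128 && mantissa == 23'h400000);
    end
endmodule
```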

Now comes the fun part: the always_comb block, which contains all the real adder logic.

always_comb begin
    exponent_a = in_a[30:23];
    exponent_b = in_b[30:23];

    mantissa_a = {2'b01, in_a[22:0]};
    mantissa_b = {2'b01, in_b[22:0]};

    if (exponent_a >= exponent_b) begin
        mantissa_b = mantissa_b >> (exponent_a - exponent_b);
        result_exponent = exponent_a;
    end else begin
        mantissa_a = mantissa_a >> (exponent_b - exponent_a);
        result_exponent = exponent_b;
    end

    if (in_a[31] == in_b[31]) begin
        mantissa_sum = mantissa_a + mantissa_b;
        result_sign = in_a[31];  
    end else begin
        if (mantissa_a >= mantissa_b) begin
            mantissa_sum = mantissa_a - mantissa_b;
            result_sign = in_a[31];
        end else begin
            mantissa_sum = mantissa_b - mantissa_a;
            result_sign = in_b[31];
        end
    end

    if (mantissa_sum[24] == 1) begin
        mantissa_sum = mantissa_sum >> 1;
        result_exponent = result_exponent + 1;
    end else if (mantissa_sum[23] == 0) begin
        for (int i = 22; i >= 0; i = i - 1) begin
            if (mantissa_sum[i] == 1) begin
                mantissa_sum = mantissa_sum << (23 - i);
                result_exponent = result_exponent - (23 - i);
                break;
            end
        end
    end
      
    result_mantissa = mantissa_sum[22:0];
end

That's it. The functionality of all those modules, which in the previous version took up a bunch of files, fit into just 60 lines of code. Let's go through what's going on here.

exponent_a = in_a[30:23];
exponent_b = in_b[30:23];
mantissa_a = {2'b01, in_a[22:0]};
mantissa_b = {2'b01, in_b[22:0]};

Here we extract the exponents and mantissas from the numbers and put them in the corresponding variables, not forgetting to prepend the leading 1 to each mantissa.

if (exponent_a >= exponent_b) begin
	mantissa_b = mantissa_b >> (exponent_a - exponent_b);
	result_exponent = exponent_a;
end else begin
	mantissa_a = mantissa_a >> (exponent_b - exponent_a);
	result_exponent = exponent_b;
end

This part of the code does everything that the exponent_aligner module did in the previous version. We compare the exponents and shift the mantissa of the number with the smaller exponent by the difference. As you can see, for comparison and subtraction it is enough to simply write “>=” and “-”; there is no need to build adders and inverters from scratch, as we did earlier. The synthesizer does this for us when it sees comparison and arithmetic operators.

if (in_a[31] == in_b[31]) begin
	mantissa_sum = mantissa_a + mantissa_b;
	result_sign = in_a[31];  
end else begin
	if (mantissa_a >= mantissa_b) begin
		mantissa_sum = mantissa_a - mantissa_b;
		result_sign = in_a[31];
	end else begin
		mantissa_sum = mantissa_b - mantissa_a;
		result_sign = in_b[31];
	end
end

This part replaces our adder, two inverters and two subtractors. The signs of the arguments are compared, and then the mantissas are either added or subtracted; when subtracting, the smaller mantissa is subtracted from the larger one and the result takes the sign of the larger one. Abstraction again: we no longer spell out which module's output should be fed where depending on the sign; the “if” statement and the variables do this for us.
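To show what the “if” statement is hiding, here is the same selection written as explicit multiplexers with continuous assignments. This is only a sketch of an equivalent formulation, not necessarily what the synthesizer generates; the module name, port list and intermediate signals are hypothetical.

```systemverilog
// Hypothetical module: the sign/magnitude step as explicit multiplexers.
module sign_magnitude_add (
    input  logic        sign_a, sign_b,          // in_a[31] and in_b[31]
    input  logic [24:0] mantissa_a, mantissa_b,  // aligned mantissas
    output logic        result_sign,
    output logic [24:0] mantissa_sum
);
    logic same_sign, a_is_larger;

    assign same_sign   = (sign_a == sign_b);
    assign a_is_larger = (mantissa_a >= mantissa_b);

    // Same signs: add. Different signs: subtract the smaller mantissa
    // from the larger one so the difference never goes negative.
    assign mantissa_sum = same_sign   ? mantissa_a + mantissa_b :
                          a_is_larger ? mantissa_a - mantissa_b :
                                        mantissa_b - mantissa_a;

    // The result takes the sign of a unless b has the opposite sign
    // and the strictly larger mantissa.
    assign result_sign = (same_sign || a_is_larger) ? sign_a : sign_b;
endmodule
```

Both forms describe the same logic; the if/else version in always_comb is simply easier to read and extend.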

if (mantissa_sum[24] == 1) begin
	mantissa_sum = mantissa_sum >> 1;
	result_exponent = result_exponent + 1;
end else if (mantissa_sum[23] == 0) begin
	for (int i = 22; i >= 0; i = i - 1) begin
		if (mantissa_sum[i] == 1) begin
			mantissa_sum = mantissa_sum << (23 - i);
			result_exponent = result_exponent - (23 - i);
			break;
		end
	end
end

result_mantissa = mantissa_sum[22:0];

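This part is responsible for normalization. It replaces the whole nightmare of checks and shifts that lived in the mantissa_normalizer module. If the addition carried into bit 24, we shift the mantissa right by one and increment the exponent. Otherwise, if bit 23 is clear, we scan downwards to the first set bit, shift the mantissa left so that this bit lands in position 23, decrease the exponent by the same amount and exit the loop. The synthesizer will again understand what we want from the circuit and transform the code as necessary.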
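Here is the normalization step traced on concrete numbers, as a minimal sketch. The starting value 25'h0333333 is the difference of the two extended mantissas from the testbench used later (0xCCCCCD - 0x99999A) with exponent 125; the module name and the variable widths are assumptions.

```systemverilog
// Hypothetical standalone demo of the normalization step.
module normalize_demo;
    logic [24:0] mantissa_sum    = 25'h0333333; // 0xCCCCCD - 0x99999A
    logic [7:0]  result_exponent = 8'd125;

    initial begin
        if (mantissa_sum[24] == 1) begin
            // Carry out of the addition: one step to the right.
            mantissa_sum = mantissa_sum >> 1;
            result_exponent = result_exponent + 1;
        end else if (mantissa_sum[23] == 0) begin
            // Leading 1 is missing: find the highest set bit (bit 21 here)
            // and shift it up to position 23, i.e. left by 23 - 21 = 2.
            for (int i = 22; i >= 0; i = i - 1) begin
                if (mantissa_sum[i] == 1) begin
                    mantissa_sum = mantissa_sum << (23 - i);
                    result_exponent = result_exponent - (23 - i);
                    break;
                end
            end
        end
        // Here mantissa_sum is 25'h0CCCCCC and result_exponent is 123.
        $display("mantissa_sum = %h, result_exponent = %0d",
                 mantissa_sum, result_exponent);
        $finish;
    end
endmodule
```

One case this loop does not cover: if the two mantissas cancel completely (for example x + (-x)), no set bit is found, so mantissa_sum and result_exponent stay as they were. Judging only by the excerpts shown here, the result would then not be encoded as zero, so a fuller design would probably handle that case separately.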

The assign statement written before the always_comb block places the resulting sign, exponent and mantissa into the corresponding bits of the out bus, so the circuit outputs what it has computed.

Let's check it with the following code:

`timescale 1ns / 1ps

module high_level_float_adder_tb;

    reg [31:0] sum;

    high_level_float_adder adder(
        .in_a(32'b1_01111101_10011001100110011001101), // -0.4
        .in_b(32'b0_01111101_00110011001100110011010), // 0.3
        .out(sum)
    );

    initial begin
        #10 $display("sum = %b", sum);
        #10 $stop;
    end

endmodule

Click “Set as top” for this simulation file and run it.

Simulation result

Converting to readable format

We see that a little accuracy is still lost, but nothing can be done about it; this is normal for floating-point numbers. Our adder can now be synthesized, uploaded to an FPGA, and tested on real hardware. Just for fun, let's look at the diagram that the synthesizer builds for this variant of the code. Repeating the same steps as in the synthesis of the first option, we get the following diagram:

Scheme synthesized for option 2

The diagram, of course, turned out to be huge, but anyone mad enough to dig into it will see that it implements the same logic as the first one. Since in the first version we defined all the components ourselves, while here we left the translation to the synthesizer, the components and their placement on the screen may differ.

Thanks to the higher level of abstraction, this code is much easier for a human to read than the first, low-level option.
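Returning to verification for a moment: the testbench above checks a single pair of operands by eye. A hedged sketch of a slightly broader one follows. It reuses the high_level_float_adder module and its in_a, in_b and out ports from the article; the testbench name, the apply_and_compare task and the chosen operands are hypothetical. It relies on the standard $shortrealtobits conversion function, and support for shortreal varies between simulators, so treat it as a starting point rather than a finished test. Zero, denormal, infinity and NaN inputs are deliberately left out, since the adder as shown does not appear to treat them specially.

```systemverilog
`timescale 1ns / 1ps

module high_level_float_adder_check_tb;  // hypothetical testbench name

    logic [31:0] a, b;   // operands driven by the testbench
    logic [31:0] sum;    // result produced by the adder

    high_level_float_adder adder(
        .in_a(a),
        .in_b(b),
        .out(sum)
    );

    // Hypothetical helper: apply one pair of operands, wait for the
    // combinational logic to settle, then print the adder's output next to
    // the simulator's own single-precision sum.
    task automatic apply_and_compare(input shortreal x, input shortreal y);
        logic [31:0] expected;
        a = $shortrealtobits(x);
        b = $shortrealtobits(y);
        expected = $shortrealtobits(x + y);
        #10;
        if (sum === expected)
            $display("%f + %f: out = %h (matches reference)", x, y, sum);
        else
            $display("%f + %f: out = %h, reference = %h", x, y, sum, expected);
    endtask

    initial begin
        apply_and_compare(-0.4, 0.3);    // the pair from the testbench above
        apply_and_compare(1.5, 2.25);    // same signs, different exponents
        apply_and_compare(12.0, 2.0);    // alignment by two positions
        apply_and_compare(100.0, -0.5);  // different signs, large exponent gap
        $stop;
    end

endmodule
```

Because the adder drops the bits shifted out during alignment while the reference sum is rounded, a difference in the lowest mantissa bit for some operands would not necessarily indicate a bug.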

Conclusion

As it turns out, SystemVerilog isn't that scary after all. Once you break a task down into subtasks and sketch out the structure, writing code becomes much easier. I sincerely recommend writing code the way it is done in the second option. The first one is better suited to crazy people like me and circuit design enthusiasts; in practice, hardly anyone writes this way, because such code is very bulky and hard to read, which increases the likelihood of making a mistake somewhere. In practice, you can write normal, readable code and leave all the dirty work of turning it into a circuit to the synthesizer (but you still need to understand how your code is synthesized and how it works, there’s no way around it!).

That's all I have for now. I posted all the source code on my GitHub: https://github.com/Yars2021/floating_adder/

Thank you for your attention! I hope the article helped someone and made SystemVerilog a little more understandable and less intimidating.

Links

  1. https://numeral-systems.com/ieee-754-add/

  2. https://www.h-schmidt.net/FloatConverter/IEEE754.html

  3. https://ru.wikipedia.org/wiki/%D0%94%D0%BE%D0%BF%D0%BE%D0%BB%D0%BD%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9_%D0%BA%D0%BE%D0%B4

  4. https://en.wikipedia.org/wiki/Programmable_logic_device
