Part 2: RT-Level Combinational Circuit

Objective

This tutorial introduces RT-level combinational circuits and walks through several combinational design examples.

Source Code

This repository contains all of the code required in order to follow this tutorial.

zybo_tutorial/part_2 at main · weenslab/zybo_tutorial (GitHub)

References

Pong P. Chu, FPGA Prototyping by Verilog Examples, https://onlinelibrary.wiley.com/doi/book/10.1002/9780470374283
Register-transfer level, https://en.wikipedia.org/wiki/Register-transfer_level
Fixed Point and Floating Point Number Representations, https://www.tutorialspoint.com/fixed-point-and-floating-point-number-representations

1. Verilog

1.1. RTL Design

In the previous tutorial, we only used logic gates to design circuits. In this tutorial, we are going to use a higher abstraction level, which is the register-transfer level (RTL).

In digital circuit design, RTL is a design abstraction which models a synchronous digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical operations performed on those signals.

A synchronous circuit consists of two kinds of elements: registers (sequential logic) and combinational logic. Registers (usually implemented as D flip-flops) synchronize the circuit's operation to the edges of the clock signal, and are the only elements in the circuit that have memory properties. Combinational logic performs all the logical functions in the circuit and typically consists of logic gates. The following figure shows an example of a synchronous circuit.

1.2. Operators

1.2.1. Arithmetic Operators

There are six arithmetic operators: +, -, *, /, %, and **. They represent addition, subtraction, multiplication, division, modulus, and exponentiation operations, respectively. The + and - operators can also be used as unary operators, as in -a.

During synthesis, the + and - operators infer an adder and a subtractor, which are implemented in the FPGA's logic cells. Multiplication is a more complicated operation, and synthesis of the multiplication operator * depends on the synthesis software and the target device technology. The /, %, and ** operators usually cannot be synthesized automatically.
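As a minimal sketch of how operand widths interact with these operators, the hypothetical module below (add_sub_demo and its port names are illustrative, not part of the tutorial repository) adds two 4-bit inputs into a 5-bit result so that the carry-out is kept, and shows that subtraction and unary minus wrap around within 4 bits.

```verilog
// Hypothetical example module; names are illustrative only.
module add_sub_demo
    (
        input wire [3:0]  a,
        input wire [3:0]  b,
        output wire [4:0] sum,  // One extra bit keeps the carry-out
        output wire [3:0] diff, // Wraps around (modulo 16) when a < b
        output wire [3:0] neg   // Unary minus: two's complement of a
    );

    assign sum  = a + b; // Operands are extended to 5 bits before adding
    assign diff = a - b;
    assign neg  = -a;

endmodule
```

For instance, with a = 9 and b = 8 the sum is 17, which needs the fifth bit; a 4-bit sum would wrap around to 1.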

1.2.2. Relational and Equality Operators

There are four relational operators: >, <, <=, and >=. These operators compare two operands and return a Boolean result, which can be false (represented by 1-bit scalar value 0) or true (represented by 1-bit scalar value 1).

wire [3:0] a, b, c;
wire x, y, z;

assign a = 4'd1;
assign b = 4'd8;
assign c = b;

assign x = a > b; // x = 0
assign y = a < b; // y = 1
assign z = b >= c; // z = 1

There are two commonly used equality operators: == and !=. As with the relational operators, they return false (1-bit 0) or true (1-bit 1).

assign x = a == b; // x = 0
assign y = a != b; // y = 1
assign z = b == c; // z = 1

1.2.3. Bitwise, Reduction, and Logical Operators

There are four basic bitwise operators: & (and), | (or), ^ (xor), and ~ (not). The first three operators require two operands. Negation and xor operation can be combined, as in ~^, to form the xnor operator. The operations are performed on a bit-by-bit basis and thus are known as bitwise operators. For example, let a, b, and c be 4-bit signals:

wire [3:0] a, b, c;

This statement

assign c = a | b;

is the same as

assign c[3] = a[3] | b[3];
assign c[2] = a[2] | b[2];
assign c[1] = a[1] | b[1];
assign c[0] = a[0] | b[0];

The &, |, and ^ operators may also take a single operand, in which case they are known as reduction operators. The single operand is usually a multi-bit signal. The designated operation is performed on all bits of the operand and returns a 1-bit result. For example, let a be a 4-bit signal and y be a 1-bit signal:

wire [3:0] a;
wire y;

This statement

assign y = |a;

is the same as

assign y = a[3] | a[2] | a[1] | a[0];

There are three logical operators: && (logical and), || (logical or), and ! (logical negate). The logical operators are different from the bitwise operators. If we assume that no x or z is used, the operands of a logical operator are interpreted as false (when all bits are 0's) or true (when at least one bit is 1), and the operation always returns a 1-bit result.

wire [3:0] a, b;
wire x, y, z;

assign a = 4'b0001;
assign b = 4'b1001;

assign x = !a; // x = 0
assign y = a && 0; // y = 0
assign z = b || 0; // z = 1
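To make the difference between bitwise and logical operators concrete, here is a small sketch with a hypothetical module (bitwise_vs_logical is an illustrative name). It applies & and && to the same pair of 4-bit inputs.

```verilog
// Hypothetical example module; names are illustrative only.
module bitwise_vs_logical
    (
        input wire [3:0]  a,
        input wire [3:0]  b,
        output wire [3:0] and_bitwise, // One result bit per bit position
        output wire       and_logical  // Single true/false result
    );

    assign and_bitwise = a & b;  // ANDs a[i] with b[i] for each i
    assign and_logical = a && b; // True if a is non-zero AND b is non-zero

endmodule
```

With a = 4'b0101 and b = 4'b1010, the bitwise result is 4'b0000 because no bit position is 1 in both operands, while the logical result is 1 because both operands are non-zero.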

1.2.4. Concatenation and Replication Operators

The concatenation operator, { }, combines individual elements and small arrays to form a larger array. The following example illustrates its use:

wire a1;
wire [3:0] a4;
wire [7:0] b8, c8, d8;

assign b8 = {a4, a4};
assign c8 = {a1, a1, a4, 2'b00};
assign d8 = {b8[3:0], c8[3:0]};

Implementation of the concatenation operator involves reconnection of the input and output signals and only requires "wiring."

One application of the concatenation operator is to shift and rotate a signal by a fixed amount, as shown in the following example:

wire [7:0] a;
wire [7:0] rot, shl, sha;

// Rotate a to right 3 bits
assign rot = {a[2:0], a[7:3]};
// Shift a to right 3 bits and insert 0 (logical shift)
assign shl = {3'b000, a[7:3]};
// Shift a to right 3 bits and insert MSB (arithmetic shift)
assign sha = {a[7], a[7], a[7], a[7:3]};

The replication operator, {N{ }}, replicates the enclosed string. The replication constant, N, specifies the number of replications. For example, {4{2'b01}} returns 8'b01010101. The previous arithmetic shift operation can be simplified with the replication operator.

wire [7:0] a;
wire [7:0] sha;

// Replication of a constant
assign a = {4{2'b01}}; // a = 8'b01010101
// Arithmetic shift using concatenation and replication
assign sha = {{3{a[7]}}, a[7:3]};

1.2.5. Conditional Operators

The conditional operator, ? :, takes three operands and its general format is

[signal] = [boolean_exp] ? [true_exp] : [false_exp];

For example, the following circuit obtains the maximum of a and b:

assign max = (a > b) ? a : b;

Although simple, conditional operators can be cascaded or nested to specify more complex selections, for example:

assign eq = (~i1 & ~i0) ? 1'b1 :
            (~i1 &  i0) ? 1'b0 :
            ( i1 & ~i0) ? 1'b0 : 1'b1;

When synthesized, a conditional operator infers a 2-to-1 multiplexing circuit.
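As a sketch of how cascading builds a wider selection, the hypothetical module below (mux_4to1_demo is an illustrative name) chains three conditional operators to describe a 4-to-1 multiplexer. How a synthesis tool actually maps this to hardware depends on the tool and device.

```verilog
// Hypothetical example module; names are illustrative only.
module mux_4to1_demo
    (
        input wire [7:0]  d0, d1, d2, d3,
        input wire [1:0]  sel,
        output wire [7:0] y
    );

    // Each ? : stage describes one 2-to-1 selection; the chain is
    // evaluated from top to bottom.
    assign y = (sel == 2'b00) ? d0 :
               (sel == 2'b01) ? d1 :
               (sel == 2'b10) ? d2 : d3;

endmodule
```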

1.3. Procedural Assignment

To facilitate system modeling, Verilog contains a number of procedural statements, which are executed in sequence. Since their behavior is different from the normal concurrent circuit model, these statements are encapsulated inside an always block or initial block.

An always block can be thought of as a black box whose behavior is described by the internal procedural statements. The initial block can be used in simulation testbench. Since the procedural statement is more abstract, this type of code is sometimes known as behavioral description.

The simplified syntax of an always block with a sensitivity list (also known as event control expression) is

always @([sensitivity_list])
begin
    [optional local variable declaration];
    
    [procedural statement];
    [procedural statement];
    ...
end

The [sensitivity_list] term is a list of signals and events to which the always block responds (i.e., is "sensitive to"). For a combinational circuit, either all the input signals should be included in this list or the wildcard * should be used. An always block actually "loops forever" and the initiation of each loop is controlled by the sensitivity list.

There are two types of procedural assignments: blocking assignment and non-blocking assignment. Their basic syntax is

[variable_name] = [expression];  // Blocking assignment
[variable_name] <= [expression]; // Non-blocking assignment

In a blocking assignment, the expression is evaluated and then assigned to the variable immediately, before execution of the next statement (the assignment thus "blocks" the execution of other statements). It behaves like the normal variable assignment in the C language. In a non-blocking assignment, the expression is evaluated immediately, but the variable is updated only at the end of the current simulation time step, which in practice means after the always block has finished executing (the assignment thus does not block the execution of other statements).

The blocking and non-blocking assignments frequently confuse new Verilog users and failing to comprehend their differences can lead to unexpected behavior or race conditions. The basic rule of thumb is:

Use blocking assignments for a combinational circuit.
Use non-blocking assignments for a sequential circuit.
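The difference between the two assignment types is easiest to see in simulation. The following is a minimal, simulation-only sketch (the module name assign_demo and its variables are hypothetical) that performs the same pair of assignments once with = and once with <=.

```verilog
// Hypothetical simulation-only example; not intended for synthesis.
module assign_demo;
    reg [3:0] a, b, c, d;

    initial
    begin
        // Blocking: each statement completes before the next one starts
        a = 4'd1; b = 4'd2;
        a = b;  // a becomes 2 immediately
        b = a;  // Reads the new a, so b stays 2

        // Non-blocking: right-hand sides are sampled first,
        // updates happen at the end of the time step
        c = 4'd1; d = 4'd2;
        c <= d; // Samples d = 2
        d <= c; // Samples the old c = 1, so c and d are swapped

        #1;
        $display("blocking:     a=%0d b=%0d", a, b); // a=2 b=2
        $display("non-blocking: c=%0d d=%0d", c, d); // c=2 d=1
    end
endmodule
```

The non-blocking pair swaps the two values, which is how registers clocked by the same edge behave. The blocking pair copies one value into both variables.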

The two most commonly used statements in procedural assignments are the if and case statements.

1.3.1. If Statement

The simplified syntax of an if statement is:

if ([boolean_expr_1])
begin
    [procedural statement];
    [procedural statement];
    ...
end
else if ([boolean_expr_2])
begin
    [procedural statement];
    [procedural statement];
    ...
end
...
else
begin
    [procedural statement];
    [procedural statement];
    ...
end
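As a concrete instance of this syntax, here is a sketch of a 4-request priority encoder (the module name prio_encoder_demo and its ports are hypothetical). Because the branches are tested in order, the highest-numbered active request wins.

```verilog
// Hypothetical example module; names are illustrative only.
module prio_encoder_demo
    (
        input wire [3:0] r, // Request lines, r[3] has the highest priority
        output reg [2:0] y  // 0 = no request, 1..4 = index of winner + 1
    );

    always @(*)
        if (r[3])
            y = 3'b100;
        else if (r[2])
            y = 3'b011;
        else if (r[1])
            y = 3'b010;
        else if (r[0])
            y = 3'b001;
        else
            y = 3'b000; // Final else keeps the circuit combinational

endmodule
```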

1.3.2. Case Statement

The simplified syntax of a case statement is

case ([case_expr])
    [item]:
    begin
        [procedural statement];
        [procedural statement];
        ...    
    end
    [item]:
    begin
        [procedural statement];
        [procedural statement];
        ...    
    end
    ...
    default:
    begin
        [procedural statement];
        [procedural statement];
        ...    
    end
endcase
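As a concrete instance of this syntax, here is a sketch of a 2-to-4 decoder with an enable input (the module name decoder_2to4_demo and its ports are hypothetical). The enable and address bits are concatenated to form the case expression, and the default item covers every combination in which en is 0.

```verilog
// Hypothetical example module; names are illustrative only.
module decoder_2to4_demo
    (
        input wire [1:0] a,
        input wire       en,
        output reg [3:0] y
    );

    always @(*)
        case ({en, a})
            3'b100:  y = 4'b0001;
            3'b101:  y = 4'b0010;
            3'b110:  y = 4'b0100;
            3'b111:  y = 4'b1000;
            default: y = 4'b0000; // All cases with en == 0
        endcase

endmodule
```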

1.4. Parameter and Constant

A Verilog module can be instantiated as a component and becomes a part of a larger design. Verilog provides a construct, known as a parameter, to pass information into a module. Its simplified syntax is

module [module_name]
    #(
        parameter [parameter_name] = [default_value],
                  ...
                  [parameter_name] = [default_value]
    )
    (
        // I/O port declaration
    );

In Verilog, a constant can be declared using the localparam (for "local parameter") keyword. For example, we can declare the width and range of a data bus as

localparam DATA_WIDTH = 8,
           DATA_RANGE = 2**DATA_WIDTH-1;
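The following sketch shows a parameter and a localparam working together. Both modules are hypothetical (adder_demo and adder_top_demo are illustrative names): the first declares a WIDTH parameter with a default of 4, and the second overrides it to 8 when instantiating it.

```verilog
// Hypothetical example modules; names are illustrative only.
module adder_demo
    #(
        parameter WIDTH = 4 // Default used when the parameter is not overridden
    )
    (
        input wire [WIDTH-1:0] a,
        input wire [WIDTH-1:0] b,
        output wire [WIDTH:0]  sum // One extra bit for the carry-out
    );

    assign sum = a + b;

endmodule

module adder_top_demo
    (
        input wire [7:0]  x, y,
        output wire [8:0] s
    );

    localparam BUS_WIDTH = 8; // Constant local to this module

    // Override WIDTH by name for this instance only
    adder_demo #(.WIDTH(BUS_WIDTH)) adder_0 (.a(x), .b(y), .sum(s));

endmodule
```

Passing the parameter by name, as in .WIDTH(BUS_WIDTH), is usually easier to read than the positional form when a module has several parameters.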

1.5. Coding Guidelines

1.5.1. Common Errors

Following are common errors found in combinational circuit codes:

Variable assigned in multiple always blocks
Incomplete sensitivity list
Incomplete branch and incomplete output assignment

Variable assigned in multiple always blocks. In Verilog, variables can be assigned (i.e., appear on the left-hand side) in multiple always blocks. For example, the y variable is shared by two always blocks in the following code segment:

reg y, a, b, clear;

always @(*)
    if (clear)
        y = 1'b0;
        
always @(*)
    y = a & b;

Although the code is syntactically correct and can be simulated, it cannot be synthesized. Recall that each always block can be interpreted as a circuit part. The code above indicates that y is the output of both circuit parts and can be updated by each part. No physical circuit exhibits this kind of behavior and thus the code cannot be synthesized. We must group the assignment statements in a single always block, as in

always @(*)
    if (clear)
        y = 1'b0;
    else
        y = a & b;

Incomplete sensitivity list. For a combinational circuit, the output is a function of input and thus any change in an input signal should activate the circuit. This implies that all input signals should be included in the sensitivity list. For example, a two-input and gate can be written as

always @(a, b) // Both a and b are in the sensitivity list
    y = a & b;

If we forget to include b, the code becomes

always @(a) // b missing from sensitivity list
    y = a & b;

Although the code is still syntactically correct, its behavior is very different. When a changes, the always block is activated and y gets the value of a&b. When b changes, the always block remains suspended since it is not "sensitive to" b and y keeps its previous value. No physical circuit exhibits this kind of behavior. Most synthesis software will issue a warning message and infer the and gate anyway, so the synthesized circuit no longer matches the simulation.

In Verilog-2001, a special notation, @(*), was introduced to implicitly include all the relevant input signals, which eliminates this problem.

Incomplete branch and incomplete output assignment. The output of a combinational circuit is a function of input only and the circuit should not contain any internal state (i.e., memory). One common error with an always block is the inference of unintended memory (inferred latch) in a combinational circuit.

Consider the following code segment, which intends to describe a circuit that generates greater-than gt and equal-to eq output signals.

always @(*)
    if (a > b)       // eq is not assigned in this branch
        gt = 1'b1;
    else if (a == b) // gt is not assigned in this branch
        eq = 1'b1;
                     // final else branch is omitted

Let us first examine the incomplete branch error. There is no else branch in the segment. If both the a>b and a==b expressions are false, neither gt nor eq is assigned a value. According to the Verilog definition, they keep their previous values (i.e., the outputs depend on the internal state) and unintended latches are inferred.

The segment also has incomplete output assignment errors. For example, when the a>b expression is true, eq is not assigned a value and thus will keep its previous state. A latch will be inferred accordingly.

There are two ways to fix the errors. The first is to add the else branch and explicitly assign all output variables. The code becomes

always @(*)
    if (a > b)
    begin
        gt = 1'b1;
        eq = 1'b0;
    end
    else if (a == b)
    begin
        gt = 1'b0;
        eq = 1'b1;
    end
    else // i.e., a < b
    begin
        gt = 1'b0;
        eq = 1'b0;    
    end

The alternative is to assign a default value to each variable in the beginning of the always block to cover the unspecified branch and unassigned variable. The code becomes

always @(*)
begin
    gt = 1'b0; // Default value for gt
    eq = 1'b0; // Default value for eq
    if (a > b)
        gt = 1'b1;
    else if (a == b)
        eq = 1'b1;
end

Both gt and eq assume 0 if they are not assigned a value later.

The case statement experiences the same errors if some values of the [case_expr] expression are not covered by the item expressions. Consider the following code segment:

reg [1:0] s;

always @(*)
begin
    case (s)
        2'b00: y = 1'b1;
        2'b10: y = 1'b0;
        2'b11: y = 1'b1;
    endcase
end

The 2'b01 value is not covered by any branch. If s assumes this combination, y will keep its previous value and an unintended latch is inferred.

To fix this error, we have to add a default item:

always @(*)
begin
    case (s)
        2'b00: y = 1'b1;
        2'b10: y = 1'b0;
        2'b11: y = 1'b1;
        default: y = 1'b0;
    endcase
end

Alternatively, we can assign a default value in the beginning of the always block:

always @(*)
begin
    y = 1'b0; // Default value for y
    case (s)
        2'b00: y = 1'b1;
        2'b10: y = 1'b0;
        2'b11: y = 1'b1;
    endcase
end

1.5.2. Guidelines

Following are the coding guidelines for the description of combinational circuits:

Assign a variable only in a single always block.
Use blocking assignments for combinational circuits.
Use @(*) to include all inputs automatically in the sensitivity list.
Make sure that all branches of the if and case statements are included.
Make sure that the outputs are assigned in all branches.
One way to satisfy the two previous guidelines is to assign default values for outputs in the beginning of the always block.
Think hardware, not C code.

2. Fixed Point Representation

There are two major approaches to store real numbers (i.e., numbers with fractional component) in modern computing. These are (i) fixed point notation and (ii) floating point notation.

In fixed point notation, there is a fixed number of digits after the radix point, whereas floating point notation allows for a varying number of digits after the radix point.

In digital signal processing (DSP) applications, where performance is usually more important than precision, fixed point data encoding is extensively used.

A fixed point representation has a fixed number of bits for the integer part and for the fractional part. A signed fixed point number has three fields: the sign field, the integer field, and the fractional field.

The Q notation is a way to specify the parameters of a binary fixed point number format. For example, in Q notation, the number format denoted by Q5.10 means that the fixed point numbers in this format have 5 bits for the integer part and 10 bits for the fraction part.

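Assume a 16-bit Q5.10 format, which reserves 1 bit for the sign, 5 bits for the integer part, and 10 bits for the fractional part. Negative numbers are stored in two's complement, so the raw 16-bit word, read as a signed integer, equals the real value multiplied by 2^10 = 1024. Then, +1.5 and -1.5 are represented as follows:

wire [15:0] a, b;

assign a = 16'b0_00001_1000000000; // +1.5 (raw value +1536)
assign b = 16'b1_11110_1000000000; // -1.5 (raw value -1536)

For a, the sign bit 0 indicates a positive number, 00001 is the 5-bit integer field for decimal 1, and 1000000000 is the 10-bit fraction field for 0.5, so the value is 1 + 0.5 = +1.5.

For b, the sign bit 1 indicates a negative number. Because the whole word is in two's complement, the sign and integer fields together (1_11110) encode -2, and the fraction field 1000000000 adds 0.5, so the value is -2 + 0.5 = -1.5. The pattern 16'b1_11111_1000000000 would instead be -1 + 0.5 = -0.5.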
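One way to check a Q5.10 bit pattern is to read the word as a signed two's complement integer and divide by 2^10. The following simulation-only sketch does this for three patterns (the module name q5_10_decode_demo and all variable names are hypothetical).

```verilog
// Hypothetical simulation-only example; not intended for synthesis.
module q5_10_decode_demo;
    localparam FRAC_BIT = 10;

    reg signed [15:0] a, b, c;
    integer a_int, b_int, c_int;   // Raw words, sign-extended to 32 bits
    real    scale, a_real, b_real, c_real;

    initial
    begin
        a = 16'b0_00001_1000000000;
        b = 16'b1_11110_1000000000;
        c = 16'b1_11111_1000000000;

        scale = (1 << FRAC_BIT);   // 2^10 = 1024
        a_int = a; b_int = b; c_int = c;

        a_real = a_int / scale;    // 1536 / 1024, i.e. +1.5
        b_real = b_int / scale;    // -1536 / 1024, i.e. -1.5
        c_real = c_int / scale;    // -512 / 1024, i.e. -0.5

        $display("a: raw = %0d, value = %f", a_int, a_real);
        $display("b: raw = %0d, value = %f", b_int, b_real);
        $display("c: raw = %0d, value = %f", c_int, c_real);
    end
endmodule
```

The same rule works in the other direction: multiply the real value by 1024 and write the resulting integer in 16-bit two's complement.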

3. Design Examples

3.1. Processing Element

In this example, we design a basic processing element (PE) of the kind used in many digital signal processing systems. It consists of a multiplier and an adder and computes y_out = y_in + a_in * b, an operation called multiply-accumulate (MAC).

The following code is the implementation of the PE in Verilog. It uses continuous assignment.

pe.v

module pe
    #( 
        parameter WIDTH = 16,
        parameter FRAC_BIT = 10
    )
    (
        input wire signed [WIDTH-1:0]  a_in,
        input wire signed [WIDTH-1:0]  y_in,
        input wire signed [WIDTH-1:0]  b,
        output wire signed [WIDTH-1:0] a_out,
        output wire signed [WIDTH-1:0] y_out
    );
    
    wire signed [WIDTH*2-1:0] y_out_i;
    
    assign a_out = a_in;
    assign y_out_i = a_in * b;
    assign y_out = y_in + y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT];

endmodule

In this module, we use a fixed-point representation that can be configured through parameters. The default configuration for this module is Q5.10.
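The bit slice in the PE deserves a short explanation: multiplying two Q5.10 numbers gives a 32-bit product with 20 fraction bits, so selecting bits [WIDTH+FRAC_BIT-1:FRAC_BIT] drops the 10 extra fraction bits and returns a 16-bit Q5.10 value again. The simulation-only sketch below repeats the same expressions outside the pe module (the module name pe_arith_demo and the signal name prod are hypothetical) for 0.5 * 1.0 + 0.5.

```verilog
// Hypothetical simulation-only example; mirrors the arithmetic used in pe.v.
module pe_arith_demo;
    localparam WIDTH = 16, FRAC_BIT = 10;

    reg  signed [WIDTH-1:0]   a_in, b, y_in;
    wire signed [WIDTH*2-1:0] prod  = a_in * b; // 20 fraction bits
    wire signed [WIDTH-1:0]   y_out = y_in + prod[WIDTH+FRAC_BIT-1:FRAC_BIT];

    initial
    begin
        a_in = 16'd512;  // 0.5 in Q5.10 (0.5 * 1024)
        b    = 16'd1024; // 1.0 in Q5.10
        y_in = 16'd512;  // 0.5 in Q5.10
        #1;
        // prod = 512 * 1024 = 524288; bits [25:10] = 512; y_out = 1024 (1.0)
        $display("prod = %0d, y_out = %0d", prod, y_out);
    end
endmodule
```

Note that the slice truncates the low fraction bits rather than rounding them, and that product bits above bit 25 are discarded, so results outside the Q5.10 range overflow silently.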

The following code is the testbench for the PE module.

pe_tb.v

`timescale 1ns / 1ps

module pe_tb();
    localparam T = 10;
    localparam WIDTH = 16; // Signal width, must match the DUT's WIDTH parameter
    
    reg signed [WIDTH-1:0] a_in, y_in, b;
    wire signed [WIDTH-1:0] a_out, y_out;
    
    pe#(.WIDTH(16), .FRAC_BIT(10))
    dut(.a_in(a_in), .y_in(y_in), .b(b), .a_out(a_out), .y_out(y_out));
    
    initial
    begin
        a_in = 0; y_in = 0; b = 0;
        #T;
        a_in = 16'b0_00000_1000000000;
        y_in = 16'b0_00000_1000000000;
        b = 16'b0_00001_0000000000;
        #T; 
        a_in = 16'b0_00000_1010100001;
        y_in = 16'b1_11110_1110000110;
        b = 16'b0_00101_1010111100;
        #T;
        a_in = 0; y_in = 0; b = 0;
        #T;
    end
endmodule

This is the simulation result of the PE circuit.

Then, we can add the PE module to the Vivado block design together with a virtual input/output (VIO) IP. We use the VIO IP because the FPGA board doesn't have enough switches and LEDs to drive and observe all of the PE module's ports.

These are the constraints for this block design. Only the clock signal is used.

#Clock signal
set_property -dict { PACKAGE_PIN L16   IOSTANDARD LVCMOS33 } [get_ports { clk }]; #IO_L11P_T1_SRCC_35 Sch=sysclk
create_clock -add -name sys_clk_pin -period 8.00 -waveform {0 4} [get_ports { clk }];

The following figure shows the synthesis result.

We can view the detailed resource utilization. The PE module only uses 16 LUTs and 1 DSP block. Most of the resources are for the VIO IP.

The following figure shows how we can test the PE module using the VIO interface in Vivado (after programming the FPGA).

3.2. Multiplexer

A multiplexer (or mux), also known as a data selector, is a device that selects between several input signals and forwards the selected input to a single output line as shown in the following figure.

The following code is the implementation of the 2-to-1 multiplexer in Verilog. It uses continuous assignment with the conditional operator ? :.

mux_2to1.v

module mux_2to1
    #( 
        parameter WIDTH = 8
    )
    (
        input wire [WIDTH-1:0]  a,
        input wire [WIDTH-1:0]  b,
        input wire [0:0]        sel,
        output wire [WIDTH-1:0] y
    );

    assign y = (sel == 1'b0) ? a : b;  

endmodule

The same circuit can also be implemented using procedural assignment.

module mux_2to1
    #( 
        parameter WIDTH = 8
    )
    (
        input wire [WIDTH-1:0]  a,
        input wire [WIDTH-1:0]  b,
        input wire [0:0]        sel,
        output reg [WIDTH-1:0]  y
    );

    always @(*)
        if (sel == 1'b0)
            y = a;
        else
            y = b;  

endmodule

The following code is the testbench for the 2-to-1 multiplexer.

mux_2to1_tb.v

`timescale 1ns / 1ps

module mux_2to1_tb();
    localparam T = 10;
    
    reg [7:0] a, b;
    reg [0:0] sel;
    wire [7:0] y;
    
    mux_2to1#(.WIDTH(8))
    dut(.a(a), .b(b), .sel(sel), .y(y));
    
    initial
    begin
        a = 8; b = 16;
        
        sel = 0; #T;
        sel = 1; #T;
        
        a = 0; b = 0;
    end
endmodule

This is the simulation result of the multiplexer module.

3.3. Shifter

In this circuit, we implement a bit shifter. The data input is 8 bits wide. It can be shifted left or right, as selected by the dir signal. The shift distance is selected by the amt signal: amt values 0 to 3 correspond to shifts of 1 to 4 bits. The circuit implementation uses an 8-to-1 multiplexer.

The following code is the implementation of the shifter in Verilog. It uses procedural assignment with case statement.

shifter.v

module shifter
    (
        input wire [7:0]  a,
        input wire        dir, // 0: left, 1: right
        input wire [1:0]  amt,
        output wire [7:0] y
    );

    reg [7:0] y_tmp;
    
    // Module body using "always block"
    always @(a or dir or amt) // or
//    always @(*) // Wildcard, produce the same result
    begin
        case ({dir, amt})
            3'b000: y_tmp = {a[6:0], 1'b0};
            3'b001: y_tmp = {a[5:0], 2'b00};
            3'b010: y_tmp = {a[4:0], {3{1'b0}}}; // Replicate bit
            3'b011: y_tmp = {a[3:0], 4'b0000};
            3'b100: y_tmp = {1'b0, a[7:1]};
            3'b101: y_tmp = {2'b00, a[7:2]};
            3'b110: y_tmp = {3'b000, a[7:3]};
            3'b111: y_tmp = {4'b0000, a[7:4]};
            default: y_tmp = 8'h00;
        endcase
    end
    
    // Module body using continuous assignment
    assign y = y_tmp;  

endmodule
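For comparison, the same behavior could also be described with Verilog's logical shift operators << and >>, which this tutorial does not otherwise cover. The module below is only a sketch under that assumption (shifter_op_demo and the internal signal n are hypothetical names), and a synthesis tool may map it to different hardware than the case-based version.

```verilog
// Hypothetical alternative; intended to match shifter.v for all inputs.
module shifter_op_demo
    (
        input wire [7:0]  a,
        input wire        dir, // 0: left, 1: right
        input wire [1:0]  amt,
        output wire [7:0] y
    );

    // Shift distance: amt = 0..3 selects a shift of 1..4 bits
    wire [2:0] n = {1'b0, amt} + 3'd1;

    // Both operators fill the vacated positions with zeros
    assign y = dir ? (a >> n) : (a << n);

endmodule
```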

The following code is the testbench for the shifter.

shifter_tb.v

`timescale 1ns / 1ps

module shifter_tb();
    localparam T = 10;
    
    reg [7:0] a;
    reg dir;
    reg [1:0] amt;
    wire [7:0] y;
    
    shifter dut(.a(a), .dir(dir), .amt(amt), .y(y));
    
    initial
    begin
        dir = 0;     
        a = 8'b10101100; amt = 0; #T;
        a = 8'b10101100; amt = 1; #T;
        a = 8'b10101100; amt = 2; #T;
        a = 8'b10101100; amt = 3; #T;
        
        dir = 1;
        a = 8'b10101100; amt = 0; #T;
        a = 8'b10101100; amt = 1; #T;
        a = 8'b10101100; amt = 2; #T;
        a = 8'b10101100; amt = 3; #T;
        
        a = 0;
        amt = 0;
    end 
endmodule

This is the simulation result of the shifter module.

3.4. Lookup Table

A lookup table (LUT) circuit is often used to implement complex mathematical operations that are not easy to implement directly in an FPGA.

For example, the sigmoid function is defined by the following formula. This function is commonly used in neural networks as an activation function.

\sigma(x)=\frac{1}{1+e^{-x}}

Since the exponential function is not easy to implement in Verilog, we can instead create a table that maps the input x to the output σ(x) as follows:

x          σ(x)
...        ...
-0.1250    0.4687500000
-0.0625    0.4843750000
+0.0000    0.5000000000
+0.0625    0.5146484375
...        ...

Then, we can create a circuit that implements this table as shown in the following figure.

The following code is the implementation of the sigmoid LUT in Verilog. It uses procedural assignment and continuous assignment. We use the fixed point format Q3.4 for the input and Q5.10 for the output.

lut_sigmoid.v

module lut_sigmoid
    (
        input wire         en,
        input wire [7:0]   x,  // Fixed-point: 1 sign, 3 integer, 4 fraction
        output wire [15:0] sig // Fixed-point: 1 sign, 5 integer, 10 fraction
    );
    
    reg [15:0] sig_mem;
    
    // Module body using "always block"
    always @(x)
    begin
        case (x)
            // Add more ...
            8'b1_111_1110: sig_mem = 16'b0_00000_0111100000; // sig(-0.1250)
            8'b1_111_1111: sig_mem = 16'b0_00000_0111110000; // sig(-0.0625)
            8'b0_000_0000: sig_mem = 16'b0_00000_1000000000; // sig(+0.0000)
            8'b0_000_0001: sig_mem = 16'b0_00000_1000001111; // sig(+0.0625)
            // Add more ...
            default: sig_mem = 16'h0000;
        endcase
    end
    
    // Module body using continuous assignment
    assign sig = (en == 1'b1) ? sig_mem : 16'h0000;
    // Module body using module instantiation (produce the same result)
//    mux_2to1 #(16) mux_2to1_0(16'h0000, sig_mem, en, sig);
    
endmodule
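The remaining case items do not have to be typed by hand. The simulation-only sketch below prints candidate entries by evaluating the sigmoid with the $exp system function, which was added in Verilog-2005, so simulator support is an assumption here. The module name lut_sigmoid_gen_demo and its variables are hypothetical. The output is truncated rather than rounded, which appears to reproduce the four entries shown above.

```verilog
// Hypothetical simulation-only generator; requires $exp (Verilog-2005).
module lut_sigmoid_gen_demo;
    integer    i;      // Raw Q3.4 input, as a signed integer
    integer    q;      // Raw Q5.10 output
    real       x_real; // Input as a real number
    real       s;      // sigmoid(x_real)
    reg [7:0]  x_bits;
    reg [15:0] s_bits;

    initial
        // Widen the range to -128..127 to print the full table
        for (i = -2; i <= 1; i = i + 1)
        begin
            x_real = i / 16.0;             // Q3.4 has 4 fraction bits
            s      = 1.0 / (1.0 + $exp(-x_real));
            q      = $rtoi(s * 1024.0);    // Q5.10, truncated toward zero
            x_bits = i;                    // Keeps the low 8 bits (two's complement)
            s_bits = q;
            $display("8'b%b: sig_mem = 16'b%b; // sig(%f)", x_bits, s_bits, x_real);
        end
endmodule
```

The printed lines can be pasted into the case statement after the underscores used for readability in lut_sigmoid.v are added back, if desired.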

The following code is the testbench for the sigmoid LUT.

lut_sigmoid_tb.v

`timescale 1ns / 1ps

module lut_sigmoid_tb();
    localparam T = 10;
    
    reg en;
    reg [7:0] x;
    wire [15:0] sig;
    
    lut_sigmoid dut(.en(en), .x(x), .sig(sig));
    
    initial
    begin
        en = 1;
        
        x = 8'b1_111_1110; #T;
        x = 8'b1_111_1111; #T;
        x = 8'b0_000_0000; #T;
        x = 8'b0_000_0001; #T;   
          
        x = 0;
    end   
endmodule

This is the simulation result of the sigmoid LUT module.

4. Conclusion

In this tutorial, we covered the RT-level combinational circuit, coding guidelines, and design examples of combinational circuits.
