Part 7: FPGA Memory

Objective

This tutorial explains how to use the Block Memory Generator. As an example, we use a processing element (PE) module whose input and output data are stored in BRAM.

Source Code

This repository contains all of the code required in order to follow this tutorial.

1. Overview

1.1. Block Memory Generator

The Block Memory Generator is an IP core that configures block RAM (BRAM), a dedicated memory resource on the FPGA. Because BRAM is a dedicated resource, it does not use flip-flop or LUT resources. The core has two fully independent ports that access a shared memory space. Both the A and B ports have a write and a read interface.

Block memory capacity is limited. Even on a high-end Zynq chip, it is only 38.0 Mb (4.75 MB). On the Z7010, it is only 2.1 Mb (0.2625 MB).

Block memory can be added to the design using block design (GUI) or with Verilog/VHDL (Xilinx Parameterized Macros, XPM) code.

Block memory has two operating modes as shown in Figure 1.

  • BRAM controller

  • Stand Alone

Figure 1. Block memory mode

The Block Memory Generator core uses embedded block RAM to generate five types of memories as shown in Figure 2.

  • Single-port RAM

  • Simple Dual-port RAM

  • True Dual-port RAM

  • Single-port ROM

  • Dual-port ROM

Figure 2. Block memory type

In this tutorial, we are going to use standalone mode and true dual-port RAM type.

1.2. BRAM Controller Mode

In this mode, the block memory is used together with the AXI BRAM Controller IP, and most of the block memory settings are grayed out, as shown in Figure 3.

Figure 3. Block memory settings in BRAM controller mode

The size of the memory is configured from the Address Editor entry of the AXI BRAM Controller IP, instead of the block memory configuration wizard.

Figure 4. Configure block memory size in BRAM controller mode

The relationship between address range and data depth (32-bit or 64-bit) is shown in the table below.

Range    Depth (32-bit)    Depth (64-bit)
4K       1024              512
8K       2048              1024
16K      4096              2048
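The depth values in the table follow from dividing the address range in bytes by the word size in bytes. A minimal sketch of that arithmetic is shown here; the function name bram_depth is hypothetical and is not part of any Xilinx API.

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable words in a memory of range_bytes bytes
 * when each word is width_bits wide (32 or 64 in the table). */
uint32_t bram_depth(uint32_t range_bytes, uint32_t width_bits)
{
    assert(width_bits != 0 && width_bits % 8 == 0); /* whole bytes only */
    return range_bytes / (width_bits / 8);
}
```

For example, a 4K range is 4096 bytes, so bram_depth(4096, 32) gives 1024 words and bram_depth(4096, 64) gives 512 words.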

1.3. BRAM Standalone Mode

In this mode, we can change the block memory data width and size, as well as other settings. A standalone block memory is suited to internal storage inside our RTL module, where the memory is not connected directly to the PS.

Figure 5. Block memory settings in standalone mode

1.4. BRAM Timing Diagram

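The following figure shows the BRAM write timing diagram for the BRAM controller mode. The address increments by 4 from one word to the next because the address is a byte address and each data word is 32 bits (4 bytes) wide. Individual bytes within a word can be written separately: the we signal is a byte write enable, with one bit per byte.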
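The effect of the we signal on a 32-bit word can be modeled in software. The sketch below is a behavioral model only, assuming a 4-bit we where bit i enables byte lane i; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Merge wdata into old under a 4-bit byte write enable:
 * bit i of we selects byte lane i (bits 8*i+7 down to 8*i). */
uint32_t bram_write_word(uint32_t old, uint32_t wdata, uint8_t we)
{
    assert((we & 0xF0u) == 0); /* a 32-bit word has only 4 byte lanes */
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (we & (1u << lane)) {
            mask |= 0xFFu << (8 * lane);
        }
    }
    return (old & ~mask) | (wdata & mask);
}

/* Byte address of the i-th 32-bit word: consecutive words are 4 bytes apart. */
uint32_t word_byte_address(uint32_t index)
{
    return index * 4u;
}
```

With we = 0xF the whole word is replaced, and with we = 0x1 only the least significant byte changes.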

Figure 6. BRAM write timing diagram

The following figure shows the BRAM read timing diagram for the BRAM controller mode. The latency from address to output data is one clock cycle.
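The one-cycle read latency means that the data for an address becomes visible only after the clock edge that captures that address. The sketch below is a simplified behavioral model, assuming word addressing and an 8-word memory; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BRAM_WORDS 8u

typedef struct {
    uint32_t mem[BRAM_WORDS];
    uint32_t dout;   /* output register, updated on each clock edge */
} bram_model_t;

/* One rising clock edge with the port enabled and no write:
 * the word at addr is captured into dout and is visible afterwards. */
void bram_read_clock(bram_model_t *b, uint32_t addr)
{
    b->dout = b->mem[addr % BRAM_WORDS];
}

/* Read n consecutive words starting at word 0, one address per clock edge.
 * Data for the address captured at edge k is sampled at edge k + 1,
 * so n reads take n + 1 edges. Returns the number of edges used. */
uint32_t bram_burst_read(bram_model_t *b, uint32_t *dst, uint32_t n)
{
    assert(n <= BRAM_WORDS);
    uint32_t cycles = 0;
    for (uint32_t k = 0; k <= n; k++) {
        if (k > 0) {
            dst[k - 1] = b->dout;    /* sample data of the previous address */
        }
        if (k < n) {
            bram_read_clock(b, k);   /* capture the next address */
        }
        cycles++;
    }
    return cycles;
}
```

A control unit that streams data out of a BRAM has to account for this extra cycle when it decides when the data is valid.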

Figure 7. BRAM read timing diagram

The reset type for BRAM is active-high.

1.5. Accessing BRAM from PS

From the software side, we can write and read BRAM data with a simple memory-mapped program. We initialize a pointer mem_p to the base address of the BRAM. Then, we use this pointer to write and read the data.
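A minimal sketch of this access pattern is shown below. The helper names bram_write and bram_read are hypothetical. On the board, mem_p would be obtained by casting the BRAM base address assigned in the Address Editor to a volatile uint32_t pointer; on a host machine, an ordinary array can stand in for the BRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write n 32-bit words to the memory that starts at mem_p. */
void bram_write(volatile uint32_t *mem_p, const uint32_t *src, size_t n)
{
    assert(mem_p != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        mem_p[i] = src[i];   /* word i is at byte offset 4 * i */
    }
}

/* Read n 32-bit words from the memory that starts at mem_p. */
void bram_read(const volatile uint32_t *mem_p, uint32_t *dst, size_t n)
{
    assert(mem_p != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = mem_p[i];
    }
}
```

The volatile qualifier keeps the compiler from caching or removing the accesses, which matters when the memory can also be changed by the PL.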

2. PE Module

2.1. Control and Datapath

The following figure shows the PE module. It consists of a multiplier and an adder. The circuit is a combinational circuit.
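A software reference model of the PE is useful for checking results later. The sketch below assumes that the multiplier output feeds the adder, so the PE computes a * b + c; the operand names, their roles, and the bit widths are assumptions, not taken from the RTL.

```c
#include <stdint.h>

/* Reference model of a multiply-add PE: y = a * b + c.
 * The result is widened to 64 bits so the model itself cannot overflow. */
int64_t pe_model(int32_t a, int32_t b, int32_t c)
{
    return (int64_t)a * b + c;
}
```

Because the circuit is combinational, the model is a pure function: the output depends only on the current inputs.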

Figure 8. A processing element (PE)

Next, we design the top module for this PE module, as shown in the following figure. We add pipeline registers to the input and output of the PE module. Then, we add a control unit, which is implemented as a counter. The control unit drives the BRAM input and output. For the start signal, we use a rising edge detector that is implemented with a register, a NOT gate, and an AND gate.
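The rising edge detector can be modeled cycle by cycle. In the sketch below, the register holds the value of start from the previous clock, and the pulse is start AND NOT(previous start). The type and function names are hypothetical, and this is a behavioral model rather than the RTL itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool start_q;   /* registered copy of start from the previous clock */
} edge_det_t;

/* One clock edge: returns the pulse seen in this cycle,
 * then updates the register with the current start value. */
bool edge_det_clock(edge_det_t *d, bool start)
{
    assert(d != NULL);
    bool pulse = start && !d->start_q;  /* AND of start and NOT(start_q) */
    d->start_q = start;
    return pulse;
}
```

The pulse is high for exactly one cycle per 0-to-1 transition, so holding start at one does not restart the control unit.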

Figure 9. PE top module block diagram

This is the Verilog implementation of the PE top module.

2.2. Testbench

Now that we have the PE top module, we can test it with a testbench file. The block design that we are going to test is shown in the following figure. The PE top module is connected to the input BRAM and the output BRAM.

Figure 10. PE top module for simulation

This is the Verilog testbench for the PE top module.

The following figure shows the timing diagram of the PE top module. First, the data is written to the BRAM input. Then, the module starts when the start signal is one. For the duration of the computation, the ready signal is zero, indicating that the PE top module is busy. Finally, the PE result is stored in the output BRAM.

Figure 11. Timing diagram of PE top module

3. System Design

The following figure shows the overall SoC system design for the PE top module. We use the AXI BRAM Controller IP to connect the PS and the BRAM. We use AXI GPIO for the control signals: the start signal and the ready signal.

Figure 12. Block diagram of PE system

The following figure shows the block design in Vivado.

Figure 13. Block diagram of PE system in Vivado block design

4. Software Design

For the software design, first we write the input data to BRAM input. Then, we start the PE module by writing 1 to the AXI GPIO channel 0. After that, we read the ready signal from the AXI GPIO channel 1 and wait until it is 1. Finally, we can read the output data from the BRAM output.
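The sequence above can be sketched as follows. This is an outline under stated assumptions, not the program from the repository: the function name pe_run is hypothetical, start is assumed to be on the first AXI GPIO data register (GPIO_DATA, byte offset 0x0) and ready on the second (GPIO2_DATA, byte offset 0x8), and the GPIO directions are assumed to be fixed in the IP configuration. The pointers would be set from the base addresses in the Address Editor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices of the AXI GPIO data registers (assumed signal mapping). */
#define GPIO_START_IDX 0u   /* GPIO_DATA,  byte offset 0x0 */
#define GPIO_READY_IDX 2u   /* GPIO2_DATA, byte offset 0x8 */

void pe_run(volatile uint32_t *bram_in, const volatile uint32_t *bram_out,
            volatile uint32_t *gpio,
            const uint32_t *src, size_t n_in,
            uint32_t *dst, size_t n_out)
{
    assert(bram_in != NULL && bram_out != NULL && gpio != NULL);

    /* 1. Write the input data to the input BRAM. */
    for (size_t i = 0; i < n_in; i++) {
        bram_in[i] = src[i];
    }

    /* 2. Start the PE module. */
    gpio[GPIO_START_IDX] = 1u;

    /* 3. Wait until the ready signal is 1. */
    while ((gpio[GPIO_READY_IDX] & 1u) == 0u) {
        /* busy wait */
    }

    /* 4. Read the result from the output BRAM. */
    for (size_t i = 0; i < n_out; i++) {
        dst[i] = bram_out[i];
    }
}
```

Because the start signal passes through a rising edge detector, it has to return to 0 before the PE module can be started again.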

5. Result

The following figure shows the result on the serial terminal. We can verify the result manually.

Figure 14. PE top module output printed on the serial terminal

6. Conclusion

In this tutorial, we covered how to use BRAM, with a PE module as the example.
