This tutorial explains how to use the Block Memory Generator. As an example, we use a processing element (PE) module whose inputs and outputs are stored in block RAM (BRAM).
This repository contains all of the code required to follow this tutorial.
1.1. Block Memory Generator
The Block Memory Generator is an IP core that builds memories from the dedicated block RAM (BRAM) on the FPGA. Because BRAM is a dedicated resource, it does not consume flip-flop or LUT resources. The core has two fully independent ports that access a shared memory space. Both port A and port B have a write and a read interface.
Block memory has a limited size. Even on the high-end Zynq chip, the size is only 38.0 Mb (4.75 MB). On the Z7010, it is only 2.1 Mb (0.2625 MB).
Block memory can be added to a design either through the block design GUI or in Verilog/VHDL code using Xilinx Parameterized Macros (XPM).
Block memory has two operating modes as shown in Figure 1.
Figure 1. Block memory mode
The Block Memory Generator core uses embedded block RAM to generate five types of memories as shown in Figure 2.
Figure 2. Block memory type
In this tutorial, we are going to use standalone mode and the true dual-port RAM type.
1.2. BRAM Controller Mode
In this mode, the block memory is used together with the AXI BRAM Controller IP, and most of the block memory settings are grayed out, as shown in Figure 3.
Figure 3. Block memory settings in BRAM controller mode
The size of the memory is configured from the Address Editor entry of the AXI BRAM Controller IP, instead of from the block memory configuration wizard.
Figure 4. Configure block memory size in BRAM controller mode
The relationship between address range and data depth (32-bit or 64-bit) is shown in the table below.
Range | Depth (32-bit) | Depth (64-bit)
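The depth for a given range follows from simple arithmetic: the range is a size in bytes, and each word occupies width/8 bytes. The following is a minimal sketch of that calculation; the helper name bram_depth is hypothetical and not part of any Xilinx API.

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable words in a memory of `range_bytes` bytes when each
 * word is `width_bits` wide (32 or 64 for the table above).
 * Hypothetical helper for illustration only. */
uint32_t bram_depth(uint32_t range_bytes, uint32_t width_bits)
{
    assert(width_bits != 0 && width_bits % 8 == 0); /* whole bytes only */
    return range_bytes / (width_bits / 8);
}
```

For example, a 4K range (4096 bytes) gives a depth of 1024 for 32-bit data and 512 for 64-bit data.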
1.3. BRAM Standalone Mode
In this mode, we can change the block memory data width and size, as well as other settings. A memory in this mode is suited to internal use within an RTL module that is not connected directly to the PS.
Figure 5. Block memory settings in standalone mode
1.4. BRAM Timing Diagram
The following figure shows the BRAM write timing diagram for the BRAM controller mode. The address increments by 4 for each word because the data is 32 bits (4 bytes) wide and the address counts bytes. Individual bytes within a word can be written selectively through the byte-wide write enable signal, we.
Figure 6. BRAM write timing diagram
The following figure shows the BRAM read timing diagram for the BRAM controller mode. The latency from address to output data is one clock cycle.
Figure 7. BRAM read timing diagram
The BRAM reset is active-high.
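The write and read behavior described in this section can be approximated with a small behavioral model in C. This is only a sketch under stated assumptions: the type bram_model, the function bram_clock, and the depth BRAM_WORDS are hypothetical names chosen for illustration, the port is assumed to be 32 bits wide with a 4-bit we, and the model assumes write-first behavior (a read of the word being written returns the new data). Reset is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define BRAM_WORDS 16u   /* illustrative depth, not taken from the tutorial */

typedef struct {
    uint32_t mem[BRAM_WORDS];
    uint32_t dout;       /* registered read data */
} bram_model;

/* Model one rising clock edge on a 32-bit port.
 * addr is a byte address, so consecutive words are 4 apart.
 * Bit i of we enables byte lane i (data bits 8*i+7 .. 8*i). */
void bram_clock(bram_model *b, uint8_t we, uint32_t addr, uint32_t din)
{
    assert(addr % 4 == 0 && addr / 4 < BRAM_WORDS);

    /* Expand the 4-bit byte enable into a 32-bit bit mask. */
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if ((we >> lane) & 1u)
            mask |= (uint32_t)0xFFu << (8 * lane);
    }

    uint32_t *word = &b->mem[addr / 4];      /* byte address -> word index */
    *word = (*word & ~mask) | (din & mask);  /* write only the enabled bytes */

    /* The output register is loaded on this edge, so the data for addr is
     * visible one clock cycle after the address was presented. */
    b->dout = *word;
}
```

Calling bram_clock with we = 0 performs a pure read: the address is presented on that edge, and dout holds the word afterwards, which corresponds to the one-cycle latency in the read timing diagram.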
1.5. Accessing BRAM from PS
From the software side, we can write data to and read data from the BRAM with simple memory-mapped accesses. We initialize a pointer, mem_p, to the base address of the BRAM and then use this pointer to write and read the data.
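A minimal sketch of this access pattern, assuming 32-bit words, is shown below. The helper names bram_store and bram_load are hypothetical. On the target, mem_p would be initialized from the base address assigned to the AXI BRAM Controller in the Address Editor; in this sketch any uint32_t buffer can stand in for the BRAM, so the code also runs on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write n words to the BRAM, starting at its base address.
 * mem_p is volatile so the compiler does not drop or reorder the accesses. */
void bram_store(volatile uint32_t *mem_p, const uint32_t *src, size_t n)
{
    assert(mem_p != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        mem_p[i] = src[i];   /* word i lives at byte offset 4*i */
}

/* Read n words back from the BRAM, starting at its base address. */
void bram_load(const volatile uint32_t *mem_p, uint32_t *dst, size_t n)
{
    assert(mem_p != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = mem_p[i];
}
```

Indexing mem_p as an array of 32-bit words means the byte address advances by 4 per element, which matches the address pattern in the write timing diagram.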
2.1. Control and Datapath
The following figure shows the PE module. It consists of a multiplier and an adder, and it is purely combinational.
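As a reference for checking results in software, the PE can be modeled as a pure function. The text only states that the PE contains a multiplier and an adder, so the sketch below assumes a multiply-add of the form y = a * b + c. The function name pe, the operand widths, and the unsigned arithmetic are all assumptions for illustration and may differ from the actual RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PE function: y = a * b + c (a multiplier feeding an adder).
 * Like a combinational circuit, the result depends only on the current
 * inputs; there is no stored state. Unsigned arithmetic wraps modulo 2^32. */
uint32_t pe(uint16_t a, uint16_t b, uint32_t c)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    return product + c;
}
```

A model like this can be used to compute expected values when verifying the hardware output manually.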
Figure 8. A processing element (PE)
Next, we design the top module for the PE, as shown in the following figure. We add pipeline registers to the input and output of the PE module. Then, we add a control unit, implemented as a counter, that controls the BRAM input and output. For the start signal, we use a rising edge detector built from a register, a NOT gate, and an AND gate.
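The rising edge detector described above can be sketched as a cycle-by-cycle model in C. The names edge_detector and edge_detector_clock are hypothetical; the logic is the standard register, NOT gate, and AND gate arrangement, where the output pulse is start AND NOT(previous start).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of the edge detector: one register holding the previous sample
 * of the start signal. */
typedef struct {
    bool start_q;
} edge_detector;

/* Model one clock cycle. Returns the pulse seen during this cycle,
 * then updates the register as the clock edge would. */
bool edge_detector_clock(edge_detector *ed, bool start)
{
    assert(ed != NULL);
    bool pulse = start && !ed->start_q;  /* AND gate fed by NOT(register) */
    ed->start_q = start;                 /* register samples start */
    return pulse;
}
```

The pulse is high for exactly one cycle after start goes from 0 to 1, so holding start at 1 does not retrigger the control unit.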
Figure 9. PE top module block diagram
This is the Verilog implementation of the PE top module.
Now that we have the PE top module, we can test it using a testbench file. The block design that we are going to test is shown in the following figure. The PE top module is connected to the input BRAM and the output BRAM.
Figure 10. PE top module for simulation
This is the Verilog testbench for the PE top module.
The following figure shows the timing diagram of the PE top module. First, the data is written to the BRAM input. Then, the module starts when the start signal is one. For the duration of the computation, the ready signal is zero, indicating that the PE top module is busy. Finally, the PE result is stored in the output BRAM.
Figure 11. Timing diagram of PE top module
3. System Design
The following figure shows the overall SoC system design for the PE top module. We use the AXI BRAM Controller IP to connect the PS and the BRAM. We use AXI GPIO for the control signals: the start signal and the ready signal.
Figure 12. Block diagram of PE system
The following figure shows the block design in Vivado.
Figure 13. Block diagram of PE system in Vivado block design
4. Software Design
For the software design, we first write the input data to the input BRAM. Then, we start the PE module by writing 1 to AXI GPIO channel 0. After that, we read the ready signal from AXI GPIO channel 1 and wait until it is 1. Finally, we read the output data from the output BRAM.
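The four steps above can be sketched in C as follows. This is not the tutorial's actual program: the struct pe_system and the function pe_run are hypothetical, and the pointers are assumed to be set by the caller to the memory-mapped addresses of the two BRAMs and the two AXI GPIO data registers. Writing 0 to start after the 1 is also an assumption, added so that a later run produces a new rising edge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory-mapped resources used by the flow (addresses supplied by caller). */
typedef struct {
    volatile uint32_t *bram_in;    /* input BRAM */
    volatile uint32_t *bram_out;   /* output BRAM */
    volatile uint32_t *gpio_start; /* AXI GPIO data register driving start */
    volatile uint32_t *gpio_ready; /* AXI GPIO data register reading ready */
} pe_system;

/* Run the PE once on n input words and collect n output words. */
void pe_run(const pe_system *sys, const uint32_t *in, uint32_t *out, size_t n)
{
    assert(sys != NULL && in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)       /* 1. write inputs to input BRAM */
        sys->bram_in[i] = in[i];

    *sys->gpio_start = 1;                /* 2. start the PE (rising edge) */
    *sys->gpio_start = 0;                /*    assumption: release start */

    while ((*sys->gpio_ready & 1u) == 0) /* 3. poll until ready is 1 */
        ;

    for (size_t i = 0; i < n; i++)       /* 4. read results from output BRAM */
        out[i] = sys->bram_out[i];
}
```

Because the hardware detects the rising edge of start, the value does not need to stay at 1 for the duration of the computation. The volatile qualifiers keep the compiler from optimizing away the polling loop.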
The following figure shows the result on the serial terminal. We can verify the result manually.
Figure 14. PE top module output printed on the serial terminal
In this tutorial, we covered how to use BRAM, using a PE module as the example.