Part 4: AXI-Stream Multiplier with DMA
This tutorial shows how to create a simple AXI-Stream IP core in Verilog. The IP core performs a simple multiplication. We use AXI DMA to write data to and read results from the AXI-Stream multiplier. Then, we compare the performance of the AXI DMA multiplier in this tutorial with the AXI-Lite multiplier from the previous tutorial.
Unlike the memory map interface, the stream interface does not have an address. This interface is mainly used for point-to-point data transfer between IP modules in the FPGA. In the AXI-Stream interface, the sender is known as a master and the receiver a slave. The data moves only in one direction, from master port to slave port.
This is the same RTL module that did simple multiplication in the previous tutorial. It has one 32-bit input a and one 32-bit output r. Every input will be multiplied by 8 to produce the output.
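As a rough software reference for what the RTL module computes, the behavior can be modeled in a few lines of Python. This is only a sketch: the function name multiply_by_8 is hypothetical, and it assumes the product is truncated to the low 32 bits, as a 32-bit output would do in hardware.

```python
MASK32 = 0xFFFFFFFF  # assumption: the 32-bit output r keeps only the low 32 bits


def multiply_by_8(a: int) -> int:
    """Software model of the multiplier: r = a * 8, truncated to 32 bits."""
    return (a * 8) & MASK32


print(multiply_by_8(5))           # 40
print(multiply_by_8(0x20000000))  # 0, because the product no longer fits in 32 bits
```

A model like this is handy later for checking the values that come back from the hardware.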
Now, we are going to make a wrapper for the AXI-Stream interface. Later, this stream interface will be connected to the AXI DMA so that the multiplier core can access the PS DRAM via the AXI DMA.
This is the Verilog code for the wrapper module.
The AXI-Stream slave port is the input port for the data to the multiplier core, and the AXI-Stream master port is the output port for the data from the multiplier core. The s_axis_* indicates the slave port, and the m_axis_* indicates the master port. Every AXI-Stream port usually has the following signals:
The tready signal indicates that a receiver can accept a transfer.
The tdata is the primary signal used to provide the data that is passing across the interface.
The tvalid signal indicates the sender is driving a valid transfer. A transfer takes place when both tvalid and tready are one.
The tlast signal indicates the boundary of a packet.
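To make the handshake rule concrete, here is a small cycle-by-cycle simulation in plain Python. It is an illustrative sketch rather than a model of any particular IP: the function name and the per-cycle patterns are made up for the example. It only shows that a beat moves in a cycle when tvalid and tready are both 1, and that tlast marks the final beat of the packet.

```python
def simulate_handshake(data, tvalid_pattern, tready_pattern):
    """Return the (tdata, tlast) beats accepted by the receiver.

    One entry of each pattern corresponds to one clock cycle. A beat is
    transferred only in cycles where tvalid and tready are both 1.
    """
    received = []
    index = 0
    for tvalid, tready in zip(tvalid_pattern, tready_pattern):
        if index >= len(data):
            break  # the whole packet has been sent
        if tvalid and tready:
            tlast = int(index == len(data) - 1)  # 1 only on the final beat
            received.append((data[index], tlast))
            index += 1
    return received


beats = simulate_handshake(
    data=[10, 20, 30],
    tvalid_pattern=[1, 1, 0, 1, 1],
    tready_pattern=[1, 0, 1, 1, 1],
)
print(beats)  # [(10, 0), (20, 0), (30, 1)]
```

In cycles 1 and 2 nothing is transferred, because only one of the two signals is high.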
Since our multiplier is a simple combinational circuit, we can just connect the control signals (tready, tvalid, tlast) directly between the slave and the master port.
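The pass-through idea can be sketched as a pure function in Python. This is a behavioral illustration under assumptions, not the Verilog wrapper itself: the function name is hypothetical, and the data path assumes the multiply-by-8 result is truncated to 32 bits. Note that tvalid and tlast travel from the slave side to the master side, while tready travels in the opposite direction.

```python
def axis_multiplier_wrapper(s_axis_tdata, s_axis_tvalid, s_axis_tlast, m_axis_tready):
    """Combinational model: outputs depend only on the current inputs."""
    m_axis_tdata = (s_axis_tdata * 8) & 0xFFFFFFFF  # data path: multiply by 8
    m_axis_tvalid = s_axis_tvalid                   # forwarded slave -> master
    m_axis_tlast = s_axis_tlast                     # forwarded slave -> master
    s_axis_tready = m_axis_tready                   # back-pressure, master -> slave
    return m_axis_tdata, m_axis_tvalid, m_axis_tlast, s_axis_tready


# Downstream is ready, so the beat is accepted and forwarded.
print(axis_multiplier_wrapper(3, 1, 0, 1))  # (24, 1, 0, 1)
# Downstream is not ready, so s_axis_tready is 0 and the sender must hold the beat.
print(axis_multiplier_wrapper(7, 1, 1, 0))  # (56, 1, 1, 0)
```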
This diagram shows our system. It consists of an ARM CPU, DRAM, AXI DMA, and our AXI-Stream multiplier module. Our AXI-Stream multiplier module is connected to the AXI DMA. Between the multiplier module and the AXI DMA, we also add an AXI-Stream FIFO IP.
The following figure shows the Zynq IP high-performance port configuration. There are two ports enabled, which are AXI HP0 and AXI HP2.
The following figure shows the AXI DMA IP configuration. The data width of both the read and write channels is 32 bits.
This is the final block design diagram as shown in Vivado.
First, we need to create DMA, DMA send channel, and DMA receive channel objects.
Then, we need to allocate the buffers. We use the allocate() function to allocate each buffer, and NumPy to specify the buffer data type, which is a 32-bit unsigned integer in this case.
Next, write the input to be multiplied to the input_buffer.
We do the MM2S (memory-mapped to stream) and S2MM (stream to memory-mapped) DMA transfers. The MM2S DMA reads the input_buffer and then sends it to the multiplier. The S2MM DMA reads the multiplier output and then sends it to the output_buffer.
We print the multiplication result from the output_buffer.
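The transfer sequence described above can be sketched as a small helper. On the board, send_channel and recv_channel would be the DMA send and receive channel objects (in PYNQ these expose transfer() and wait()), and the buffers would come from allocate(). So that the sketch also runs off-board, it is exercised here with a FakeChannel class and plain lists, which are hypothetical stand-ins and not part of PYNQ.

```python
def dma_round_trip(send_channel, recv_channel, input_buffer, output_buffer):
    """Start one MM2S and one S2MM transfer, then block until both finish."""
    send_channel.transfer(input_buffer)   # MM2S: memory -> multiplier
    recv_channel.transfer(output_buffer)  # S2MM: multiplier -> memory
    send_channel.wait()
    recv_channel.wait()
    return output_buffer


class FakeChannel:
    """Hypothetical software stand-in for a DMA channel (not a PYNQ class)."""

    def __init__(self, link, is_send):
        self.link = link        # shared dict that plays the role of the stream
        self.is_send = is_send

    def transfer(self, buffer):
        if self.is_send:
            # Emulate the hardware multiplier on the outgoing data.
            self.link["stream"] = [(x * 8) & 0xFFFFFFFF for x in buffer]
        else:
            self.link["target"] = buffer

    def wait(self):
        if not self.is_send:
            stream = self.link["stream"]
            self.link["target"][: len(stream)] = stream


link = {}
input_buffer = [1, 2, 3, 4]
output_buffer = [0, 0, 0, 0]
dma_round_trip(FakeChannel(link, True), FakeChannel(link, False),
               input_buffer, output_buffer)
print(output_buffer)  # [8, 16, 24, 32]
```

Both transfers are started before either wait() call, so the receive channel is already armed when data begins to stream out of the multiplier.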
We can create a function to do a multiplication like this:
We can calculate the time required to do 1 million multiplications. Then, we can compare this with the AXI-Lite multiplier module.
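One way to take such a measurement is to wrap the call in a small timing helper based on time.perf_counter(). The sketch below is an assumption about how the timing could be done, not the exact code behind the figures in this tutorial. The helper name time_call is hypothetical, and a pure-software multiply stands in for the hardware function so that the example runs anywhere.

```python
import time


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def software_multiply(values):
    """Stand-in for the hardware multiply function (assumption)."""
    return [(v * 8) & 0xFFFFFFFF for v in values]


result, elapsed = time_call(software_multiply, list(range(1_000_000)))
print(f"{len(result)} multiplications in {elapsed:.6f} s")
```

On the board, the same helper could be given the function that performs the DMA transfers instead of software_multiply.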
The result from the AXI-Lite multiplier in the previous tutorial:
Compared to the AXI-Lite multiplier, the AXI-Stream multiplier (with AXI DMA) speeds up the calculation by a factor of 13.79.
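The speedup factor is simply the AXI-Lite time divided by the AXI-Stream time. The snippet below shows the arithmetic. The two times in it are placeholders chosen only so that the ratio matches the factor quoted above. They are not measurements, and the function name speedup is hypothetical.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Placeholder values, NOT measured results.
axi_lite_seconds = 13.79
axi_stream_seconds = 1.0
print(f"speedup: {speedup(axi_lite_seconds, axi_stream_seconds):.2f}x")
```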
Don’t forget to free the memory buffers to avoid memory leaks!
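A try/finally block is one way to make sure the buffers are released even if a transfer raises an exception. The sketch below assumes each buffer exposes a freebuffer() method, as PYNQ buffers returned by allocate() do. FakeBuffer and release_buffers are hypothetical names used only so the example runs without a board.

```python
def release_buffers(*buffers):
    """Free every buffer passed in (assumes each has a freebuffer() method)."""
    for buffer in buffers:
        buffer.freebuffer()


class FakeBuffer:
    """Hypothetical stand-in for an allocated buffer (not a PYNQ class)."""

    def __init__(self):
        self.freed = False

    def freebuffer(self):
        self.freed = True


input_buffer = FakeBuffer()
output_buffer = FakeBuffer()
try:
    pass  # the DMA transfers would go here
finally:
    release_buffers(input_buffer, output_buffer)

print(input_buffer.freed, output_buffer.freed)  # True True
```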
This video contains detailed steps for making this project.
In this tutorial, we covered some of the basics of AXI-Stream based IP core creation integrated with AXI DMA.