Part 2: Direct Memory Access

Objective

This tutorial introduces the AXI DMA and shows how to get started using it.

1. Hardware Design

1.1. Direct Memory Access

Direct memory access (DMA) enables certain hardware subsystems to access system memory without relying on the CPU.

When doing data transfer without DMA, the CPU will be fully occupied for the entire duration of the read or write operation and is thus unavailable to perform other work.

With DMA, the CPU first initiates the transfer, then it does other operations while the transfer is in progress, and it finally receives an interrupt from the DMA controller when the operation is done.

1.2. Xilinx DMA

The Xilinx IP library contains several DMAs, implemented either in the PS or in the PL:

  • The PS DMA controller (DMAC) provides a flexible DMA engine that can provide moderate levels of throughput with little PL logic resource usage. The DMAC resides in the PS and must be programmed via DMA instructions residing in memory, typically prepared by a CPU.

  • The AXI Direct Memory Access (AXI DMA) IP provides high-bandwidth direct memory access between memory and AXI4-Stream-type target peripherals.

  • The AXI Central Direct Memory Access (AXI CDMA) provides high-bandwidth Direct Memory Access (DMA) between a memory-mapped source address and a memory-mapped destination address using the AXI4 protocol.

  • The AXI Video Direct Memory Access (AXI VDMA) core is a soft AMD IP core that provides high-bandwidth direct memory access between memory and AXI4-Stream type video target peripherals.

In this tutorial we will focus only on AXI DMA.

1.3. System Design

This figure shows the AXI DMA IP. This DMA allows you to stream data from memory, specifically PS DRAM, to an AXI stream interface. This is called the READ channel of the DMA. The DMA can also receive data from an AXI stream and output it to PS DRAM. This is the WRITE channel.

The read and write access to PS DRAM is done via the high-performance AXI ports, AMBA interconnect, DRAM controller, and finally to the DRAM itself outside the Zynq chip.

For the READ channel, the AXI DMA reads memory-mapped data from DRAM via the M_AXI_MM2S port. MM2S stands for memory-mapped-to-stream. The data is then streamed out via the M_AXIS_MM2S port.

For the WRITE channel, the AXI DMA receives stream data from an AXI-Stream IP via the S_AXIS_S2MM port. S2MM stands for stream-to-memory-mapped. The data is then written to DRAM via the M_AXI_S2MM port.

To control the AXI DMA operations, we use the S_AXI_LITE port. Through this interface we program the DMA control registers, such as the source address, the destination address, and the number of bytes to transfer.
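
Later, once the design is running under PYNQ, these control and status registers can also be inspected from Python. A minimal sketch using the driver's register_map attribute, assuming the overlay has already been loaded and the DMA instance is named axi_dma_0:

# Inspect the AXI DMA control/status registers exposed via S_AXI_LITE
# (assumes the overlay has been loaded, see the software section below)
dma = overlay.axi_dma_0
print(dma.register_map)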

The AXI DMA can be configured in the IP configuration dialog.

For this project we need to do the following:

  • Uncheck Enable Scatter Gather Engine to disable Scatter Gather

  • Set the Width of Buffer Length Register to 26

  • Set the Address Width to 40. In this example, the DMA is connected to the PS memory, which uses a 40-bit address space on Zynq UltraScale+. You can set this to 32 if you are connecting to a Zynq-7000.

  • Set the Memory Map Data Width to 64 to match the HP port.

  • Set the Stream Data Width to 64.

The AXI DMA master ports need to be connected to the DRAM through the PS. This is done via the PS HP (AXI slave) ports, which are not enabled by default. Double-click the Zynq PS block, go to PS-PL Configuration, expand HP Slave AXI Interface, enable S AXI HP0 and S AXI HP2, and set their data width to 64.

Internally, the four HP ports reach the PS memory through two connections: HP0 and HP1 share a switch to one port, while HP2 and HP3 share a switch to the other. The difference may not be noticeable for this example and for some designs, but when only two HP ports are required, it is more efficient to connect them to HP ports that don't share a switch, i.e. HP0 and HP2, or HP1 and HP3.

After these ports are enabled, they appear on the PS IP block.

Finally, this is our system block diagram, which consists of PS, AXI DMA, and AXIS FIFO. Here, we use the master AXI_GP_0 to connect to the AXI DMA control port via the AXI Interconnect. The AXI DMA data ports to DRAM are connected via the AXI_HP_0 and AXI_HP_2 ports.

This is the final block design diagram as shown in Vivado. The output of the AXI DMA MM2S is connected to the AXIS FIFO and then back to the AXI DMA S2MM.

2. Software Design

2.1. Hardware-Software Partition

To control the AXI DMA in the FPGA from our application, we can use the DMA driver provided by the PYNQ library. Without PYNQ, using the AXI DMA under Linux would require writing or working with a Linux kernel module. The PYNQ library provides abstractions for driving the AXI DMA from Python.
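
Before the DMA can be used, the overlay containing it must be loaded and the required modules imported. A minimal sketch, assuming the bitstream file is named design.bit and the DMA instance is called axi_dma_0 in the block design:

from pynq import Overlay, allocate
import numpy as np

# Load the overlay (replace "design.bit" with the path to your bitstream)
overlay = Overlay("design.bit")

# List the IP blocks found in the overlay; axi_dma_0 should appear here
print(overlay.ip_dict.keys())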

2.2. User Application

First, we need handles to the DMA, the DMA send channel, and the DMA receive channel.

# Access to AXI DMA
dma = overlay.axi_dma_0
dma_send = overlay.axi_dma_0.sendchannel
dma_recv = overlay.axi_dma_0.recvchannel

We define the maximum number of 64-bit data words that can be processed in a single DMA transfer. With the 26-bit buffer length register configured earlier, the maximum size of a single DMA transfer is 2^26 - 1 = 67,108,863 bytes. We divide this by 8 because our memory-mapped data width is 64-bit, i.e. 8 bytes per word.

# Maximum data that can be sent by the AXI DMA in one transaction is 67108863 bytes
# floor(67108863 bytes / 8) = 8388607 64-bit words
# We divide by 8 because we use the uint64 data type
data_size = 8388607
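
Equivalently, this limit can be derived directly from the 26-bit buffer length register configured earlier, as in this small sketch:

# A 26-bit buffer length register allows at most 2**26 - 1 bytes per transfer
# Each uint64 element occupies 8 bytes
data_size = (2**26 - 1) // 8  # = 8388607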

Read DMA (MM2S)

We will read some data from DRAM and write it to the AXIS FIFO.

The first step is to allocate the buffer. We use the allocate() function to allocate the buffer, and NumPy is used to specify its data type, which in this case is unsigned 64-bit integer.

# Allocate physical memory for AXI DMA MM2S
input_buffer = allocate(shape=(data_size,), dtype=np.uint64)

The array can be used like any other NumPy array. We can write some test data to the array. Later the data will be transferred by the DMA to the FIFO.

# Write data to physical memory
for i in range(data_size):
    input_buffer[i] = i + 0xcafe000000000000

Let’s check the contents of the array.

# Check the written data
for i in range(10):
    print(hex(input_buffer[i]))
0xcafe000000000000
0xcafe000000000001
0xcafe000000000002
0xcafe000000000003
0xcafe000000000004
0xcafe000000000005
0xcafe000000000006
0xcafe000000000007
0xcafe000000000008
0xcafe000000000009

Now we are ready to carry out the AXI DMA transfer from DRAM to the AXIS FIFO.

# Do AXI DMA MM2S transfer
dma_send.transfer(input_buffer)
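
Note that transfer() starts the DMA and returns without waiting for completion. To block until the read channel has finished, the send channel provides a wait() method, sketched below:

# Block until the MM2S (read) channel has finished the transfer
dma_send.wait()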

Write DMA (S2MM)

Let’s read the data back from the AXIS FIFO and write it to DRAM. We will prepare an empty array before reading the data back from the FIFO.

# Allocate physical memory for AXI DMA S2MM
output_buffer = allocate(shape=(data_size,), dtype=np.uint64)

Let’s check the contents of the array to make sure it is empty.

# Check the memory content
for i in range(10):
    print(hex(output_buffer[i]))
0x0
0x0
0x0
0x0
0x0
0x0
0x0
0x0
0x0
0x0

Now we are ready to carry out the AXI DMA transfer from the AXIS FIFO to DRAM.

# Do AXI DMA S2MM transfer
dma_recv.transfer(output_buffer)
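
The receive transfer is also non-blocking, so before checking the output buffer it is safest to wait until the S2MM channel reports completion, as in this short sketch:

# Block until the S2MM (write) channel has finished the transfer
dma_recv.wait()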

Let’s check the contents of the array after DMA transfer, and compare it with the original data.

# Check the memory content after DMA transfer
for i in range(10):
    print(hex(output_buffer[i]))
0xcafe000000000000
0xcafe000000000001
0xcafe000000000002
0xcafe000000000003
0xcafe000000000004
0xcafe000000000005
0xcafe000000000006
0xcafe000000000007
0xcafe000000000008
0xcafe000000000009
# Compare arrays
print("Arrays are equal: {}".format(np.array_equal(input_buffer, output_buffer)))
Arrays are equal: True

Don’t forget to free the memory buffers to avoid memory leaks!

# Delete buffer to prevent memory leak
del input_buffer, output_buffer
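
As an alternative to del, the buffers returned by allocate() also provide a freebuffer() method that releases the underlying contiguous memory explicitly. A brief sketch, to be used instead of the del statement above:

# Explicitly release the contiguous memory backing each buffer
input_buffer.freebuffer()
output_buffer.freebuffer()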

3. Full Step-by-Step Tutorial

This video contains detailed steps for making this project.

4. Conclusion

In this tutorial, we covered some of the basics of AXI DMA with the PYNQ framework.
