Ween's Lab

Part 4: AXI-Stream Multiplier with DMA


Last updated 2 months ago

Objective

This tutorial shows how to create a simple AXI-Stream IP core in Verilog that performs a multiplication. We use AXI DMA to write data to and read data from the AXI-Stream multiplier. Finally, we compare the performance of this DMA-based multiplier with the AXI-Lite multiplier from the previous tutorial.

Source Code

This repository contains all of the code required in order to follow this tutorial.


1. Hardware Design

1.1. Stream Interface

Unlike a memory-mapped interface, a stream interface has no address. It is mainly used for point-to-point data transfer between IP modules in the FPGA. In the AXI-Stream interface, the sender is called the master and the receiver is called the slave. Data moves in only one direction, from the master port to the slave port.

1.2. RTL Design of Multiplier

This is the same RTL multiplier module used in the previous tutorial. It has one 32-bit input, a, and one 32-bit output, r. The output is the input multiplied by 8.

mult_core.v
module mult_core
    (
        input wire [31:0]  a,
        output wire [31:0] r
    );

    // Multiply by the constant 8; the product is truncated to the 32-bit width of r
    assign r = a * 8;

endmodule
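A software model of mult_core can be handy for checking hardware results on the PS side. In Verilog, a * 8 assigned to a 32-bit wire keeps only the low 32 bits, so the sketch below masks the product to 32 bits. Multiplying by 8 is the same as shifting left by 3 bits, which is why synthesis tools typically implement it as wiring rather than a hardware multiplier. The function names are hypothetical and not part of the tutorial's repository.

```python
MASK32 = 0xFFFFFFFF  # r is a 32-bit wire, so results wrap modulo 2**32


def mult_core(a: int) -> int:
    """Software model of mult_core: r = a * 8, truncated to 32 bits."""
    if not 0 <= a <= MASK32:
        raise ValueError("a must fit in 32 unsigned bits")
    return (a * 8) & MASK32


def mult_core_shift(a: int) -> int:
    """Equivalent form: multiplying by 8 is a left shift by 3 bits."""
    return (a << 3) & MASK32


if __name__ == "__main__":
    # 0x20000000 * 8 == 2**32, which wraps around to 0 in a 32-bit output
    for a in (10, 3578129, 0x20000000):
        print(a, mult_core(a), mult_core_shift(a))
```

Inputs of 0x20000000 or larger overflow the 32-bit output, so the benchmark value 3578129 used later stays well within range.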

1.3. AXI-Stream Wrapper

Next, we wrap the multiplier core in an AXI-Stream interface. This stream interface will later be connected to the AXI DMA, which gives the multiplier core access to the PS DRAM.

This is the Verilog code for the wrapper module.

axis_mult.v
module axis_mult
    (
        // ### Clock and reset signals #########################################
        input  wire        aclk,
        input  wire        aresetn,
        // ### AXI4-stream slave signals #######################################
        output wire        s_axis_tready,
        input wire [31:0]  s_axis_tdata,
        input wire         s_axis_tvalid,
        input wire         s_axis_tlast,
        // ### AXI4-stream master signals ######################################
        input wire         m_axis_tready,
        output wire [31:0] m_axis_tdata,
        output wire        m_axis_tvalid,
        output wire        m_axis_tlast
    );
    
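    // tready flows backward: accept input only when the downstream sink is ready.
    // tvalid and tlast flow forward unchanged because the datapath is combinational.
    // (aclk and aresetn are not used by this combinational wrapper.)
    assign s_axis_tready = m_axis_tready;
    assign m_axis_tvalid = s_axis_tvalid;
    assign m_axis_tlast = s_axis_tlast;

    // The multiplier core computes m_axis_tdata = s_axis_tdata * 8
    mult_core mult_core_0
    (
        .a(s_axis_tdata),
        .r(m_axis_tdata)
    );

endmodule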

The AXI-Stream slave port is the input port that delivers data to the multiplier core, and the AXI-Stream master port is the output port that carries data from the multiplier core. Signals prefixed with s_axis_ belong to the slave port, and signals prefixed with m_axis_ belong to the master port. An AXI-Stream port usually has the following signals:

  • tready indicates that the receiver can accept a transfer.

  • tdata carries the data passing across the interface.

  • tvalid indicates that the sender is driving a valid transfer. A transfer takes place on a clock edge where both tvalid and tready are high.

  • tlast marks the last transfer of a packet, that is, the packet boundary.

Because the multiplier is purely combinational, the wrapper can simply pass the control signals through: tvalid and tlast go forward from the slave port to the master port, and tready goes backward from the master port to the slave port.
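The pass-through behavior can be checked with a small cycle-level Python model of the wrapper. Because s_axis_tready mirrors m_axis_tready and m_axis_tvalid mirrors s_axis_tvalid, the input and output handshakes complete in the same cycle, and a stalled sink stalls the source. This is a sketch under stated assumptions: the upstream source always has data ready, and the downstream sink's tready follows a made-up pattern. The names axis_mult and simulate are illustrative Python names, not code from the repository.

```python
MASK32 = 0xFFFFFFFF


def axis_mult(s_tdata, s_tvalid, s_tlast, m_tready):
    """Combinational model of axis_mult: returns (s_tready, m_tdata, m_tvalid, m_tlast)."""
    return m_tready, (s_tdata * 8) & MASK32, s_tvalid, s_tlast


def simulate(packet, ready_pattern):
    """Push one packet through axis_mult, one clock cycle per loop iteration.

    The upstream source (assumed) always has data and holds each beat until it
    is accepted. The sink's tready on each cycle comes from ready_pattern,
    repeated. Returns (received_beats, cycles_used).
    """
    if not packet:
        return [], 0
    if not any(ready_pattern):
        raise ValueError("sink is never ready; the stream would stall forever")
    received, idx, cycle = [], 0, 0
    while idx < len(packet):
        s_tdata = packet[idx]
        s_tvalid = True
        s_tlast = idx == len(packet) - 1      # last beat of the packet
        m_tready = ready_pattern[cycle % len(ready_pattern)]
        s_tready, m_tdata, m_tvalid, m_tlast = axis_mult(s_tdata, s_tvalid, s_tlast, m_tready)
        # A transfer occurs on a clock edge where tvalid and tready are both 1.
        if m_tvalid and m_tready:
            received.append((m_tdata, m_tlast))
        if s_tvalid and s_tready:
            idx += 1                          # source moves on to the next beat
        cycle += 1
    return received, cycle


beats, cycles = simulate([10, 20, 30], [True, False])
print(beats, cycles)  # [(80, False), (160, False), (240, True)] 5
```

With the sink ready only every other cycle, three beats take five cycles, and each output beat appears in exactly the cycle its input beat is accepted.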

1.4. System Design

This diagram shows our system, which consists of the ARM CPU, DRAM, the AXI DMA, and our AXI-Stream multiplier module. The multiplier module is connected to the AXI DMA, and between the two we also insert AXI-Stream FIFO IP.
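The tutorial does not state why the FIFO is added; a common reason is to buffer data so that a short stall on one side does not immediately stall the other. The sketch below is a toy cycle model of a fixed-depth AXI-Stream FIFO, assuming the usual convention that s_tready is high while the FIFO is not full and m_tvalid is high while it is not empty. It is not a model of the Xilinx IP's exact timing, and the class and function names are hypothetical.

```python
from collections import deque


class AxisFifo:
    """Toy cycle model of an AXI-Stream FIFO with a fixed depth (illustrative only)."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.buf = deque()

    @property
    def s_tready(self):
        return len(self.buf) < self.depth   # can accept data while not full

    @property
    def m_tvalid(self):
        return len(self.buf) > 0            # offers data while not empty

    def clock(self, s_tvalid, s_tdata, m_tready):
        """Advance one clock edge. Returns (accepted, popped_word_or_None)."""
        push = s_tvalid and self.s_tready   # both flags use the state before the edge
        pop = self.m_tvalid and m_tready
        word = self.buf.popleft() if pop else None
        if push:
            self.buf.append(s_tdata)
        return push, word


def run(words, depth, ready_pattern):
    """Producer offers a word every cycle; consumer is ready per ready_pattern."""
    if not any(ready_pattern):
        raise ValueError("consumer is never ready")
    fifo = AxisFifo(depth)
    out, i, cycle, max_fill = [], 0, 0, 0
    while len(out) < len(words):
        s_tvalid = i < len(words)
        accepted, word = fifo.clock(s_tvalid, words[i] if s_tvalid else None,
                                    ready_pattern[cycle % len(ready_pattern)])
        if accepted:
            i += 1
        if word is not None:
            out.append(word)
        max_fill = max(max_fill, len(fifo.buf))
        cycle += 1
    return out, max_fill


out, max_fill = run(list(range(10)), depth=4, ready_pattern=[True, False, False])
print(out, max_fill)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 4
```

The FIFO absorbs the producer's burst until it is full, then backpressure (s_tready low) holds the producer, and no data is lost or reordered.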

The following figure shows the high-performance port configuration of the Zynq IP. Two ports are enabled: AXI HP0 and AXI HP2.

The following figure shows the AXI DMA IP configuration. Both the read and write channels have a data width of 32 bits.
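Besides the data width, the AXI DMA IP has a "Width of Buffer Length Register" setting that limits a single simple-mode transfer to 2^width − 1 bytes. With the default of 14 bits, one transfer is capped at 16,383 bytes, far less than the 4,000,000 bytes (1 million uint32 values) used in the benchmark later in this tutorial. The helper below, with hypothetical function names, computes the smallest width that covers a given buffer; check the value against the range your AXI DMA version allows.

```python
def max_transfer_bytes(width_bits):
    """Largest single transfer (in bytes) for a given buffer length register width."""
    return 2 ** width_bits - 1


def min_buffer_length_width(num_elements, bytes_per_element=4):
    """Smallest register width (bits) whose maximum transfer covers the buffer."""
    nbytes = num_elements * bytes_per_element
    # 2**bit_length() is always greater than nbytes, so nbytes <= 2**width - 1.
    return max(1, nbytes.bit_length())


print(max_transfer_bytes(14))              # 16383 bytes with the 14-bit default
print(min_buffer_length_width(1))          # 3
print(min_buffer_length_width(1_000_000))  # 22
```

A transfer larger than this limit cannot be done in a single call, so the benchmark's buffer size and the DMA configuration have to agree.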

This is the final block design diagram as shown in Vivado.
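A quick back-of-the-envelope bound helps put the benchmark later in this tutorial in context. Assuming the stream path moves one 32-bit beat per clock and the PL runs at 100 MHz (an assumption; the tutorial does not state the clock frequency), streaming 1 million words takes at least about 10 ms. That is far below the 1.14 s measured later, which suggests the measured time is dominated by software overhead such as the Python loop that fills the input buffer. The function name is hypothetical.

```python
def stream_time_bound(num_words, data_width_bits=32, clk_hz=100e6):
    """Best-case figures for streaming num_words through a stream path that
    moves one data_width_bits-wide beat per clock cycle.

    clk_hz = 100 MHz is an assumption; substitute your design's PL clock.
    Returns (total_bytes, min_seconds, bytes_per_second).
    """
    bytes_per_beat = data_width_bits // 8
    total_bytes = num_words * bytes_per_beat
    min_seconds = num_words / clk_hz          # at most one beat per clock
    bytes_per_second = bytes_per_beat * clk_hz
    return total_bytes, min_seconds, bytes_per_second


total, t_min, bw = stream_time_bound(1_000_000)
print(total, t_min, bw)  # 4000000 0.01 400000000.0
```

Real transfers also pay for DMA setup, DRAM latency, and interconnect arbitration, so this is only a lower bound.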

2. Software Design

First, we create objects for the AXI DMA and for its send (MM2S) and receive (S2MM) channels.

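from pynq import Overlay

# Load the bitstream (placeholder file name; use the .bit file built for this design)
overlay = Overlay('design_1.bit')
# Access to AXI DMA
dma = overlay.axi_dma_0
dma_send = overlay.axi_dma_0.sendchannel
dma_recv = overlay.axi_dma_0.recvchannel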

Then, we allocate the buffers with the allocate() function, which returns physically contiguous memory that the AXI DMA can access. A NumPy dtype specifies the buffer element type, which is a 32-bit unsigned integer (np.uint32) in this case.

import numpy as np
from pynq import allocate

# Allocate physical memory for AXI DMA
data_size = 1
input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
output_buffer = allocate(shape=(data_size,), dtype=np.uint32)

Next, we write the value to be multiplied into input_buffer.

# Write input to be multiplied
input_buffer[0] = 10

We then start the MM2S and S2MM DMA transfers. The MM2S (memory-mapped to stream) channel reads input_buffer from DRAM and streams it to the multiplier. The S2MM (stream to memory-mapped) channel receives the multiplier output and writes it to output_buffer.

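# Do AXI DMA MM2S and S2MM transfer
dma_send.transfer(input_buffer)
dma_recv.transfer(output_buffer)
# Block until both channels have finished before reading output_buffer
dma_send.wait()
dma_recv.wait()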

Finally, we print the multiplication result from output_buffer.

# Print multiplication result
print(output_buffer[0])
80

We can wrap these transfers in a function:

# Function to calculate multiplication
def calc_mult_axi_dma(a, r):
    dma_send.transfer(a)
    dma_recv.transfer(r)
    # Wait for both transfers to complete before returning
    dma_send.wait()
    dma_recv.wait()

We can now measure the time required to perform 1 million multiplications and compare it with the AXI-Lite multiplier module.

from time import time

data_size = 1000000
input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
output_buffer = allocate(shape=(data_size,), dtype=np.uint32)

# Measure the time required to calculate 1 million multiplications
t1 = time()
for i in range(data_size):
    input_buffer[i] = 3578129
calc_mult_axi_dma(input_buffer, output_buffer)
t2 = time()
t_diff = t2 - t1
print('Time used for AXI DMA multiplier: {}s'.format(t_diff))
Time used for AXI DMA multiplier: 1.1429970264434814s

For comparison, this is the result from the AXI-Lite multiplier in the previous tutorial:

Time used for AXI lite multiplier: 15.767791271209717s

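Compared with the AXI-Lite multiplier, the AXI-Stream multiplier with AXI DMA is about 13.8 times faster. Note that this measurement includes the Python loop that fills input_buffer, so it does not isolate the DMA transfer time.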
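The speedup can be reproduced from the two measured times quoted above, and dividing each by the number of multiplications gives the average time per operation. No new measurements are involved; the numbers are copied from the text.

```python
t_axi_lite = 15.767791271209717  # seconds, AXI-Lite result from the previous tutorial
t_axi_dma = 1.1429970264434814   # seconds, AXI DMA result measured above
n = 1_000_000                    # number of multiplications

speedup = t_axi_lite / t_axi_dma
print(f"speedup: {speedup:.2f}x")                                  # speedup: 13.80x
print(f"AXI-Lite: {t_axi_lite / n * 1e6:.2f} us per multiplication")  # 15.77
print(f"AXI DMA:  {t_axi_dma / n * 1e6:.2f} us per multiplication")   # 1.14
```

Roughly 15.8 µs per AXI-Lite multiplication reflects one register write and one read per value from Python, while the DMA version moves the whole buffer in one call.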

Don’t forget to free the memory buffers to avoid memory leaks!

# Delete buffer to prevent memory leak
del input_buffer, output_buffer

3. Full Step-by-Step Tutorial

This video contains detailed steps for making this project.

4. Conclusion

In this tutorial, we covered the basics of creating an AXI-Stream IP core and integrating it with AXI DMA.

GitHub: weenslab/pynq102