This tutorial shows how to create a simple AXI-Lite IP core in Verilog. The IP core performs a simple multiplication. We will then compare the performance of this AXI-Lite multiplier with the AXI-Stream multiplier (with AXI DMA) in the next tutorial.
1. Hardware Design
1.1. RTL Design of Multiplier
This RTL module does a simple multiplication. It has one 32-bit input, a, and one 32-bit output, r. Every input is multiplied by 8 to produce the output. Because r is also 32 bits wide, any product bits above bit 31 are discarded.
mult_core.v
module mult_core (
    input  wire [31:0] a,
    output wire [31:0] r
);
    assign r = a * 8;
endmodule
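To check results on the software side later, it can help to have a reference model of this module. The following is a minimal Python sketch, not part of the original design; the names MASK32 and mult_core_model are hypothetical. It assumes the usual Verilog behavior for this expression, where the product is truncated to the 32-bit width of r.

```python
MASK32 = 0xFFFFFFFF  # r is 32 bits wide, so higher product bits are dropped


def mult_core_model(a: int) -> int:
    """Behavioral model of mult_core: r = (a * 8) truncated to 32 bits."""
    return ((a & MASK32) * 8) & MASK32


print(mult_core_model(10))          # 80
print(mult_core_model(0x20000000))  # 0, because the product 2**32 does not fit
```

Multiplying by 8 is the same as shifting left by 3 bits, which is why inputs of 2**29 and above overflow the 32-bit result.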
1.2. AXI-Lite Wrapper
This figure shows the Xilinx Zynq UltraScale+ MPSoC block diagram. The RTL design of the multiplier module will be implemented inside the programmable logic. Now, the question is: how does the multiplier module communicate with the ARM processor?
To communicate with the ARM processor, we have to connect the multiplier module to the AMBA interconnect via the General-Purpose AXI Ports, which are based on the AXI4 protocol.
To connect our multiplier module to one of these ports, we need a wrapper module. This RTL wrapper module translates the AXI-Lite protocol (a lightweight subset of AXI4) into our multiplier's I/O.
The AXI-Lite bus that connects the ARM processor and the wrapper module can be modeled as a master-slave connection. The ARM processor is the master, and the wrapper module is the slave. The AXI-Lite bus is a collection of I/O signals that are grouped into five channels:
Write address channel
Write data channel
Write response channel
Read address channel
Read data channel
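The five channels can be summarized in a small sketch. The signal names below are the standard AXI4-Lite names from the AMBA specification, not names taken from this tutorial's wrapper, and only the main signals are listed. The helper handshake() is a hypothetical illustration of the valid/ready rule: a transfer happens on a channel in a clock cycle where both VALID and READY are high.

```python
# Standard AXI4-Lite channels: (name, side that drives VALID, main signals).
CHANNELS = [
    ("write address",  "master", ["AWADDR", "AWVALID", "AWREADY"]),
    ("write data",     "master", ["WDATA", "WSTRB", "WVALID", "WREADY"]),
    ("write response", "slave",  ["BRESP", "BVALID", "BREADY"]),
    ("read address",   "master", ["ARADDR", "ARVALID", "ARREADY"]),
    ("read data",      "slave",  ["RDATA", "RRESP", "RVALID", "RREADY"]),
]


def handshake(valid: bool, ready: bool) -> bool:
    """A transfer completes only when VALID and READY are both high."""
    return valid and ready


for name, source, signals in CHANNELS:
    print(f"{name:15s} VALID driven by {source:6s} {', '.join(signals)}")
```

A write uses the first three channels (address, data, then response), while a read uses the last two (address, then data).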
The sub-module AXI Write is a state machine that translates the AXI-Lite protocol when the ARM processor wants to write data to the addressable register a. The sub-module AXI Read is a state machine that translates the AXI-Lite protocol when the ARM processor wants to read data from the addressable register r.
Every register in the wrapper module has an address. For Zynq UltraScale+, consecutive register addresses are 8 bytes apart (0x0, 0x8, and so on), while for Zynq-7000 they are 4 bytes apart.
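As a quick illustration of this addressing rule, the following sketch computes the register offsets for both spacings. The function register_offsets is a hypothetical helper, and the register order (a first, then r) is an assumption based on the addresses used elsewhere in this tutorial.

```python
def register_offsets(names, stride):
    """Map each register name to its byte offset, given the address spacing."""
    return {name: index * stride for index, name in enumerate(names)}


# Spacing of 8 bytes (Zynq UltraScale+ case described in the text)
print(register_offsets(["a", "r"], 8))  # {'a': 0, 'r': 8}

# Spacing of 4 bytes (Zynq-7000 case described in the text)
print(register_offsets(["a", "r"], 4))  # {'a': 0, 'r': 4}
```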
This part of the code is taken from lines 165–176. This is the Verilog implementation of register a. The register a_reg will be updated when the handshake signal w_hs is 1 and the address waddr is 0.
This part of the code is taken from lines 150–163. This is the Verilog implementation of the AXI read register. The register's address raddr is checked by using a case block, and then the register rdata is loaded with appropriate data, either with input a or result r.
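The register behavior described above can be modeled in a few lines of Python. This is only a behavioral sketch under stated assumptions, not the wrapper's Verilog: the names w_hs, waddr, a_reg, raddr, and rdata come from the text, while MultRegsModel, ar_hs (a read-address handshake flag), the address 0x8 for r, and the default read value of 0 are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


class MultRegsModel:
    ADDR_A = 0x0  # address of input register a (from the text)
    ADDR_R = 0x8  # address of result r (assumed from the 8-byte spacing)

    def __init__(self):
        self.a_reg = 0
        self.rdata = 0

    @property
    def r(self):
        # Output of the multiplier core, truncated to 32 bits
        return (self.a_reg * 8) & MASK32

    def write_cycle(self, w_hs, waddr, wdata):
        # a_reg is updated only on a write handshake to address 0
        if w_hs and waddr == self.ADDR_A:
            self.a_reg = wdata & MASK32

    def read_cycle(self, ar_hs, raddr):
        # The case block: select the data to load into rdata by address
        if ar_hs:
            if raddr == self.ADDR_A:
                self.rdata = self.a_reg
            elif raddr == self.ADDR_R:
                self.rdata = self.r
            else:
                self.rdata = 0  # assumed default for unmapped addresses
        return self.rdata


regs = MultRegsModel()
regs.write_cycle(w_hs=True, waddr=0x0, wdata=10)
print(regs.read_cycle(ar_hs=True, raddr=0x8))      # 80
regs.write_cycle(w_hs=False, waddr=0x0, wdata=99)  # no handshake, no update
print(regs.a_reg)                                  # 10
```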
This diagram shows our system. It consists of an ARM CPU, DRAM, and our AXI-Lite multiplier module. Our AXI-Lite module is connected to the ARM CPU via the AXI interconnect.
This is the final block design diagram as shown in Vivado.
We can change the memory-mapped base address of this AXI-Lite multiplier in the Address Editor:
2. Software Design
Our AXI-Lite multiplier module is connected to the ARM CPU. The CPU can access the module using memory mapping. This is done using the MMIO object from the PYNQ library.
from pynq import MMIO

# Access to memory map of the AXI multiplier
ADDR_BASE = 0xA0000000
ADDR_RANGE = 0x80
mult_obj = MMIO(ADDR_BASE, ADDR_RANGE)
To write data to input a of the multiplier module, we use the write() method of the MMIO object. To read data from output r of the multiplier module, we use the read() method of the MMIO object.
# Write input and read multiplication result
mult_obj.write(0x0, 10)
mult_obj.read(0x8)
80
We can create a function to do a multiplication like this:
# Function to calculate multiplication
def calc_mult_axi_lite(a):
    mult_obj.write(0x0, a)
    r = mult_obj.read(0x8)
    return r
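If no board is at hand, the same function can be tried against a software stand-in. The sketch below is an assumption-laden illustration: FakeMMIO is a hypothetical class that only mimics the write(offset, value) and read(offset) calls used in this tutorial, with a at offset 0x0 and r at offset 0x8. It is not the PYNQ MMIO class and does not touch any hardware.

```python
class FakeMMIO:
    """Hypothetical stand-in for the multiplier's memory map (no hardware)."""

    def __init__(self):
        self.a = 0

    def write(self, offset, value):
        if offset == 0x0:  # input register a
            self.a = value & 0xFFFFFFFF

    def read(self, offset):
        if offset == 0x0:  # read back input a
            return self.a
        if offset == 0x8:  # result r = a * 8, truncated to 32 bits
            return (self.a * 8) & 0xFFFFFFFF
        return 0           # assumed value for unmapped offsets


mult_obj = FakeMMIO()


def calc_mult_axi_lite(a):
    mult_obj.write(0x0, a)
    return mult_obj.read(0x8)


# Compare against plain Python multiplication for a few small inputs
for a in (0, 10, 3578129):
    assert calc_mult_axi_lite(a) == a * 8
print(calc_mult_axi_lite(10))  # 80
```

On the board, replacing FakeMMIO() with the real MMIO object should leave calc_mult_axi_lite unchanged.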
We can measure the time required to do 1 million multiplications. Later, we can compare this with the AXI-Stream multiplier (with AXI DMA) module.
from time import time

# Measure the time required to calculate 1 million multiplications
t1 = time()
for i in range(1000000):
    calc_mult_axi_lite(3578129)
t2 = time()
t_diff = t2 - t1
print('Time used for AXI lite multiplier: {}s'.format(t_diff))
Time used for AXI lite multiplier: 15.767791271209717s
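To put this number in perspective, we can convert it to a cost per multiplication. The sketch below only does arithmetic on the measured time above; the variable names are hypothetical. The per-access figure is a rough estimate that assumes the time is split evenly between the one write and one read in each call, so it also includes Python loop and function-call overhead.

```python
n_mults = 1_000_000
t_diff = 15.767791271209717  # seconds, measured above

per_mult_us = t_diff / n_mults * 1e6  # time per multiplication
per_access_us = per_mult_us / 2       # one write + one read per call
mults_per_s = n_mults / t_diff        # throughput

print(f"{per_mult_us:.2f} us per multiplication")  # 15.77 us per multiplication
print(f"{per_access_us:.2f} us per MMIO access (rough estimate)")
print(f"{mults_per_s:.0f} multiplications per second")
```

At roughly 16 microseconds per result, the cost is dominated by the individual register accesses rather than by the multiplication itself, which is the motivation for the comparison with the AXI-Stream version.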
3. Full Step-by-Step Tutorial
This video contains detailed steps for making this project.
4. Conclusion
In this tutorial, we covered some of the basics of AXI-Lite based IP core creation.