Part 6: ARM CPU and FPGA Module

Objective

This tutorial introduces the ZYNQ SoC and walks through a simple example of integrating a custom RTL module with the ZYNQ PS.

Source Code

This repository contains all of the code required in order to follow this tutorial.

1. Computer Architecture

1.1. System Bus

In computer architecture, the system bus is the interconnect that links the CPU with memory and I/O. The following figure provides an illustration. The system bus consists of control, data, and address lines. Data can flow in both directions, from the CPU to memory or I/O and back, with the CPU acting as the master that initiates each transfer.

The following figure is an illustration of the FPGA SoC architecture. There is an FPGA that can be connected to the CPU via the system bus.

There are various types of system buses: APB, AHB, AXI, Avalon, etc. The Zynq SoC uses APB, AHB, and AXI, which all belong to the ARM Advanced Microcontroller Bus Architecture (AMBA). APB and AHB are used only inside the PS, while AXI can also connect the PS to the PL.

This is a detailed block diagram of the Xilinx Zynq architecture. It consists of the CPU, controller for DRAM and flash memory, input/output, FPGA, and system bus.

1.2. Memory Mapped Access

When the CPU accesses both memory and I/O by reading and writing addresses, the scheme is called memory-mapped access. Each DDR memory location and each I/O register has its own address.
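
As a minimal sketch of what memory-mapped access looks like from software, the following C code reads and writes 32-bit registers through a base address plus a byte offset. The names `fake_regs`, `reg_write`, and `reg_read` are hypothetical and not part of the tutorial's code; the array only stands in for a peripheral so that the example can run on a host machine. On real hardware the base would be a fixed bus address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block standing in for a memory-mapped peripheral.
   On real hardware the base would be a fixed bus address, not an array. */
uint32_t fake_regs[4];

/* Write a 32-bit value to the register at base + offset (offset in bytes). */
void reg_write(uintptr_t base, uint32_t offset, uint32_t value)
{
    assert(offset % 4 == 0); /* word-aligned accesses only */
    *(volatile uint32_t *)(base + offset) = value;
}

/* Read a 32-bit value from the register at base + offset (offset in bytes). */
uint32_t reg_read(uintptr_t base, uint32_t offset)
{
    assert(offset % 4 == 0); /* word-aligned accesses only */
    return *(volatile uint32_t *)(base + offset);
}
```

The `volatile` qualifier tells the compiler that every access must actually reach the address, which matters when the target is a hardware register rather than ordinary memory.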

The number of addresses is determined by the bit width of the address. If the address bit width is 32, then there are $2^{32}$ or 4 GB of addresses. If the address bit width is 40, then there are $2^{40}$ or 1 TB of addresses.
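
The arithmetic above can be checked with a short C sketch. It assumes byte addressing, so the number of addresses equals the number of addressable bytes. The function name `address_space_bytes` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of byte addresses reachable with the given address width.
   Limited to 1..63 bits so that the result fits in a uint64_t. */
uint64_t address_space_bytes(unsigned address_bits)
{
    assert(address_bits >= 1 && address_bits <= 63);
    return (uint64_t)1 << address_bits;
}
```

For example, `address_space_bytes(32)` gives 4294967296 bytes (4 GB) and `address_space_bytes(40)` gives 1099511627776 bytes (1 TB).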

The following is the memory map on the Zynq-7000:

The Zynq-7000 still uses a 32-bit address width, so the maximum total address space is 4 GB.

  • Location from 0x0000_0000 for DDR memory

  • Location from 0x4000_0000 for AXI slave port 0 in PL

  • Location from 0x8000_0000 for AXI slave port 1 in PL

  • Location from 0xE000_0000 for IO peripherals such as UART, USB, Ethernet, etc.
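
To make the list above concrete, here is a rough C sketch that maps an address to the region it falls in. It is a simplification under stated assumptions: only the start addresses come from the list, each PL port is assumed to own a 1 GB window, and everything from 0xE000_0000 upward is lumped into the I/O peripheral region. The type and function names are hypothetical; consult the device documentation for the exact address map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REGION_DDR,
    REGION_PL_AXI0,
    REGION_PL_AXI1,
    REGION_UNLISTED,   /* assumption: gap not covered by the list above */
    REGION_IO_PERIPH
} region_id;

typedef struct {
    uint32_t  start;   /* first address of the region */
    region_id id;
} region_entry;

/* Start addresses from the list above, sorted in ascending order. */
const region_entry region_table[] = {
    { 0x00000000u, REGION_DDR       },
    { 0x40000000u, REGION_PL_AXI0   },
    { 0x80000000u, REGION_PL_AXI1   },
    { 0xC0000000u, REGION_UNLISTED  }, /* assumed end of the 1 GB PL window */
    { 0xE0000000u, REGION_IO_PERIPH },
};

/* Return the last region whose start address is <= addr. */
region_id zynq_region(uint32_t addr)
{
    size_t n = sizeof region_table / sizeof region_table[0];
    region_id id = REGION_DDR;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (addr >= region_table[i].start) {
            id = region_table[i].id;
            found = 1;
        }
    }
    assert(found); /* the first entry starts at 0, so a match always exists */
    return id;
}
```

With this table, an address such as 0x43C0_0000 resolves to `REGION_PL_AXI0`, which is the kind of address a PL peripheral attached to AXI slave port 0 would be expected to receive.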

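In AXI, the components on the bus are known as masters and slaves. The master initiates every transaction and decides whether it is a read or a write. The slave cannot initiate a transaction; it only responds by returning the requested data or accepting the written data.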

The master is usually the CPU, but custom modules that we create in the FPGA can also act as masters, for example when an FPGA module needs to read from or write to DDR memory directly.

2. Design Example

In this design example, we are going to integrate the PE module into the ZYNQ system with memory map access. The following figure shows the block diagram of the PE module.

The following listing shows the PE module.

pe.v
module pe
    #( 
        parameter WIDTH = 8,
        parameter FRAC_BIT = 0
    )
    (
        input wire signed [WIDTH-1:0]  a_in,
        input wire signed [WIDTH-1:0]  y_in,
        input wire signed [WIDTH-1:0]  b,
        output wire signed [WIDTH-1:0] a_out,
        output wire signed [WIDTH-1:0] y_out
    );
    
    wire signed [WIDTH*2-1:0] y_out_i;
    
    assign a_out = a_in;
    assign y_out_i = a_in * b;
    assign y_out = y_in + y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT];

endmodule
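
For checking hardware results in software, the PE arithmetic can be mirrored in C. The following is a sketch of a bit-level reference model for WIDTH = 8, not part of the tutorial's repository; the function name `pe_model` is hypothetical. It returns the raw 8-bit pattern of y_out, so results that overflow 8 bits wrap around just as they would in the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-level reference model of pe.v for WIDTH = 8.
   Returns the raw 8-bit pattern that appears on y_out. */
uint8_t pe_model(int8_t a_in, int8_t b, int8_t y_in, unsigned frac_bit)
{
    /* The slice [8+frac_bit-1:frac_bit] must fit in the 16-bit product. */
    assert(frac_bit <= 8);

    /* 16-bit signed product, kept as raw bits (two's complement). */
    uint16_t product = (uint16_t)((int16_t)a_in * (int16_t)b);

    /* Select 8 bits of the product starting at bit frac_bit. */
    uint8_t slice = (uint8_t)(product >> frac_bit);

    /* 8-bit addition; any carry out of bit 7 is discarded. */
    return (uint8_t)((uint8_t)y_in + slice);
}
```

With `frac_bit` set to 0, as in the wrapper used later in this tutorial, `pe_model(4, 5, 4, 0)` returns 24, that is, 4 + 4 * 5.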

The PE module is simple: it passes a_in through to a_out and computes y_out = y_in + a_in * b, with the product truncated to WIDTH bits. Its I/O, however, does not follow a standard protocol. Therefore, we have to make a top module that wraps the PE module with a standard protocol that can be integrated with the ZYNQ system. The following code shows the AXI-Stream wrapper module for the PE.

axis_pe.v
module axis_pe
    (
        input wire         aclk,
        input wire         aresetn,
        // *** AXIS slave port ***
        output wire        s_axis_tready,
        input wire [31:0]  s_axis_tdata,
        input wire         s_axis_tvalid,
        input wire         s_axis_tlast,
        // *** AXIS master port ***
        input wire         m_axis_tready,
        output wire [31:0] m_axis_tdata,
        output wire        m_axis_tvalid,
        output wire        m_axis_tlast
    );
    
    wire [7:0] y_out;
    
    // AXI-Stream control
    assign s_axis_tready = m_axis_tready;
    assign m_axis_tdata = {24'h000000, y_out};
    assign m_axis_tvalid = s_axis_tvalid;
    assign m_axis_tlast = s_axis_tlast;
    
    // PE
    pe #(8, 0) pe_0
    (
        .a_in(s_axis_tdata[7:0]),
        .y_in(s_axis_tdata[23:16]),
        .b(s_axis_tdata[15:8]),
        .a_out(),
        .y_out(y_out)
    );
    
endmodule
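
The wrapper fixes how the three operands are laid out inside each 32-bit stream word: a_in in bits [7:0], b in bits [15:8], and y_in in bits [23:16]. Bits [31:24] are ignored, and the result comes back in bits [7:0] with the upper 24 bits set to zero. A minimal C sketch of that data path follows, assuming the WIDTH = 8, FRAC_BIT = 0 configuration instantiated above. The function names are hypothetical, and the model covers only the data, not the tvalid/tready handshake.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three 8-bit operands into one stream word, as axis_pe.v expects. */
uint32_t pack_input(uint8_t a, uint8_t b, uint8_t y)
{
    return ((uint32_t)y << 16) | ((uint32_t)b << 8) | (uint32_t)a;
}

/* Model of the axis_pe data path: one input word in, one output word out. */
uint32_t axis_pe_model(uint32_t s_tdata)
{
    uint8_t a = (uint8_t)(s_tdata & 0xFFu);
    uint8_t b = (uint8_t)((s_tdata >> 8) & 0xFFu);
    uint8_t y = (uint8_t)((s_tdata >> 16) & 0xFFu);

    /* The low 8 bits of y + a*b are identical for signed and unsigned
       operands, so unsigned arithmetic is sufficient here. */
    uint8_t y_out = (uint8_t)(y + a * b);

    return (uint32_t)y_out; /* upper 24 bits are zero, as in the RTL */
}

/* Small self-check using a = 1, b = 8, y = 1. */
void axis_pe_self_check(void)
{
    assert(axis_pe_model(pack_input(1, 8, 1)) == 9u);
}
```

A model like this can be run on the host to predict what the hardware should return for a given input word.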

Now that the PE module can communicate over the AXI-Stream protocol, the next step is to build the block design. The following figure shows the block design. We use an IP called AXI-Stream FIFO, which converts memory-mapped accesses from the PS into AXI-Stream transfers and back.

This is the configuration for the AXI-Stream FIFO IP.

After the AXI-Stream FIFO is connected to the PS, it gets the address as shown in the following figure. This address will be used in the C program.

The following code shows the C code to access the AXI-Stream FIFO. In this example, we send one packet that consists of eight 32-bit words.

helloworld.c
#include <stdio.h>
#include "xparameters.h"
#include "xllfifo.h"
#include "xstatus.h"

#define WORD_SIZE 4 // Size of one word in bytes
#define DATA_LEN  8 // Number of 32-bit words per packet

int Init_XLlFifo(XLlFifo *InstancePtr, u16 DeviceId);
int TxSend(XLlFifo *InstancePtr, u32 *SourceAddr);
int RxReceive(XLlFifo *InstancePtr, u32 *DestinationAddr);

XLlFifo FifoInstance;
u32 SourceBuffer[DATA_LEN];
u32 DestinationBuffer[DATA_LEN];

int main()
{
    // Initialize AXI Stream FIFO IP and stop if it fails
    if (Init_XLlFifo(&FifoInstance, XPAR_AXI_FIFO_0_DEVICE_ID) != XST_SUCCESS)
        return XST_FAILURE;

    printf("Initialization success\n");

    printf("Input:\n");
    for (int i = 0; i < DATA_LEN; i++)
    {
        uint8_t a = i + 1;
        uint8_t b = 8 - i;
        uint8_t y = i + 1;
        // Pack operands to match axis_pe.v: y in [23:16], b in [15:8], a in [7:0]
        SourceBuffer[i] = ((u32)y << 16) | ((u32)b << 8) | a;
        printf(" a=%d, b=%d, y=%d\n", a, b, y);
    }

    // Send the packet to the PE
    TxSend(&FifoInstance, SourceBuffer);

    // Read the results back from the PE
    RxReceive(&FifoInstance, DestinationBuffer);

    // Print output
    printf("Output:\n");
    for (int i = 0; i < DATA_LEN; i++)
        printf(" %lu\n", (unsigned long)DestinationBuffer[i]);

    return 0;
}

int Init_XLlFifo(XLlFifo *InstancePtr, u16 DeviceId)
{
    XLlFifo_Config *Config;
    int Status;

    Config = XLlFfio_LookupConfig(DeviceId);
    if (!Config)
    {
        printf("No config found for %d\n", DeviceId);
        return XST_FAILURE;
    }

    Status = XLlFifo_CfgInitialize(InstancePtr, Config, Config->BaseAddress);
    if (Status != XST_SUCCESS)
    {
        printf("Initialization failed\n");
        return XST_FAILURE;
    }

    // Clear all pending interrupts, then verify that the status register is clean
    XLlFifo_IntClear(InstancePtr, 0xffffffff);
    Status = XLlFifo_Status(InstancePtr);
    if (Status != 0x0)
    {
        printf("Reset failed\n");
        return XST_FAILURE;
    }

    return XST_SUCCESS;
}

int TxSend(XLlFifo *InstancePtr, u32 *SourceAddr)
{
    // Writing into the FIFO transmit buffer
    for (int i = 0; i < DATA_LEN; i++)
    {
        // Wait for free space so that no word is silently skipped
        while (!XLlFifo_iTxVacancy(InstancePtr));
        Xil_Out32(InstancePtr->Axi4BaseAddress + XLLF_TDFD_OFFSET, *(SourceAddr+i));
    }

    // Start transmission by writing transmission length into the TLR
    XLlFifo_iTxSetLen(InstancePtr, (DATA_LEN * WORD_SIZE));

    // Check for transmission completion
    while (!(XLlFifo_IsTxDone(InstancePtr)));

    return XST_SUCCESS;
}

int RxReceive(XLlFifo *InstancePtr, u32* DestinationAddr)
{
    u32 ReceiveLength;
    u32 RxWord;
    int Status;

    while (XLlFifo_iRxOccupancy(InstancePtr))
    {
        // Read receive length in bytes and convert it to words
        ReceiveLength = XLlFifo_iRxGetLen(InstancePtr) / WORD_SIZE;
        // Reading from the FIFO receive buffer
        for (u32 i = 0; i < ReceiveLength; i++)
        {
            RxWord = Xil_In32(InstancePtr->Axi4BaseAddress + XLLF_RDFD_OFFSET);
            *(DestinationAddr+i) = RxWord;
        }
    }

    // Check for receive completion
    Status = XLlFifo_IsRxDone(InstancePtr);
    if (Status != TRUE)
    {
        printf("Failing in receive complete\n");
        return XST_FAILURE;
    }

    return XST_SUCCESS;
}

The following figure shows the program output on the serial terminal.

3. Conclusion

In this tutorial, we covered the ZYNQ SoC and a simple example of integrating a custom RTL module with the ZYNQ PS.
