[Share] A Concise Tutorial on the Xilinx QDMA Software
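Contents
- A Concise Tutorial on the Xilinx QDMA Software
- 1. Overview
- 2. Resources
- 2.1. Linux kernel driver
- 2.2. Documentation
- 2.2.1. QDMA PCIe v4.0 PG302
- 2.2.2. Brief driver notes
- 2.2.3. Github.io document
- 3. Test flow
- 3.1. Overall test flow
- 3.2. Test tool dma-ctl
- 3.3. Test tool dma-to-device
- 3.4. Test tool dma-from-device
- 3.5. Test script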
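1. Overview
All tools and reference designs use version 2021.2. The x86 host used for building and testing runs CentOS 7.9.2009. The board under test is the VCK190, and the IP under test is the CPM QDMA.
In the logs and scripts, a hash sign, in particular one at the start of a line, conflicts with Markdown syntax and has been replaced with an asterisk. Some of the logs printed by the software are very long, so parts of them have been replaced with “......”.
My colleague Hongwei Ma (马鸿伟, hongweim@xilinx.com) helped a great deal during testing; many thanks.
Author: Hanjie Fu (付汉杰) hankf@xilinx.com hankf@amd.com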
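2. Resources
2.1. Linux kernel driver
The Linux kernel driver for the x86 PCIe host side:
https://github.com/Xilinx/dma_ip_drivers/tree/master/QDMA/linux-kernel
In short, the command “make” builds the driver and the command “sudo make install” installs it. The tests used the 2020.1 branch.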
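The build steps can be collected into a small helper. This is a minimal sketch rather than the project's documented procedure: the `run` wrapper, the `DRY_RUN` switch, the function name `build_qdma_driver`, and the `dma_ip_drivers` working directory are illustrative assumptions, and it assumes `2020.1` can be passed to `git clone -b` as the branch name mentioned above. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/bin/bash
# DRY_RUN=1 prints each command instead of running it (assumed helper switch).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: clone the driver repository, then build and install
# the QDMA Linux kernel driver with "make" and "sudo make install".
build_qdma_driver() {
    local branch=${1:-2020.1}
    local workdir=${2:-dma_ip_drivers}
    run git clone -b "$branch" https://github.com/Xilinx/dma_ip_drivers.git "$workdir" || return 1
    run make -C "$workdir/QDMA/linux-kernel" || return 1
    run sudo make -C "$workdir/QDMA/linux-kernel" install
}
```

Calling `build_qdma_driver` prints the three commands; `DRY_RUN=0 build_qdma_driver` would execute them.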
2.2. Documentation
2.2.1. QDMA PCIe v4.0 PG302
QDMA Subsystem for PCI Express v4.0 Product Guide PG302 (v4.0) January 5, 2022
2.2.2. Brief driver notes
The Linux kernel driver for the x86 PCIe host side includes brief documentation.
https://github.com/Xilinx/dma_ip_drivers/tree/master/QDMA/linux-kernel/docs
2.2.3. Github.io document
xilinx.github.io hosts more detailed and fairly comprehensive documentation.
https://xilinx.github.io/dma_ip_drivers/2020.1/linux-kernel/html/index.html
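3. Test flow
3.1. Overall test flow
When only the PCIe physical function (PF) is used, without virtual functions (VFs), the overall QDMA test flow is as follows.
- Find the PCIe device
- Make sure Linux has loaded the QDMA driver
- Create the QDMA queue
- Start the QDMA queue
- Start the QDMA transfer
- Stop the QDMA queue
- Delete the QDMA queue
Queue operations take a queue index that starts at 0. C2H and H2C queues are indexed separately, so a C2H queue and an H2C queue can use the same index.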
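The flow above can be sketched as one shell function for a single H2C queue. The commands are the ones used elsewhere in this tutorial; the `run` wrapper, the `DRY_RUN` switch, and the function name `qdma_h2c_cycle` are illustrative assumptions. With `DRY_RUN=1` (the default here) the function only prints the commands, so it can be read or tried without hardware.

```shell
#!/bin/bash
# DRY_RUN=1 prints each command instead of running it (assumed helper switch).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: one pass through the PF-only flow for an H2C queue.
#   $1 = qdma device name, e.g. qdma01000
#   $2 = queue index, starting at 0
qdma_h2c_cycle() {
    local dev=$1 idx=$2
    run dma-ctl dev list                                    # find the device
    if [ "$DRY_RUN" != "1" ]; then
        # make sure the QDMA driver is loaded
        lsmod | grep -qi qdma || { echo "QDMA driver not loaded" >&2; return 1; }
    fi
    run dma-ctl "$dev" q add idx "$idx" mode mm dir h2c     # create the queue
    run dma-ctl "$dev" q start idx "$idx" dir h2c           # start the queue
    run dma-to-device -d "/dev/$dev-MM-$idx" -s 4096 -c 1   # start a transfer
    run dma-ctl "$dev" q stop idx "$idx" dir h2c            # stop the queue
    run dma-ctl "$dev" q del idx "$idx" dir h2c             # delete the queue
}
```

For example, `qdma_h2c_cycle qdma01000 0` prints the six commands for queue index 0. A C2H pass would use `dir c2h` and `dma-from-device` with the same index, because the two directions are indexed separately.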
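3.2. Test tool dma-ctl
dma-ctl is the configuration tool for testing. It can list qdma devices and queues, and it can create, delete, start, and stop queues. Its help output is shown below.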
Usage: dma-ctl [dev|qdma[vf]] [operation]
dev [operation]: system wide FPGA operations
list list all qdma functions
qdma[N] [operation]: per QDMA FPGA operations
cap.... lists the Hardware and Software version and capabilities
stat statistics of qdma[N] device
stat clear clear all statistics data of qdma[N} device
global_csr dump the Global CSR of qdma[N} device
q list list all queues
q add idx [mode ] [dir ] - add a queue
*mode default to mm
*dir default to h2c
q add list [mode ] [dir ] - add multiple queues at once
q start idx [dir ] [idx_ringsz <0:15>] [idx_bufsz <0:15>] [idx_tmr <0:15>]
[idx_cntr <0:15>] [trigmode ] [cmptsz <0|1|2|3>] [sw_desc_sz <3>]
[mm_chn <0|1>] [desc_bypass_en] [pfetch_en] [pfetch_bypass_en] [dis_cmpl_status]
[dis_cmpl_status_acc] [dis_cmpl_status_pend_chk] [c2h_udd_en]
[cmpl_ovf_dis] [fetch_credit ] [dis_cmpl_status] [c2h_cmpl_intr_en] - start a single queue
q start list [dir ] [idx_bufsz <0:15>] [idx_tmr <0:15>]
[idx_cntr <0:15>] [trigmode ] [cmptsz <0|1|2|3>] [sw_desc_sz <3>]
[mm_chn <0|1>] [desc_bypass_en] [pfetch_en] [pfetch_bypass_en] [dis_cmpl_status]
[dis_cmpl_status_acc] [dis_cmpl_status_pend_chk] [cmpl_ovf_dis]
[fetch_credit ] [dis_cmpl_status] [c2h_cmpl_intr_en] - start multiple queues at once
q stop idx dir [] - stop a single queue
q stop list dir [] - stop list of queues at once
q del idx dir [] - delete a queue
q del list dir [] - delete list of queues at once
q dump idx dir [] dump queue param
q dump list dir [] - dump queue param
q dump idx dir [] desc - dump desc ring entry x ~ y
q dump list dir [] desc - dump desc ring entry x ~ y
q dump idx dir [] cmpt - dump cmpt ring entry x ~ y
q dump list dir [] cmpt - dump cmpt ring entry x ~ y
q cmpt_read idx - read the completion data
reg dump [dmap ] - register dump. Only dump dmap registers if dmap is specified.
specify dmap range to dump: Q=queue, N=num of queues
reg read [bar ] - read a register
reg write [bar ] - write a register
intring dump vector - interrupt ring dump for vector number
for intrrupt entries : ---
The following example lists the qdma devices.
[root@localhost pcie]# dma-ctl dev list
qdma01000 0000:01:00.0 max QP: 32, 0~31
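The device name in this listing is built from the PCIe bus, device, and function numbers, which is the same concatenation the test script in this tutorial uses. A minimal sketch of that mapping follows; the function name `qdma_name_from_bdf` is a hypothetical helper, not part of the driver package.

```shell
#!/bin/bash
# Hypothetical helper: derive the qdma device name from a PCI address
# written as [domain:]bus:device.function.
qdma_name_from_bdf() {
    local addr=$1
    local func=${addr##*.}    # text after the last "."   -> function
    local rest=${addr%.*}     # strip ".function"
    local dev=${rest##*:}     # text after the last ":"   -> device
    rest=${rest%:*}           # strip ":device"
    local bus=${rest##*:}     # last remaining field      -> bus
    echo "qdma${bus}${dev}${func}"
}
```

With the address from the listing, `qdma_name_from_bdf 0000:01:00.0` prints `qdma01000`.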
The following example shows qdma queue operations.
[root@localhost pcie]# ls /dev/qdma*
ls: cannot access /dev/qdma*: No such file or directory
[root@localhost pcie]# dma-ctl qdma01000 q list
Zero Qs
Test Xilinx PCIe QDMA h2c channel.
[root@localhost pcie]# dma-ctl qdma01000 q add idx 0 mode mm dir h2c
qdma01000-MM-0 H2C added.
Added 1 Queues.
[root@localhost pcie]# ls /dev/qdma*
/dev/qdma01000-MM-0
[root@localhost pcie]# dma-ctl qdma01000 q start idx 0 dir h2c
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
[root@localhost pcie]# dma-ctl qdma01000 q list
H2C Q: 1, C2H Q: 0, CMPT Q 0.
qdma01000-MM-0 H2C online
hw_ID 0, thp ?, desc 0xffff98fe5f750000/0x1f750000, 1536
[root@localhost pcie]# dma-ctl qdma01000 q stop idx 0 dir h2c
Stopped Queues 0 -> 0.
[root@localhost pcie]# dma-ctl qdma01000 q del idx 0 dir h2c
Deleted Queues 0 -> 0.
[root@localhost pcie]# dma-ctl qdma01000 q list
Zero Qs
[root@localhost pcie]# dma-ctl qdma01000 stat
qdma01000:statistics
Total MM H2C packets processed = 34000
Total MM C2H packets processed = 17
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
[root@localhost pcie]# ls /dev/qdma*
ls: cannot access /dev/qdma*: No such file or directory
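Note that qdma device nodes exist under “/dev/” only while qdma queues exist; only then does the command “ls /dev/qdma*” list any device nodes.
Adding a queue with an index that is already in use reports the error “Queue compatibility check failed against existing queues”. Delete the existing queue with that index before adding it again.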
[root@localhost pcie]# dma-ctl qdma01000 q add idx 0 mode mm dir h2c
qdma01000-MM-0 H2C added.
Added 1 Queues.
[root@localhost pcie]# dma-ctl qdma01000 q add idx 0 mode mm dir h2c
Queue compatibility check failed against existing queues
Starting a queue that is already running reports the error “Error. Required Q state=cfg'ed, Current Q state=online”. Stop the queue before starting it again.
[root@localhost pcie]# dma-ctl qdma01000 q start idx 0 dir h2c
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
[root@localhost pcie]# dma-ctl qdma01000 q start idx 0 dir h2c
dma-ctl: Info: Default ring size set to 2048
Error. Required Q state=cfg'ed, Current Q state=online
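Both errors can be avoided by checking the queue list before adding or starting a queue. The sketch below parses the “q list” output in the form shown earlier (queue name, direction, state on one line). The function names are hypothetical. `ensure_h2c_queue_online` also assumes that any listed state other than “online” means the queue was added but not started, which this tutorial does not show explicitly.

```shell
#!/bin/bash
# Hypothetical helper: read "dma-ctl <dev> q list" output on stdin and print
# the state column for the given queue name and direction.
# Prints "absent" when no matching line is found (for example "Zero Qs").
queue_state() {
    local name=$1 dir=$2
    awk -v n="$name" -v d="$dir" '
        $1 == n && toupper($2) == toupper(d) { print $3; found = 1; exit }
        END { if (!found) print "absent" }'
}

# Hypothetical helper: add and start an MM H2C queue only when needed.
# Assumption: a listed queue that is not "online" only needs "q start".
ensure_h2c_queue_online() {
    local dev=$1 idx=$2 state
    state=$(dma-ctl "$dev" q list | queue_state "$dev-MM-$idx" h2c)
    case $state in
        online) ;;                                              # nothing to do
        absent) dma-ctl "$dev" q add idx "$idx" mode mm dir h2c &&
                dma-ctl "$dev" q start idx "$idx" dir h2c ;;
        *)      dma-ctl "$dev" q start idx "$idx" dir h2c ;;
    esac
}
```

For example, `dma-ctl qdma01000 q list | queue_state qdma01000-MM-0 h2c` prints `online` for the listing shown above.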
3.3. Test tool dma-to-device
dma-to-device is a tool for starting data transfer tests from the x86 host to the PCIe card (H2C). Its help output is shown below.
[hankf@localhost pcie]$ dma-to-device --help
dma-to-device
usage: dma-to-device [OPTIONS]
Write via SGDMA, optionally read input from a file.
-d (--device) device (defaults to /dev/qdma01000-MM-0)
-a (--address) the start address on the AXI bus
-s (--size) size of a single transfer in bytes, default 32,
-o (--offset) page offset of transfer
-c (--count) number of transfers, default 1
-f (--data infile) filename to read the data from.
-w (--data outfile) filename to write the data of the transfers
-h (--help) print usage help and exit
-v (--verbose) verbose output
The following is the output of the command “dma-to-device” after the queue has been started.
[root@localhost pcie]# dma-to-device -d /dev/qdma01000-MM-0 -s 65536 -c 1000
** Average BW = 65536, 3981.411377
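When several transfer sizes are compared, it helps to extract the two numbers from the “Average BW” line. The sketch below does this with `sed`. The function names `parse_average_bw` and `bw_sweep` are hypothetical, and the unit of the second number is whatever the tool reports; it is not interpreted here.

```shell
#!/bin/bash
# Hypothetical helper: read tool output on stdin and print
# "<transfer size> <bandwidth figure>" for each "Average BW" line.
parse_average_bw() {
    sed -n 's/^.*Average BW = \([0-9][0-9]*\), *\([0-9.][0-9.]*\).*$/\1 \2/p'
}

# Hypothetical helper: run dma-to-device for several sizes on one node.
#   $1 = device node, remaining arguments = transfer sizes in bytes
bw_sweep() {
    local node=$1 size
    shift
    for size in "$@"; do
        dma-to-device -d "$node" -s "$size" -c 1000 | parse_average_bw
    done
}
```

For example, `bw_sweep /dev/qdma01000-MM-0 4096 65536` prints one line per size once the H2C queue has been started.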
If the queue has not been started, dma-to-device fails even though the device node exists. The output is as follows:
[root@localhost pcie]# ls /dev/qdma*
/dev/qdma01000-MM-0
[root@localhost pcie]# dma-to-device -d /dev/qdma01000-MM-0 -s 4096 -c 1000
/dev/qdma01000-MM-0, W off 0x0, 0x1000 failed -1.
write file: Invalid argument
3.4. Test tool dma-from-device
dma-from-device is a tool for starting data transfer tests from the PCIe card to the x86 host (C2H). Its help output is shown below.
[root@localhost qdma]# dma-from-device --help
dma-from-device
usage: dma-from-device [OPTIONS]
Read via SGDMA, optionally save output to a file
-d (--device) device (defaults to /dev/qdma01000-MM-0)
-a (--address) the start address on the AXI bus
-s (--size) size of a single transfer in bytes, default 32.
-o (--offset) page offset of transfer
-c (--count) number of transfers, default is 1.
-f (--file) file to write the data of the transfers
-h (--help) print usage help and exit
-v (--verbose) verbose output
The following is the output of the command “dma-from-device” after the queue has been started.
[root@localhost pcie]# ls /dev/qdma*
/dev/qdma01000-MM-0
[root@localhost pcie]# dma-from-device -d /dev/qdma01000-MM-0 -s 65536
** Average BW = 65536, 1039.313599
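The two tools can be combined into a simple write-then-read-back check using the `-f` and `-a` options from their help output. This is only a sketch under several assumptions: the H2C and C2H queues with the same index are both added and started, the design on the card has memory that can be written and read at the chosen AXI address, the tools return a non-zero exit status on failure, and `dma-from-device -f` writes exactly the requested number of bytes. The function name `loopback_check` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write random data to the card and read it back.
#   $1 = device node, e.g. /dev/qdma01000-MM-0
#   $2 = transfer size in bytes
#   $3 = AXI start address (defaults to 0; assumed to be valid card memory)
loopback_check() {
    local node=$1 size=$2 addr=${3:-0}
    local src dst
    src=$(mktemp) && dst=$(mktemp) || return 1
    head -c "$size" /dev/urandom > "$src"
    # H2C: host file -> card, then C2H: card -> host file, then compare.
    dma-to-device -d "$node" -a "$addr" -s "$size" -f "$src" &&
        dma-from-device -d "$node" -a "$addr" -s "$size" -f "$dst" &&
        cmp -s "$src" "$dst"
    local rc=$?
    rm -f "$src" "$dst"
    return $rc
}
```

For example, `loopback_check /dev/qdma01000-MM-0 4096 && echo match` reports whether the data read back equals the data written.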
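3.5. Test script
To simplify later runs, the commands are collected in a Linux shell script. Note that this is only a simplified script with no error handling. If the PCIe device does not exist, or if the script is run more than once, error messages appear; handle them as your setup requires. On repeated runs, errors about re-creating existing queues may appear.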
*!/bin/bash
echo -e "\nBegin to run script: $0"
echo -e "\nRun script: $0 as root user!!!"
echo -e "\nUsage: $0 pcie_bus_number pcie_device_number"
* Example: QDMA_BUS_NO=01
QDMA_BUS_NO=$1
if [ "$QDMA_BUS_NO" = "" ]; then
QDMA_BUS_NO=01
fi
echo -e "\nUse PCIe bus number: $QDMA_BUS_NO"
QDMA_DEV_NO=$2
if [ "$QDMA_DEV_NO" = "" ]; then
QDMA_DEV_NO=00
fi
echo -e "\nUse PCIe device number: $QDMA_DEV_NO"
QDMA_FUNC_NO=0
* QDMA_DEV_NAME: example: qdma01000
QDMA_DEV_NAME=qdma$QDMA_BUS_NO$QDMA_DEV_NO$QDMA_FUNC_NO
echo -e "\nUse PCIe device name: $QDMA_DEV_NAME"
echo -e "\nCheck system information"
lsb_release -a
dmidecode |grep -A16 "System Information$"
echo -e "\nCheck Xilinx PCIe board information"
lspci | grep -i xilinx
lspci -vvv -s $QDMA_BUS_NO:$QDMA_DEV_NO.0
lsmod | grep -i xilinx
lsmod | grep -i qdma
dmesg | grep qdma
echo -e "\nConfigure Xilinx PCIe QDMA."
ls -l -a /sys/bus/pci/devices/
cd /sys/bus/pci/devices/0000\:$QDMA_BUS_NO\:$QDMA_DEV_NO.0/qdma/
ls -l
cat qmax
* [root@localhost qdma]# echo 32 > qmax
* bash: echo: write error: Invalid argument
echo 32 > qmax
cat qmax
echo 2 > intr_rngsz
cat intr_rngsz
dma-ctl dev list
dma-ctl $QDMA_DEV_NAME cap
dma-ctl $QDMA_DEV_NAME stat
echo -e "\nList Xilinx PCIe QDMA queues."
dma-ctl $QDMA_DEV_NAME q list
echo -e "\nTest Xilinx PCIe QDMA h2c channel."
dma-ctl $QDMA_DEV_NAME q add idx 0 mode mm dir h2c
ls /dev/qdma*
dma-ctl $QDMA_DEV_NAME q start idx 0 dir h2c
dma-to-device -d /dev/$QDMA_DEV_NAME-MM-0 -s 4096 -c 1000
dma-to-device -d /dev/$QDMA_DEV_NAME-MM-0 -s 65536 -c 1000
dma-ctl $QDMA_DEV_NAME q stop idx 0 dir h2c
dma-ctl $QDMA_DEV_NAME q del idx 0 dir h2c
dma-ctl $QDMA_DEV_NAME stat
echo -e "\nTest Xilinx PCIe QDMA c2h channel."
dma-ctl $QDMA_DEV_NAME q add idx 0 mode mm dir c2h
ls /dev/qdma*
dma-ctl $QDMA_DEV_NAME q start idx 0 dir c2h
dma-from-device -d /dev/$QDMA_DEV_NAME-MM-0 -s 4096
dma-from-device -d /dev/$QDMA_DEV_NAME-MM-0 -s 65536
dma-ctl $QDMA_DEV_NAME q stop idx 0 dir c2h
dma-ctl $QDMA_DEV_NAME q del idx 0 dir c2h
dma-ctl $QDMA_DEV_NAME stat
echo -e "\nTest done."
The output of a run is as follows:
[root@localhost pcie]# ./qdma-test-cmd.sh
Begin to run script: ./qdma-test-cmd.sh
Run script: ./qdma-test-cmd.sh as root user!!!
Usage: ./qdma-test-cmd.sh pcie_bus_number pcie_device_number
Use PCIe bus number: 01
Use PCIe device number: 00
Use PCIe device name: qdma01000
Check system information
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core
System Information
Manufacturer: ASUS
Product Name: System Product Name
Version: System Version
Serial Number: System Serial Number
UUID: 8c1cddc3-8472-bd61-6e23-d45d64ef96f0
Wake-up Type: Power Switch
SKU Number: SKU
Family: Default string
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG STRIX Z490-F GAMING
Version: Rev 1.xx
Serial Number: 200467903600324
Asset Tag: Default string
Check Xilinx PCIe board information
01:00.0 RAM memory: Xilinx Corporation Device b03f
01:00.0 RAM memory: Xilinx Corporation Device b03f
Subsystem: Xilinx Corporation Device 0007
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 0002)
[ 4.201749] qdma_pf:qdma_device_attributes_get: qdma01000-p0000:01:00.0: num_pfs:1, num_qs:2048, flr_present:1, st_en:1, mm_en:1, mm_cmpt_en:0, mailbox_en:0, mm_channel_max:2, qid2vec_ctx:1, cmpt_ovf_chk_dis:0, mailbox_intr:0, sw_desc_64b:0, cmpt_desc_64b:0, dynamic_bar:0, legacy_intr:0, cmpt_trig_count_timer:0
[ 4.201751] qdma_pf:qdma_device_open: Vivado version = vivado 2019.2
[ 4.201752] qdma_dev_entry_create: Created the dev entry successfully
[ 4.201754] qdma_pf:intr_setup: current device supports only (8) msix vectors per function. ignoring input for (32) vectors
[ 4.201768] qdma-pf 0000:01:00.0: irq 157 for MSI/MSI-X
[ 4.201772] qdma-pf 0000:01:00.0: irq 158 for MSI/MSI-X
[ 4.201776] qdma-pf 0000:01:00.0: irq 159 for MSI/MSI-X
[ 4.201779] qdma-pf 0000:01:00.0: irq 160 for MSI/MSI-X
[ 4.201782] qdma-pf 0000:01:00.0: irq 161 for MSI/MSI-X
[ 4.201784] qdma-pf 0000:01:00.0: irq 162 for MSI/MSI-X
[ 4.201787] qdma-pf 0000:01:00.0: irq 163 for MSI/MSI-X
[ 4.201791] qdma-pf 0000:01:00.0: irq 164 for MSI/MSI-X
[ 4.201875] qdma_s80_hard_init_ctxt_memory: clearing the context for all qs
[ 4.211836] qdma_pf:xdev_identify_bars: User BAR 2.
[ 4.211840] qdma_pf:qdma_device_open: 0000:01:00.0, 01000, pdev 0xffff990e6d4a2000, xdev 0xffff98ff993f7800, ch 2, q 0, vf 0.
Configure Xilinx PCIe QDMA.
total 0
drwxr-xr-x. 2 root root 0 Feb 14 22:51 .
drwxr-xr-x. 5 root root 0 Feb 14 17:49 ..
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:14.0 -> ../../../devices/pci0000:00/0000:00:14.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:14.2 -> ../../../devices/pci0000:00/0000:00:14.2
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:15.0 -> ../../../devices/pci0000:00/0000:00:15.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:15.1 -> ../../../devices/pci0000:00/0000:00:15.1
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:16.0 -> ../../../devices/pci0000:00/0000:00:16.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:17.0 -> ../../../devices/pci0000:00/0000:00:17.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1b.0 -> ../../../devices/pci0000:00/0000:00:1b.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1c.4 -> ../../../devices/pci0000:00/0000:00:1c.4
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1c.5 -> ../../../devices/pci0000:00/0000:00:1c.5
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1d.0 -> ../../../devices/pci0000:00/0000:00:1d.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1f.0 -> ../../../devices/pci0000:00/0000:00:1f.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1f.3 -> ../../../devices/pci0000:00/0000:00:1f.3
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1f.4 -> ../../../devices/pci0000:00/0000:00:1f.4
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:00:1f.5 -> ../../../devices/pci0000:00/0000:00:1f.5
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:01:00.0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:04:00.0 -> ../../../devices/pci0000:00/0000:00:1c.4/0000:04:00.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:05:00.0 -> ../../../devices/pci0000:00/0000:00:1c.5/0000:05:00.0
lrwxrwxrwx. 1 root root 0 Feb 14 17:49 0000:06:00.0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:06:00.0
total 0
-rw-r--r--. 1 root root 4096 Feb 14 22:51 buf_sz
-rw-r--r--. 1 root root 4096 Feb 14 22:51 c2h_cnt_th
-rw-r--r--. 1 root root 4096 Feb 14 22:51 c2h_timer_cnt
-rw-r--r--. 1 root root 4096 Feb 14 22:51 cmpl_status_acc
-rw-r--r--. 1 root root 4096 Feb 14 22:51 glbl_rng_sz
-rw-r--r--. 1 root root 4096 Feb 14 22:51 intr_rngsz
-rw-r--r--. 1 root root 4096 Feb 14 22:51 qmax
0
32
0
qdma01000 0000:01:00.0 max QP: 32, 0~31
=============Hardware Version============
RTL Version : RTL Base
Vivado ReleaseID : vivado 2019.2
QDMA Device Type : Versal S80 Hard IP
QDMA IP Type : Versal Hard IP
============Software Version============
qdma driver version : 2020.1.
=============Hardware Capabilities============
Number of PFs supported : 1
Total number of queues supported : 2048
MM channels : 2
FLR Present : yes
ST enabled : yes
MM enabled : yes
Mailbox enabled : no
MM completion enabled : no
qdma01000:statistics
Total MM H2C packets processed = 0
Total MM C2H packets processed = 0
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
List Xilinx PCIe QDMA queues.
Zero Qs
Test Xilinx PCIe QDMA h2c channel.
qdma01000-MM-0 H2C added.
Added 1 Queues.
/dev/qdma01000-MM-0
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
** Average BW = 4096, 592.050415
** Average BW = 65536, 3776.811523
Stopped Queues 0 -> 0.
Deleted Queues 0 -> 0.
qdma01000:statistics
Total MM H2C packets processed = 17000
Total MM C2H packets processed = 0
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
Test Xilinx PCIe QDMA c2h channel.
qdma01000-MM-0 C2H added.
Added 1 Queues.
/dev/qdma01000-MM-0
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
** Average BW = 4096, 389.094696
** Average BW = 65536, 2717.306641
Stopped Queues 0 -> 0.
Deleted Queues 0 -> 0.
qdma01000:statistics
Total MM H2C packets processed = 17000
Total MM C2H packets processed = 17
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
Test done.
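The test script has no error handling. One possible first step is a check that the PCIe device and the driver's “qdma” sysfs directory exist before any queue is touched. The sketch below is an assumption about how such a check could look, not part of the original script: the function name `require_qdma_device` and the `SYSFS_PCI` override variable are hypothetical, and like the script it assumes PCI domain 0000 and function 0.

```shell
#!/bin/bash
# SYSFS_PCI is a hypothetical override, useful for trying the check
# without hardware; by default it points at the real sysfs directory.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# Hypothetical helper: verify the device and its qdma sysfs directory.
#   $1 = PCIe bus number, e.g. 01
#   $2 = PCIe device number, e.g. 00
# Returns 1 if the device is missing, 2 if the qdma directory is missing.
require_qdma_device() {
    local bus=$1 dev=$2
    local path="$SYSFS_PCI/0000:$bus:$dev.0"
    if [ ! -d "$path" ]; then
        echo "PCIe device 0000:$bus:$dev.0 not found" >&2
        return 1
    fi
    if [ ! -d "$path/qdma" ]; then
        echo "No qdma directory under $path; is the QDMA driver loaded?" >&2
        return 2
    fi
}
```

In the script, a line such as `require_qdma_device "$QDMA_BUS_NO" "$QDMA_DEV_NO" || exit 1` placed before the sysfs and dma-ctl commands would stop the run early when the board is absent.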