Management, Diagnostics and Testing
Presented By: | Jason Gunthorpe - CTO Obsidian Research |
---|---|
Date: | OFA Monterey 2011-04-05 |
RDMA device discovery:

    for I in rdma.get_devices():
        print(I.name)

RDMA Verbs:

    with rdma.get_verbs(path.end_port) as ctx:
        print(ctx.query_device())

IB Management:

    cpi = umad.SubnAdmGet(IBA.MADClassPortInfo)

Plus more!
Python is a modern, very high-level, multi-paradigm programming language.
Python is slow!
RDMA is for high performance!
Why!!?!?
Sometimes correct and simple is more important than fast...
... and good algorithms can help.
Pure Python except for rdma.ibverbs!
GPL licensed
OFA Module | Python-rdma |
---|---|
libibmad | Near 100% coverage via rdma.madtransactor and rdma.IBA |
libibumad | 100% coverage via rdma.umad |
libibverbs | 100% coverage via rdma.ibverbs (through Pyrex) |
libibnetdisc | ~80% coverage. No support for switch chassis grouping. |
librdmacm | Not covered |
libibcm | Not covered |
infiniband-diags | 45 commands re-implemented, 2 un-implemented. Review ibtool |
ibutils | Good coverage of the internal APIs but no coverage for the user tools. |
perftest | rdma_bw is implemented as an example. |

    $ ibtool rdma_bw 127.0.0.1
    path to peer IBPath(end_port='mlx4_0/1', DGID=GID('fe80::2:c903:9:1edd'), DLID=1, MTU=4, packet_life_time=0, SGID=GID('fe80::2:c903:9:1edd'), SLID=1, dack_resp_time=15L, dqpn=524361L, dqpsn=6645404, drdatomic=0, rate=3, sack_resp_time=15L, sqpn=524360L, sqpsn=1754047, srdatomic=0)
    MR peer raddr=7fd268a9c000 peer rkey=8002200
    1000 iterations of 1048576 is 1048576000 bytes
    3065.7 MB/sec

MT26428 using internal loopback, 2.8GHz i5-2300
Re-implementation of infiniband-diags using Python as the implementation language:
Also:
45 commands are implemented, > 90% complete
Mostly looks the same:

    $ ibtool ibaddr 7
    GID fe80::17:77ff:feb6:2ca4 LID start 7 end 7
    $ ibtool ibswitches
    Switch : 0017:77ff:feb6:2ca4 ports 2 "Obsidian Longbow X100 - LBXR43D1FF" base port 0 lid 7 lmc 0
    Switch : 0017:77ff:fef9:6e79 ports 2 "Obsidian Longbow X100 - LBXREAF28B" base port 0 lid 9 lmc 0
    $ ibtool smpquery -P 2 NI -D 0,2
    # Node info: DR Path (0, 2)
    BaseVers:........................1
    ClassVers:.......................1
    NodeType:........................2
    NumPorts:........................2
    SystemGuid:......................0017:77ff:fef9:6e79
    Guid:............................0017:77ff:fef9:6e79
    PortGuid:........................0017:77ff:fef9:6e79
    PartCap:.........................1
    DevId:...........................0x0009
    Revision:........................0x00010001
    LocalPort:.......................1
    VendorId:........................0x001777

Some are new:

    $ ibtool perfquery --vl-xmit-wait 9
    # Port counters: Lid 9 (fe80::17:77ff:fef9:6e79) port 1
    PortSelect:......................1
    CounterSelect:...................0x0000
    PortVLXmitWait[0]:...............606

    $ ibtool subnet_diff ref
    Current subnet has 4 end ports, reference subnet has 4 end ports
    All end ports in the current subnet are in the reference subnet.
    All end ports in the reference subnet are in the current subnet.
    Current subnet has 3 nodes, reference subnet has 3 nodes
    All nodes in the current subnet are in the reference subnet.
    All nodes in the reference subnet are in the current subnet.
    Current subnet has 3 links, reference subnet has 3 links
    All links in the current subnet are in the reference subnet.
    All links in the reference subnet are in the current subnet.
    All links in the current subnet have the same rate in the reference subnet.
    Current subnet has 4 LIDs, reference subnet has 4 LIDs
    All LIDs in the current subnet are the same as the reference subnet.

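A report of this shape falls out of plain set operations. The sketch below is only an illustration with made-up data: `diff_report` is a hypothetical helper, not ibtool's code, and the wording of the "Only in ..." line is invented.

```python
def diff_report(kind, current, reference):
    """Compare two collections of hashable items and return report lines."""
    current, reference = set(current), set(reference)
    lines = ["Current subnet has %d %s, reference subnet has %d %s"
             % (len(current), kind, len(reference), kind)]
    for here, there, a, b in (("current", "reference", current, reference),
                              ("reference", "current", reference, current)):
        extra = sorted(a - b)  # items present on one side only
        if extra:
            lines.append("Only in the %s subnet: %s" % (here, ", ".join(extra)))
        else:
            lines.append("All %s in the %s subnet are in the %s subnet."
                         % (kind, here, there))
    return lines


# Hypothetical end port GUIDs, for illustration only.
ref = ["0017:77ff:feb6:2ca4", "0017:77ff:fef9:6e79"]
cur = ["0017:77ff:feb6:2ca4", "0017:77ff:fef9:6e79"]
for line in diff_report("end ports", cur, ref):
    print(line)
```

Nodes, links and LIDs can be compared the same way once each is reduced to a hashable key.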
Section 8 of the Python RDMA manual details the various differences between ibtool and infiniband-diags:
Format | Example |
---|---|
device | mlx4_0 |
Node GUID | 0002:c903:0000:1491 |
Format | Example |
---|---|
device | mlx4_0 (defaults to the first port) |
device/port | mlx4_0/1 |
Port GID | fe80::2:c903:0:1491 |
Port GUID | 0002:c903:0000:1491 |
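The Port GID and Port GUID examples in the table are related: a port's default GID is the subnet prefix followed by the port GUID, so the GUID is the low 64 bits of the GID. A minimal sketch using only the standard library (the helper names are illustrative, not part of python-rdma):

```python
import ipaddress


def parse_guid(text):
    """Parse 'xxxx:xxxx:xxxx:xxxx' into a 64-bit integer."""
    groups = text.split(":")
    if len(groups) != 4:
        raise ValueError("expected four 16-bit groups: %r" % text)
    value = 0
    for group in groups:
        value = (value << 16) | int(group, 16)
    return value


def format_guid(value):
    """Format a 64-bit integer as 'xxxx:xxxx:xxxx:xxxx'."""
    return ":".join("%04x" % ((value >> shift) & 0xFFFF)
                    for shift in (48, 32, 16, 0))


def gid_to_guid(text):
    """Return the low 64 bits of a GID written in IPv6 notation."""
    return int(ipaddress.IPv6Address(text)) & 0xFFFFFFFFFFFFFFFF


print(format_guid(gid_to_guid("fe80::2:c903:0:1491")))
```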
Library features flow into ibtool:

    $ ibtool ibaddr -P fe80::2:c903:0:14a6 9 -d
    D: Using end port mlx4_0/2 fe80::2:c903:0:14a6
    D: SMP Path 10 -> 9 SL=0 PKey=0xffff DQPN=0 IBPath(end_port='mlx4_0/2', DLID=10, SLID=10, dqpn=0, qkey=0x0, sqpn=0)
    D: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPNodeInfo(17) completed to 'Path 10 -> 9 SL=0 PKey=0xffff DQPN=0' len 256.
    D: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPPortInfo(21) completed to 'Path 10 -> 9 SL=0 PKey=0xffff DQPN=0' len 256.
    GID fe80::17:77ff:fef9:6e79 LID start 9 end 9

Structures and constants from the IBA:
Everything can be decoded and dumped:

    $ ibtool ibaddr 9 -dd
    D: Reply MAD_METHOD_GET_RESP(129) SMPFormat(1.1) SMPNodeInfo(17)
       0 01010181 baseVersion=1,mgmtClass=1,classVersion=1,method=129
       4 00000000 status=0,classSpecific=0
       8 000079FF transactionID=134139628569652
      12 D0E94434
       + data SMPNodeInfo
      64 01010202 baseVersion=1,classVersion=1,nodeType=2,numPorts=2
      68 001777FF systemImageGUID=GUID('0017:77ff:fef9:6e79')
      72 FEF96E79
      76 001777FF nodeGUID=GUID('0017:77ff:fef9:6e79')
      80 FEF96E79
      84 001777FF portGUID=GUID('0017:77ff:fef9:6e79')
      88 FEF96E79
      92 00010009 partitionCap=1,deviceID=9
      96 00010001 revision=65537
     100 02001777 localPortNum=2,vendorID=6007

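The field values in this dump can be reproduced from the hex words with the standard `struct` module. This is a sketch of the decoding step only, using the bytes shown at offsets 0-12 and 64-100; it does not use python-rdma, and the function names are illustrative.

```python
import struct

# First 16 bytes of the reply, copied from offsets 0-12 of the dump.
HEADER = bytes.fromhex("01010181" "00000000" "000079FF" "D0E94434")
# SMPNodeInfo payload, copied from offsets 64-100 of the dump.
NODE_INFO = bytes.fromhex("01010202"
                          "001777FF" "FEF96E79"
                          "001777FF" "FEF96E79"
                          "001777FF" "FEF96E79"
                          "00010009" "00010001" "02001777")


def decode_header(buf):
    """Decode the leading header fields exactly as the dump labels them."""
    names = ("baseVersion", "mgmtClass", "classVersion", "method",
             "status", "classSpecific", "transactionID")
    # Big-endian: four bytes, two 16-bit fields, one 64-bit field.
    return dict(zip(names, struct.unpack(">BBBBHHQ", buf[:16])))


def decode_node_info(buf):
    """Decode the 40-byte NodeInfo payload."""
    names = ("baseVersion", "classVersion", "nodeType", "numPorts",
             "systemImageGUID", "nodeGUID", "portGUID",
             "partitionCap", "deviceID", "revision")
    fields = struct.unpack(">BBBBQQQHHII", buf[:40])
    info = dict(zip(names, fields[:10]))
    # The last word packs an 8-bit port number and a 24-bit vendor ID.
    info["localPortNum"] = fields[10] >> 24
    info["vendorID"] = fields[10] & 0xFFFFFF
    return info


print(decode_header(HEADER))
print(decode_node_info(NODE_INFO))
```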
Dynamic language with introspection makes this dead easy:

    $ ibtool query SubnAdmGetTable SANodeRecord \
        -f nodeInfo.systemImageGUID=0017:77ff:fef9:6e79
    Reply structure #0
    LID..............................9
    nodeInfo.NumPorts................2
    nodeInfo.SystemImageGUID.........0017:77ff:fef9:6e79
    nodeInfo.PortGUID................0017:77ff:fef9:6e79
    nodeInfo.VendorID................0x001777
    nodeDescription.NodeString.......'Obsidian Longbow X100 - LBXREAF28B'

45 LOC! - perform any RPC, with any arguments and pretty print the result. Widely used in implementing ibtool.
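To show why introspection keeps such a command short, here is a self-contained sketch of the idea: look a structure up by name, set nested fields from `name=value` strings, and pretty print whatever comes back. The classes, the `STRUCTS` registry and the 33-column dot padding are all assumptions made for this illustration; it is not the ibtool source.

```python
class NodeInfo(object):
    def __init__(self):
        self.numPorts = 0
        self.systemImageGUID = ""


class NodeRecord(object):
    def __init__(self):
        self.LID = 0
        self.nodeInfo = NodeInfo()


# Hypothetical name -> class registry standing in for a real structure module.
STRUCTS = {"NodeRecord": NodeRecord}


def set_field(obj, dotted, text):
    """Assign a field given as 'a.b.c', converting text to the field's type."""
    *parents, leaf = dotted.split(".")
    for name in parents:
        obj = getattr(obj, name)
    current = getattr(obj, leaf)  # AttributeError for an unknown field
    setattr(obj, leaf, type(current)(text))


def dump(obj, prefix=""):
    """Return 'name.....value' lines for every field, recursing into members."""
    lines = []
    for name, value in sorted(vars(obj).items()):
        label = prefix + name
        if hasattr(value, "__dict__"):
            lines.extend(dump(value, label + "."))
        else:
            lines.append(label.ljust(33, ".") + str(value))
    return lines


rec = STRUCTS["NodeRecord"]()
set_field(rec, "LID", "9")
set_field(rec, "nodeInfo.systemImageGUID", "0017:77ff:fef9:6e79")
print("\n".join(dump(rec)))
```

Nothing in `set_field` or `dump` knows about a particular structure, so the same few lines serve every structure in the registry.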

    $ ibtool ibaddr 10 --sa -d
    D: RPC MAD_METHOD_GET(1) SAFormat(3.2) SANodeRecord(17) completed to 'Path 8 -> 8 SL=0 PKey=0xffff DQPN=1' len 256.
    D: RPC MAD_METHOD_GET(1) SAFormat(3.2) SAPortInfoRecord(18) completed to 'Path 8 -> 8 SL=0 PKey=0xffff DQPN=1' len 256.
    GID fe80::2:c903:0:14a6 LID start 10 end 10

    $ ibtool ibnetdiscover --sa -d
    D: Performing discovery using mode 'SA'
    D: RPC MAD_METHOD_GET_TABLE(18) SAFormat(3.2) SANodeRecord(17) completed to 'Path 8 -> 8 SL=0 PKey=0xffff DQPN=1' len 504.
    D: RPC MAD_METHOD_GET_TABLE(18) SAFormat(3.2) SAPortInfoRecord(18) completed to 'Path 8 -> 8 SL=0 PKey=0xffff DQPN=1' len 568.
    D: RPC MAD_METHOD_GET_TABLE(18) SAFormat(3.2) SALinkRecord(32) completed to 'Path 8 -> 8 SL=0 PKey=0xffff DQPN=1' len 104.

Python Co-Routines - one thread, multiple execution contexts:
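
    def get_pinf(sched, path, idx):
        # Suspends here until the scheduler delivers the reply for port idx.
        pinf = yield sched.SubnGet(IBA.SMPPortInfo, path, idx)

    # One coroutine per port; ninf is the node's NodeInfo, fetched earlier.
    sched.mqueue(get_pinf(sched, path, I)
                 for I in range(1, ninf.numPorts + 1))
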
Run numPorts copies of get_pinf in parallel. Automatically limits outstanding RPCs, tracks completion, manages timeouts, etc.
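The mechanism can be sketched with plain generators. The toy scheduler below is an assumption-laden illustration, not the library's scheduler: the `transport` callable stands in for the fabric, replies arrive in issue order, and timeouts are left out. It does show the two key points: each `yield` hands a request to the scheduler, and the scheduler caps how many requests are outstanding.

```python
from collections import deque


class Scheduler(object):
    """Toy single-threaded scheduler for request-yielding generators."""

    def __init__(self, transport, max_outstanding=4):
        self.transport = transport          # callable: request -> reply
        self.max_outstanding = max_outstanding
        self.ready = deque()                # (coroutine, value to send in)
        self.in_flight = deque()            # (coroutine, issued request)
        self.peak = 0                       # most requests ever outstanding

    def mqueue(self, coroutines):
        """Queue several coroutines to be run."""
        for coroutine in coroutines:
            self.ready.append((coroutine, None))

    def run(self):
        while self.ready or self.in_flight:
            # Step coroutines until the outstanding limit is reached.
            while self.ready and len(self.in_flight) < self.max_outstanding:
                coroutine, value = self.ready.popleft()
                try:
                    request = coroutine.send(value)
                except StopIteration:
                    continue                # this coroutine has finished
                self.in_flight.append((coroutine, request))
                self.peak = max(self.peak, len(self.in_flight))
            # Complete the oldest request and make its coroutine runnable.
            if self.in_flight:
                coroutine, request = self.in_flight.popleft()
                self.ready.append((coroutine, self.transport(request)))


def get_pinf(results, port):
    reply = yield ("PortInfo", port)        # suspended until the reply arrives
    results[port] = reply


results = {}
sched = Scheduler(lambda req: "%s reply for port %d" % req, max_outstanding=2)
sched.mqueue(get_pinf(results, port) for port in range(1, 5))
sched.run()
print(results)
```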
Fetch, store and manipulate an IB subnet:
All ibtool functions that use discovery support common options and caching:

    $ ibtool ibnetdiscover --cache disc \
        --refresh-cache
    $ ibtool ibcheckerrors --cache disc
    ## Summary: 4 nodes checked, 0 bad nodes found
    ##          8 ports checked, 0 ports with bad state found
    ##          4 ports checked, 0 ports have errors beyond threshold

No MADs will be issued by ibcheckerrors
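The caching pattern itself is simple. The sketch below assumes the discovered subnet is an ordinary picklable Python object and uses a hypothetical `discover` callable in place of real fabric discovery. The on-disk format ibtool actually uses is not stated on the slide.

```python
import os
import pickle


def load_subnet(cache_path, discover, refresh=False):
    """Return (subnet, from_cache), running discover() only when needed."""
    if not refresh and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True     # no discovery traffic at all
    subnet = discover()                     # this is where MADs would be sent
    with open(cache_path, "wb") as f:
        pickle.dump(subnet, f)
    return subnet, False
```

A first call with `refresh=True` plays the role of `--refresh-cache`; later calls with the same path reuse the stored result without touching the fabric.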
Easy to use wrappers around verbs:

    with get_verbs(path.end_port) as ctx:
        cq = ctx.cq(100, ctx.comp_channel())
        pd = ctx.pd()
        qp = pd.qp(ibv.IBV_QPT_UD, 100, 100, cq)

Simplifications for WC processing:

    poller = CQPoller(cq)
    for wc in poller.iterwc(timeout=1):
        if wc.status != ibv.IBV_WC_SUCCESS:
            raise ibv.WCError(wc, cq, obj=qp)

Tight integration with IBPath concept:

    path = get_mad_path(umad, "10")
    qp.establish(path)
    qp.post_send(ibv.send_wr(opcode=ibv.IBV_WR_SEND,
                             ah=pd.ah(path),
                             remote_qpn=path.dqpn,
                             remote_qkey=path.qkey))
