I am going to close the article series about InfiniBand technology on the RHEL platform (see the previous posts 1, 2, 3) with posts devoted to IB troubleshooting. I would like to introduce the basic diagnostic steps for an IB environment, which may help you uncover errors and misconfiguration.
Most of the problems you may run into can be traced with the OFED diagnostic tools. Up to OFED 1.2 they are shipped in the openib-diags package; since version 1.3 it has been replaced by the infiniband-diags package. Let's take a look at the most useful ones:
- ibstat - shows IB device status such as firmware version, port states and rates, GUIDs, LIDs ...
- ibnetdiscover - discovers the IB network topology
- ibroute - queries an IB switch's forwarding table (similar to a routing table)
- ibnodes - shows the IB nodes in the topology
- ibchecknet - runs IB network validation
- ibping - pings an IB address
- sysfs - Linux virtual filesystem exposing kernel structures; IB devices appear under the /sys/class/infiniband directory
First, I would like to explain the usage of the last two tools, ibping and sysfs. They are simple and familiar from other fields. ibping works in a client-server fashion: you run ibping in server mode on one side, and the other side acts as the client. The server answers the client's pings with pongs.
- Server mode - ibping -S -v
- Client mode - ibping -v SERVER_LID_ADDR
ibwarn: [6795] ibping_serv: starting to serve...
ibwarn: [6795] ibping_serv: Pong: node2.(none)
The pongs have to be visible on the client side:
ibwarn: [17946] ibping: Ping..
Pong from node2.(none) (Lid 4): time 0.235 ms
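If you want to collect the round-trip times rather than read them by eye, the client output can be parsed with a few lines of Python. This is only a minimal sketch: the function name parse_pongs is made up for illustration, and the regular expression assumes the "Pong from ..." line looks exactly like the sample above, which may differ between infiniband-diags versions.

```python
import re

# Assumed line format, taken from the sample output in this post:
#   Pong from node2.(none) (Lid 4): time 0.235 ms
PONG_RE = re.compile(
    r"Pong from (?P<host>.+) \(Lid (?P<lid>\d+)\): time (?P<ms>[\d.]+) ms"
)


def parse_pongs(output):
    """Return a list of (host, lid, latency_ms) tuples found in ibping client output."""
    results = []
    for line in output.splitlines():
        match = PONG_RE.search(line)
        if match:
            results.append(
                (match.group("host"), int(match.group("lid")), float(match.group("ms")))
            )
    return results
```

Lines that do not match (for example the "ibwarn: ... Ping.." lines) are simply skipped, so an empty result means that no pong was seen.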
If you can't see them, check the connectivity status of your IB HCA. One way to do that is via sysfs. Each IB HCA is represented by a subdirectory under /sys/class/infiniband, where you can find a lot of useful information. For example, if you have a dual-port HCA from Mellanox, there should be the following entries for the port states:
- /sys/class/infiniband/mthca0/ports/1/state
- /sys/class/infiniband/mthca0/ports/2/state
Each of these files reports the current state of the port:
- DOWN - port is physically disconnected
- INIT - port is physically connected but not yet configured by a subnet manager
- ACTIVE - port is online and working
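Checking the state files one by one gets tedious on hosts with several HCAs, so here is a small Python sketch that walks the sysfs tree and collects the state of every port. The function name read_port_states and its root parameter are made up for illustration. The sketch assumes the kernel writes the state as "<number>: <NAME>" (for example "4: ACTIVE"), and it also accepts a bare name.

```python
from pathlib import Path


def read_port_states(root="/sys/class/infiniband"):
    """Return {(device, port): state_name} for every port found under root."""
    states = {}
    # Matches e.g. /sys/class/infiniband/mthca0/ports/1/state
    for state_file in sorted(Path(root).glob("*/ports/*/state")):
        device = state_file.parts[-4]
        port = state_file.parts[-2]
        raw = state_file.read_text().strip()
        # Assumed content is "<number>: <NAME>"; keep only the name part.
        name = raw.split(":", 1)[-1].strip()
        states[(device, port)] = name
    return states
```

Calling read_port_states() on an IB host returns a dictionary keyed by device name and port number, so any port that is not ACTIVE is easy to spot. The root parameter exists only so that the function can be pointed at a copy of the tree for testing.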