As I wrote in the previous post, the /etc/init.d/openibd init script is in charge of starting the InfiniBand (IB) network. The script parses the /etc/ofed/openibd.conf configuration file, where you can specify which ULPs should be initialized. By default, all the ULPs I mentioned last time (ipoib, srp, sdp) are enabled.
The opensm IB network manager is controlled by the /etc/init.d/opensmd init script, which is configured via the /etc/ofed/opensm.conf configuration file. You can turn on debugging there, but it is not normally needed. It is more useful to enable verbose mode, which increases the log verbosity level. The default log file is /var/log/osm.log. So, if something goes wrong, enable verbose mode and check the log file.
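When checking the log, it can help to pull out only the error lines. The following is a minimal sketch, not part of OFED: the function name osm_log_errors is hypothetical, and it assumes that error entries in the opensm log contain the string ERR, which you should confirm against your own log.

```shell
# Print the last N error lines from an opensm log (hypothetical helper).
# $1: log file, defaults to /var/log/osm.log
# $2: number of lines to show, defaults to 20
osm_log_errors() {
    log=${1:-/var/log/osm.log}
    count=${2:-20}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # Assumption: error entries are marked with the string ERR.
    grep 'ERR' "$log" | tail -n "$count"
}
```

Calling osm_log_errors with no arguments reads the default log file; pass a path and a count to look at another file or show more lines.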
After executing the init scripts, you should check the IB network state. The openibd script is started automatically during system startup, while opensmd has to be enabled explicitly (with ntsysv or chkconfig). Follow this checklist:
- Is the Mellanox HCA recognized?
  - check the output of lsmod | grep ib_mthca
  - check the output of dmesg
- Are the appropriate ULPs loaded?
  - check the output of lsmod | grep ib_
    - should contain ib_ipoib, ib_srp, ib_sdp
- Is the IB network initialized and working?
  - check the output of cat /sys/class/infiniband/mthca0/ports/X/state (X is the port number)
    - should be ACTIVE
- Is the ib0 network interface available?
  - check the output of ifconfig -a
  - assign an IP address to the nodes
    - run ifconfig ib0 IP_ADDR1 up on the first node
    - run ifconfig ib0 IP_ADDR2 up on the second node
  - check the IPoIB functionality
    - run ping IP_ADDR2 from the first node
    - run ping IP_ADDR1 from the second node
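The first part of the checklist above can be scripted. This is only a rough sketch under a few assumptions: the function names (missing_modules, port_is_active, check_ib) are hypothetical, the HCA is assumed to appear as mthca0 as in the checklist, and the state file is assumed to contain a line such as "4: ACTIVE". The IP assignment and ping steps are left to be done by hand on each node.

```shell
# Print each required module that is missing from lsmod-style text on stdin.
missing_modules() {
    loaded=$(awk '{print $1}')
    for mod in "$@"; do
        echo "$loaded" | grep -qxF "$mod" || echo "$mod"
    done
}

# Succeed if a port state string (the content of .../ports/X/state) is ACTIVE.
# Assumption: the file holds either "ACTIVE" or something like "4: ACTIVE".
port_is_active() {
    case "$1" in
        "ACTIVE"|*" ACTIVE") return 0 ;;
        *) return 1 ;;
    esac
}

# Run the checks against the live system and report anything that looks wrong.
check_ib() {
    missing=$(lsmod | missing_modules ib_mthca ib_ipoib ib_srp ib_sdp | tr '\n' ' ')
    [ -z "$missing" ] || echo "modules not loaded: $missing"

    for f in /sys/class/infiniband/mthca0/ports/*/state; do
        [ -r "$f" ] || { echo "no port state files found under mthca0"; break; }
        state=$(cat "$f")
        port_is_active "$state" || echo "$f is not ACTIVE: $state"
    done

    ifconfig -a | grep -q '^ib0' || echo "ib0 interface not found"
}
```

Run check_ib as root on each node after the init scripts have been started; no output means none of these checks found a problem. If a port is reported as not ACTIVE, make sure opensmd is running somewhere on the fabric.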
3 comments:
Really nice and simple article for a layman to follow. Thanks for it, I could understand why my ib0 interface was not coming up (I had to load the ipoib module)!
On my system, the /sys/class/infiniband/mthca0 directory itself is not being created. Is that normal?
Great series of articles David! I have been battling with IB over the past few days and your clear, concise instructions are a godsend. Thank you!!
Excellent pair of articles on getting IB set up on RHEL, thank you for making it so easy! Turns out I've had everything installed and configured now for 18 months, just missing the startup for OpenSM to get the interfaces live.
Can't wait to read more of your blog! The tag cloud looks juicy!