dsumsky lines . . .: networking

Tuesday, May 10, 2011

DNS reverse mapping

Recently, I had to configure reverse zones for IPv4 subnets with netmasks such as /26 or /20. With a class C (/24) network it is straightforward: reverse the order of the octets in the network part of the address, append the special domain in-addr.arpa and create the reverse zone file under that name. For the network 192.168.1.0/24, the reversed network part is 1.168.192 and the reverse zone is 1.168.192.in-addr.arpa.
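The /24 case can be sketched as a small shell function. This is only an illustration; the function name reverse_zone_24 is made up for this example and it assumes the argument is a well-formed dotted quad.

```shell
#!/bin/sh
# reverse_zone_24 NETWORK
# Prints the in-addr.arpa zone name for a /24 network given by its base
# address. The function body runs in a subshell so that changing IFS and
# the positional parameters does not leak into the caller.
reverse_zone_24() (
    IFS=.
    set -- $1                        # split the address into four octets
    echo "$3.$2.$1.in-addr.arpa"     # drop the host octet, reverse the rest
)

reverse_zone_24 192.168.1.0
```

For 192.168.1.0 it prints 1.168.192.in-addr.arpa, the zone name derived above.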

This technique is well known for class C networks (and for class B or A as well), where the address can be split on an octet boundary. If we are assigned a subnet of a class C network with fewer than 256 hosts, we can't do it like this and we need to separate the network part and the host part of the address differently. The brute-force way is to create a reverse zone for each host. The better way is to read through RFC 2317.

The RFC defines classless delegation of subnets on non-octet boundaries with fewer than 256 hosts. Let's take the network 192.168.1.32/28 (a subnet of 192.168.1.0/24), where the network base is 192.168.1.32, the maximum number of hosts is 14 and the netmask is 255.255.255.240 (28 in CIDR notation). The first step is to reverse the network base, which gives us 32.1.168.192, and to append the domain in-addr.arpa. That gives us a semi-reversed zone 32.1.168.192.in-addr.arpa. The final step is less obvious. Take the first label of the semi-reversed zone (the last octet of the network base) and append a slash and the netmask in CIDR notation, i.e. write it in the form last_octet/netmask. In our example, we get 32/28.1.168.192.in-addr.arpa (32 is the last octet and 28 is the netmask).
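The construction of the classless zone name can be written down as a shell function as well. It is a minimal sketch: rfc2317_zone is a hypothetical name and no validation of the address or the netmask is done.

```shell
#!/bin/sh
# rfc2317_zone NETWORK_BASE NETMASK_BITS
# Builds the zone name in the form last_octet/netmask.<reversed /24 part>.
rfc2317_zone() (
    bits=$2                          # save the netmask before "set --" overwrites $2
    IFS=.
    set -- $1                        # split the base address into four octets
    echo "$4/$bits.$3.$2.$1.in-addr.arpa"
)

rfc2317_zone 192.168.1.32 28
```

The call prints 32/28.1.168.192.in-addr.arpa, the same name as in the example.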

Why is this method useful? Even though the construction of the zone name is not obvious, it lets us create just one zone file for all 14 hosts in our example. If we had a network with a /25 netmask, we could put all 126 PTR records in one zone file. The method is a bit obscure, but it removes the need to create a reverse zone file for each host.
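One detail that is easy to miss: in the examples given in RFC 2317, the parent zone (here 1.168.192.in-addr.arpa) carries a CNAME record for every address of the subnet, pointing into the classless zone where the PTR records live. The sketch below generates those CNAME lines for the usable hosts. The function name and the exact record layout are my assumptions, and it only makes sense for netmasks from /25 to /30.

```shell
#!/bin/sh
# rfc2317_cnames LAST_OCTET NETMASK_BITS PARENT_ZONE
# Prints one CNAME record per usable host address of the subnet,
# intended for the zone file of the parent /24 zone.
rfc2317_cnames() {
    base=$1 bits=$2 parent=$3
    size=$(( 1 << (32 - bits) ))     # number of addresses in the subnet
    host=$(( base + 1 ))             # first usable host
    last=$(( base + size - 2 ))      # last usable host (before broadcast)
    while [ "$host" -le "$last" ]; do
        echo "$host IN CNAME $host.$base/$bits.$parent."
        host=$(( host + 1 ))
    done
}

rfc2317_cnames 32 28 1.168.192.in-addr.arpa
```

For the /28 example this prints 14 lines, from "33 IN CNAME 33.32/28.1.168.192.in-addr.arpa." to the same record for host 46.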

It's important to realize that RFC 2317 is meant for networks with fewer than 256 addresses, where the netmask is from /25 to /32. Larger networks have to use traditional delegation on octet boundaries. If we have a network with a netmask from /17 to /24, we will have one zone file per 256 addresses (a /20 needs 16 of them). If we have a netmask from /9 to /16, we can have one zone file per 65536 addresses.
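The rule of thumb for the number of zone files can be checked with a few lines of shell arithmetic. This is a sketch under the reading given above (one zone per octet-aligned block, a single classless zone for anything longer than /24); zone_files_needed is a made-up name.

```shell
#!/bin/sh
# zone_files_needed NETMASK_BITS
# Prints how many reverse zone files a network with this netmask needs.
zone_files_needed() {
    bits=$1
    if [ "$bits" -gt 24 ]; then
        echo 1                            # one classless (RFC 2317) zone
    elif [ "$bits" -gt 16 ]; then
        echo $(( 1 << (24 - bits) ))      # one zone per 256 addresses
    elif [ "$bits" -gt 8 ]; then
        echo $(( 1 << (16 - bits) ))      # one zone per 65536 addresses
    else
        echo $(( 1 << (8 - bits) ))       # one zone per 16777216 addresses
    fi
}

zone_files_needed 20
zone_files_needed 26
```

The two calls cover the netmasks mentioned at the beginning: a /20 needs 16 zone files and a /26 needs one.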

Monday, October 27, 2008

VCB basic usage - VM full backup over NFS

Let's get practical. Suppose there is an NFS server available in the network and we would like to back up virtual machines (VMs) from one of our ESX hosts directly to it, without using any specialized backup software.

I must not forget to mention that VCB is already available on your ESX host. The installed VMware-esx-backuptools package contains almost all the commands mentioned before: vcbVmName, vcbMounter and vcbRestore. The vcbRestore utility is available only with ESX and is used to restore a virtual machine from a full backup. The missing mountVm command is available only with VCB for Windows. Keep in mind that the VCB commands for ESX are case-sensitive because the service console is based on Linux.

First, we need a running NFS server. The configuration is straightforward on any Linux distro. Install the required packages, edit the /etc/exports configuration file and add the directory which will be used for the VM backups. Start the NFS server and reconfigure the firewall to allow access to it (or simply stop the firewall). For details, check the related documentation. If you would like, I can write some more notes about it.

So, let's have an NFS server with the IP address 192.168.1.1 (from a class C network) which exports the /backup/vm directory. Second, we need to permit the NFS client on the ESX host, because outgoing connections from an ESX host are blocked by default. You can do it via the VI client or from the service console like this:

esxcfg-firewall -e nfsClient

You can check all the available services with:

esxcfg-firewall -s

To check if the nfsClient service was enabled, run this:

esxcfg-firewall -q nfsClient

If so, you will receive:

Service nfsClient is enabled.

Finally, we need a backup script whose only task is to back up the available VMs. The script can be scheduled on the ESX host via the cron service or triggered from the NFS backup server. It's your choice. The script follows:

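#!/bin/sh
# Full backup of all VMs on this ESX host to an NFS export.

BACKUP_SERVER="192.168.1.1"   # NFS server
BACKUP_DIR="/backup/vm"       # directory exported by the NFS server
MOUNT_DIR="/backup/snap"      # local mount point on the ESX host

# Create the mount point if it does not exist yet.
[ -d "$MOUNT_DIR" ] || mkdir -p "$MOUNT_DIR" || exit 1

# Collect the names of all VMs known to the host.
VM_BACKUP="`vcbVmName -s any: | grep name: | cut -d':' -f2`"

if [ -n "$VM_BACKUP" ]; then
    mount "$BACKUP_SERVER:$BACKUP_DIR" "$MOUNT_DIR" || exit 1

    # $VM_BACKUP is left unquoted on purpose so that it is split into
    # single names; VM names containing spaces are not supported.
    for VM in $VM_BACKUP; do
        vcbMounter -a "name:$VM" -r "$MOUNT_DIR/$VM"
    done

    umount "$MOUNT_DIR"
fi

exit 0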

Now, a short description of the script. At the beginning, some variables are defined: the NFS server IP address, the exported directory and the local mount point. Then the available VMs are listed and saved in a variable. The exported directory from the NFS server is mounted and the VMs are backed up with the vcbMounter command. Finally, the directory is unmounted. If you want to use the commands without passing authentication credentials, you need to define them in the file /etc/vmware/backuptools.conf. These parameters are required:

VCHOST=127.0.0.1
USERNAME=admin_user
PASSWORD=admin_user_password

So, backing up virtual machines isn't that complicated. In the next article, I'm going to restore them with the vcbRestore command.

Monday, October 20, 2008

RHEL and Infiniband - advanced diagnostics - part three

Let's decompose the ibnetdiscover output a bit. The first paragraph begins with the Switch keyword. The switch has GUID 0x144f00006e9794. The channel adapter paragraphs begin with the Ca keyword. Their GUIDs are 0x3ba0001003de4 (node node2) and 0x3ba0001007ba8 (node node1). The second one corresponds to the node displayed by ibstat above. You have probably noticed that there are many numbers in square brackets. They identify the ends of the IB physical connections. Let's inspect them in more detail:

connections from switch to IB nodes (switch -> nodes)

switch port [6] is connected to the [2] channel of IB node node2
switch port [5] is connected to the [1] channel of IB node node2
switch port [4] is connected to the [2] channel of IB node node1
switch port [3] is connected to the [1] channel of IB node node1

connections from IB node node1 to switch (node -> switch)

the [1] IB channel is connected to switch port [3]
the [2] IB channel is connected to switch port [4]

connections from IB node node2 to switch (node -> switch)

the [1] IB channel is connected to switch port [5]
the [2] IB channel is connected to switch port [6]

Do you see the logic? I think it's simple. And it is evident that the IB connections are full-duplex in our scenario.
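The switch side of this mapping can also be pulled out of the ibnetdiscover output mechanically. The following sed one-liner is a sketch that assumes the line layout shown in this series (a switch port in brackets, a quoted "H-..." GUID followed by the HCA port, then a comment starting with the node name). The function name switch_links is hypothetical and the sample input is pasted in to keep the example self-contained.

```shell
#!/bin/sh
# switch_links: reads ibnetdiscover output on stdin and prints one line
# per switch-to-HCA link. Only lines pointing at an "H-..." GUID match,
# so the Ca paragraphs (which point at "S-...") are skipped.
switch_links() {
    sed -n 's/^\[\([0-9]*\)][[:space:]]*"H-\([0-9a-f]*\)"\[\([0-9]*\)].*# "\([^ "]*\).*/switch port \1 <-> \4 port \3/p'
}

switch_links <<'EOF'
Switch  9 "S-00144f00006e9794"          # "" base port 0 lid 2 lmc 0
[6]     "H-0003ba0001003de4"[2]         # "node2 HCA-1" lid 5
[5]     "H-0003ba0001003de4"[1]         # "node2 HCA-1" lid 4
[4]     "H-0003ba0001007ba8"[2]         # "node1 HCA-1" lid 3
[3]     "H-0003ba0001007ba8"[1]         # "node1 HCA-1" lid 1
EOF
```

On a live system you would pipe the real output into it, e.g. ibnetdiscover | switch_links.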

I'm going to skip the ibnodes command. Its output is the same as it was without a running subnet manager. The next command, ibroute, produces the following nice forwarding table:

Unicast lids [0x0-0x5] of switch Lid 2 guid 0x00144f00006e9794 ():
Lid  Out Port   Destination Info
0x0001 003 : (Channel Adapter portguid 0x0003ba0001007ba9: 'node1 HCA-1')
0x0002 000 : (Switch portguid 0x00144f00006e9794: '')
0x0003 004 : (Channel Adapter portguid 0x0003ba0001007baa: 'node1 HCA-1')
0x0004 005 : (Channel Adapter portguid 0x0003ba0001003de5: 'node2 HCA-1')
0x0005 006 : (Channel Adapter portguid 0x0003ba0001003de6: 'node2 HCA-1')
5 valid lids dumped

It lists the assigned LIDs, the corresponding switch ports and the other ends of the connections. It's a classical routing table saying that LID X is reachable via switch port Y, with additional information about the entity owning that LID. For example, LID 1 is reachable via switch port 3 and it belongs to the channel adapter of node node1.
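Looking up the output port for a given LID is easy to script. The sketch below assumes the table format shown above (LID as a four-digit lowercase hex number in the first column, port in the second). The function name lid_out_port is made up and the sample rows are copied from the table.

```shell
#!/bin/sh
# lid_out_port LID
# Reads an ibroute forwarding table on stdin and prints the switch port
# through which the given (decimal) LID is reachable.
lid_out_port() {
    key=$(printf '0x%04x' "$1")      # format the LID the way the table shows it
    # Compare as strings; "$2 + 0" strips the leading zeroes of the port.
    awk -v key="$key" '($1 "") == (key "") { print $2 + 0 }'
}

lid_out_port 1 <<'EOF'
Lid  Out Port   Destination Info
0x0001 003 : (Channel Adapter portguid 0x0003ba0001007ba9: 'node1 HCA-1')
0x0002 000 : (Switch portguid 0x00144f00006e9794: '')
0x0004 005 : (Channel Adapter portguid 0x0003ba0001003de5: 'node2 HCA-1')
EOF
```

For LID 1 it prints 3, which matches the example in the text.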

To make the final decision whether the IB network is working, run the ibchecknet command. The output below says that we have 2 working IB HCAs, 3 IB nodes (two with an HCA and one switch) and 8 working IB ports (physically only four, but the network is full-duplex in our scenario).

# Checking Ca: nodeguid 0x0003ba0001003de4
# Checking Ca: nodeguid 0x0003ba0001007ba8
## Summary: 3 nodes checked, 0 bad nodes found
##          8 ports checked, 0 bad ports found
##          0 ports have errors beyond threshold

From now on, we have a working Infiniband network and we are able to do this:

ibping the nodes natively
ping the nodes over IPoIB
run unmodified network applications over IPoIB (e.g. NFS, FTP and so on)
run native RDMA applications

Thursday, October 16, 2008

RHEL and Infiniband - advanced diagnostics - part two

It's almost two months since I began to write about advanced diagnostics of IB networks. At the end of that article, I suggested starting the IB subnet manager. So let's do it with the init script:

/etc/init.d/opensmd start

Now we are ready to compare the outputs of the commands when the IB subnet manager wasn't running with the outputs when it is running. There should be noticeable differences, because the IB network should be fully initialized from now on. First, let's see what the ibstat command shows now:

CA 'mthca0'
  CA type: MT25208 (MT23108 compat mode)
  Number of ports: 2
  Firmware version: 4.7.400
  Hardware version: a0
  Node GUID: 0x0003ba0001007ba8
  System image GUID: 0x0003ba0001007bab
  Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 10
          Base lid: 1
          LMC: 0
          SM lid: 1
          Capability mask: 0x02510a6a
          Port GUID: 0x0003ba0001007ba9
  Port 2:
          State: Active
          Physical state: LinkUp
          Rate: 10
          Base lid: 3
          LMC: 0
          SM lid: 1
          Capability mask: 0x02510a68
          Port GUID: 0x0003ba0001007baa

The IB subnet manager is responsible for finishing the IB hardware initialization phase. Both ports of the HCA are in the Active state and they have been assigned a Base lid, which is required for communication over the IB network. We can tell the IB subnet manager is working because the SM lid is assigned as well. What about the other nodes in the network? Let's try the ibnetdiscover command. It should tell us more:

vendid=0x144f
devid=0x0
switchguid=0x144f00006e9794
Switch  9 "S-00144f00006e9794"          # "" base port 0 lid 2 lmc 0
[6]     "H-0003ba0001003de4"[2]         # "node2 HCA-1" lid 5
[5]     "H-0003ba0001003de4"[1]         # "node2 HCA-1" lid 4
[4]     "H-0003ba0001007ba8"[2]         # "node1 HCA-1" lid 3
[3]     "H-0003ba0001007ba8"[1]         # "node1 HCA-1" lid 1

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001003de7
caguid=0x3ba0001003de4
Ca      2 "H-0003ba0001003de4"          # "node2 HCA-1"
[2]     "S-00144f00006e9794"[6]         # lid 5 lmc 0 "" lid 2
[1]     "S-00144f00006e9794"[5]         # lid 4 lmc 0 "" lid 2

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001007bab
caguid=0x3ba0001007ba8
Ca      2 "H-0003ba0001007ba8"          # "node1 HCA-1"
[2]     "S-00144f00006e9794"[4]         # lid 3 lmc 0 "" lid 2
[1]     "S-00144f00006e9794"[3]         # lid 1 lmc 0 "" lid 2

Do you remember the LIDs from the uninitialized IB network? They were all zeroes because the HCAs were uninitialized. Now each channel has a unique LID. Next time, we are going to decompose this output.
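To compare two ibstat runs without reading the whole output, the interesting fields can be condensed to one line per port. This awk filter is a sketch that relies on the ibstat layout shown in this post. The name ibstat_ports is hypothetical and the sample input is an abridged copy of the output above.

```shell
#!/bin/sh
# ibstat_ports: reads ibstat output on stdin and prints
# "<port> <state> <base lid>" for every port.
ibstat_ports() {
    awk '
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:/, "", port) }
        $1 == "State:"                   { state = $2 }
        $1 == "Base" && $2 == "lid:"     { print port, state, $3 }
    '
}

ibstat_ports <<'EOF'
CA 'mthca0'
  Number of ports: 2
  Port 1:
          State: Active
          Physical state: LinkUp
          Base lid: 1
          SM lid: 1
          Port GUID: 0x0003ba0001007ba9
  Port 2:
          State: Active
          Physical state: LinkUp
          Base lid: 3
          SM lid: 1
          Port GUID: 0x0003ba0001007baa
EOF
```

With the subnet manager running it prints "1 Active 1" and "2 Active 3". Without it, the state is Initializing and the base LID is 0 for both ports.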

Thursday, August 28, 2008

Technical differences between VMware ESXi and ESX

I have spent some time looking for more details about VMware ESXi compared to VMware ESX. I summarized the main differences in this article, but I don't think the list is complete. There have to be more features missing in ESXi because of the removal of the service console. So, what else did I discover?

ESXi is supported on a smaller set of certified hardware because it is a standalone system and it doesn't depend on the RHEL-based service console, which provides drivers for other hardware.

You can manage ESXi with RCLI on Linux or Windows, but the Virtual Infrastructure client is more comfortable and easier to use. Further, if you deploy ESXi without a Virtual Infrastructure licence, RCLI will have read-only access. The drawback of the VI client is that it is available for Windows only. A solution may be to use the Wine emulator, but the installation isn't as straightforward as on the Windows platform. The Wine application database contains this entry about VI client installation, but I haven't tried it yet.

You can manage your ESX server directly via a serial cable, but ESXi is missing this feature.

The ESXi kernel is missing jumbo frames support in its TCP/IP stack. Jumbo frames allow larger frames to be sent onto the physical network, which can help to achieve higher throughput with the NFS or iSCSI protocols.

ESXi doesn't support the NetQueue technology, which boosts 10G Ethernet performance.

Finally, VMware in cooperation with Mellanox Technologies supports Infiniband host channel adapters on ESX. ESXi is missing this support.

The previous six points are related to the technical aspects of the ESX and ESXi hypervisors. The list isn't complete either, but these points are quite important for common deployments of VMware technologies. If you know about something else, please share it on my blog. For further information, check these links:

VMware ESX 3.5 release notes
VMware ESXi 3.5 release notes
ESX and ESXi comparison (VMware knowledge base)
Differences between ESXi and ESX (VMware knowledge base)

Wednesday, August 20, 2008

Quickly - how to download a file to the ESX 3.x service console?

VMware ESX 3.x is missing the wget package, so you can't use the wget command to download anything from the Internet. Instead of wget, the service console provides the lwp-* tools, which are simple Perl scripts based on the LWP and URI Perl modules and which allow you to do some basic tasks over the HTTP protocol.

The tools are part of the perl-libwww-perl package, which is installed by default. The most important tool is lwp-download, which you can use for downloading files. Here are the steps to download something:

esxcfg-firewall --allowOutgoing

allow outgoing connections from service console

lwp-download http://dfn.dl..../apcupsd-3.14.4-1.el3.i386.rpm

download apcupsd package

esxcfg-firewall --blockOutgoing

return firewall to the initial state

Besides this, the perl-libwww-perl package contains other tools like lwp-mirror, lwp-request and lwp-rget. Check their man pages for usage.

Wednesday, July 30, 2008

RHEL and Infiniband - advanced diagnostics - part one

I will continue from the point where I finished last time. The remaining diagnostic tools depend on the sysfs interface; the information they provide is extracted from this filesystem. If you don't remember the meaning of each entry under the /sys/class/infiniband directory, use these tools.

One of the common IB network issues is that the IB subnet manager is not running. The IB nodes don't have any LIDs assigned and they aren't able to see each other. The node and its IB ports are connected, but they aren't initialized yet. To find this out without sysfs, use the ibstat command:


CA 'mthca0'
      CA type: MT25208 (MT23108 compat mode)
      Number of ports: 2
      Firmware version: 4.7.400
      Hardware version: a0
      Node GUID: 0x0003ba0001007ba8
      System image GUID: 0x0003ba0001007bab
      Port 1:
              State: Initializing
              Physical state: LinkUp
              Rate: 10
              Base lid: 0
              LMC: 0
              SM lid: 0
              Capability mask: 0x02510a68
              Port GUID: 0x0003ba0001007ba9
      Port 2:
              State: Initializing
              Physical state: LinkUp
              Rate: 10
              Base lid: 0
              LMC: 0
              SM lid: 0
              Capability mask: 0x02510a68
              Port GUID: 0x0003ba0001007baa

The output contains everything we need: port state, LID, GUID and rate. The IB link is up, but the ports are in the INIT state. No IB subnet manager is running. That is clear because the SM lid parameter is zero; it should hold the LID of the node which acts as the IB subnet manager. The same holds for Base lid: the zero value means that the IB network isn't initialized yet. Similar information is provided by the ibnetdiscover command:


vendid=0x144f
devid=0x0
switchguid=0x144f00006e9794
Switch  9 "S-00144f00006e9794"       # "" base port 0 lid 2 lmc 0
[6]     "H-0003ba0001003de4"[2]      # "node2 HCA-1" lid 0
[5]     "H-0003ba0001003de4"[1]      # "node2 HCA-1" lid 0
[4]     "H-0003ba0001007ba8"[2]      # "node1 HCA-1" lid 0
[3]     "H-0003ba0001007ba8"[1]      # "node1 HCA-1" lid 0

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001003de7
caguid=0x3ba0001003de4
Ca      2 "H-0003ba0001003de4"       # "node2 HCA-1"
[2]     "S-00144f00006e9794"[6]      # lid 0 lmc 0 "" lid 2
[1]     "S-00144f00006e9794"[5]      # lid 0 lmc 0 "" lid 2

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001007bab
caguid=0x3ba0001007ba8
Ca      2 "H-0003ba0001007ba8"       # "node1 HCA-1"
[2]     "S-00144f00006e9794"[4]      # lid 0 lmc 0 "" lid 2
[1]     "S-00144f00006e9794"[3]      # lid 0 lmc 0 "" lid 2

The square brackets contain the physical port numbers. As we can see, the network is up and discoverable. It consists of one IB switch and two IB nodes, each with a dual-ported IB HCA. The switch has been assigned LID 2, while the nodes aren't initialized yet. To display node GUIDs only, use the ibnodes command:


Ca      : 0x0003ba0001003de4 ports 2 "node2 HCA-1"
Ca      : 0x0003ba0001007ba8 ports 2 "node1 HCA-1"
Switch  : 0x00144f00006e9794 ports 9 "" base port 0 lid 2 lmc 0

Finally, two commands remain: ibroute and ibchecknet. As the IB network is not fully initialized, the nodes can't contact the switch to get its forwarding table, so the ibroute command doesn't work here, although it is helpful otherwise. The ibchecknet command reports address resolution errors; the IB network is not valid:


lid 2 address resolution:  FAILED
# Switch: nodeguid 0x00144f00006e9794 failed

# Checking Ca: nodeguid 0x0003ba0001003de4
lid 0 address resolution:  FAILED
# Ca: nodeguid 0x0003ba0001003de4 failed

# Checking Ca: nodeguid 0x0003ba0001007ba8
lid 0 address resolution:  FAILED
# Ca: nodeguid 0x0003ba0001007ba8 failed

## Summary: 3 nodes checked, 3 bad nodes found
##          8 ports checked, 0 bad ports found
##          0 ports have errors beyond threshold

In the beginning, I stated that the IB subnet manager is not running. Let's launch it with the /etc/init.d/opensmd script and we will see how the behaviour of the tools changes.
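The summary lines of ibchecknet are what changes once the subnet manager runs, so they are a convenient thing to test in a script. The following is a sketch that assumes the summary wording shown above ("N bad nodes found", "N bad ports found"); ibchecknet_ok is a made-up helper name.

```shell
#!/bin/sh
# ibchecknet_ok: reads ibchecknet output on stdin and succeeds only if
# both summary lines are present and report zero bad nodes and ports.
ibchecknet_ok() {
    awk '
        / bad (nodes|ports) found/ {
            seen++
            # The number in front of the word "bad" is the error count.
            for (i = 2; i <= NF; i++) if ($i == "bad") bad += $(i - 1)
        }
        END { exit !(seen == 2 && bad == 0) }
    '
}

# Summary taken from the failing run above.
if ibchecknet_ok <<'EOF'
## Summary: 3 nodes checked, 3 bad nodes found
##          8 ports checked, 0 bad ports found
##          0 ports have errors beyond threshold
EOF
then
    echo "IB network looks valid"
else
    echo "IB network is NOT valid"
fi
```

On a live system the input would come from the command itself, e.g. ibchecknet | ibchecknet_ok.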

Friday, July 25, 2008

RHEL and Infiniband - basic diagnostics

I am going to close the article series about Infiniband technology on the RHEL platform (check the previous posts 1, 2, 3) with posts devoted to IB troubleshooting. I would like to introduce some basic diagnostic steps for an IB environment which may help you to uncover errors and misconfiguration.

Most of the troubles you may run into are traceable with the OFED diagnostic tools. Up to OFED 1.2 they are part of the openib-diags package; since version 1.3 it has been replaced with the infiniband-diags package. Let's take a look at the most useful ones:

ibstat - shows IB device status like firmware version, ports state, their rate, GUIDs, LIDs ...
ibnetdiscover - discovers IB network topology
ibroute - queries for IB switch forwarding table (like routing table)
ibnodes - shows IB nodes in topology
ibchecknet - runs IB network validation
ibping - ping IB address
sysfs - Linux virtual filesystem representing kernel structures; for IB there is the directory /sys/class/infiniband

An IB network is similar to other high-performance network technologies like Fibre Channel, and most of the troubles are the same. You may need to resolve connectivity issues, incompatible firmware or higher-level software revisions, driver bugs and similar.

First, I would like to explain the usage of the last two tools, ibping and sysfs. They are simple enough and familiar from other fields. IB ping works in a client-server fashion. That means you need to run ibping in server mode on one side while the other side acts as a client. The server answers the client's pings with pongs.

Server mode - ibping -S -v
Client mode - ibping -v SERVER_LID_ADDR

The -v argument only increases the verbosity level. The right LID can be found with the ibnetdiscover command: run it, find the line of the server node and use the associated LID. I will explain it later. If the IB network is healthy, ibping should produce output like this on the server side (the server LID is 4, its hostname is node2):

ibwarn: [6795] ibping_serv: starting to serve...
ibwarn: [6795] ibping_serv: Pong: node2.(none)

The pongs should be visible on the client side:

ibwarn: [17946] ibping: Ping..
Pong from node2.(none) (Lid 4): time 0.235 ms

If you aren't able to see them, you should check the connectivity status of your IB HCA. One way to do it is via sysfs. Each IB HCA is represented by a subdirectory under the /sys/class/infiniband directory, where you can find a lot of useful stuff. For example, if you have a dual-ported HCA from Mellanox, there should be the following entries for the port states:

/sys/class/infiniband/mthca0/ports/1/state
/sys/class/infiniband/mthca0/ports/2/state

The state usually has one of these three values:

DOWN - port is physically disconnected
INIT - port is connected but not yet initialized by a subnet manager
ACTIVE - port is online and it is working

For ibping to work, the ports of both nodes have to be in the ACTIVE state. If they are in the INIT state, the subnet manager may not be running. The DOWN state simply means a cable problem. By the way, the remaining tools offer other ways to get this information. I am going to explore them next time.
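Reading the state files of all HCAs and ports one by one gets tedious, so here is a small loop that does it. It is a sketch: ib_port_states is a hypothetical name, and the optional argument (a different base directory) exists only so that the function can be tried on a machine without IB hardware. On the systems I have seen, the state file holds a numeric prefix as well, e.g. "4: ACTIVE".

```shell
#!/bin/sh
# ib_port_states [BASE_DIR]
# Prints the state of every port of every IB HCA found under BASE_DIR
# (default: /sys/class/infiniband).
ib_port_states() {
    base=${1:-/sys/class/infiniband}
    for f in "$base"/*/ports/*/state; do
        [ -r "$f" ] || continue              # no HCA present, nothing to print
        port=${f%/state}; port=${port##*/}   # .../ports/<port>/state
        hca=${f#"$base"/}; hca=${hca%%/*}    # <base>/<hca>/ports/...
        echo "$hca port $port: $(cat "$f")"
    done
}

ib_port_states
```

A healthy dual-ported HCA should produce two lines, both ending in ACTIVE.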

Friday, July 18, 2008

RHEL and Infiniband - basic usage

As I wrote in the previous post, the /etc/init.d/openibd init script is in charge of starting the Infiniband (IB) network. The script parses the /etc/ofed/openibd.conf configuration file, where you can specify which ULPs should be initialized. By default, all the ULPs I mentioned last time (ipoib, srp, sdp) are enabled.

The opensm IB subnet manager is controlled with the /etc/init.d/opensmd init script, which is configurable via the /etc/ofed/opensm.conf configuration file. You can turn on debugging there, but it is not normally needed. It is more useful to enable verbose mode, which increases the log verbosity level. The default log file is /var/log/osm.log. So, if something goes wrong, enable verbose mode and check the log file.

After executing the init scripts, you should check the IB network state. The openibd script is started automatically during system startup, while opensmd has to be enabled (with ntsysv or chkconfig). Follow this checklist:

Is Mellanox HCA recognized?

check the output of lsmod | grep ib_mthca
check the output of dmesg

Are appropriate ULPs loaded?

check the output of lsmod | grep ib_

should contain ib_ipoib, ib_srp, ib_sdp

Is IB network initialized and working?

check the output of cat /sys/class/infiniband/mthca0/ports/X/state

should be ACTIVE

Is ib0 network interface available?

check the output of ifconfig -a

If you passed all the checks, you should be able to use the IP protocol over the IB network. I assume you have at least two IB nodes in the IB network, both configured the same way and both having passed the checks (like in the first article). To configure IPoIB, follow these steps:

assign an IP address to the nodes

run ifconfig ib0 IP_ADDR1 up at first node
run ifconfig ib0 IP_ADDR2 up at second node

check the IPoIB functionality

run ping IP_ADDR2 from the first node
run ping IP_ADDR1 from the second node

So, wasn't it simple? If everything is working, ping should receive replies from the other side. Now you can run any IP-based application over IB (FTP, NFS and so on) and benefit from its high throughput and low latency. Please leave me a comment if you are interested in the topic.

Tuesday, July 1, 2008

RHEL and Infiniband - hardware intro

In my two previous articles, I summarized a few facts about Infiniband support in RHEL distros and about the protocols involved. You can find them under the following links: RHEL and Infiniband support and Infiniband, RDMA, SDP.... Let's be more specific now.

My scenario was based on two Sun Fire X4200 M2 servers and one Infiniband (IB) switch, a Sun IB Switch 9P. The servers had Infiniband host channel adapters (HCAs), Sun Dual Port 4x IB HCA, installed to be able to communicate over the IB fabric. The switch provides nine IB-compliant ports at dual speeds of 4X/12X, which means that each port is able to deliver 10/30 Gbit/s of raw bandwidth. What surprised me was that the switch management is the same as on the Sun SPARC midrange servers. Yes, it is ALOM, and that is perfect because you can use the same interface and similar commands you are used to. By the way, the switch chassis looks like a regular Sun server.

The switch is equipped with an IB subnet manager (SM), which is required to initialize the IB hardware and to allow communication over the IB fabric. Each IB subnet has to have at least one SM, and each subnet has an unambiguous identifier (ID) within the fabric. To be complete, the fabric comprises the defined subnets. In my opinion, the IB SM works somewhat like an ARP cache and a DHCP server in LANs. Each HCA in a fabric is globally identified by a so-called node GUID, which is like a WWN in FC or a MAC address in a LAN. The switch has its own GUID as well. The ports of an HCA have so-called port GUIDs. Now, when one HCA or its port wants to communicate with another one in the subnet, it needs to have a network address assigned. This address is called a LID, or local identifier, and the IB SM is in charge of assigning it to the members of the subnet. The conclusion is that LIDs are valid inside the subnet only, while GUIDs are routable across the subnets of the fabric.

But one thing confused me a bit. When you configure the switch, you need to remember to set its blueprint, otherwise you are asking for trouble. I'm going to write about it in the next part.

Thursday, May 15, 2008

Infiniband, RDMA, SDP ... a few facts

In the article about RHEL support of Infiniband technology, I used a few keywords which should be explained in more detail.

First, what does RDMA mean? RDMA is simply an extension of DMA, which allows computer devices to bypass the CPU when accessing data in memory. RDMA extends this feature so that LAN/SAN host bus adapters can access memory directly without CPU assistance. The CPU only tells the hardware where the required data are.

What is SDP useful for? The SDP protocol allows network applications to be used with Infiniband without rewriting them to understand the Infiniband technology. It is responsible for translating network socket operations to the RDMA layer. Of course, there is a small drop in performance, but it is still very close to the native bandwidth and latency characteristics.

The next scenario: you can use IPoIB if you intend to use Infiniband as the physical medium for the TCP/IP protocol stack instead of standard Ethernet.

So, you have a network application and its standard communication scheme looks like this:

app -> socket api -> TCP/IP stack -> Ethernet hardware

If you use SDP on top of RDMA, it will look like this:

app -> socket api -> SDP -> RDMA -> Infiniband hardware

And the last variant based on IPoIB is this:

app -> socket api -> TCP/IP stack -> IPoIB -> Infiniband hardware

Of course, you can port the application:

app -> Infiniband api -> Infiniband hardware

And why should we know about these things? Why should we use them? There are a few important reasons:

it provides a model where you don't need to port your application
it is really suitable for CPU and I/O intensive environments

Next time, I would like to present some results for comparing the Infiniband with other transport technologies.

Tuesday, January 29, 2008

Quickly - configuring VLANs on Linux

This post is similar to the one about configuring VLANs on RHEL, but it is applicable to any Linux distribution. VLAN support has been included in the Linux kernel since version 2.4.14; earlier versions need to be patched. More about it, including the patches, can be found here. Again, these are the configuration steps, which have to be done as root:

we have already configured eth0 interface which is accessible from the network and we want to add VLAN support to it, e.g. to accept packets tagged with VLAN ID 123 (avoid using VLAN ID 1, it is often used as administration VLAN)
we need to have support at the kernel level, try to load the proper module

modprobe 8021q

you don't need to know how to modify the configuration files of interface eth0 because we are going to use the vconfig command instead
to turn on VLAN ID 123 on the interface eth0, use the command:

vconfig add eth0 123

check that the previous command was applied successfully, you can use ifconfig command of course:

ifconfig eth0.123

finally, configure the remaining settings you want for the interface eth0.123, again with ifconfig
to make the changes persistent, place the commands in an rc script or create your own init script and enable it during system boot
to check the status, use the information the kernel exports here:

cat /proc/net/vlan/eth0.123

if you want to remove the interface, try these two:

ifconfig eth0.123 down
vconfig rem eth0.123
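The steps above can be collected into one function, for example as the body of the rc script mentioned in the list. This is a sketch: add_vlan and the RUN variable are made up for this example, and the address and netmask are placeholders. By default it only prints the commands, so it is safe to try as a regular user.

```shell
#!/bin/sh
# RUN defaults to "echo", so the commands are only printed.
# Start the script with RUN set to an empty string (as root) to execute them.
RUN=${RUN-echo}

# add_vlan IFACE VLAN_ID IP_ADDR NETMASK
add_vlan() {
    $RUN modprobe 8021q                            # VLAN support in the kernel
    $RUN vconfig add "$1" "$2"                     # creates IFACE.VLAN_ID
    $RUN ifconfig "$1.$2" "$3" netmask "$4" up     # configure the new interface
}

add_vlan eth0 123 192.168.123.10 255.255.255.0
```

With the default setting it prints the three commands in the order in which they would be executed.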

Monday, December 17, 2007

Quickly - configuring VLANs on RHEL

configuration steps on RHEL (4, 5) - they have to be done as root:

we have already configured eth0 interface which is accessible from the network and we want to add VLAN support to it, e.g. to accept packets tagged with VLAN ID 123 (avoid using VLAN ID 1, it is often used as administration VLAN)
we need to have support at the kernel level, try to load the proper module

modprobe 8021q

configuration of interface eth0 is stored in the following file and it may contain its MAC address, IP address, netmask, network and so on

/etc/sysconfig/network-scripts/ifcfg-eth0

to accept packets with VLAN ID 123 on that interface, run the following commands:

cd /etc/sysconfig/network-scripts/

cp ifcfg-eth0 ifcfg-eth0.123

the new configuration file defines the virtual interface eth0.123 on top of the main interface eth0, which will accept packets tagged with VLAN ID 123
to enable VLANs on the interface, append this line to the newly created configuration file

VLAN=yes

and change the line defining interface name from eth0 to eth0.123

DEVICE=eth0.123

correct the other settings - IP address, netmask and related
finally, apply the new network settings with

/etc/init.d/network restart

check the status via proc interface

/proc/net/vlan/eth0.123
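The copy-and-edit steps can be scripted, too. The function below is a sketch that assumes the parent file contains a plain, unquoted DEVICE= line. The name make_vlan_ifcfg is hypothetical, and the demo works on a temporary copy with placeholder values rather than on the real files in /etc/sysconfig/network-scripts. The IP address and netmask still have to be corrected by hand afterwards.

```shell
#!/bin/sh
# make_vlan_ifcfg PARENT_FILE VLAN_ID
# Writes PARENT_FILE.VLAN_ID with the DEVICE line rewritten and VLAN=yes
# appended, then prints the name of the new file.
make_vlan_ifcfg() {
    src=$1 vlan=$2
    dev=$(sed -n 's/^DEVICE=//p' "$src")     # e.g. eth0
    dst="$src.$vlan"                         # e.g. ifcfg-eth0.123
    {
        sed "s/^DEVICE=.*/DEVICE=$dev.$vlan/" "$src"
        echo "VLAN=yes"
    } > "$dst"
    echo "$dst"
}

# Demo on a throw-away file with placeholder settings.
demo_dir=$(mktemp -d)
printf 'DEVICE=eth0\nBOOTPROTO=static\nIPADDR=192.168.1.10\nNETMASK=255.255.255.0\n' > "$demo_dir/ifcfg-eth0"
cat "$(make_vlan_ifcfg "$demo_dir/ifcfg-eth0" 123)"
rm -rf "$demo_dir"
```

The generated file starts with DEVICE=eth0.123 and ends with VLAN=yes; all the other lines are taken over unchanged from the parent file.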