Thursday, December 18, 2008

Sun VirtualBox 2.1 is out

Sun recently released a new major version of its Sun xVM VirtualBox product. Version 2.1.0 contains some interesting enhancements, such as:
  • experimental support for 64-bit guests on 32-bit hosts
  • experimental 3D acceleration via OpenGL
  • support for VMware VMDK virtual disks
    • experimental support for LsiLogic and BusLogic SCSI HBAs (used with VMDK disks)
And that's not everything. There are many other changes worth reading about. A more comprehensive list of all enhancements and bug fixes is here.

Thursday, December 11, 2008

VMware VirtualCenter running inside virtual machine - part one

I recently finished another installation of a virtualized environment. I had to get rid of old machines and deploy virtualization on a couple of new Sun Blade 6250 modules (installed in a Sun Blade 6000 Modular System) connected to a Sun StorageTek 2510 iSCSI disk array. The final solution had to meet high-availability requirements.

The VMware ESX Server 3.5 hypervisor was installed on both blades and deployed in high-availability mode. Configuring a VMware cluster requires VirtualCenter Server. But does it have to be installed on a standalone machine?

VMware officially supports running VirtualCenter Server inside a virtual machine (referred to below as the VC VM). Such a configuration supports VMware HA as well. But it pays to keep some basic rules in mind. Let's go over them:
  1. The VC VM should have enough resources allocated - set the CPU/MEM reservations and shares high enough to avoid running out of resources, because this machine is vital for the virtual infrastructure. It has to be prioritized over the other virtual machines.
  2. Remember to monitor the machine. That means configuring a simple alarm to check CPU/MEM usage, e.g. one that is triggered if the resource usage exceeds 90%.
  3. The VC VM should be deployed with security rules in mind. Define which users can access it and limit their permissions (configure user roles with the help of Active Directory).
The list above doesn't contain obvious rules such as the server hardware requirements, installing the MSSQL database separately and similar. These rules are the same as for a standalone VirtualCenter installation. Next, let's discuss what NOT to do with the VC VM:
  1. Never cold-migrate the VC VM! A cold migration requires the machine to be powered off first.
  2. Don't try to clone the VC VM if you are deploying a version of Virtual Infrastructure older than 3.5. Version 3.5 supports cloning a virtual machine on the fly.
  3. Try to avoid any operation on the VC VM, such as a virtual hardware reconfiguration, which may require powering it down. If you need to do it, connect directly to the ESX host in question, power the VC VM off and reconfigure it there, so that you don't lose the management connection.
It remains to discuss the installation of the VirtualCenter virtual machine and to find out whether such a configuration brings anything new compared to a standalone installation. But more about that next time.

Wednesday, December 3, 2008

Quickly - Linux Swap space sizing

It's no surprise that many system administrators still use this simple rule of thumb for sizing their system's swap space:
  • the swap space should be twice as large as the system's operating memory
I was one of them. But is it still really necessary to follow that rule now that our machines are equipped with gigabytes of RAM? It is certainly a waste of disk space: by that rule, a machine with 32 GB of RAM should have at least 64 GB of swap space!

This rule of thumb really held in the past, but nowadays the Linux kernel and its memory management are more mature and well optimized, and can even work without any swap space. And if you use a swap file instead of a dedicated swap partition, it should have nearly the same performance as swapping to a partition. So, do we have a newer replacement for the rule? In general, it is recommended to remember the following:
  • if the machine has less than 2 GB of RAM, the swap space should be the same size as the RAM
  • if it has more, the swap space should be 2 GB
In my opinion, you can't go wrong if you always set the swap space to 2 GB. According to the rules above, that should be enough for every common situation. In more special scenarios, like database or web servers, it is better to follow the related tuning guides.
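
The rule above is easy to express as a small shell function. This is only a sketch: the function name recommended_swap_mb is made up for illustration, all sizes are in MB, and the 2 GB limit is the rule of thumb from this article, not a value enforced by the kernel.

```shell
#!/bin/sh
# Recommended swap size in MB for a given amount of RAM in MB:
# RAM below 2 GB -> swap equals RAM, otherwise swap stays at 2 GB.
# (recommended_swap_mb is a hypothetical helper name.)
recommended_swap_mb() {
    ram_mb=$1
    limit_mb=2048
    if [ "$ram_mb" -lt "$limit_mb" ]; then
        echo "$ram_mb"
    else
        echo "$limit_mb"
    fi
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    ram_mb=$((ram_kb / 1024))
    echo "RAM: $ram_mb MB, suggested swap: $(recommended_swap_mb "$ram_mb") MB"
fi
```

For the 32 GB machine mentioned above, recommended_swap_mb 32768 gives 2048 instead of the 65536 MB demanded by the old rule.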

Thursday, November 13, 2008

Red Hat prefers KVM to XEN! No doubt!

It's unbelievable but true! Red Hat, in cooperation with AMD, performed a live migration of a virtual machine between different platforms - from an Intel CPU to an AMD CPU. As you know, there are many difficulties in achieving this, such as differing extensions, instruction sets and so on.

So far, it was only possible to migrate between processors of different families from one vendor. Now Red Hat can do it with RHEL and KVM, which means Red Hat has definitely confirmed the replacement of XEN with KVM. I wrote about it a few months ago here. The whole story is published as a video on YouTube.

Monday, November 10, 2008

VMware ESX 3.5 Update 3 released

Today, the third update of the VMware ESX 3.5 platform was released. The third update of VMware VirtualCenter 2.5 was released a month ago. An interesting release policy. Perhaps it has something to do with the "power on VM bug" (details here), which became critical around August 12. After that, VMware announced some changes in the quality assurance of their products.

Now, what did the third updates of both products bring? VMware VirtualCenter 2.5 Update 3 didn't bring anything new. It is just a bug-fix release. Really, check the "What's New" section in the release notes.

VMware ESX 3.5 Update 3 introduces support for newer versions of popular operating systems such as RHEL 4.7 or Solaris 10 Update 6. Further, it extends the support of SATA controllers, so it should now be possible to install and run it on PC-like servers with the Intel ICH-7 chipset. Finally, the maximum number of vCPUs per physical CPU core was raised: the old maximum was 8, the new one is 20!

Not satisfied with the new stuff? Go through the official release notes and you might find more.

Thursday, November 6, 2008

VCB, vcbMounter, vcbRestore ...

I have written a series of articles about VMware VCB usage. They cover the main VCB principles. The backup procedures are performed via the VCB command-line utilities. It's not a bad idea to make a quick list of the articles for better orientation:
  1. VM identification - how to identify the virtual machine you intend to back up? The command vcbvmname is the answer.
  2. VM full backup - how to perform a full backup of the chosen virtual machine? The vcbmounter command can do it.
  3. VM full backup data access - how to retrieve data from the virtual machine's full backup? It is possible to mount the backup image with the mountvm command.
  4. VM file level backup - the vcbmounter command is able to perform a file-level backup as well.
  5. VM backup over NFS - this article describes a simple scenario of virtual machine backup over the NFS protocol.
  6. VM backup restore - finally, it is important to know the process of restoring a virtual machine from the backup. You can use vcbrestore.
I hope this quick list of articles will help you find what you are looking for. Your suggestions are welcome.

Tuesday, November 4, 2008

Solaris 10 10/08 released

Sun recently released a new version of Solaris 10. Its name is Solaris 10 10/08, or Update 6. The most anticipated new feature is support for booting from a ZFS filesystem. I added it to the summary of Solaris updates that I presented here. So, here it is:
  1. Solaris 10 1/06 (u1) - GRUB bootloader, iSCSI initiator, fcinfo command
  2. Solaris 10 6/06 (u2) - ZFS filesystem
  3. Solaris 10 11/06 (u3) - Solaris Trusted Extensions, LDoms
  4. Solaris 10 8/07 (u4) - full TCP/IP stack in zones, iSCSI target, branded zones (Linux in Solaris container), Samba AD, enhanced rcapd
  5. Solaris 10 5/08 (u5) - Intel SpeedStep, AMD PowerNow!, Solaris 8/9 P2V (to Solaris 10 zones), CPU capping
  6. Solaris 10 10/08 (u6) - ZFS boot support, many ZFS filesystem enhancements
For more details, click the particular release to read the official release notes.

Thursday, October 30, 2008

VCB basic usage - VM restore with vcbRestore

The last question remains - how do we restore the virtual machine that we fully backed up in the previous article? The virtual machine is stored on the NFS server and we need to get it back to the ESX host. There are many scenarios where this is needed - e.g., the original machine is corrupted and you have to restore it from backup. Or you don't have VirtualCenter Server available and you would like to deploy a virtual machine as if from a template, without the template feature.

A full virtual machine backup performed with the vcbMounter command is described by a catalog file which contains a summary of the backup. The catalog file defines the virtual machine's:
  • display name
  • name of datastore
  • folder path
  • resource pool
Let's inspect one such catalog file:
version= esx-3.0
state= poweredOn
display_name= "nas-openfiler"
uuid= "564da78f-f2fc-484f-4d92-24238e486380"
disk.scsi0:0.filename= "scsi0-0-0-nas-openfiler.vmdk"
disk.scsi0:0.diskname= "[storage1] nas-openfiler/nas-openfiler.vmdk"
config.vmx= "[storage1] nas-openfiler/nas-openfiler.vmx"
host= vmware.dom.tld
timestamp= "Sun Oct 12 01:37:12 2008"
config.suspenddir= "[storage1] nas-openfiler"
config.snapshotdir= "[storage1] nas-openfiler"
config.file0= "nas-openfiler.vmsd"
config.file1= "nas-openfiler-cf281ca9.vmss"
config.file2= "nas-openfiler.vmxf"
config.file3= "nas-openfiler.nvram"
config.logdir= "[storage1] nas-openfiler"
config.log0= "vmware-1.log"
config.log1= "vmware.log"
folderpath= "/ha-folder-root/ha-datacenter/vm"
resourcepool= "/ha-folder-root/ha-datacenter/host/vmware.dom.tld/Resources"
Now, what can we say about the backed-up virtual machine?
  • it is visible as nas-openfiler in the VI client (the display name is nas-openfiler)
  • it is stored on the [storage1] datastore in the nas-openfiler directory
  • it belongs to the vm folder in the VirtualCenter folder hierarchy
  • it runs on the ESX host vmware.dom.tld
As you can see, everything belonging to the virtual machine is stored inside the "[storage1] nas-openfiler" directory. The datastore name is a symbolic name. You can check it via the VI client in the storage configuration tab, or you can list the contents of the /vmfs/volumes directory.

Let's suppose we want to create a machine identical to "nas-openfiler", but restore it to a different datastore and directory, e.g. "[storage2] nas-openfiler2", and call it "nas-openfiler2". To do that, we need to change selected parameters in the catalog file:
version= esx-3.0
state= poweredOn
display_name= "nas-openfiler2"
uuid= "564da78f-f2fc-484f-4d92-24238e486380"
disk.scsi0:0.filename= "scsi0-0-0-nas-openfiler.vmdk"
disk.scsi0:0.diskname= "[storage2] nas-openfiler2/nas-openfiler.vmdk"
config.vmx= "[storage2] nas-openfiler2/nas-openfiler.vmx"
host= vmware.dom.tld
timestamp= "Sun Oct 12 01:37:12 2008"
config.suspenddir= "[storage2] nas-openfiler2"
config.snapshotdir= "[storage2] nas-openfiler2"
config.file0= "nas-openfiler.vmsd"
config.file1= "nas-openfiler-cf281ca9.vmss"
config.file2= "nas-openfiler.vmxf"
config.file3= "nas-openfiler.nvram"
config.logdir= "[storage2] nas-openfiler2"
config.log0= "vmware-1.log"
config.log1= "vmware.log"
folderpath= "/ha-folder-root/ha-datacenter/vm"
resourcepool= "/ha-folder-root/ha-datacenter/host/vmware.dom.tld/Resources"
It is recommended to back up the original catalog file somewhere. Compare the two files and note the changes.
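
The catalog edit can also be scripted. The following is a minimal sketch that works on a shortened copy of the catalog in a temporary directory. The output file name catalog.new is a hypothetical choice, and the sed expressions assume exactly the names used in this example ([storage1], nas-openfiler).

```shell
#!/bin/sh
# Sketch: derive a modified VCB catalog for a restore to another datastore.
# The sample is shortened from the catalog shown above; "catalog.new" is a
# hypothetical file name.
workdir=$(mktemp -d)
cat > "$workdir/catalog" <<'EOF'
display_name= "nas-openfiler"
disk.scsi0:0.filename= "scsi0-0-0-nas-openfiler.vmdk"
disk.scsi0:0.diskname= "[storage1] nas-openfiler/nas-openfiler.vmdk"
config.vmx= "[storage1] nas-openfiler/nas-openfiler.vmx"
config.suspenddir= "[storage1] nas-openfiler"
config.snapshotdir= "[storage1] nas-openfiler"
config.logdir= "[storage1] nas-openfiler"
EOF

# 1) rename the VM, 2) point every "[storage1] nas-openfiler" location to
# "[storage2] nas-openfiler2". File names inside the directory
# (e.g. nas-openfiler.vmdk) are left untouched.
sed -e 's|^display_name= "nas-openfiler"$|display_name= "nas-openfiler2"|' \
    -e 's|"\[storage1\] nas-openfiler\([/"]\)|"[storage2] nas-openfiler2\1|' \
    "$workdir/catalog" > "$workdir/catalog.new"

# Show what changed (diff exits with 1 when the files differ, which is expected).
diff "$workdir/catalog" "$workdir/catalog.new" || true
```

Writing the result to a new file keeps the original catalog intact, so it doubles as the recommended backup copy.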
The last step is to perform the restore operation with the vcbRestore command. Let's say our full backup of the virtual machine is in the directory /backup/nas-openfiler. It can be a local directory or one mounted from the NFS server. The original catalog file is named catalog and the modified one is saved under a different name. Let's restore the machine according to the new catalog:
vcbRestore -s /backup/nas-openfiler -a /backup/nas-openfiler/
The -s parameter specifies the source directory where the backup is stored and the -a parameter specifies the catalog file to use. If everything works, the command should produce the following output:
[2008-10-17 11:00:21.644 'App' 3076444992 info]
Current working directory: /backup/nas-openfiler2
Converting "/vmfs/volumes/storage2/nas-openfiler2/nas-openfiler.vmdk" (VMFS (flat)):
The machine is restored and you can see it in the VI client. Or you can check it from the service console of the ESX host:
vmware-cmd -l | grep nas-openfiler2
It should print something like this:
The new virtual machine nas-openfiler2 can be powered on now. Its contents are identical to the original one - both machines are the same, but they reside on different datastores. Final customization is a topic for another article.

Monday, October 27, 2008

VCB basic usage - VM full backup over NFS

Let's practice a bit. Suppose there is an NFS server available in the network, and we would like to back up virtual machines (VMs) from one of our ESX hosts directly to it, without using any specialized backup software.

I must not forget to mention that VCB is available on your ESX host. The installed VMware-esx-backuptools package contains almost all of the commands mentioned before - vcbVmName, vcbMounter and vcbRestore. The vcbRestore utility is available only with ESX and is used to restore a virtual machine from a full backup. The missing mountVm command, in turn, is available with VCB for Windows only. Keep in mind that VCB commands for ESX are case-sensitive because the service console is based on Linux.

Firstly, we need a running NFS server. The configuration is straightforward with any Linux distro. Install the required packages, edit the /etc/exports configuration file and add the directory which will be used for the VM backups. Start the NFS server and reconfigure the firewall to allow access to it (or simply stop the firewall). For details, check the related documentation. If you would like, I can write some more notes about it.

So, let's have an NFS server with an IP address from a class C network. The exported directory is /backup/vm. Secondly, we need to permit the NFS client on the ESX host, because outgoing connections from an ESX host are blocked by default. You can do it via the VI client or from the service console like this:
esxcfg-firewall -e nfsClient
You can check all the available services with:
esxcfg-firewall -s
To check if the nfsClient service was enabled, run this:
esxcfg-firewall -q nfsClient
If so, you will receive:
Service nfsClient is enabled.
Finally, we need a backup script whose only task is to back up the available VMs. The script can be scheduled on the ESX host via the cron service or triggered from the NFS backup server. It's your choice. The script follows:


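#!/bin/sh
# The original variable values were lost; these are placeholders to adjust.
NFS_SERVER="NFS_SERVER_IP"   # placeholder: IP address of the NFS server
NFS_DIR="/backup/vm"         # directory exported by the NFS server
MOUNT_DIR="/backup"          # local mount point (assumed value)

[ -d "$MOUNT_DIR" ] || mkdir -p "$MOUNT_DIR" || exit 1

# Collect the names of all available VMs
VM_BACKUP="`vcbVmName -s any: | grep name: | cut -d':' -f2`"

if [ ! -z "$VM_BACKUP" ]; then
    mount -t nfs "$NFS_SERVER:$NFS_DIR" "$MOUNT_DIR" || exit 1

    for VM in $VM_BACKUP; do
        vcbMounter -a name:$VM -r $MOUNT_DIR/$VM
    done

    umount $MOUNT_DIR
fi

exit 0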
Now, a brief description of the script. At the beginning, some variables are defined - the NFS server IP address, the exported directory and the local mount point. Then the available VMs are listed and saved in a variable. The exported directory from the NFS server is mounted and the VMs are backed up with the vcbMounter command. Finally, the directory is unmounted. If you want to use the commands without passing authentication credentials, you need to define them in the file /etc/vmware/backuptools.conf. Specifically, these parameters are required:
So, backing up virtual machines isn't that complicated. In the next article, I'm going to restore them with the vcbRestore command.

Virtualization leader

Will it be VMware? Or Microsoft? Or even Oracle? I don't think it is right to say it will be this company or that one. But it is clear that we can already pick out a group of products that defines the leaders of the current virtualization market. For that, I am pleased to use a screenshot provided by Gartner:

The most interesting part of the screenshot compares the number of deployed virtual machines by virtualization technology. As we can see, VMware is still far ahead of the others. But have a look at VirtualIron or Oracle. Isn't it interesting?

As I don't know the source of the data used to produce the screenshot, I wouldn't like to draw any big conclusions. I can only say that VMware still rules and the others are catching up. But one thing is clear - solutions based on XEN are strong and have great potential, don't they?

In my opinion, it would be really cool to know the number of pure XEN installations - XEN in Linux distributions like SLES 10 or RHEL 5 and similar. Perhaps we would be very surprised! The more detailed article which prompted me to write this short note was published at

Wednesday, October 22, 2008

Solaris 10 updates summary

I needed a quick list of the features available in each update of Solaris 10. As you may know, Solaris 10 was released in 2005. Since then, 5 updates in total have been released, each bringing new features to the OS. The sixth update might be released during October 2008. The following is my quick list of the important features:
  1. Solaris 10 1/06 (u1) - GRUB bootloader, iSCSI initiator, fcinfo command
  2. Solaris 10 6/06 (u2) - ZFS filesystem
  3. Solaris 10 11/06 (u3) - Solaris Trusted Extensions, LDoms
  4. Solaris 10 8/07 (u4) - full TCP/IP stack in zones, iSCSI target, branded zones (Linux in Solaris container), Samba AD, enhanced rcapd
  5. Solaris 10 5/08 (u5) - Intel SpeedStep, AMD PowerNow!, Solaris 8/9 P2V (to Solaris 10 zones), CPU capping

Monday, October 20, 2008

RHEL and Infiniband - advanced diagnostics - part three

Let's decompose the ibnetdiscover output a bit. The first paragraph begins with the Switch keyword. The switch has GUID 0x144f00006e9794. A channel adapter paragraph begins with the Ca keyword. Their GUIDs are 0x3ba0001003de4 (node node2) and 0x3ba0001007ba8 (node node1). The second one corresponds to the node displayed by ibstat above. You must have noticed that there are many numbers in square brackets. They identify the ends of the physical IB connections. Let's inspect them in more detail:
  • connections from switch to IB nodes (switch -> nodes)
    • switch port [6] is connected to the [2] channel of IB node node2
    • switch port [5] is connected to the [1] channel of IB node node2
    • switch port [4] is connected to the [2] channel of IB node node1
    • switch port [3] is connected to the [1] channel of IB node node1
  • connections from IB node node1 to switch (node -> switch)
    • the [1] IB channel is connected to switch port [3]
    • the [2] IB channel is connected to switch port [4]
  • connections from IB node node2 to switch (node -> switch)
    • the [1] IB channel is connected to switch port [5]
    • the [2] IB channel is connected to switch port [6]
Do you see the logic in it? I think it's simple. And it is evident that the IB connections are full-duplex in our scenario.
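
The same decomposition can be done mechanically. Here is a minimal sketch in shell and awk that prints the switch-side connections from ibnetdiscover output. The helper name parse_switch_links is made up for this example, and the parsing assumes the line layout shown in part two; other ibnetdiscover versions may format their output differently.

```shell
#!/bin/sh
# Sketch: list "switch port -> node port" links from ibnetdiscover output.
# parse_switch_links is a hypothetical helper name; input is read from stdin.
parse_switch_links() {
    awk '
        /^Switch/ { in_switch = 1; next }   # a switch paragraph starts
        /^Ca/     { in_switch = 0; next }   # channel adapter paragraphs are skipped
        in_switch && /^\[/ {
            split($0, q, "\"")              # q[3] holds the remote port, q[4] the node description
            local_port = $1
            remote_port = q[3]
            gsub(/[^0-9]/, "", local_port)
            gsub(/[^0-9]/, "", remote_port)
            printf "switch port %s -> %s port %s\n", local_port, q[4], remote_port
        }
    '
}

# Sample input copied from the ibnetdiscover output in part two.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Switch 9 "S-00144f00006e9794" # "" base port 0 lid 2 lmc 0
[6] "H-0003ba0001003de4"[2] # "node2 HCA-1" lid 5
[5] "H-0003ba0001003de4"[1] # "node2 HCA-1" lid 4
[4] "H-0003ba0001007ba8"[2] # "node1 HCA-1" lid 3
[3] "H-0003ba0001007ba8"[1] # "node1 HCA-1" lid 1

Ca 2 "H-0003ba0001003de4" # "node2 HCA-1"
[2] "S-00144f00006e9794"[6] # lid 5 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[5] # lid 4 lmc 0 "" lid 2
EOF

links=$(parse_switch_links < "$sample")
echo "$links"
rm -f "$sample"
```

On a live system, the function could presumably be fed directly with ibnetdiscover | parse_switch_links.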

I'm going to skip the ibnodes command. Its output is the same as without a running subnet manager. The next command, ibroute, produces the following nice forwarding table:
Unicast lids [0x0-0x5] of switch Lid 2 guid 0x00144f00006e9794 ():
Lid Out Port Destination Info
0x0001 003 : (Channel Adapter portguid 0x0003ba0001007ba9: 'node1 HCA-1')
0x0002 000 : (Switch portguid 0x00144f00006e9794: '')
0x0003 004 : (Channel Adapter portguid 0x0003ba0001007baa: 'node1 HCA-1')
0x0004 005 : (Channel Adapter portguid 0x0003ba0001003de5: 'node2 HCA-1')
0x0005 006 : (Channel Adapter portguid 0x0003ba0001003de6: 'node2 HCA-1')
5 valid lids dumped
It lists the assigned LIDs, the corresponding switch ports and the other ends of the connections. It's a classic routing table saying that LID X is reachable via switch port Y, with additional information about the entity owning that LID. For example, LID 1 is reachable via switch port 3 and belongs to the channel adapter of node node1.
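
To illustrate how such a table is read, here is a small lookup sketch in shell and awk. The helper name out_port_for_lid is hypothetical, and the sample is the table printed above, saved to a temporary file.

```shell
#!/bin/sh
# Sketch: find the switch output port for a LID in an ibroute forwarding table.
# out_port_for_lid is a hypothetical helper name.
# $1 = LID as printed by ibroute (e.g. 0x0004); the table is read from stdin.
out_port_for_lid() {
    # Compare as strings and print the port without leading zeros.
    awk -v lid="$1" '($1 "") == (lid "") { print $2 + 0 }'
}

# Sample table copied from the ibroute output above.
table_file=$(mktemp)
cat > "$table_file" <<'EOF'
Lid Out Port Destination Info
0x0001 003 : (Channel Adapter portguid 0x0003ba0001007ba9: 'node1 HCA-1')
0x0002 000 : (Switch portguid 0x00144f00006e9794: '')
0x0003 004 : (Channel Adapter portguid 0x0003ba0001007baa: 'node1 HCA-1')
0x0004 005 : (Channel Adapter portguid 0x0003ba0001003de5: 'node2 HCA-1')
0x0005 006 : (Channel Adapter portguid 0x0003ba0001003de6: 'node2 HCA-1')
EOF

echo "LID 0x0004 leaves the switch via port $(out_port_for_lid 0x0004 < "$table_file")"
```

Port 0 in the table is the switch itself, which is why LID 2 (the switch LID) maps to it.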

To make the final decision whether the IB network is working, run the ibchecknet command. The output should say that we have 2 working IB HCAs, 3 IB nodes (two with an HCA and one switch) and 8 working IB ports (physically only four, but the network is full-duplex in our scenario).
# Checking Ca: nodeguid 0x0003ba0001003de4
# Checking Ca: nodeguid 0x0003ba0001007ba8
## Summary: 3 nodes checked, 0 bad nodes found
## 8 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold
From now on, we have a working InfiniBand network and we are able to do this:
  • ibping the nodes natively
  • ping the nodes over IPoIB
  • run unmodified network applications over IPoIB (e.g. NFS, FTP and so on)
  • run native RDMA applications

Friday, October 17, 2008

VMware ESX vs ESXi updated

I summarized the main differences between VMware ESX and ESXi hypervisors in these two articles:
  1. Differences between VMware ESXi and ESX
  2. Technical differences between VMware ESXi and ESX
Additionally, the main source of information on the topic should be the article published in the VMware knowledge base:
  1. VMware ESX and ESXi Comparison
This article was updated recently and contains the most up-to-date comparison of the hypervisors.

Thursday, October 16, 2008

RHEL and Infiniband - advanced diagnostics - part two

It's almost two months since I began to write about advanced diagnostics of IB networks. At the end of that article, I suggested starting the IB subnet manager. So let's do it with the init script:
/etc/init.d/opensmd start
Now we are ready to compare the output of the commands when the IB subnet manager wasn't running with the output when it is running. There should be noticeable differences because the IB network should now be fully initialized. First, let's see what the ibstat command shows us:
CA 'mthca0'
CA type: MT25208 (MT23108 compat mode)
Number of ports: 2
Firmware version: 4.7.400
Hardware version: a0
Node GUID: 0x0003ba0001007ba8
System image GUID: 0x0003ba0001007bab
Port 1:
State: Active
Physical state: LinkUp
Rate: 10
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x02510a6a
Port GUID: 0x0003ba0001007ba9
Port 2:
State: Active
Physical state: LinkUp
Rate: 10
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02510a68
Port GUID: 0x0003ba0001007baa
The IB subnet manager is responsible for finishing the IB hardware initialization phase. Both HCA ports are in the Active state and have been assigned a Base lid, which is required for communication over the IB network. The IB subnet manager is working because the SM lid has been assigned as well. What about the other nodes in the network? Let's try the ibnetdiscover command. It should tell us more:
Switch 9 "S-00144f00006e9794" # "" base port 0 lid 2 lmc 0
[6] "H-0003ba0001003de4"[2] # "node2 HCA-1" lid 5
[5] "H-0003ba0001003de4"[1] # "node2 HCA-1" lid 4
[4] "H-0003ba0001007ba8"[2] # "node1 HCA-1" lid 3
[3] "H-0003ba0001007ba8"[1] # "node1 HCA-1" lid 1

Ca 2 "H-0003ba0001003de4" # "node2 HCA-1"
[2] "S-00144f00006e9794"[6] # lid 5 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[5] # lid 4 lmc 0 "" lid 2

Ca 2 "H-0003ba0001007ba8" # "node1 HCA-1"
[2] "S-00144f00006e9794"[4] # lid 3 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[3] # lid 1 lmc 0 "" lid 2
Do you remember the LID numbers from the uninitialized IB network? They were all zeros because the HCAs were uninitialized. Now each channel has a unique LID. Next time, we are going to decompose this output.
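
The check "are all ports Active?" can be automated as well. The following sketch scans ibstat-style output for ports whose State is not Active. The helper name check_ib_ports is hypothetical and the sample is a shortened copy of the ibstat output above.

```shell
#!/bin/sh
# Sketch: report every port in ibstat-style output whose State is not Active.
# check_ib_ports is a hypothetical helper name; it returns non-zero if any
# such port is found. "Physical state:" lines are ignored on purpose.
check_ib_ports() {
    awk '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        /^[ \t]*State:/ {
            if ($2 != "Active") { print "port " port " is " $2; bad++ }
        }
        END { exit (bad > 0) }
    '
}

# Shortened sample taken from the ibstat output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CA 'mthca0'
Number of ports: 2
Port 1:
State: Active
Physical state: LinkUp
Port 2:
State: Active
Physical state: LinkUp
EOF

if check_ib_ports < "$sample"; then
    echo "all ports are Active"
fi
```

With the subnet manager stopped, the same check would be expected to list the ports that have not reached the Active state.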

Monday, October 13, 2008

VMware learned Hyper-V Quick Migration

Yes, the article headline is right. As you already know, there is a lot of discussion about the difference between VMware VMotion and Microsoft Hyper-V Quick Migration. VMware VMotion is an enterprise-proven feature which allows hot migration of a running virtual machine among the ESX nodes forming a high-availability cluster.

Hyper-V Quick Migration is much simpler. It suspends the machine, cold-migrates it (the saved state of the virtual machine) to another host and resumes it. Do you see the difference now? Quick Migration requires some downtime, depending on the size of the virtual machine state - mainly the size of its memory.

But the reason I began to write this article lies elsewhere. Mike DiPetrillo, a systems engineer working for VMware, has written a simple PowerShell script which brings this feature to VMware VirtualCenter. The only prerequisite is to install the VMware Infrastructure Toolkit for Windows. What does it mean for us? You don't have to pay for a VMotion license and you can still quick-migrate your virtual machines. Isn't this "VMotion for the poor" a cool tool? The script is published and described on Mike's blog.

Additionally, you can integrate the script into VirtualCenter with Icomasoft VI PowerScripter. The altered script compatible with VI PowerScripter is published on the Icomasoft forum. Let's give it a try!

Tuesday, October 7, 2008

SLES10 update and SSL certificate problem

Have you ever needed to update a remote SLES10 system from your local update server (e.g. a YUP server)? There may be many reasons for such a situation. For example, the remote system may have unstable Internet connectivity to the Novell servers, or no connectivity at all and only the ability to reach your local update server over a VPN. You can imagine other situations, of course.

Let's suppose our update server is reachable from the remote site via HTTPS at the URL https://update.domain.tld/path/. The update source is of the YUM type and we want to update the system with the zypper command. First, we need to subscribe to the update server. If the update server's SSL certificate is signed by a well-known certification authority, you don't have to worry. You can use the following command to add the update server to the update sources:
zypper subscribe https://update.domain.tld/path/update update
But if you created your own certification authority or use a self-signed server certificate, you may see these errors:
Curl error for 'https://update.domain.tld/path/repodata/repomd.xml':
Error code:
Error message: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
The message is comprehensible: it says that the server certificate is untrusted and can't be verified against the known CA certificates. Simply said, the server certificate is signed by your own untrusted CA or it is self-signed. The message only warns you that there may be an attempt at a man-in-the-middle attack.

The curl application uses a CA bundle to verify server certificates. The bundle is typically stored in the /usr/share/curl/curl-ca-bundle.crt file. If you want your own CA certificate to be trusted, append its PEM content to the end of the file like this:
cat ca.crt >> /usr/share/curl/curl-ca-bundle.crt
After this, the subscribe command will work and the update server URL will be added to the update sources. Then the update may start:
zypper update
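
Appending with >> adds the certificate again on every run. A slightly more careful variant could look like the following sketch. The function name add_ca_to_bundle and the .orig backup suffix are my own choices, and the "already present" test is only a heuristic that looks for the first base64 line of the certificate as a whole line in the bundle.

```shell
#!/bin/sh
# Sketch: append a PEM CA certificate to a curl CA bundle only once.
# add_ca_to_bundle CA_FILE BUNDLE_FILE   (hypothetical helper name)
add_ca_to_bundle() {
    ca_file=$1
    bundle=$2
    # First line after the BEGIN marker, i.e. the start of the base64 body.
    first_line=$(awk 'found { print; exit } /-----BEGIN CERTIFICATE-----/ { found = 1 }' "$ca_file")
    if [ -z "$first_line" ]; then
        echo "no PEM certificate found in $ca_file" >&2
        return 1
    fi
    # Heuristic: the certificate counts as present if that line is in the bundle.
    if grep -qxF -- "$first_line" "$bundle"; then
        echo "certificate already present in $bundle"
        return 0
    fi
    cp -p "$bundle" "$bundle.orig" || return 1   # copy of the bundle before this change
    cat "$ca_file" >> "$bundle"
}

# Usage with the paths from this article (run as root):
#   add_ca_to_bundle ca.crt /usr/share/curl/curl-ca-bundle.crt
```

Keep in mind that a package update may replace the bundle file, in which case the certificate has to be added again.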
I should mention that you will have a similar problem if you use the rug command. Even after applying the previous steps, the rug command still produces an error about SSL certificate verification failure. I suspect that rug doesn't use curl to access the update server. So, does anybody know how to resolve this for rug?

Wednesday, October 1, 2008

VCB basic usage - VM file-level backup with vcbMounter

The performance of a full backup running over a LAN is not optimal because it requires copying the virtual machine disks locally. That may take some time. Using SAN or Hot-Add mode is far better in such situations.

File-level backup is more suitable for LANs because it doesn't export any disks. It mounts the disk directly and you can access its filesystem without the mountvm command I described here. By the way, this holds for Windows guests only because VCB supports only the NTFS and FAT filesystems.

Let's take our previous virtual machine vcb-backup and try a file-level backup. We will use the same command but with a different value of the -t parameter:
vcbmounter -h VC_IP -u VC_USER -p VC_PASS -a name:vcb-backup
-r c:\mnt\vcb-backup -t file -m nbd
The virtual disk will be mounted under the C:\mnt\vcb-backup directory in LAN mode. A successful mount prints the following messages (some lines may be split due to their length):
Opened disk: vpxa-nfc://[STORAGE] vcb-backup/vcb-backup.vmdk@\
VC_IP!52 79 b4 1a d5 0a 84 31-fd 1c e3 fe f8 31 db ed
Proceeding to analyze volumes
Done mounting
Volume 1 mounted at c:\mnt\vcb-backup\digits\1 (mbSize=12291 \
fsType=NTFS )
Volume 1 also mounted on c:\mnt\vcb-backup\letters\C
Again, the NTFS filesystem is accessible via its drive letter. Now you can copy the files inside and back up the directory structure, but you can't delete anything. The reason is that you are not working with the virtual disk directly but with its snapshot called _VCB_SNAPSHOT_ (full backup with vcbMounter). Here is a screenshot from the Virtual Infrastructure Client proving it:

When we are finished with the backup, we need to unmount it. This differs from unmounting an exported virtual disk because we need to delete the snapshot as well. This is done with the vcbmounter command and the -U parameter:
vcbmounter -h VC_IP -u VC_USER -p VC_PASS -U c:\mnt\vcb-backup
The output is similar to the one we have seen already:
Unmounted c:\mnt\vcb-backup\digits\1\ (formatted)
Deleted directory c:\mnt\vcb-backup\digits\1\
Deleted directory c:\mnt\vcb-backup\digits
Deleted directory c:\mnt\vcb-backup\letters\C\
Deleted directory c:\mnt\vcb-backup\letters
Deleted directory c:\mnt\vcb-backup

And that's all the magic. Do you remember how we first had to export the virtual disk and then mount it? File-level backup is straightforward: you can skip copying the virtual disks over the LAN and do the backup directly.

The conclusion is:
  1. Use the file-level backup where it is possible (Windows machines)
  2. Otherwise use the full backup (UNIX machines)

Wednesday, September 24, 2008

VMware Server 2.0 is out

Let's celebrate! VMware just released the next major version of VMware Server. Let's quickly go through the new features and other changes:
  • New operating systems support - it supports recently released operating systems like Windows Server 2008, Windows Vista or RHEL 5.
  • 64-bit operating system support - finally, we have stable support for 64-bit guests on 64-bit hardware. VMware Server now runs natively on 64-bit Linux host operating systems. Perfect job, guys!
  • Virtual machine scalability - you can now configure your virtual machines with up to 8 GB of RAM and up to 10 NICs. The USB 2.0 interface is supported as well.
  • Hot add/remove of SCSI disks - you can attach a virtual disk to, or detach it from, a running virtual machine.
  • Virtual Machine Communication Interface (VMCI) - this feature allows better performance in network communication between guests and the host or among guests.
  • VI Web Access - this feature allows you to manage your virtual machines from a web-based interface. It is part of the installation package now.
  • VMware Remote Console - a web browser add-on which lets you manage your VMs from the browser. It also allows you to connect a local CD-ROM.
  • Volume Shadow Copy service - you can now make consistent backups of Windows guests with the help of the VMware VSS writer, which uses snapshots to maintain data integrity.
  • Size of installation package - Wow! The installation package is now over 500 MB in size! I'm really curious what's inside! The previous version was about 150 MB!
I think that VMware enabled every feature which was proposed and implemented in the beta release of VMware Server 2.0. I hope I will find some time to install it and check these nice features. By the way, have you installed it already? Download it from the site and read the release notes.

VCB basic usage - VM mount with mountVm

The full backup of the virtual machine vcb-backup has finished and its virtual disk is now available locally. We can back it up directly, or we can access its filesystem and back up only a selected part of it (e.g. only some application data).

The virtual disk and the whole virtual machine are available in the c:\mnt\vcb-backup\ directory. It contains everything we might need for its restoration (VMX and VMDK configuration files, NVRAM state and so on). This screenshot shows its content:

The virtual disk path is available from the output of the vcbmounter command. To browse the filesystem, we will mount it with the mountvm command:
mountvm -cycleId -d c:\mnt\vcb-backup\scsi0-0-0-vcb-backup.vmdk \
A successful run will produce messages like these:
Opened disk: c:\mnt\vcb-backup\scsi0-0-0-vcb-backup.vmdk
Proceeding to analyze volumes
Done mounting
Volume 1 mounted at c:\vcb-backup\digits\1 (mbSize=12291 fsType=NTFS )
Volume 1 also mounted on c:\vcb-backup\letters\C
A virtual machine disk represents an image of an x86 hard disk with its own MBR and partition table. Our disk contains only one partition with an NTFS filesystem, linked with drive letter C. Each such drive letter is mounted under the directory which we specified with the -d option (again, the directory is created on demand and must not exist beforehand). In our scenario, we have c:\vcb-backup\letters.

We are now able to traverse the filesystem and back up particular files or directories. When we are finished, we have to unmount it:
mountvm -u c:\vcb-backup
If the mounted filesystem isn't busy, the command will print these messages (otherwise it will end with an error):
Unmounted c:\vcb-backup\digits\1\ (formatted)
Deleted directory c:\vcb-backup\digits\1\
Deleted directory c:\vcb-backup\digits
Deleted directory c:\vcb-backup\letters\C\
Deleted directory c:\vcb-backup\letters
Deleted directory c:\vcb-backup
Finally, let's summarize the steps of a virtual machine full backup:
  1. we need to identify the machine (vcbvmname)
  2. we need to export its virtual disk (vcbmounter -t fullvm)
  3. optionally, we might need to mount its filesystem (mountvm)
  4. back up the exported disk
  5. back up the mounted filesystem if mounted
  6. unmount the filesystem if mounted (mountvm -u)
That's all for now. Any questions or suggestions are welcome.

Friday, September 19, 2008

VCB basic usage - VM full backup with vcbMounter

Let's choose the virtual machine called vcb-backup and perform its backup. Before we proceed, we should stop to explain that VCB is capable of doing two types of backup:
  1. full backup
  2. file-level backup
The file-level backup is available for Windows operating systems only. The full backup means backing up whole virtual machine images. You can use it with every type of virtual machine.

Next, we need to choose the right transport mode:
  1. SAN mode - the VCB proxy reads the disks directly via FC or iSCSI SAN, bypassing the LAN (LAN-free backup)
  2. Hot-Add mode - VCB proxy in a virtual machine
  3. LAN mode - backup over LAN network
In SAN mode, both the ESX host and the VCB proxy have to have access to the shared storage. The backup is completely offloaded from the ESX host, which only provides virtual machine disks and their snapshots. The whole process is moved to the VCB proxy, which reads the disks directly from the SAN. The Hot-Add mode is interesting in that it is able to access virtual machine disks directly but without the SAN. If the VCB proxy is virtualized, the virtual machine disks are hot-added to it. Nothing surprising, VMware ESX is capable of hot-adding disks to a virtual machine. The drawback is that you need one virtualized VCB proxy on each ESX host so that it can back up all virtual machines hosted there. The LAN mode uses network protocols to access the virtual machine disks. By the way, the SAN mode is supported by VCB and VMware ESX natively. The remaining two modes require VMware ESX(i) 3.5 or later.

The vcbmounter command is used for virtual machine backup. The following command will initiate a full backup of vcb-backup virtual machine in LAN mode:
vcbmounter -h VC_IP -u VC_USER -p VC_PASS -a name:vcb-backup \
-r c:\mnt\vcb-backup -t fullvm -m nbd
The -r parameter defines the backup location directory. The directory must not exist beforehand; it is created on demand. The -m parameter specifies the transport mode (available values are san, hotadd, nbd/nbdssl). The command should produce the following output when it finishes:
Copying "[system-raid1] vcb-backup/vcb-backup.vmx":
Copying "[system-raid1] vcb-backup/vcb-backup.nvram":
Copying "[system-raid1] vcb-backup//vmware.log":
Converting "c:\mnt\vcb-backup\scsi0-0-0-vcb-backup.vmdk" (compact file):

As you can see, the VCB proxy is copying virtual machine disks to the defined local storage. More precisely, the ESX host provides virtual machine snapshots to the VCB proxy, which copies them to the local storage (you can check it with the snapshot manager - there will be a snapshot called _VCB-BACKUP_). After that, you can back them up with your favorite backup software. Or you can access them with the mountvm command.

Wednesday, September 17, 2008

VCB basic usage - VM identification with vcbVmName

Before we begin the backup of a selected virtual machine, we need to identify it so that VCB can contact the VirtualCenter server and start the backup session. This step is simple and only requires running the vcbvmname command. The VirtualCenter server contacts the particular ESX hosts and sends back a list of hosted virtual machines. Here is an example of the command usage:
vcbVmName.exe -h VC_IP -u VC_USER -p VC_PASS -s any:
Most of the command line options used here are common to all VCB commands:
  • -h - hostname or IP address of VirtualCenter server
  • -u - VirtualCenter user who is allowed to do virtual machine backup (at least user with VMware Consolidated Backup User role)
  • -p - VirtualCenter user password
The -s option is more specific and defines virtual machine identifier prefixed with a search pattern. The any: pattern means search for any available virtual machine. This identifier is used by other commands for virtual machine identification as well. It is possible to identify it by IP address, name and so on. What does the above command produce?
Found VM:
Found VM:
Here, you can see all virtual machine identifiers - name (name:), IP address (ipaddr:), virtual machine unique identifier (uuid:) and managed object reference (moref:). You can use them as search pattern instead of any: universal pattern. For example, the next command will find the virtual machine with name vcb-test only:
vcbVmName.exe -h VC_IP -u VC_USER -p VC_PASS -s name:vcb-test
Now we know how to identify a particular machine, and next time we are going to back it up.

Friday, September 12, 2008

What next? VMware ESX 4.0

What will VMware release after VMware ESX 3.5? This is a natural question and I think it's no surprise that it might be VMware ESX 4.0 or something like that. The surprise is what the next generation of ESX might bring us. Let's have a look at some of the new features:
  • the service console and the kernel should run in 64-bit mode natively
  • VMware VirtualSMP should support 8 virtual logical processors
  • full support for SATA devices (we might be able to run ESX on PC?)
  • clustered VirtualCenter Servers
  • access control on storage resources
  • automatic configuration changes tracking
  • and many others
The list is not complete. Actually, it contains only a few of the prepared features. The details should be unveiled at the upcoming VMworld 2008 conference. I'm looking forward to it.

Thursday, September 11, 2008

VCB basic usage - introduction

Do you know of a useful guide which introduces the VCB commands? I haven't found any yet. I found some articles about the topic which helped me a bit, but none was usable as a reference guide. So, I am still missing such a guide. Fortunately, the basic usage of the VCB commands is easy to understand.

I don't want to write anything advanced for now. Just basics, no SAN, no backup agents. Just to explain what VCB contains. Let's suppose we want to export virtual machine disk files or mount them on a Windows station and then back them up. The backup will run over the network. We need to mount the images, back them up and unmount them after that.

So, which commands do we have? Which of them can help us with the tasks described above? Let's explore them:
  • vcbvmname - connects to the VirtualCenter server and lists the available virtual machines
  • vcbmounter - allows you to export/mount the virtual machine disk files to the backup station
  • mountvm - allows you to mount the exported virtual machine disk files locally
  • vcbrestore - allows you to completely restore the virtual machine disk files to the ESX host
The usage of the commands will be explained in the following articles. Now, if you know of any guide focused on VCB usage, please share it with us.

Monday, September 8, 2008

Microsoft Hyper-V against VMware ESXi again

Hm, the competition never sleeps, we could say after Microsoft revealed its plan to release the Hyper-V Server 2008 platform without Windows. I'm not able to imagine it, but Microsoft developed a minimal version of Windows with only the most necessary parts of the OS - kernel and drivers - which are loaded in the parent partition. It should be similar to VMware ESXi, which no longer depends on the service console. That means the customer doesn't need to invest in a Windows Server 2008 licence. The whole new product should be released within 30 days and it will be free of charge.

By the way, the VMware products are still the leaders. And they will be. The stability, performance, central management of the virtual environment and enterprise features are more mature than the new toys from Microsoft. Let's mention just one - VMotion technology. It is said Microsoft is going to support live migration of virtual machines in Windows Server 2008 R2, which will not be out before 2010. I think it's quite late...

Thursday, September 4, 2008

Sun VirtualBox 2.0 is out

Today, Sun released a new major version of their desktop virtualization product, Sun VirtualBox 2.0. It newly supports 64-bit guests (only Windows Vista and RHEL5), the Microsoft VHD virtual disk format, AMD RVI and a Python API.

It can be downloaded from the VirtualBox site. The official release notes are available as well.

Wednesday, September 3, 2008

VMware server 1.x and GNOME library issue

If you install VMware Server 1.x on your Linux workstation, you may encounter a dependency issue between the libraries shipped with VMware and the available system libraries, like this (lines are broken):
(vmware:30311): libgnomevfs-WARNING **:
Cannot load module `/opt/gnome/lib/gnome-vfs-2.0/modules/'
version `GCC_4.2.0' not found (required by /usr/lib/

The above message can be triggered by adding a new virtual disk to the virtual machine or by assigning an ISO image to its virtual CD-ROM. Such operations end with the error displayed in the parent console. The reason why such a situation may appear is that the installed libraries are compiled with an older GCC compiler than the system libraries. The above error was produced on the SLES 10 SP1 distribution, which includes GCC 4.1.2. The installed VMware Server had version 1.0.6 and, in my opinion, it is compiled with GCC 3.x.

The resolution for the problem is to set the environment variable VMWARE_USE_SHIPPED_GTK to the value "yes", export it and run the vmware command after that:
  1. VMWARE_USE_SHIPPED_GTK=yes
  2. export VMWARE_USE_SHIPPED_GTK
  3. vmware &
I recommend placing the variable in your startup script, e.g. in your ~/.profile or ~/.bash_profile.

Monday, September 1, 2008

How to resize ext3 filesystem on RHEL 5.x

I had no luck when I was looking for the ext2online utility to resize an ext3 filesystem online on RHEL 5.x (it is available on RHEL 4.x). Online means resizing it without the need to unmount the filesystem. I went through the release notes but I didn't find any notes about it. Perhaps I didn't read them carefully.

The ext2online tool can be used to resize an ext2 filesystem but it has to be unmounted. The tool is able to resize an ext3 filesystem online provided that the kernel supports online resizing. More precisely, only online enlarging is possible.

Besides it, there is another tool - resize2fs, which is capable of resizing ext2/ext3 filesystems. But the filesystem has to be unmounted first. This is required on RHEL 4.x. If you try to resize a mounted ext3 filesystem on RHEL 4.x with this tool, it will end with the error "can't resize a mounted filesystem!".

So, how to resize an ext3 filesystem on the newer RHEL 5.x? Both tools belong to the e2fsprogs package, which contains a set of tools for creating, checking, modifying, and correcting ext2/ext3 filesystems. On RHEL 4.x, the package contains both tools - ext2online in version 1.1.8 and resize2fs 1.35. On RHEL 5.x, it contains resize2fs only - version 1.39. The newer version supports online resizing if the kernel supports it. Here is a summary of how to resize ext2/ext3 online on the RHEL platform:
  1. RHEL 4.x - use ext2online tool (e2fsprogs package)
  2. RHEL 5.x - use resize2fs tool (e2fsprogs package)
I don't consider it necessary to write about the usage of these tools; it's simple and the tools have their man pages.
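Still, the summary above can be turned into a tiny helper. The following is only a sketch: the function name pick_resize_tool is made up for this post, and it assumes the release string looks like the content of /etc/redhat-release.

```shell
#!/bin/sh
# Sketch: pick the online-resize tool by RHEL major release, following the summary above.
# Assumption: the release string looks like the content of /etc/redhat-release,
# e.g. "Red Hat Enterprise Linux Server release 5.2 (Tikanga)".
# pick_resize_tool is a hypothetical helper name, not part of e2fsprogs.
pick_resize_tool() {
    # Extract the first number that follows the word "release".
    major=$(printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p')
    case $major in
        4) echo ext2online ;;
        5) echo resize2fs ;;
        *) echo "unsupported release: $1" >&2; return 1 ;;
    esac
}

# Usage on a real system, after the underlying device has been enlarged
# (DEVICE is a placeholder for your block device):
#   tool=$(pick_resize_tool "$(cat /etc/redhat-release)") && "$tool" DEVICE
```

Both tools enlarge the filesystem to the size of the device when no size is given, but check the man pages before running them on real data.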

Thursday, August 28, 2008

Technical differences between VMware ESXi and ESX

I have spent some time looking for more details about VMware ESXi compared to VMware ESX. I summarized the main differences in this article but I think it's not complete. There have to be more features missing in ESXi because of the service console removal. So, what else did I discover?
  • ESXi is supported on a smaller set of certified hardware because it is a standalone system and doesn't depend on the RHEL service console, which provides drivers for other hardware.
  • You can manage ESXi with RCLI on the Linux or Windows platform but the Virtual Infrastructure client is more comfortable and easier to use. Further, if you deployed ESXi without a Virtual Infrastructure licence, RCLI will have read-only access only. The drawback of the VI client is that it is available for the Windows platform only. A solution may be to use the Wine emulator but the installation isn't as straightforward as on the Windows platform. The Wine application database contains this entry about VI client installation but I haven't tried it yet.
  • You can manage your ESX server directly via a serial cable but ESXi is missing this feature.
  • The ESXi kernel is missing jumbo frames support in the TCP/IP stack, which allows sending larger frames out onto the physical network. It can help to achieve higher throughput with NFS or iSCSI protocols.
  • ESXi doesn't support the NetQueue technology, which boosts 10G Ethernet performance.
  • Finally, VMware in cooperation with Mellanox Technologies supports Infiniband host channel adapters on ESX. ESXi is missing it.
The previous six points are related to the technical aspects of ESX and ESXi hypervisor. These points aren't complete as well but they are quite important for common deployment of VMware technologies. If you know about something else, please share it at my blog. For further information, check these links:
  1. VMware ESX 3.5 release notes
  2. VMware ESXi 3.5 release notes
  3. ESX and ESXi comparison (VMware knowledge base)
  4. Differences between ESXi and ESX (VMware knowledge base)

Tuesday, August 26, 2008

VMware Server 2.0 is coming ...

Last week, VMware released another release candidate of the upcoming Server 2.0, denoted as RC 2. Because I'm using the current stable version 1.0.6 in my testing environment a lot, I'm looking forward to the new one as well. It should bring a lot of useful new features, among them:
  1. support of USB 2.0 devices - it will be more comfortable to use high-speed USB memory sticks
  2. online disk capacity expansion - it will be possible to hot-add a SCSI hard drive to a virtual machine and to expand its size on the fly without powering it off
  3. VMCI - Virtual Machine Communication Interface will allow efficient communication between virtual machines or between a virtual machine and the host system, so you don't need to use more generic and slower communication channels like the network
  4. virtual machine scalability - a virtual machine will be able to manage up to 8GB of memory (the current stable release can manage 3.6GB only) and it will support two-way vSMP (the current release contains experimental support only)
  5. remote client devices - the new VMware Remote Console will allow using devices which are not on the host, such as a client's CD-ROM
  6. broader OS support - support of Microsoft Windows Vista, Windows Server 2008 or RHEL 5 will be included
And that's not everything. For the impatient, VMware provides a beta program which allows you to try it. Also, read the released user guide.

Wednesday, August 20, 2008

Quickly - how to download a file to the ESX 3.x service console?

VMware ESX 3.x is missing the wget package, so you can't use the wget command to download anything from the Internet as you might wish. Instead of wget, the service console provides the lwp-* tools, which are simple Perl scripts based on the LWP and URI Perl modules and which allow you to do some basic tasks around the HTTP protocol.

The tools are part of the perl-libwww-perl package. The package is installed by default. The most important tool is lwp-download, which you can use for downloading files. Let's check the steps to download something:
  1. esxcfg-firewall --allowOutgoing
    • allow outgoing connections from service console
  2. lwp-download http://dfn.dl..../apcupsd-3.14.4-1.el3.i386.rpm
    • download apcupsd package
  3. esxcfg-firewall --blockOutgoing
    • return firewall to the initial state
Besides this, the perl-libwww-perl package contains other tools like lwp-mirror, lwp-request and lwp-rget. Check their man pages for usage.
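If you download files often, the three steps above can be wrapped into one function so that the firewall is closed again even when the download fails. This is only a sketch: the function name fetch_to_console and the FIREWALL_CMD/DOWNLOAD_CMD variables are my own additions (the variables exist only to allow a dry run), the two commands and their options are the ones from the steps above.

```shell
#!/bin/sh
# Sketch: open the service console firewall only for the duration of one download.
# FIREWALL_CMD and DOWNLOAD_CMD are hypothetical override hooks for dry runs;
# by default they are the two commands used in the steps above.
FIREWALL_CMD=${FIREWALL_CMD:-esxcfg-firewall}
DOWNLOAD_CMD=${DOWNLOAD_CMD:-lwp-download}

fetch_to_console() {
    url=$1
    # Step 1: allow outgoing connections from the service console.
    $FIREWALL_CMD --allowOutgoing || return 1
    # Step 2: download the file and remember whether it succeeded.
    $DOWNLOAD_CMD "$url"
    status=$?
    # Step 3: always return the firewall to the initial state.
    $FIREWALL_CMD --blockOutgoing
    return $status
}

# Dry run that only prints the commands:
#   FIREWALL_CMD="echo esxcfg-firewall" DOWNLOAD_CMD="echo lwp-download"
#   fetch_to_console URL
```

The function returns the exit status of the download, so it can be used in scripts that need to know whether the file really arrived.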

Thursday, August 14, 2008

Quickly - how to determine the ESX host version?

In my opinion, the easiest way to find out the ESX host version is to log in to the service console and use the esxupdate command. The major version can be found in the file /etc/vmware-release. For example, it may contain:
VMware ESX Server 3
So, the major version is 3.x. Determining the minor version is a little more complicated. Run this command from the service console:
esxupdate query
And try to identify patches with the following prefixes from the end of command output (the last one is the right one):
  1. ESX - the minor version should be 0, so we have version 3.0.x
  2. ESX350 - the minor version is 5, the version is 3.5.0
  3. 3.5.0 - initial installation of version 3.5.0
The corresponding lines may look like these:
3.5.0-64607    16:25:29 05/27/08 ESX 3.0.x to 3.5.0-64607 upgrade
3.5.0-64607 10:42:31 08/06/08 Full bundle of ESX 3.5.0-64607
It remains to identify the update level. Use the same command as above and check the full patch name now:
  1. ESX350-Update01 - we have 3.5.0 Update 1
  2. ESX350-Update02 - we have 3.5.0 Update 2
The corresponding lines are:
ESX350-Update01    16:59:56 05/27/08 ESX Server 3.5.0 Update 1
ESX350-Update02 10:42:38 08/06/08 ESX Server 3.5.0 Update 2
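The last check can be automated with a few lines of shell. Take it as a sketch only: the function name esx_update_level is made up, and it assumes the patch names always follow the ESX350-UpdateNN pattern shown above.

```shell
#!/bin/sh
# Sketch: derive the ESX 3.5 update level from `esxupdate query` output.
# Assumption: update bundles are listed as ESX350-UpdateNN in the first column,
# as in the lines shown above. esx_update_level is a hypothetical helper name.

# Reads `esxupdate query` output on stdin and prints the highest update number
# found, or 0 if no ESX350-UpdateNN line is present.
esx_update_level() {
    awk '
        $1 ~ /^ESX350-Update[0-9]+$/ {
            n = $1
            sub(/^ESX350-Update/, "", n)   # keep only the number, e.g. "02"
            if (n + 0 > max) max = n + 0
        }
        END { print max + 0 }
    '
}

# Usage on a real host:
#   esxupdate query | esx_update_level
```

For the two lines above it prints 2, which means 3.5.0 Update 2.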
Do you know another way to find out the version? Besides this, I have found this knowledge base article.

Tuesday, August 12, 2008

VMware ESX 3.5 Update 2 and power on virtual machine bug ?

What a coincidence! My colleague and I were preparing a server with VMware Infrastructure Update 2 yesterday. Just another simple scenario, we were thinking. We just wanted to run some checks to see if it is suitable for production usage. Everything went smoothly. But when my colleague began his job, things got worse. His task was easy - install some new virtual machines in the prepared environment and check their behaviour (details aren't important now).

When he wanted to power on a prepared virtual machine to begin the guest installation, the VMware Infrastructure client displayed the following error message:
  • A general system error occurred: Internal error
Not very informative, is it? No clue where to go. The virtual machine stays powered off. Any running machine stays running until you suspend it or power it off. So,
  • don't power off or vmotion your virtual machines!!!
Otherwise, you will be in big trouble! Naturally, I checked the virtual machine's vmware.log log file. There I found the "virtual" reason for such strange behaviour:
  • [msg.License.product.expired] This product has expired.
  • Be sure that your host machine's date and time are set correctly.
  • There is a more recent version available at ...
Hm, the right time is often a critical part of any installation, but my habit is to configure an NTP server wherever possible. I did so in this scenario as well. I checked the server time and nothing was wrong. The ESX host is licenced properly. When I tried to search the VMware knowledge base, there was no answer at all. And you need some luck to reach it today because it is really, really overloaded. Maybe others are dealing with the same problem.

Because my colleague needed to work, my last try was to move the time backwards. I installed the server two days ago and everything was working fine. It was August 10. My colleague began working this morning and since then, no virtual machine could be powered on. Today, it is August 12. So I moved the time back to August 10 and it helped! Of course, the NTP service has to be shut down first. You can do it via the VI client or you can log in to the service console and use the date -s command. I'm not aware of any other working solution now.

This afternoon, I found the first article about this annoying bug. It says that the VMware knowledge base seems to have collapsed due to this issue and that the support team is promising a solution within 36 hours. By the way, Update 2 can't be downloaded now.

Tuesday, August 5, 2008

Differences between VMware ESXi and ESX

VMware ESXi hypervisor is free of charge now but what are the reasons to use it instead of VMware ESX? And what advantages does it have?

The most important advantage is that you don't need to pay for it. Furthermore, it supports all VMware Infrastructure features if you buy the proper licences - you can vmotion virtual machines, schedule resources, back them up via VCB and so on. If you really want to save more bucks, you don't have to pay for the support which is required in the case of VMware Infrastructure. The new option is to pay per incident.

What are the main differences? As I wrote here, the ESXi hypervisor is OS independent (it is without the service console) and its installation requires only 32MB of disk space. The negative thing is that it lacks the VirtualCenter agent, VCB and the update manager. These features are included in the VMware Infrastructure Foundation edition and higher by default. If you would like to manage ESXi hosts, you need to buy agent licences.

One last important question remains. How can we control ESXi hosts remotely if we don't have the service console? The ESXi hypervisor doesn't have SSH access by default but supports RCLI, the Remote Command Line Interface. The RCLI allows you to perform remote command line operations on an ESXi host from your management station. If you still prefer SSH to RCLI, you can enable it according to this article.

So, are you going to deploy it? If so, you can write me about your experience with the product. I would like to know your story.

Monday, August 4, 2008

VMware ESXi or Microsoft Hyper-V

VMware ESXi 3.5 update 2 is totally free now. It is derived from VMware Infrastructure and has a lot of features in common with it. The hypervisor can be downloaded from the VMware site.

I was interested in the reasons why the hypervisor was released for free. And I must admit the company still keeps the same rule - what competitors provide, we provide for free.

The competitor is Microsoft and their Hyper-V hypervisor. It is free like ESXi, and both of them require you to pay for management software. But if you want to use Hyper-V, you need to pay for Windows Server 2008 licences. Besides, ESXi is a bare-metal solution and you don't need any operating system for a service console! Check the price comparison of both solutions in this article.

The conclusion is simple - ESXi seems to be significantly cheaper than Hyper-V in common SMB environments which require basic consolidation with or without centralized management.

Wednesday, July 30, 2008

RHEL and Infiniband - advanced diagnostics - part one

I will continue from the point where I finished last time. The remaining diagnostic tools depend on the sysfs interface. The provided information is extracted from this filesystem. If you don't remember the meaning of each entry under the /sys/class/infiniband directory, use these tools.

One of the common IB network issues is that the IB subnet manager is not running. The IB nodes don't have any LIDs assigned and they aren't able to see each other. The node or its IB ports are connected but they aren't initialized yet. To find this out without sysfs, use the ibstat command:

CA 'mthca0'
CA type: MT25208 (MT23108 compat mode)
Number of ports: 2
Firmware version: 4.7.400
Hardware version: a0
Node GUID: 0x0003ba0001007ba8
System image GUID: 0x0003ba0001007bab
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510a68
Port GUID: 0x0003ba0001007ba9
Port 2:
State: Initializing
Physical state: LinkUp
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510a68
Port GUID: 0x0003ba0001007baa

The output contains everything we need - port state, LID, GUID, rate. The IB link is up but the ports are in the INIT state. No IB subnet manager is running. That is clear because the SM lid parameter has zero value. It should hold the LID of the node which acts as the IB subnet manager. The same holds for Base lid. The zero value means that the IB network isn't initialized yet. Similar information is provided by the ibnetdiscover command:

Switch 9 "S-00144f00006e9794" # "" base port 0 lid 2 lmc 0
[6] "H-0003ba0001003de4"[2] # "node2 HCA-1" lid 0
[5] "H-0003ba0001003de4"[1] # "node2 HCA-1" lid 0
[4] "H-0003ba0001007ba8"[2] # "node1 HCA-1" lid 0
[3] "H-0003ba0001007ba8"[1] # "node1 HCA-1" lid 0

Ca 2 "H-0003ba0001003de4" # "node2 HCA-1"
[2] "S-00144f00006e9794"[6] # lid 0 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[5] # lid 0 lmc 0 "" lid 2

Ca 2 "H-0003ba0001007ba8" # "node1 HCA-1"
[2] "S-00144f00006e9794"[4] # lid 0 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[3] # lid 0 lmc 0 "" lid 2

The square brackets contain the physical port number at the switch. As we can see, the network is up and discoverable. It consists of one IB switch and two IB nodes, each with a dual-ported IB HCA. The switch has LID 2 assigned; the nodes aren't initialized yet. To display node GUIDs only, use the ibnodes command:

Ca : 0x0003ba0001003de4 ports 2 "node2 HCA-1"
Ca : 0x0003ba0001007ba8 ports 2 "node1 HCA-1"
Switch : 0x00144f00006e9794 ports 9 "" base port 0 lid 2 lmc 0
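The LID of a particular node can be pulled out of the ibnetdiscover output with a bit of awk. The sketch below assumes the line layout shown above (switch port lines start with a square bracket and end with the node description followed by "lid N"); the function name lid_of_node is made up for this post.

```shell
#!/bin/sh
# Sketch: print the LID(s) reported for a node in ibnetdiscover output.
# Assumption: switch port lines look like the ones shown above, e.g.
#   [6] "H-0003ba0001003de4"[2] # "node2 HCA-1" lid 0
# lid_of_node is a hypothetical helper name, not part of the OFED tools.
lid_of_node() {
    # $1: node name as it appears at the start of the description, e.g. node2
    # stdin: ibnetdiscover output
    awk -v name="$1" '
        /^\[/ && index($0, "\"" name " ") {
            lid = ""
            # The value right after the word "lid" is the LID of the remote port.
            for (i = 1; i < NF; i++)
                if ($i == "lid") lid = $(i + 1)
            if (lid != "") print lid
        }
    ' | sort -u
}

# Usage on a real node:
#   ibnetdiscover | lid_of_node node2
```

A printed value of 0 means that no LID has been assigned yet, which is exactly the situation described in this post.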

Finally, two commands remain - ibroute and ibchecknet. As the IB network is not fully initialized, the nodes can't contact the switch for its forwarding table. So the ibroute command, helpful otherwise, isn't working. The ibchecknet command produces address resolution errors; the IB network is not valid:

lid 2 address resolution: FAILED
# Switch: nodeguid 0x00144f00006e9794 failed

# Checking Ca: nodeguid 0x0003ba0001003de4
lid 0 address resolution: FAILED
# Ca: nodeguid 0x0003ba0001003de4 failed

# Checking Ca: nodeguid 0x0003ba0001007ba8
lid 0 address resolution: FAILED
# Ca: nodeguid 0x0003ba0001007ba8 failed

## Summary: 3 nodes checked, 3 bad nodes found
## 8 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold

At the beginning, I stated that the IB subnet manager is not running. Let's launch it with the /etc/init.d/opensmd script and we will see how the behaviour of the tools changes.
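The ibstat check from the beginning of this post (a zero SM lid means no subnet manager) can be scripted as well. Again, this is just a sketch: the function name ib_sm_check is made up, and it relies on the "Port N:", "State:" and "SM lid:" labels exactly as they appear in the ibstat output above.

```shell
#!/bin/sh
# Sketch: summarize ibstat output per port and fail if any port has SM lid 0.
# Assumption: the labels "Port N:", "State:" and "SM lid:" appear as in the
# ibstat output shown earlier. ib_sm_check is a hypothetical helper name.
ib_sm_check() {
    awk '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port); ports[++n] = port }
        /^[ \t]*State:/       { state[port] = $2 }
        /^[ \t]*SM lid:/      { sm[port] = $3 }
        END {
            bad = 0
            for (i = 1; i <= n; i++) {
                p = ports[i]
                printf "port %s: state=%s sm_lid=%s\n", p, state[p], sm[p]
                # SM lid 0 means the port has not seen a subnet manager yet.
                if (sm[p] == 0) bad = 1
            }
            exit bad
        }
    '
}

# Usage on a real node:
#   ibstat | ib_sm_check || echo "no subnet manager seen on some port"
```

It should report a failure now and succeed once opensmd is running and the ports have been configured.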

Friday, July 25, 2008

RHEL and Infiniband - basic diagnostics

I am going to close the article series about Infiniband technology on the RHEL platform (check the previous posts 1, 2, 3) with posts devoted to IB troubleshooting. I would like to introduce basic diagnostic steps for an IB environment which may help you to uncover errors and misconfiguration.

Most of the troubles you may meet are traceable via the OFED diagnostic tools. They are part of the openib-diags package up to OFED 1.2. Since version 1.3, it is replaced with the infiniband-diags package. Let's take a look at the most useful ones:
  1. ibstat - shows IB device status like firmware version, ports state, their rate, GUIDs, LIDs ...
  2. ibnetdiscover - discovers IB network topology
  3. ibroute - queries for IB switch forwarding table (like routing table)
  4. ibnodes - shows IB nodes in topology
  5. ibchecknet - runs IB network validation
  6. ibping - ping IB address
  7. sysfs - Linux virtual filesystem representing kernel structures; for IB, there is the directory /sys/class/infiniband
The IB network is similar to other high performance network technologies like Fibre Channel. Most of the troubles with IB are common to them. You may need to resolve connectivity issues, firmware or higher level software revision incompatibilities, driver bugs and similar.

First, I would like to explain the usage of the last two tools - ibping and sysfs. They are simple enough and known from other fields. The IB ping works in client-server fashion. That means you need to run ibping in server mode on one side while the other side acts as a client. The server answers the client's pings with pongs.
  1. Server mode - ibping -S -v
  2. Client mode - ibping -v SERVER_LID_ADDR
The -v argument only increases the verbosity level. The right LID address can be found with the ibnetdiscover command. Run it, find the server node line and use the associated LID. I will explain it later. If the IB network is healthy, ibping should produce output like this at the server side (the server LID is 4, its hostname is node2):

ibwarn: [6795] ibping_serv: starting to serve...
ibwarn: [6795] ibping_serv: Pong: node2.(none)

The pongs have to be visible at the client side:

ibwarn: [17946] ibping: Ping..

Pong from node2.(none) (Lid 4): time 0.235 ms

If you aren't able to see them, you should check the connectivity status of your IB HCA. One method to do it is via sysfs. Each IB HCA is represented by a subdirectory under the /sys/class/infiniband directory where you can find a lot of useful stuff. For example, if you have a dual-ported HCA from Mellanox, there should be the following entries for port states:
  1. /sys/class/infiniband/mthca0/ports/1/state
  2. /sys/class/infiniband/mthca0/ports/2/state
The state can have three predefined values with these meanings:
  1. DOWN - port is physically disconnected
  2. INIT - port is physically connected but not yet initialized by the subnet manager
  3. ACTIVE - port is online and working
For ibping to work, the ports of both nodes have to be in the ACTIVE state. If they are in the INIT state, the subnet manager may not be running. The DOWN state simply means a cable problem. By the way, there are other methods to achieve this with the help of the remaining tools. I am going to explore them next time.
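The sysfs check described above is easy to script. The following sketch walks over all HCAs and ports and reports those which are not ACTIVE. I assume here that each state file holds a line such as "4: ACTIVE" or "2: INIT"; the function name check_ib_ports and its optional argument (a different sysfs root, useful only for testing) are made up for this post.

```shell
#!/bin/sh
# Sketch: report the state of every IB port found in sysfs.
# Assumption: each state file holds a line such as "4: ACTIVE" or "2: INIT".
# check_ib_ports and its optional root argument are hypothetical, added only
# so that the function can be tried against a fake directory tree.
check_ib_ports() {
    root=${1:-/sys/class/infiniband}
    rc=0
    for f in "$root"/*/ports/*/state; do
        [ -r "$f" ] || continue          # no HCA present or unreadable entry
        state=$(cat "$f")
        case $state in
            *ACTIVE*) echo "OK   $f ($state)" ;;
            *INIT*)   echo "WARN $f ($state): is the subnet manager running?"; rc=1 ;;
            *)        echo "FAIL $f ($state): check the cable"; rc=1 ;;
        esac
    done
    return $rc
}

# Usage on a real node:
#   check_ib_ports || echo "ibping will probably not work yet"
```

The return value is non-zero as soon as one port is not ACTIVE, so the function fits into monitoring scripts as well.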

Wednesday, July 23, 2008

VMware ESXi will be free

Within a few days or weeks, VMware should release their lightweight hypervisor VMware ESXi for free. It is an enterprise-class hypervisor with a footprint of about 32MB which is integrated into modern servers, e.g. on solid state disks. The small footprint is achieved by dropping the so-called Console Operating System (based on RHEL 3). It includes basic functionality like vSMP or VMFS; for advanced features, you need to manage it with VMware VirtualCenter. You can download it from here.

Tuesday, July 22, 2008

Quickly - /dev/vcs and /dev/tty magic on Linux

Have you ever wanted to check the content of the first virtual console without switching to it with "Ctrl+Alt+F1" shortcut from your desktop session? Or the second console of a remote server? Or would you like to send something to the user who is working at the third virtual console (not via wall command)?

The GNU/Linux kernel provides two character devices for such tasks:
  • /dev/ttyX - represents the X-th virtual console
  • /dev/vcsX - represents the text contents of the X-th virtual console
So, to answer the questions, use these commands:
  1. cat /dev/vcs1
  2. ssh root@server 'cat /dev/vcs2'
  3. echo "something" > /dev/tty3
More information about the devices allocated by Linux can be found in /usr/src/linux/Documentation/devices.txt. You need to have the Linux kernel sources installed.
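One caveat with reading /dev/vcsX: as far as I know, the device holds just the screen characters without newline characters, so the dump can look shifted when your terminal is wider or narrower than the console. A minimal sketch that wraps the dump at a fixed width follows. The function name dump_console, the default of 80 columns and the optional device directory argument are my assumptions, not something the kernel documentation prescribes.

```shell
#!/bin/sh
# Dump the text of virtual console $1, wrapped at $2 columns (default 80).
# $3 is an optional device directory (hypothetical override, defaults to /dev).
dump_console() {
    console="$1"
    cols="${2:-80}"
    dev_dir="${3:-/dev}"
    # fold breaks the single long line into console-width rows.
    fold -w "$cols" "$dev_dir/vcs$console"
}

# Example (usually needs root): dump_console 1
```

If the console uses a different width, pass it as the second argument, e.g. dump_console 1 132.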

Friday, July 18, 2008

RHEL and Infiniband - basic usage

As I wrote in the previous post, the /etc/init.d/openibd init script is in charge of starting the Infiniband (IB) network. The script parses the /etc/ofed/openibd.conf configuration file, where you can specify which ULPs should be initialized. By default, all the ULPs I mentioned last time - ipoib, srp, sdp - are enabled.

The opensm IB network manager is controlled by the /etc/init.d/opensmd init script, which is configurable via the /etc/ofed/opensm.conf configuration file. You can turn on debugging there, but it is not normally needed. It is more useful to enable verbose mode, which increases the log verbosity level. The default log file is /var/log/osm.log. So, if something goes wrong, enable verbose mode and check the log file.

After executing the init scripts, you should check the IB network state. The openibd script is started automatically during system startup, while opensm has to be enabled (with ntsysv or chkconfig). Follow this checklist:
  1. Is Mellanox HCA recognized?
    • check the output of lsmod | grep ib_mthca
    • check the output of dmesg
  2. Are appropriate ULPs loaded?
    • check the output of lsmod | grep ib_
      • should contain ib_ipoib, ib_srp, ib_sdp
  3. Is IB network initialized and working?
    • check the output of cat /sys/class/infiniband/mthca0/ports/X/state
      • should be ACTIVE
  4. Is ib0 network interface available?
    • check the output of ifconfig -a
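The module checks from the first two points can be scripted. The sketch below is only an illustration: the function name missing_modules is mine, and it assumes the usual lsmod output format, where the first line is a header and the module name is the first column.

```shell
#!/bin/sh
# Read lsmod-style output on stdin and print every required module
# (given as arguments) that is not loaded.
missing_modules() {
    # Skip the header line and keep only the module names.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in "$@"; do
        # -x matches the whole line, so ib_srp does not match ib_srpt.
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Example: lsmod | missing_modules ib_mthca ib_ipoib ib_srp ib_sdp
```

An empty output means that all the listed modules are loaded.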
If all the checks passed, you should be able to use the IP protocol over the IB network. I assume you have at least two IB nodes in the IB network, both configured the same way, and both have passed the checks (as in the first article). To configure it, run the following commands:
  1. assign an IP address to the nodes
    • run ifconfig ib0 IP_ADDR1 up at first node
    • run ifconfig ib0 IP_ADDR2 up at second node
  2. check the IPoIB functionality
    • run ping IP_ADDR2 from the first node
    • run ping IP_ADDR1 from the second node
So, wasn't it simple? If everything is working, the ping should receive replies from the other side. Now you can run any IP-based application over IB - FTP, NFS and so on - and take advantage of its benefits, such as high throughput and low latency. If you are interested in the topic, please leave me a comment.

Tuesday, July 15, 2008

Quickly - RPM uninstall and scriptlet failure

Sometimes I'm not able to uninstall an RPM package because of errors in the scriptlets of its SPEC file. Last time it happened when I was uninstalling the HP OpenView Storage Data Protector packages from a RHEL server. By mistake, I first uninstalled a package that another package depended on. The dependency wasn't checked correctly, and afterwards I wasn't able to uninstall the dependent package because of it. The whole uninstall procedure looked like this:
  1. rpm -e OB2-CORE-A.06.00-1
  2. rpm -e OB2-DA-A.06.00-1
It produced the following error:
  • ERROR: Cannot find /opt/omni//bin/omnicc
  • error: %preun(OB2-DA-A.06.00-1.x86_64) scriptlet failed, exit status 3
So, is there a way to get rid of such a package? Yes, there is, and it is simple: just disable the execution of the scriptlets like this:
  1. rpm -e --noscripts OB2-DA-A.06.00-1
I think it is a pretty simple feature of RPM, but it is a bit difficult to remember.
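Before skipping the scriptlets, it may be worth looking at what they would have done, because any cleanup in them is skipped as well. A minimal sketch follows. The function name force_remove and the RPM variable are hypothetical (the variable only allows a dry run with "echo rpm"), while rpm -q --scripts and rpm -e --noscripts are standard rpm options.

```shell
#!/bin/sh
# RPM is a hypothetical override; set RPM="echo rpm" for a dry run.
RPM="${RPM:-rpm}"

# Show the scriptlets of package $1, then erase it without running them.
force_remove() {
    pkg="$1"
    # List the scriptlets so you know what is being skipped.
    $RPM -q --scripts "$pkg"
    # Erase the package with all scriptlets disabled.
    $RPM -e --noscripts "$pkg"
}

# Example: force_remove OB2-DA-A.06.00-1
```

If the scriptlets contain steps you still need, such as stopping a service or removing files, you have to do them by hand afterwards.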