Thursday, October 30, 2008

VCB basic usage - VM restore with vcbRestore

The last question remains: how do we restore the full backup of the virtual machine we created in the previous article? The virtual machine is stored on the NFS server and we need to get it back to the ESX host. There are many possible scenarios for doing this - e.g., the original machine is corrupted and you have to restore it from backup, or you don't have a VirtualCenter Server available and you would like to deploy a virtual machine in a template-like fashion without the template feature.

A virtual machine full backup performed with the vcbMounter command comes with a catalog file which summarizes the backup. The catalog file defines the virtual machine's:
  • display name
  • name of datastore
  • folder path
  • resource pool
Let's inspect one such catalog file:
version= esx-3.0
state= poweredOn
display_name= "nas-openfiler"
uuid= "564da78f-f2fc-484f-4d92-24238e486380"
disk.scsi0:0.filename= "scsi0-0-0-nas-openfiler.vmdk"
disk.scsi0:0.diskname= "[storage1] nas-openfiler/nas-openfiler.vmdk"
config.vmx= "[storage1] nas-openfiler/nas-openfiler.vmx"
host= vmware.dom.tld
timestamp= "Sun Oct 12 01:37:12 2008"
config.suspenddir= "[storage1] nas-openfiler"
config.snapshotdir= "[storage1] nas-openfiler"
config.file0= "nas-openfiler.vmsd"
config.file1= "nas-openfiler-cf281ca9.vmss"
config.file2= "nas-openfiler.vmxf"
config.file3= "nas-openfiler.nvram"
config.logdir= "[storage1] nas-openfiler"
config.log0= "vmware-1.log"
config.log1= "vmware.log"
folderpath= "/ha-folder-root/ha-datacenter/vm"
resourcepool= "/ha-folder-root/ha-datacenter/host/vmware.dom.tld/Resources"
Now, what can we say about the backed-up virtual machine?
  • it is visible as nas-openfiler in the VI client (the display name is nas-openfiler)
  • it is stored on the [storage1] datastore in the nas-openfiler directory
  • it belongs to the vm folder in the VirtualCenter folder hierarchy
  • it is running on the vmware.dom.tld ESX host
As you can see, everything related to the virtual machine is stored inside the "[storage1] nas-openfiler" directory. The datastore name is a symbolic name of the datastore. You can check it via the VI client in the storage configuration tab, or by listing the contents of the /vmfs/volumes directory.
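If you prefer the service console, the same mapping can be seen with a simple directory listing; each symbolic name (storage1, storage2, ...) should appear as a symlink pointing to the datastore's UUID directory:
# list the datastores visible to this host
ls -l /vmfs/volumes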

Let's suppose we want to create a machine identical to "nas-openfiler", but we want to restore it to a different datastore and directory, e.g. "[storage2] nas-openfiler2", and we want to call it "nas-openfiler2". To do this, we need to change selected parameters in the catalog file:
version= esx-3.0
state= poweredOn
display_name= "nas-openfiler2"
uuid= "564da78f-f2fc-484f-4d92-24238e486380"
disk.scsi0:0.filename= "scsi0-0-0-nas-openfiler.vmdk"
disk.scsi0:0.diskname= "[storage2] nas-openfiler2/nas-openfiler.vmdk"
config.vmx= "[storage2] nas-openfiler2/nas-openfiler.vmx"
host= vmware.dom.tld
timestamp= "Sun Oct 12 01:37:12 2008"
config.suspenddir= "[storage2] nas-openfiler2"
config.snapshotdir= "[storage2] nas-openfiler2"
config.file0= "nas-openfiler.vmsd"
config.file1= "nas-openfiler-cf281ca9.vmss"
config.file2= "nas-openfiler.vmxf"
config.file3= "nas-openfiler.nvram"
config.logdir= "[storage2] nas-openfiler2"
config.log0= "vmware-1.log"
config.log1= "vmware.log"
folderpath= "/ha-folder-root/ha-datacenter/vm"
resourcepool= "/ha-folder-root/ha-datacenter/host/vmware.dom.tld/Resources"
It is recommended to back up the original catalog file somewhere first. Then compare the two files and note the changes.
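If you don't want to edit the file by hand, a couple of sed expressions can produce the modified copy and diff shows you exactly what changed. This is only a sketch based on the values shown above; adjust the patterns to your own datastore and machine names:
# keep the original catalog untouched, write the changes into catalog.new
sed -e 's/\[storage1\] nas-openfiler/[storage2] nas-openfiler2/g' \
    -e '/^display_name=/s/nas-openfiler/nas-openfiler2/' \
    catalog > catalog.new
# review the changes
diff catalog catalog.new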
The last step is to perform the restore operation with the vcbRestore command. Let's say the full backup of the virtual machine is in the directory /backup/nas-openfiler. The directory can be a local directory or one mounted from the NFS server. The original catalog file is catalog and the modified one is catalog.new. Let's restore the machine according to the new catalog:
vcbRestore -s /backup/nas-openfiler -a /backup/nas-openfiler/catalog.new
The -s parameter specifies the source directory where the backup is stored and the -a parameter specifies which catalog file to use. If everything works, the command should produce output like this:
[2008-10-17 11:00:21.644 'App' 3076444992 info]
Current working directory: /backup/nas-openfiler2
Converting "/vmfs/volumes/storage2/nas-openfiler2/nas-openfiler.vmdk" (VMFS (flat)):
0%=====================50%=====================100%
**************************************************
The machine was restored and you can see it in the VI client interface, or you can check it from the service console of the ESX host:
vmware-cmd -l | grep nas-openfiler2
It should print something like this:
/vmfs/volumes/483c.../nas-openfiler2/nas-openfiler.vmx
The new virtual machine nas-openfiler2 can be powered on now. Its contents are identical to the original - both machines are the same, they just live on different datastores. Final customization is a topic for another article.
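If you prefer to power it on straight from the service console instead of the VI client, something like this should do it (a small sketch; it assumes the grep above returns exactly one .vmx path):
VMX="`vmware-cmd -l | grep nas-openfiler2`"
vmware-cmd "$VMX" start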

Monday, October 27, 2008

VCB basic usage - VM full backup over NFS

Let's get some practice. Suppose we have an NFS server available in the network and we would like to back up virtual machines (VMs) from one of our ESX hosts directly to it, without using any specialized backup software.

I should not forget to mention that VCB is already available on your ESX host. The installed VMware-esx-backuptools package contains almost all of the commands mentioned before - vcbVmName, vcbMounter and vcbRestore. The vcbRestore utility is available only with ESX and it is used to restore a virtual machine from a full backup. The missing mountvm command, on the other hand, is available with VCB for Windows only. Keep in mind that the VCB commands on ESX are case-sensitive because the service console is based on Linux.

First, we need a running NFS server. The configuration is straightforward with any Linux distro: install the required packages, edit the /etc/exports configuration file and add the directory which will be used for VM backups, then start the NFS server and reconfigure the firewall to allow access to it (or simply stop the firewall). For details, check the related documentation. If you would like, I can write some more notes about it.
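Just for illustration (the subnet and export options here are my assumptions, adjust them to your environment), the /etc/exports entry and the commands on a Red Hat style distro could look like this:
# /etc/exports - export the backup directory to the ESX host's subnet
/backup/vm 192.168.1.0/24(rw,no_root_squash,sync)

# re-read the exports and start the NFS service
exportfs -ra
service nfs start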

So, let's have an NFS server with IP address 192.168.1.1 (a class C network). The exported directory is /backup/vm. Next, we need to permit the NFS client on the ESX host, because by default such outgoing connections from an ESX host are blocked. You can do it via the VI client or from the service console like this:
esxcfg-firewall -e nfsClient
You can check all the available services with:
esxcfg-firewall -s
To check if the nfsClient service was enabled, run this:
esxcfg-firewall -q nfsClient
If so, you will receive:
Service nfsClient is enabled.
Finally, we need a backup script whose only task is to back up the available VMs. The script can be scheduled on the ESX host via the cron service or from the NFS backup server - it's your choice (a sample crontab entry follows after the script description). The script follows:
#!/bin/sh

# NFS server, its exported directory and the local mount point
BACKUP_SERVER="192.168.1.1"
BACKUP_DIR="/backup/vm"
MOUNT_DIR="/backup/snap"

# make sure the mount point exists
[ -d "$MOUNT_DIR" ] || mkdir -p "$MOUNT_DIR" || exit 1

# list the display names of all VMs known to the host
VM_BACKUP="`vcbVmName -s any: | grep name: | cut -d':' -f2`"

if [ ! -z "$VM_BACKUP" ]; then
    # mount the exported directory from the NFS server
    mount "$BACKUP_SERVER:$BACKUP_DIR" "$MOUNT_DIR" || exit 1

    # full backup of every VM into its own subdirectory
    for VM in $VM_BACKUP; do
        vcbMounter -a name:"$VM" -r "$MOUNT_DIR/$VM"
    done

    umount "$MOUNT_DIR"
fi

exit 0
Now, a brief description of the script. At the beginning, a few variables are defined - the NFS server IP address, the exported directory and the local mount point. Then the available VMs are listed and saved in a variable, the exported directory from the NFS server is mounted, and the VMs are backed up with the vcbMounter command. Finally, the directory is unmounted. If you want to use the commands without specifying authentication credentials, you need to define them in the file /etc/vmware/backuptools.conf. Specifically, these parameters are required:
VCHOST=127.0.0.1
USERNAME=admin_user
PASSWORD=admin_user_password
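And as promised, if you want to schedule the script from cron on the ESX host, an entry in /etc/crontab could look like this (the script path and the time are just examples):
# run the VM backup every night at 01:00
0 1 * * * root /usr/local/sbin/vm-backup.sh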
So, the task of backing up virtual machines isn't that complicated. In the next article, I'm going to restore them with the vcbRestore command.

Virtualization leader

Will it be VMware? Or Microsoft? Or even Oracle? I don't think it is right to say it will be this company or that. But it is clear that we can already name the leaders of the current virtualization market. I am pleased to use a screenshot provided by Gartner for that:


The most interesting part of the screenshot compares the numbers of deployed virtual machines by virtualization technology. As we can see, VMware is still far ahead of the others. But have a look at VirtualIron or Oracle. Isn't that interesting?

As I don't know the source of the data used to produce the screenshot, I don't want to draw any grand conclusions. I can only say that VMware still rules and the others are catching up. But one thing is clear - solutions based on XEN are strong and they have great potential, don't they?

In my opinion, it would be really interesting to know the numbers for pure XEN installations - XEN in Linux distributions like SLES 10 or RHEL 5 and similar. Perhaps we would be very surprised! The more detailed article that prompted me to write this short note was published at itmanagement.earthweb.com.

Wednesday, October 22, 2008

Solaris 10 updates summary

I needed a quick list of the features available in each particular update of Solaris 10. As you may know, Solaris 10 was released in 2005. Since then, 5 updates have been released in total, each bringing new features to the OS. The sixth update might be released during October 2008. The following is my quick list of important features (to check which update a given system is running, see the example after the list):
  1. Solaris 10 1/06 (u1) - GRUB bootloader, iSCSI initiator, fcinfo command
  2. Solaris 10 6/06 (u2) - ZFS filesystem
  3. Solaris 10 11/06 (u3) - Solaris Trusted Extensions, LDoms
  4. Solaris 10 8/07 (u4) - full TCP/IP stack in zones, iSCSI target, branded zones (Linux in Solaris container), Samba AD, enhanced rcapd
  5. Solaris 10 5/08 (u5) - Intel SpeedStep, AMD PowerNow!, Solaris 8/9 P2V (to Solaris 10 zones), CPU capping
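To find out which update a particular system is running, the release file is enough; the first line contains the release string (e.g. "Solaris 10 8/07"):
cat /etc/release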

Monday, October 20, 2008

RHEL and Infiniband - advanced diagnostics - part three

Let's decompose the ibnetdiscover output a bit. The first paragraph begins with the Switch keyword. The switch has GUID 0x144f00006e9794. The channel adapters begin with the Ca keyword. Their GUIDs are 0x3ba0001003de4 (node node2) and 0x3ba0001007ba8 (node node1). The second one corresponds to the node displayed by ibstat above. You may have noticed that there are many numbers in square brackets. They identify the ends of the IB physical connections. Let's inspect them in more detail:
  • connections from switch to IB nodes (switch -> nodes)
    • switch port [6] is connected to the [2] channel of IB node node2
    • switch port [5] is connected to the [1] channel of IB node node2
    • switch port [4] is connected to the [2] channel of IB node node1
    • switch port [3] is connected to the [1] channel of IB node node1
  • connections from IB node node1 to switch (node -> switch)
    • the [1] IB channel is connected to switch port [3]
    • the [2] IB channel is connected to switch port [4]
  • connections from IB node node2 to switch (node -> switch)
    • the [1] IB channel is connected to switch port [5]
    • the [2] IB channel is connected to switch port [6]
Do you understand the logic of it? I think it's simple. And it is evident the IB connections are full-duplex in our scenario.

I'm going to skip the ibnodes command; its output is the same as without a running subnet manager. The next command, ibroute, produces the following nice forwarding table:
Unicast lids [0x0-0x5] of switch Lid 2 guid 0x00144f00006e9794 ():
Lid Out Port Destination Info
0x0001 003 : (Channel Adapter portguid 0x0003ba0001007ba9: 'node1 HCA-1')
0x0002 000 : (Switch portguid 0x00144f00006e9794: '')
0x0003 004 : (Channel Adapter portguid 0x0003ba0001007baa: 'node1 HCA-1')
0x0004 005 : (Channel Adapter portguid 0x0003ba0001003de5: 'node2 HCA-1')
0x0005 006 : (Channel Adapter portguid 0x0003ba0001003de6: 'node2 HCA-1')
5 valid lids dumped
It lists the assigned LIDs, the corresponding switch ports and the other ends of the connections. It's a classic routing table saying that LID X is reachable via switch port Y, with additional information about the entity owning that LID. For example, LID 1 is reachable via switch port 3 and it is the channel adapter of node node1.

To make the final decision whether the IB network is working, run the ibchecknet command. The output tells us that we have 2 working IB HCAs, 3 IB nodes (two with an HCA and one switch) and 8 working IB ports (physically only four, but the network is full-duplex in our scenario).
# Checking Ca: nodeguid 0x0003ba0001003de4
# Checking Ca: nodeguid 0x0003ba0001007ba8
## Summary: 3 nodes checked, 0 bad nodes found
## 8 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold
From now on, we have a working Infiniband network and we are able to do the following:
  • ibping the nodes natively (see the short example after this list)
  • ping the nodes over ipoib
  • run unmodified network applications over ipoib (e.g. NFS, FTP and so on)
  • run RDMA applications natively
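A minimal ibping example might look like this (the LID comes from the ibroute output above; the exact options can differ between OFED releases):
# on node2 - start the ibping responder
ibping -S

# on node1 - ping node2's first port by its LID
ibping 4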

Friday, October 17, 2008

VMware ESX vs ESXi updated

I summarized the main differences between VMware ESX and ESXi hypervisors in these two articles:
  1. Differences between VMware ESXi and ESX
  2. Technical differences between VMware ESXi and ESX
Additionally, the main source of information on the topic is the article published in the VMware knowledge base:
  1. VMware ESX and ESXi Comparison
This article was updated recently and contains the most up-to-date comparison of the hypervisors.

Thursday, October 16, 2008

RHEL and Infiniband - advanced diagnostics - part two

It has been almost two months since I began writing about advanced diagnostics of IB networks. At the end of that article, I suggested starting the IB subnet manager. So let's do it with the init script:
/etc/init.d/opensmd start
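If the subnet manager should survive a reboot, you can also enable the init script with the usual RHEL tooling (a small aside, not required for the tests below):
chkconfig opensmd on
chkconfig --list opensmd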
Now we are ready to compare the outputs of the commands from when the IB subnet manager wasn't running with the outputs now that it is running. There should be noticeable differences, because the IB network should now be fully initialized. First, let's see what new information the ibstat command shows us:
CA 'mthca0'
    CA type: MT25208 (MT23108 compat mode)
    Number of ports: 2
    Firmware version: 4.7.400
    Hardware version: a0
    Node GUID: 0x0003ba0001007ba8
    System image GUID: 0x0003ba0001007bab
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a6a
        Port GUID: 0x0003ba0001007ba9
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 3
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a68
        Port GUID: 0x0003ba0001007baa
The IB subnet manager is responsible for finishing the IB hardware initialization phase. Both ports of the HCA are now in the Active state and each has been assigned a Base lid, which is required for communication over the IB network. The IB subnet manager is clearly working, because an SM lid has been assigned as well. What about the other nodes in the network? Let's try the ibnetdiscover command. It should tell us something more:
vendid=0x144f
devid=0x0
switchguid=0x144f00006e9794
Switch 9 "S-00144f00006e9794" # "" base port 0 lid 2 lmc 0
[6] "H-0003ba0001003de4"[2] # "node2 HCA-1" lid 5
[5] "H-0003ba0001003de4"[1] # "node2 HCA-1" lid 4
[4] "H-0003ba0001007ba8"[2] # "node1 HCA-1" lid 3
[3] "H-0003ba0001007ba8"[1] # "node1 HCA-1" lid 1

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001003de7
caguid=0x3ba0001003de4
Ca 2 "H-0003ba0001003de4" # "node2 HCA-1"
[2] "S-00144f00006e9794"[6] # lid 5 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[5] # lid 4 lmc 0 "" lid 2

vendid=0x3ba
devid=0x6278
sysimgguid=0x3ba0001007bab
caguid=0x3ba0001007ba8
Ca 2 "H-0003ba0001007ba8" # "node1 HCA-1"
[2] "S-00144f00006e9794"[4] # lid 3 lmc 0 "" lid 2
[1] "S-00144f00006e9794"[3] # lid 1 lmc 0 "" lid 2
Do you remember the LID numbers from the uninitialized IB network? They were all zeroes, because the HCAs were uninitialized. Now, each channel has a unique LID. Next time, we are going to decompose this output.

Monday, October 13, 2008

VMware learned Hyper-V Quick Migration

Yes, the article headline is right. As you probably know, there are a lot of discussions about the difference between VMware VMotion and Microsoft Hyper-V Quick Migration. VMware VMotion is an enterprise-proven feature which allows hot migration of a running virtual machine among the ESX nodes forming a high availability cluster.

Hyper-V Quick Migration is much simpler. It suspends the machine, cold migrates it (the virtual machine's saved state) to another host and resumes it there. Do you see the difference now? Quick Migration requires some downtime depending on the size of the virtual machine state - mainly the size of its memory.

But the reason I began to write this article lies elsewhere. Mike DiPetrillo, a systems engineer working for VMware, has written a simple PowerShell script which provides this feature for VMware VirtualCenter. The only prerequisite is to install the VMware Infrastructure Toolkit for Windows. What does it mean for us? You don't have to pay for a VMotion license and you can still quick-migrate your virtual machines. Isn't this "poor man's VMotion" a cool tool? The script is published and described at Mike's blog.

Additionally, you can integrate the script into VirtualCenter with Icomasoft VI PowerScripter. The altered script compatible with VI PowerScripter is published at the Icomasoft forum. Go give it a try!


Tuesday, October 7, 2008

SLES10 update and SSL certificate problem

Have you ever needed to update a remote SLES10 system from your local update server (e.g. a YUP server)? There may be many reasons for such a situation. For example, the remote system may have unstable Internet connectivity to the Novell servers, or no direct connectivity at all and only the ability to reach your local update server via a VPN. You can imagine other situations, of course.

Let's suppose our update server is reachable from the remote site via the HTTPS protocol at the URL https://update.domain.tld/path/. The update source is of the YUM type and we want to update the system with the zypper command. First, we need to subscribe to the update server. If the update server's SSL certificate is signed by some well-known certification authority, then you don't have to worry. You can use the following command to add the update server to the update sources:
zypper subscribe https://update.domain.tld/path/update update
But if you generated your own certification authority or a self-signed server certificate, then you may see these errors:
Curl error for 'https://update.domain.tld/path/repodata/repomd.xml':
Error code:
Error message: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
The message is comprehensible: it says that the server certificate is untrusted and can't be verified against the known CA certificates. Simply put, the server certificate is signed by your own, untrusted CA or it is self-signed. The message only warns you that there may be an attempt at a man-in-the-middle attack.

The curl application uses a CA bundle to verify server certificates. The bundle is typically stored in the /usr/share/curl/curl-ca-bundle.crt file. If you want to make your own CA certificate trusted, concatenate its PEM content to the end of the file like this:
cat ca.crt >> /usr/share/curl/curl-ca-bundle.crt
After this command, everything will start working and the update server URL will be added to the update sources. Then the update may start:
zypper update
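By the way, if you want to double-check that your CA certificate really validates the server certificate, openssl can test it for you; look for "Verify return code: 0 (ok)" near the end of the output:
openssl s_client -connect update.domain.tld:443 -CAfile ca.crt < /dev/null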
I didn't mention that you will have a similar problem if you use the rug command. Even after applying the previous steps, the rug command still produces an error about SSL certificate verification failure. I suspect that rug doesn't use curl to access the update server. So, does anybody know how to resolve this in the case of rug?

Wednesday, October 1, 2008

VCB basic usage - VM file-level backup with vcbMounter

The performance of a full backup running over the LAN is not optimal because it requires copying the virtual machine disks locally, which may take some time. Using SAN or Hot-Add mode is far better in such situations.

File-level backup is more suitable for LAN networks because it doesn't export any disks. It mounts the virtual disk directly and you can access its filesystem without the mountvm command I described here. By the way, this applies to Windows OSes only, because VCB supports just the NTFS and FAT filesystems.

Let's take our previous virtual machine vcb-backup and try a file-level backup. We will use the same command, but with a different value of the -t parameter:
vcbmounter -h VC_IP -u VC_USER -p VC_PASS -a name:vcb-backup
-r c:\mnt\vcb-backup -t file -m nbd
The virtual disk will be mounted under the C:\mnt\vcb-backup directory in LAN mode. Successful mounting will print the following messages (some lines may be split due to their length):
Opened disk: vpxa-nfc://[STORAGE] vcb-backup/vcb-backup.vmdk@\
VC_IP!52 79 b4 1a d5 0a 84 31-fd 1c e3 fe f8 31 db ed
Proceeding to analyze volumes
Done mounting
Volume 1 mounted at c:\mnt\vcb-backup\digits\1 (mbSize=12291 \
fsType=NTFS )
Volume 1 also mounted on c:\mnt\vcb-backup\letters\C
Again, the NTFS filesystem is accessible via its drive letter. Now you can copy the files inside and back up the directory structure, but you can't delete anything. The reason is that you are not working with the virtual disk directly but with its snapshot called _VCB_SNAPSHOT_ (full backup with vcbMounter). Here is a screenshot from the Virtual Infrastructure Client proving it:

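Once the volume is mounted, the actual backup is nothing more than a plain file copy. For example, with the built-in xcopy (the destination path is made up, use whatever your backup procedure expects):
xcopy c:\mnt\vcb-backup\letters\C d:\backup\vcb-backup /E /H /K /Y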
When we are finished with the backup, we need to unmount it. This differs from unmounting an exported virtual disk because we need to delete the snapshot as well. This is achieved with the vcbmounter command and the -U parameter:
vcbmounter -h VC_IP -u VC_USER -p VC_PASS -U c:\mnt\vcb-backup
The output is similar to the one we have seen already:
Unmounted c:\mnt\vcb-backup\digits\1\ (formatted)
Deleted directory c:\mnt\vcb-backup\digits\1\
Deleted directory c:\mnt\vcb-backup\digits
Deleted directory c:\mnt\vcb-backup\letters\C\
Deleted directory c:\mnt\vcb-backup\letters
Deleted directory c:\mnt\vcb-backup

And that's all the magic. Do you remember how we had to export the virtual disk first and then mount it? The file-level backup is straightforward: you can bypass copying the virtual disks over the LAN and do the backup directly.

The conclusion is:
  1. Use the file-level backup where it is possible (Windows machines)
  2. Otherwise use the full backup (UNIX machines)