
Wednesday, April 15, 2009

New Sun Fire servers with Xeon 5500

Sun has released a new line of servers and blade modules based on Intel Xeon 5500-series processors. The new pieces are:
  • Sun Fire X2270 (1RU, 1 or 2 CPUs)
  • Sun Fire X4170 (1RU, 1 or 2 CPUs)
  • Sun Fire X4270 (2RU, 1 or 2 CPUs, 16 2.5" disks)
  • Sun Fire X4275 (2RU, 1 or 2 CPUs, 12 3.5" disks)
  • Sun Blade X6270 (1 or 2 CPUs)
  • Sun Blade X6275 (4 CPUs)
The official announcement with additional details is published at www.sun.com.

Wednesday, April 1, 2009

VMware ESX and SATA controllers

For a long time, the VMware ESX hypervisor supported only internal SCSI drives. The third update of ESX introduced support for some SATA controllers such as the Intel ICH-7, and the newest fourth update adds support for the ICH-9 and ICH-10 chipsets as well. The same holds for the ESXi platform.

The big difference is which SATA mode is supported. For example, the ICH-7 chipset is supported in IDE/ATA mode only, so you can't use connected hard drives, only connected optical drives. The remaining chipsets are supported in AHCI (Advanced Host Controller Interface) mode, in which you can access internal SATA drives.

When IDE/PATA mode is used, you will be able to see internal SATA (or emulated PATA) drives, but you can't use them as VMFS storage. A VMFS filesystem can be created on SCSI-based disks only.
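If you are not sure which chipset and mode a particular box runs, the ESX service console is an ordinary Linux shell, so a plain lspci query is enough (a minimal sketch; the grep pattern is only an example):

lspci | grep -i -E 'sata|ide'

The controller typically shows up either as a SATA AHCI controller or as a legacy IDE interface, depending on the mode selected in the BIOS.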

There is a nice knowledge base article about the topic. To illustrate it, I borrowed an image from the article which is quite self-explanatory:


Monday, February 9, 2009

Aligning VMFS partition

Proper alignment of a filesystem on a disk partition may bring some I/O performance improvement. Typically, the reason is that a RAID device underneath the accessed disk stripes data in chunks of some defined size; the typical chunk size is 64KB. As you know, no partition is placed at the very beginning of a disk, because some metadata such as the MBR and the partition table has to be written there. It should be clear that the default alignment may therefore result in higher latency and thus lower throughput.
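A quick back-of-the-envelope check shows the gap in numbers (just a sketch in shell arithmetic, assuming 512B sectors and a 64KB stripe):

START=63                            # default first-partition start sector used by the installer
echo $(( (START * 512) % 65536 ))   # 32256 -> not on a 64KB boundary

A filesystem block written near a stripe boundary then spans two RAID stripes and needs two physical I/Os instead of one.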

The same holds for the VMFS filesystem, both versions 2 and 3. The general rule is to align a VMFS partition on a 64KB boundary. The problem is the default partition alignment done by the VMware ESX installer (or Red Hat Anaconda): it doesn't take alignment into account and simply lays out the disk partitions one after another. If you create a VMFS filesystem from the VirtualCenter client, it starts at 64KB. Here is the output of the fdisk -lu command from a testing system:
Disk /dev/sda: 146.6 GB, 146685296640 bytes
255 heads, 63 sectors/track, 17833 cylinders, total 286494720 sectors
Units = sectors of 1 * 512 = 512 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 63 208844 104391 83 Linux
/dev/sda2 208845 10442249 5116702+ 83 Linux
/dev/sda3 10442250 281105369 135331560 fb Unknown
/dev/sda4 281105370 286487144 2690887+ f Win95 Ext'd (LBA)
/dev/sda5 281105433 282213854 554211 82 Linux swap
/dev/sda6 282213918 286294364 2040223+ 83 Linux
/dev/sda7 286294428 286487144 96358+ fc Unknown

Disk /dev/sdb: 128.8 GB, 128849018880 bytes
255 heads, 63 sectors/track, 15665 cylinders, total 251658240 sectors
Units = sectors of 1 * 512 = 512 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 128 251658224 125829048+ fb Unknown
The first disk, /dev/sda, is the internal one and was partitioned by the ESX installer. The VMFS partition has ID fb. The second disk was initialized from VirtualCenter and belongs to an external disk array. Its starting sector is 128, so it is aligned: 128 x 512B (sector size) = 64KB. The VMFS partition on /dev/sda is not aligned, because 10442250 divided by 128 is not an integer.
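If you want to check all partitions at once, the same arithmetic can be run over the fdisk output (a quick sketch; it assumes 512B sectors and the 64KB boundary discussed above):

fdisk -lu /dev/sda /dev/sdb | awk '/^\/dev/ { s = ($2 == "*") ? $3 : $2; print $1, ((s % 128) ? "not aligned" : "aligned to 64KB") }'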

There is no non-destructive way to realign sub-optimally aligned VMFS partitions; you need to recreate the partitions from scratch. That means backing up the ESX system and the VMFS filesystems, realigning the partitions, and restoring the backup.

It is not guaranteed that every disk or disk array has its alignment boundary at 64KB; you should check the documentation of your particular system. Still, 64KB is a good starting point and the most common value. The question is whether it is worthwhile at all, because the average performance benefit is around 10%.

I drew on a more comprehensive guide about the topic published at www.vmware.com. It contains details about the test environment, guest filesystem alignment, and the steps for laying out partitions with fdisk, so read it if you are interested.
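Just to give an idea, laying out an aligned VMFS partition with fdisk goes roughly like this (my own sketch, not a verbatim copy of the guide; it is destructive, so back everything up first, and /dev/sdb is only an example):

fdisk /dev/sdb
  n           -> create a new primary partition, accept the defaults
  t, fb       -> set the partition type to fb (VMFS)
  x           -> switch to expert mode
  b, 1, 128   -> move the beginning of partition 1 to sector 128 (= 64KB)
  w           -> write the partition table and quit

The partition can then be formatted as VMFS, for example from the VirtualCenter client as mentioned above.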


Tuesday, February 3, 2009

Licensing open source

I hesitated for a while before writing this article because it doesn't fit any type of article I have published before, and discussing open source licenses isn't my primary business here. The thing is, it is useful to understand their role, but it is often quite difficult to grasp what they actually want to say. Sometimes I have a feeling you need a law degree to understand them.

You know the obvious questions like "why does it have to be GPLed?", "why is this license not compatible with that one?" or "why can't it be part of the Linux kernel?". You know that an open source license ensures the availability of source code which you can modify and redistribute. The true pitfalls begin to appear when you would like to integrate two products available under two different licenses. To make things clearer, I borrowed these two comprehensive schemes from chandanlog at Sun blogs. The first one shows the general attitude of open source licenses and a classical EULA towards source code. The second one explains the differences between open source licenses; they are quite minor but may have consequences that are easy to overlook.

Let's try to apply the licensing rules to the problem of releasing the ZFS filesystem with the Linux kernel. What's the problem? First, Sun owns some patent rights which prohibit such a move. Second, since the Linux kernel is under the GPL, anything included in it has to be under the GPL as well, while ZFS is covered by the CDDL license, which requires that the CDDL be preserved. That is where I see the main source of the incompatibility. On the other hand, there are binary-only modules such as the video drivers from ATI or NVIDIA which are linked with the kernel through some sort of GPLed open source wrapper, so why can't we do the same with ZFS? The question is whether it would be legal.

The two practical schemes helped me understand the topic more deeply. The ZFS example made the situation look complicated, and I needed something to show me that it is not. I hope you will find these graphical explanations as useful as I did. And check out chandanlog, who created them!

Friday, July 11, 2008

Sun released new servers and storage arrays

We had to wait a few months for the upcoming AMD Opteron servers from Sun after the new quad-core AMD processors, code-named Barcelona, were released. A few days ago, Sun officially announced here the availability of the second generation of their AMD servers and of new storage arrays, together called the "next generation open storage hardware". More about the Open Storage hardware and related projects can be found here.

The new servers based on quad-core AMD processors are the Sun Fire X4140, X4240 and X4540. On the storage side, the new Sun Storage J4200, J4400 and J4500 arrays were introduced; all of them are SAS JBOD arrays. For more details, look at the Sun System Handbook at SunSolve.

Tuesday, July 1, 2008

RHEL and Infiniband - hardware intro

In my two previous articles, I summarized a few facts about Infiniband support in RHEL distros and the included protocols - you can go through them via the following links - RHEL and Infiniband support and Infiniband, RDP, SDP.... Let's get more specific now.

My scenario was based on two Sun Fire X4200 M2 servers and one Infiniband (IB) switch, the Sun IB Switch 9P. The servers had Sun Dual Port 4x IB host channel adapters (HCA) installed so they could communicate over the IB fabric. The switch provides nine IB-compliant ports at dual speeds of 4X/12X, which means each port can deliver 10/30Gbit of raw bandwidth. What surprised me was that the switch management is the same as on the Sun SPARC midrange servers. Yes, it is ALOM, and it is perfect because you can use the same interface and similar commands you are used to. By the way, the switch chassis looks like a regular Sun server.

The switch is equipped with an IB subnet manager (SM), which is required to initialize the IB hardware and to allow communication over the IB fabric. Each IB subnet has to have at least one SM, and each subnet has an unambiguous identifier (ID) within the fabric; to be complete, the fabric comprises the defined subnets. In my opinion, the IB SM works a bit like an ARP cache and a DHCP server do in LANs. Each HCA in a fabric is globally identified by a so-called node GUID, which is like a WWN in FC or a MAC address in a LAN; the switch has its own GUID as well, and the ports of an HCA have so-called port GUIDs. Now, when one HCA or its port wants to communicate with another one in the subnet, it needs to have some network address assigned. This address is called the LID, or local identifier, and the IB SM is in charge of assigning it to the members of the subnet. The conclusion is that LIDs are valid inside the subnet only, while GUIDs are routable across the subnets of the fabric.
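If you want to see these identifiers on a RHEL host with the OFED stack installed, the usual diagnostic tools print them (a quick sketch; the tools come from the libibverbs-utils and infiniband-diags packages):

ibv_devinfo   # node GUID and per-port details of the local HCA
ibstat        # port state, port GUID and the base LID assigned by the SM
ibhosts       # node GUIDs of the channel adapters discovered in the fabric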

But one thing confused me a bit: when you configure the switch, you need to remember to set its blueprint, otherwise you are asking for trouble. I'm going to write about it in the next part.

Wednesday, June 4, 2008

VMware ESX server 3.x snapshots - capacity planning

So, as we said, snapshots are really useful. But when you take one, you need to keep in mind a few rules which should help you maintain it.

First, it is useful to know the behaviour of our virtual machine. Maybe it is a fast-growing one, which means the snapshot will grow fast as well. The question is how fast. Let's think about a virtual machine whose virtual disk is 10GB. Its snapshot can be as large as the disk, but not larger, because only 10GB of data can be modified. So, at worst, the virtual machine's size on disk may double.

But you need to remember another thing. If you take a snapshot of the virtual machine's memory (RAM), it needs some disk space as well (it has its own delta file). Its size depends, naturally, on the size of the memory and on the values assigned to the memory reservation and limit parameters. Again, the snapshot may be as large as the memory, but not larger. So, if you configure the machine with 1GB of memory, it is wise to have 1GB of extra disk space.

How do the reservation and the limit influence it all? Any memory allocated by the virtual machine that is above the reservation and below the limit can be stored in the virtual machine's swapfile if no free physical memory is available. The swapfile is, by default, stored in the virtual machine's datastore, and because the reservation is zero by default, the swapfile has the same size as the memory. So we need another 1GB of extra disk space. Overall, we theoretically need 12GB of free disk space.
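Put as simple arithmetic (the figures from the example above; worst case, with the default zero reservation):

DISK_GB=10         # virtual disk -> the snapshot delta can grow up to this size
MEM_GB=1           # configured RAM -> size of the memory snapshot file
RESERVATION_GB=0   # default reservation -> swapfile = memory - reservation
echo $(( DISK_GB + MEM_GB + (MEM_GB - RESERVATION_GB) ))   # 12GB of extra free space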

Let's summarize the rules:
  1. a fast-growing virtual machine means a fast-growing snapshot - e.g. RDBMS or SMTP servers
  2. slowly growing machines have slowly growing snapshots - e.g. firewalls
  3. snapshots can't be larger than the virtual machine's virtual disks
  4. don't forget about the virtual machine's memory and swapfile
  5. don't forget about snapshots taken long ago, because an old snapshot will take much longer to commit its changes