Tuesday, August 12, 2008

VMware ESX 3.5 Update 2 and the virtual machine power-on bug?

What a coincidence! My colleague and I were preparing a server with VMware Infrastructure Update 2 yesterday. Just another simple scenario, we thought. We only wanted to check whether it is suitable for production use. Everything went smoothly at first. But when my colleague started his part, things went wrong. His task was easy: install some new virtual machines in the prepared environment and check their behaviour (the details aren't important now).

When he tried to power on a prepared virtual machine to begin the guest installation, the VMware Infrastructure client displayed the following error message:
  • A general system error occurred: Internal error
Not very informative, is it? No clue where to go. The virtual machine stays powered off. Any machine that is already running keeps running until you suspend it or power it off. So,
  • don't power off or VMotion your virtual machines!!!
Otherwise, you will be in big trouble! Naturally, I checked the virtual machine's vmware.log log file. There I found the "virtual" reason for this strange behaviour:
  • [msg.License.product.expired] This product has expired.
  • Be sure that your host machine's date and time are set correctly.
  • There is a more recent version available at ...
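If you want to check several virtual machines for the same symptom, the log search can be scripted. The following is only a minimal sketch in Python: the message id is the one from the excerpt above, while the function name and the sample log text are made up for illustration (a real vmware.log contains timestamps and many more lines).

```python
# Message id taken from the vmware.log excerpt above.
EXPIRED_MARKER = "msg.License.product.expired"


def find_expired_lines(log_text):
    """Return (line_number, line) pairs that mention the expiry message."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if EXPIRED_MARKER in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample; not copied from a real log file.
sample_log = """\
vmx| some unrelated line
vmx| [msg.License.product.expired] This product has expired.
vmx| Be sure that your host machine's date and time are set correctly.
"""

for number, line in find_expired_lines(sample_log):
    print(number, line)
```

For a real check, read the contents of the virtual machine's vmware.log and pass the text to find_expired_lines.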
Hm, the correct time is often a critical part of any installation, but my habit is to configure an NTP server wherever possible. I did so in this scenario as well. I checked the server time and nothing was wrong. The ESX host is licensed properly. When I tried to search the VMware knowledge base, there was no answer at all. And you need some luck to reach it today, because it is really, really overloaded. Maybe the others are dealing with the same problem.

Because my colleague needed to work, my last resort was to move the time backwards. I installed the server two days ago, on August 10, and everything was working fine. My colleague began working this morning, and since then no virtual machine could be powered on. Today is August 12. So I moved the time back and it helped! Of course, NTP on the host has to be shut down first, otherwise it would correct the time again. You can do it via the VI client, or you can log in to the service console and use the date -s command. I'm not aware of any other working solution right now.
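The service console steps of the workaround described above can be sketched as a small dry run in Python. It only builds and prints the commands and does not execute anything. The date -s command is the one I used; the ntpd service name and the helper function are assumptions for illustration, so check what your host actually runs before typing anything.

```python
from datetime import datetime


def rollback_commands(target):
    """Build the service-console commands for the workaround as strings (dry run)."""
    stamp = target.strftime("%Y-%m-%d %H:%M:%S")
    return [
        "service ntpd stop",      # assumption: the NTP daemon is managed as 'ntpd'
        'date -s "%s"' % stamp,   # set the clock back to the chosen moment
    ]


# August 10, 2008 was the last day on which everything worked in my case.
for command in rollback_commands(datetime(2008, 8, 10, 12, 0, 0)):
    print(command)
```

Printing the commands first keeps the change deliberate: moving the clock on a production host is not something to do by accident.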

This afternoon, I found the first article about this annoying bug at www.virtualization.info. It says that the VMware knowledge base seems to have collapsed due to this issue and that the support team is promising a solution within 36 hours. By the way, Update 2 can't be downloaded right now.


5 comments:

Martin Vincenc said...

if I could turn back time.. tada tada :)

http://truthhappens.redhatmagazine.com/2008/08/12/date-bug-kills-vmware-systems/

Martin Vincenc said...

Anyway, this workaround has 2 serious impacts:
1] You break compliance.
2] This workaround has a number of very serious side effects that could impact production environments. Any virtual machines that sync time with the ESX host and serve time-sensitive applications would be broken. These include, but are not limited to, database servers, mail servers, and domain administration systems.

The final patch will probably come late this week. I don't feel I can trust the express patches (they just came out today); I will wait until they re-issue them.

David Sumsky said...

The official knowledge base article about the problem is here.
Finally, VMware has released some express patches to correct the problem, download them from here.

David Sumsky said...

I would like to point out that the released patches work. The NTP server is running, the ESX server has the right time set and the virtual machines are live again.