Another interesting VMware issue came up this week. I was reconfiguring some ESX hosts, which meant putting them into maintenance mode. As it turns out, one VM was located on the local storage of an ESX host, so I attempted a Storage vMotion to let the maintenance mode request succeed. At about 20% completion, vCenter Server displayed the following error message: “The virtual disk is either corrupted or not a supported format.”
How do you troubleshoot this problem?
Well, as it turns out, the VM was powered on and working properly, so I could not understand how the virtual disk could be corrupted or in an unsupported format. I first powered off the VM, which may not be an option for a production VM, and performed a regular migration. Unfortunately, the same issue occurred. I then tried the following:
- Cloning — same issue
- Convert to template and deploy — same issue
- Use Converter — same issue
- Reboot the ESX host serving the VM, which may not be an option for production systems — same issue
At this point I was getting frustrated. Time to open an SR, you might say. The problem, which I forgot to mention, is that this VM ran an unsupported guest OS: OpenBSD. So I decided to go back to basics and check the VMware logs, where I discovered the following:
# grep -i error /var/log/vmkernel | grep Feb
Feb 23 20:13:31 esx02 vmkernel: 0:01:01:15.474 cpu2:1039)Fil3: 4995: READ error 0xbad000a
Feb 23 20:25:37 esx02 vmkernel: 0:01:13:21.268 cpu0:1040)Fil3: 4995: READ error 0xbad000a
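To see at a glance how often these errors show up, the grep above can be extended to count READ errors per day. This is only a minimal sketch: the function name count_read_errors is made up for illustration, and it assumes the standard syslog timestamp layout seen in the output above.

```shell
# Count vmkernel READ errors per day.
# count_read_errors is a hypothetical helper name, not a VMware tool.
# Usage: count_read_errors [logfile]   (defaults to /var/log/vmkernel)
count_read_errors() {
    log="${1:-/var/log/vmkernel}"
    # Fields 1 and 2 of a syslog line are the month and the day.
    grep -i 'READ error' "$log" |
        awk '{ count[$1 " " $2]++ } END { for (day in count) print day, count[day] }'
}
```

Each output line is a date followed by the number of matching entries. When several days are present, awk prints them in no particular order.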
Now I was beginning to think something was definitely wrong with either the ESX host's local datastore or the VMDK file. The local datastore consisted of a single hard drive without RAID, so it was possible the drive was failing or had bad sectors. However, the health status reported by the ESX host looked good. Next, I copied another VM to the local datastore and tried a migration. This migration worked without issue, so I concluded that other parts of the disk were working normally. Based on this, I decided to try copying the VM directory by hand:
# cp -r /vmfs/volumes/esx02:storage1/dns02 /vmfs/volumes/esx02:storage2/
cp: reading `/vmfs/volumes/esx02:storage1/dns02/dns02-flat.vmdk': Input/output error
This again confirmed the issue was either hard drive or VMDK related. While this problem is listed in the communities, KB articles, and patches, they all seem to apply to ESX 3.0.x. Unfortunately, the best solution I could find was manually copying the files from inside the VM (i.e. a file-based copy instead of a block-based copy). I believe this was successful because, while the VM had a 10GB virtual disk, only 2GB of space was actually in use, so the unreadable blocks were presumably in an area that held no files.
I have kept this VM around so if anyone has suggestions on how to fix the issue, I would be extremely interested.
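One block-level approach that might be worth trying is dd with conv=noerror,sync, which carries on past read errors and writes zeros in place of the blocks it cannot read. I have not verified this against the failing VMDK or on VMFS, so treat it as a sketch under those assumptions. The helper name salvage_copy is made up for illustration.

```shell
# Block-level copy that keeps going past unreadable blocks.
# salvage_copy is a hypothetical helper name used only for this sketch.
# Usage: salvage_copy source-file destination-file
salvage_copy() {
    src="$1"
    dst="$2"
    # conv=noerror: continue after a read error instead of aborting.
    # conv=sync: pad every short or failed input block with zeros up to bs,
    #            so later data stays at the same offsets in the copy.
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}
```

A small block size limits how much data is zeroed around each bad spot, at the cost of a slower copy. Because of the padding, a source whose size is not a multiple of the block size ends up slightly larger in the copy. Any zeroed block that held live data would still be lost, so the guest filesystem should be checked afterwards.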
© 2010, Steve Flanders. All rights reserved.