
Rescan All Hangs ESXi

I ran into an interesting problem the other day. I was brought into an environment that had a pair of ESXi 5.1 hosts connected to an iSCSI datastore. One host could see and access the datastore without issue, while the other host showed no datastore attached. According to the administrator, the datastore had been mounted on both hosts, and the environment had not been touched or changed in any way.
What was going on, and how can you fix it?

I started by trying to access the host in a variety of ways, including SSH, RVC, and PowerCLI. Every method resulted in a hung session (i.e., there appeared to be no way to access the system). After a long wait, roughly 20 minutes, the rescan operation finally completed, which allowed access back into the system. I immediately went to the log files to see what was going on. In /var/log/vmkwarning.log I found messages like the following:

Well, these messages clearly did not look good! It appeared that paths were having issues and that the iSCSI target was going offline. I confirmed on the storage array that the paths were successfully logged in and verified the general health of the array. When that checked out, I began to suspect the network. As you know, iSCSI uses TCP/IP to transfer data. To verify connectivity between the host and the array, I ran a vmkping:

This worked as expected. Next, I looked at the network configuration on the host. Knowing that vmk1 was the VMkernel interface used for storage traffic, I found the following:

I was now armed with two important pieces of information:

  1. vmk1 was on a distributed switch (VDS)
  2. vmk1 was configured with an MTU of 9000

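This meant that jumbo frames were intended to be used in the environment. Jumbo frames work properly only when the MTU is configured consistently end to end: on the VMkernel interface, the virtual switch, the physical switches, and the storage array. Next, I took a look at the VDS configuration: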

Well, look at that: the VDS MTU is set to 1500. After confirming the proper MTU on the upstream switches and storage array, I believed I had isolated the issue to the VDS. To confirm this was an MTU issue, I ran the vmkping command again, but this time with some additional flags (http://kb.vmware.com/kb/1003681):

Well, look there: jumbo packets do not appear to be passing properly, as we do not get a response. Actually, this is a little odd, as I would have expected to see the following:

In any case, let’s fix the MTU on the VDS and try again:

Voilà! Now you can perform your typical two Rescan All… operations, and you should be back in business.
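For reference, the two MTU checks in this post reduce to simple arithmetic. The minimal Python sketch below assumes IPv4 without header options. The largest ping payload that fits in one unfragmented packet is the MTU minus 28 bytes (a 20-byte IP header plus an 8-byte ICMP header). That is where the 8972-byte payload commonly used in jumbo-frame tests such as `vmkping -d -s 8972 <target>` comes from (`-d` sets don't-fragment, `-s` sets the payload size). The sketch also shows why a single 1500-byte hop breaks jumbo frames: the effective MTU of a path is the smallest MTU along it. The function names and the hop list are hypothetical and only mirror the setup described above.

```python
"""Hypothetical helpers illustrating the jumbo-frame MTU checks (IPv4 assumed)."""

IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def path_mtu(hops: dict[str, int]) -> int:
    """The effective MTU of a path is the smallest MTU of any hop on it."""
    return min(hops.values())


def hops_below(hops: dict[str, int], required_mtu: int) -> list[str]:
    """Names of hops whose MTU is too small to carry packets of required_mtu."""
    return [name for name, mtu in hops.items() if mtu < required_mtu]


if __name__ == "__main__":
    # Hypothetical path mirroring this post; the values are illustrative only.
    path = {
        "vmk1": 9000,
        "VDS": 1500,             # the misconfigured hop in this story
        "upstream switch": 9000,
        "storage array": 9000,
    }
    print(max_unfragmented_ping_payload(9000))  # 8972
    print(path_mtu(path))                       # 1500
    print(hops_below(path, 9000))               # ['VDS']
```

Running it prints 8972, 1500, and ['VDS'], pointing at the same hop that turned out to be misconfigured here.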

© 2013, Steve Flanders. All rights reserved.

Published in VMware
