
Tag: HA

PDL + New HA Settings in vSphere 5.0 U1

I was recently reading the VMware vSphere Metro Storage Cluster Case Study published May 2012 available here. One section that caught my attention stated (page 18):

Two advanced settings have been introduced in VMware vSphere 5.0 Update 1 to enable vSphere HA to respond to a PDL condition. The first setting, disk.terminateVMOnPDLDefault, is configured on a host level in /etc/vmware/settings and should be set to True by default. This is a per-host setting, and the host requires a reboot for it to take effect. This setting ensures that a virtual machine is killed when the datastore on which it resides enters a PDL state. The virtual machine is killed as soon as it initiates disk I/O on a datastore that is in a PDL condition and all of the virtual machine files reside on this datastore. If virtual machine files do not all reside on the same datastore and a PDL condition exists on one of the datastores, the virtual machine will not be killed. VMware recommends placing all files for a given virtual machine on a single datastore, ensuring that PDL conditions can be mitigated by vSphere HA. VMware also recommends setting disk.terminateVMonPDLDefault to True. A virtual machine is killed only when issuing I/O to the datastore. Otherwise, it remains active. A virtual machine that is running memory-intensive workloads without issuing I/O to the datastore might remain active in such situations.
The second setting is a vSphere HA advanced setting called das.maskCleanShutdownEnabled. It was introduced in VMware vSphere 5.0 Update 1 and is not enabled by default. It must be set to True on vSphere HA cluster(s). This setting enables vSphere HA to trigger a restart response for a virtual machine that has been killed automatically due to a PDL condition. This enables vSphere HA to differentiate between a virtual machine that was killed due to the PDL state and a virtual machine that has been powered off by an administrator.
VMware recommends setting das.maskCleanShutdownEnabled to True to limit downtime for virtual machines residing on datastores in a PDL condition. When das.maskCleanShutdownEnabled is not set to True and a PDL condition exists while disk.terminateVMonPDLDefault is set to True, virtual machine restart will not occur after virtual machines have been killed. This is because vSphere HA will determine that these virtual machines have been powered off or shut down manually by the administrator.
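The interaction between the two settings is easier to see as a small decision model. The sketch below encodes only the conditions stated in the case study excerpt above. It is a simplified illustration, not VMware's implementation. The names VmState, vm_killed_on_pdl and ha_restarts are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    # Hypothetical model: datastores holding any of the VM's files
    files_on_datastores: set
    # Datastores the VM is currently issuing disk I/O to
    issuing_io_to: set


def vm_killed_on_pdl(vm, pdl_datastores, terminate_on_pdl):
    """Simplified model of disk.terminateVMOnPDLDefault as described in the excerpt."""
    if not terminate_on_pdl:
        return False
    # The VM is only killed if ALL of its files live on one datastore...
    if len(vm.files_on_datastores) != 1:
        return False
    (ds,) = vm.files_on_datastores
    # ...that datastore is in PDL, and the VM actually issues I/O to it.
    return ds in pdl_datastores and ds in vm.issuing_io_to


def ha_restarts(killed, mask_clean_shutdown):
    """Without das.maskCleanShutdownEnabled, HA treats the kill as an admin power-off."""
    return killed and mask_clean_shutdown


# All files on ds1, ds1 in PDL, VM doing disk I/O: killed, but not restarted
vm = VmState(files_on_datastores={"ds1"}, issuing_io_to={"ds1"})
killed = vm_killed_on_pdl(vm, pdl_datastores={"ds1"}, terminate_on_pdl=True)
print(killed, ha_restarts(killed, mask_clean_shutdown=False))  # True False
```

In this model, a VM whose files span two datastores, or one that is doing only memory work with no disk I/O, is never killed. Restart also requires both settings to be enabled, which matches the recommendation in the excerpt.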

A couple things stood out to me:

Cannot complete the configuration of the HA agent on the host

I enabled HA on a new cluster the other day and one of the hosts came back with the following error:

Cannot complete the configuration of the HA agent on the host. Misconfiguration in the host network setup.

On occasion, I have seen weird HA errors where simply selecting ‘Reconfigure for HA’ on the host fixed the issue. I tried this, but the same error returned. I next selected the host and went to Tasks & Events – Events. From there, I found the following error message:

HA agent on <host> in cluster <cluster> in <datacenter> has an error: Cannot complete the HA configuration.

Selecting the message and clicking Show under Related Events displayed:

Host <host> has the following extra networks not used by other hosts for HA communication: <IP>,.
Consider using HA advanced option das.allowNetwork to control network usage.
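The message suggests HA compares the networks each host would use for HA communication and flags any network that only one host has. The sketch below shows that kind of comparison. It is not vCenter's actual logic. The host names, addresses and function names are hypothetical, and the model assumes each VMkernel address maps to a network through its subnet mask.

```python
import ipaddress


def ha_networks(host_ifaces):
    """Map each host to the set of IPv4 networks of its (hypothetical) HA-eligible VMkernel IPs."""
    return {
        host: {ipaddress.ip_interface(cidr).network for cidr in cidrs}
        for host, cidrs in host_ifaces.items()
    }


def extra_networks(host_ifaces):
    """Return, per host, networks that no other host in the cluster has."""
    nets = ha_networks(host_ifaces)
    result = {}
    for host, own in nets.items():
        # Union of every other host's networks (empty set for a one-host cluster)
        others = set().union(*(n for h, n in nets.items() if h != host))
        extra = own - others
        if extra:
            result[host] = extra
    return result


# Hypothetical cluster: esx03 has a second VMkernel network the others lack
hosts = {
    "esx01": ["10.0.10.11/24"],
    "esx02": ["10.0.10.12/24"],
    "esx03": ["10.0.10.13/24", "10.0.20.13/24"],
}
print(extra_networks(hosts))  # {'esx03': {IPv4Network('10.0.20.0/24')}}
```

Because the model derives networks from address plus mask, two interfaces with the same IP range but different subnet masks would also count as different networks here.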

Looking at the VMkernel interface, everything appeared to be configured correctly. I ensured the IP configuration was correct and that no duplicate IP issue was being experienced. So what was causing the problem?