Tag: ESX

ESX + NFS Datastores

Over the last week, I have been applying the latest patches to one of the VI3 environments I manage. While looking for potential problems, I noticed that a single ESX server had lost access to all of its NFS datastores. All other hosts in the cluster, which connect to the same NFS datastores, appeared to be connected properly. I restarted the management services on the node, hoping to fix the issue and continue with the upgrade. Unfortunately, restarting the management services had no effect. (Remember: restarting the management services should be one of the first steps and does solve a lot of VMware issues, but it is not the only step.) I verified that the host was configured properly and that no configuration changes had recently taken place. I also had the networking team verify that the switch ports were configured properly.
All checks came back normal, so what was going on?

Have you restarted your management services today? (Cont.)

In my last blog entry, I spoke about the importance of restarting management services when troubleshooting VMware ESX issues. One thing I have noticed is that if you SSH to an ESX host and restart the management services, you cannot cleanly exit the SSH session. To illustrate this point, SSH to a non-production ESX host and run the following commands:

# service mgmt-vmware restart
Stopping VMware ESX Server Management services:
VMware ESX Server Host Agent Watchdog                  [  OK  ]
VMware ESX Server Host Agent                           [  OK  ]
Starting VMware ESX Server Management services:
VMware ESX Server Host Agent (background)              [  OK  ]
Availability report startup (background)               [  OK  ]
# service vmware-vpxa restart
Stopping vmware-vpxa:                                  [  OK  ]
Starting vmware-vpxa:                                  [  OK  ]
# exit

You will notice that the management services restart successfully, but your terminal hangs when you try to exit. What causes this and how can you fix it?

Have you restarted your management services today?

There are two VMware ESX commands that every VMware ESX administrator should know and master:

  • service mgmt-vmware restart
  • service vmware-vpxa restart
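
As a rough sketch, the two commands above could be wrapped in a small shell function so they always run in the same order and stop at the first failure. The function name `restart_mgmt_services` and the `SERVICE_CMD` override are my own illustration (hypothetical, not part of ESX); the override only exists so the function can be dry-run on a machine without the `service` command.

```shell
#!/bin/sh
# Minimal sketch: restart both ESX management services in order.
# restart_mgmt_services and SERVICE_CMD are hypothetical names used
# for illustration only; they are not shipped with ESX.
restart_mgmt_services() {
    # Default to the real "service" command; set SERVICE_CMD=echo for a dry run.
    svc_cmd="${SERVICE_CMD:-service}"

    for svc in mgmt-vmware vmware-vpxa; do
        echo "Restarting $svc"
        if ! $svc_cmd "$svc" restart; then
            # Stop at the first failure so the problem is not masked.
            echo "Restart of $svc failed" >&2
            return 1
        fi
    done
}
```

Calling `restart_mgmt_services` on the service console would run both restarts; running it with `SERVICE_CMD=echo` set only prints the commands it would issue. As always, try it on a non-production host first.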

You may notice that for almost every VMware problem I blog about, the first troubleshooting step is restarting the management services. The reason is simple: it is the quickest and easiest way to fix the majority of ESX problems. I would compare it to restarting Windows in order to fix a Windows OS problem.
So what do these two services actually do?

Phantom VM

For those of you who do not know, I am a VMware fanatic. From time to time, I will be posting blog entries on discoveries I have made, problems I have resolved, and general knowledge I would like to share. Last week, an interesting problem was brought to my attention.
A colleague who was rebuilding an environment after a RAID array crashed due to multiple failed drives called me. This was a testing environment, so no monitoring was in place and no backups were kept. At the time, my colleague was redeploying software firewalls and ran into an issue where the firewalls refused to cluster together. After investigating the logs, it appeared that a duplicate IP was causing the problem. A network engineer traced the MAC address through the switch fabric and found the duplicate IP coming from a VM port group NIC on one of the ESX servers. All VMs were checked on that host, but none of them had the IP address in question. So where was the phantom VM?