Tag: ESXi

Host is not responding

I was recently deploying a Cloud Foundry instance and was experiencing errors during the deployment. From the several failed deployments, I received the following BOSH error messages:

Error 100: Unable to communicate with the remote host, since it is disconnected.

mysql_node/54: Unable to communicate with the remote host, since it is disconnected.

Error 100: A general system error occurred: Server closed connection after 0 response bytes read; SSL(TCPClientSocket(this=000000000de625e0, state=CONNECTED, _connectSocket=TCP(fd=-1), error=(null)) TCPStreamWin32(socket=TCP(fd=23280) local=10.23.6.17:59642, peer=10.17.0.156:443))

Looking at the events on the ESXi host I saw the following:
Host is not responding
What was causing the issue and how can it be fixed?

Hung Server/VMs post ESXi 5.0 upgrade

I have a home lab running vSphere on some PowerEdge T110 servers. My environment was running 4.1, but I recently (6 months ago!) decided to upgrade to 5.0. After the upgrade, VMs on a single server started to become inaccessible. I attempted to log in to Tech Support Mode on the ESXi server and noticed that once I typed in my password the server hung and never returned. A reboot resolved the issue, but it kept recurring. My other servers were working without issue, so I first looked at hardware. The server experiencing the issue reported no hardware problems and the ESXi logs looked relatively clean.
So what was going on?

ESXi LUN ID Maximum

The VMware Configuration Maximums document is something I reference quite often. One configuration maximum that became relevant for me this week was under ESXi Host Maximums – Storage Maximums – Fibre Channel: LUN ID. According to the document the maximum LUN ID is 255, but what does that mean? Does it mean that you can have a maximum of 255 LUN IDs, or that the highest LUN ID number allowed is 255?
For those who know the answer, let me explain where my confusion came from:

  1. Two items above LUN ID in the Configuration Maximums document is ‘LUNs per host’. The maximum for ‘LUNs per host’ is 256. Like most numbering in Linux (e.g. array indices), LUN IDs start at 0. This means LUN IDs 0 to 255 are valid and would total 256, the maximum number of ‘LUNs per host’.
  2. Looking at the storage side, a very important piece of information is the maximum number of LUNs per storage system. For an EMC VNX7500, the maximum number of LUNs (including private LUNs) is 8192. Since every LUN has to have a unique LUN ID, this means that on a VNX7500, at a minimum, the LUN IDs 0 to 8191 must be valid.
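The arithmetic behind both points can be sketched in a few lines of Python. This is only an illustration: the helper name lun_id_range is hypothetical, and it assumes LUN IDs are handed out contiguously starting at 0, which is the same assumption made in the list above.

```python
def lun_id_range(lun_count, first_id=0):
    """Return the contiguous range of IDs needed to address lun_count LUNs.

    Assumption: IDs are assigned contiguously, starting at first_id (0 here).
    """
    return range(first_id, first_id + lun_count)


# 256 LUNs per host (the 'LUNs per host' maximum quoted above)
host_ids = lun_id_range(256)
# 8192 LUNs on the array (the VNX7500 figure quoted above)
array_ids = lun_id_range(8192)

# Each line shows: lowest ID, highest ID, number of IDs
print(host_ids[0], host_ids[-1], len(host_ids))
print(array_ids[0], array_ids[-1], len(array_ids))
```

In other words, a count of N LUNs numbered from 0 always ends at ID N - 1, which is why a count of 256 and a highest ID of 255 describe the same range.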

So why was I looking at this maximum in the first place?

A general system error occurred internal error vmodl.fault.HostCommunication

I am in the process of building my home lab. I recently purchased two servers and installed ESXi 4.1 on them. In addition, I deployed a test vCenter Server instance so I could run VUM. With vCenter Server up, I attempted to add the two ESXi servers. The first one added without issue, but the second one failed with the error messages:

Cannot complete the configuration of the HA agent on the host. See the task details for additional information.

Misconfiguration in the host network setup

I verified that the hosts were in fact configured identically and then tried to add the host again, but the same error messages were displayed. Based on the error messages, I found the following KB article: http://kb.vmware.com/kb/1019200. Unfortunately, the article did not help.
Next, I removed the ESXi host from vCenter Server and tried to re-add it. This time I got a different error message:

A general system error occurred internal error vmodl.fault.HostCommunication

From this error message I found KB article: http://kb.vmware.com/kb/1012154. This article pointed to name resolution (i.e. DNS) being my issue. I know of the importance of DNS with VMware products and was sure I had verified its configuration, but decided to double check. As suspected, DNS was configured and working as expected.
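For anyone wanting to repeat the DNS double check, a forward and reverse lookup can be sketched in Python as below. This is not the procedure from the KB article, just a minimal sketch using the standard socket module; the function name check_name_resolution and the example hostname are hypothetical, and the comparison assumes you pass in the fully qualified name.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP and back again.

    Returns (ip, reverse_name, consistent). The forward/reverse parameters
    default to the real resolver and exist only so the check can be
    exercised without a live DNS server.
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    # Compare case-insensitively and ignore a trailing dot.
    # Assumption: hostname is an FQDN; a short name would be flagged as a mismatch.
    consistent = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, reverse_name, consistent


if __name__ == "__main__":
    # Hypothetical hostname; replace with the ESXi host's FQDN.
    try:
        print(check_name_resolution("esxi02.lab.local"))
    except OSError as err:
        print("lookup failed:", err)
```

Running it from the vCenter Server machine against each ESXi host name shows whether the name resolves and whether the reverse record points back to the same name.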
At this point, I decided to restart the management services as that fixes a majority of ESX(i) issues. Upon doing so and trying to add the ESXi server to vCenter Server, I received another new error message:

Unable to access the specified host, either it doesn’t exist, the server software is not responding, or there is a network problem

This error message pointed me to KB article: http://kb.vmware.com/kb/1003409.
Again I tried everything suggested and was still receiving the same error message. At this point, I was frustrated. I decided to reboot the server just in case that fixed the issue. Upon restarting, the error message went back to the vmodl.fault.HostCommunication one.
What was going on and how could this be fixed?

Configuring syslog on ESXi

I was assigned an interesting problem a few weeks back. A customer had requested that all ESXi servers have syslog configured in order to troubleshoot a potential bug in ESXi. A technician was assigned the case and configured all ESXi hosts to point to the syslog server on the standard port. The problem was the logs were not being seen on the syslog server. I was asked to figure out why the configuration was not working as expected.
In our particular case, all hosts pointed to a syslog VIP, which was responsible for load balancing syslog requests to a pool of syslog servers. Initially, I checked the load balancer to see if the syslog traffic was making it to the VIP. As it turned out, it was. After confirming the load balancer was working as expected, I began to suspect the configuration on the syslog servers. The only thing that I could think of which would prevent the syslog server from accepting syslog messages from the ESXi hosts was ACLs. Looking at the syslog configuration, I confirmed that the ACLs were set to allow traffic from the VMkernel VLAN configured for management traffic.
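A quick way to test a path like this from any machine on the management VLAN is to send a single hand-built syslog message and watch for it on the other side. The sketch below is only an illustration: it assumes plain UDP syslog on port 514 (the post only says "the standard port"), and the function names and the VIP address in the usage note are hypothetical.

```python
import socket


def build_syslog_message(text, facility=1, severity=5):
    """Build a minimal BSD-style syslog payload: '<PRI>text'.

    PRI is facility * 8 + severity; the defaults (user, notice) give 13.
    """
    pri = facility * 8 + severity
    return f"<{pri}>{text}".encode("utf-8")


def send_test_syslog(host, port=514, text="esxi-syslog-test: hello"):
    """Send one UDP datagram to host:port and return the bytes that were sent.

    Assumption: the receiver accepts plain UDP syslog; UDP gives no delivery
    confirmation, so success here only means the packet left this machine.
    """
    payload = build_syslog_message(text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, send_test_syslog("192.0.2.10") would send the probe to a VIP at that (made-up) address; if the message shows up at the load balancer but not in the logs on the syslog servers, the problem is behind the VIP rather than on the sending host.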
Why were the syslog messages not being received?

A general system error occurred: internal error

I recently tried to export the system logs from an ESXi host via the vSphere client. Instead of receiving the generated bundle, the host returned:

A general system error occurred: internal error

Very informative error message, no? I looked at the logs visible from the vSphere client and realized they were all dated before ESXi was installed. What was going on?

An error occurred during host configuration

When creating an NFS datastore on an ESXi host the other day, I received the following error message:

If you look at the task details it says:

Operation failed, diagnostics report: Unable to complete Sysinfo operation.
Please see the VMkernel log file for more details.

What was causing the issue?

Permanently enabling SSH on ESXi via PowerShell

As you all know by now, ESXi ships with SSH, which VMware now refers to as Tech Support Mode, disabled by default. The reasons behind this include security and the removal of the service console. While the service console has been removed, a shell called BusyBox remains. According to VMware best practice, SSH should not be enabled as it should not be needed. Of course, customers require this kind of access to install agents and to troubleshoot problems. VMware’s response was to enable remote access to the systems via vCenter Server, vMA, or an API and to recommend reinstalling ESXi should troubleshooting become necessary. If you want to read more about this, I recommend reading Duncan’s post over at yellow-bricks: http://www.yellow-bricks.com/2010/03/01/disable-tech-support-on-esxi/.
Recently, I ran into an issue where several potential ESXi bugs were discovered, which required SSH access to the ESXi host as the logs were lacking information (one of the reported bugs) and the commands that needed to be executed could not be done remotely (e.g. df -h). As such, I was asked to enable SSH on 64 ESXi hosts. Performing this task manually was not an option so I turned to PowerCLI to automate the task.
This raises the question, how do you enable SSH on ESXi via PowerCLI?