VMware has had the following KB article for some time now, but I think it is important to highlight it: http://kb.vmware.com/kb/2003127.
So what do you need to know?
In short, if you configure certain versions of ESXi to send syslog messages to one or more remote syslog destinations (i.e. you configure Syslog.global.logHost – see http://kb.vmware.com/kb/2003322 for more information) then you might run into an issue where ESXi stops sending syslog messages to the remote syslog destinations. The impacted configurations are as follows:
- Syslog over UDP on ESXi 5.0, 5.0 Update 1, 5.1, 5.1 Update 1 – to date this has only been fixed in 5.0 Update 2 and in 5.0 with patch ESXi-5.0.0-20120704001-standard
- Syslog over TCP or SSL on ESXi 5.0, 5.0 Update 1, 5.0 Update 2, 5.1, 5.1 Update 1 – no version of 5.x has a fix to date
- TCP and SSL are impacted together as SSL is implemented over TCP
- ESXi 4.x is not known to have this problem on any protocol
Wait, but the KB article does not mention 5.1! The KB article is out of date. To confirm this, have a look at the release notes for ESXi. For example, https://www.vmware.com/support/vsphere5/doc/vsphere-esxi-51u1-release-notes.html lists the following known issue:
After a network or storage interruption, syslog over TCP, syslog over SSL, and storage logging do not restart automatically
After a network or storage interruption, the syslog service does not restart automatically in certain configurations. These configurations include syslog over TCP, syslog over SSL, and the interrupt storage logging.
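If you want to check an inventory of hosts against the impacted configurations listed earlier, the list can be written down as a small lookup. This is only a sketch of the state described in this post: the function name and the version strings are my own choices, and it cannot tell whether a plain 5.0 host has patch ESXi-5.0.0-20120704001-standard installed.

```python
# Versions named in this post; anything else is outside what the post covers.
KNOWN_5X = {"5.0", "5.0 Update 1", "5.0 Update 2", "5.1", "5.1 Update 1"}

# UDP is fixed in 5.0 Update 2. Plain 5.0 with the patch is also fixed, but a
# simple version string cannot express that, so "5.0" is treated as unpatched.
UDP_FIXED = {"5.0 Update 2"}


def is_impacted(version, protocol):
    """Return True if the version/protocol pair is impacted per this post."""
    proto = protocol.lower()
    if proto not in ("udp", "tcp", "ssl"):
        raise ValueError(f"unknown protocol: {protocol!r}")
    if version.startswith("4."):
        return False  # 4.x is not known to have the problem on any protocol
    if version not in KNOWN_5X:
        raise ValueError(f"version not covered by this post: {version!r}")
    if proto == "udp":
        return version not in UDP_FIXED
    return True  # TCP and SSL (SSL runs over TCP): no 5.x fix to date
```

For example, `is_impacted("5.1", "udp")` is true, while `is_impacted("5.0 Update 2", "udp")` is false.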
So now you know the impacted configurations, but what triggers the issue? The release notes above answer that question, but to elaborate a bit further: any connectivity disruption between the ESXi host and a remote syslog destination can trigger it. This could be:
- Network connectivity problem
- Storage interruption
- Issue on a remote syslog destination (e.g. restart, reboot, crash, etc.)
So how do you know you are experiencing this issue? The only real way to tell is to check your remote syslog destinations and ensure they are receiving messages from the ESXi hosts in question.
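Since the check boils down to "has this host gone quiet?", it is easy to automate on the receiving side. Below is a minimal sketch, assuming you can already get the timestamp of the newest message per host out of your syslog destination (how you do that depends on the product you use and is not shown). The function name, the host names, and the 30 minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def find_silent_hosts(expected_hosts, last_seen, now,
                      max_silence=timedelta(minutes=30)):
    """Return the expected hosts with no recent message, sorted by name.

    last_seen maps host name -> datetime of the newest message received.
    A host that is missing from last_seen has never been heard from and
    is reported as silent too.
    """
    silent = []
    for host in expected_hosts:
        newest = last_seen.get(host)
        if newest is None or now - newest > max_silence:
            silent.append(host)
    return sorted(silent)


if __name__ == "__main__":
    now = datetime(2013, 6, 1, 12, 0)
    seen = {
        "esxi01": now - timedelta(minutes=5),
        "esxi02": now - timedelta(hours=2),
    }
    print(find_silent_hosts(["esxi01", "esxi02", "esxi03"], seen, now))
```

Pick a threshold that is comfortably longer than the quietest period you normally see from a healthy host, otherwise idle hosts will show up as false positives.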
What is the workaround for this issue? You need to restart the syslog service on the ESXi host. To do this, run the following command:
# esxcli system syslog reload
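If you have more than a handful of hosts, you will probably want to script the reload. Here is a rough sketch that runs the command above on each host over SSH. It assumes SSH is enabled on the hosts and that key-based login for the given user works, neither of which is true by default; the function name and the `runner` parameter (there so the helper can be exercised without real hosts) are my own invention.

```python
import subprocess

# The workaround command from this post, split into arguments.
RELOAD_CMD = ["esxcli", "system", "syslog", "reload"]


def reload_syslog(hosts, user="root", runner=subprocess.run):
    """Run the syslog reload on each host over SSH.

    Returns the list of hosts where the command exited non-zero, so the
    caller can follow up on them manually.
    """
    failed = []
    for host in hosts:
        cmd = ["ssh", f"{user}@{host}"] + RELOAD_CMD
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Something like `reload_syslog(["esxi01.example.com"])` would then reload a single host and return an empty list on success.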
For those that aggregate syslog messages this is a big issue. VMware is aware of the issue and working on a fix, but in the meantime this is something to be aware of and monitor for. My general recommendations would be:
- Reduce the number of remote syslog destinations on your ESXi hosts if possible. Ideally use a single remote syslog destination and have it forward messages on as needed. Of course this means you should have redundancy in place for the single remote syslog destination. The thought process here is that a single remote syslog destination reduces the likelihood of experiencing a remote syslog destination interruption.
- Expect newer versions of ESXi to be fixed first. UDP already has a fix in 5.0 while TCP/SSL has no fix in any 5.x release. I would expect 5.1 to get the TCP/SSL fix before 5.0. To that end, until this issue is fixed use a working combination whenever possible: on 5.0, install the patch or upgrade to Update 2 and use UDP for the remote syslog destination. For 5.1, it does not matter which protocol you use as all protocols are currently impacted.
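To audit hosts against these recommendations, it helps to break a Syslog.global.logHost value into its destinations. The sketch below assumes the value is a comma-separated list of entries in the form protocol://host:port, with the protocol defaulting to UDP when omitted; that is my understanding of the setting, so verify it against the KB article linked earlier. The function name and the host names are hypothetical, the port is left as None when it is not given rather than guessing a default, and IPv6 literals are not handled.

```python
def parse_loghost(value):
    """Split a logHost string into (protocol, host, port) tuples."""
    destinations = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas and empty values
        if "://" in item:
            proto, rest = item.split("://", 1)
        else:
            proto, rest = "udp", item  # assumed default protocol
        host, _, port = rest.partition(":")
        destinations.append((proto.lower(), host, int(port) if port else None))
    return destinations


def uses_single_udp_destination(value):
    """True if the value names exactly one destination and it is UDP."""
    destinations = parse_loghost(value)
    return len(destinations) == 1 and destinations[0][0] == "udp"
```

A value such as `udp://log01:514` passes the check, while `tcp://log01:514` or a value with two destinations does not.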
© 2013, Steve Flanders. All rights reserved.