I was assigned an interesting problem a few weeks back. A customer had requested that all ESXi servers have syslog configured in order to troubleshoot a potential bug in ESXi. A technician was assigned the case and configured all ESXi hosts to point to the syslog server on the standard port. The problem was that the logs never showed up on the syslog server. I was asked to figure out why the configuration was not working as expected.
In our particular case, all hosts pointed to a syslog VIP, which was responsible for load balancing syslog requests to a pool of syslog servers. First, I checked the load balancer to see whether the syslog traffic was making it to the VIP. As it turned out, it was. After confirming the load balancer was working as expected, I began to suspect the configuration on the syslog servers. The only thing I could think of that would prevent the syslog servers from accepting messages from the ESXi hosts was ACLs. Looking at the syslog configuration, I confirmed that the ACLs were set to allow traffic from the VMkernel VLAN configured for management traffic.
Why were the syslog messages not being received?
To answer the question, I will take a step back and explain the configuration of the management interfaces on the ESXi servers in question. We had four VMkernel interfaces defined as follows:
- VMkernel1 – Management traffic option selected (similar to old service console)
- VMkernel2 – VMotion option selected (used for VMotion)
- VMkernel3 – no options selected (used for NFS traffic)
- VMkernel4 – Fault tolerance option selected (used for Fault tolerance)
Each VMkernel interface was assigned a different port group ID and was configured with a static IP address in a unique network. This was done for security purposes and to comply with VMware best practices. Based on this configuration, the logging of syslog messages should have been working, but it was not. The only explanation I could come up with was that the syslog traffic was not coming from the VMkernel interface dedicated to management traffic (VMkernel1). If that was the case, which interface was it coming from?
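One way to answer that question empirically is to look at the source address of the datagrams that actually arrive. The following is a minimal sketch, not what was run in this case: it assumes syslog over UDP, and the helper names `open_listener` and `receive_sources` are hypothetical. It demonstrates itself over loopback; on a real test machine you would bind to the address and port the ESXi hosts send to (port 514 usually requires elevated privileges).

```python
import socket


def open_listener(host, port):
    """Bind a UDP socket on the given address and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_sources(sock, count, bufsize=4096):
    """Receive `count` datagrams and return a list of (source_ip, message)."""
    seen = []
    for _ in range(count):
        data, (source_ip, _source_port) = sock.recvfrom(bufsize)
        seen.append((source_ip, data.decode("utf-8", errors="replace")))
    return seen


# Loopback demonstration: port 0 asks the OS for any free port.
listener = open_listener("127.0.0.1", 0)
listener.settimeout(5)  # do not block forever if nothing arrives

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"<14>test message", listener.getsockname())

results = receive_sources(listener, 1)
sender.close()
listener.close()

print(results)  # [('127.0.0.1', '<14>test message')]
```

The source IP printed for each message shows which interface, and therefore which network, the sender used, which is exactly the information an ACL matches on.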
The only other interface that made sense to me was the plain VMkernel interface used for NFS traffic (VMkernel3, the one with no options selected). Based on this assumption, I reconfigured the syslog server ACLs to allow traffic from that interface. Upon restarting the syslog server process, logs began coming in from the ESXi servers. This made me wonder whether this important information was documented on the VMware site, either in an installation/configuration guide or in a knowledge base article. My searching turned up nothing definitive (the best I could find was: http://kb.vmware.com/kb/1016621). Next, I turned to the security diagram published on VMreference.com. The diagram is supposed to list all required ports and protocols. While it does, it does not specify which interfaces those ports and protocols apply to. Given this gap, I thought it would be helpful to blog about the issue I experienced and the solution I arrived at.
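To make the ACL behavior concrete, here is a small sketch of the kind of source-network check a syslog server ACL performs. It is an illustration only: the network ranges, the addresses, and the `is_allowed` helper are all hypothetical and are not taken from the environment described here.

```python
import ipaddress

# Hypothetical networks: one per VMkernel interface role.
MANAGEMENT_NETWORK = ipaddress.ip_network("10.0.1.0/24")
NFS_NETWORK = ipaddress.ip_network("10.0.3.0/24")


def is_allowed(source_ip, allowed_networks):
    """Return True if source_ip falls inside any of the allowed networks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed_networks)


# Hypothetical source address of a syslog message sent from the NFS interface.
syslog_source = "10.0.3.15"

# Original ACL: only the management network is allowed, so the message is dropped.
before = is_allowed(syslog_source, [MANAGEMENT_NETWORK])

# Updated ACL: the NFS VMkernel network is allowed as well.
after = is_allowed(syslog_source, [MANAGEMENT_NETWORK, NFS_NETWORK])

print(before, after)  # False True
```

The ACL itself was never wrong about the management network; it simply never saw traffic from that network.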
Going back to the issue, it is not only non-intuitive for syslog traffic to come from the plain VMkernel interface instead of the dedicated management interface, it also causes another issue. Once syslog messages began coming in from the ESXi servers, I quickly realized the log files were named by IP address instead of FQDN. I had configured the syslog servers to resolve IPs into FQDNs for easier troubleshooting. The resolution failed because I had never needed to add the plain VMkernel interfaces (the ones with no options selected) to DNS. Because of this new requirement, I had to add DNS entries for all of those VMkernel interfaces. Once I did, the logs were named by FQDN as expected.
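The naming behavior can be sketched as a reverse lookup with a fallback to the raw IP. This is an assumption about how such a feature typically works, not the actual logic of the syslog server used here. The `log_name_for` helper, the stand-in resolver, and the host name and addresses are hypothetical. The resolver is passed in as a parameter so the example runs without touching real DNS.

```python
import socket


def log_name_for(source_ip, resolver=socket.gethostbyaddr):
    """Return the resolved host name for source_ip, or the IP if lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(source_ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return source_ip


# Hypothetical reverse-DNS data: only the management address has an entry.
FAKE_PTR = {"10.0.1.15": "esx01.example.com"}


def fake_resolver(ip):
    """Stand-in for socket.gethostbyaddr backed by FAKE_PTR."""
    if ip not in FAKE_PTR:
        raise socket.herror(1, "Unknown host")
    return FAKE_PTR[ip], [], [ip]


print(log_name_for("10.0.1.15", fake_resolver))  # esx01.example.com
print(log_name_for("10.0.3.15", fake_resolver))  # 10.0.3.15

# After adding an entry for the NFS VMkernel address, the name resolves.
FAKE_PTR["10.0.3.15"] = "esx01-nfs.example.com"
print(log_name_for("10.0.3.15", fake_resolver))  # esx01-nfs.example.com
```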
© 2011, Steve Flanders. All rights reserved.