If you are running UCS hardware with ESXi, then you should be using the custom ENIC/FNIC drivers specified on the Hardware and Software Interoperability Matrix. If you are running ESXi 5.x and leveraging the FNIC driver, then I would highly suggest you check which version of UCS and which version of the FNIC driver you are running.
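To make the "which FNIC driver am I running" check concrete, here is a minimal sketch that pulls driver versions out of the text printed by `esxcli software vib list`. The sample output, the VIB names, and the version strings below are hypothetical placeholders rather than real releases, and the helper name `find_vib_versions` is my own, so treat this as an illustration and not a supported tool.

```python
def find_vib_versions(vib_list_output, needle="fnic"):
    """Return {vib_name: version} for rows whose name contains `needle`.

    Assumes the whitespace-separated column layout printed by
    `esxcli software vib list`, with Name first and Version second.
    """
    matches = {}
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 2 or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        name, version = fields[0], fields[1]
        if needle in name.lower():
            matches[name] = version
    return matches


# Hypothetical sample output; names and versions are placeholders only.
SAMPLE = """\
Name       Version       Vendor  Acceptance Level  Install Date
---------  ------------  ------  ----------------  ------------
net-enic   0.0.0.1-1OEM  Cisco   VMwareCertified   2013-01-01
scsi-fnic  0.0.0.2-1OEM  Cisco   VMwareCertified   2013-01-01
"""

if __name__ == "__main__":
    print(find_vib_versions(SAMPLE))
```

Once you have the version string, compare it against the entry for your UCS release, adapter, and ESXi version on the Interoperability Matrix.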

Before I get into the issue, let me start by briefly discussing the UCS drivers. The ENIC, or Ethernet NIC, driver is used by the hypervisor to communicate with the physical NIC in the server. The FNIC, or Fibre Channel (FC) NIC, driver is used by the hypervisor to communicate with the physical HBA in the server. All servers should have NICs, but whether or not they contain HBAs depends on the type of storage you are leveraging. If you are connecting to FC or hardware iSCSI datastores, then this article is relevant to you.
The particular issue I am bringing up has to do with the FNIC driver. Whether or not you should be running a given FNIC driver depends on a variety of factors, including:

  • UCS version
  • Type of HBA
  • Version of ESXi

The specific bug I am referring to is: CSCue21073. Per the link you will notice that the bug states this impacts C-series servers. With a quick search of the Interoperability Matrix I turned up the following matches:

One thing the bug does not mention is that this issue also impacts B-series systems. Specifically:

I am mentioning this particular bug because I have personally experienced it on several systems. The good news is that the issue is hard to reproduce, but it is something to be aware of. A reboot of the host addressed the issue in the environment I was working in. Per the bug, this issue has been fixed and the fix will be part of a future release. What is not clear is whether an FNIC driver update will be required and/or whether you will need to upgrade UCS. In either case, you can expect that a reboot will be needed somewhere. If you are running a system vulnerable to this issue, I would personally recommend upgrading to a newer version of UCS that has an upgraded FNIC driver to ensure this issue does not impact you.

© 2013, Steve Flanders. All rights reserved.

2 comments on “BUG ALERT: UCS FNIC Driver + ESXi 5.x = PSODs”

danny says:

Hi Steve:
How do we read a vmkernel dump file from a PSOD?
Whenever there is a PSOD we see an issue reported with the CPU, but that does not necessarily mean that the CPU is bad.
Could you help with getting an RCA for PSOD issues? Which file should I look at, and what exactly should I look for?

Hey Danny,
VMkernel logs, like all other logs, can be persistently collected in two ways:
1. Configuring Syslog.global.logDir and pointing to a local or shared datastore
2. Configuring Syslog.global.logHost and pointing to a remote syslog server
The good thing about log messages is that they can be collected independently of PSODs. The bad thing about log messages on ESXi is that if you do not configure either Syslog.global.logDir or Syslog.global.logHost (neither is configured by default), then you will not be able to analyze the logs after a PSOD because they are not saved across a reboot. Please see http://kb.vmware.com/kb/2003322 for more detail.
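As a rough sketch of the two options above, the snippet below assembles the `esxcli system syslog config set` command, which is the command-line equivalent of setting Syslog.global.logDir and Syslog.global.logHost, followed by the reload that applies the change. The datastore path and the syslog hostname are made-up examples, and the helper name `syslog_config_commands` is my own; the script only prints the commands so you can review them before running anything on a host.

```python
import shlex


def syslog_config_commands(log_dir=None, log_host=None):
    """Build esxcli commands for persistent ESXi logging.

    log_dir  -> value for Syslog.global.logDir (datastore path)
    log_host -> value for Syslog.global.logHost (remote syslog URL)
    At least one of the two must be provided.
    """
    if not log_dir and not log_host:
        raise ValueError("provide log_dir, log_host, or both")
    set_cmd = ["esxcli", "system", "syslog", "config", "set"]
    if log_dir:
        set_cmd.append("--logdir=" + log_dir)
    if log_host:
        set_cmd.append("--loghost=" + log_host)
    # The syslog service must be reloaded for the new settings to apply.
    reload_cmd = ["esxcli", "system", "syslog", "reload"]
    return [set_cmd, reload_cmd]


if __name__ == "__main__":
    # Hypothetical values; substitute your own datastore and syslog server.
    commands = syslog_config_commands(
        log_dir="/vmfs/volumes/datastore1/logs",
        log_host="udp://syslog.example.com:514",
    )
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
```

Setting both options at once gives you a local copy and a remote copy, which helps if the datastore itself is unavailable after a crash.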
In regards to PSODs, here are a couple links that might be helpful:
* http://kb.vmware.com/kb/1004250
* http://kb.vmware.com/kb/1006796
As for troubleshooting PSODs, you can go through the files or even read the PSOD information directly from the screen, but more often than not you will need to engage VMware support. When engaging support you will need to generate a support bundle (see http://kb.vmware.com/kb/1010705). One other place you can check is the release notes for later versions of ESXi to see whether or not the issue is already known and/or fixed.
I hope this helps!
