Tag: Bug

BUG ALERT: UCS FNIC Driver 1.5.0.8 + ESXi 5.x = PSODs

If you are running UCS hardware with ESXi, then you should be using the custom ENIC/FNIC drivers specified on the Hardware and Software Interoperability Matrix. If you are running ESXi 5.x and using the FNIC driver, then I would highly suggest you check which version of UCS and which version of the FNIC driver you are running.
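For anyone who wants to script that check across many hosts, here is a minimal sketch. It assumes you have already captured a whitespace-separated driver listing from each host (name in the first column, version in the second, as in the output of `esxcli software vib list`). The sample rows, the VIB names, and the helper names are hypothetical, and the only "affected" version it knows about is the 1.5.0.8 from the title of this post. Whether a host is really exposed also depends on the UCS version, so treat a match as a prompt to check the matrix, not as a verdict.

```python
AFFECTED = (1, 5, 0, 8)  # FNIC driver version named in the post title

def parse_version(text):
    """Turn '1.5.0.8-<build suffix>' into (1, 5, 0, 8)."""
    return tuple(int(part) for part in text.split("-")[0].split("."))

def find_driver_versions(listing, name_fragment="fnic"):
    """Return {name: version} for rows whose first column contains name_fragment."""
    found = {}
    for row in listing.splitlines():
        cols = row.split()
        if len(cols) >= 2 and name_fragment in cols[0].lower():
            found[cols[0]] = cols[1]
    return found

def is_affected(version_string):
    """True if the version matches the one called out in this post."""
    return parse_version(version_string) == AFFECTED

# Hypothetical listing, for illustration only.
SAMPLE = """Name       Version              Vendor
---------  -------------------  ------
net-enic   2.1.2.22-1OEM        Cisco
scsi-fnic  1.5.0.8-1OEM         Cisco
"""

if __name__ == "__main__":
    for name, version in find_driver_versions(SAMPLE).items():
        flag = "CHECK THE MATRIX" if is_affected(version) else "ok"
        print(f"{name} {version}: {flag}")
```

Matching on the name fragment rather than an exact package name keeps the sketch tolerant of packaging differences, at the cost of possibly picking up unrelated rows.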

UCS Blades Power Off Unexpectedly

I was called into an interesting issue over the past week. I was told that a chassis worth of UCS blades had powered off for no apparent reason, bringing down part of production. Initial troubleshooting showed no real culprits. UCSM was clean of errors except for an IOM POST error. A show-tech command was initiated and a Sev1 was opened with Cisco TAC. The technician on call attempted to power on the servers by selecting them all in UCSM, right-clicking on them, and selecting reset. The blades powered on and came back online without issue.
So what caused the blades to power off unexpectedly?

Cisco Bug: Show Commands Cause (Dual) Fabric Reboot(s)

Over the last two weeks I have been hit by the same UCS bug twice, though by different means, and so I would like to educate others about it. The issue initially came up after running a ‘show tech’ command on a UCS Fabric Interconnect (FI). Shortly after the process started, my session to the FI dropped. Since I have experienced random disconnects from an FI in the past, I tried to reconnect. To my surprise, the FI was unresponsive. Not knowing what was going on, I tried the second FI, and it was not responding either. A ping check confirmed my fear: both FIs were down.
For those who have never experienced a dual fabric reboot in an active production environment, the ten minutes that follow will be the longest of your life (even if you do have access to the console port, locally or remotely). After about ten minutes the FIs started to respond again. As if a dual fabric reboot was not enough, the problem did not end there. About 5-10 minutes after the FIs came back online, they went down again! This cycle continued until manual intervention stopped it.
So what was the problem; what was the impact; how can you fix it; and how can you prevent it?

WARNING: CpuSched: XXXX: processor apparently halted for XXXX ms

While I have seen people discuss this error message and its solution, I figured it would be a good idea to discuss it in terms of a specific configuration: Cisco hardware running VMware virtualization. I feel this is important both for understanding the implications of the error message and for showing why BIOS configuration matters.
First, the issue: Cisco UCS B230-M2 blades (dual 10-core = 20 ‘processors’) running ESXi were throwing processor halted log messages. While this in itself may or may not be an issue, under light load from VMware clone operations the ESXi hosts were disconnecting from vCenter Server (vCS) and becoming unresponsive for several minutes. Further digging uncovered that when an ESXi host disconnected from vCS, the logs showed that all processors on the host were halting at exactly the same time.
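To spot that "all processors halted at once" pattern without reading the log by eye, something like the following rough sketch could help. It assumes each log line starts with a timestamp token and contains the message from the heading above with numbers in place of the XXXX placeholders. The exact line layout, the sample lines, and the function names are assumptions for illustration, not a documented log format.

```python
import re
from collections import defaultdict

# Assumed line shape, modeled on the warning in the heading:
#   <timestamp> ... CpuSched: <n>: processor apparently halted for <n> ms
HALT_RE = re.compile(
    r"^(?P<ts>\S+).*CpuSched: \d+: processor apparently halted for (?P<ms>\d+) ms"
)

def group_halts(lines):
    """Map each timestamp to the list of halt durations (ms) logged at it."""
    halts = defaultdict(list)
    for line in lines:
        match = HALT_RE.search(line)
        if match:
            halts[match.group("ts")].append(int(match.group("ms")))
    return dict(halts)

def simultaneous_halts(lines, min_cpus):
    """Return timestamps with at least min_cpus halt messages, sorted."""
    grouped = group_halts(lines)
    return sorted(ts for ts, ms in grouped.items() if len(ms) >= min_cpus)

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    "2012-01-01T00:00:01Z cpu0:4096)WARNING: CpuSched: 100: processor apparently halted for 5000 ms",
    "2012-01-01T00:00:01Z cpu1:4097)WARNING: CpuSched: 100: processor apparently halted for 5001 ms",
    "2012-01-01T00:00:09Z cpu0:4096)WARNING: CpuSched: 100: processor apparently halted for 20 ms",
    "2012-01-01T00:00:10Z cpu0:4096)some unrelated message",
]

if __name__ == "__main__":
    for ts in simultaneous_halts(SAMPLE_LOG, min_cpus=2):
        print("multiple processors halted at", ts)
```

On a 20-processor blade like the B230-M2 you would set `min_cpus` to 20 to catch only the moments when every processor reported a halt. Grouping on the raw timestamp token is the simplest choice; if your logs carry sub-second timestamps you would probably want to truncate them to the second first.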

Bug in PowerCLI 4.1.1: Set-VIRole

I was trying to set up some permissions on vCenter Server using PowerCLI. Here is an example of a command I was running:

PowerCLI returned the following:

WARNING: There were one or more problems with the server certificate:
* The X509 chain could not be built up to the root certificate.
* The certificate’s CN name does not match the passed value.
Name         IsSystem
----         --------
newTestRole  False
newTestRole  False
newTestRole  False

This looks like it worked. However, when I looked at the permissions on vCenter Server, the checkboxes for these three options were not selected. If you attempt the command with any other permissions, it works as expected (i.e. the checkboxes are selected).
Why was this not working?

Show VLAN

If you are a network administrator, then you probably know that on many switches typing the command ‘show run’ will display the running switch configuration and typing the command ‘show vlan’ will display the currently configured VLANs on the switch. If you are a system administrator, I would compare the ‘show run’ command to running ‘dmesg’ and the ‘show vlan’ command to running ‘ls’.
Why do I bring this up? Before answering, let me ask you a question: would you schedule a maintenance window to run these commands?