As you have probably heard by now, NSX took the spotlight at VMworld US 2013. Since the announcement, several people have approached me asking what NSX means given the vCNS product. As such, I figured I would provide my thoughts on the history and future of VMware networking.
For those of you who read the MDS 9000 NX-OS update, 5.0(7), release notes, you may have noticed that the title of this post is one of the caveats that was resolved. I figured I would elaborate a bit more on this issue as I was directly involved in discovering it and I am directly involved in ensuring it gets fully resolved.
I was recently asked what the maximum number of VMkernel interfaces per ESXi host was. Off the top of my head I was not aware of one, so I consulted the vSphere 5.1 Configuration Maximums guide. The guide did not list any VMkernel maximums.
So is the answer unlimited?
I ran into an interesting problem the other day. I was brought into an environment that had a pair of ESXi 5.1 hosts connected to an iSCSI datastore. One host could see and access the datastore without issue, while the other host showed no datastore attached. Per the administrator, the datastore had been mounted on both hosts, and the claim was that the environment had not been touched or changed in any way.
What was going on and how can you fix it?
Do you have to connect to a VPN in order to work? Do you need to connect to more than one VPN to work? Have you ever connected to a VPN that broke your Internet connection? VPNs are great, but static routes and DNS servers can really ruin your day sometimes. Recently, I had to connect to my work’s corporate VPN and from there connect to a separate internal VPN to access an environment I was working on (don’t ask). While connecting to multiple VPNs is not an issue in and of itself, in my case I quickly learned that I did not have the connectivity I needed to access the environment through the second VPN.
What was happening and how can you fix it?
Since I have experienced this issue several times before and can never remember the commands needed to figure out what is wrong, I figured I would write a quick post. The issue is that upon reboot of a Vyatta firewall, you receive a ‘configure failed!’ error message. The exact error message depends on which features you have enabled on your device, but one example is:
Over the last two weeks I have been hit by the same UCS bug twice, though by different means, and as such I would like to educate others about it. The issue initially came up after running a ‘show tech’ command on a UCS Fabric Interconnect (FI). Shortly after the process started, my session to the FI dropped. Since I have experienced random disconnects from an FI in the past, I tried to reconnect. To my surprise, the FI was unresponsive. Not knowing what was going on, I tried the second FI, and it also was not responding. A ping check confirmed my fear: both FIs were down.
For those who have never experienced a dual fabric reboot on an active/production environment before, the ten minutes that follow will be the longest of your life (even if you do have access to the console port – locally or remotely). After about ten minutes the FIs started to respond again. As if a dual fabric reboot was not enough, the problem did not end there. About 5-10 minutes after the FIs came back online they went down again! This cycle continued until manual intervention stopped it.
So what was the problem; what was the impact; how can you fix it; and how can you prevent it?
This appears to be an old problem that keeps popping up, but I just experienced it for the first time. I was tasked with deploying 20 Windows VMs (more on this in a later post) for a project. I did so via templates and once the systems came up, I set the appropriate IP addresses as well as other configuration information. The last step was to reboot the VMs. After reboot about half of the VMs were responsive to ICMP and RDP while the other half were not.
To my surprise, when I checked the network configuration of the unresponsive VMs from the console, I found the default gateway was not set. It had been a long day at work (>12 hours) and I thought I had just missed something (more on why I did not automate this task in the later post as well). I set the default gateway, rebooted, and the same problem occurred.
Why was the default gateway configuration being lost on reboot and how can it be fixed?
If you are a network administrator, then you probably know that on many switches typing the command ‘show run’ will display the running switch configuration and typing the command ‘show vlan’ will display the currently configured VLANs on the switch. If you are a system administrator, I would compare the ‘show run’ command to running ‘dmesg’ and the ‘show vlan’ command to running ‘ls’.
Why do I bring this up? Before answering, let me ask you a question: would you schedule a maintenance window to run these commands?