Networking Bodges: Nexus

Saturday, 23 January 2016

Cisco Nexus Output Errors

A little while ago I was asked to investigate an IP-based storage problem which had been traced back to a large number of output errors on the port facing a particular compute node. The port was on a Cisco Nexus 5000 series device and I could see that, while output errors were clocking up at a massive rate, the switch was giving me nothing to go on as to what kind of errors they were. Every one of the usual suspects (collisions, etc.) showed zero on the port, and yet the output error counter kept climbing.

The ultimate answer turned out to be related to the fact that the Nexus 5k aims for low latency and as such performs cut-through switching. If you're not familiar with this term, please refer to this reasonably decent Cisco explanation. At a high level, there are two possible modes of transmission in switched networks:

1 - Store and Forward, where the entire frame is buffered into memory, the FCS is validated and then the frame is passed on. This mode can handle ports of differing speeds but obviously for large frames the serialisation delay becomes significant.
2 - Cut-through, where just the header is checked for source / destination, plus any fields required for QoS / ACLs, then the rest of the frame is "cut through" onto the appropriate output port without buffering. This requires ports of an identical speed but offers lower latency.
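To put rough numbers on the serialisation delay that separates the two modes, here is a minimal Python sketch. It only does the arithmetic of clocking bits onto a wire. The 64-byte figure for how much of the frame a cut-through switch reads before forwarding is an assumption for illustration, not a Nexus specification, and the switch's own internal forwarding latency is ignored.

```python
def serialisation_delay_us(frame_bytes: int, link_bps: float) -> float:
    """Time needed to clock frame_bytes onto a link of link_bps, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


TEN_GIG = 10e9        # 10 Gbit/s link
LOOKUP_BYTES = 64     # assumed: bytes read before a cut-through decision is made

for size in (64, 1500, 9000):
    # Store and forward waits for the whole frame before sending it on.
    store_forward = serialisation_delay_us(size, TEN_GIG)
    # Cut-through only waits for the part of the frame it needs to inspect.
    cut_through = serialisation_delay_us(min(size, LOOKUP_BYTES), TEN_GIG)
    print(f"{size:>5} bytes: store and forward {store_forward:.4f} us, "
          f"cut-through {cut_through:.4f} us")
```

At 10 Gbit/s a 9000-byte jumbo frame takes 7.2 microseconds to receive in full, so on these assumptions the gap between the modes grows with frame size while the cut-through figure stays flat.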

One of the not-immediately-obvious side effects of cut-through switching is that the FCS is only validated once the frame has been passed. The FCS sits at the very end of the frame, so by the time the switch can check it the rest of the frame is already on the wire and it is too late to take any corrective action. Essentially, the forwarding switch has already passed a broken frame on and, although it knows this, it can do nothing about it in retrospect and so it just says "oh, well" and increments its error counters on the ingress and egress ports.

If you are seeing output errors on a port with no other real explanation of how they got there, check other ports of the same speed for input errors. In my case it was due to a fibre fault - corrupted frames were entering one port, being cut through to another and causing errors to clock up on both.
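That check is easy to script once you have the counters in hand. The sketch below is purely illustrative: the PortCounters structure, the port names and the counter values are all invented for the example, and how you collect the real numbers (CLI scrape, SNMP, etc.) is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Hypothetical per-port snapshot; field names are not an NX-OS format."""
    name: str
    speed_mbps: int
    input_errors: int
    output_errors: int


def candidate_sources(ports, victim_name):
    """Return same-speed ports with input errors, worst first."""
    victim = next(p for p in ports if p.name == victim_name)
    suspects = [
        p for p in ports
        if p.name != victim.name
        and p.speed_mbps == victim.speed_mbps
        and p.input_errors > 0
    ]
    return sorted(suspects, key=lambda p: p.input_errors, reverse=True)


# Invented sample data: Ethernet1/10 is the port showing output errors.
ports = [
    PortCounters("Ethernet1/1", 10000, 0, 0),
    PortCounters("Ethernet1/2", 10000, 48211, 0),
    PortCounters("Ethernet1/3", 1000, 17, 0),
    PortCounters("Ethernet1/10", 10000, 0, 48190),
]

for port in candidate_sources(ports, "Ethernet1/10"):
    print(port.name, port.input_errors)
```

Ethernet1/3 is skipped in this sample because it runs at a different speed, which follows the same-speed rule of thumb given above.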

Thursday, 23 July 2015

Cisco Nexus Spanning Tree History

I've been doing a fair bit of work on Nexus 5k / 6k platforms lately and while I've been less than impressed with certain aspects of the products, one thing that the Nexus is really excellent at is keeping logs, whether you ask it to or not. That comes in super-handy if you've left the buffer logging at its default don't-wake-me-unless-the-world-ends setting...

Pretty much anything that has a state is logged somewhere in the Nexus and you can get lost in a labyrinth of cryptic troubleshooting messages related to virtually any process in the switch. In this post I'm focusing on spanning tree logs as they're pretty universal.

Imagine the scenario shown below:

We have three sites connected over the WAN. We blew the budget on dark fibres out of the Cardiff site so we've had to skimp on switches and only have one per site, with the Caerphilly switch being root bridge. The link between Caerphilly and Newport is a metro Ethernet circuit which doesn't forward link loss.

Now imagine there's a failure within the carrier network which results in a total loss of traffic across the circuit between Caerphilly and Newport. No ports go down, however after a short time spanning tree will detect the fault and converge to use the indirect route via Cardiff. If the user's port is in p2p mode rather than edge, the user is going to see a 30-second outage while the port works its way back to forwarding, even with RSTP.

How would you even know this had happened (aside from users complaining bitterly)? If you're really wily you may notice your traffic statistics look a bit odd, but if the primary link is restored relatively quickly that kind of thing gets lost in 5 minute roll-ups and natural variation quite easily. Since no interfaces went down, there will be nothing in your logs (by default).

Luckily, the Nexus logs every STP port state transition in its event history and keeps them seemingly forever. If the link flapped 6 months ago there's a good chance you could still prove it, as long as you haven't rebooted the switch. These logs can be retrieved using the command show spanning-tree internal event-history all - note that it's pretty verbose and you probably want to narrow it down if you have a lot of VLANs. The first section for each STP instance is the overall state history, mostly concerned with who the root is and how it is best reached:

Newport# show spanning-tree internal event-history all | begin VLAN0055
VDC01 VLAN0055
<snip>
77) Transition at 643104 usecs after Tue Jul 7 07:44:47 2015
     Root: 8037.000c.a45e.321c Cost: 0 Age: 0 Root Port: none Port: none [STP_TREE_EV_MULTI_FLUSH_LOCAL]

78) Transition at 762615 usecs after Tue Jul 7 07:44:49 2015
     Root: 8037.000c.a45e.321c Cost: 0 Age: 0 Root Port: none Port: Ethernet1/1 [STP_TREE_EV_UPDATE_TOPO_RCVD_SUP_BPDU]

79) Transition at 763013 usecs after Tue Jul 7 07:44:49 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 0 Root Port: Ethernet1/1 Port: none [STP_TREE_EV_MULTI_FLUSH_LOCAL]

80) Transition at 722769 usecs after Tue Jul 7 07:44:51 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 1 Root Port: Ethernet1/1 Port: Ethernet1/1 [STP_TREE_EV_MULTI_FLUSH_RCVD]

81) Transition at 832764 usecs after Tue Jul 7 07:44:51 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 1 Root Port: Ethernet1/1 Port: Ethernet1/2 [STP_TREE_EV_MULTI_FLUSH_RCVD]

82) Transition at 752841 usecs after Tue Jul 7 07:44:52 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 1 Root Port: Ethernet1/1 Port: Ethernet1/2 [STP_TREE_EV_MULTI_FLUSH_RCVD]

83) Transition at 782964 usecs after Tue Jul 7 07:44:53 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 1 Root Port: Ethernet1/1 Port: Ethernet1/1 [STP_TREE_EV_MULTI_FLUSH_RCVD]

The logs are quite verbose, but the "Root Port: none" entries make it clear that the primary path to the root was lost, then regained within a few seconds. Just a minor flap within the carrier network and a few seconds' impact?
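If you have a lot of this output to wade through, a few lines of Python can pick out only the entries where the root bridge or root port changed. This is a rough sketch that assumes the two-line entry layout shown in the excerpt above; the regular expression and the function name are my own, and other NX-OS releases may format the history differently.

```python
import re

# Three entries copied from the excerpt above.
SAMPLE = """\
78) Transition at 762615 usecs after Tue Jul 7 07:44:49 2015
     Root: 8037.000c.a45e.321c Cost: 0 Age: 0 Root Port: none Port: Ethernet1/1 [STP_TREE_EV_UPDATE_TOPO_RCVD_SUP_BPDU]

79) Transition at 763013 usecs after Tue Jul 7 07:44:49 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 0 Root Port: Ethernet1/1 Port: none [STP_TREE_EV_MULTI_FLUSH_LOCAL]

80) Transition at 722769 usecs after Tue Jul 7 07:44:51 2015
     Root: 8037.000c.ac6d.43ba Cost: 4 Age: 1 Root Port: Ethernet1/1 Port: Ethernet1/1 [STP_TREE_EV_MULTI_FLUSH_RCVD]
"""

# Captures: entry number, timestamp text, root bridge ID, cost, root port.
TREE_ENTRY = re.compile(
    r"(\d+)\) Transition at \d+ usecs after (.+?)\s*\n"
    r"\s*Root: (\S+) Cost: (\d+) .*?Root Port: (\S+)"
)


def root_changes(text):
    """Return entries whose root bridge or root port differs from the previous entry."""
    changes = []
    previous = None
    for index, stamp, root, cost, root_port in TREE_ENTRY.findall(text):
        current = (root, root_port)
        if previous is not None and current != previous:
            changes.append((int(index), stamp, root, int(cost), root_port))
        previous = current
    return changes


for change in root_changes(SAMPLE):
    print(change)
```

Against the sample this reports only entry 79, the point at which the switch stops treating itself as root and picks up Ethernet1/1 as its root port again.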

Below the main state history are the individual port histories. Let's look at our user's port and see what happened there:

VDC01 VLAN0055 <Ethernet1/10>
<snip>
7) Transition at 762694 usecs after Tue Jul 7 07:44:49 2015
     State: BLK Role: Desg Age: 2 Inc: no [STP_PORT_MULTI_STATE_CHANGE]

8) Transition at 640356 usecs after Tue Jul 7 07:45:04 2015
     State: LRN Role: Desg Age: 2 Inc: no [STP_PORT_STATE_CHANGE]

9) Transition at 642846 usecs after Tue Jul 7 07:45:19 2015
     State: FWD Role: Desg Age: 2 Inc: no [STP_PORT_STATE_CHANGE]

Oh. Right at the same time as the WAN dropped out, our user's port went into blocking for 15s then learning for another 15 before finally transitioning to forwarding again. Ouch... and we never would have known were it not for the STP event history!
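The same sums can be done mechanically. The sketch below parses the three port entries shown above and works out how long the port sat in each state. It assumes the entry layout from this excerpt and English day and month names in the timestamps; the function names are mine.

```python
import re
from datetime import datetime, timedelta

# Port history for Ethernet1/10, copied from the excerpt above.
SAMPLE = """\
7) Transition at 762694 usecs after Tue Jul 7 07:44:49 2015
     State: BLK Role: Desg Age: 2 Inc: no [STP_PORT_MULTI_STATE_CHANGE]

8) Transition at 640356 usecs after Tue Jul 7 07:45:04 2015
     State: LRN Role: Desg Age: 2 Inc: no [STP_PORT_STATE_CHANGE]

9) Transition at 642846 usecs after Tue Jul 7 07:45:19 2015
     State: FWD Role: Desg Age: 2 Inc: no [STP_PORT_STATE_CHANGE]
"""

# Captures: microsecond offset, timestamp text, port state.
PORT_ENTRY = re.compile(
    r"Transition at (\d+) usecs after (.+?)\s*\n\s*State: (\w+)"
)


def port_transitions(text):
    """Return (timestamp, state) pairs in the order they appear."""
    transitions = []
    for usecs, stamp, state in PORT_ENTRY.findall(text):
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        transitions.append((when + timedelta(microseconds=int(usecs)), state))
    return transitions


def time_in_states(transitions):
    """Return (state, seconds spent in it) for every state except the last."""
    return [
        (state, (next_when - when).total_seconds())
        for (when, state), (next_when, _) in zip(transitions, transitions[1:])
    ]


for state, seconds in time_in_states(port_transitions(SAMPLE)):
    print(f"{state}: {seconds:.3f}s")
```

For this excerpt it gives roughly 14.878s in BLK and 15.002s in LRN, so just under 30 seconds before the port was forwarding again.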

Side Note

You can save yourself the effort of reading the incredibly verbose event history by setting the logging level of spanning tree to something more useful, such as informational:

Newport(config)#logging level spanning-tree 6

Note that the severity level for the local log buffer or syslog server also needs to be set high enough (informational or above) to record these newly verbose messages.

Also, user ports should be forced into edge mode to avoid STP convergence causing massive disruption to them:

Newport(config-if)#spanning-tree port type edge

The switch should "guess" correctly, but it's probably best not to take the chance that a user port accidentally goes into p2p mode.