Monday, 21 September 2015

Bundling Ports over EoMPLS and Managed Circuits

Alongside its cloud managed services offerings, the company where I work offers various network services such as Internet feeds, VRFs and point to point layer 2 (Ethernet over MPLS). One question that regularly crops up with the layer 2 services is whether it is possible to bundle or port channel two links together to "get double the bandwidth".

I've already covered elsewhere why bundling ports doesn't always do what people might think, particularly around bandwidth multiplication, but this idea raises a more subtle set of issues.

Barrier 1 - Detecting Failures

OK, so you have a pair of layer 2 extensions. Why would you not want to bundle them?

Actually, there are a few things to consider. Ethernet over MPLS and various other types of managed circuits do not forward link loss, meaning that if a port goes down at one end then the port at the other end remains up. Imagine the case where two links are up at one end while only one is up at the other:

Or even worse, if the carrier has a meltdown and all ports are up but only one link is passing traffic:

Clearly we need a dynamic protocol to detect and mitigate this kind of problem.

Port bundles can be configured either statically (if the port is up we will use it) or dynamically (the device must negotiate over the link before it is used and must remain communicative to stay in the bundle). Never use static port bundles over any kind of WAN circuit; you are just asking for trouble. In fact, never use static port bundles anywhere. Always use a protocol, and make that protocol LACP.
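To make the difference concrete, here is a minimal Python sketch of the two membership rules. It is a toy model, not how any switch implements it: the `Member` class and both function names are made up for illustration, and "receiving PDUs" stands in for the whole LACP negotiation.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    local_port_up: bool   # physical state of our own port
    pdus_received: bool   # are we still hearing LACP PDUs from the far end?


def static_members(links):
    """Static bundle: any link whose local port is up carries traffic."""
    return [l.name for l in links if l.local_port_up]


def lacp_members(links):
    """Dynamic bundle: the port must be up AND the partner must still be talking."""
    return [l.name for l in links if l.local_port_up and l.pdus_received]


# The circuit behind link "b" has failed silently: our port is still up
# because link loss is not forwarded, but nothing arrives from the far end.
links = [Member("a", True, True), Member("b", True, False)]

print(static_members(links))  # ['a', 'b']  -> half the traffic is black-holed
print(lacp_members(links))    # ['a']       -> the dead link is dropped
```

The static rule only ever looks at the local port, which is exactly the signal an EoMPLS circuit does not give you.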

Barrier 2 - Passing Control Frames

The layer 2 extensions that we provide are pure port based Ethernet over MPLS, garbage in / garbage out, so any valid Ethernet frame that comes in one end will be transported and passed out at the other end.

Specifically, control frames such as LACP and (eugh) PAgP pass through transparently so LAGs / Etherchannels can negotiate fine. With other providers and/or other service types your mileage will vary - some will terminate control frames on the attached device, others will just drop them and bundles will fail to negotiate end-to-end.

LACP will take care of all sorts of failures and mis-configurations, even over a WAN - if you're interested in the specifics see All Sorts of Things About LACP and LAGs.

Is there anything else to consider? Actually, yes:

Barrier 3 - Timers

LACP uses periodic keepalives to detect whether links are still viable. If three keepalives are lost then the link is marked as expired and will be removed from the bundle. These keepalives can be sent either in "fast" or "slow" mode using timers of 1 or 30 seconds, respectively.

On that basis, fast timers give a 3 second fault detection time, whereas slow timers give a 90 second detection time (yikes!), which most people would consider wholly inadequate. Bear in mind that during this 90 second window, statistically half the traffic in each direction will be lost. Worse than that, the traffic path between hosts is chosen independently in each direction, meaning each of the following cases is equally likely:

If we lose either the red or the green link, 3 out of the 4 possibilities will break traffic flow in at least one direction!
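The 3-in-4 figure is easy to check by brute force. The sketch below assumes a two-link bundle where each direction independently hashes a flow onto either link with equal probability; the link names and the `broken_cases` helper are purely illustrative.

```python
from itertools import product

LINKS = ("red", "green")


def broken_cases(failed):
    """Return the (forward, reverse) link choices that lose traffic in
    at least one direction while `failed` is silently dropping frames."""
    return [
        (fwd, rev)
        for fwd, rev in product(LINKS, repeat=2)
        if failed in (fwd, rev)
    ]


for failed in LINKS:
    bad = broken_cases(failed)
    print(f"{failed} down: {len(bad)} of {len(LINKS) ** 2} cases broken")
```

Only the case where both directions happen to land on the surviving link keeps working, so a given conversation has a 1-in-4 chance of riding out the failure untouched.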

Things are starting to improve, but until relatively recently many supposedly decent switches (particularly from "Brand C") only supported long (slow) timers. Even current product line Nexus switches for some reason default to long timers and have to be overridden to use short ones (lacp rate fast, if that's what brought you here!).

So if you are going to bundle, make sure your devices support short timers and make sure they are configured to use them!
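For a feel of what the timer choice costs, here is a rough back-of-the-envelope calculation in Python. The intervals and the three-missed-PDU rule are the ones described above; the 200 Mbit/s offered load and the even split across two links are assumptions picked only for illustration, and the function names are my own.

```python
LACP_INTERVAL_S = {"fast": 1, "slow": 30}  # seconds between LACP PDUs
MISSED_PDUS = 3                            # PDUs lost before the link expires


def detection_time(mode):
    """Seconds until a silently failed link is removed from the bundle."""
    return LACP_INTERVAL_S[mode] * MISSED_PDUS


def traffic_lost_mb(mode, offered_mbps, share_on_failed_link=0.5):
    """Rough megabits black-holed in one direction before LACP reacts."""
    return detection_time(mode) * offered_mbps * share_on_failed_link


for mode in ("fast", "slow"):
    lost = traffic_lost_mb(mode, offered_mbps=200)
    print(f"{mode}: {detection_time(mode)} s to detect, ~{lost:.0f} Mb lost")
```

With those assumptions the slow timers lose thirty times as much traffic as the fast ones for the same failure, which is the whole argument for checking the setting.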

Other Thoughts

Given all of this, do I think it's a good idea? I can see both sides.

On one hand, the detection time of fast LACP is better than spanning tree for silent failures and unidirectional links. It is much faster at detecting silent failures than Cisco Fabricpath (~30 seconds).

On the other hand, if you need the additional bandwidth provided by bundling the two circuits then you're going to be out of luck when one of them fails and you should probably reconsider your resilience.

If I were running Fabricpath over the WAN I would certainly put LACP underneath for the faster failure detection. If I were just running STP then I'd probably leave it as it is. If you're sure of what you're doing then bundling can work. However, if you want resilience then you need to monitor and ensure that the utilisation doesn't rise above 50%... if you're doing that then what's the benefit of bundling?
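The 50% rule falls straight out of the arithmetic: with two equal links, the survivor has to carry everything. A small sketch, assuming two identical circuits and a single offered load figure (the function names and the 1000 Mbit/s circuit size are hypothetical):

```python
def bundle_utilisation(link_mbps, offered_mbps, links=2):
    """Offered load as a fraction of total bundle capacity."""
    return offered_mbps / (link_mbps * links)


def survives_single_failure(link_mbps, offered_mbps):
    """True if one remaining link can carry the whole load on its own."""
    return offered_mbps <= link_mbps


for offered in (900, 1400):
    util = bundle_utilisation(1000, offered)
    ok = survives_single_failure(1000, offered)
    print(f"{offered} Mbit/s: {util:.0%} of bundle, survives a failure: {ok}")
```

Anything that passes the second check would also have fitted on a single circuit, which is the point of the question above.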


  1. This is very helpful and gives good thoughts on possible ways to extend layer 2 over pw. Still, the default 30s LACP timer surprises me.

    1. Yes, it's hard to believe that despite all the advances in switching technologies "some vendors" still default to a 90 second failover on one of the most widely deployed protocols in the network!