Friday, 21 December 2012

All sorts of things about LACP and LAGs

A lot of people consider link aggregation groups (LAG / etherchannel / portchannel / MLT) to be pretty basic functionality that "just works" and don't really think any more about it. As with many networking technologies, there is a lot of intelligence responsible for creating the smooth veneer of simplicity.

The basic concept of the LAG is that multiple physical ports are combined into one logical bundle. This provides benefits including:
  • Increased capacity - traffic may be balanced across the member ports to provide increased aggregate throughput
  • Link redundancy - the LAG bundle can survive the loss of one or more member links
LAGs may be statically configured or signalled using standards-based LACP, which is the main focus of this post. There is also the Port Aggregation Protocol (PAgP), which is similar in many regards to LACP but is Cisco proprietary and not in common usage. I won't discuss PAgP in this post.

Load Balancing Operation

One important point to bear in mind with LAGs is that traffic is not dynamically assigned across member links but rather is "sprayed" using a deterministic hash algorithm. Depending on the platform and configuration, a number of parameters may feed into the algorithm including:
  • Source and/or destination MAC address
  • Source and/or destination IP address
  • Source and/or destination TCP / UDP port numbers
  • Ingress interface
  • Service ID or MPLS label
  • System specific information (chassis MAC or system IP)
Ultimately the hash will take in some combination of parameters and decide onto which member link the frame should be placed. Note that, since all the input to the algorithm is either permanently static (e.g. chassis MAC) or static for a given flow (e.g. source and destination MAC), all traffic for a particular flow will always be placed onto the same link. This has the following effects:
  • Order is maintained for frames within a flow - the different member links, particularly on a WAN, may have different delay characteristics. If frames for a single flow were sprayed onto multiple member links, frames could be re-ordered in transit.
  • Traffic for a single flow cannot exceed the bandwidth of a single member link.
  • Traffic balance across member links is largely dependent on the diversity of the offered traffic. If the number of flows is low, some links may be saturated while others are under-utilised. The same effect can be seen if there are many flows but load is proportionally concentrated in just a few of them.
  • When traffic passes through multiple hops using LAGs at each stage, polarisation can occur. This is where repeated application of the same hash function at each hop causes traffic to become unevenly distributed across the links. One link may be running at 100% and dropping excess traffic while another is almost idle. Passing system specific information into the algorithm is designed to mitigate this by ensuring that each hop hashes in a slightly different way.
  • Upstream and downstream traffic for a single flow will not necessarily traverse the same link. Since the devices at each end of a LAG hash traffic independently, there is no guarantee that both legs of a conversation will pass along the same member link.
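
To make the deterministic behaviour concrete, here is a minimal Python sketch of hash-based member selection. Real platforms use their own hardware hash functions and let you choose which fields feed them; CRC32 over four flow fields is purely an assumption for illustration, and the Flow type, select_member() function and system_seed parameter are hypothetical names. The point is that nothing time-varying goes in, so a given flow always lands on the same link, and that mixing in a per-system value changes the mapping from one hop to the next.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical flow fields; real platforms let you choose which to hash.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def select_member(flow: Flow, num_links: int, system_seed: str = "") -> int:
    """Return the index (0..num_links-1) of the member link for this flow."""
    if num_links < 1:
        raise ValueError("a LAG needs at least one member link")
    # Only static inputs: the optional per-system seed (e.g. a chassis MAC)
    # and fields that are fixed for the life of the flow.
    key = f"{system_seed}|{flow.src_ip}|{flow.dst_ip}|{flow.src_port}|{flow.dst_port}"
    # CRC32 is deterministic, so the same flow always maps to the same link.
    return zlib.crc32(key.encode()) % num_links


if __name__ == "__main__":
    web = Flow("10.0.0.1", "10.0.0.2", 51515, 80)
    print(select_member(web, 2), select_member(web, 2))
    print(select_member(web, 2, system_seed="00:0a:aa:2e:af:ea"))
```

Calling select_member() repeatedly with the same flow always returns the same index, however busy that link happens to be. Changing system_seed (standing in for a chassis MAC) will generally map flows differently, which is the idea behind feeding system-specific information into the hash to avoid polarisation.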

Active / Standby Operation

In addition to the "normal" load balancing mode of operation, it is also possible to configure a LAG to operate in an active/standby fashion. In fact, it is possible to combine the two modes and have an arbitrary number of links active and passing traffic while an arbitrary number remain on standby pending a fault on the active link(s).

Active / standby groups are generally used when resilience is required, but it is not desirable for the LAG to pass more than a certain amount of traffic or for the available bandwidth to vary. Typical use cases are service provider environments where the customer only pays for a certain bandwidth, and corporate networks with a highly over-subscribed core.

Rules for LAGs

In order to be able to aggregate ports together, certain rules must be obeyed. Fundamentally, the member ports must be homogeneous; more specifically, every member port must agree on the following:
  • Speed & Duplex - Since traffic is distributed by a simple hash, it is not possible to combine links of different speeds in the same bundle.
  • Encapsulation - i.e. all ports must use the same number of 802.1Q VLAN tags. For switches this means they must all be access or all be trunk. For routers such as the 7750 this means that the Ethernet encap type (null, dot1q or qinq) must agree between members. For switches in access mode, all member ports must be in the same VLAN.
  • For the 7750, the port type (access, network or hybrid) must agree across all members and with the LAG itself
  • MTU - all member port MTUs must match and for Cisco switches, the same MTU must be configured on the port channel.
Note: the physical media type, i.e. copper or fibre, does not necessarily need to match between all LAG members.
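
If you are auditing configurations in bulk, the rules above boil down to a simple comparison across members. The sketch below assumes you have already pulled each member's settings into a hypothetical MemberConfig record (the field names are my own, not any vendor's); media type is deliberately left out, since it doesn't need to match.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class MemberConfig:
    # Hypothetical record of the settings that must match across members.
    # Media type (copper/fibre) is deliberately absent: it need not match.
    speed_mbps: int
    duplex: str
    encapsulation: str  # e.g. "access"/"trunk" on a switch, "null"/"dot1q"/"qinq" on a 7750
    mtu: int


def inconsistent_attributes(members: Dict[str, MemberConfig]) -> List[str]:
    """Return the names of settings that differ between any two member ports."""
    configs = list(members.values())
    if not configs:
        return []
    reference = configs[0]
    return [
        f.name
        for f in fields(MemberConfig)
        if any(getattr(c, f.name) != getattr(reference, f.name) for c in configs[1:])
    ]
```

An empty list means the members are consistent with each other; anything else names the settings to fix before you expect the bundle to form.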

Static Configuration

The simplest method of building a LAG does not involve any signalling or protocols at all and simply specifies the member ports to be aggregated. Here's an example of doing that on two different platforms:

Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown


Cisco 2950:
2950#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode on
Creating a port-channel interface Port-channel 1

2950(config-if-range)#no shut

In this setup, as soon as a port becomes physically up it becomes a member of the LAG bundle. The only, fairly minor, advantage of this is that the configuration is very simple. The disadvantage is that there is no method to detect any kind of cabling or configuration errors.

Note: The lack of any kind of misconfiguration detection makes static LAGs very dangerous to deploy in production networks.

LACP

LACP is the standards-based protocol used to signal LAGs. It detects and protects the network from a variety of misconfiguration and fault conditions, ensuring that links are only aggregated into a bundle if they are consistently configured and cabled.

LACP must be configured in one of two modes:
  • Active mode - the device immediately sends LACP messages (LACPDUs) when the port comes up and must reach an agreement with the attached port before traffic will pass.
  • Passive mode - the device does not generate LACPDUs until it receives them. If no LACPDUs are received then the port aggregates as though statically configured. If LACPDUs are received then an agreement must be reached with the peer before traffic will pass.
In practice it is rare to find passive mode used in any properly designed network as it should be clearly and consistently defined which links will use LACP ahead of deployment.
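
The reason two passive ends never negotiate is simply that neither will speak first. As a trivial Python sketch (the LacpMode enum and lacpdus_exchanged() helper are hypothetical names), negotiation can only begin if at least one end is active:

```python
from enum import Enum


class LacpMode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


def lacpdus_exchanged(local: LacpMode, remote: LacpMode) -> bool:
    """True if negotiation can start, i.e. at least one end sends LACPDUs unprompted."""
    # A passive end only transmits after it has heard from its peer, so two
    # passive ends wait on each other and neither ever sends a LACPDU.
    return LacpMode.ACTIVE in (local, remote)


if __name__ == "__main__":
    for local in LacpMode:
        for remote in LacpMode:
            print(f"{local.value:>7} / {remote.value:<7} -> {lacpdus_exchanged(local, remote)}")
```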

Minimal LACP configuration

The minimal configuration is still very straightforward, requiring little additional CLI:

Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ lacp active
*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown


Cisco 2950:
2950#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode active
Creating a port-channel interface Port-channel 1

2950(config-if-range)#no shut

There is, of course, a lot more going on behind the scenes, but most parameters assume default values that are perfectly acceptable for most situations.

LACP Terms and Parameters

There are a number of LACP-specific terms and parameter names that must be understood in order to make sense of LACP debug output and packet traces.

The first and arguably most fundamental concept is that of actors and partners. One of the really nice debugging features of LACP is that it echoes the parameters it receives back to the sender. To avoid confusion, the term actor is used to designate the parameters and flags pertaining to the sending node, while the term partner is used to designate the sending node's view of its peer's parameters and flags.

Per System:
Each network device has a LACP System ID. This is a 48 bit value which generally defaults to the chassis MAC address. The system ID is sent within every LACPDU and makes it easy to check that a LAG goes to the device you expect.

Each device also has a 16 bit LACP System Priority. When the two peers disagree about which links should be active and which should be on standby, the system priority decides whose port priorities are used. The lowest numerical value wins, with the system ID acting as the tie-breaker.
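
A sketch of how that decision might be expressed, assuming the IEEE 802.1AX behaviour where system priority is compared first and the system MAC breaks ties, and where ports on the governing system are then ranked by (port priority, port number), lowest first. The SystemId type and helper functions are hypothetical; the example priorities come from the 7750 and 2950 output later in this post.

```python
from typing import Dict, List, NamedTuple, Tuple


class SystemId(NamedTuple):
    priority: int  # 16-bit LACP system priority
    mac: str       # 48-bit system MAC, colon or dotted notation


def _rank(system: SystemId) -> Tuple[int, int]:
    # Priority first, then the MAC treated as a number; lower wins.
    digits = system.mac.replace(":", "").replace(".", "")
    return system.priority, int(digits, 16)


def governing_system(a: SystemId, b: SystemId) -> SystemId:
    """Return the system whose port priorities decide active/standby."""
    return min(a, b, key=_rank)


def choose_active(ports: Dict[str, Tuple[int, int]], max_active: int) -> List[str]:
    """ports maps port name -> (port priority, port number); lower is preferred."""
    return sorted(ports, key=lambda name: ports[name])[:max_active]


if __name__ == "__main__":
    alu = SystemId(40960, "00:0a:aa:2e:af:ea")
    cisco = SystemId(32768, "0012.da12.abcd")
    print(governing_system(alu, cisco))
    print(choose_active({"Fa0/19": (32768, 0x13), "Fa0/20": (32768, 0x14)}, 1))
```

With the lab values (7750 at 40960, 2950 at 32768), governing_system() picks the 2950, so its port priorities would decide which links go to standby. choose_active() should be fed the governing system's ports.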

Per LAG:
Each LAG on a system will have a unique 16 bit LACP key, the purpose of which is to differentiate one LAG from another within the protocol. This number is locally significant and may or may not match between peers. In practice the key lets a system detect cabling faults: if different LACP keys are received on members of the same LAG, then we are connected to two different LAGs at the far end and, obviously, aggregating those together would be a bad idea.

LACP Flags:
The following flags are used to communicate state between systems:
  • Activity - Set to indicate LACP active mode, cleared to indicate passive mode
  • Timeout - Set to indicate the device is requesting a fast (1s) transmit interval of its partner, cleared to indicate that a slow (30s) transmit interval is being requested.
  • Aggregation - Set to indicate that the port is configured for aggregation (typically always set)
  • Synchronisation - Set to indicate that the system is ready and willing to use this link in the bundle to carry traffic. Cleared to indicate the link is not usable or is in standby mode.
  • Collecting - Set to indicate that traffic received on this interface will be processed by the device. Cleared otherwise.
  • Distributing - Set to indicate that the device is using this link to transmit traffic. Cleared otherwise.
  • Expired - Set to indicate that no LACPDUs have been received by the device during the past 3 intervals. Cleared when at least one LACPDU has been received within the past three intervals.
  • Defaulted - When set, indicates that no LACPDUs have been received during the past 6 intervals. Cleared when at least one LACPDU has been received within the past 6 intervals. Once the defaulted flag transitions to set, any stored partner information is flushed. 
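
On the wire, these eight flags are packed into a single state octet, one flag per bit, starting with Activity in the least significant bit. That is why Cisco displays port state as a hex value such as 0x3D or 0x3F. A minimal decoder (the decode_state() name is my own):

```python
from typing import Dict

# LACP state octet, least significant bit first (IEEE 802.1AX).
STATE_BITS = (
    "activity",         # bit 0: set = active mode
    "timeout",          # bit 1: set = fast (short) timeout requested
    "aggregation",      # bit 2: set = port may be aggregated
    "synchronisation",  # bit 3: set = ready to carry traffic in the bundle
    "collecting",       # bit 4
    "distributing",     # bit 5
    "defaulted",        # bit 6
    "expired",          # bit 7
)


def decode_state(octet: int) -> Dict[str, bool]:
    """Decode a one-byte actor or partner state into named flags."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("LACP state is a single octet")
    return {name: bool((octet >> bit) & 1) for bit, name in enumerate(STATE_BITS)}


if __name__ == "__main__":
    for value in (0x3D, 0x3F):
        flags = decode_state(value)
        print(hex(value), [name for name, is_set in flags.items() if is_set])
```

For example, 0x3D and 0x3F differ only in bit 1: the first requests slow timeouts and the second fast, while both have activity, aggregation, synchronisation, collecting and distributing set.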

Bringing Links into Service

Assuming that the local configuration is consistent and LACPDUs are being exchanged across the link, the following flow chart roughly describes how to decide the values of the synchronisation, distributing and collecting flags.

[Flow chart: deciding the synchronisation, collecting and distributing flags]

If by the end your collecting / distributing flags are set then the link will be used for sending and receiving traffic. If not, it won't.
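
The same decision can be sketched in a few lines of Python for a single port. This is loosely modelled on the coupled-control mux machine in 802.1AX (WAITING, ATTACHED, COLLECTING_DISTRIBUTING, as seen in the debug output later in this post) but ignores timers and aggregator details; the function name and arguments are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkFlags:
    synchronisation: bool
    collecting: bool
    distributing: bool


def local_flags(selected: bool, standby: bool, partner_sync: bool) -> LinkFlags:
    """Simplified view of the flags a port advertises.

    selected     -- the port passed the local checks (config, key, system ID)
    standby      -- the port is selected but held in standby
    partner_sync -- the partner's last LACPDU had its synchronisation flag set
    """
    if not selected or standby:
        # Not usable, or held back: advertise out of sync and carry no traffic.
        return LinkFlags(False, False, False)
    # Attached to the bundle: we are in sync, but only collect and distribute
    # once the partner also says it is in sync.
    return LinkFlags(True, partner_sync, partner_sync)
```

A selected port whose partner has not yet signalled sync advertises sync itself but keeps collecting and distributing cleared, which corresponds to the ATTACHED state.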

LACP Fault Detection

LACP can detect almost every conceivable patching error and will refuse to aggregate when that would be inappropriate. Following are a number of improper LAG topologies along with a description of how LACP detects and protects the network against them.

Split LAG

In a split LAG, the members of one LAG are cabled to two different far-end devices. LACP inspects the system ID field of incoming LACPDUs and refuses to aggregate any links whose system ID does not match that of the existing member(s).

Crossed LAGs

Crossed LAGs occur when two LAGs run between the same pair of devices and members of one are cabled into the other. LACP detects the cabling fault by inspecting the key on incoming LACPDUs and refuses to aggregate any links whose key does not match that of the existing member(s).

Looped LAG

In a looped LAG, member ports are cabled back into the same chassis. LACP detects the cabling fault by inspecting the system ID and key of the incoming LACPDU. Some systems (e.g. Alcatel-Lucent 7750) allow different LAGs on the same chassis to be interconnected; however, two member ports of the same LAG may never be connected to each other.
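
The three checks above can be summarised as a single decision per candidate member port. This is a sketch only, assuming you have the sender's system ID and key from the actor fields of received LACPDUs; the PartnerInfo type and classify() function are hypothetical, and real implementations perform these checks inside their selection logic.

```python
from typing import NamedTuple, Optional


class PartnerInfo(NamedTuple):
    system_id: str  # sender's system ID, from the actor fields of its LACPDUs
    key: int        # sender's key, from the actor fields of its LACPDUs


def classify(own_system: str, own_key: int,
             existing: Optional[PartnerInfo], received: PartnerInfo) -> str:
    """Return "ok", "looped", "split" or "crossed" for a candidate member port.

    existing -- partner info already seen on the LAG's current members
                (None if this is the first member to come up)
    received -- partner info learned from LACPDUs on the candidate port
    """
    if received.system_id == own_system and received.key == own_key:
        return "looped"   # the port is hearing its own LAG
    if existing is None:
        return "ok"       # first member defines the expected partner
    if received.system_id != existing.system_id:
        return "split"    # far end is a different device
    if received.key != existing.key:
        return "crossed"  # same device, but a different far-end LAG
    return "ok"
```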

Unidirectional Link Failure


Consider a unidirectional link failure in which LACPDUs are being lost in the direction A to B, but the ports remain physically up. In this situation, system B responds to the loss of three consecutive LACPDUs by clearing its synchronisation, collecting and distributing flags and setting its expired flag. System A, which is still receiving B's LACPDUs, sees the loss of sync immediately and responds by clearing its own synchronisation, collecting and distributing flags.
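
The timing here follows from the flag descriptions earlier in this post: expired after three missed intervals, defaulted after six. A rough model of that (hypothetical names, following this post's description rather than the exact timer machinery of 802.1AX) classifies the receive side and works out how long a silent failure takes to notice:

```python
FAST_INTERVAL_S = 1.0   # LACPDU interval requested with the timeout flag set
SLOW_INTERVAL_S = 30.0  # LACPDU interval requested with the timeout flag cleared


def rx_state(seconds_since_last_lacpdu: float, interval_s: float) -> str:
    """Classify the receive side as "current", "expired" or "defaulted"."""
    missed_intervals = seconds_since_last_lacpdu / interval_s
    if missed_intervals < 3:
        return "current"
    if missed_intervals < 6:
        return "expired"    # expired flag set; the link stops carrying traffic
    return "defaulted"      # defaulted flag set; stored partner info flushed


def detection_time_s(interval_s: float) -> float:
    """Time to notice a silent failure: three missed LACPDUs."""
    return 3 * interval_s
```

At the fast interval that works out at 3 seconds; at the slow interval it is 90 seconds.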

LACP Troubleshooting

The most important part of troubleshooting LAGs is to properly understand the meaning and purpose of all the parameters, particularly the flags, before you begin. After that point, it is just a matter of knowing what CLI commands will show you the required information.

I recommend starting with the basics and working up:
  • Are the member ports physically up?
  • Are all member ports configured consistently (see LAG Rules above)?
  • Can you be sure the topology is as you expect?
    • Use LLDP or CDP if available
    • Use system ID, key and port ID values from the LACPDUs otherwise
  • Determine which end is unhappy (hint: it won't be sending sync).
  • Verify that messages are passing bi-directionally and are not being blocked by any kind of filter (hint: check that the partner details are populated in the received LACPDUs).
After following these checks you should be able to trace 95% of LAG problems. I, personally, prefer to check the flags, etc, using a packet capture. But then I would, because that's my answer to everything. Below are some CLI methods to gather the same information.

Alcatel-Lucent 7750

To get almost all the information you could ever want, use "show lag [number] detail":

A:7750# show lag 1 detail
===============================================================================
LAG Details
===============================================================================
Description        : N/A
-------------------------------------------------------------------------------
Details
-------------------------------------------------------------------------------
Lag-id              : 1                     Mode                 : access
Adm                 : up                    Opr                  : up
Thres. Exceeded Cnt : 2                     Port Threshold       : 0
Thres. Last Cleared : 12/21/2012 10:59:59   Threshold Action     : down
Dynamic Cost        : false                 Encap Type           : null
Configured Address  : 00:0a:aa:2e:af:ea     Lag-IfIndex          : 1342177281
Hardware Address    : 00:0a:aa:2e:af:ea     Adapt Qos (access)   : distribute
Hold-time Down      : 0.0 sec               Port Type            : standard
Per FP Ing Queuing  : disabled
LACP                : enabled               Mode                 : active
LACP Transmit Intvl : fast                  LACP xmit stdby      : enabled
Selection Criteria  : highest-count         Slave-to-partner     : disabled
Number of sub-groups: 1                     Forced               : -
System Id           : 00:0a:aa:2e:af:ea     System Priority      : 40960
Admin Key           : 32777                 Oper Key             : 32777
Prtr System Id      : 00:12:da:ab:fe:21     Prtr System Priority : 32768
Prtr Oper Key       : 1
Standby Signaling   : lacp

-------------------------------------------------------------------------------
Port-id        Adm     Act/Stdby Opr     Primary   Sub-group     Forced  Prio
-------------------------------------------------------------------------------
2/2/19         up      active    up      yes       1             -       32768
2/2/20         up      active    up                1             -       32768

-------------------------------------------------------------------------------
Port-id        Role      Exp   Def   Dist  Col   Syn   Aggr  Timeout  Activity
-------------------------------------------------------------------------------
2/2/19         actor     No    No    Yes   Yes   Yes   Yes   Yes      Yes
2/2/19         partner   No    No    Yes   Yes   Yes   Yes   No       Yes
2/2/20         actor     No    No    Yes   Yes   Yes   Yes   Yes      Yes
2/2/20         partner   No    No    Yes   Yes   Yes   Yes   No       Yes
===============================================================================
A:7750#

In this output you can see the local and remote flags, system IDs, system priorities and keys in use, whether the underlying ports are functioning and, if sub-groups are in use, whether local ports are active or standby. Note also that it shows you which port in the LAG is primary - if you want to edit anything such as MTU, QoS, etc, then you need to do it on the primary port. Your changes will then be pushed to the other ports automatically.

If you need to verify that LACPDUs are being received, you can use "debug lag [lag-id number] [port port-id] pkt". This will produce a debug message for every LACPDU sent or received, optionally filtered by LAG or by individual port:

A:7750# debug lag lag-id 1 pkt
980 2012/12/21 21:23:56.73 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
Xmit LACPDU on PortId 2/2/19"

981 2012/12/21 21:23:56.80 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
LACPDU rcvd on PortId 2/2/19"


A little light on detail, admittedly, but enough to prove whether they are arriving or not.

For more interactive debugging, a better choice might be "debug lag [lag-id number] [port port-id] sm" to indicate what is happening to the state machine for a given lag or port:

A:7750# debug lag lag-id 1 sm
852 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1: partner oper state bits changed on member 2/2/20 : [sync FALSE -> TRUE]
"

853 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :triggerMap 0 -> e after Rx SM"

854 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :running selection logic"

855 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :MUX SM ATTACHED->COLLECTING_DISTRIBUTING"


The above is quite verbose, as it logs state machine activity every time a LACPDU is sent or received, but it is really the best way to troubleshoot state transitions.

Cisco 2950

There are a few LACP related show commands on IOS and the useful information is spread between them. Starting at the simple end, a high level overview of the LAGs on the system can be obtained using the command "show etherchannel":

2950#show etherchannel
                Channel-group listing:
                ----------------------

Group: 1
----------
Group state = L2
Ports: 2   Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol:   LACP

2950#

To find the local LACP system ID, use "show lacp sys-id":

2950#show lacp sys-id 
32768,0012.da12.abcd

Note that the part before the comma is actually the system priority.
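
Cisco and the 7750 present the same values in different notations, which is easy to trip over when comparing the two ends. A small helper (hypothetical name) normalises the IOS sys-id format, and Python's built-in int() handles the hex keys and priorities that IOS prints:

```python
from typing import Tuple


def parse_ios_sys_id(text: str) -> Tuple[int, str]:
    """Split IOS "priority,mac" output into (priority, colon-separated MAC)."""
    priority_str, dotted_mac = text.strip().split(",")
    hex_digits = dotted_mac.replace(".", "").lower()
    if len(hex_digits) != 12:
        raise ValueError(f"unexpected MAC format: {dotted_mac!r}")
    mac = ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))
    return int(priority_str), mac


if __name__ == "__main__":
    print(parse_ios_sys_id("32768,0012.da12.abcd"))
    # IOS shows keys and priorities in hex; int() with base 16 accepts "0x".
    print(int("0x8009", 16), int("0xA000", 16))
```

int("0x8009", 16) gives 32777, which is the 7750's Admin Key, and int("0xA000", 16) gives 40960, the 7750's system priority as reported by the 2950.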

Useful information about the remote device (our partner) can be found using "show lacp neighbor":

2950#show lacp neighbor 
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1 neighbors
Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/19    FA      32768     0003.abcd.aaa1   3s    0x8009  0x8894   0x3F
Fa0/20    FA      32768     0003.abcd.aaa1   3s    0x8009  0x8893   0x3F


This shows some useful information such as the timeout and activity flags, plus it allows you to verify the LACP keys being received on each port for consistency. If you need more information, add the "detail" keyword:

2950#show lacp neighbor detail
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1 neighbors
Partner's information:
          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Fa0/19     40960,0003.abcd.aaa1  0x8894           11s        FA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x8009          0x3F

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Long       Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No
          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Fa0/20     40960,0003.abcd.aaa1  0x8893           11s        FA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x8009          0x3F

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Long       Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No
2950#


Note that, contrary to what you might expect, the "Port State Flags Decode" sections actually refer to the local flags rather than those being sent by the remote device. As you can see, in this example the remote end is requesting fast timeouts but the local end is requesting slow.

A fairly detailed overview of the local and remote state can be seen using the "show etherchannel detail" command:

2950#show etherchannel detail
                Channel-group listing:
                ----------------------

Group: 1
----------
Group state = L2
Ports: 2   Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol:   LACP
                Ports in the group:
                -------------------
Port: Fa0/19
------------

Port state    = Up Mstr In-Bndl
Channel group = 1           Mode = Active          Gcchange = -
Port-channel  = Po1         GC   =   -             Pseudo port-channel = Po1
Port index    = 0           Load = 0x00            Protocol =   LACP

Flags:  S - Device is sending Slow LACPDUs   F - Device is sending fast LACPDUs.
        A - Device is in active mode.        P - Device is in passive mode.

Local information:
                            LACP port     Admin     Oper    Port     Port
Port      Flags   State     Priority      Key       Key     Number   State
Fa0/19    SA      bndl      32768         0x1       0x1     0x13     0x3D

Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/19    FA      32768     0003.abcd.aaa1  26s    0x8009  0x8894   0x3F

Age of the port in the current state: 0d:00h:00m:24s
Port: Fa0/20
------------

Port state    = Up Mstr In-Bndl
Channel group = 1           Mode = Active          Gcchange = -
Port-channel  = Po1         GC   =   -             Pseudo port-channel = Po1
Port index    = 0           Load = 0x00            Protocol =   LACP

Flags:  S - Device is sending Slow LACPDUs   F - Device is sending fast LACPDUs.
        A - Device is in active mode.        P - Device is in passive mode.

Local information:
                            LACP port     Admin     Oper    Port     Port
Port      Flags   State     Priority      Key       Key     Number   State
Fa0/20    SA      bndl      32768         0x1       0x1     0x14     0x3D

Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/20    FA      32768     0003.abcd.aaa1   0s    0x8009  0x8893   0x3F

Age of the port in the current state: 0d:00h:00m:27s
                Port-channels in the group:
                ---------------------------

Port-channel: Po1    (Primary Aggregator)
------------
Age of the Port-channel   = 0d:00h:00m:50s
Logical slot/port   = 1/0          Number of ports = 2
HotStandBy port = null
Port state          = Port-channel Ag-Inuse
Protocol            =   LACP

Ports in the Port-channel:
Index   Load   Port     EC state        No of bits
------+------+------+------------------+-----------
  0     00     Fa0/19   Active             0
  0     00     Fa0/20   Active             0

Time since last port bundled:    0d:00h:00m:28s    Fa0/19
2950#

For more interactive troubleshooting, there are debug commands present but be careful - on my (admittedly ancient) switch, LACP debugs were only available chassis-wide and were pretty verbose. The packet level debug ("debug lacp packet") for a single LACPDU is shown below:

2950#debug lacp packet
Link Aggregation Control Protocol packet debugging is on
19w0d: LACP :lacp_bugpak: Send LACP-PDU packet via Fa0/20
19w0d: LACP : packet size: 124
19w0d: LACP: pdu: subtype: 1, version: 1
19w0d: LACP: Act: tlv:1, tlv-len:20, key:0x1, p-pri:0x8000, p:0x14, p-state:0x3D,
s-pri:0x8000, s-mac:0012.da12.abcd
19w0d: LACP: Part: tlv:2, tlv-len:20, key:0x8009, p-pri:0x8000, p:0x8893, p-state:0x3F,
s-pri:0xA000, s-mac:0003.abcd.aaa1
19w0d: LACP: col-tlv:3, col-tlv-len:16, col-max-d:0x8000
19w0d: LACP: term-tlv:0 termr-tlv-len:0


Pretty detailed, so watch your CPU!

A rather useful alternative is "debug lacp fsm" - again this provides a very high volume of output but is the only practical way to see detailed info on state transitions via CLI:

2950#debug lacp fsm
Link Aggregation Control Protocol fsm debugging is on
19w0d:     lacp_mux Fa0/19 - mux: during state WAITING, got event 4(ready)
19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED
19w0d: LACP: Fa0/19 lacp_action_mx_attached entered
19w0d: LACP: Fa0/19 Attaching mux to aggregator
19w0d:     lacp_mux Fa0/19 - mux: during state ATTACHED, got event 5(in_sync)
19w0d: @@@ lacp_mux Fa0/19 - mux: ATTACHED -> COLLECTING_DISTRIBUTING
19w0d: LACP: Fa0/19 lacp_action_mx_collecting_distributing entered
19w0d: LACP: Fa0/19 Enabling collecting and distributing
19w0d:     lacp_rx Fa0/19 - rx: during state CURRENT, got event 5(recv_lacpdu)
19w0d: @@@ lacp_rx Fa0/19 - rx: CURRENT -> CURRENT
19w0d: LACP: Fa0/19 lacp_action_rx_current entered
19w0d:     lacp_mux Fa0/19 - mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)
19w0d:     lacp_ptx Fa0/19 - ptx: during state FAST_PERIODIC, got event 3(pt_expired)
19w0d: @@@ lacp_ptx Fa0/19 - ptx: FAST_PERIODIC -> PERIODIC_TX
19w0d: LACP: Fa0/19 lacp_action_ptx_fast_periodic_exit entered


Very verbose indeed. Be careful with CPU load.

Frankly, if you can, it is better to troubleshoot with a port mirror and packet capture. The protocol is very good at telling you what it is doing: in addition to the periodic LACPDUs, triggered updates are sent whenever anything material, such as the sync state, changes. Use a capture filter (see previous blog post "tshark one-liners" for more info) when capturing on links with a lot of user data.
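
LACPDUs are easy to pick out of a capture: they are sent to the Slow Protocols multicast address 01:80:C2:00:00:02 with EtherType 0x8809 and subtype 1, so a capture filter such as `ether proto 0x8809` keeps user traffic out of the way. Below is a minimal Python sketch that decodes the actor and partner fields from a raw, untagged frame using only the standard struct module. The field layout follows 802.1AX; the function names are my own, and build_lacpdu() exists only to produce a test frame from the values in the 2950 "debug lacp packet" output above.

```python
import struct
from typing import NamedTuple, Tuple

SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 1

# Actor/partner information after the TLV type and length bytes: system
# priority, system MAC, key, port priority, port number, state, 3 reserved.
_INFO = struct.Struct("!H6sHHHB3x")  # 18 bytes


class LacpInfo(NamedTuple):
    system_priority: int
    system_mac: str
    key: int
    port_priority: int
    port: int
    state: int


def _parse_info(frame: bytes, offset: int, expected_type: int) -> LacpInfo:
    tlv_type, tlv_len = frame[offset], frame[offset + 1]
    if tlv_type != expected_type or tlv_len != 2 + _INFO.size:
        raise ValueError(f"unexpected TLV {tlv_type}/{tlv_len} at offset {offset}")
    prio, mac, key, port_prio, port, state = _INFO.unpack_from(frame, offset + 2)
    return LacpInfo(prio, mac.hex(":"), key, port_prio, port, state)


def parse_lacpdu(frame: bytes) -> Tuple[LacpInfo, LacpInfo]:
    """Return (actor, partner) from an untagged Ethernet frame carrying a LACPDU."""
    if len(frame) < 56:
        raise ValueError("frame too short for actor and partner TLVs")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != SLOW_PROTOCOLS_ETHERTYPE or frame[14] != LACP_SUBTYPE:
        raise ValueError("not a LACPDU")
    # frame[15] is the LACP version; the actor TLV starts at byte 16 and the
    # partner TLV immediately follows it.
    return _parse_info(frame, 16, 1), _parse_info(frame, 36, 2)


def build_lacpdu(src_mac: bytes, actor: LacpInfo, partner: LacpInfo) -> bytes:
    """Build a 124-byte LACPDU frame, used here only to exercise the parser."""
    def info_tlv(tlv_type: int, info: LacpInfo) -> bytes:
        mac = bytes.fromhex(info.system_mac.replace(":", ""))
        return bytes([tlv_type, 2 + _INFO.size]) + _INFO.pack(
            info.system_priority, mac, info.key, info.port_priority, info.port, info.state)

    return (bytes.fromhex("0180c2000002") + src_mac
            + struct.pack("!H", SLOW_PROTOCOLS_ETHERTYPE) + bytes([LACP_SUBTYPE, 1])
            + info_tlv(1, actor) + info_tlv(2, partner)
            + bytes([3, 16]) + struct.pack("!H", 0x8000) + bytes(12)  # collector TLV
            + bytes([0, 0]) + bytes(50))  # terminator TLV and reserved padding


if __name__ == "__main__":
    # Field values copied from the 2950 "debug lacp packet" output above.
    actor = LacpInfo(0x8000, "00:12:da:12:ab:cd", 0x1, 0x8000, 0x14, 0x3D)
    partner = LacpInfo(0xA000, "00:03:ab:cd:aa:a1", 0x8009, 0x8000, 0x8893, 0x3F)
    frame = build_lacpdu(bytes.fromhex("0012da12abcd"), actor, partner)
    print(len(frame), parse_lacpdu(frame))
```

The frame built from the debug values comes out at 124 bytes, matching the packet size the 2950 reported. For frames carrying an 802.1Q tag, the offsets would shift by four bytes, which this sketch does not handle.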

Oddities

The value of the timeout flag sent by a device indicates the interval at which it expects the partner to send LACPDUs. The partner should then honour the request and send at the indicated interval.

The timeout value does not have to agree between peers. While it is not a recommended configuration, it is possible to bring up a LAG with one end sending every second and the other sending every 30 seconds. In this case, the end requesting fast timers will detect a silent failure in under 3 seconds while the end requesting slow timers will take up to 90 seconds to detect the same fault.

The configuration of sub-groups (and even whether to use sub-groups) does not have to agree between peers. The failure characteristics are often better if one end is configured with active / standby sub-groups while the other is configured without any sub-groups. In that case, as soon as the end with sub-groups decides to switch a new sub-group to active, the partner is already sending sync on all available links and will immediately put traffic onto the newly active sub-group.

The Alcatel-Lucent 7750 (and probably others; I've just not looked) sends an out of sync LACPDU upon detecting that a LAG member has gone physically down. Normally that won't get through to the other end, but in the event of a single fibre failure, for example, it serves to inform the partner that the link is no longer usable and should be removed from the LAG bundle. This improves failover times considerably where link loss is not propagated to the far end (tens or hundreds of milliseconds as compared to 2 - 3 seconds).

Finally

If you got this far, you should probably download the IEEE 802.1AX-2008 standard.

28 comments:

  1. Replies
    1. Thanks, I'm glad you found it useful!

  2. Brilliant post. Many thanks!

  3. awesome blog :) . i never seen a blog with kind of information about networking protocols. You are really helping log of network professional. If possible you can write more about other layer 2 and layer 3 protocols. thank you very much

  4. I have a setup where my host forms a LACP LAG with 2 uplink ports connected to 2 separate switches. The switches are in the same fabric (using Brocade VCS fabric and vLAG). This provides 2 paths, but in a situation where 1 of the switches loses its uplink to its next hop, will LACP be able to detect that path failure although its direct member ports are still UP.
    Does LACP have any mechanism to deal with this?
    Thanks,
    Subhish

    1. Hi, Subhish.

      While the LACP protocol is able to take a link out of service (by revoking the sync bit) I've never seen any implementation where you can make a link's availability dependent on the state of an unrelated link or other tracked object. I would have thought your fabrics would be connected by multiple high bandwidth links, though, so being switched horizontally shouldn't have a noticeable effect on the traffic.

    2. This is a lab environment and we are testing multi-chassis LACP LAG from a single host and trying to understand the LACP protocol behavior in a split-brain scenario.

    3. Hi Subhish. Split brain usually gets quite ugly, which is why the various multi-chassis mechanisms recommend / insist on countermeasures such as multiple peer links or out of band connectivity between chassis. I guess you'd really have to lab test your specific environment.

  5. Thanks for this post. Extremely useful!

  6. May I know the best practice of setting LACP? Active-active OR Active-passive ?

    Thanks for your deep and wonderful post.

    1. Hi, Alvin. Personally, I would always use active mode on both ends anywhere where I would want / expect a bundle to form. I've never seen any benefit to using passive mode and you can cause strange problems in your network if one end is expecting a bundle while the other end is not. This is particularly true if there is transmission kit in the path between your devices.

  8. One of the best blogs about LACP I have ever read. Good work Foeh !!

  9. Good info. How to ensure LACP working is still a myth

  10. Hi,
    Awesome post i have one doubt though
    if 1 physical link/port goes down from the bundle how LACP detects it and how admin will detect it , and what triggers the standby link to become active
    Thanks

    1. Sorry, I've not checked my blog for a very long time! If a bundled port goes physically down it is immediately taken out of the bundle and will generate the same link down traps as a standalone port. The mechanism for deciding when to switch over between active and standby is kind of implementation specific but usually is done by either most available links or a minimum member number threshold. It can be triggered by member links becoming unusable (for example, going down or being signalled out of sync by the far end).

  11. Thanks! This article has been quite useful

  12. Really good info...Have a doubt here can we have a LACP between a switch and a Router ?

    1. Yes, you can use LACP between switches and routers, if the device and software support it. For example, I've seen LACP used between an ASR1k router and a switch but I have also seen older routers which only support "on" mode for bundling or do not support bundling at all.

  13. What do you mean by chassis MAC address in LACP system ID, Is it one of the physical port's(which is the part of LAG group) MAC address?

    1. Usually a switch will have a "base" MAC address to identify itself by, this is used for example to create the bridge ID used in spanning tree. It is sometimes the MAC of the first interface on the device or it can be completely separate.

    2. Thanks Foeh for the reply.

      In a case where we use MAC of the first interface on the device as system Id, when does the system Id gets updated if the first interface goes down for some reason. does this system id update cause lag group down due to re-negotiation?

      for eg:
      - it is temporarily went down and came back after some time (do we need to update system Id in this case?)
      - Moving that interface from one lag to other lag group in the same chassis (definitely we need to update the system id as this interface can be used as system id in other lag group as well)

  14. Great information..thank you so much.Keep up the great work.

  16. Configured a LAG with 1 GE interface each between Huawei ATN 950b and the ALU 7750, the LAG is then linked to with local epipe service to 10 GE interface wich terminates to another ATN 950B. The egress traffic on the 7750 LAG interface is only transmitting on one of the link.That is, the traffic does not load balance on the two members.

    1. Are both members showing sync / collecting / distributing? I assume you are not using sub-groups on either side? Also, I assume this is not a multi-chassis LAG on the 7750 side?

      It's been a long time since I worked on 7750 but I believe it should spray even e-pipe traffic coming in on a single link.
