Friday, 1 March 2013

A Better Way to Compare 7750 Configs

One of the tasks I regularly have to perform as part of my job is to audit router configurations against basebuild templates and occasionally against each other. This can be quite labour-intensive, particularly as good, old-fashioned diff doesn't cope very well with hierarchical configs such as the 7750's. There are many things that make life difficult with traditional diff:
  1. Diff is more-or-less reliant on order being preserved between the files being compared. Some config elements are split across the config file when it is saved. Others may be stored in different locations depending on the software release. Certain items such as layer 3 interfaces are stored in the order that they were added, so two configs may contain the same interfaces but in a different order and traditional diff can't generally work that out.
  2. Traditional diff does not appreciate the significance of policy names, sequence numbers or service IDs in determining what should be compared to what. Ad-hoc insertions and deletions of policy entries or services often cause diff to get completely out of step, making it compare apples to oranges.
  3. Traditional diff does not provide context for differences. Quite often you just get a load of "shutdown" / "no shutdown" pairs. It is possible to include a fixed number of lines of context before or after each difference, but even that does not always show the configuration context where the difference actually occurred, and it brings a load of junk with it. The only sure-fire way is to turn on side-by-side diff, which shows both configurations in full with changes marked in a centre column.
Point 2 is the real killer, making traditional diff basically useless for audits where a handful of known policies must be validated within a config containing many other policies. Below is a (fairly common) worst case when working with diff - one policy has been removed and another two added:



Traditional diff does a horrible job of this. Even with the two configs side by side it is confusing to look at, and it is not immediately apparent what has changed.

I poked and played with a number of "diff" type tools to try and find something that would handle this kind of thing more gracefully but I eventually came to the conclusion that nothing currently existed. I had a rough idea of what I wanted:
  •  It must only compare the contents of like policies, i.e. policy "A" should only be compared to policy "A" and never to policy "B".
  • It must compare configuration elements that appear in a different order in one config to the other.
  • It should, ideally, report the full context of each difference within the hierarchy.
To address these points I decided to write a tool from scratch and called it, unimaginatively, 7750diff. Here's how 7750diff reports the same changes:

D:\7750diff>7750diff a.cfg b.cfg
Unique to a.cfg:
configure
    qos
        scheduler-policy "20000kbps" create
            description "20000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 20000 cir 20000
                exit
            exit
        exit
    exit
exit
Unique to b.cfg:
configure
    qos
        scheduler-policy "25000kbps" create
            description "25000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 25000 cir 25000
                exit
            exit
        exit
        scheduler-policy "40000kbps" create
            description "40000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 40000 cir 40000
                exit
            exit
        exit
    exit
exit

D:\7750diff>


That's not only much clearer (IMHO), but I can take the output and copy / paste it directly into the node to build the missing configuration. Happy days!

How it Works

The basic methodology used by 7750diff is to:
  • Read each config into a hierarchical tree structure based on indent levels
  • Recursively compare the trees starting at the root:
    • For each branch of config A, search for an identical branch on config B within the same context.
    • If a match is found, check for subordinate "child" configuration elements.
    • If any children exist, recursively process them.
    • If no children remain, remove the matching elements.
Once all elements have been compared only the elements unique to each config will remain, along with the parent elements required to reach the configuration context of the change. The trees can then be output as a list of differences between the two files. In many cases, thanks to the inclusion of context, big chunks of 7750diff output can be directly entered into one node to bring its config into line with the other.
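
For the curious, the core of that approach fits in a few lines of Python. This is just an illustrative sketch of the same idea, not the actual C implementation - it ignores duplicate sibling lines and assumes spaces-only indentation:

def build_tree(lines):
    # Parse indent-denoted config lines into a nested dict keyed on line text.
    root = {}
    stack = [(-1, root)]                      # (indent, node) pairs
    for raw in lines:
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        while stack[-1][0] >= indent:
            stack.pop()                       # climb back out to the right parent
        node = {}
        stack[-1][1][raw.strip()] = node
        stack.append((indent, node))
    return root

def prune_common(a, b):
    # Recursively delete branches present in both trees; whatever is left is the diff.
    for key in list(a):
        if key in b:
            prune_common(a[key], b[key])
            if not a[key] and not b[key]:     # nothing unique below - drop both
                del a[key]
                del b[key]

def dump(tree, depth=0):
    for key, children in tree.items():
        print("    " * depth + key)
        dump(children, depth + 1)

a, b = build_tree(open("a.cfg")), build_tree(open("b.cfg"))
prune_common(a, b)
print("Unique to a.cfg:"); dump(a)
print("Unique to b.cfg:"); dump(b)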

Since the whole thing runs on indents it is not 7750 specific. It "may" work on ISAM configs, it may work on any other sort of config that uses indent / whitespace to denote hierarchy - I just haven't tested it. Try your luck :)

Obtaining the Tool

As usual, the tool is available at my github: https://github.com/theclam/7750diff - both the C source code and a Windows binary are available to download.

The code is pretty horrible as this was my first bit of C coding in over a decade. I may decide to clean the code up at some point but it is quite stable now, meaning I haven't found a config that upsets it for the last couple of versions. I've been running a nightly cron job for quite some time now which pulls down an entire lab's configs and 7750diffs each one against the previous day's and it works great.

If you download 7750diff, please let me know how you get along with it.

Friday, 1 February 2013

A Tool for Measuring Forwarding Delay in Packet Captures

I have access to some pretty expensive test kit in work. One of its main purposes in life is to measure the latency of traffic streams passing through a network, which is a pretty useful feature. Occasionally, though, the figures produced can be hard to believe and it would be nice to be able to validate them independently. It would also sometimes be nice to be able to see down to the packet level how the delay varies over time.

Since the tester inserts a "unique" signature into each frame, it is possible to do the calculations by hand - simply take a packet capture of traffic entering and leaving the device (at minimum capture using 2 ports on the same box, preferably use one port with a two-source port mirror), then manually compare the timestamps of the packets pre- and post-routing.

Finding matching pairs of packets is pretty tedious, especially for large captures and particularly where high throughput rates mean that there may be thousands of other frames between a "before" and "after". The technique is sound, though, if you're patient.

For some work I was doing recently, I needed to do this on a grand scale. A multi-megabit stream did not appear to be queuing as expected and it was unclear why. That particular problem was eventually traced, using Wireshark IO graphs, back to overly bursty traffic being offered into the device under test, but it made me think that it would be very nice to have a tool for doing this kind of verification and that, actually, it would not be difficult to write one. So I wrote a tool, in case I needed it in a hurry later on. The impatient may just want to scroll down to "Obtaining the Tool".

How it Works

Packet headers are inevitably changed by routing (and, in fact, encapsulation could be added or removed in the process) so packet payloads must be compared in order to find "before and after" pairs. The Spirent TestCenter tester includes a 20 byte "signature" in each generated packet, which is always at the very end of the payload. In practice it is not necessary to compare the entire, variable length, payload to pair up packets. Rather it is sufficient, and much faster, to compare the last 20 bytes of each packet for a matching signature.

The process implemented in the tool is to read each packet from the pcap file, storing the following details in a list entry:
  • Frame number
  • Arrival time
  • Signature
Each entry is then stored in a linked list, as in the following diagram:


Each packet read in adds another node of around 36 - 44 bytes in size. This is smaller than the original capture but can still be a considerable amount of memory when working with very large captures.

Once the complete list has been built, the next job is to identify the "before and after" pairs. This is done by considering each list entry in turn, then looking forward in the list for an entry with a matching signature. If such an entry is found, the frame numbers and timestamps of each frame are output along with the time delta between the two frames. Pretty simple, really.
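
For anyone who wants to reproduce the matching logic, here is a rough Python / scapy sketch of the same idea (not the actual C tool - it holds the whole capture in memory and does a naive O(n²) forward search):

from scapy.all import rdpcap

pkts = rdpcap("input.cap")
entries = [(i + 1, float(p.time), bytes(p)[-20:]) for i, p in enumerate(pkts)]

print("Arrival Frame Number, Arrival Time, Departure Frame Number, "
      "Departure Time, Forwarding Delay")
matched = set()
for n, (frame, ts, sig) in enumerate(entries):
    if frame in matched:
        continue                      # already consumed as someone's "after" frame
    for frame2, ts2, sig2 in entries[n + 1:]:
        if sig2 == sig:               # same 20 byte signature = same test packet
            matched.add(frame2)
            print("%d, %.6f, %d, %.6f, %.6f" % (frame, ts, frame2, ts2, ts2 - ts))
            break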

Obtaining the Tool

The tool is available to download as C source code and as a Windows binary at https://github.com/theclam/fwding.

To build from source code simply extract the source, change into the directory and type "make".

Using the Tool

Once the binary has been compiled or downloaded, simply run it with the name of the pcap file as its only parameter. For example:

lab@lab:~/Projects/fwding$ ./fwding input.cap
Arrival Frame Number, Arrival Time, Departure Frame Number, Departure Time, Forwarding Delay
1, 1359461693.826304, 2, 1359461693.826354, 0.000050
5, 1359461693.826418, 6, 1359461693.826468, 0.000050
7, 1359461693.826585, 8, 1359461693.826701, 0.000116
9, 1359461693.826818, 11, 1359461693.826946, 0.000128
10, 1359461693.826830, 12, 1359461693.826958, 0.000128
17, 1359461693.826999, 18, 1359461693.827014, 0.000015
21, 1359461693.827078, 22, 1359461693.827128, 0.000050
25, 1359461693.827192, 26, 1359461693.827242, 0.000050
27, 1359461693.827359, 29, 1359461693.827488, 0.000129
[...snip...]

The output produced is standard CSV-formatted text, which can be piped or redirected to a file as necessary for manipulation by your favourite spreadsheet or command line tool. Timestamps are in seconds since Unix epoch. Delay is reported in seconds.

Note: The pairing-up mechanism is highly dependent on the test traffic containing unique data in the last 20 bytes of each frame. For tester traffic that's taken care of automatically but your mileage with "real" traffic will vary. I would expect that FTPing a compressed file or playing music / white noise over VoIP should give relatively good entropy to your data if a tester is not available. For best results filter out non-test traffic beforehand - OSPF hellos and LACPDUs are very repetitive so will generate lots of false hits.

For example, if you want a quick-and-dirty graph of latency over arrival time using gnuplot, just pipe the output to file then use a command such as:

gnuplot> plot "all-ways.txt" using 2:5 with points pt 2


Alternatively, graph the latency by frame number:

gnuplot> plot "all-ways.txt" using 1:5 with points pt 2

Hopefully your output won't look like this - it is an intentionally odd example caused by sending very bursty traffic.

Finally

I always ask but it's never happened yet... if you try the tool out, please leave a comment. I'm interested in feedback, good or bad, and if it doesn't quite do what you want I may change it!

Tuesday, 22 January 2013

RADIUS and L2TP Support Added to "dechap"

This is just a very short message to say that I have enhanced the "dechap" tool mentioned in my previous post. In addition to the original PPPoE support it can now extract and attack CHAP authentications sniffed from RADIUS and L2TPv2 protocols.

The syntax remains exactly the same and it should "just work". The code is available to download at https://www.github.com/theclam/dechap. Please post a comment if you have any feedback or suggestions.


Monday, 21 January 2013

Recovering CHAP Passwords from Sniffed PPPoE Sessions

In a previous blog post I outlined the theory behind setting up a PPPoE session including PPPoE discovery, LCP, NCPs and, more relevant to this post, the basics of CHAP authentication. At the time I was writing the post I wondered how easy it would be to work back from the CHAP messages on the wire to the original credentials, so I decided to find out.

Recap of CHAP Theory


As a reminder (or a very quick introduction), the CHAP process works something like this:

CHAP Authentication
  1. The party requiring the opposite peer to authenticate (i.e. "server") sends a CHAP challenge message containing a challenge ID and some unpredictable "random" data.
  2. The party being authenticated (i.e. "client") concatenates the authentication ID, the password and the challenge data into a single unit, then generates an MD5 hash of that. The resulting hash, plus the client name (user ID or hostname) is passed to the server as a CHAP response.
  3. The server compares the incoming hash to the value it obtains by performing the same calculation locally and returns a CHAP success or CHAP failure message.
Now, clearly, if the CHAP challenge and response messages can be captured then an offline brute force attack can be mounted against the password. This can be achieved by simply extracting the authentication ID, challenge data and response from the relevant messages and then trying candidate passwords until one (hopefully) generates a hash identical to that seen in the response message.
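
In code the whole attack boils down to very little. Here is a minimal Python sketch of the principle (not the dechap source itself), using the RFC 1994 definition of the response as MD5(ID || secret || challenge); the variables in the example call are placeholders:

import hashlib

def chap_response(auth_id, secret, challenge):
    # RFC 1994: response = MD5 over the one byte ID, the secret and the challenge.
    return hashlib.md5(bytes([auth_id]) + secret + challenge).digest()

def crack(auth_id, challenge, response, wordlist):
    # Try candidate passwords until one reproduces the captured response.
    for word in wordlist:
        candidate = word.strip().encode()
        if chap_response(auth_id, candidate, challenge) == response:
            return candidate
    return None                          # word list does not contain the password

# e.g. crack(0x01, sniffed_challenge, sniffed_response, open("words.txt"))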

The Attack in Practice

While the process is intuitively simple, as usual there are a few corner cases to cover. Recovering CHAP authentications from a capture file full of other junk requires a certain amount of processing logic, then responses must be reunited with their corresponding challenges before they can be attacked.

Gathering CHAP Packets


I wanted the tool to be flexible with regards to encap. Since I work primarily on carrier networks, I get really frustrated by tools that do a job perfectly but only accept untagged, unencapsulated frames. Once you have a packet capture in your hand, realising that it can't be used because it has two VLAN tags and a pair of MPLS labels is a nuisance.

The approach that seemed most sensible was to build a recursive decap function which would take in a (partial) frame plus a "hint" as to what type of header to expect. The function would then check for and record any matching criteria present (i.e. MACs for Ethernet, VLAN ID for 802.1Q - more on this in the next section) before either returning or calling itself on the remainder of the packet with a "hint" derived from the current header.
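
A stripped-down Python version of that recursion might look like the sketch below. It follows the standard header layouts rather than the actual dechap source, and omits MPLS handling and the CHAP payload parsing for brevity:

import struct

NEXT = {0x8100: "vlan", 0x8864: "pppoes"}          # EtherType -> next header hint

def decap(data, hint, rec):
    # Strip one header per call, recording the matching criteria described above.
    if hint == "ethernet":
        rec["dst_mac"], rec["src_mac"] = data[0:6], data[6:12]
        etype = struct.unpack("!H", data[12:14])[0]
        return decap(data[14:], NEXT.get(etype, "unknown"), rec)
    if hint == "vlan":
        vid = struct.unpack("!H", data[0:2])[0] & 0x0FFF
        if "cvlan" in rec:                          # second tag seen: outer one is the S-VLAN
            rec["svlan"], rec["cvlan"] = rec["cvlan"], vid
        else:
            rec["cvlan"] = vid
        etype = struct.unpack("!H", data[2:4])[0]
        return decap(data[4:], NEXT.get(etype, "unknown"), rec)
    if hint == "pppoes":
        rec["sid"] = struct.unpack("!H", data[2:4])[0]   # PPPoE session ID
        return decap(data[6:], "ppp", rec)
    if hint == "ppp":
        if struct.unpack("!H", data[0:2])[0] == 0xC223:  # protocol number for CHAP
            rec["chap"] = data[2:]                       # CHAP message, parsed elsewhere
    return rec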

Worked Example

Let's process the following frame as an example. Data in black are used by the algorithm while data in grey are not.

[Ethernet][VLAN][VLAN][PPPoES][PPP][CHAP]
The initial call to the function passes the entire frame with an "Ethernet" hint. In the Ethernet header, the source and destination MAC addresses are read and stored. The EtherType field contains 0x8100, indicating an 802.1Q VLAN header is next. The function calls itself against the contents of the frame from byte 15 onwards with a hint of "VLAN".

[VLAN][VLAN][PPPoES][PPP][CHAP]

Now the function reads and stores the VLAN ID. Since this is the first VLAN we have seen it is stored as the C-VLAN for now. The EtherType is, again, 0x8100 so the function calls itself against bytes 5 and onward using a hint of "VLAN".

[VLAN][PPPoES][PPP][CHAP]


Again, the function reads and stores the VLAN ID. Since this is not the first VLAN tag found, the previously known VLAN ID is moved into the S-VLAN field and the value from the frame is stored in the C-VLAN field. This time the EtherType is 0x8864, indicating a PPPoE session header follows. The function calls itself against bytes 5 and onwards using a hint of "PPPoE".

[PPPoES][PPP][CHAP]


The function now reads and stores the PPPoE session ID (SID). The only valid thing to follow a PPPoE session header is a PPP header, so the function calls itself on bytes 7 and onward, using a hint of "PPP".

[PPP][CHAP]

The function now simply checks that the protocol ID in the PPP header is 0xC223 for CHAP. If so, it calls itself one last time against bytes 3 and onward using a hint of CHAP.

[CHAP]


Finally we are down to the payload. The CHAP message type is checked and:
  • For challenges, the authentication ID, challenge length and challenge data are stored.
  • For responses, the authentication ID, response and client name are stored.
Each instance of the function can then return to its parent, eventually resulting in a fully populated record of all the data relevant to authentication. The completed records can then be stored in a doubly linked list for later consumption.

Pairing Up

A CHAP response must be paired up with its respective CHAP challenge, otherwise the maths don't work. In real life there may be several authentications in progress at one time across multiple PPPoE sessions, possibly over multiple different VLANs. Often the CHAP authentication ID is only unique within a PPPoE session. Similarly, the PPPoE session ID only needs to be unique within a broadcast domain so these are often re-used across VLANs. Care must be taken to ensure that the challenge and response really do belong together.

In order to be considered a challenge / response pair, I decided the following criteria must match:
  • Server and Client MACs
  • S & C VLAN IDs (if present)
  • PPPoE SID
  • CHAP authentication ID
I considered including MPLS labels in this but I struggled to think of a realistic scenario in which two authentications would match the above criteria but use a different label.

Additionally, the thought occurred that even with the above details matching, there may be more than one challenge / response pair for the same PPPoE session so a response would have to be paired with the most recent challenge for which the criteria matched. In the program this is achieved by working backwards through the linked list, starting at the response, until a match is found. Data from matching challenge / response pairs are stored in another list for later consumption. If the search reaches the beginning without a matching challenge being found then the response cannot be used and is ignored.
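
Expressed in Python over an ordered list of parsed records (the field names here are purely illustrative, not the actual dechap structures), the pairing step is simply a reverse scan:

def find_challenge(records, resp_index):
    # Walk backwards from the response looking for the most recent matching challenge.
    resp = records[resp_index]
    for rec in reversed(records[:resp_index]):
        if (rec["kind"] == "challenge"
                and rec["src_mac"] == resp["dst_mac"]    # challenge travels server -> client
                and rec["dst_mac"] == resp["src_mac"]
                and rec.get("svlan") == resp.get("svlan")
                and rec.get("cvlan") == resp.get("cvlan")
                and rec["sid"] == resp["sid"]
                and rec["auth_id"] == resp["auth_id"]):
            return rec
    return None                                          # no match - ignore this response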

Brute Force Password Guessing


For each challenge / response pair in the list, the next step is to cycle through a list of password guesses. Each candidate password is combined with the authentication ID and challenge data from the captured authentication and hashed. The resulting hash is compared to the one from the captured response and, for those that match, a correct guess is reported. If no password generated a matching hash then the word list does not contain the correct password and this is also reported back.

Downloading the Tool

The C source code may be downloaded from: https://github.com/theclam/dechap

Provided the OpenSSL dev libraries are installed it should be possible to simply extract the source code, cd into the directory then run "make".

In the future I may add the capability to pull the auths from L2TP or RADIUS interactions but for now only PPPoE is supported. It also assumes that Ethernet control words are not present in MPLS encapsulated traffic.

Using the Tool


The usage is pretty straightforward - there are only two parameters and both are mandatory. Specify your capture file (original pcap format) with the -c flag and your word list with the -w flag. Here's an example:

lab@lab:~/dechap$ ./dechap -w mywords.txt -c someauths.cap
Found password "tangerine" for user user1@testisp.com.
Unable to find a password for user user2@testisp.com.
Found password "password1" for user user3@testisp.com.
Found password "Africa" for user user4@testisp.com.
Found password "Frankenstein" for user user5@testisp.com.
lab@lab:~/dechap$

Considering that I've made no effort at all to make the code efficient, I've found the speed pretty good. On my '90s PC, a worst-case run (i.e. where no passwords are found) against 800 auths with 100k candidate passwords still completes inside a minute. I don't think that's bad for parsing 15,000 packets and running 80 million concatenate - hash - compare sequences.

If you try this out, please leave a comment on this post with your experiences - good or bad.

Friday, 11 January 2013

A Script to Bring Up a PPPoE Session using Python & Scapy

As I mentioned in my previous post, I have put together a script which can bring up a PPPoE session, authenticate using CHAP, negotiate an IP address and send / receive traffic. The script is written in Python and requires a relatively up to date version of scapy (I use v2.2.0-dev, just grab the latest from http://www.secdev.org/projects/scapy/).

I warn you now that I am not a professional coder (or even a particularly keen amateur) and I don't really get on with Python... so don't be surprised if it looks a bit C-like!

To run the script, simply download PPPoESession.py from https://github.com/theclam/PPPoESession-Python and call it from within Python:

root@labpc:~# python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile("PPPoESession.py")
__main__:2: DeprecationWarning: the md5 module is deprecated; use hashlib instead
WARNING: No route found for IPv6 destination :: (no default route?)
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:10: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  import os, sys, math, socket, struct, sha, hmac, string, time
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:11: DeprecationWarning: The popen2 module is deprecated.  Use the subprocess module.
  import random, popen2, tempfile
>>>


You can expect to see a few deprecation warnings, depending on which version of Python is in use.

The script defines the PPPoESession class, plus a few other miscellaneous functions for encapsulating and extracting parameters. The PPPoESession class inherits from the scapy Automata class, so all the useful features of that class such as graph() and easy debugging are available. See the scapy Automata wiki entry (http://trac.secdev.org/scapy/wiki/Automata) for more details.

In order to bring up a PPPoE session, a PPPoESession object needs to be instantiated and a few parameters need to be set. At minimum the Ethernet interface, username and password need to be configured:

>>> p = PPPoESession()
>>> p.iface="eth1"
>>> p.username="spongebob@bodges"
>>> p.password="password"


Once that is done, the automaton can be started using the runbg() method. The state machine then runs in the background, returning control to the user. Messages will appear as it goes through the motions of bringing up the PPPoE session, then the PPP session, then authenticating before finally completing IPCP:

>>> p = PPPoESession()
>>> p.username="spongebob@bodges"
>>> p.password="password"
>>> p.iface="eth1"
>>> p.runbg()
>>> Starting PPPoED
Starting LCP
Got CHAP Challenge, Authenticating
Authenticated OK
Starting IPCP
Peer provided our IP as 123.4.5.6
IPCP is OPEN

>>>

Once IP is negotiated, the automaton will stay in the IPCP_OPEN state, able to send and receive IP packets and automatically responding to any LCP echoes that arrive.

From that state, the following methods may be called:

recv_queuelen() - returns the number of packets waiting in the receive buffer
recv_packet() - returns and de-queues the first packet in the receive buffer
send_packet(IPPacket) - transmits the given IP packet over the PPPoE session
ip() - returns the IP address given to the client
gw() - returns the peer's IP address

Here's an example of passing some traffic on an open session by pinging the gateway:

>>> p.recv_queuelen()
0
>>> p.send_packet(IP(src=p.ip(), dst=p.gw())/ICMP())
>>> p.recv_queuelen()
1
>>> p.recv_packet()
<IP  version=4L ihl=5L tos=0x0 len=28 id=1 flags= frag=0L ttl=64 proto=icmp chksum=0xbd0f src=1.1.1.1 dst=123.4.5.6 options=[] |<ICMP  type=echo-reply code=0 chksum=0xffff id=0x0 seq=0x0 |<Padding  load='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' |>>>
>>>

The script is still very much a work in progress. There is, for example, no clean way to gracefully shut down the PPP session at the moment and it doesn't handle incoming Terminate-Requests, either. I am hoping to add that, and more, soon.

Have a play with it and let me know what you think, good or bad :)

Thursday, 10 January 2013

Bringing Up a PPPoE Session - The Theory

In a previous post, I shared a Scapy script that implements the PPPoE discovery stage and stops once the session stage is reached. As handy as that script is for testing AC Cookie validation, it is not particularly useful for anything else. It would be much better if the script could bring the PPP session all the way up.

Luckily, the PPPoE discovery script is a cut-down version of another script that I wrote a long time back which goes all the way from PPPoED, through LCP and CHAP authentication and stops at IPCP. At the time, the script was far too messy to share but I've tidied it up and it is now in a state that it could be useable by others. I've also added IPCP negotiation and a couple of methods for sending and receiving IP traffic over the resulting session.

Before I present the script, I'll cover the theory involved, step by step. The impatient may want to just go to the next post (when it is available) for the script itself and instructions on how to run it.

PPPoE Discovery

PPP is (a) point-to-point protocol, designed to run over a dedicated link between two devices. Ethernet is a multi-access network, so if we want to run PPP over Ethernet then we need a mechanism to discover peers and establish a point-to-point relationship between two devices over the shared medium.

PPPoE provides this service and operates in two distinct stages:
  1. Discovery: The discovery stage is responsible for locating PPPoE peers and negotiating session parameters so that, ultimately, a PPPoE session can be created.
  2. Session: Once the discovery stage is complete the protocol enters the session stage, at which time the two peers have a tunneled connection between them over which to start passing PPP.
Once the session stage is reached, the peers bring up and operate their PPP session exactly as they would over a dedicated link.

The diagram below summarises the PPPoE "Discovery" stage:

PPPoE State Transitions
The first step in the journey is to find a PPPoE access concentrator which is willing to terminate our session. To do this, we must broadcast a PPPoE Active Discovery Initiation (PADI) message. It is possible to specify a service name in the PADI - this is just a string that identifies a particular type of service in which the client is interested. The access concentrator may use this to decide whether or not to offer to terminate the session, though in most cases it is just ignored. For this reason clients generally use an empty service name.
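
For reference, building and broadcasting a PADI with scapy only takes a couple of lines. The empty Service-Name tag is supplied as raw bytes here because tag support varies between scapy versions, and "eth1" is just an example interface:

from scapy.all import Ether, PPPoED, Raw, sendp

# PADI: broadcast, EtherType 0x8863 (PPPoE discovery), code 0x09,
# carrying an empty Service-Name tag (type 0x0101, length 0x0000).
padi = (Ether(dst="ff:ff:ff:ff:ff:ff", type=0x8863) /
        PPPoED(code=0x09) /
        Raw(b"\x01\x01\x00\x00"))
sendp(padi, iface="eth1")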

Any access concentrators listening on the segment will receive the PADI message, inspect its contents and then make a decision whether or not to make an offer to terminate the client's session. If the access concentrator is willing to terminate the session, it signals this to the client by sending a unicast offer (PADO) message. Typically, the PADO has an AC-Cookie attached to it - essentially the AC-Cookie is an "unpredictable" string, derived from the client's MAC address, which the access concentrator uses to mitigate against certain kinds of resource exhaustion attacks. When AC-Cookies are used, a PADO is generated 'mechanically' from the incoming PADI and no state is created on the access concentrator at this point.

When the client has received at least one PADO, it must select a favourite. It is common to just use the first offer received, but other selection criteria may be used. The client then sends a unicast request (PADR) to the chosen access concentrator, indicating that it would like to accept its offer. If an AC-Cookie was contained in the PADO message then it is echoed back in the PADR. The requirement to echo the cookie back to the access concentrator is designed to validate that the client really exists and is available on the MAC address where the PADO was sent.

Finally, it is up to the access concentrator to confirm that the session has been created. If AC-Cookies are in use then the incoming PADR is examined to check whether the AC would have generated the provided cookie given the source MAC - in the case of a mismatch the PADR is silently dropped, otherwise the session state is created in the AC and a session confirmation (PADS) message is unicast to the client to confirm that the session has been created and the "Session" stage has begun. The PADS always contains a PPPoE session ID, which is used to differentiate between multiple PPPoE sessions on the same LAN and must be present in the header of every PPPoE frame exchanged with the AC during the "Session" stage.

The fifth type of PPPoE discovery message is the terminate (PADT) which, as its name suggests, is used to terminate (i.e. end) a session which has been established. Either end may send a PADT message to close the session and once a PADT has been received, no further traffic may be sent for that session.

PPP

PPP itself consists of a number of sub-protocols. There are:
  • Link Control Protocol (LCP) which is responsible for negotiating overall link parameters
  • PAP and CHAP which are used for authentication
  • A family of Network Control Protocols (NCPs) used to negotiate the transport of each upper layer protocol
PPP also defines that once a higher layer protocol has been negotiated by its corresponding NCP, that protocol's traffic will be encapsulated with a PPP header carrying that particular protocol's protocol number.

Link Control Protocol (LCP)

RFC 1661 defines LCP as the protocol that is responsible for "establishing, configuring, and testing the data-link connection." Essentially this means that LCP is used to bring up and take down PPP links, negotiate the configuration parameters and check that the link is still alive. There are a range of LCP codes which are used to fulfil these aims, discussed below.

Configuration Type Codes

In order to bring up a PPP session both peers must agree on certain parameters, for example the maximum size of frame that may be passed, whether to use compression and so on. Both peers propose the settings they would like to use - the opposite peer will then either acknowledge (accept), nak (i.e. suggest alternative) or reject (outright refuse) the proposed options. The aim is to reach a state where the opposite peer has acknowledged the locally proposed parameters.

The following LCP codes are standard and must be implemented:

Configure-Request - Used to propose a set of parameters that we would like to use for the session. The peer will then respond to the proposed parameters with one of the next three responses.

Configure-Ack - Used to advise the peer that their proposed parameters are acceptable. The accepted parameters are echoed back in the ack message.

Configure-Nak - Used to advise the peer that their proposed parameters are not acceptable and that the alternative values should be used. The proposed changes are attached to the nak message.

Configure-Reject - Used to advise the peer that their proposed parameters are not supported and cannot be used. The unacceptable parameters are echoed back in the reject message.

Termination Type Codes

Either peer may request to terminate the session at any point and the opposite peer must honour that request. There are two termination related codes in LCP:

Terminate-Request - Generated by a peer to initiate the tear-down of the link. A Terminate-Request should be re-sent if no Terminate-Ack is received in response.

Terminate-Ack - Generated to confirm receipt of a Terminate-Request. A Terminate-Ack must be generated in response to a Terminate-Request.

Liveness Check Codes

LCP includes a ping-like echo mechanism to verify that the opposite peer is still available, has LCP in an open state and is responding. The same mechanism is used to detect a looped interface - due to the symmetric nature of PPP it is quite possible to negotiate a connection to yourself without necessarily realising it, or for a connection to become looped mid-session. The following codes are used for liveness checks:

Echo-Request - Sent to the remote peer to solicit an Echo-Reply message. There is no requirement to negotiate the use of LCP echoes and an Echo-Request may be generated at any time while LCP is open. If the Magic-Number option was negotiated during LCP, the Echo-Request must contain the "random" 4 octet magic number decided at that time.

Echo-Reply - Sent in response to an Echo-Request message. When LCP is open, an Echo-Reply message must be sent whenever an Echo-Request is received. The magic number contained within the incoming Echo-Request must be copied into the outgoing Echo-Reply. If the incoming packet has our magic number then the connection has become looped.

Other Codes

There are other codes such as Code-Reject, Protocol-Reject and Discard-Request which do pretty much what you would expect. You don't get to see them very often so I will not discuss them here. I suggest referring to RFC 1661 for more detail on these.

LCP State Diagram

Below is a simplified state diagram showing how LCP makes its way from the "Starting" state into an "Opened" state. Most parts of PPP are referred to as "open" when they are up and running. I have omitted a number of transitions that deal with strange corner cases (like if the peer acks something we never sent, etc) and also transitions related to closing the connection (the Term commands discussed above). RFC 1661 contains a complete state transition table which is far more complex. If you bear in mind that at any stage either peer may terminate the session then this minimal version will cover 95% of "normal" cases.

LCP State Transitions

Authentication

Once LCP is open, the next stage is typically to start authentication. Authentication may be done by either, neither or both the peers as negotiated by LCP and can be done using plaintext PAP or MD5 hashed CHAP. If no authentication was negotiated by LCP, an implicit pass is assumed.

PAP is hardly ever used these days, is strongly discouraged and in any case is pretty simple, so I will not discuss it here. Please refer to RFC 1334 if you require details on PAP.

CHAP, though not immune to attack, offers reasonable security. The password itself is never sent "over the wire" and there is good protection against replay attacks via the use of random challenges. Here is how CHAP operates:

CHAP Authentication

Essentially, security is provided in two ways:
  1. The password is never exchanged in the clear but instead is passed through a one-way cryptographic hash function. It is computationally infeasible to recover the password from the hash function's output, so it is quite safe to pass this output over the wire.
  2. If the client just hashed the password, then it would be possible for an attacker to capture the hashed value and authenticate with the server at a later time by simply replaying the same response. CHAP requires the server to generate a random challenge string, which is also fed into the hash function and affects its output. Provided the server never re-uses a challenge value, an attacker cannot simply replay a previous authentication response to gain access.
When the CHAP response comes in, the server compares the received hash value with the output of a local calculation using the same method to determine whether the authentication attempt was successful. That is only strictly true when the server holds a local copy of the password; typically this is not desirable and in practice the authentication check is deferred to an external RADIUS server. In order for the RADIUS server to validate the attempt, the server must pass it a copy of the ID and challenge sent, plus the response received. The RADIUS server can then use the ID, its own copy of the plaintext password and the challenge value to compute the expected response. If the expected and actual responses match then the RADIUS server will return an "Accept" response, otherwise it will return a "Reject" response.

Network Control Protocols (NCPs)

Before any higher layer protocol can be passed through a PPP tunnel, it must be negotiated by a corresponding NCP. For example before you can pass IP through a PPP tunnel, IPCP must be open, indicating that all the required IP parameters have been successfully negotiated. To pass OSI traffic, OSICP must be open. For IPv6, IPV6CP is used.

The operation of each NCP is different but they all essentially follow the same model as LCP - parameters are proposed by each peer and ack'd, nak'd or rejected by the opposite peer - and the state transition diagram looks pretty much the same.

IPCP

I'll go into a little more detail on IPCP since that is the most commonly used (for now) with a worked example of a DSL subscriber connecting to his ISP, starting immediately after authentication succeeds.

Client Side

The client generally does not know anything when it first connects and relies on the server to provide it with everything it needs. The client sends a Configure-Request proposing an IP address, primary DNS and secondary DNS of 0.0.0.0. Proposing 0.0.0.0 for these is actually an explicit request for the server to provide legitimate values for the client to use.

The server will then respond with a Configure-Nak message containing the IP address and DNS servers that the client should use.

The client will then send another Configure-Request with the newly acquired details, to which the server responds with a Configure-Ack.
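
Those opening IPCP messages are small enough to build by hand. As a sketch, here is the raw payload of the client's first Configure-Request (before the PPP protocol number 0x8021 and the PPPoE headers are prepended), using option types 3, 129 and 131 from RFC 1332 and RFC 1877:

import struct

def ipcp_option(opt_type, value):
    # Option format: Type (1 byte), Length (1 byte, includes these two bytes), Data.
    return struct.pack("!BB", opt_type, 2 + len(value)) + value

zeros = bytes(4)                              # 0.0.0.0 - "please tell me what to use"
options = (ipcp_option(3,   zeros) +          # IP-Address     (RFC 1332)
           ipcp_option(129, zeros) +          # Primary DNS    (RFC 1877)
           ipcp_option(131, zeros))           # Secondary DNS  (RFC 1877)

# IPCP packet: Code 1 (Configure-Request), Identifier, Length (includes the 4 byte header).
conf_req = struct.pack("!BBH", 1, 1, 4 + len(options)) + options
print(conf_req.hex())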

Server Side

The server will typically send out a Configure-Request containing only its own IP address. There is no reason to argue over this so the client should just respond with a Configure-Ack. If the client tries to push a different address to the server using a Configure-Nak, it is typically ignored and after a few retries the session gets pulled down.

Passing Traffic

Once the two peers are agreed and IPCP is open, IP packets may be passed through the PPP tunnel by attaching a header - in most cases, for PPPoE connectivity, the PPP header consists of only a two byte protocol number (0x0021 for IP). The protocol number is analogous to the EtherType field of an Ethernet frame and indicates to the receiver how to interpret the payload. Alternative encapsulations exist - refer to RFC 1662 for more details on HDLC style framing which is often seen in L2TP.
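
With scapy, that framing looks something like the following sketch - the MAC addresses, session ID and IP addresses are placeholders for whatever was negotiated:

from scapy.all import Ether, PPPoE, PPP, IP, ICMP, sendp

# Session-stage frame: EtherType 0x8864, the negotiated PPPoE session ID,
# then a PPP header whose protocol number 0x0021 says "the payload is IPv4".
frame = (Ether(src="00:11:22:33:44:55", dst="66:77:88:99:aa:bb", type=0x8864) /
         PPPoE(sessionid=0x1234) /
         PPP(proto=0x0021) /
         IP(src="123.4.5.6", dst="1.1.1.1") / ICMP())
sendp(frame, iface="eth1")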

Further Reading

That about covers the protocols involved in bringing up a PPPoE session at a high level. If you require more information I would suggest turning to the following RFCs:

RFC 2516 - PPPoE - http://tools.ietf.org/html/rfc2516
RFC 1661 - PPP - http://tools.ietf.org/html/rfc1661
RFC 1994 - CHAP - http://tools.ietf.org/html/rfc1994
RFC 1332 - IPCP - http://tools.ietf.org/html/rfc1332
RFC 1877 - IPCP extensions for DNS - http://tools.ietf.org/html/rfc1877

Friday, 21 December 2012

All sorts of things about LACP and LAGs

A lot of people consider link aggregation groups (LAG / etherchannel / portchannel / MLT) to be pretty basic functionality that "just works" and don't really think any more about it. As with many networking technologies, there is a lot of intelligence responsible for creating the smooth veneer of simplicity.

The basic concept of the LAG is that multiple physical ports are combined into one logical bundle. This provides benefits including:
  • Increased capacity - traffic may be balanced across the member ports to provide increased aggregate throughput
  • Link redundancy - the LAG bundle can survive the loss of one or more member links
LAGs may be statically configured or signalled using standards based LACP, which is the main focus of this post. There is also the Port Aggregation Protocol (PAgP), which is similar in many regards to LACP, but is Cisco proprietary and not in common usage. I won't discuss PAgP in this post.

Load Balancing Operation

One important point to bear in mind with LAGs is that traffic is not dynamically assigned across member links but rather is "sprayed" using a deterministic hash algorithm. Depending on the platform and configuration, a number of parameters may feed into the algorithm including:
  • Source and/or destination MAC address
  • Source and/or destination IP address
  • Source and/or destination TCP / UDP port numbers
  • Ingress interface
  • Service ID or MPLS label
  • System specific information (chassis MAC or system IP)
Ultimately the hash will take in some combination of parameters and decide onto which member link the frame should be placed. Note that, since all the input to the algorithm is either permanently static (i.e. chassis MAC) or static for a given flow (i.e. source and destination MAC), all traffic for a particular flow will always be placed onto the same link (a toy illustration follows the list below). This has the following effects:
  • Order is maintained for frames within a flow - the different member links, particularly on a WAN, may have different delay characteristics. If frames for a single flow were sprayed onto multiple member links, frames could be re-ordered in transit.
  • Traffic for a single flow cannot exceed the bandwidth of a single member link.
  • Traffic balance across member links is largely dependent on the diversity of the offered traffic. If the number of flows is low, some links may be saturated while others are under-utilised. The same effect can be seen if there are many flows but load is proportionally concentrated in just a few of them.
  • When traffic passes through multiple hops using LAGs at each stage, polarisation can occur. This is where repeated application of the same hash function at each hop causes traffic to become unevenly distributed across the links. One link may be running at 100% and dropping excess traffic while another is almost idle. Passing system specific information into the algorithm is designed to mitigate this by ensuring that each hop hashes in a slightly different way.
  • Upstream and downstream traffic for a single flow will not necessarily traverse the same link. Since the devices at each end of a LAG hash traffic independently, there is no guarantee that both legs of a conversation will pass along the same member link.
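
As a toy illustration of the principle only (not any vendor's actual algorithm), member selection might look something like this:

import zlib

def pick_member(src_mac, dst_mac, src_ip, dst_ip, chassis_mac, n_links):
    # Deterministically map a flow onto one of n_links member ports.
    key = "|".join([src_mac, dst_mac, src_ip, dst_ip, chassis_mac]).encode()
    return zlib.crc32(key) % n_links          # same inputs -> same link, every time

# The same flow always lands on the same member...
print(pick_member("00:00:5e:00:53:01", "00:00:5e:00:53:02",
                  "192.0.2.1", "198.51.100.1", "aa:bb:cc:00:00:01", 4))
# ...while feeding in the local chassis MAC means the next hop, with a different
# chassis MAC, spreads the same flows differently - which mitigates polarisation.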

Active / Standby Operation

In addition to the "normal" load balancing mode of operation, it is also possible to configure a LAG to operate in an active/standby fashion. In fact, it is possible to combine the two modes and have an arbitrary number of links active and passing traffic while an arbitrary number remain on standby pending a fault on the active link(s).

Active / standby groups are generally used when resilience is required, but it is not desirable for the LAG to pass more than a certain amount of traffic or for the available bandwidth to vary. Typical use cases are service provider environments where the customer only pays for a certain bandwidth and corporate networks with a highly over-subscribed core.

Rules for LAGs

In order to be able to aggregate ports together certain rules must be obeyed. Fundamentally, the member ports must be homogeneous, but more specifically every member port must agree on the following:
  • Speed & Duplex - Since traffic is distributed by a simple hash, it is not possible to combine links of different speeds in the same bundle.
  • Encapsulation - i.e. all ports must use the same number of 802.1Q VLAN tags. For switches this means they must all be access or all be trunk. For routers such as the 7750 this means that the Ethernet encap type (null, dot1q or qinq) must agree between members. For switches in access mode, all member ports must be in the same VLAN.
  • For the 7750, the port type (access, network or hybrid) must agree across members and for the LAG
  • MTU - all member port MTUs must match and, for Cisco switches, the same MTU must be configured on the port channel.
Note: the physical media type, i.e. copper or fibre, does not necessarily need to match between all LAG members.

Static Configuration

The simplest method of building a LAG does not involve any signalling or protocols at all and simply specifies the member ports to be aggregated. Here's an example of doing that on two different platforms:

Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown


Cisco 2950:
2950#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode on
Creating a port-channel interface Port-channel 1

2950(config-if-range)#no shut

In this setup, as soon as a port becomes physically up it becomes a member of the LAG bundle. The only, fairly minor, advantage of this is that the configuration is very simple. The disadvantage is that there is no method to detect any kind of cabling or configuration errors.

Note: The lack of any kind of misconfiguration detection makes static LAGs very dangerous to deploy in production networks.

LACP

LACP is the standards based protocol used to signal LAGs. It detects and protects the network from a variety of misconfiguration and fault conditions, ensuring that links are only aggregated into a bundle if they are consistently configured and cabled.

LACP must be configured in one of two modes:
  • Active mode - the device immediately sends LACP messages (LACPDUs) when the port comes up and must reach an agreement with the attached port before traffic will pass.
  • Passive mode - the device does not generate LACPDUs until it receives them. If no LACPDUs are received then the port aggregates as though statically configured. If LACPDUs are received then an agreement must be reached with the peer before traffic will pass.
In practice it is rare to find passive mode used in any properly designed network as it should be clearly and consistently defined which links will use LACP ahead of deployment.

Minimal LACP configuration

The minimal configuration is still very straightforward, requiring little additional CLI:

Alcatel-Lucent 7750:
A:7750# configure port 2/2/[19..20] ethernet mode access
*A:7750# configure port 2/2/[19..20] ethernet autonegotiate limited
*A:7750# configure port 2/2/[19..20] no shutdown
*A:7750# configure lag 1
*A:7750>config>lag$ mode access
*A:7750>config>lag$ lacp active

*A:7750>config>lag$ port 2/2/19
*A:7750>config>lag$ port 2/2/20
*A:7750>config>lag$ no shutdown


Cisco 2950:
2950#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
2950(config)#int range fa0/19 - 20
2950(config-if-range)#switchport mode access
2950(config-if-range)#channel-group 1 mode active
Creating a port-channel interface Port-channel 1

2950(config-if-range)#no shut
There is, of course, a lot more going on behind the scenes but most parameters assume default values which are perfectly acceptable for most situations.

LACP Terms and Parameters

There are a number of LACP-specific terms and parameter names that must be understood in order to make sense of LACP debug output and packet traces.

The first and arguably most fundamental concept is that of actors and partners. One of the really nice debugging features of LACP is that it echoes the parameters it receives back to the sender. To avoid confusion, the term actor is used to designate the parameters and flags pertaining to the sending node, while the term partner is used to designate the sending node's view of its peer's parameters and flags.

Per System:
Each network device has a LACP System ID. This is a 48 bit value which generally defaults to the chassis MAC address. The system ID is sent within every LACPDU and makes it easy to check that a LAG goes to the device you expect.

Each device also has a 16 bit LACP System Priority. The system priority determines which system's port priorities are used to decide active / standby in the event that the two peers disagree. Lowest priority wins.

Per LAG:
Each LAG on a system will have a unique 16 bit LACP key, the purpose of which is to differentiate one LAG from another within the protocol. This number is locally significant and may or may not match between peers. The main purpose of the LACP key is to allow a system to detect cabling faults - if different LACP keys are received on members of the same LAG then we are connected to two different LAGs at the far end and, obviously, aggregating those together would be a bad idea.

LACP Flags:
The following flags are used to communicate state between systems (a small decode sketch follows the list):
  • Activity - Set to indicate LACP active mode, cleared to indicate passive mode
  • Timeout - Set to indicate the device is requesting a fast (1s) transmit interval of its partner, cleared to indicate that a slow (30s) transmit interval is being requested.
  • Aggregation - Set to indicate that the port is configured for aggregation (typically always set)
  • Synchronisation - Set to indicate that the system is ready and willing to use this link in the bundle to carry traffic. Cleared to indicate the link is not usable or is in standby mode.
  • Collecting - Set to indicate that traffic received on this interface will be processed by the device. Cleared otherwise.
  • Distributing - Set to indicate that the device is using this link to transmit traffic. Cleared otherwise.
  • Expired - Set to indicate that no LACPDUs have been received by the device during the past 3 intervals. Cleared when at least one LACPDU has been received within the past three intervals.
  • Defaulted - When set, indicates that no LACPDUs have been received during the past 6 intervals. Cleared when at least one LACPDU has been received within the past 6 intervals. Once the defaulted flag transitions to set, any stored partner information is flushed. 
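
When all you have is the raw actor / partner state byte (such as the "Port State 0x3F" value in the Cisco output further down), it can be handy to decode it. Here is a small sketch using the standard 802.3ad bit positions, bit 0 (Activity) through bit 7 (Expired):

LACP_STATE_BITS = ["Activity", "Timeout", "Aggregation", "Synchronisation",
                   "Collecting", "Distributing", "Defaulted", "Expired"]

def decode_state(byte):
    # Return the names of the flags set in an LACP actor / partner state byte.
    return [name for bit, name in enumerate(LACP_STATE_BITS) if byte & (1 << bit)]

print(decode_state(0x3F))
# ['Activity', 'Timeout', 'Aggregation', 'Synchronisation', 'Collecting', 'Distributing']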

Bringing Links into Service

Assuming that the local configuration is consistent and LACPDUs are being exchanged across the link, the following flow chart roughly describes how to decide the value of the synchronisation, distributing and collecting flags.



If by the end your collecting / distributing flags are set then the link will be used for sending and receiving traffic. If not, it won't.

LACP Fault Detection

LACP can detect almost every conceivable patching error and will refuse to aggregate when that would be inappropriate. Following are a number of improper LAG topologies along with a description of how LACP detects and protects the network against them.

Split LAG

In the above scenario, LACP inspects the system ID field of incoming LACPDUs and refuses to aggregate any links whose system ID does not match that of the existing member(s).

Crossed LAGs

In the above scenario, LACP detects the cabling fault by inspecting the key ID on the incoming LACPDUs and refuses to aggregate any links whose key does not match that of the existing member(s).

Looped LAG

In the above scenario, LACP detects the cabling fault by inspecting the system ID and key of the incoming LACPDU. Some systems (e.g. Alcatel-Lucent 7750) allow different LAGs to be interconnected on the same chassis, however it is never allowed for two member ports of the same LAG to be connected.

Unidirectional Link Failure


In the scenario above, a unidirectional link failure has occurred so that LACPDUs are being lost in the direction A to B, but the ports remain physically up. LACPDUs that are lost are indicated in grey. In this situation, system B responds to the loss of three consecutive LACPDUs by clearing its synchronisation, collecting and distributing flags and setting its expired flag. System A responds immediately to the loss of sync by clearing its synchronisation, collecting and distributing flags.

LACP Troubleshooting

The most important part of troubleshooting LAGs is to properly understand the meaning and purpose of all the parameters, particularly the flags, before you begin. After that point, it is just a matter of knowing what CLI commands will show you the required information.

I recommend starting with the basics and working up:
  • Are the member ports physically up?
  • Are all member ports configured consistently (see LAG Rules above)?
  • Can you be sure the topology is as we expect?
    • Use LLDP or CDP if available
    • Use system ID, key and port ID values from the LACPDUs otherwise
  • Determine which end is unhappy (hint, it won't be sending sync).
  • Verify that messages are passing bi-directionally and are not being blocked by any kind of filter (hint, check that the partner details are populated on LACPDUs)
After following these checks you should be able to trace 95% of LAG problems. I, personally, prefer to check the flags, etc, using a packet capture. But then I would, because that's my answer to everything. Below are some CLI methods to gather the same information.

Alcatel-Lucent 7750

To get almost all the information you could ever want, use "show lag [number] detail":

A:7750# show lag 1 detail
===============================================================================
LAG Details
===============================================================================
Description        : N/A
-------------------------------------------------------------------------------
Details
-------------------------------------------------------------------------------
Lag-id              : 1                     Mode                 : access
Adm                 : up                    Opr                  : up
Thres. Exceeded Cnt : 2                     Port Threshold       : 0
Thres. Last Cleared : 12/21/2012 10:59:59   Threshold Action     : down
Dynamic Cost        : false                 Encap Type           : null
Configured Address  : 00:0a:aa:2e:af:ea     Lag-IfIndex          : 1342177281
Hardware Address    : 00:0a:aa:2e:af:ea     Adapt Qos (access)   : distribute
Hold-time Down      : 0.0 sec               Port Type            : standard
Per FP Ing Queuing  : disabled
LACP                : enabled               Mode                 : active
LACP Transmit Intvl : fast                  LACP xmit stdby      : enabled
Selection Criteria  : highest-count         Slave-to-partner     : disabled
Number of sub-groups: 1                     Forced               : -
System Id           : 00:0a:aa:2e:af:ea     System Priority      : 40960
Admin Key           : 32777                 Oper Key             : 32777
Prtr System Id      : 00:12:da:ab:fe:21     Prtr System Priority : 32768
Prtr Oper Key       : 1
Standby Signaling   : lacp

-------------------------------------------------------------------------------
Port-id        Adm     Act/Stdby Opr     Primary   Sub-group     Forced  Prio
-------------------------------------------------------------------------------
2/2/19         up      active    up      yes       1             -       32768
2/2/20         up      active    up                1             -       32768

-------------------------------------------------------------------------------
Port-id        Role      Exp   Def   Dist  Col   Syn   Aggr  Timeout  Activity
-------------------------------------------------------------------------------
2/2/19         actor     No    No    Yes   Yes   Yes   Yes   Yes      Yes
2/2/19         partner   No    No    Yes   Yes   Yes   Yes   No       Yes
2/2/20         actor     No    No    Yes   Yes   Yes   Yes   Yes      Yes
2/2/20         partner   No    No    Yes   Yes   Yes   Yes   No       Yes
===============================================================================
A:7750#

In this output you can see the local and remote flags, system IDs, system priorities and keys in use, whether the underlying ports are functioning and, if sub-groups are in use, whether local ports are active or standby. Note also that it shows you which port in the LAG is primary - if you want to edit anything such as MTU, QoS, etc, then you need to do it on the primary port. Your changes will then be pushed to the other ports automatically.

If you need to verify that LACPDUs are being received, you can use "debug lag [lag-id number] [port port-id] pkt". This will produce a debug message for every LACPDU sent or received, optionally filtered by LAG or by individual port:

A:7750# debug lag lag-id 1 pkt
980 2012/12/21 21:23:56.73 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
Xmit LACPDU on PortId 2/2/19"

981 2012/12/21 21:23:56.80 GMT MINOR: DEBUG #2001 Base LAG
"LAG: PKT
LACPDU rcvd on PortId 2/2/19"


A little light on detail, admittedly, but enough to prove whether they are arriving or not.

For more interactive debugging, a better choice might be "debug lag [lag-id number] [port port-id] sm", which shows what is happening to the state machines for a given LAG or port:

A:7750# debug lag lag-id 1 sm
852 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1: partner oper state bits changed on member 2/2/20 : [sync FALSE -> TRUE]
"

853 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :triggerMap 0 -> e after Rx SM"

854 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :running selection logic"

855 2012/12/21 18:55:37.67 GMT MINOR: DEBUG #2001 Base LAG
"LAG: SM
LagId 1 mem. 2/2/20 :MUX SM ATTACHED->COLLECTING_DISTRIBUTING"


The above is quite verbose, since state machine activity is logged every time a LACPDU is sent or received, but it is really the best way to troubleshoot state transitions from the CLI.
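
If you need to trawl back through a longer debug session, the transitions are easy enough to pull out with a quick script. Below is a minimal sketch in Python, assuming the debug output has been saved to a text file (the name "lag-sm.log" is made up) and that the messages follow the format shown above:

import re

# Matches transition lines in the format shown above, e.g.
#   LagId 1 mem. 2/2/20 :MUX SM ATTACHED->COLLECTING_DISTRIBUTING
TRANSITION = re.compile(r'LagId (\d+) mem\. (\S+) :MUX SM (\S+)->(\S+)')

with open('lag-sm.log') as log:          # hypothetical saved debug output
    for line in log:
        match = TRANSITION.search(line)
        if match:
            lag, port, old, new = match.groups()
            print(f'lag {lag} member {port}: {old} -> {new}')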

Cisco 2950

There are a few LACP-related show commands on IOS and the useful information is spread across them. Starting at the simple end, a high-level overview of the LAGs on the system can be obtained using the command "show etherchannel":

2950#show etherchannel
                Channel-group listing:
                ----------------------

Group: 1
----------
Group state = L2
Ports: 2   Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol:   LACP

2950#

To find the local LACP system ID, use "show lacp sys-id":

2950#show lacp sys-id 
32768,0012.da12.abcd

Note that the part before the comma is actually the system priority.
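
For what it's worth, when LACP compares system IDs (for example, to decide which end's port priorities win when choosing active and standby links), the 2-byte system priority forms the most significant part and the 6-byte MAC the least significant, so a lower priority value always beats a higher one regardless of MAC. A small Python sketch of that comparison, using values from the outputs in this post (the helper function is mine, purely for illustration):

def lacp_system_id(priority, mac):
    # The comparable system ID is the 2-byte priority followed by the
    # 6-byte MAC, most significant first; numerically lower wins.
    mac_bytes = bytes.fromhex(mac.replace('.', '').replace(':', ''))
    return priority.to_bytes(2, 'big') + mac_bytes

local = lacp_system_id(32768, '0012.da12.abcd')   # from "show lacp sys-id"
peer = lacp_system_id(40960, '0003.abcd.aaa1')    # from "show lacp neighbor detail"

# The local system wins here: its lower priority outweighs its higher MAC.
print('local wins' if local < peer else 'peer wins')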

Useful information about the remote device (our partner) can be found using "show lacp neighbor":

2950#show lacp neighbor 
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1 neighbors
Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/19    FA      32768     0003.abcd.aaa1   3s    0x8009  0x8894   0x3F
Fa0/20    FA      32768     0003.abcd.aaa1   3s    0x8009  0x8893   0x3F


This shows some useful information such as the timeout and activity flags, plus it allows you to verify the LACP keys being received on each port for consistency. If you need more information, add the "detail" keyword:

2950#show lacp neighbor detail
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1 neighbors
Partner's information:
          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Fa0/19     40960,0003.abcd.aaa1  0x8894           11s        FA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x8009          0x3F

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Long       Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No
          Partner               Partner                     Partner
Port      System ID             Port Number     Age         Flags
Fa0/20     40960,0003.abcd.aaa1  0x8893           11s        FA

          LACP Partner         Partner         Partner
          Port Priority        Oper Key        Port State
          32768                0x8009          0x3F

          Port State Flags Decode:
          Activity:   Timeout:   Aggregation:   Synchronization:
          Active      Long       Yes            Yes

          Collecting:   Distributing:   Defaulted:   Expired:
          Yes           Yes             No           No
2950#


Note that, contrary to what you might expect, the "Port State Flags Decode" sections actually refer to the local flags rather than those being sent by the remote device. As you can see, in this example the remote end is requesting fast timeouts but the local end is requesting slow.
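
If you would rather decode the raw port state bytes yourself (0x3D locally and 0x3F from the partner in the output above), the bit assignments are fixed by the standard: bit 0 is Activity, bit 1 Timeout (set = short/fast), bit 2 Aggregation, bit 3 Synchronization, bit 4 Collecting, bit 5 Distributing, bit 6 Defaulted and bit 7 Expired. A quick Python sketch:

# LACP actor/partner state bits, least significant first (IEEE 802.1AX)
STATE_BITS = [
    'Activity (active)',       # bit 0
    'Timeout (short/fast)',    # bit 1
    'Aggregation',             # bit 2
    'Synchronization',         # bit 3
    'Collecting',              # bit 4
    'Distributing',            # bit 5
    'Defaulted',               # bit 6
    'Expired',                 # bit 7
]

def decode_state(state):
    return [name for bit, name in enumerate(STATE_BITS) if state & (1 << bit)]

print(hex(0x3D), decode_state(0x3D))   # local state from the output above
print(hex(0x3F), decode_state(0x3F))   # partner state from the output above

The two values differ only in the timeout bit, which matches the flag columns: the partner is asking for fast LACPDUs while the local end is asking for slow.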

A fairly detailed overview of the local and remote state can be seen using the "show etherchannel detail" command:

2950#show etherchannel detail
                Channel-group listing:
                ----------------------

Group: 1
----------
Group state = L2
Ports: 2   Maxports = 16
Port-channels: 1 Max Port-channels = 16
Protocol:   LACP
                Ports in the group:
                -------------------
Port: Fa0/19
------------

Port state    = Up Mstr In-Bndl
Channel group = 1           Mode = Active          Gcchange = -
Port-channel  = Po1         GC   =   -             Pseudo port-channel = Po1
Port index    = 0           Load = 0x00            Protocol =   LACP

Flags:  S - Device is sending Slow LACPDUs   F - Device is sending fast LACPDUs.
        A - Device is in active mode.        P - Device is in passive mode.

Local information:
                            LACP port     Admin     Oper    Port     Port
Port      Flags   State     Priority      Key       Key     Number   State
Fa0/19    SA      bndl      32768         0x1       0x1     0x13     0x3D

Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/19    FA      32768     0003.abcd.aaa1  26s    0x8009  0x8894   0x3F

Age of the port in the current state: 0d:00h:00m:24s
Port: Fa0/20
------------

Port state    = Up Mstr In-Bndl
Channel group = 1           Mode = Active          Gcchange = -
Port-channel  = Po1         GC   =   -             Pseudo port-channel = Po1
Port index    = 0           Load = 0x00            Protocol =   LACP

Flags:  S - Device is sending Slow LACPDUs   F - Device is sending fast LACPDUs.
        A - Device is in active mode.        P - Device is in passive mode.

Local information:
                            LACP port     Admin     Oper    Port     Port
Port      Flags   State     Priority      Key       Key     Number   State
Fa0/20    SA      bndl      32768         0x1       0x1     0x14     0x3D

Partner's information:
                  LACP port                        Oper    Port     Port
Port      Flags   Priority  Dev ID         Age     Key     Number   State
Fa0/20    FA      32768     0003.abcd.aaa1   0s    0x8009  0x8893   0x3F

Age of the port in the current state: 0d:00h:00m:27s
                Port-channels in the group:
                ---------------------------

Port-channel: Po1    (Primary Aggregator)
------------
Age of the Port-channel   = 0d:00h:00m:50s
Logical slot/port   = 1/0          Number of ports = 2
HotStandBy port = null
Port state          = Port-channel Ag-Inuse
Protocol            =   LACP

Ports in the Port-channel:
Index   Load   Port     EC state        No of bits
------+------+------+------------------+-----------
  0     00     Fa0/19   Active             0
  0     00     Fa0/20   Active             0

Time since last port bundled:    0d:00h:00m:28s    Fa0/19
2950#

For more interactive troubleshooting, debug commands are available, but be careful: on my (admittedly ancient) switch, LACP debugs could only be enabled chassis-wide and were pretty verbose. The output of the packet-level debug ("debug lacp packet") for a single LACPDU is shown below:

2950#debug lacp packet
Link Aggregation Control Protocol packet debugging is on
19w0d: LACP :lacp_bugpak: Send LACP-PDU packet via Fa0/20
19w0d: LACP : packet size: 124
19w0d: LACP: pdu: subtype: 1, version: 1
19w0d: LACP: Act: tlv:1, tlv-len:20, key:0x1, p-pri:0x8000, p:0x14, p-state:0x3D,
s-pri:0x8000, s-mac:0012.da12.abcd
19w0d: LACP: Part: tlv:2, tlv-len:20, key:0x8009, p-pri:0x8000, p:0x8893, p-state:0x3F,
s-pri:0xA000, s-mac:0003.abcd.aaa1
19w0d: LACP: col-tlv:3, col-tlv-len:16, col-max-d:0x8000
19w0d: LACP: term-tlv:0 termr-tlv-len:0


Pretty detailed, so watch your CPU!
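
The field names in that debug map directly onto the actor and partner TLVs of the LACPDU itself, so if you are working from a capture you can pull the same values out of the raw bytes. Below is a minimal sketch in Python (my own, standard library only) that decodes the actor TLV from a LACPDU payload, i.e. the bytes following the Ethernet header, using the fixed layout from 802.1AX:

import struct

def parse_actor_tlv(lacpdu):
    # subtype and version come first, then the actor TLV (type 1, length 20):
    # system priority, system MAC, key, port priority, port number, state.
    subtype, version = lacpdu[0], lacpdu[1]
    if subtype != 1 or version != 1:
        raise ValueError('not a version 1 LACPDU')
    (tlv_type, tlv_len, sys_pri, sys_mac,
     key, port_pri, port, state) = struct.unpack_from('!BBH6sHHHB', lacpdu, 2)
    if tlv_type != 1 or tlv_len != 20:
        raise ValueError('unexpected actor TLV')
    return {
        'sys_priority': sys_pri,          # s-pri in the debug above
        'sys_mac': sys_mac.hex('.', 2),   # s-mac (Python 3.8+ for the separator)
        'key': key,                       # key
        'port_priority': port_pri,        # p-pri
        'port': port,                     # p
        'state': state,                   # p-state, decoded earlier
    }

The partner TLV follows immediately with the same layout (type 2), and the collector TLV after that carries the max delay shown as col-max-d above.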

A rather useful alternative is "debug lacp fsm". Again, this produces a very high volume of output, but it is the only practical way to see detailed information on state transitions from the CLI:

2950#debug lacp fsm
Link Aggregation Control Protocol fsm debugging is on
19w0d:     lacp_mux Fa0/19 - mux: during state WAITING, got event 4(ready)
19w0d: @@@ lacp_mux Fa0/19 - mux: WAITING -> ATTACHED
19w0d: LACP: Fa0/19 lacp_action_mx_attached entered
19w0d: LACP: Fa0/19 Attaching mux to aggregator
19w0d:     lacp_mux Fa0/19 - mux: during state ATTACHED, got event 5(in_sync)
19w0d: @@@ lacp_mux Fa0/19 - mux: ATTACHED -> COLLECTING_DISTRIBUTING
19w0d: LACP: Fa0/19 lacp_action_mx_collecting_distributing entered
19w0d: LACP: Fa0/19 Enabling collecting and distributing
19w0d:     lacp_rx Fa0/19 - rx: during state CURRENT, got event 5(recv_lacpdu)
19w0d: @@@ lacp_rx Fa0/19 - rx: CURRENT
2950# -> CURRENT
19w0d: LACP: Fa0/19 lacp_action_rx_current entered
19w0d:     lacp_mux Fa0/19 - mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)
19w0d:     lacp_ptx Fa0/19 - ptx: during state FAST_PERIODIC, got event 3(pt_expired)
19w0d: @@@ lacp_ptx Fa0/19 - ptx: FAST_PERIODIC -> PERIODIC_TX
19w0d: LACP: Fa0/19 lacp_action_ptx_fast_periodic_exit entered


Very verbose indeed. Be careful with CPU load.

Frankly, if you can, it is better to troubleshoot with a port mirror and packet capture. The protocol is very good at telling you what it is doing: in addition to the periodic LACPDUs, triggered updates are generated whenever anything material, such as the sync state, changes. Use a capture filter (see the previous blog post "tshark one-liners" for more info) when capturing on links carrying a lot of user data.

Oddities

The value of the timeout flag sent by a device indicates the interval at which it expects its partner to send LACPDUs. The partner should then honour the request and transmit at the indicated rate.

The timeout value does not have to agree between peers. While it is not a recommended configuration, it is possible to bring up a LAG with one end sending every second and the other sending every 30 seconds. In this case, the end requesting fast timers will detect a silent failure in under 3 seconds while the end requesting slow timers will take up to 90 seconds to detect the same fault.
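
Those detection times fall straight out of the standard's timers: a failure is declared after three missed LACPDUs at whichever interval the local end asked for (1 second for fast, 30 seconds for slow). A trivial sketch of the arithmetic:

# Failure detection = 3 missed LACPDUs at the interval this end requested
# (1 second for fast, 30 seconds for slow), per the 802.1AX timer values.
INTERVALS = {'fast': 1, 'slow': 30}

def detection_time(requested_rate):
    return 3 * INTERVALS[requested_rate]

print(detection_time('fast'))   # 3 seconds
print(detection_time('slow'))   # 90 seconds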

The configuration of sub-groups (and even whether to use sub-groups at all) does not have to agree between peers. The failure characteristics are often better if one end is configured with active/standby sub-groups while the other is configured without any sub-groups: as soon as the end with sub-groups decides to switch a new sub-group to active, the partner is already sending sync on all available links and will immediately put traffic onto the newly active sub-group.

The Alcatel-Lucent 7750 (and probably others, I've just not looked) sends an out-of-sync LACPDU when it detects a LAG member going physically down. Normally that won't get through to the other end, but in the event of a single fibre failure, for example, it serves to inform the partner that the link is no longer usable and should be removed from the LAG bundle. This improves failover times considerably in cases where link loss is not forwarded (tens or hundreds of milliseconds compared to 2-3 seconds).

Finally

If you got this far, you should probably download the IEEE 802.1AX-2008 standard.