Friday, 12 April 2013

LACP miscellanea

According to the statistics, a few people have stumbled across this blog because they were searching for certain 7750-specific information relating to LACP. Here are a couple of answers that were missed out from my main LACP article:

What is the failover time for a LAG / etherchannel? 

The answer to this question varies considerably depending on the setup. If a device notices a bundled interface going physically down then it should unbundle it immediately, causing very low loss (50ms should be achievable).

In the event of an interface remaining physically up (i.e. where there is transmission equipment or EoMPLS between the two devices), also known as a silent failure, the failover will be up to 3 times the LACP timer. So the impact would be up to 3 seconds using fast timers or up to 90 seconds using slow timers. Most lower end Cisco kit only supports slow timers.
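The arithmetic behind those figures can be sketched as follows. This is purely illustrative: it assumes the standard LACP periodic intervals of 1 second (fast) and 30 seconds (slow), and the function name is made up for the example.

```python
# Worst-case silent-failure detection: three missed LACPDU intervals.
LACP_INTERVAL_SECONDS = {"fast": 1, "slow": 30}  # standard LACP periodic timers
MISSED_PDUS_BEFORE_EXPIRY = 3

def worst_case_failover(timer):
    """Return the upper bound, in seconds, on detecting a silent failure."""
    return LACP_INTERVAL_SECONDS[timer] * MISSED_PDUS_BEFORE_EXPIRY

for timer in ("fast", "slow"):
    print(timer, "timers: up to", worst_case_failover(timer), "seconds")
```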

In the event of a single fibre fault or other asymmetric failure, you may see a combination of these effects where traffic in one direction heals faster than the other. There are other corner cases such as when administratively shutting down an interface - some devices send an out of sync LACPDU to inform the other end the link is about to go away which helps speed convergence. It is really best to lab test where possible to check different failure scenarios.

Be aware that when using load balanced LAGs, the impact to some streams may be zero. Typically traffic that is hashed onto one link in the bundle will not suffer loss when a different link in the bundle fails.
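To illustrate why some streams see no loss, here is a minimal sketch of hash-based member selection. The CRC32 hash over source and destination IP is an assumption made for the example; real devices use their own (often configurable) hashing inputs and algorithms, and the addresses are made up.

```python
import zlib

def member_for_flow(src_ip, dst_ip, n_links):
    """Pick a LAG member for a flow using a hypothetical CRC32-based hash."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % n_links

# Eight made-up flows spread over a four-link bundle.
flows = [("10.0.0.%d" % i, "192.0.2.1") for i in range(1, 9)]
placement = {flow: member_for_flow(*flow, n_links=4) for flow in flows}

failed_link = 0
affected = [flow for flow, link in placement.items() if link == failed_link]
unaffected = [flow for flow, link in placement.items() if link != failed_link]
print(len(affected), "flows were on the failed link;", len(unaffected), "were not")
```

Only the flows in `affected` were being carried by the failed member, so only those can lose packets while the failure is detected.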

How can I transport, rather than terminate, LACP through epipe services on the Alcatel-Lucent 7750? 

The answer to this is pretty straightforward, but I know I looked in the wrong place when I first needed to use the feature.

All you need to do is to configure "lacp-tunnel" under the configure -> port -> ethernet context.

How can I transport, rather than terminate, LACP through a QinQ tunnel on a Cisco switch? 

Again, this is pretty straightforward. There are a load of different protocols that can optionally be tunneled on a dot1q-tunnel port, but we just need lacp enabled:


Simply configure "l2protocol-tunnel point-to-point lacp" under the dot1q-tunnel interface.

Normally it makes sense to tunnel everything (STP, CDP, VTP, LLDP, LACP, PAgP, ...) for consistency. Either be a tunnel or don't!

What is the valid range for LAG IDs on the 7750?

For IOM-based systems (i.e. SR-7, SR-12), the usable LAG ID range is 1 to 200. For integrated IOM systems such as SR-1 and ESS-1, the LAG ID range is 1 to 64.


Can you tell me something unusual about LACP on the 7750?

When the Rx fibre for a LACP-speaking port loses light (i.e. fails), right before the port gets pulled down the 7750 sends an LACP out-of-sync message to inform the other end that it is going away. This is useful for single fibre faults and can drastically improve convergence times, particularly where transmission equipment between the two LACP peers does not forward link loss.

What does "mux: during state COLLECTING_DISTRIBUTING, got event 5(in_sync) (ignored)" mean in a Cisco debug?

As best I can tell it means that a LACPDU was received with the sync bit set, indicating that the far end is ready to use the link, but the link was already collecting / distributing (i.e. in use) so no change in state was required.

What does "lag number : partner oper state bits changed on member port : [expired false -> true]" mean on a 7750 debug?

This means that a particular port's state machine moved into the expired state due to missing three inbound LACPDUs from the peer. Once the port reaches the expired state it is removed from the bundle but the peer parameters are remembered for a further 3 intervals, after which point the peer information is flushed and the port enters the defaulted state.

Do the LACP keys need to match at both ends of a LAG?

No - the LACP key is locally significant and corresponds one-to-one with a LAG or etherchannel ID. It is used to check consistency (i.e. to catch crossed cables) so must be identical for all members within a LAG, however the devices at each end of the LAG can select any value they like for any particular LAG.

Why do ports get "suspended" from an etherchannel?

Basically a port gets suspended if its configuration is not in line with that of the port channel with which it is associated. The most common way to accidentally arrive in this state is for a member trunk port to have a different allowed VLAN list than its parent port-channel interface. While IOS allows member ports to be reconfigured, it is much more sensible to make the configuration changes to the port-channel interface - the changes are then pushed down to the member ports automatically, avoiding this kind of conflict.


Friday, 1 March 2013

A Better Way to Compare 7750 Configs

One of the tasks I regularly have to perform as part of my job is to audit router configurations against basebuild templates and occasionally against each other. This can be quite labour-intensive, particularly as good, old-fashioned diff doesn't cope very well with hierarchical configs such as the 7750's. There are many things that make life difficult with traditional diff:
  1. Diff is more-or-less reliant on order being preserved between the files being compared. Some config elements are split across the config file when it is saved. Others may be stored in different locations depending on the software release. Certain items such as layer 3 interfaces are stored in the order that they were added, so two configs may contain the same interfaces but in a different order and traditional diff can't generally work that out.
  2. Traditional diff does not appreciate the significance of policy names, sequence numbers or service IDs in determining what should be compared to what. Ad-hoc insertions and deletions of policy entries or services often cause diff to get completely out of step, making it compare apples to oranges.
  3. Traditional diff does not provide context for differences. Quite often you just get a load of "shutdown" "no shutdown" pairs. It is possible to include a fixed number of lines pre- or post- difference, but even that does not always show the configuration context where the difference actually occurred and brings a load of junk with it. The only sure-fire way is to turn on side by side diff, which shows both configurations in full side by side with changes marked in a centre column.
Point 2 is the real killer, making traditional diff basically useless for audits where a handful of known policies must be validated within a config containing many other policies. Below is a (fairly common) worst case when working with diff - one policy has been removed and another two added:



Traditional diff makes a horrible job of this. Even seeing the two configs side by side it is confusing to look at and it is not immediately apparent what has changed.

I poked and played with a number of "diff" type tools to try and find something that would handle this kind of thing more gracefully but I eventually came to the conclusion that nothing currently existed. I had a rough idea of what I wanted:
  • It must only compare the contents of like policies, i.e. policy "A" should only be compared to policy "A" and never to policy "B".
  • It must compare configuration elements that appear in a different order in one config to the other.
  • It should, ideally, report the full context of each difference within the hierarchy.
To address these points I decided to write a tool from scratch and called it, unimaginatively, 7750diff. Here's how 7750diff reports the same changes:

D:\7750diff>7750diff a.cfg b.cfg
Unique to a.cfg:
configure
    qos
        scheduler-policy "20000kbps" create
            description "20000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 20000 cir 20000
                exit
            exit
        exit
    exit
exit
Unique to b.cfg:
configure
    qos
        scheduler-policy "25000kbps" create
            description "25000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 25000 cir 25000
                exit
            exit
        exit
        scheduler-policy "40000kbps" create
            description "40000kbps Scheduler"
            tier 1
                scheduler "tier1" create
                    rate 40000 cir 40000
                exit
            exit
        exit
    exit
exit

D:\7750diff>


That's not only much clearer (IMHO), but I can take the output and copy / paste it directly into the node to build the missing configuration. Happy days!

How it Works

The basic methodology used by 7750diff is to:
  • Read each config into a hierarchical tree structure based on indent levels
  • Recursively compare the trees starting at the root:
    • For each branch of config A, search for an identical branch on config B within the same context.
    • If a match is found, check for subordinate "child" configuration elements.
    • If any children exist, recursively process them.
    • If no children are then present, remove the matching elements.
Once all elements have been compared only the elements unique to each config will remain, along with the parent elements required to reach the configuration context of the change. The trees can then be output as a list of differences between the two files. In many cases, thanks to the inclusion of context, big chunks of 7750diff output can be directly entered into one node to bring its config into line with the other.
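The methodology above can be sketched in a few lines of Python. This is not the 7750diff source (which is C), just a minimal illustration under some assumptions: hierarchy is inferred purely from leading spaces, `exit` lines are dropped rather than preserved, and the function names `parse`, `prune` and `render` are invented for the example.

```python
def parse(text):
    """Build a nested dict keyed by stripped line, using indent depth."""
    root = {}
    stack = [(-1, root)]                 # (indent, children) of open contexts
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "exit":
            continue                     # simplification: ignore exit lines
        indent = len(line) - len(line.lstrip())
        while stack[-1][0] >= indent:    # climb back out to the parent context
            stack.pop()
        node = stack[-1][1].setdefault(stripped, {})
        stack.append((indent, node))
    return root

def prune(a, b):
    """Remove elements common to both trees, keeping parents of differences."""
    for key in list(a):
        if key in b:
            prune(a[key], b[key])        # compare children in the same context
            if not a[key]:
                del a[key]
            if not b[key]:
                del b[key]

def render(tree, depth=0):
    """Flatten a tree back into indented lines."""
    lines = []
    for key, children in tree.items():
        lines.append("    " * depth + key)
        lines.extend(render(children, depth + 1))
    return lines

cfg_a = """configure
    system
        name "lab"
    exit
    qos
        scheduler-policy "20000kbps" create
            description "20000kbps Scheduler"
        exit
    exit
exit"""
cfg_b = """configure
    qos
        scheduler-policy "25000kbps" create
            description "25000kbps Scheduler"
        exit
    exit
    system
        name "lab"
    exit
exit"""

tree_a, tree_b = parse(cfg_a), parse(cfg_b)
prune(tree_a, tree_b)
print("Unique to a:", *render(tree_a), sep="\n")
print("Unique to b:", *render(tree_b), sep="\n")
```

Because children are looked up by name within their parent context, the `system` block is recognised as common to both configs even though it appears in a different position in each.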

Since the whole thing runs on indents it is not 7750 specific. It "may" work on ISAM configs, it may work on any other sort of config that uses indent / whitespace to denote hierarchy - I just haven't tested it. Try your luck :)

Obtaining the Tool

As usual, the tool is available for download at my github: https://github.com/theclam/7750diff - there is the C source code plus a Windows binary available to download.

The code is pretty horrible as this was my first bit of C coding in over a decade. I may decide to clean the code up at some point but it is quite stable now, meaning I haven't found a config that upsets it for the last couple of versions. I've been running a nightly cron job for quite some time now which pulls down an entire lab's configs and 7750diffs each one against the previous day's and it works great.

If you download 7750diff, please let me know how you get along with it.

Friday, 1 February 2013

A Tool for Measuring Forwarding Delay in Packet Captures

I have access to some pretty expensive test kit in work. One of its main purposes in life is to measure the latency of traffic streams passing through a network, which is a pretty useful feature. Occasionally, though, the figures produced can be hard to believe and it would be nice to be able to validate them independently. It would also sometimes be nice to be able to see down to the packet level how the delay varies over time.

Since the tester inserts a "unique" signature into each frame, it is possible to do the calculations by hand - simply take a packet capture of traffic entering and leaving the device (at minimum capture using 2 ports on the same box, preferably use one port with a two-source port mirror), then manually compare the timestamps of the packets pre- and post-routing.

Finding matching pairs of packets is pretty tedious, especially for large captures and particularly where high throughput rates mean that there may be thousands of other frames between a "before" and "after". The technique is sound, though, if you're patient.

For some work I was doing recently, I needed to do this on a grand scale. A multi-megabit stream did not appear to be queuing as expected and it was unclear why. Eventually that particular problem was traced, using Wireshark IO graphs, back to overly bursty traffic being offered into the device under test but it made me think it would be very nice to have a tool for doing this kind of verification and, actually, it would not be difficult to write one. So I wrote a tool, in case I needed it in a hurry later on. The impatient may just want to scroll down to "Obtaining the Tool".

How it Works

Packet headers are inevitably changed by routing (and, in fact, encapsulation could be added or removed in the process) so packet payloads must be compared in order to find "before and after" pairs. The Spirent TestCenter tester includes a 20 byte "signature" in each generated packet, which is always at the very end of the payload. In practice it is not necessary to compare the entire, variable length, payload to pair up packets. Rather it is sufficient, and much faster, to compare the last 20 bytes of each packet for a matching signature.

The process implemented in the tool is to read each packet from the pcap file, storing the following details in a list entry:
  • Frame number
  • Arrival time
  • Signature
Each entry is then stored in a linked list, as in the following diagram:


Each packet read in adds another node of around 36 - 44 bytes in size. This is smaller than the original capture but can still be a considerable amount of memory when working with very large captures.

Once the complete list has been built, the next job is to identify the "before and after" pairs. This is done by considering each list entry in turn, then looking forward in the list for an entry with a matching signature. If such an entry is found, the frame numbers and timestamps of each frame are output along with the time delta between the two frames. Pretty simple, really.
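As a rough illustration of that forward search, here is a Python sketch. It is not the tool's C code: it uses a plain list rather than a linked list, takes already-parsed frames rather than reading a pcap, and the `matched` bookkeeping (so a frame is only paired once) is an assumption of the sketch.

```python
SIGNATURE_LEN = 20  # number of trailing bytes compared, as described above

def pair_frames(frames):
    """Pair each frame with the next later frame carrying the same signature.

    frames: list of (frame_number, arrival_time, raw_bytes) in capture order.
    Returns tuples of (arrival_no, arrival_time, departure_no, departure_time, delay).
    """
    entries = [(no, ts, raw[-SIGNATURE_LEN:]) for no, ts, raw in frames]
    results = []
    matched = set()                      # indexes already used in a pair
    for i, (no_a, ts_a, sig_a) in enumerate(entries):
        if i in matched:
            continue
        for j in range(i + 1, len(entries)):   # look forward only
            no_b, ts_b, sig_b = entries[j]
            if j not in matched and sig_a == sig_b:
                matched.update((i, j))
                results.append((no_a, ts_a, no_b, ts_b, ts_b - ts_a))
                break
    return results

# Made-up frames: headers differ before and after routing, signatures do not.
sig_a = bytes(range(20))
sig_b = bytes(range(20, 40))
frames = [
    (1, 0.0, b"old-header" + sig_a),
    (2, 0.25, b"old-header" + sig_b),
    (3, 0.5, b"new-header!" + sig_a),
    (4, 1.0, b"new-header!" + sig_b),
]
pairs = pair_frames(frames)
for row in pairs:
    print(", ".join(str(field) for field in row))
```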

Obtaining the Tool

The tool is available to download as C source code and as a Windows binary at https://github.com/theclam/fwding.

To build from source code simply extract the source, change into the directory and type "make".

Using the Tool

Once the binary has been compiled or downloaded, simply run it with the name of the pcap file as its only parameter. For example:

lab@lab:~/Projects/fwding$ ./fwding input.cap
Arrival Frame Number, Arrival Time, Departure Frame Number, Departure Time, Forwarding Delay
1, 1359461693.826304, 2, 1359461693.826354, 0.000050
5, 1359461693.826418, 6, 1359461693.826468, 0.000050
7, 1359461693.826585, 8, 1359461693.826701, 0.000116
9, 1359461693.826818, 11, 1359461693.826946, 0.000128
10, 1359461693.826830, 12, 1359461693.826958, 0.000128
17, 1359461693.826999, 18, 1359461693.827014, 0.000015
21, 1359461693.827078, 22, 1359461693.827128, 0.000050
25, 1359461693.827192, 26, 1359461693.827242, 0.000050
27, 1359461693.827359, 29, 1359461693.827488, 0.000129
[...snip...]

The output produced is standard CSV-formatted text, which can be piped or redirected to a file as necessary for manipulation by your favourite spreadsheet or command line tool. Timestamps are in seconds since Unix epoch. Delay is reported in seconds.
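If a spreadsheet is overkill, a few lines of Python will summarise the delays. This is just a sketch using the standard csv module; the sample rows are copied from the output above and the `summarise` function name is invented for the example.

```python
import csv
import io

sample = """Arrival Frame Number, Arrival Time, Departure Frame Number, Departure Time, Forwarding Delay
1, 1359461693.826304, 2, 1359461693.826354, 0.000050
5, 1359461693.826418, 6, 1359461693.826468, 0.000050
7, 1359461693.826585, 8, 1359461693.826701, 0.000116
"""

def summarise(handle):
    """Return (min, max, mean) forwarding delay in seconds."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)                         # skip the header row
    delays = [float(row[4]) for row in reader if row]
    return min(delays), max(delays), sum(delays) / len(delays)

shortest, longest, mean = summarise(io.StringIO(sample))
print("min %.6f max %.6f mean %.6f" % (shortest, longest, mean))
```

To run it against real output, pass an open file handle in place of the `io.StringIO` wrapper.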

Note: The pairing-up mechanism is highly dependent on the test traffic containing unique data in the last 20 bytes of each frame. For tester traffic that's taken care of automatically but your mileage with "real" traffic will vary. I would expect that FTPing a compressed file or playing music / white noise over VoIP should give relatively good entropy to your data if a tester is not available. For best results filter out non-test traffic beforehand - OSPF hellos and LACPDUs are very repetitive so will generate lots of false hits.

For example, if you want a quick-and-dirty graph of latency over arrival time using gnuplot, just pipe the output to file then use a command such as:

gnuplot> plot "all-ways.txt" using 2:5 with points pt 2


Alternatively, graph the latency by frame number:

gnuplot> plot "all-ways.txt" using 1:5 with points pt 2

Hopefully your output won't look like this - it is an intentionally odd example caused by sending very bursty traffic.

Finally

I always ask but it's never happened yet... if you try the tool out, please leave a comment. I'm interested in feedback, good or bad, and if it doesn't quite do what you want I may change it!

Tuesday, 22 January 2013

RADIUS and L2TP Support Added to "dechap"

This is just a very short message to say that I have enhanced the "dechap" tool mentioned in my previous post. In addition to the original PPPoE support it can now extract and attack CHAP authentications sniffed from RADIUS and L2TPv2 protocols.

The syntax remains exactly the same and it should "just work". The code is available to download at https://www.github.com/theclam/dechap. Please post a comment if you have any feedback or suggestions.


Monday, 21 January 2013

Recovering CHAP Passwords from Sniffed PPPoE Sessions

In a previous blog post I outlined the theory behind setting up a PPPoE session including PPPoE discovery, LCP, NCPs and, more relevant to this post, the basics of CHAP authentication. At the time I was writing the post I wondered how easy it would be to work back from the CHAP messages on the wire to the original credentials, so I decided to find out.

Recap of CHAP Theory


As a reminder (or a very quick introduction), the CHAP process works something like this:

CHAP Authentication
  1. The party requiring the opposite peer to authenticate (i.e. "server") sends a CHAP challenge message containing a challenge ID and some unpredictable "random" data.
  2. The party being authenticated (i.e. "client") concatenates the authentication ID, the password and the challenge data into a single unit, then generates an MD5 hash of that. The resulting hash, plus the client name (user ID or hostname) is passed to the server as a CHAP response.
  3. The server compares the incoming hash to the value it obtains by performing the same calculation locally and returns a CHAP success or CHAP failure message.
Now, clearly, if the CHAP challenge and response messages can be captured then an offline brute force attack can be mounted against the password. This can be achieved by simply extracting the authentication ID, challenge data and response from the relevant messages and then trying candidate passwords until one (hopefully) generates a hash identical to that seen in the response message.
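The calculation in step 2 is small enough to show in full. The sketch below follows the description above (one-byte ID, then password, then challenge, hashed with MD5); the challenge value is made up and the function name is only illustrative.

```python
import hashlib

def chap_response(auth_id, password, challenge):
    """MD5 over the one-byte ID, the password and the challenge data."""
    return hashlib.md5(bytes([auth_id]) + password + challenge).digest()

challenge = bytes.fromhex("00112233445566778899aabbccddeeff")  # made-up value
response = chap_response(1, b"tangerine", challenge)
print(response.hex())
```

Changing any one of the three inputs gives a completely different 16 byte hash, which is why the ID and challenge must be recovered from the capture before any password guessing can start.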

The Attack in Practice

While the process is intuitively simple, as usual there are a few corner cases to cover. Recovering CHAP authentications from a capture file full of other junk requires a certain amount of processing logic, then responses must be re-united with their corresponding challenges before they can be attacked.

Gathering CHAP Packets


I wanted the tool to be flexible with regards to encap. Since I work primarily on carrier networks, I get really frustrated by tools that do a job perfectly but only accept untagged, unencapsulated frames. Once you have a packet capture in your hand, realising that it can't be used because it has two VLAN tags and a pair of MPLS labels is a nuisance.

The approach that seemed most sensible was to build a recursive decap function which would take in a (partial) frame plus a "hint" as to what type of header to expect. The function would then check for and record any matching criteria present (i.e. MACs for Ethernet, VLAN ID for 802.1Q - more on this in the next section) before either returning or calling itself on the remainder of the packet with a "hint" derived from the current header.

Worked Example

Let's process the following frame as an example. Data in black are used by the algorithm while data in grey are not.

[Ethernet][VLAN][VLAN][PPPoES][PPP][CHAP]
The initial call to the function passes the entire frame with an "Ethernet" hint. In the Ethernet header, the source and destination MAC addresses are read and stored. The EtherType field contains 0x8100, indicating an 802.1Q VLAN header is next. The function calls itself against the contents of the frame from byte 15 and on with a hint of "VLAN".

[VLAN][VLAN][PPPoES][PPP][CHAP]

Now the function reads and stores the VLAN ID. Since this is the first VLAN we have seen it is stored as the C-VLAN for now. The EtherType is, again, 0x8100 so the function calls itself against bytes 5 and onward using a hint of "VLAN".

[VLAN][PPPoES][PPP][CHAP]


Again, the function reads and stores the VLAN ID. Since this is not the first VLAN tag found, the previously known VLAN ID is moved into the S-VLAN field and the value from the frame is stored in the C-VLAN field. This time the EtherType is 0x8864, indicating a PPPoE session header follows. The function calls itself against bytes 5 and onwards using a hint of "PPPoE".

[PPPoES][PPP][CHAP]


The function now reads and stores the PPPoE session ID (SID). The only valid thing to follow a PPPoE session header is a PPP header, so the function calls itself on bytes 7 and onward, using a hint of "PPP".



[PPP][CHAP]

The function now simply checks that the protocol ID in the PPP header is 0xC223 for CHAP. If so, it calls itself one last time against bytes 3 and onward using a hint of CHAP.

[CHAP]


Finally we are down to the payload. The CHAP message type is checked and:
  • For challenges, the authentication ID, challenge length and challenge data are stored.
  • For responses, the authentication ID, response and client name are stored.
Each instance of the function can then return to its parent, eventually resulting in a fully populated record of all the data relevant to authentication. The completed records can then be stored in a doubly linked list for later consumption.
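For anyone who prefers code to prose, the worked example can be approximated in Python as below. This is a sketch of the idea rather than a port of dechap: it only handles the headers in the example (no MPLS), the record is a plain dict, and the field names and test frame are made up.

```python
import struct

NEXT_BY_ETHERTYPE = {0x8100: "VLAN", 0x8864: "PPPoE"}

def decap(data, hint, rec):
    """Recursively strip one header per call, recording CHAP-relevant fields.

    Returns the populated record, or None if the frame is not CHAP."""
    if hint == "Ethernet":
        rec["dst_mac"], rec["src_mac"] = data[0:6], data[6:12]
        ethertype = struct.unpack("!H", data[12:14])[0]
        return decap(data[14:], NEXT_BY_ETHERTYPE.get(ethertype), rec)
    if hint == "VLAN":
        tci, ethertype = struct.unpack("!HH", data[0:4])
        if "c_vlan" in rec:              # not the first tag: promote to S-VLAN
            rec["s_vlan"] = rec["c_vlan"]
        rec["c_vlan"] = tci & 0x0FFF     # low 12 bits of the tag are the VLAN ID
        return decap(data[4:], NEXT_BY_ETHERTYPE.get(ethertype), rec)
    if hint == "PPPoE":
        rec["sid"] = struct.unpack("!H", data[2:4])[0]
        return decap(data[6:], "PPP", rec)
    if hint == "PPP":
        if struct.unpack("!H", data[0:2])[0] != 0xC223:
            return None                  # not CHAP
        return decap(data[2:], "CHAP", rec)
    if hint == "CHAP":
        code, auth_id, length = struct.unpack("!BBH", data[0:4])
        if code not in (1, 2):           # 1 = challenge, 2 = response
            return None
        value_len = data[4]
        rec["type"] = "challenge" if code == 1 else "response"
        rec["auth_id"] = auth_id
        rec["value"] = data[5:5 + value_len]
        rec["name"] = data[5 + value_len:length]
        return rec
    return None                          # unknown or unsupported header

# Made-up double-tagged CHAP response: S-VLAN 100, C-VLAN 200, SID 0x1234.
chap = struct.pack("!BBHB", 2, 7, 26, 16) + bytes(16) + b"user1"
frame = (bytes(6) + bytes(6) + struct.pack("!H", 0x8100)
         + struct.pack("!HH", 100, 0x8100)
         + struct.pack("!HH", 200, 0x8864)
         + struct.pack("!BBHH", 0x11, 0x00, 0x1234, len(chap) + 2)
         + struct.pack("!H", 0xC223)
         + chap)
record = decap(frame, "Ethernet", {})
print(record["s_vlan"], record["c_vlan"], hex(record["sid"]), record["name"])
```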

Pairing Up

A CHAP response must be paired up with its respective CHAP challenge, otherwise the maths don't work. In real life there may be several authentications in progress at one time across multiple PPPoE sessions, possibly over multiple different VLANs. Often the CHAP authentication ID is only unique within a PPPoE session. Similarly, the PPPoE session ID only needs to be unique within a broadcast domain so these are often re-used across VLANs. Care must be taken to ensure that the challenge and response really do belong together.

In order to be considered a challenge / response pair, I decided the following criteria must match:
  • Server and Client MACs
  • S & C VLAN IDs (if present)
  • PPPoE SID
  • CHAP authentication ID
I considered including MPLS labels in this but I struggled to think of a realistic scenario in which two authentications would match the above criteria but use a different label.

Additionally, the thought occurred that even with the above details matching, there may be more than one challenge / response pair for the same PPPoE session so a response would have to be paired with the most recent challenge for which the criteria matched. In the program this is achieved by working backwards through the linked list, starting at the response, until a match is found. Data from matching challenge / response pairs are stored in another list for later consumption. If the search reaches the beginning without a matching challenge being found then the response cannot be used and is ignored.
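The backwards search can be sketched like this. It assumes each record is a dict whose MAC fields have already been normalised into "server" and "client" (rather than source and destination), uses a plain list in place of the linked list, and all names and values are hypothetical.

```python
MATCH_KEYS = ("server_mac", "client_mac", "s_vlan", "c_vlan", "sid", "auth_id")

def pair_up(records):
    """Pair each response with the most recent matching challenge before it.

    records: capture-ordered dicts with a 'type' of 'challenge' or 'response'."""
    pairs = []
    for i, resp in enumerate(records):
        if resp["type"] != "response":
            continue
        for j in range(i - 1, -1, -1):   # walk backwards from the response
            cand = records[j]
            if cand["type"] == "challenge" and all(
                    cand.get(k) == resp.get(k) for k in MATCH_KEYS):
                pairs.append((cand, resp))
                break                    # most recent challenge wins
        # reaching the start without a match means the response is ignored
    return pairs

def make_record(kind, sid, auth_id, value):
    """Build a made-up record for the demonstration below."""
    return {"type": kind, "server_mac": "00:00:5e:00:53:01",
            "client_mac": "00:00:5e:00:53:02", "s_vlan": 100, "c_vlan": 200,
            "sid": sid, "auth_id": auth_id, "value": value}

records = [
    make_record("challenge", 1, 5, "old challenge"),
    make_record("challenge", 1, 5, "new challenge"),
    make_record("response", 1, 5, "hash"),
    make_record("response", 2, 5, "orphan"),   # different SID, no challenge
]
pairs = pair_up(records)
for challenge, response in pairs:
    print(challenge["value"], "<->", response["value"])
```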

Brute Force Password Guessing


For each challenge / response pair in the list, the next step is to cycle through a list of password guesses. Each candidate password is combined with the authentication ID and challenge data from the captured authentication and hashed. The resulting hash is compared to the one from the captured response and, for those that match, a correct guess is reported. If no password generated a matching hash then the word list does not contain the correct password and this is also reported back.
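A minimal sketch of that guessing loop is shown below. The word list, ID and challenge are made up, and the "captured" response is generated locally so the example is self-contained; the function name `crack` is hypothetical.

```python
import hashlib

def crack(auth_id, challenge, response, candidates):
    """Return the first candidate whose CHAP hash matches, else None."""
    prefix = bytes([auth_id])
    for word in candidates:
        if hashlib.md5(prefix + word + challenge).digest() == response:
            return word
    return None

# Simulate a captured authentication using a known password.
auth_id = 1
challenge = bytes.fromhex("a1b2c3d4e5f60718293a4b5c6d7e8f90")
captured = hashlib.md5(bytes([auth_id]) + b"tangerine" + challenge).digest()

wordlist = [b"password1", b"Africa", b"tangerine", b"Frankenstein"]
found = crack(auth_id, challenge, captured, wordlist)
missed = crack(auth_id, challenge, captured, [b"password1", b"Africa"])
print("found:", found, "/ without the right word:", missed)
```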

Downloading the Tool

The C source code may be downloaded from: https://github.com/theclam/dechap

Provided the OpenSSL dev libraries are installed it should be possible to simply extract the source code, cd into the directory then run "make".

In the future I may add the capability to pull the auths from L2TP or RADIUS interactions but for now only PPPoE is supported. It also assumes that Ethernet control words are not present in MPLS encapsulated traffic.

Using the Tool


The usage is pretty straightforward - there are only two parameters and both are mandatory. Specify your capture file (original pcap format) with the -c flag and your word list with the -w flag. Here's an example:

lab@lab:~/dechap$ ./dechap -w mywords.txt -c someauths.cap
Found password "tangerine" for user user1@testisp.com.
Unable to find a password for user user2@testisp.com.
Found password "password1" for user user3@testisp.com.
Found password "Africa" for user user4@testisp.com.
Found password "Frankenstein" for user user5@testisp.com.
lab@lab:~/dechap$

Considering that I've made no effort at all to make the code efficient, I've found the speed pretty good. On my '90s PC, a worst-case run (i.e. where no passwords are found) against 800 auths with 100k candidate passwords still completes inside a minute. I don't think that's bad for parsing 15,000 packets and running 80 million concatenate - hash - compare sequences.

If you try this out, please leave a comment on this post with your experiences - good or bad.

Friday, 11 January 2013

A Script to Bring Up a PPPoE Session using Python & Scapy

As I mentioned in my previous post, I have put together a script which can bring up a PPPoE session, authenticate using CHAP, negotiate an IP address and send / receive traffic. The script is written in Python and requires a relatively up to date version of scapy (I use v2.2.0-dev, just grab the latest from http://www.secdev.org/projects/scapy/).

I warn you now that I am not a professional coder (or even a particularly keen amateur) and I don't really get on with Python... so don't be surprised if it looks a bit C-like!

To run the script, simply download PPPoESession.py from https://github.com/theclam/PPPoESession-Python and call it from within Python:

root@labpc:~# python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile("PPPoESession.py")
__main__:2: DeprecationWarning: the md5 module is deprecated; use hashlib instead
WARNING: No route found for IPv6 destination :: (no default route?)
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:10: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
  import os, sys, math, socket, struct, sha, hmac, string, time
/usr/local/lib/python2.6/dist-packages/scapy/crypto/cert.py:11: DeprecationWarning: The popen2 module is deprecated.  Use the subprocess module.
  import random, popen2, tempfile
>>>


You can expect to see a few deprecation warnings, depending on which version of Python is in use.

The script defines the PPPoESession class, plus a few other miscellaneous functions for encapsulating and extracting parameters. The PPPoESession class inherits from the scapy Automata class, so all the useful features of that class such as graph() and easy debugging are available. See the scapy Automata wiki entry (http://trac.secdev.org/scapy/wiki/Automata) for more details.

In order to bring up a PPPoE session, a PPPoESession object needs to be instantiated and a few parameters need to be set. At minimum the Ethernet interface, username and password need to be configured:

>>> p = PPPoESession()
>>> p.iface="eth1"
>>> p.username="spongebob@bodges"
>>> p.password="password"


Once that is done, the automaton can be started using the runbg() method. The state machine then runs in the background, returning control to the user. Messages will appear as it goes through the motions of bringing up the PPPoE session, then the PPP session, then authenticating before finally completing IPCP:

>>> p = PPPoESession()
>>> p.username="spongebob@bodges"
>>> p.password="password"
>>> p.iface="eth1"
>>> p.runbg()
>>> Starting PPPoED
Starting LCP
Got CHAP Challenge, Authenticating
Authenticated OK
Starting IPCP
Peer provided our IP as 123.4.5.6
IPCP is OPEN

>>>

Once IP is negotiated, the automaton will stay in the IPCP_OPEN state, able to send and receive IP packets and automatically responding to any LCP echoes that arrive.

From that state, the following methods may be called:

recv_queuelen() - returns the number of packets waiting in the receive buffer
recv_packet() - returns and de-queues the first packet in the receive buffer
send_packet(IPPacket) - transmits the given IP packet over the PPPoE session
ip() - returns the IP address given to the client
gw() - returns the peer's IP address

Here's an example of passing some traffic on an open session by pinging the gateway:

>>> p.recv_queuelen()
0
>>> p.send_packet(IP(src=p.ip(), dst=p.gw())/ICMP())
>>> p.recv_queuelen()
1
>>> p.recv_packet()
<IP  version=4L ihl=5L tos=0x0 len=28 id=1 flags= frag=0L ttl=64 proto=icmp chksum=0xbd0f src=1.1.1.1 dst=123.4.5.6 options=[] |<ICMP  type=echo-reply code=0 chksum=0xffff id=0x0 seq=0x0 |<Padding  load='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' |>>>
>>>

The script is still very much a work in progress. There is, for example, no clean way to gracefully shut down the PPP session at the moment and it doesn't handle incoming Terminate-Requests, either. I am hoping to add that, and more, soon.

Have a play with it and let me know what you think, good or bad :)

Thursday, 10 January 2013

Bringing Up a PPPoE Session - The Theory

In a previous post, I shared a Scapy script that implements the PPPoE discovery stage and stops once the session stage is reached. As handy as that script is for testing AC Cookie validation, it is not particularly useful for anything else. It would be much better if the script could bring the PPP session all the way up.

Luckily, the PPPoE discovery script is a cut-down version of another script that I wrote a long time back which goes all the way from PPPoED, through LCP and CHAP authentication and stops at IPCP. At the time, the script was far too messy to share but I've tidied it up and it is now in a state that it could be useable by others. I've also added IPCP negotiation and a couple of methods for sending and receiving IP traffic over the resulting session.

Before I present the script, I'll cover the theory involved, step by step. The impatient may want to just go to the next post (when it is available) for the script itself and instructions on how to run it.

PPPoE Discovery

PPP is (a) point-to-point protocol, designed to run over a dedicated link between two devices. Ethernet is a multi-access network, so if we want to run PPP over Ethernet then we need a mechanism to discover peers and establish a point-to-point relationship between two devices over the shared medium.

PPPoE provides this service and operates in two distinct stages:
  1. Discovery: The discovery stage is responsible for locating PPPoE peers and negotiating session parameters so that, ultimately, a PPPoE session can be created.
  2. Session: Once the discovery stage is complete the protocol enters the session stage, at which time the two peers have a tunneled connection between them over which to start passing PPP.
Once the session stage is reached, the peers bring up and operate their PPP session exactly as they would over a dedicated link.

The diagram below summarises the PPPoE "Discovery" stage:

PPPoE State Transitions
The first step in the journey is to find a PPPoE access concentrator which is willing to terminate our session. To do this, we must broadcast a PPPoE Active Discovery Initiation (PADI) message. It is possible to specify a service name in the PADI - this is just a string that identifies a particular type of service in which the client is interested. The access concentrator may use this to decide whether or not to offer to terminate the session, though in most cases it is just ignored. For this reason clients generally use an empty service name.
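To make the PADI a little more concrete, here is a sketch that builds one by hand with the struct module rather than Scapy. The numeric values (EtherType 0x8863 for discovery, version/type 0x11, PADI code 0x09, Service-Name tag 0x0101) come from RFC 2516; the source MAC and function name are made up.

```python
import struct

ETHERTYPE_PPPOE_DISCOVERY = 0x8863
PADI_CODE = 0x09
TAG_SERVICE_NAME = 0x0101

def build_padi(src_mac, service_name=b""):
    """Build a broadcast PADI frame carrying a single Service-Name tag."""
    tag = struct.pack("!HH", TAG_SERVICE_NAME, len(service_name)) + service_name
    # version/type 0x11, code, session ID (always 0 in a PADI), payload length
    pppoe = struct.pack("!BBHH", 0x11, PADI_CODE, 0x0000, len(tag)) + tag
    ether = b"\xff" * 6 + src_mac + struct.pack("!H", ETHERTYPE_PPPOE_DISCOVERY)
    return ether + pppoe

padi = build_padi(bytes.fromhex("00005e005302"))   # empty service name
print(padi.hex())
```

With an empty service name the tag is just four bytes (type plus a zero length), which is what most clients send.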

Any access concentrators listening on the segment will receive the PADI message, inspect its contents and then make a decision whether or not to make an offer to terminate the client's session. If the access concentrator is willing to terminate the session, it signals this to the client by sending a unicast offer (PADO) message. Typically, the PADO has an AC-Cookie attached to it - essentially the AC-Cookie is an "unpredictable" string, derived from the client's MAC address, which the access concentrator uses to mitigate against certain kinds of resource exhaustion attacks. When AC-Cookies are used, a PADO is generated 'mechanically' from the incoming PADI and no state is created on the access concentrator at this point.

When the client has received at least one PADO, it must select a favourite. It is common to just use the first offer received, but other selection criteria may be used. The client then sends a unicast request (PADR) to the chosen access concentrator, indicating that it would like to accept its offer. If an AC-Cookie was contained in the PADO message then it is echoed back in the PADR. The requirement to echo the cookie back to the access concentrator is designed to validate that the client really exists and is available on the MAC address where the PADO was sent.

Finally, it is up to the access concentrator to confirm that the session has been created. If AC-Cookies are in use then the incoming PADR is examined to check whether the AC would have generated the provided cookie given the source MAC - in the case of a mismatch the PADR is silently dropped, otherwise the session state is created in the AC and a session (PADS) message is unicast to the client to confirm that the session has been created and the "Session" stage has begun. The PADS always contains a PPPoE session ID number. The session ID is used to differentiate between multiple PPPoE sessions on the same LAN and must be present in the header of every PPPoE frame exchanged with the AC during the "Session" stage.
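The stateless cookie trick is easier to see in code. Below is a rough Python sketch, assuming the cookie is an HMAC of the client's MAC address keyed with a secret known only to the access concentrator (RFC 2516 suggests something along these lines). The choice of SHA-256, the 16 byte truncation and the function names are my assumptions - real implementations will differ.

```python
import hashlib
import hmac
import os

# Hypothetical secret held only by the access concentrator.
AC_SECRET = os.urandom(16)


def make_cookie(client_mac: bytes, secret: bytes = AC_SECRET) -> bytes:
    """Derive the AC-Cookie for a PADO purely from the PADI's source MAC."""
    return hmac.new(secret, client_mac, hashlib.sha256).digest()[:16]


def padr_cookie_ok(client_mac: bytes, cookie: bytes, secret: bytes = AC_SECRET) -> bool:
    """Recompute the cookie from the PADR's source MAC and compare.

    No per-client state is needed: a mismatch means the PADR is dropped.
    """
    return hmac.compare_digest(make_cookie(client_mac, secret), cookie)


mac = bytes.fromhex("020000000001")        # made-up client MAC
cookie = make_cookie(mac)                  # sent in the PADO
print(padr_cookie_ok(mac, cookie))         # genuine client echoing the cookie
print(padr_cookie_ok(bytes.fromhex("020000000002"), cookie))  # spoofed source MAC
```

Because the cookie can be recomputed at any time, the AC only needs to allocate session state once a valid PADR has arrived.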

The fifth type of PPPoE discovery message is the terminate (PADT) which, as its name suggests, is used to terminate (i.e. end) a session which has been established. Either end may send a PADT message to close the session and once a PADT has been received, no further traffic may be sent for that session.

PPP

PPP itself consists of a number of sub-protocols. There are:
  • Link Control Protocol (LCP) which is responsible for negotiating overall link parameters
  • PAP and CHAP which are used for authentication
  • A family of Network Control Protocols (NCPs) used to negotiate the transport of each upper layer protocol
PPP also defines that once a higher layer protocol has been negotiated by its corresponding NCP, that protocol's traffic will be encapsulated with a header indicating that particular protocol's protocol number.

Link Control Protocol (LCP)

RFC 1661 defines LCP as the protocol that is responsible for "establishing, configuring, and testing the data-link connection." Essentially this means that LCP is used to bring up and take down PPP links, negotiate the configuration parameters and check that the link is still alive. There is a range of LCP codes which are used to fulfil these aims, discussed below.

Configuration Type Codes

In order to bring up a PPP session both peers must agree on certain parameters, for example the maximum size of frame that may be passed, whether to use compression and so on. Both peers propose the settings they would like to use - the opposite peer will then either acknowledge (accept), nak (i.e. suggest alternative) or reject (outright refuse) the proposed options. The aim is to reach a state where the opposite peer has acknowledged the locally proposed parameters.

The following LCP codes are standard and must be implemented:

Configure-Request - Used to propose a set of parameters that we would like to use for the session. The peer will then respond to the proposed parameters with one of the next three responses.

Configure-Ack - Used to advise the peer that their proposed parameters are acceptable. The accepted parameters are echoed back in the ack message.

Configure-Nak - Used to advise the peer that some of their proposed values are not acceptable and to suggest alternative values that should be used instead. The proposed changes are attached to the nak message.

Configure-Reject - Used to advise the peer that their proposed parameters are not supported and cannot be used. The unacceptable parameters are echoed back in the reject message.
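As a rough illustration of how a peer chooses between the three responses, here is a small Python sketch. The code values (2, 3, 4) and option types (1 for MRU, 5 for Magic-Number) are from RFC 1661; the local policy of refusing an MRU above 1492 and the dictionary representation of options are just assumptions to keep the example short.

```python
from typing import Dict, Tuple

CONFIGURE_ACK, CONFIGURE_NAK, CONFIGURE_REJECT = 2, 3, 4
OPT_MRU, OPT_MAGIC_NUMBER = 1, 5

SUPPORTED = (OPT_MRU, OPT_MAGIC_NUMBER)  # options this sketch understands
LOCAL_MAX_MRU = 1492                     # hypothetical local policy


def respond(request: Dict[int, int]) -> Tuple[int, Dict[int, int]]:
    """Decide how to answer a Configure-Request given as {option type: value}."""
    # Unsupported options take priority: reject them, echoing them back.
    rejected = {t: v for t, v in request.items() if t not in SUPPORTED}
    if rejected:
        return CONFIGURE_REJECT, rejected

    # Supported options with unacceptable values: nak with our suggestion.
    naks = {}
    if request.get(OPT_MRU, 0) > LOCAL_MAX_MRU:
        naks[OPT_MRU] = LOCAL_MAX_MRU
    if naks:
        return CONFIGURE_NAK, naks

    # Everything is fine: ack, echoing the full set of options.
    return CONFIGURE_ACK, dict(request)


print(respond({OPT_MRU: 1492, OPT_MAGIC_NUMBER: 0x1A2B3C4D}))
print(respond({OPT_MRU: 1500}))
print(respond({OPT_MRU: 1500, 13: 0}))
```

The requesting peer is expected to send a fresh Configure-Request after a nak or reject, with the offending options adjusted or removed.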

Termination Type Codes

Either peer may request to terminate the session at any point and the opposite peer must honour that request. There are two termination related codes in LCP:

Terminate-Request - Generated by a peer to initiate the tear-down of the link. A Terminate-Request should be re-sent if no Terminate-Ack is received in response.

Terminate-Ack - Generated to confirm receipt of a Terminate-Request. A Terminate-Ack must be generated in response to a Terminate-Request.

Liveness Check Codes

LCP includes a ping-like echo mechanism to verify that the opposite peer is still available, has LCP in an open state and is responding. The same mechanism is used to detect a looped interface - due to the symmetric nature of PPP it's quite possible to negotiate a connection to yourself without necessarily realising, or for a connection to become looped mid-session. The following codes are used for liveness checks:

Echo-Request - Sent to the remote peer to solicit an Echo-Reply message. There is no requirement to negotiate the use of LCP echoes and an Echo-Request may be generated at any time while LCP is open. If the Magic-Number option was negotiated during LCP, the Echo-Request must contain the "random" 4 octet magic number decided at that time.

Echo-Reply - Sent in response to an Echo-Request message. When LCP is open, an Echo-Reply message must be sent whenever an Echo-Request is received. The Data field of the incoming Echo-Request is copied into the outgoing Echo-Reply, while the Magic-Number field carries the replying peer's own magic number. If an incoming packet carries our own magic number then the connection has become looped.
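Here is a short Python sketch of building an Echo-Request and checking an incoming Echo packet for our own magic number. The packet layout (code, identifier, length, 4 octet magic number, optional data) and the code values 9 and 10 are from RFC 1661; the function names and magic number values are made up for the example.

```python
import struct

ECHO_REQUEST, ECHO_REPLY = 9, 10


def build_echo(code: int, identifier: int, magic: int, data: bytes = b"") -> bytes:
    """Pack an LCP Echo packet: code, id, total length, magic number, data."""
    length = 4 + 4 + len(data)           # LCP header + magic number + data
    return struct.pack("!BBHI", code, identifier, length, magic) + data


def looks_looped(packet: bytes, local_magic: int) -> bool:
    """True if an incoming Echo packet carries our own (non-zero) magic number."""
    code, _ident, _length, magic = struct.unpack("!BBHI", packet[:8])
    return code in (ECHO_REQUEST, ECHO_REPLY) and local_magic != 0 and magic == local_magic


OUR_MAGIC = 0x1A2B3C4D                   # made-up value; normally random
request = build_echo(ECHO_REQUEST, 1, OUR_MAGIC, b"ping")

# If our own request comes straight back to us, the link is looped.
print(looks_looped(request, OUR_MAGIC))
# A packet stamped with a different magic number came from a real peer.
print(looks_looped(build_echo(ECHO_REPLY, 1, 0x55667788, b"ping"), OUR_MAGIC))
```

A magic number of zero means the option was never negotiated, so the sketch does not treat zero as evidence of a loop.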

Other Codes

There are other codes such as Code-Reject, Protocol-Reject and Discard-Request which do pretty much what you would expect. You don't get to see them very often so I will not discuss them here. I suggest referring to RFC 1661 for more detail on these.

LCP State Diagram

Below is a simplified state diagram showing how LCP makes its way from the "Starting" state into an "Opened" state. Most parts of PPP are referred to as "open" when they are up and running. I have omitted a number of transitions that deal with strange corner cases (like if the peer acks something we never sent, etc) and also transitions related to closing the connection (the Term commands discussed above). RFC 1661 contains a complete state transition table which is far more complex. If you bear in mind that at any stage either peer may terminate the session then this minimal version will cover 95% of "normal" cases.

LCP State Transitions
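For anyone who prefers a table to a picture, the "happy path" can also be written down as a small transition table in Python. This is my own reduction of the RFC 1661 state table to the opening transitions only - the event names (Up, RCR+, RCR-, RCA, RCN) follow the RFC, but the dictionary layout and the run() helper are just for illustration, and timeouts, termination and the odd corner cases are left out.

```python
# (current state, event) -> (next state, action taken)
# RCR+ / RCR- = received a good / bad Configure-Request
# RCA = received Configure-Ack, RCN = received Configure-Nak or -Reject
TRANSITIONS = {
    ("Starting", "Up"):   ("Req-Sent", "send Configure-Request"),
    ("Req-Sent", "RCR+"): ("Ack-Sent", "send Configure-Ack"),
    ("Req-Sent", "RCR-"): ("Req-Sent", "send Configure-Nak"),
    ("Req-Sent", "RCA"):  ("Ack-Rcvd", None),
    ("Req-Sent", "RCN"):  ("Req-Sent", "send Configure-Request"),
    ("Ack-Rcvd", "RCR+"): ("Opened",   "send Configure-Ack"),
    ("Ack-Rcvd", "RCR-"): ("Ack-Rcvd", "send Configure-Nak"),
    ("Ack-Sent", "RCR+"): ("Ack-Sent", "send Configure-Ack"),
    ("Ack-Sent", "RCR-"): ("Req-Sent", "send Configure-Nak"),
    ("Ack-Sent", "RCA"):  ("Opened",   None),
    ("Ack-Sent", "RCN"):  ("Ack-Sent", "send Configure-Request"),
}


def run(events, state="Starting"):
    """Feed a list of events through the table; return final state and actions."""
    actions = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        if action:
            actions.append(action)
    return state, actions


# Peer's request arrives first, then the ack for ours.
print(run(["Up", "RCR+", "RCA"]))
# Our request is acked first, then the peer's request arrives.
print(run(["Up", "RCA", "RCR+"]))
```

Either ordering ends in "Opened": LCP is up once each side has both sent and received a Configure-Ack.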

Authentication

Once LCP is open, the next stage is typically to start authentication. Authentication may be done by either, neither or both of the peers as negotiated by LCP and can be done using plaintext PAP or MD5 hashed CHAP. If no authentication was negotiated by LCP, an implicit pass is assumed.

PAP is hardly ever used these days, is strongly discouraged and in any case is pretty simple, so I will not discuss it here. Please refer to RFC 1334 if you require details on PAP.

CHAP, though not immune to attack, offers reasonable security. The password itself is never sent "over the wire" and there is good protection against replay attacks via the use of random challenges. Here is how CHAP operates:

CHAP Authentication

Essentially, security is provided in two ways:
  1. The password is never exchanged in the clear but instead is passed through a one-way cryptographic hash function. It is computationally infeasible to recover the password from the hash function's output, so it is quite safe to pass this output over the wire.
  2. If the client just hashed the password, then it would be possible for an attacker to capture the hashed value and authenticate with the server at a later time by simply replaying the same response. CHAP requires the server to generate a random challenge string, which is also fed into the hash function and affects its output. Provided the server never re-uses a challenge value, an attacker cannot simply replay a previous authentication response to gain access.
When the CHAP response comes in, the server compares the received hash value with the output of a local calculation using the same method to determine whether the authentication attempt was successful. This is precisely what happens when the server has a local copy of the password, but typically that is not desirable and in practice the authentication check is deferred to an external RADIUS server. In order for the RADIUS server to validate the attempt, the access server must pass it a copy of the ID and challenge sent, plus the response received. The RADIUS server can then use the ID, its own copy of the plaintext password and the challenge value to compute the expected response. If the expected and actual responses match then the RADIUS server will return an "Accept" response, otherwise it will return a "Reject" response.
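The CHAP calculation itself is tiny. Per RFC 1994, the response for MD5 CHAP is the MD5 hash of the identifier octet, followed by the secret, followed by the challenge. Here is a minimal Python sketch - the function names and the example password are mine, and a real server would of course be talking to RADIUS rather than holding the secret in a variable.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """MD5 over identifier octet + shared secret + challenge (RFC 1994)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected response and compare it with what was received."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


secret = b"example-password"             # made-up shared secret
challenge = os.urandom(16)               # server picks a fresh random challenge
response = chap_response(1, secret, challenge)   # computed by the client

print(chap_verify(1, secret, challenge, response))        # correct password
print(chap_verify(1, secret, os.urandom(16), response))   # replay against a new challenge
```

The second check shows why replaying an old response does not work: the hash only matches the challenge it was computed for.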

Network Control Protocols (NCPs)

Before any higher layer protocol can be passed through a PPP tunnel, it must be negotiated by a corresponding NCP. For example before you can pass IP through a PPP tunnel, IPCP must be open, indicating that all the required IP parameters have been successfully negotiated. To pass OSI traffic, OSICP must be open. For IPv6, IPV6CP is used.

The operation of each NCP is different but they all essentially follow the same model as LCP - parameters are proposed by each peer and ack'd, nak'd or rejected by the opposite peer - and the state transition diagram looks pretty much the same.

IPCP

I'll go into a little more detail on IPCP since that is the most commonly used (for now) with a worked example of a DSL subscriber connecting to his ISP, starting immediately after authentication succeeds.

Client Side

The client generally does not know anything when it first connects and relies on the server to provide it with everything it needs. The client sends a Configure-Request proposing 0.0.0.0 for its IP address, primary DNS and secondary DNS. Proposing 0.0.0.0 for these is actually an explicit request for the server to provide legitimate values for the client to use.

The server will then respond with a Configure-Nak message containing the IP address and DNS servers that the client should use.

The client will then send another Configure-Request with the newly acquired details, to which the server responds with a Configure-Ack.
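The client side exchange can be sketched as a simple loop in Python. The option types are real (3 for IP-Address from RFC 1332, 129 and 131 for primary and secondary DNS from RFC 1877), but the addresses are documentation examples and the functions are hypothetical stand-ins for the real packet handling.

```python
OPT_IP_ADDRESS, OPT_PRIMARY_DNS, OPT_SECONDARY_DNS = 3, 129, 131

# Values the server wants this subscriber to use (made-up example addresses).
ASSIGNED = {
    OPT_IP_ADDRESS: "203.0.113.10",
    OPT_PRIMARY_DNS: "192.0.2.53",
    OPT_SECONDARY_DNS: "192.0.2.54",
}


def server_respond(request):
    """Nak any option whose value differs from what we assigned, else ack.

    This sketch only handles the three options above.
    """
    naks = {opt: ASSIGNED[opt] for opt, val in request.items() if val != ASSIGNED[opt]}
    if naks:
        return "Configure-Nak", naks
    return "Configure-Ack", dict(request)


def client_negotiate(max_rounds: int = 5):
    """Start from 0.0.0.0 and adopt whatever the server naks back."""
    proposal = {opt: "0.0.0.0" for opt in ASSIGNED}
    for _ in range(max_rounds):
        verdict, options = server_respond(proposal)
        if verdict == "Configure-Ack":
            return options
        proposal.update(options)         # take the server's suggested values
    raise RuntimeError("IPCP did not converge")


print(client_negotiate())
```

In this example the negotiation converges on the second Configure-Request, matching the request, nak, request, ack sequence described above.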

Server Side

The server will typically send out a Configure-Request containing only its own IP address. There is no reason to argue over this so the client should just respond with a Configure-Ack. If the client tries to push a different address to the server using a Configure-Nak, it is typically ignored and after a few retries the session gets pulled down.

Passing Traffic

Once the two peers are agreed and IPCP is open, IP packets may be passed through the PPP tunnel by attaching a header - in most cases, for PPPoE connectivity, the PPP header consists of only a two byte protocol number (0x0021 for IP). The protocol number is analogous to the EtherType field of an Ethernet frame and indicates to the receiver how to interpret the payload. Alternative encapsulations exist - refer to RFC 1662 for more details on HDLC style framing which is often seen in L2TP.
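To show how little overhead is involved, here is a Python sketch that wraps an IP packet for the PPPoE session stage. The session EtherType (0x8864), the PPPoE header layout with code 0x00 and the PPP protocol number 0x0021 are from the RFCs; the function names, MAC addresses, session ID and the dummy "IP packet" are placeholders of my own.

```python
import struct

ETHERTYPE_PPPOE_SESSION = 0x8864
PPPOE_VER_TYPE = 0x11
CODE_SESSION_DATA = 0x00
PPP_PROTO_IPV4 = 0x0021


def encapsulate(dst_mac: bytes, src_mac: bytes, session_id: int, ip_packet: bytes) -> bytes:
    """Ethernet header + PPPoE header + 2 byte PPP protocol number + IP packet."""
    ppp = struct.pack("!H", PPP_PROTO_IPV4) + ip_packet
    # The PPPoE length field covers the PPP protocol number and the payload.
    pppoe = struct.pack("!BBHH", PPPOE_VER_TYPE, CODE_SESSION_DATA, session_id, len(ppp))
    ether = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_PPPOE_SESSION)
    return ether + pppoe + ppp


def ppp_protocol(frame: bytes) -> int:
    """Read the PPP protocol number that follows the 14 + 6 bytes of headers."""
    return struct.unpack("!H", frame[20:22])[0]


dummy_ip = b"\x45" + bytes(19)           # stand-in for a 20 byte IPv4 header
frame = encapsulate(bytes.fromhex("020000000002"), bytes.fromhex("020000000001"), 1, dummy_ip)
print(hex(ppp_protocol(frame)), len(frame))
```

The 6 byte PPPoE header plus the 2 byte PPP protocol number account for the 8 bytes of overhead, which is why PPPoE links commonly run with an MTU of 1492.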

Further Reading

That about covers the protocols involved in bringing up a PPPoE session at a high level. If you require more information I would suggest turning to the following RFCs:

RFC 2516 - PPPoE - http://tools.ietf.org/html/rfc2516
RFC 1661 - PPP - http://tools.ietf.org/html/rfc1661
RFC 1994 - CHAP - http://tools.ietf.org/html/rfc1994
RFC 1332 - IPCP - http://tools.ietf.org/html/rfc1332
RFC 1877 - IPCP extensions for DNS - http://tools.ietf.org/html/rfc1877