Thursday, 27 November 2014

Removing VLAN/MPLS/PPPoE/GRE/GTP/VXLAN Encapsulation Headers from pcap Files

Many years ago, when I worked in a school, I used to port mirror our proxy server to an old PC running driftnet and leave the screen where the kids could see it as a warning that staff could "see what you're doing on the Internet". I haven't played with driftnet since but certainly at the time it could only handle native frames (no VLAN tags, certainly no MPLS or PPPoE). I vaguely remember some other tools being similar, unfortunately I can't remember which ones.

Looking at the analytics for this blog, I can see I'm not the only one who's had the problem. It's certainly not the number one issue that people are searching for when they get here but there have been a few and the thought occurred that the packet processing engine I wrote for dechap would be really good for this task - it already stripped back VLANs and MPLS, plus it knows how to detect PPPoE and L2TP.

After a couple of hours it was working to the point of being able to strip VLANs and MPLS off, with a little more effort PPPoE also gave way. GRE came quite easily, too, as it has simple headers and uses the same etypes as Ethernet.

Anyway, here is "stripe" (from STRIP Encapsulation), a command line tool which takes a pcap file as input, re-assembles IP fragments and strips off all the encap it can (currently VLAN tags, MPLS shim headers, PPPoE, L2TP, GRE GTP and VXLAN) then outputs another pcap containing just payload over Ethernet.

**UPDATE** - Version 0.3b now adds support for VXLAN.

Download


Stripe is available from my github: https://github.com/theclam/stripe

Usage


The command line is pretty straightforward, as shown in the online help:

Harrys-MacBook-Air:stripe foeh$ ./stripe
stripe: a utility to remove VLAN tags, MPLS shims, PPPoE, L2TP headers,
etc. from the frames in a PCAP file and return untagged IP over Ethernet.
Version v0.1 alpha, November 2014

Usage:
./stripe -r inputcapfile -w outputcapfile

Where inputcapfile is a tcpdump-style .cap file containing encapsulated IP 
outputcapfile is the file where the decapsulated IP will be saved

Harrys-MacBook-Air:stripe foeh$ 

Simply specify the files you want to read encapsulated packets from (-r) and write the cleaned up packets to (-w). Stripe will remove as many layers of encap as it can until you are left with straight payload over Ethernet.

How it Works


The majority of stripe's work is done by the "decap" function. This function takes in a block of memory, a length parameter, a data type hint and a frame template. The process runs as follows:


  1. If the type is Ethernet, populate the source / destination MACs of the frame template
  2. If the type has an Ethertype or protocol type field, use this to populate the ethertype of the frame template
  3. If the next protocol is possibly or definitely payload, set the payload pointer of the frame template to the address of the next protocol and return
  4. If the next protocol is possibly or definitely encapsulation, call decap against the remainder of the packet
So essentially it eats up encap, recording MACs and protocol types as it goes, until there is no more encap left. By the end there is a fully populated frame template with source and destination MAC (the innermost copy if there are multiple as in the case of MPLS pseudowires), the etherype of the payload and the payload itself. Piecing these together gives a minimally encapsulated frame, i.e. one with just an Ethernet header and payload.

Here is a worked example for a frame with VLAN, MPLS, GRE over IP and an IP payload:


Step 1 - The "decap" function is called on the entire frame. Since the first header is Ethernet, the frame template gets populated with the source / destination MACs and the etype from the Ethernet header. The frame template's length field gets populated with the size of the frame minus the Ethernet header and the payload pointer is adjusted to point at the next header. The decap function then calls itself on the remainder of the frame, hinting that the type is VLAN tag based on the current header's etype.


Step 2 - The decap function now considers the partial frame starting at the VLAN tag. Since the VLAN tag has an etype associated, the frame template's etype is overwritten with the one from the VLAN header. The length is overwritten with the length of the payload after the VLAN header and the pointer adjusted to point at the next header. The decap function then calls itself again with a hint of MPLS, based on the etype in the VLAN header.


Step 3 - The decap function now considers the partial frame starting at the MPLS label. Since the MPLS label is bottom of stack, we know there are no more MPLS labels left . Unfortunately there is no protocol type in an MPLS header (these are signaled on the control plane) so we have to take a peek at the byte immediately following the label. If we find a "4" or a "6" in the high order nibble then we have to guess that the next protocol is IPv4 or IPv6, respectively. If the following four bytes are all zeroes then we assume Ethernet over MPLS with control word, otherwise we assume Ethernet over MPLS without control word. In this case we find a 4 in the low nibble, so call decap with an "IP" hint.


Step 4 - The IP header tells us that GRE is the next protocol so for now nothing changes in the frame template (the remainder could be decodable or not). We just call decap again on the GRE part...



Step 5 - The GRE header is decoded and the etype is copied into the frame header. The length of the remaining payload is updated in the frame template and the pointer is adjusted. Decap is called on the next header, which is IP. When the decap function inspects the IP payload it can go no further and just returns the frame template.



In essence, the process has started with a deeply encapsulated frame and ended with IP over Ethernet. The source and destination MACs are taken from the innermost ones found (which in this case is the outermost Ethernet header) but with the etype changed to match the payload, which is the first non-encapsulating payload found in the frame, in this case the second IP.

References

https://tools.ietf.org/html/rfc2784
https://tools.ietf.org/html/rfc1701
http://www.ieee802.org/1/pages/802.1Q.html
http://www.3gpp.org/DynaReport/29060.htm

2 comments:

  1. Replies
    1. Hi,

      Thank you for the kind words. Could I be cheeky and ask what your use case is for it? Would any other protocols be useful?

      Thanks,

      Foeh

      Delete