Looking at the analytics for this blog, I can see I'm not the only one who's had the problem. It's certainly not the number one issue people are searching for when they get here, but there have been a few such searches, and it occurred to me that the packet processing engine I wrote for dechap would be really good for this task: it already strips VLANs and MPLS, and it knows how to detect PPPoE and L2TP.
After a couple of hours it was working well enough to strip VLANs and MPLS, and with a little more effort PPPoE also gave way. GRE came quite easily too, as it has simple headers and uses the same etypes as Ethernet.
Anyway, here is "stripe" (from STRIP Encapsulation), a command line tool which takes a pcap file as input, reassembles IP fragments and strips off all the encap it can (currently VLAN tags, MPLS shim headers, PPPoE, L2TP, GRE, GTP and VXLAN), then outputs another pcap containing just the payload over Ethernet.
**UPDATE** - Version 0.3b now adds support for VXLAN.
Stripe is available from my github: https://github.com/theclam/stripe
The command line is pretty straightforward, as shown in the online help:
Harrys-MacBook-Air:stripe foeh$ ./stripe
stripe: a utility to remove VLAN tags, MPLS shims, PPPoE, L2TP headers,
etc. from the frames in a PCAP file and return untagged IP over Ethernet.
Version v0.1 alpha, November 2014
./stripe -r inputcapfile -w outputcapfile
outputcapfile is the file where the decapsulated IP will be saved
Simply specify the file you want to read encapsulated packets from (-r) and the file to write the cleaned-up packets to (-w). Stripe will remove as many layers of encap as it can until you are left with plain payload over Ethernet.
How it Works
The majority of stripe's work is done by the "decap" function. This function takes in a block of memory, a length parameter, a data type hint and a frame template. The process runs as follows:
- If the type is Ethernet, populate the source / destination MACs of the frame template
- If the type has an Ethertype or protocol type field, use this to populate the ethertype of the frame template
- If the next protocol is possibly or definitely payload, set the payload pointer of the frame template to the address of the next protocol and return
- If the next protocol is possibly or definitely encapsulation, call decap against the remainder of the packet
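To make the recursion concrete, here is a minimal Python sketch of the same idea. It is not stripe's code: the `FrameTemplate` class, the `decap` signature and the string hints are hypothetical, only Ethernet and 802.1Q/802.1ad tags are handled, and because Python byte strings carry their own length there is no separate length parameter. Each header provisionally records its payload before deciding whether to recurse, which is one way to cover the "possibly payload, possibly encap" case.

```python
import struct
from dataclasses import dataclass

# Hypothetical hint names for this sketch; stripe's own identifiers may differ.
ETHERNET, VLAN = "ethernet", "vlan"
VLAN_ETYPES = (0x8100, 0x88A8)  # 802.1Q and 802.1ad tag ethertypes


@dataclass
class FrameTemplate:
    dst_mac: bytes = b""
    src_mac: bytes = b""
    ethertype: int = 0
    payload: bytes = b""


def decap(data: bytes, hint: str, tmpl: FrameTemplate) -> FrameTemplate:
    """Consume one header, record what it reveals, then recurse on the rest."""
    if hint == ETHERNET:
        # Ethernet: record the MACs, then read the ethertype.
        tmpl.dst_mac, tmpl.src_mac = data[0:6], data[6:12]
        (etype,) = struct.unpack("!H", data[12:14])
        rest = data[14:]
    elif hint == VLAN:
        # 802.1Q tag: 2-byte TCI followed by the inner ethertype.
        (etype,) = struct.unpack("!H", data[2:4])
        rest = data[4:]
    else:
        return tmpl  # not a header this sketch knows how to strip

    # Record the protocol type and the (provisional) payload location.
    tmpl.ethertype = etype
    tmpl.payload = rest
    if etype in VLAN_ETYPES:
        return decap(rest, VLAN, tmpl)  # more encapsulation: keep eating
    return tmpl  # anything else is treated as payload


frame = (bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
         + struct.pack("!HHH", 0x8100, 100, 0x0800) + b"\x45" + b"\x00" * 19)
t = decap(frame, ETHERNET, FrameTemplate())
print(hex(t.ethertype), len(t.payload))  # prints: 0x800 20
```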
So essentially it eats up encap, recording MACs and protocol types as it goes, until there is no more encap left. By the end there is a fully populated frame template with the source and destination MACs (the innermost copy if there are several, as in the case of MPLS pseudowires), the ethertype of the payload and the payload itself. Piecing these together gives a minimally encapsulated frame, i.e. one with just an Ethernet header and payload.
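Piecing the template back together amounts to writing a fresh Ethernet II header in front of the payload. A sketch, with `rebuild_frame` as a hypothetical helper name:

```python
import struct


def rebuild_frame(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    """Glue the template fields back into a minimal Ethernet II frame."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    # Ethernet II header: destination MAC, source MAC, 2-byte ethertype (network order).
    return dst_mac + src_mac + struct.pack("!H", ethertype) + payload


out = rebuild_frame(bytes.fromhex("00aabbccddee"), bytes.fromhex("001122334455"),
                    0x0800, b"\x45" + b"\x00" * 19)
print(len(out), out[12:14].hex())  # prints: 34 0800
```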
Here is a worked example for a frame with VLAN, MPLS, GRE over IP and an IP payload:
Step 1 - The "decap" function is called on the entire frame. Since the first header is Ethernet, the frame template gets populated with the source / destination MACs and the etype from the Ethernet header. The frame template's length field gets populated with the size of the frame minus the Ethernet header, and the payload pointer is adjusted to point at the next header. The decap function then calls itself on the remainder of the frame, hinting, based on the current header's etype, that the next header is a VLAN tag.
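Step 1 boils down to reading 14 bytes and choosing the next hint from the ethertype. The sketch below uses well-known ethertype values, but the hint strings and the `parse_ethernet` helper are illustrative only, and it ignores 802.3 frames whose type field is really a length (values below 0x0600).

```python
import struct

# Ethertypes mapped to a hypothetical hint name for the next decap call.
NEXT_HINT = {
    0x8100: "vlan",   # 802.1Q tag
    0x88A8: "vlan",   # 802.1ad (QinQ) outer tag
    0x8847: "mpls",   # MPLS unicast
    0x8848: "mpls",   # MPLS multicast
    0x8864: "pppoe",  # PPPoE session stage
    0x0800: "ipv4",
    0x86DD: "ipv6",
}


def parse_ethernet(data: bytes):
    """Split an Ethernet II frame into (dst, src, ethertype, rest)."""
    if len(data) < 14:
        raise ValueError("truncated Ethernet header")
    dst, src = data[0:6], data[6:12]
    (etype,) = struct.unpack("!H", data[12:14])
    return dst, src, etype, data[14:]


dst, src, etype, rest = parse_ethernet(bytes(12) + b"\x81\x00" + b"\x00\x64\x08\x00")
print(hex(etype), NEXT_HINT.get(etype, "payload"), len(rest))  # prints: 0x8100 vlan 4
```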
Step 2 - The decap function now considers the partial frame starting at the VLAN tag. Since the VLAN tag has an etype associated, the frame template's etype is overwritten with the one from the VLAN header. The length is overwritten with the length of the payload after the VLAN header and the pointer adjusted to point at the next header. The decap function then calls itself again with a hint of MPLS, based on the etype in the VLAN header.
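For step 2, an 802.1Q tag is four bytes: a tag control information (TCI) field holding the 3-bit priority, the drop-eligible bit and the 12-bit VLAN ID, followed by the inner ethertype. A hypothetical `parse_vlan` helper:

```python
import struct


def parse_vlan(data: bytes):
    """Decode an 802.1Q tag (the 4 bytes after the 0x8100 ethertype)."""
    if len(data) < 4:
        raise ValueError("truncated VLAN tag")
    tci, inner_etype = struct.unpack("!HH", data[0:4])
    pcp = tci >> 13          # 3-bit priority code point
    dei = (tci >> 12) & 0x1  # drop eligible indicator
    vid = tci & 0x0FFF       # 12-bit VLAN ID
    return vid, pcp, dei, inner_etype, data[4:]


vid, pcp, dei, etype, rest = parse_vlan(b"\xa0\x64\x88\x47" + b"\x00" * 4)
print(vid, pcp, dei, hex(etype), len(rest))  # prints: 100 5 0 0x8847 4
```

Stacked (QinQ) tags need no special handling in this scheme: when the inner ethertype is itself 0x8100 or 0x88A8, decap simply recurses again.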
Step 3 - The decap function now considers the partial frame starting at the MPLS label. Since the label has its bottom-of-stack bit set, we know there are no more MPLS labels left. Unfortunately there is no protocol type in an MPLS header (this is signaled on the control plane), so we have to take a peek at the byte immediately following the label. If we find a "4" or a "6" in the high-order nibble, then we have to guess that the next protocol is IPv4 or IPv6, respectively. Otherwise, if the following four bytes are all zeroes, we assume Ethernet over MPLS with a control word; failing that, we assume Ethernet over MPLS without a control word. In this case we find a 4 in the high nibble, so we call decap with an "IP" hint.
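The MPLS logic in step 3 can be sketched as follows. Each label stack entry is 32 bits (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL); once the bottom-of-stack bit is seen, the first nibble of what follows drives the guess. The `parse_mpls` name and the returned hint strings are assumptions for illustration, and the control word test is just the all-zeroes check described above, not a full control word parser.

```python
import struct


def parse_mpls(data: bytes):
    """Walk an MPLS label stack, then guess what follows bottom-of-stack.

    Returns (labels, next_hint, offset_of_next_header). Hint names are hypothetical.
    """
    labels, off = [], 0
    while True:
        if len(data) < off + 4:
            raise ValueError("truncated MPLS label stack")
        (entry,) = struct.unpack("!I", data[off:off + 4])
        labels.append(entry >> 12)   # 20-bit label value
        off += 4
        if (entry >> 8) & 0x1:       # S bit set: bottom of stack
            break

    # No protocol field in MPLS, so peek at the high nibble of the next byte.
    nibble = data[off] >> 4 if off < len(data) else None
    if nibble == 4:
        return labels, "ipv4", off
    if nibble == 6:
        return labels, "ipv6", off
    if data[off:off + 4] == bytes(4):
        return labels, "ethernet", off + 4   # skip an all-zero control word
    return labels, "ethernet", off           # Ethernet pseudowire, no control word


stack = struct.pack("!II", (100 << 12) | 64, (200 << 12) | (1 << 8) | 64)
print(parse_mpls(stack + b"\x45" + bytes(19)))             # ([100, 200], 'ipv4', 8)
print(parse_mpls(stack + bytes(4) + b"\x02" + bytes(13)))  # ([100, 200], 'ethernet', 12)
```

The guess can be wrong: an Ethernet pseudowire without a control word whose destination MAC happens to start with a 4 or 6 nibble would be misread as IP, which is one of the reasons the control word exists.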
Step 4 - The IP header tells us that GRE is the next protocol, so for now nothing changes in the frame template (we don't yet know whether the GRE payload can be decoded; if it can't, the outer IP packet remains the payload). We just call decap again on the GRE part...
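Step 4 only needs three things from the IPv4 header: the header length (IHL, counted in 32-bit words), the total length and the protocol number, where 47 means GRE. A hypothetical `parse_ipv4` helper, assuming any fragments have already been reassembled:

```python
import struct

IPPROTO_GRE = 47  # IANA protocol number for GRE


def parse_ipv4(data: bytes):
    """Return (protocol, header_length, rest) for an IPv4 packet."""
    if len(data) < 20 or data[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ihl = (data[0] & 0x0F) * 4          # header length in bytes, options included
    total_len = struct.unpack("!H", data[2:4])[0]
    proto = data[9]
    # Honour the IP total length so trailing Ethernet padding is not treated as payload.
    return proto, ihl, data[ihl:total_len]


pkt = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, IPPROTO_GRE, 0,
                  bytes(4), bytes(4)) + b"GREHDR!!" + b"\x00\x00"
proto, ihl, rest = parse_ipv4(pkt)
print(proto, ihl, rest)  # prints: 47 20 b'GREHDR!!'
```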
Step 5 - The GRE header is decoded and its etype is copied into the frame template. The length of the remaining payload is updated in the frame template and the pointer is adjusted. Decap is then called on the next header, which is IP. When the decap function inspects this inner IP packet it can go no further, so it just returns the frame template.
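For step 5, the GRE base header is two bytes of flags and version followed by a protocol type that uses ethertype values, which is why it can be copied straight into the frame template. Optional checksum, key and sequence fields each add four bytes when their flag bit is set. This sketch follows RFC 2784 with the RFC 2890 key and sequence extensions, and deliberately ignores the older RFC 1701 routing bit and the version 1 (PPTP) variant; `parse_gre` is a hypothetical name.

```python
import struct


def parse_gre(data: bytes):
    """Decode a GRE header; return (protocol_type, rest).

    The protocol type is an ethertype, so it can go straight into the template.
    """
    if len(data) < 4:
        raise ValueError("truncated GRE header")
    flags, proto = struct.unpack("!HH", data[0:4])
    if flags & 0x0007:
        raise ValueError("only GRE version 0 is handled in this sketch")
    hlen = 4
    if flags & 0x8000:  # C: checksum + reserved1 present
        hlen += 4
    if flags & 0x2000:  # K: key present
        hlen += 4
    if flags & 0x1000:  # S: sequence number present
        hlen += 4
    if len(data) < hlen:
        raise ValueError("truncated GRE header")
    return proto, data[hlen:]


gre = struct.pack("!HHI", 0x2000, 0x0800, 0xDEADBEEF) + b"\x45" + bytes(19)
proto, rest = parse_gre(gre)
print(hex(proto), len(rest))  # prints: 0x800 20
```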
In essence, the process has started with a deeply encapsulated frame and ended with IP over Ethernet. The source and destination MACs are taken from the innermost ones found (in this case, those in the outermost and only Ethernet header), but the etype is changed to match the payload, which is the first non-encapsulating payload found in the frame: in this case, the inner (second) IP packet.
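Putting the pieces together, the sketch below builds the worked example frame (Ethernet, VLAN 100, one MPLS label, an outer IPv4 packet carrying GRE, an inner IPv4 packet) and runs it through a simplified recursive decap. It only understands the protocols in this example; the function, hint names, addresses and label value are made up for illustration, and it is not stripe's implementation.

```python
import struct
from dataclasses import dataclass


@dataclass
class FrameTemplate:
    dst: bytes = b""
    src: bytes = b""
    etype: int = 0
    payload: bytes = b""


def decap(data: bytes, hint: str, t: FrameTemplate) -> FrameTemplate:
    """Hypothetical recursive decapsulator covering the worked example only."""
    if hint == "ethernet":
        t.dst, t.src = data[0:6], data[6:12]
        t.etype, rest = struct.unpack("!H", data[12:14])[0], data[14:]
    elif hint == "vlan":
        t.etype, rest = struct.unpack("!H", data[2:4])[0], data[4:]
    elif hint == "mpls":
        off = 0
        while not data[off + 2] & 0x01:   # S bit is the low bit of byte 2
            off += 4
        rest = data[off + 4:]
        # No protocol field in MPLS: guess from the first nibble after the stack.
        nibble = rest[0] >> 4
        if nibble == 4:
            t.etype = 0x0800
        elif nibble == 6:
            t.etype = 0x86DD
        else:
            if rest[:4] == bytes(4):
                rest = rest[4:]           # drop an all-zero control word
            return decap(rest, "ethernet", t)
    elif hint == "ipv4":
        if data[9] != 47:                 # not GRE: this IP packet is the payload
            return t
        ihl = (data[0] & 0x0F) * 4
        return decap(data[ihl:], "gre", t)  # template left as-is (step 4)
    elif hint == "gre":
        flags, t.etype = struct.unpack("!HH", data[0:4])
        hlen = 4 + 4 * sum(bool(flags & b) for b in (0x8000, 0x2000, 0x1000))
        rest = data[hlen:]
    else:
        return t
    t.payload = rest
    nxt = {0x8100: "vlan", 0x8847: "mpls", 0x0800: "ipv4"}.get(t.etype)
    return decap(rest, nxt, t) if nxt else t


inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 1, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
gre = struct.pack("!HH", 0, 0x0800) + inner_ip
outer_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(gre), 0, 0, 64, 47, 0,
                       bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2])) + gre
mpls = struct.pack("!I", (16 << 12) | (1 << 8) | 64) + outer_ip
frame = (bytes.fromhex("00aabbccddee") + bytes.fromhex("001122334455")
         + struct.pack("!HHH", 0x8100, 100, 0x8847) + mpls)

t = decap(frame, "ethernet", FrameTemplate())
minimal = t.dst + t.src + struct.pack("!H", t.etype) + t.payload
print(hex(t.etype), t.payload == inner_ip, len(frame), len(minimal))  # prints: 0x800 True 66 34
```

The 66-byte encapsulated frame comes out as a 34-byte frame: a 14-byte Ethernet header carrying the original MACs and ethertype 0x0800, followed by the 20-byte inner IP packet.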