In this post I'll walk through setting up a Cisco 3560 to act as a central QinQ switch and how to set up a few example topologies. This can also be done on older / lower spec switches using the same concepts - 802.1Q tunnelling is supported in most of the gear you'll find on eBay.
Theory
Service providers like to aggregate as many customers onto a single link as possible - otherwise they can't be price competitive. Customers want circuits that allow them not only to trunk VLANs but also to use any and all of the 4094 usable VLAN IDs without having to check with their service provider first.
One (adequate) way to do this is to extend the notion of VLANs. We all know VLANs separate a LAN into multiple logical partitions using a VLAN identifier tag - QinQ takes that to the next level by stacking 2 VLAN tags on top of each other. If the service provider assigns a VLAN to each customer then the provider's tag can be used to segregate customers from one another as follows:
Note here that both customers use VLAN 10, however they each get their own VLAN 10 independent of any other customer's. Customers can use any VLAN numbers they like, irrespective of what other customers or the provider has chosen to use.
In service provider parlance, the first (or outer) VLAN tag is the Service Provider VLAN (or S-VLAN). A customer's VLAN which gets tunnelled through is called the Customer VLAN (or C-VLAN). There is a hierarchy in that one S-VLAN may have multiple C-VLANs, while a C-VLAN can only have one parent S-VLAN. VLAN IDs can be re-used across customers, however customer A's VLAN 10 is different to customer B's VLAN 10.
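To make the tag stacking concrete, here's a minimal Python sketch that builds the header of a double-tagged frame and reads the two VLAN IDs back out. The function names are mine and purely illustrative, and I'm assuming the plain 802.1Q tag type (0x8100) for both tags - some kit uses a different value for the outer tag, so check what yours actually puts on the wire.

```python
import struct

TPID_8021Q = 0x8100  # assumption: the same 802.1Q tag type is used for both tags


def vlan_tag(vlan_id: int, priority: int = 0, tpid: int = TPID_8021Q) -> bytes:
    """Build one 4-byte VLAN tag: 16-bit TPID, then 3-bit priority, 1-bit DEI, 12-bit VLAN ID."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", tpid, tci)


def qinq_header(dst: bytes, src: bytes, s_vlan: int, c_vlan: int,
                ethertype: int = 0x0800) -> bytes:
    """Ethernet header with the S-VLAN tag outermost and the C-VLAN tag inside it."""
    return dst + src + vlan_tag(s_vlan) + vlan_tag(c_vlan) + struct.pack("!H", ethertype)


def parse_tags(header: bytes) -> tuple[int, int]:
    """Return (s_vlan, c_vlan) from a header built by qinq_header()."""
    _, outer_tci, _, inner_tci = struct.unpack("!HHHH", header[12:20])
    return outer_tci & 0x0FFF, inner_tci & 0x0FFF


if __name__ == "__main__":
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("020000000001")
    # Two customers both using C-VLAN 10, kept apart by different S-VLANs.
    print(parse_tags(qinq_header(dst, src, s_vlan=100, c_vlan=10)))
    print(parse_tags(qinq_header(dst, src, s_vlan=101, c_vlan=10)))
```

Both frames carry C-VLAN 10, but the headers differ because the S-VLAN differs, which is what keeps the two customers apart.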
For our lab setup we will do exactly the same thing, but locally on a single switch. 802.1Q tunnelling lets us connect trunks to each other over a VLAN and, optionally, tunnel control protocols such as LACP, spanning tree and CDP through as well, giving the impression that the end devices are attached directly to each other rather than through a switch.
Physical Topology
Here's a simple lab setup - 2 PCs, 2 routers, 2 firewalls and 2 switches (plus our QinQ switch):
Basically, we just need to plug all the devices into the QinQ switch. Use as many ports as you have available in the QinQ switch and be sure to label them up with descriptions! Sensibly, though, you will really want a minimum of 4 ports for a switch and at least 3 for a firewall. You can get most of the functionality you want by using sub-interfaces on a router, so in a pinch you can usually get away with single links if need be. It's all about giving yourself the maximum flexibility.
Initial Setup
Before we even begin with the configuration proper, we have to work around a dopey default behaviour in Cisco switches. Straight out of the box most Catalyst switches have a layer 2 switching MTU of 1500 for both Fast Ethernet and Gigabit Ethernet ports. This needs to be overridden to allow full size frames to be passed through with a VLAN tag still attached.
Adjust the maximum MTU for Fast Ethernet ports as follows:
Lab-QinQ(config)#system mtu 1998
Changes to the system MTU will not take effect until the next reload is done
Note - different switches have different maximum values. Use the "?" key to see what your device will go to and pick the maximum.
Adjust the maximum MTU for Gigabit Ethernet ports as follows:
Lab-QinQ(config)#system mtu jumbo 9000
Changes to the system jumbo MTU will not take effect until the next reload is done
Again, different devices may have different maximum values so use the "?" key to find how far you can set this.
Now, as it says, reboot the switch to make the changes stick.
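The arithmetic behind the MTU change is simple enough to sketch in a few lines of Python. The names are made up for illustration, and I'm assuming a customer device sending standard 1500-byte payloads with a single 4-byte 802.1Q tag that has to be carried through the tunnel:

```python
ETHERNET_PAYLOAD = 1500  # assumption: customer devices use the standard 1500-byte MTU
VLAN_TAG_BYTES = 4       # size of one 802.1Q tag


def required_mtu(customer_mtu: int = ETHERNET_PAYLOAD, tunnelled_tags: int = 1) -> int:
    """Smallest MTU that carries a full-size customer frame plus its tunnelled tag(s)."""
    return customer_mtu + tunnelled_tags * VLAN_TAG_BYTES


def fits(configured_mtu: int, customer_mtu: int = ETHERNET_PAYLOAD,
         tunnelled_tags: int = 1) -> bool:
    """True if the configured MTU leaves room for the customer's tagged frame."""
    return configured_mtu >= required_mtu(customer_mtu, tunnelled_tags)


if __name__ == "__main__":
    print(required_mtu())  # bytes needed for one tunnelled tag
    print(fits(1500))      # the out-of-the-box default
    print(fits(1998))      # the Fast Ethernet value used above
```

With the default of 1500 a full-size tagged frame is 4 bytes too big, which is why the value has to go up before anything else will work.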
Once the switch is back up and running, the next job is to create your "provider" VLANs. Note - these VLANs will not be visible to your "customer" topology so it's good to pick a range of consecutive VLAN IDs. I like to start at 100 and work up. Note that each of these VLANs needs to have its MTU increased from the default to allow transport of full size frames with the additional VLAN tag:
Lab-QinQ(config)#vlan 100
Lab-QinQ(config-vlan)# name xconnect100
Lab-QinQ(config-vlan)# mtu 1900
Lab-QinQ(config-vlan)#exit
You will need one VLAN for each virtual connection between devices - I usually throw 20 in and add more later if required.
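Typing 20 of these by hand gets old quickly, so here's a rough Python sketch that prints the same three commands for a run of consecutive VLAN IDs, ready to paste in at the config prompt. The function name and defaults are my own; the commands themselves are just the ones shown above.

```python
def provider_vlan_config(first: int = 100, count: int = 20, mtu: int = 1900) -> list[str]:
    """Return config lines creating `count` consecutive provider VLANs."""
    lines = []
    for vlan_id in range(first, first + count):
        lines += [
            f"vlan {vlan_id}",
            f" name xconnect{vlan_id}",
            f" mtu {mtu}",
            "exit",
        ]
    return lines


if __name__ == "__main__":
    print("\n".join(provider_vlan_config()))
```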
Finally, for every port where you will attach a device, set the switchport into 802.1Q tunnel mode, and enable all the protocol tunnelling options:
Lab-QinQ(config)#interface range Fa0/1 - Fa0/24
Lab-QinQ(config-if-range)# switchport mode dot1q-tunnel
Lab-QinQ(config-if-range)# l2protocol-tunnel cdp
Lab-QinQ(config-if-range)# l2protocol-tunnel stp
Lab-QinQ(config-if-range)# l2protocol-tunnel vtp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point pagp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point lacp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point udld
Lab-QinQ(config-if-range)# spanning-tree portfast trunk
Setting Up a Basic Topology
OK so we have the switch set up, let's set up a really simple topology:
The basic process here is to look at your diagram and anywhere you see a line, assign it one of your provider VLAN numbers:
Now, for each port on the topology, go and set the relevant switch port as a member of that VLAN:
Lab-QinQ(config)#int Fa0/1
Lab-QinQ(config-if)#description PC1
Lab-QinQ(config-if)#switchport access vlan 101
Lab-QinQ(config-if)#interface Fa0/6
Lab-QinQ(config-if)#description SW1 Gi1/4
Lab-QinQ(config-if)#switchport access vlan 101
Lab-QinQ(config-if)#interface Fa0/3
Lab-QinQ(config-if)#description SW1 Gi1/1
Lab-QinQ(config-if)#switchport access vlan 102
Lab-QinQ(config-if)#interface Fa0/11
Lab-QinQ(config-if)#description R1 Fa0/1
Lab-QinQ(config-if)#switchport access vlan 102
As you can see, VLAN 101 is used to "connect" PC1 to SW1 port Gi1/4, while VLAN 102 is used to "connect" SW1 port Gi1/1 to R1 port Fa0/1. Thanks to the protocol tunnelling config, SW1 and R1 believe they are directly connected:
R1#show cdp neighbors
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
SW1 Fas 0/1 168 R S I WS-C6503- Gig 1/1
MAC entries are learned as if the devices were directly attached and the router and PC can ping each other:
R1#ping PC1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
R1#
This is a fairly simple example but, as we're about to see, far more complex topologies can be achieved using the same methods.
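The bookkeeping in this section (one provider VLAN per line on the diagram, two QinQ switch ports per VLAN) can be sketched as a small Python helper. This is only an illustration: the function name and data layout are my own, and the ports and descriptions are the ones from the example above.

```python
from itertools import count


def assign_links(links, first_vlan: int = 101) -> list[str]:
    """Give each link the next provider VLAN and emit config for both of its ports.

    `links` is a list of links; each link is a pair of (qinq_port, description) tuples.
    """
    vlan_ids = count(first_vlan)
    lines = []
    for link in links:
        if len(link) != 2:
            raise ValueError("a link must have exactly two ends")
        vlan = next(vlan_ids)
        for port, description in link:
            lines += [
                f"interface {port}",
                f" description {description}",
                f" switchport access vlan {vlan}",
            ]
    return lines


if __name__ == "__main__":
    topology = [
        (("Fa0/1", "PC1"), ("Fa0/6", "SW1 Gi1/4")),
        (("Fa0/3", "SW1 Gi1/1"), ("Fa0/11", "R1 Fa0/1")),
    ]
    print("\n".join(assign_links(topology)))
```

Keeping the topology as data like this also gives you a record of which VLAN belongs to which link, which is handy when you come to break things on purpose later.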
Slightly More Complex Setup
Let's mix things up a bit by building a topology with an HA pair of Juniper SRX firewalls, trunking VLANs down to a pair of switches, connected by an LACP link bundle (portchannel):
As with the other example, we simply assign a provider VLAN to each link:
I won't include the config here for each of these as it's a pure repetition of the earlier work - just make sure that your ports go into the right VLANs and everything should be fine.
Now we can see that the firewalls have come up in HA:
{primary:node0}
root@SRX-top> show chassis cluster status
Monitor Failure codes:
CS Cold Sync monitoring FL Fabric Connection monitoring
GR GRES monitoring HW Hardware monitoring
IF Interface monitoring IP IP monitoring
LB Loopback monitoring MB Mbuf monitoring
NH Nexthop monitoring NP NPC monitoring
SP SPU monitoring SM Schedule monitoring
Cluster ID: 1
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 100 primary no no None
node1 50 secondary no no None
Redundancy group: 1 , Failover count: 1
node0 100 primary no no None
node1 50 secondary no no None
The switches have bundled their ports and can see each other's device IDs:
SW-2#show lacp internal
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1
LACP port Admin Oper Port Port
Port Flags State Priority Key Key Number State
Gi0/1 SA bndl 32768 0x1 0x1 0x112 0x3D
Gi0/2 SA bndl 32768 0x1 0x1 0x113 0x3D
SW-2#show lacp neighbor
Flags: S - Device is requesting Slow LACPDUs
F - Device is requesting Fast LACPDUs
A - Device is in Active mode P - Device is in Passive mode
Channel group 1 neighbors
Partner's information:
LACP port Admin Oper Port Port
Port Flags Priority Dev ID Age key Key Number State
Gi0/1 SA 32768 3037.a6ca.aa80 10s 0x0 0x1 0x112 0x3D
Gi0/2 SA 32768 3037.a6ca.aa80 9s 0x0 0x1 0x113 0x3D
And, as before, CDP works fine:
SW-2#show cdp neighbor
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
D - Remote, C - CVTA, M - Two-port Mac Relay
Device ID Local Intrfce Holdtme Capability Platform Port ID
SW-1 Gig 0/1 152 S I WS-C3560G Gig 0/1
SW-1 Gig 0/2 164 S I WS-C3560G Gig 0/2
SW-2#
Even UDLD is active and "sees" the device on the other end as if it were locally connected:
SW-1#show udld Gi0/1
Interface Gi0/1
---
Port enable administrative configuration setting: Enabled / in aggressive mode
Port enable operational state: Enabled / in aggressive mode
Current bidirectional state: Bidirectional
Current operational state: Advertisement - Single neighbor detected
Message interval: 7
Time out interval: 5
Entry 1
---
Expiration time: 44
Device ID: 1
Current neighbor state: Bidirectional
Device name: FOCABCD0123
Port ID: Gi0/1
Neighbor echo 1 device: FOCWXYZ9876
Neighbor echo 1 port: Gi0/1
Message interval: 15
Time out interval: 5
CDP Device name: SW-2
SW-1#
As a side note, a lot of Cisco kit only supports long LACP timers (90 second failure detection, as opposed to 3 seconds for short) so if you are in this boat then consider using UDLD when configuring bundles over indirect links. This should reduce detection time to 45 seconds, which is still a bit rubbish but better than 90. By default, error-disable recovery is not active for UDLD so once UDLD takes a link down, it stays down - so you probably want to switch recovery on:
SW-1(config)#errdisable recovery cause udld
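For reference, the detection times quoted above all fit the same rule of thumb: a neighbour is declared dead after three missed messages. Here's that arithmetic as a quick Python sketch. The 30 second and 1 second LACP intervals are the standard slow and fast rates; the 15 second UDLD interval is taken from the output above, and applying the same multiplier of three to UDLD is my assumption, chosen because it matches the 45 second figure.

```python
MISSED_BEFORE_DOWN = 3  # LACP times out after three missed PDUs; assumed for UDLD too

LACP_SLOW_INTERVAL_S = 30
LACP_FAST_INTERVAL_S = 1
UDLD_MESSAGE_INTERVAL_S = 15  # from the "Message interval" in the neighbour entry above


def detection_time(interval_s: int, missed: int = MISSED_BEFORE_DOWN) -> int:
    """Worst-case seconds before a silent neighbour is declared down."""
    return interval_s * missed


if __name__ == "__main__":
    for name, interval in [
        ("LACP long timers", LACP_SLOW_INTERVAL_S),
        ("LACP short timers", LACP_FAST_INTERVAL_S),
        ("UDLD", UDLD_MESSAGE_INTERVAL_S),
    ]:
        print(f"{name}: {detection_time(interval)}s")
```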
Simulating Failures
One of the more common uses for a lab environment is to test failovers, and the failure you will most often want to simulate is a link failure. This is not quite as straightforward with a QinQ switch in the middle, as taking one port down does not make the other go down, e.g.:
If you want to simulate pulling a link, I find the best way is to use an interface range command. Let's say we want to remove the link between FW1 and SW1: simply specify an interface range on the QinQ switch containing the switch ports facing each of those two devices and shut them down at the same time:
Lab-QinQ(config)#interface range fa0/14, fa0/21
Lab-QinQ(config-if-range)#shut
.May 30 12:55:30 UTC: %LINK-5-CHANGED: Interface FastEthernet0/14, changed state to administratively down
.May 30 12:55:30 UTC: %LINK-5-CHANGED: Interface FastEthernet0/21, changed state to administratively down
.May 30 12:55:31 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/14, changed state to down
.May 30 12:55:31 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/21, changed state to down
The same technique can be used to simulate the failure of an entire cabinet or site: just select all the ports you need in a range and shut them down.
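If you keep a note of which QinQ switch ports face which devices, the commands for pulling and restoring a link can be generated rather than typed. A minimal Python sketch, with a function name of my own invention; I haven't checked how many ports a single range command will accept, so a long list (a whole cabinet, say) may need splitting up, which this doesn't attempt.

```python
def fail_link(ports: list[str], restore: bool = False) -> list[str]:
    """Config lines to shut (or bring back) a set of QinQ switch ports together."""
    if not ports:
        raise ValueError("need at least one port")
    action = " no shutdown" if restore else " shutdown"
    return [f"interface range {', '.join(ports)}", action]


if __name__ == "__main__":
    link = ["fa0/14", "fa0/21"]  # the two ports used in the example above
    print("\n".join(fail_link(link)))
    print("\n".join(fail_link(link, restore=True)))
```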
Another common test that you may want to perform is to simulate a "silent failure", i.e. where both ends see the link up but traffic is lost in the middle. This is good for checking how quickly and how well protocol heartbeats detect link problems (think LACP, UDLD, routing protocols, etc) and is definitely worth checking before you put services over carrier circuits or inter-DC links. To achieve this, simply set the provider VLAN on one of your ports to something unused:
Lab-QinQ(config)#interface fa0/14
Lab-QinQ(config-if)#switchport access vlan 2
The two ends of the link will remain up but no traffic will pass through.
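The one thing to watch is that the "unused" VLAN really is unused, otherwise you end up cross-connecting the port to some other link. A small Python sketch of that check follows; the function names, and the idea of passing in the set of provider VLANs already in use, are my own for illustration.

```python
def silent_failure(port: str, used_vlans: set[int], parking_vlan: int = 2) -> list[str]:
    """Config lines that move one end of a link into a VLAN nothing else uses."""
    if parking_vlan in used_vlans:
        raise ValueError(f"VLAN {parking_vlan} is already carrying a link")
    return [f"interface {port}", f" switchport access vlan {parking_vlan}"]


def restore_link(port: str, original_vlan: int) -> list[str]:
    """Config lines that put the port back into its original provider VLAN."""
    return [f"interface {port}", f" switchport access vlan {original_vlan}"]


if __name__ == "__main__":
    in_use = {101, 102}  # provider VLANs from the basic topology earlier
    print("\n".join(silent_failure("fa0/14", in_use)))
```

To end the test, put the port back into the provider VLAN it came from and traffic should start flowing again.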