Networking Bodges: Using QinQ to Build Flexible Lab Topologies

If you're sick and tired of re-cabling your lab every time you want to try out a new topology, consider putting a QinQ tunnelling switch at the heart of your lab network instead. With one in place you can stand up completely arbitrary topologies with very little effort and no re-cabling, and you can even do it from a remote location.

In this post I'll walk through setting up a Cisco 3560 to act as a central QinQ switch and how to set up a few example topologies. This can also be done on older / lower spec switches using the same concepts - 802.1Q tunnelling is supported in most of the gear you'll find on eBay.

Theory

Service providers like to aggregate as many customers onto a single link as possible - otherwise they can't be price competitive. Customers want circuits that let them not only trunk VLANs but also use any and all of the 4094 usable VLAN IDs without having to check with their service provider first.

One (adequate) way to do this is to extend the notion of VLANs. We all know VLANs separate a LAN into multiple logical partitions using a VLAN identifier tag - QinQ takes that to the next level by stacking 2 VLAN tags on top of each other. If the service provider assigns a VLAN to each customer, the outer tags can be used to segregate customers from one another as follows:

Note here that both customers use VLAN 10, however they each get their own VLAN 10 independent of any other customer's. Customers can use any VLAN numbers they like, irrespective of what other customers or the provider has chosen to use.

In service provider parlance, the first (or outer) VLAN tag is the Service Provider VLAN (or S-VLAN). A customer's VLAN which gets tunnelled through is called a Customer VLAN (or C-VLAN). There is a hierarchy in that one S-VLAN may carry multiple C-VLANs, while a C-VLAN can only have one parent S-VLAN. VLAN IDs can be re-used across customers, but customer A's VLAN 10 is a different VLAN to customer B's VLAN 10.
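To make the tag stacking concrete, here is a minimal Python sketch that pushes a C-VLAN tag and then an S-VLAN tag onto a dummy Ethernet frame and reads the stack back. The frame contents, the S-VLAN numbers 100 and 101, and the helper names are illustrative assumptions. It also assumes both tags use ethertype 0x8100 (the usual default for Cisco 802.1Q tunnelling); some providers use 0x88a8 for the outer tag instead.

```python
import struct

TPID_8021Q = 0x8100  # assumption: both tags use the standard 802.1Q ethertype


def push_vlan_tag(frame, vlan_id, tpid=TPID_8021Q):
    """Insert a 4-byte VLAN tag directly after the two 6-byte MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in the range 1-4094")
    tag = struct.pack("!HH", tpid, vlan_id)  # priority and DEI bits left at zero
    return frame[:12] + tag + frame[12:]


def read_vlan_stack(frame):
    """Return the VLAN IDs found in the frame, outermost first."""
    vlan_ids = []
    offset = 12  # skip destination and source MAC
    while struct.unpack_from("!H", frame, offset)[0] == TPID_8021Q:
        tci = struct.unpack_from("!H", frame, offset + 2)[0]
        vlan_ids.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4
    return vlan_ids


# Hypothetical untagged frame: dst MAC, src MAC, IPv4 ethertype, dummy payload
untagged = bytes(6) + bytes([0x02, 0, 0, 0, 0, 0x01]) + struct.pack("!H", 0x0800) + b"payload"

# Both customers use C-VLAN 10; the provider wraps each in its own S-VLAN
customer_a = push_vlan_tag(push_vlan_tag(untagged, 10), 100)
customer_b = push_vlan_tag(push_vlan_tag(untagged, 10), 101)

print(read_vlan_stack(customer_a))  # [100, 10]
print(read_vlan_stack(customer_b))  # [101, 10]
```

The inner tag is identical for both customers; only the outer tag differs, and that is the only one the provider's switches need to look at.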

For our lab setup we will do exactly the same thing, but locally on a single switch. 802.1Q tunnelling lets us connect trunks to each other over a VLAN and, optionally, tunnel control protocols such as LACP, spanning tree and CDP through as well. This gives the end devices the impression that they are attached directly to each other rather than through a switch.

Physical Topology

Here's a simple lab setup - 2 PCs, 2 routers, 2 firewalls and 2 switches (plus our QinQ switch):

Basically, we just need to plug all the devices into the QinQ switch. Use as many ports as you have available on the QinQ switch and be sure to label them up with descriptions! Realistically, you will want a minimum of 4 ports for each lab switch and at least 3 for each firewall. Routers can provide most of the functionality you want by using sub-interfaces, so in a pinch you can usually get away with a single link each. It's all about giving yourself the maximum flexibility.

Initial Setup

Before we even begin with the configuration proper, we have to work around a dopey default behaviour in Cisco switches. Straight out of the box most Catalyst switches have a layer 2 switching MTU of 1500 for both Fast Ethernet and Gigabit Ethernet ports. This needs to be overridden to allow full size frames to be passed through with a VLAN tag still attached.
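As a rough sanity check on why 1500 is too small, the sketch below adds up the on-the-wire frame size for a full 1500-byte payload with zero, one and two VLAN tags. The header, FCS and tag sizes are the standard Ethernet values; the function name is just for illustration.

```python
ETH_HEADER = 14     # destination MAC + source MAC + ethertype
FCS = 4             # frame check sequence
VLAN_TAG = 4        # each 802.1Q tag adds 4 bytes
PAYLOAD_MTU = 1500  # full-size payload


def frame_size(tags, payload=PAYLOAD_MTU):
    """Total Ethernet frame size in bytes for a given number of VLAN tags."""
    return ETH_HEADER + tags * VLAN_TAG + payload + FCS


labels = ["untagged", "single-tagged (customer trunk)", "double-tagged (inside the QinQ switch)"]
for tags, label in enumerate(labels):
    print(f"{label}: {frame_size(tags)} bytes")
```

Every frame crossing the QinQ switch is 4 bytes bigger than the one the attached device sent, so the switch needs at least that much headroom over the default.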

Adjust the maximum MTU for Fast Ethernet ports as follows:

Lab-QinQ(config)#system mtu 1998
Changes to the system MTU will not take effect until the next reload is done

Note - different switches have different maximum values. Use the "?" key to see how high your device will go and pick the maximum.

Adjust the maximum MTU for Gigabit Ethernet ports as follows:

Lab-QinQ(config)#system mtu jumbo 9000
Changes to the system jumbo MTU will not take effect until the next reload is done

Again, different devices may have different maximum values so use the "?" key to find how far you can set this.

Now, as it says, reboot the switch to make the changes stick.

Once the switch is back up and running, the next job is to create your "provider" VLANs. Note - these VLANs will not be visible to your "customer" topology so it's good to pick a range of consecutive VLAN IDs. I like to start at 100 and work up. Note that each of these VLANs needs to have its MTU increased from the default to allow transport of full size frames with the additional VLAN tag:

Lab-QinQ(config)#vlan 100
Lab-QinQ(config-vlan)# name xconnect100
Lab-QinQ(config-vlan)# mtu 1900
Lab-QinQ(config-vlan)#exit

You will need one VLAN for each virtual connection between devices - I usually throw 20 in and add more later if required.
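Typing 20 of those stanzas by hand gets old quickly, so here is a small Python sketch that prints them ready to paste into config mode. The starting ID, count, name prefix and MTU mirror the example above; the function name is hypothetical.

```python
def provider_vlan_config(first_vlan=100, count=20, mtu=1900):
    """Build the VLAN stanzas for a consecutive block of provider VLANs."""
    lines = []
    for vlan_id in range(first_vlan, first_vlan + count):
        lines.append(f"vlan {vlan_id}")
        lines.append(f" name xconnect{vlan_id}")
        lines.append(f" mtu {mtu}")
        lines.append("exit")
    return "\n".join(lines)


# Prints stanzas for VLANs 100 through 119
print(provider_vlan_config())
```

If you need more links later, call it again with a higher `first_vlan` so the new block carries on where the last one stopped.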

Finally, for every port where you will attach a device, set the switchport into 802.1Q tunnel mode, and enable all the protocol tunnelling options:

Lab-QinQ(config)#interface range Fa0/1 - Fa0/24
Lab-QinQ(config-if-range)# switchport mode dot1q-tunnel
Lab-QinQ(config-if-range)# l2protocol-tunnel cdp
Lab-QinQ(config-if-range)# l2protocol-tunnel stp
Lab-QinQ(config-if-range)# l2protocol-tunnel vtp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point pagp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point lacp
Lab-QinQ(config-if-range)# l2protocol-tunnel point-to-point udld

Lab-QinQ(config-if-range)# spanning-tree portfast trunk

Setting Up a Basic Topology

OK so we have the switch set up, let's set up a really simple topology:

The basic process here is to look at your diagram and anywhere you see a line, assign it one of your provider VLAN numbers:

Now, for each port on the topology, go and set the relevant switch port as a member of that VLAN:

Lab-QinQ(config)#int Fa0/1
Lab-QinQ(config-if)#description PC1
Lab-QinQ(config-if)#switchport access vlan 101
Lab-QinQ(config-if)#interface Fa0/6
Lab-QinQ(config-if)#description SW1 Gi1/4
Lab-QinQ(config-if)#switchport access vlan 101

Lab-QinQ(config-if)#interface Fa0/3
Lab-QinQ(config-if)#description SW1 Gi1/1
Lab-QinQ(config-if)#switchport access vlan 102
Lab-QinQ(config-if)#interface Fa0/11
Lab-QinQ(config-if)#description R1 Fa0/1
Lab-QinQ(config-if)#switchport access vlan 102

As you can see, VLAN 101 is used to "connect" PC1 to SW1 port Gi1/4, while VLAN 102 is used to "connect" SW1 port Gi1/1 to R1 port Fa0/1. Thanks to the protocol tunnelling config, SW1 and R1 believe they are directly connected:

R1#show cdp neighbors
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
                  D - Remote, C - CVTA, M - Two-port Mac Relay

Device ID        Local Intrfce     Holdtme    Capability Platform Port ID
SW1              Fas 0/1          168             R S I WS-C6503- Gig 1/1

MAC entries are learned as if the devices were directly attached and the router and PC can ping each other:

R1#ping PC1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
R1#

This is a fairly simple example but, as we're about to see, far more complex topologies can be built using the same methods.
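The "one provider VLAN per line on the diagram" process in this section can also be captured as data. The Python sketch below takes a link table matching the example above and emits the interface config, refusing any VLAN that doesn't have exactly two endpoints or any port used twice. The table layout and function name are assumptions for illustration only.

```python
# One entry per line on the diagram: provider VLAN -> the two QinQ switch ports
# it joins, each with the description of the device port plugged into it.
LINKS = {
    101: [("Fa0/1", "PC1"), ("Fa0/6", "SW1 Gi1/4")],
    102: [("Fa0/3", "SW1 Gi1/1"), ("Fa0/11", "R1 Fa0/1")],
}


def link_config(links):
    """Turn a link table into interface config for the QinQ switch."""
    lines = []
    seen_ports = set()
    for vlan_id, endpoints in sorted(links.items()):
        if len(endpoints) != 2:
            raise ValueError(f"VLAN {vlan_id} must join exactly two ports")
        for port, description in endpoints:
            if port in seen_ports:
                raise ValueError(f"{port} is assigned to more than one link")
            seen_ports.add(port)
            lines.append(f"interface {port}")
            lines.append(f" description {description}")
            lines.append(f" switchport access vlan {vlan_id}")
    return "\n".join(lines)


print(link_config(LINKS))
```

Keeping the table somewhere handy also gives you a record of which topology is currently "cabled", which is easy to lose track of once nothing physical changes.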

Slightly More Complex Setup

Let's mix things up a bit by building a topology with an HA pair of Juniper SRX firewalls, trunking VLANs down to a pair of switches, connected by an LACP link bundle (portchannel):

As with the other example, we simply assign a provider VLAN to each link:

I won't include the config here for each of these as it's a pure repetition of the earlier work - just make sure that your ports go into the right VLANs and everything should be fine.

Now we can see that the firewalls have come up in HA:

{primary:node0}
root@SRX-top> show chassis cluster status
Monitor Failure codes:
    CS Cold Sync monitoring        FL Fabric Connection monitoring
    GR GRES monitoring             HW Hardware monitoring
    IF Interface monitoring        IP IP monitoring
    LB Loopback monitoring         MB Mbuf monitoring
    NH Nexthop monitoring          NP NPC monitoring
    SP SPU monitoring              SM Schedule monitoring

Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0 100      primary        no      no       None
node1 50       secondary      no      no       None

Redundancy group: 1 , Failover count: 1
node0 100      primary        no      no       None
node1 50       secondary      no      no       None

The switches have bundled their ports and can see each other's device IDs:

SW-2#show lacp internal
Flags: S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1
                            LACP port     Admin     Oper    Port        Port
Port      Flags   State     Priority      Key       Key     Number      State
Gi0/1     SA      bndl      32768         0x1       0x1     0x112       0x3D
Gi0/2     SA      bndl      32768         0x1       0x1     0x113       0x3D

SW-2#show lacp neighbor
Flags: S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 1 neighbors

Partner's information:

                  LACP port                        Admin Oper   Port    Port
Port      Flags   Priority Dev ID          Age    key    Key    Number State
Gi0/1     SA      32768     3037.a6ca.aa80 10s    0x0    0x1    0x112   0x3D
Gi0/2     SA      32768     3037.a6ca.aa80   9s    0x0    0x1    0x113   0x3D

And, as before, CDP works fine:

SW-2#show cdp neighbor
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone,
                  D - Remote, C - CVTA, M - Two-port Mac Relay

Device ID        Local Intrfce     Holdtme    Capability Platform Port ID
SW-1             Gig 0/1           152              S I   WS-C3560G Gig 0/1
SW-1             Gig 0/2           164              S I   WS-C3560G Gig 0/2
SW-2#

Even UDLD is active and "sees" the device on the other end as if it were locally connected:

SW-1#show udld Gi0/1

Interface Gi0/1
---
Port enable administrative configuration setting: Enabled / in aggressive mode
Port enable operational state: Enabled / in aggressive mode
Current bidirectional state: Bidirectional
Current operational state: Advertisement - Single neighbor detected
Message interval: 7
Time out interval: 5

    Entry 1
    ---
    Expiration time: 44
    Device ID: 1
    Current neighbor state: Bidirectional
    Device name: FOCABCD0123
    Port ID: Gi0/1
    Neighbor echo 1 device: FOCWXYZ9876
    Neighbor echo 1 port: Gi0/1

    Message interval: 15
    Time out interval: 5
    CDP Device name: SW-2
SW-1#

As a side note, a lot of Cisco kit only supports long LACP timers (90 second failure detection, as opposed to 3 seconds for short) so if you are in this boat then consider using UDLD when configuring bundles over indirect links. This should reduce detection time to 45 seconds, which is still a bit rubbish but better than 90. By default, error-disable recovery is not active for UDLD so once UDLD takes a link down, it stays down - so you probably want to switch recovery on:

SW-1(config)#errdisable recovery cause udld

Simulating Failures

One of the more common uses for a lab environment is to test failovers, and the failure you will most often want to simulate is a link failure. This is not quite as straightforward with a QinQ switch in the middle, because taking one port down does not make the port at the other end of the "link" go down, e.g.:

If you want to simulate pulling a link, I find the best way is to use an interface range command. Let's say we want to remove the link between FW1 and SW1. Simply specify an interface range on the QinQ switch containing the switch ports facing each of those two devices and shut them down at the same time:

Lab-QinQ(config)#interface range fa0/14, fa0/21
Lab-QinQ(config-if-range)#shut
.May 30 12:55:30 UTC: %LINK-5-CHANGED: Interface FastEthernet0/14, changed state to administratively down
.May 30 12:55:30 UTC: %LINK-5-CHANGED: Interface FastEthernet0/21, changed state to administratively down
.May 30 12:55:31 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/14, changed state to down
.May 30 12:55:31 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/21, changed state to down

The same technique can be used to simulate the failure of an entire cabinet or site - just select all the ports you need in a range and shut them down.

Another common test that you may want to perform is to simulate a "silent failure", i.e. where both ends see the link up but traffic is lost in the middle. This is good for checking how quickly and how well protocol heartbeats detect link problems (think LACP, UDLD, routing protocols, etc) and is definitely worth checking before you put services over carrier circuits or inter-DC links. To achieve this, simply set the provider VLAN on one of your ports to something unused:

Lab-QinQ(config)#interface fa0/14
Lab-QinQ(config-if)#switchport access vlan 2

The two ends of the link will remain up but no traffic will pass through.
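Both failure types can be scripted from a table of which QinQ ports make up each link. The Python sketch below generates the commands for a hard failure (shut both ends) and a silent failure (move one end to an unused VLAN). Which of fa0/14 and fa0/21 faces FW1 and which faces SW1 is an assumption here, as are the function names; unused VLAN 2 follows the example above.

```python
# Assumption: the port order within each link is illustrative only.
LINK_PORTS = {
    ("FW1", "SW1"): ("fa0/14", "fa0/21"),
}


def hard_failure(link, restore=False):
    """Commands to pull (or restore) both ends of a link at the same time."""
    ports = ", ".join(LINK_PORTS[link])
    action = " no shutdown" if restore else " shutdown"
    return [f"interface range {ports}", action]


def silent_failure(link, unused_vlan=2):
    """Commands to blackhole a link while leaving both ends physically up."""
    first_port = LINK_PORTS[link][0]
    return [f"interface {first_port}", f" switchport access vlan {unused_vlan}"]


print("\n".join(hard_failure(("FW1", "SW1"))))
print("\n".join(silent_failure(("FW1", "SW1"))))
```

To undo a silent failure, put the port back into its original provider VLAN; the sketch doesn't track that for you.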



Tuesday, 30 May 2017
