
Saturday, 9 April 2016

Producing topology diagrams from OSPF database CLI output

I always imagined it should be possible to automatically produce a topology diagram from the information in the OSPF database of a router - in fact I've heard of products that allow you to do this by attaching a device to your network and joining the OSPF domain. In many cases that is too invasive or completely impractical - what would be really nice would be to produce the diagram directly from the CLI output of a "show" command.

After spending a bit of time looking around, I could not find a tool to do this, so I went to work in Python and came up with a basic prototype in a couple of hours. The script doesn't do the plotting and layout itself; rather it produces a DOT file and leaves the heavy lifting to GraphViz. With a little extra work I now have a working script which takes the output of "show ip ospf database router" and produces a DOT file that can be used to plot a topology map showing each OSPF router, the links between them (including metrics) and any transit multi-access networks.
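
To give a flavour of the approach, here's a heavily simplified sketch of the parsing idea. This is not the actual ospfcli2dot code - the regular expressions are based on the usual IOS "show ip ospf database router" field labels and will need tweaking for anything they don't match:

#!/usr/bin/env python
# Simplified sketch of the approach (not the real ospfcli2dot code): walk the
# router LSAs in "show ip ospf database router" output and emit DOT edges.
# The regular expressions assume the usual IOS field labels.
import re
import sys

def parse_router_lsas(text):
    edges = []          # (advertising router, neighbour or DR address, metric)
    router = None
    link_type = ""
    link_id = None
    for line in text.splitlines():
        m = re.search(r"Advertising Router: (\S+)", line)
        if m:
            router = m.group(1)
            continue
        m = re.search(r"Link connected to: (.+)", line)
        if m:
            link_type = m.group(1)
            continue
        m = re.search(r"\(Link ID\) (?:Neighboring Router ID|Designated Router address): (\S+)", line)
        if m:
            link_id = m.group(1)
            continue
        m = re.search(r"TOS 0 Metrics: (\d+)", line)
        if m and router and link_id:
            if "point-to-point" in link_type or "Transit Network" in link_type:
                edges.append((router, link_id, int(m.group(1))))
            link_id = None
    return edges

def to_dot(edges):
    lines = ["digraph ospf {"]
    for src, dst, metric in edges:
        lines.append('  "%s" -> "%s" [label="%s"];' % (src, dst, metric))
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    with open(sys.argv[1]) as infile:
        print(to_dot(parse_router_lsas(infile.read())))

The real script does rather more than this (for example the asymmetric-metric highlighting shown in the example below), but the general approach is the same.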

CLI output from Cisco IOS and Cisco ASA is supported (the output seems to be essentially the same) and, of course, it doesn't matter which vendors' kit is attached to the network, provided you run "show ip ospf database router" on a supported platform.

The tool, not-so-snappily named "ospfcli2dot" is available from my github: https://github.com/theclam/ospfcli2dot

Example


Here's a simple example setup: R1, R2 and R3 all sit on a shared LAN, while R4 is attached point to point to R3 and R5:





The "show ip ospf database router" command can be run from any device in the network since all devices within an area share the same topology database. The output of this is quite verbose so will not be shown here. For the purposes of this example, I have just copied and pasted the output into a file called cli-output.txt.

Simply run the script against that file:

foeh@feeble ~/Projects/ospfcli2dot $ ./ospfcli2dot
ospfcli2dot - takes the output of "show ip ospf database router" and outputs a GraphViz DOT file corresponding to the network topology

v0.2 alpha, By Foeh Mannay, April 2016

Enter input filename: cli-output.txt
Enter output filename: example.dot
foeh@feeble ~/Projects/ospfcli2dot $ dot -Tgif -oexample.gif example.dot


This creates "example.gif", shown below:


As you can see, the metric is shown against each link and the script has automatically highlighted in red that one of the point-to-point links has different metrics in each direction.
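
For what it's worth, spotting that kind of asymmetry is cheap once you have the links as (router, neighbour, metric) tuples - a rough sketch, reusing the edge list from the parsing sketch above:

# Sketch: flag point-to-point links whose two directions carry different
# metrics, given (router, neighbour, metric) tuples like the ones built in
# the parsing sketch above. Each mismatched pair is reported once.
def asymmetric_links(edges):
    metric = {(a, b): m for a, b, m in edges}
    return [(a, b, m, metric[(b, a)])
            for (a, b), m in metric.items()
            if (b, a) in metric and metric[(b, a)] != m and a < b]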


Please give it a try and let me know how you get on!

Links


Download: https://github.com/theclam/ospfcli2dot

Saturday, 11 July 2015

Testing the Impact of Local Packet Capture on the Cisco 6500 Series

For a while now, many of the larger Cisco devices (such as the 6500 and 7600 series) have supported local packet capture. I've always been hesitant to use it in a production environment, primarily due to concerns about the performance hit it could cause, so I decided to test the impact in the lab to see whether it is sometimes / always / never safe to run local captures.

Test Setup

For my test setup I used a 6504-E switch with a modest SUP32-GE-3B supervisor and a 12.2 Advanced IP Services IOS - if it works on that, it should be safe anywhere. For traffic, I used a spare server running Ubuntu with a combination of Ostinato and Scapy.
The setup was as follows: running at full tilt, Ostinato was happily producing 1 Gbps of traffic, which went into my node on VLAN 100, out around the loop cable, back into the node and then out of the same interface on VLAN 101. Basically the port was running at 1 Gbps in each direction - the worst possible case for mirroring a gig port.

Default Settings

The default settings for the capture are pretty conservative - a tiny 2 MB linear capture buffer with a rate limit of 10,000 frames per second. With this config, a 1 Gbps stream of 1500-byte packets fills the buffer in roughly 2.5 seconds, at which point the capture ends. The impact is almost impossible to measure: the capture is over so quickly that you may not see any change in CPU on the 5-second roll-ups.
Lab-6503E#monitor capture start
*Jul 11 14:30:59.205: %SPAN-5-PKTCAP_START: Packet capture session 1 started
Lab-6503E#
*Jul 11 14:31:01.449: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended as the buffer is full, 21845 packets captured
Lab-6503E#show proc cpu hist
                                                              
                                                              
                         222223333322222               8888811
100                                                           
 90                                                           
 80                                                           
 70                                                           
 60                                                           
 50                                                           
 40                                                           
 30                                                           
 20                                                           
 10                                                    *****  
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5    
               CPU% per second (last 60 seconds)

Worst Case

OK, so the world didn't end. The next step was to see how bad it could be so I made the following changes:
  • Increased the rate limit to 100,000 fps (max)
  • Increased the packet buffer to 64 MB (max)
  • Enabled a circular buffer (why?!)
This time I set the capture to run for 60 seconds and checked the impact, which was much more severe:
Lab-6503E#monitor capture circular buffer size 65535 start for 60 sec
*Jul 11 14:45:02.953: %SPAN-5-PKTCAP_START: Packet capture session 1 started
*Jul 11 14:46:02.945: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 699040 packets captured
Lab-6503E#show proc cpu hist

    2222244444444444444455555444444444444444444444444444444444
    9999999999888889999966666999999999999999999998888899999999
100
 90
 80
 70
 60                     *****
 50      *****************************************************
 40      *****************************************************
 30 **********************************************************
 20 **********************************************************
 10 **********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5

               CPU% per second (last 60 seconds)

The impact on the supervisor in this case was much more noticeable - up to 60% CPU utilisation. The scenario is pretty unrealistic (forget circular buffers!) but suffice it to say I wouldn't want to do that on a production device.
Now that we have the best and worst cases, let's look at some realistic use cases and explore some of the other capture parameters that might help us capture what we need without causing havoc on the network.

Narrowing Down the Capture

It's possible to define criteria to decide what gets captured - as the CLI points out, some of these criteria are processed in hardware while others are handled in software:
Lab-6503E(config-mon-capture)#filter ?
  access-group  Filter access-list (hardware based)
  ethertype     Matching ethertype (software based)
  length        Matching L2-packet length (software based)
  mac-address   Matching mac-address (software-based)
  vlan          Filter vlan (hardware based)
Our test traffic consists of nearly a gig of junk running alongside a small 1-per-second ping, which we will treat as the "interesting" traffic we want to capture. This profile makes it easy to test the ACL, MAC address and length filters and compare their relative performance.
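
For reference, the "interesting" 1-per-second ping is trivial to generate with Scapy. This is just a rough sketch - the interface, MAC and IP addresses below are placeholders for my lab values, and the bulk junk traffic came from Ostinato rather than from Scapy:

#!/usr/bin/env python
# Sketch: generate the "interesting" 1 packet-per-second ICMP stream with
# Scapy. The interface, MAC and IP addresses are placeholders for the lab
# values; the 1 Gbps of bulk junk came from Ostinato, not from this script.
from scapy.all import Ether, IP, ICMP, sendp

ping = Ether(src="00:aa:bb:cc:dd:01", dst="00:aa:bb:cc:dd:02") / \
       IP(src="192.0.2.1", dst="192.0.2.2") / ICMP()

# loop=1 resends the same frame indefinitely, inter=1 spaces them 1s apart
sendp(ping, iface="eth1", loop=1, inter=1)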

ACL Filter

As the CLI says, ACL filters are applied in hardware so the junk is discarded at source before it hits the CPU. In this example I set up an ACL as follows:
Lab-6503E(config)#ip access-list extended icmp-only
Lab-6503E(config-ext-nacl)#permit icmp any any
Lab-6503E(config-ext-nacl)#deny ip any any
... and applied it to my capture as follows:
Lab-6503E(config)#monitor session 1
Lab-6503E(config-mon-capture)#filter access-group icmp-only
Now, I re-ran the "worst case" test, with massively different results:
Lab-6503E#monitor capture circular buffer size 65535 start for 60 sec
*Jul 11 14:49:00.345: %SPAN-5-PKTCAP_START: Packet capture session 1 started
*Jul 11 14:50:00.337: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 60 packets captured
Lab-6503E#show proc cpu hist

         22222222223333355555          11111
100
 90
 80
 70
 60
 50
 40
 30
 20
 10                     *****
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per second (last 60 seconds)
Two things to note here - because this is a hardware filter, applied in the ASICs:
  • Only 1 packet per second was punted to the CPU, resulting in essentially no hit at all
  • All 60 of the ping packets were received and nothing else
In summary, ACL filters are ideal when you know what you want to pull from a big stream as you only pay a CPU penalty for the "good stuff".

MAC Filter

In contrast, the MAC filter runs in software. It's also pretty limited, only matching on source MAC. I repeated the above test but instead of an ACL match, applied a MAC filter as follows:
Lab-6503E(config)#monitor session 1
Lab-6503E(config-mon-capture)#filter mac-address 0011.2233.4455
This took us more-or-less back to the worst case:
Lab-6503E#monitor capture circular buffer size 65535 start for 60 sec
*Jul 11 15:05:55.197: %SPAN-5-PKTCAP_START: Packet capture session 1 started
*Jul 11 15:06:55.189: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 5 packets captured
Lab-6503E#show proc cpu hist

      44444444443333344444333334444444444444444444444444333333
    4422222111119999900000999990000011111333331111111111888889
100
 90
 80
 70
 60
 50
 40   ********************************************************
 30   ********************************************************
 20   ********************************************************
 10   ********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per second (last 60 seconds)

Yuck. I wouldn't do that in production. This is basically because the software filters are applied after the packets have been punted to the CPU, so you pay a penalty for the garbage as well as the good stuff. You'll also notice that it only captured 5 of the 60 pings - more on this later, but that's another side effect of software filters.

Length Filter

The frame length filter is another software-based mechanism, which means it's pretty terrible under load, too. Our junk traffic consists of large frames while our interesting traffic is small, so let's configure the capture to catch only the short frames:
Lab-6503E(config)#monitor session 1
Lab-6503E(config-mon-capture)#filter length 0 100 
Again, the output is pretty miserable:
Lab-6503E#monitor capture circular buffer size 65535 start for 60 sec
*Jul 11 15:15:12.145: %SPAN-5-PKTCAP_START: Packet capture session 1 started
*Jul 11 15:16:12.137: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 17 packets captured
Lab-6503E#show proc cpu hist

    1111444443333344444444443333333333333333333333333444443333
    4444000009999911111222229999999999999999999999999222229999
100
 90
 80
 70
 60
 50
 40     ******************************************************
 30     ******************************************************
 20     ******************************************************
 10 **********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per second (last 60 seconds)
Again, the CPU took a hammering and we only captured a few of the ping packets - 17 out of 60.

Quirks / Order of Operations

Now you may think that software filters might be OK if we just reduce the rate-limit configured on the capture:
Lab-6503E(config)#monitor session 1
Lab-6503E(config-mon-capture)#rate-limit 100
This *does* do what we want for the CPU load - here's an example with a MAC filter:
Lab-6503E#monitor capture circular buffer size 65535 start for 60 sec
*Jul 11 15:26:43.793: %SPAN-5-PKTCAP_START: Packet capture session 1 started
*Jul 11 15:27:43.785: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 0 packets captured
Lab-6503E#show proc cpu hist

                    11111          11111          6666633333
100
 90
 80
 70
 60
 50
 40
 30
 20
 10                                               *****
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
               CPU% per second (last 60 seconds)
Great - nothing to see here. But also nothing to see in the capture buffer - 0 packets captured!
Just for fun let's try the same with a hardware ACL filter:
*Jul 11 15:25:26.921: %SPAN-5-PKTCAP_STOP: Packet capture session 1 ended after the specified time, 60 packets captured
Why is this? Well, it's the order of operations. Basically the flow for hardware ACL filters is: ingress traffic -> ACL filter (in the ASIC) -> rate limiter -> punt to CPU -> capture buffer.
So the filter throws out the junk before the rate limiter, meaning that the rate limiter only counts the good stuff. If the good stuff exceeds the rate limit then you'll lose some of it, but the junk doesn't count.
Compare that to the flow for the software filters: ingress traffic -> rate limiter -> punt to CPU -> software filter -> capture buffer.
The software filters are applied after the rate limiter, so when the rate limit is exceeded you throw away a mix of good and bad traffic first, then pick out whatever good stuff is left. If your traffic is overwhelmingly garbage, you may not get any of the good stuff at all!
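
If it helps, here's a toy Python model of the two pipelines. The numbers are purely illustrative, not measured from the 6500:

#!/usr/bin/env python
# Toy model of the two capture pipelines (illustrative numbers, not measured):
#   hardware ACL filter:  filter -> rate limiter -> punt to CPU -> buffer
#   software filter:      rate limiter -> punt to CPU -> filter -> buffer
import random

def one_second_of_traffic(junk_pps=80000, good_pps=1):
    frames = ["junk"] * junk_pps + ["ping"] * good_pps
    random.shuffle(frames)
    return frames

def rate_limit(frames, limit):
    return frames[:limit]            # crude: keep the first <limit> frames

def keep_pings(frames):
    return [f for f in frames if f == "ping"]

frames = one_second_of_traffic()

# Hardware: the junk is dropped in the ASIC before the rate limiter, so the
# limiter only ever sees (and counts) the good stuff.
hw = rate_limit(keep_pings(frames), limit=100)

# Software: the rate limiter grabs 100 frames out of the whole mix first,
# then the filter is applied to whatever survived.
sw = keep_pings(rate_limit(frames, limit=100))

print("hardware filter captured: %d" % len(hw))    # 1 ping
print("software filter captured: %d" % len(sw))    # almost always 0

Run it a few times: the software-filtered capture almost never contains the ping, which is exactly what the real box showed.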

Summary - Play it Safe

So, in answer to the question "is it safe to run a local capture on a production 6500?" - packet capture on even a relatively modest SUP32-3B supervisor is pretty safe, provided you are cautious. If you want to do this in a busy production environment then my message to you is:
  • Use ACLs where at all possible
  • Set the rate limit to a sensible value (the default 1000 fps is fine for most cases)
  • Use linear buffers of a sensible size (do you really need 64MB of capture?)
  • Limit the frame count or capture duration at first (it may turn out there is a lot more "interesting" traffic than you thought!)
I'm convinced enough to do this in production but obviously I'm only testing one device, on one release of code, so don't blame me if you try it on something different and encounter a bug!

Saturday, 4 July 2015

OSPF stuck in EXCHANGE / EXSTART

One problem that occasionally comes up in network troubleshooting, mainly in carrier type environments, is a situation where OSPF refuses to come up to a FULL state and instead just sits in the EXCHANGE state at one end and the EXSTART state at the other. To be fair it's one of those things you've either seen or you haven't, but it's something every network engineer should know.

TL;DR - If you don't care why and just want to fix it: it's ALWAYS an MTU mismatch!

For those who are interested, I'll explain what's happening after a quick review of the OSPF neighbour establishment process. Here's a prettified version of the state table from RFC 2328:



Up until the EXSTART state, all the packets are small and no MTU information is shared, so everything works fine. Now, to move from the EXSTART state into the EXCHANGE state, the two devices must agree on who is master. This is done by each device sending an empty database descriptor (DBD) packet to the other - the devices check each other's DBDs and the device with the highest router ID becomes master.

The problem here is that DBDs carry MTU information, and if a DBD arrives advertising a higher MTU than that of the interface on which it was received, it is silently dropped, as per RFC 2328:

"If the Interface MTU field in the Database Description packet indicates an IP datagram size that is larger than the router can accept on the receiving interface without fragmentation, the Database Description packet is rejected."

So, the device with the larger interface MTU receives a DBD, sends its own DBD and is happy enough to move into the EXCHANGE state. The device with the smaller MTU has sent its DBD but has effectively not received one in return, so it remains in EXSTART. No matter how many times the DBD with the larger MTU is retransmitted, it will never be accepted. Eventually the adjacency times out and we go back to the beginning.
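
If it helps to see the asymmetry written down, here's a toy model of the check - plain Python, not real OSPF. The 1500 / 1492 values match the debug output further down:

#!/usr/bin/env python
# Toy model of the asymmetry described above (plain Python, not real OSPF):
# each router advertises its interface MTU in its DBD, and a received DBD is
# silently dropped if it advertises a larger MTU than the local interface.

def accepts_dbd(local_mtu, advertised_mtu):
    return advertised_mtu <= local_mtu

r1_mtu, r2_mtu = 1500, 1492          # the values seen in the debugs below

r1_state = "EXCHANGE" if accepts_dbd(r1_mtu, r2_mtu) else "EXSTART"
r2_state = "EXCHANGE" if accepts_dbd(r2_mtu, r1_mtu) else "EXSTART"

print("R1 (MTU %d): %s" % (r1_mtu, r1_state))      # EXCHANGE
print("R2 (MTU %d): %s" % (r2_mtu, r2_state))      # EXSTART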

Papering Over the Cracks


In Cisco IOS it is possible to configure ip ospf mtu-ignore under the interface, which disables the MTU check for that interface. This might seem like a good idea; however, I wouldn't recommend it. Of course, best practice is to keep MTUs consistent across your network - there's not really an excuse for a mismatched MTU across a link! While ignoring the MTU might get the adjacency up, you are storing up problems for later. Aside from the obvious data plane issues (black-holing large packets in one direction), you may also break the control plane.

For example, you could have a configuration that has been in place for months without change and has "always worked", but which, following a link flap, is suddenly stuck in EXCHANGE / EXSTART. When you first connect the devices, the odds are that the LSDB will be small. At that point the mismatched MTU causes no problems and the neighbour establishes fine. Later in life, though, the LSDBs fill up and the DBDs grow, until the device with the larger MTU has a big enough update to fill an over-sized packet which its partner can't handle. Then the state gets all screwed up and neighbours reset... bad times!

Debugging


If you're stuck in EXCHANGE / EXSTART but still not convinced it's MTU (or if you're trying to inter-op two devices which use different conventions to define MTU), you can use debugs to confirm what's going on.

The key one here for Cisco IOS is "debug ip ospf adj", which produces output as shown below:

*Jul 4 22:04:23.979: OSPF-1 ADJ Fa0/0: 2 Way Communication to 10.4.4.254, state 2WAY
*Jul 4 22:04:23.983: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254: Prepare dbase exchange
*Jul 4 22:04:23.983: OSPF-1 ADJ Fa0/0: Send DBD to 10.4.4.254 seq 0x1D43 opt 0x52 flag 0x7 len 32
*Jul 4 22:04:24.011: OSPF-1 ADJ Fa0/0: Rcv DBD from 10.4.4.254 seq 0xA0CF5CC opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART
*Jul 4 22:04:24.011: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254 has larger interface MTU
*Jul 4 22:04:28.435: OSPF-1 ADJ Fa0/0: Rcv DBD from 10.4.4.254 seq 0xA0CF5CC opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART
*Jul 4 22:04:28.439: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254 has larger interface MTU
*Jul 4 22:04:28.623: OSPF-1 ADJ Fa0/0: Send DBD to 10.4.4.254 seq 0x1D43 opt 0x52 flag 0x7 len 32
*Jul 4 22:04:28.623: OSPF-1 ADJ Fa0/0: Retransmitting DBD to 10.4.4.254 [1]
[...]
*Jul 4 22:06:27.955: OSPF-1 ADJ Fa0/0: Rcv DBD from 10.4.4.254 seq 0xA0CF5CC opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART
*Jul 4 22:06:27.955: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254 has larger interface MTU
*Jul 4 22:06:28.147: OSPF-1 ADJ Fa0/0: Killing nbr 10.4.4.254 due to excessive (25) retransmissions
*Jul 4 22:06:28.147: OSPF-1 ADJ Fa0/0: 10.4.4.254 address 10.4.4.254 is dead, state DOWN
*Jul 4 22:06:28.151: %OSPF-5-ADJCHG: Process 1, Nbr 10.4.4.254 on FastEthernet0/0 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
*Jul 4 22:06:28.151: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254: Clean-up dbase exchange
*Jul 4 22:06:32.555: OSPF-1 ADJ Fa0/0: Nbr 10.4.4.254 10.4.4.254 is currently ignored


On IOS-XR we have "debug ospf instance-id adj", which returns the same output.

On Juniper JunOS we can configure "set protocols ospf traceoptions flag database-description" which produces the output below:


Jul 4 22:04:41.813313 OSPF rcvd DbD 10.4.4.99 -> 10.4.4.254 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:04:41.814387 Version 2, length 32, ID 10.4.4.99, area 0.0.0.0
Jul 4 22:04:41.814466 checksum 0x0, authtype 0
Jul 4 22:04:41.814566 options 0x52, i 1, m 1, ms 1, r 0, seq 0x1d43, mtu 1492
Jul 4 22:04:41.815182 RPD_OSPF_NBRUP: OSPF neighbor 10.4.4.99 (realm ospf-v2 vlan.0 area 0.0.0.0) state changed from Init to ExStart due to 2WayRcvd (event reason: initial DBD packet was received)
Jul 4 22:04:41.815388 1400 Max dbd packet
Jul 4 22:04:41.815763 OSPF sent DbD 10.4.4.254 -> 224.0.0.5 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:04:41.815889 Version 2, length 32, ID 10.4.4.254, area 0.0.0.0
Jul 4 22:04:41.815970 options 0x52, i 1, m 1, ms 1, r 0, seq 0xa0cf5cc, mtu 1500
Jul 4 22:04:46.254104 OSPF resend last DBD to 10.4.4.99
Jul 4 22:04:46.254753 OSPF sent DbD 10.4.4.254 -> 224.0.0.5 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:04:46.254861 Version 2, length 32, ID 10.4.4.254, area 0.0.0.0
Jul 4 22:04:46.254966 options 0x52, i 1, m 1, ms 1, r 0, seq 0xa0cf5cc, mtu 1500
Jul 4 22:04:46.447212 OSPF rcvd DbD 10.4.4.99 -> 10.4.4.254 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:04:46.447359 Version 2, length 32, ID 10.4.4.99, area 0.0.0.0
Jul 4 22:04:46.447439 checksum 0x0, authtype 0
Jul 4 22:04:46.447584 options 0x52, i 1, m 1, ms 1, r 0, seq 0x1d43, mtu 1492
Jul 4 22:04:50.313983 OSPF resend last DBD to 10.4.4.99
[...]
Jul 4 22:06:45.737775 OSPF sent DbD 10.4.4.254 -> 224.0.0.5 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:06:45.737882 Version 2, length 32, ID 10.4.4.254, area 0.0.0.0
Jul 4 22:06:45.738103 options 0x52, i 1, m 1, ms 1, r 0, seq 0xa0cf5cc, mtu 1500
Jul 4 22:06:50.336478 OSPF resend last DBD to 10.4.4.99
Jul 4 22:06:50.337124 OSPF sent DbD 10.4.4.254 -> 224.0.0.5 (vlan.0 IFL 69 area 0.0.0.0)
Jul 4 22:06:50.337291 Version 2, length 32, ID 10.4.4.254, area 0.0.0.0
Jul 4 22:06:50.337414 options 0x52, i 1, m 1, ms 1, r 0, seq 0xa0cf5cc, mtu 1500
Jul 4 22:06:54.868260 RPD_OSPF_NBRDOWN: OSPF neighbor 10.4.4.99 (realm ospf-v2 vlan.0 area 0.0.0.0) state changed from ExStart to Init due to 1WayRcvd (event reason: neighbor is in one-way mode)

References

RFC 2328 - OSPF Version 2


Sunday, 19 April 2015

Quick build - PPPoE Client on Cisco IOS

In this quick-build guide I'll show you how to set up a very basic IOS-based PPPoE client. This example is from a Cisco 819 router; however, the config is pretty much the same on most ISR-type devices. As usual, the build covers the simplest common use case (no VLAN tags, dynamic AC selection, negotiated IP).

Note, if you want a PPPoE access concentrator to go with your client, you may find the Quick Build: Cisco IOS PPPoE Server with RADIUS Authentication post useful.

The Setup



The PPPoE client is set up in two parts - the first is the physical interface which connects towards the access concentrator, the second is a dialer interface which is instantiated when the PPPoE session comes up. We'll build the physical interface first, as follows:

interface GigabitEthernet0
 description To AC
 pppoe enable
 pppoe-client dial-pool-number 1
 no shutdown
!

Pretty minimal... turn PPPoE on, and tell it which dialer pool to use. Note, in older versions of IOS the command was simply "pppoe-client dial-pool-number 1". Next, we have to configure the dialer interface, as follows:

interface Dialer1
 ip address negotiated
 encapsulation ppp
 dialer pool 1
 dialer-group 1
 ppp authentication chap callin
 ppp chap hostname user@domain
 ppp chap password 0 b0dges
!
dialer-list 1 protocol ip permit
ip route 0.0.0.0 0.0.0.0 Dialer1

This creates the dialer interface that we will use, tells it to use PPP and to pick up its IP address dynamically.

The "dialer pool" command places this dialer into the pool where the physical interface was set to look, while the "dialer-group" command specifies which dialer-list will be used to decide what traffic is interesting (i.e. will bring or keep the PPPoE session up).

The PPP commands force the authentication type to CHAP, specify that we will not make the AC authenticate to us (generally not supported) and set the CHAP hostname (think username) and password.

Finally, the dialer-list referred to in the earlier "dialer-group" command is defined to match any IP traffic at all, before a static route is used to force traffic out of the dialer interface.

That really is all that you need! In real life you will probably need to add NAT statements and you will definitely need at least one other interface, but that's the PPPoE part done.

Debugging


There's an entire post dedicated to this subject, but the short version is as follows:

  • Verify that you are getting PPPoE control traffic between your client and the server (debug pppoe packet, debug pppoe event). The sequence should be PADI, PADO, PADR, PADS. PADT indicates that someone is pulling down the session; the debugs should show you who! (There's a small Scapy sketch for watching the discovery exchange after this list.)
  • Check that the static route has been installed in your routing table, as traffic will only trigger PPP to come up if it hits the interface (show ip route)
  • Verify that there is at least one "up" IP interface on the box other than the dialer. If there's no usable source address then any test traffic will fail to encapsulate and you won't be able to bring PPP up (show ip interface brief)
  • If your client can't authenticate, check the credentials (both hostname and password under the Dialer interface) and ensure that the authentication type is CHAP in "callin" mode.
  • Check that PPP is negotiating OK (debug ppp negotiation)
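
On the first point, if you have a machine (or a SPAN port) with visibility of the client-facing segment, Scapy will decode the discovery exchange for you. A rough sketch - the interface name is a placeholder and the code-to-name mapping comes from RFC 2516:

#!/usr/bin/env python
# Sketch: decode the PPPoE discovery exchange with Scapy from a machine that
# can see the client-facing segment. The interface name is a placeholder and
# the code-to-name mapping comes from RFC 2516.
from scapy.all import sniff, PPPoED

CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR", 0x65: "PADS", 0xa7: "PADT"}

def show_discovery(pkt):
    if pkt.haslayer(PPPoED):
        code = pkt[PPPoED].code
        print("%s session 0x%x" % (CODES.get(code, hex(code)), pkt[PPPoED].sessionid))

# 0x8863 is the PPPoE discovery ethertype
sniff(iface="eth0", filter="ether proto 0x8863", prn=show_discovery, store=0)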