Sunday, 31 January 2016

Invalid Command Stopping Cisco 7600 Supervisor Redundancy Entering SSO / Hot Standby Mode

I recently ran into a problem while applying a base build to a Cisco 7600 router with dual supervisors. It didn't seem to be documented anywhere, so I thought I'd record the issue and the eventual fix here.

The gist of the problem was that the secondary supervisor would not go from cold standby to hot. In other words, if the active supervisor crashed, the chassis would have to reboot in order to use the standby supervisor. The system gave the reason as a software mismatch, even though the two cards had the same image installed:

BUILD#show redundancy states
       my state = 13 -ACTIVE
     peer state = 4  -STANDBY COLD
           Mode = Duplex
           Unit = Primary
        Unit ID = 5

Redundancy Mode (Operational) = rpr    Reason: Software mismatch
Redundancy Mode (Configured)  = sso
Redundancy State              = rpr
     Maintenance Mode = Disabled
 Communications = Up

   client count = 159
 client_notification_TMR = 30000 milliseconds
          keep_alive TMR = 9000 milliseconds
        keep_alive count = 1
    keep_alive threshold = 18
           RF debug mask = 0x0


I won't say exactly which image this was, but it was an SSO-capable release of IOS 15 and the two supervisors were *definitely* running the same code (one was copied from the other). The tale of software incompatibility seemed unlikely.

BUILD#show log
[snip]
*Jan  6 17:21:33.339: %SYS-SP-STDBY-5-RESTART: System restarted --
Cisco IOS Software, c7600s72033_sp Software (c7600s72033_sp-ADVIPSERVICESK9-M), Version 15.x(x)x, RELEASE SOFTWARE (xx)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Mon 00-Jan-00 00:00 by prod_rel_team
*Jan  6 17:22:50.255 GMT: Config Sync: Bulk-sync failure due to Servicing Incompatibility. Please check full list of mismatched commands via:
  show redundancy config-sync failures mcl
*Jan  6 17:22:50.255 GMT: Config Sync: Starting lines from MCL file:
-ipv6 mfib hardware-switching replication-mode ingress
*Jan  6 17:22:50.255 GMT: %ISSU-SP-3-INCOMPATIBLE_PEER_UID: Setting image (c7600s72033_sp-ADVIPSERVICESK9-M), version (15.x(x)xx) on peer uid (6) as incompatible
*Jan  6 17:22:50.995 GMT: %RF-SP-5-RF_RELOAD: Peer reload. Reason: ISSU Incompatibility
*Jan  6 17:22:50.995 GMT: %OIR-SP-3-PWRCYCLE: Card in module 6, is being power-cycled (RF request)
*Jan  6 17:22:51.999 GMT: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
*Jan  6 17:22:53.195 GMT: %SNMP-5-MODULETRAP: Module 6 [Down] Trap
*Jan  6 17:24:19.791 GMT: %ISSU-SP-3-PEER_IMAGE_INCOMPATIBLE: Peer image (c7600s72033_sp-ADVIPSERVICESK9-M), version (15.x(x)xx) on peer uid (6) is incompatible
*Jan  6 17:24:19.791 GMT: %ISSU-SP-3-PEER_IMAGE_INCOMPATIBLE: Peer image (c7600s72033_sp-ADVIPSERVICESK9-M), version (15.x(x)xx) on peer uid (6) is incompatible
*Jan  6 17:25:53.149 GMT: %PFREDUN-SP-4-INCOMPATIBLE: Defaulting to RPR mode (Runtime incompatible)
*Jan  6 17:25:54.154 GMT: %PFREDUN-SP-6-ACTIVE: Standby initializing for RPR mode
*Jan  6 17:25:58.471 GMT: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:00 to ensure console debugging output.
*Jan  6 17:25:58.763 GMT: %FABRIC-SP-5-CLEAR_BLOCK: Clear block option is off for the fabric in slot 6.
*Jan  6 17:25:58.859 GMT: %FABRIC-SP-5-FABRIC_MODULE_BACKUP: The Switch Fabric Module in slot 6 became standby
*Jan  6 17:26:00.299 GMT: %SNMP-5-MODULETRAP: Module 6 [Up] Trap
*Jan  6 17:26:00.279 GMT: %DIAG-SP-6-BYPASS: Module 6: Diagnostics is bypassed
*Jan  6 17:26:00.375 GMT: %OIR-SP-6-INSCARD: Card inserted in slot 6, interfaces are now online
*Jan  6 17:26:06.435 GMT: %RF-SP-5-RF_TERMINAL_STATE: Terminal state reached for (RPR)
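The useful line is easy to miss in all that noise: it is the one starting with "-" straight after "Starting lines from MCL file:". A rough Python sketch for fishing those lines out of a saved log follows, assuming the log always looks like the excerpt above (I have only seen this one example). The function name mismatched_commands is hypothetical.

```python
# Hypothetical helper: collects the commands listed after the
# "Starting lines from MCL file:" message in a saved "show log" capture.
# Assumes each mismatched command sits on its own line prefixed with "-".
def mismatched_commands(log_text):
    commands = []
    in_mcl = False
    for line in log_text.splitlines():
        if "Starting lines from MCL file:" in line:
            in_mcl = True
            continue
        if in_mcl:
            if line.startswith("-"):
                commands.append(line[1:].strip())
            else:
                # Any other line (e.g. the next timestamped message) ends the list.
                in_mcl = False
    return commands

sample_log = """\
*Jan  6 17:22:50.255 GMT: Config Sync: Starting lines from MCL file:
-ipv6 mfib hardware-switching replication-mode ingress
*Jan  6 17:22:50.995 GMT: %RF-SP-5-RF_RELOAD: Peer reload. Reason: ISSU Incompatibility
"""

for command in mismatched_commands(sample_log):
    print(command)
```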

OK, so clearly it doesn't like the "ipv6 mfib hardware-switching replication-mode ingress" command for some reason. Why it would work on one and not the other is a mystery but hey... I don't have big plans for IPv6 multicast so I don't care what replication mode it's in - let's just delete the offending command:

BUILD#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
BUILD(config)#no ipv6 mfib hardware-switching replication-mode ingress
no ipv6 mfib hardware-switching replication-mode ingress
         ^
% Invalid input detected at '^' marker.

So I can't negate the command. There isn't even an "mfib" keyword under "no ipv6":

BUILD(config)#no ipv6 ?
  access-list        Configure access lists
  [snip]
  local              Specify local options
  mld                Global mld commands
  [snip]
  spd                Selective Packet Discard (SPD)

In fact, even the original command seems to be invalid:

BUILD(config)#ipv6 mfib hardware-switching replication-mode ?
% Unrecognized command

And yet here it is in the config from which we booted:

BUILD#show start | inc ipv6 
ipv6 unicast-routing
ipv6 mfib hardware-switching replication-mode ingress
no mls flow ipv6

?!?!

I guess it's one of those legacy commands that the CLI is bodged to accept but that don't show up in the help. Except it won't accept it here either :| Eventually I found an equivalent command that it *would* take:

BUILD(config)#no ipv6 multicast hardware-switching replication-mode ingress 
Warning: This command will change the replication mode for all address families.
BUILD(config)#do show run | inc ipv6
ipv6 unicast-routing
no mls flow ipv6
BUILD(config)#



At last, the problem config is gone! We're almost there but not quite: the previous failures stay recorded on the active supervisor even if the standby is reloaded, so we have to kick it to re-evaluate:

BUILD#show redundancy config-sync failures mcl 
Mismatched Command List
-----------------------
-ipv6 mfib hardware-switching replication-mode ingress

BUILD#redundancy config-sync validate mismatched-commands  
*Jan  7 08:26:28.600 GMT: CONFIG SYNC: MCL validation succeeded
*Jan  7 08:26:28.600 GMT: %ISSU-SP-3-PEER_IMAGE_REM_FROM_INCOMP_LIST: Peer image (c7600s72033_sp-ADVIPSERVICESK9-M), version (15.x(x)xx) on peer uid (6) being removed from the incompatibility list
BUILD#show redundancy config-sync failures mcl 
Mismatched Command List
-----------------------

The list is Empty

BUILD#redundancy reload peer 
Reload peer [confirm]
Preparing to reload peer

BUILD#

*Jan  7 08:27:16.096 GMT:  RP sending reload request to Standby. User: admin on console, Reason: Admin reload CLI

BUILD#

Eventually...
 

*Jan  7 08:33:37.532 GMT: %HA_CONFIG_SYNC-6-BULK_CFGSYNC_SUCCEED: Bulk Sync succeeded
*Jan  7 08:33:37.552 GMT: %RF-SP-5-RF_TERMINAL_STATE: Terminal state reached for (SSO)
*Jan  7 08:33:36.572 GMT: %PFREDUN-SP-STDBY-6-STANDBY: Ready for SSO mode
BUILD#show redundancy
Redundant System Information :
------------------------------
       Available system uptime = 15 hours, 20 minutes
Switchovers system experienced = 0
              Standby failures = 3
        Last switchover reason = none

                 Hardware Mode = Duplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso
              Maintenance Mode = Disabled
                Communications = Up

Current Processor Information :
-------------------------------
               Active Location = slot 5
        Current Software state = ACTIVE
       Uptime in current state = 15 hours, 19 minutes
                 Image Version = Cisco IOS Software, c7600s72033_rp Software (c7600s72033_rp-ADVIPSERVICESK9-M), Version 15.x(x)xx, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Wed 01-Aug-12 20:15 by prod_rel_team
                          BOOT = sup-bootdisk:/c7600s72033-advipservicesk9-mz.15x-x.xx.bin,1;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102

Peer Processor Information :
----------------------------
              Standby Location = slot 6
        Current Software state = STANDBY HOT
       Uptime in current state = 3 minutes
                 Image Version = Cisco IOS Software, c7600s72033_rp Software (c7600s72033_rp-ADVIPSERVICESK9-M), Version 15.x(x)xx, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Wed 01-Aug-12 20:15 by prod_rel_team
                          BOOT = sup-bootdisk:/c7600s72033-advipservicesk9-mz.15x-x.xx.bin,1;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102
BUILD#

Win!
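If you want to confirm the end state by script rather than by eye, the two things to look for in "show redundancy" are that the operating mode matches the configured mode and that the peer is STANDBY HOT. Here is a minimal Python sketch, assuming the output has been saved as text and is laid out like the capture above. The function name standby_is_hot is made up for illustration.

```python
# Hypothetical helper: checks saved "show redundancy" text for a healthy
# SSO pair. Assumes section headers end in "Information :" and fields are
# written as "name = value", as in the capture above.
def standby_is_hot(output):
    fields = {}
    section = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.endswith("Information :"):
            section = stripped.split(" Information")[0]
        elif "=" in stripped:
            key, _, value = stripped.partition("=")
            fields[(section, key.strip())] = value.strip()
    configured = fields.get(("Redundant System", "Configured Redundancy Mode"))
    operating = fields.get(("Redundant System", "Operating Redundancy Mode"))
    peer_state = fields.get(("Peer Processor", "Current Software state"))
    return (
        configured is not None
        and configured == operating
        and peer_state == "STANDBY HOT"
    )

sample = """\
Redundant System Information :
------------------------------
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso

Current Processor Information :
-------------------------------
        Current Software state = ACTIVE

Peer Processor Information :
----------------------------
        Current Software state = STANDBY HOT
"""

print(standby_is_hot(sample))
```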
