
CIS 187 CCNP SWITCH Ch. 5 IP SLAs
Rick Graziani Cabrillo College graziani@cabrillo.edu Spring 2011

Understanding High Availability

Components of High Availability


The components of high availability are:
- Redundancy
- Technology (including hardware and software features)
- People
- Processes
- Tools

Redundancy
Geographic diversity and path diversity are often included.
Dual devices and links are common, as are dual WAN providers. Dual data centers are sometimes used, especially by large companies and large e-commerce sites. Dual colocation facilities, dual phone central office facilities, and dual power substations can also be implemented.

Technology
- Cisco Nonstop Forwarding (NSF)
- Stateful Switchover (SSO)
- Graceful Restart
- Cisco IOS IP Service Level Agreements (SLA)
- Object Tracking
- Firewall Stateful Failover

People
- Use Prepare, Plan, Design, Implement, Operate, and Optimize (PPDIOO) as a guide.
- Good work habits and attention to detail are important.
- Skills are acquired through ongoing technical training.
- Good communication and documentation are critical.
- Use lab testing to simulate failover scenarios.
- Take time to design.
- Identify roles and responsibilities.
- Align teams with services.
- Ensure that staff have time to do the job.

Processes
Organizations should build repeatable processes. Organizations should use labs appropriately. Organizations need meaningful change controls. Management of operational changes is important.

Tools
Network diagrams. Documentation of network design evolution. Key addresses, VLANs, and servers documented. Documentation tying services to applications and physical servers.

Resiliency for High Availability



Network-Level Resiliency
Built with device and link redundancy. Employs fast convergence. Relies on monitoring with NTP, SNMP, Syslog, and IP SLA.

High Availability and Failover Times

- Tuned routing protocols can fail over in less than 1 second.
- RSTP converges in about 1 second.
- EtherChannel can fail over in approximately 1 second.
- HSRP default timers are 3 seconds for hello and 10 seconds for hold time.
- Stateful service modules typically fail over within 3 to 5 seconds.
- TCP/IP stacks have a tolerance of up to 9 seconds.
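The sketch below shows where the default HSRP numbers come from. It assumes a hypothetical SVI for VLAN 10, HSRP group 1, and addresses in 10.1.10.0/24. The timers line only restates the defaults explicitly.

```text
Switch(config)# interface vlan 10
Switch(config-if)# ip address 10.1.10.2 255.255.255.0
! Hypothetical virtual gateway address shared by the HSRP group
Switch(config-if)# standby 1 ip 10.1.10.1
! Default timers: hello every 3 seconds, hold time 10 seconds
Switch(config-if)# standby 1 timers 3 10
```

You can enter subsecond values with the msec keyword (for example, standby 1 timers msec 200 msec 750). Faster failover comes at the cost of more hello traffic and CPU load, so verify the supported range on your platform.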

Optimal Redundancy
- Provide alternate paths.
- Avoid too much redundancy.
- Avoid single points of failure.
- Use Cisco NSF with SSO, if applicable.
- Use Cisco NSF with routing protocols.

Provide Alternate Paths


Use redundant distribution-to-core links in case a core switch fails. Link the distribution switches to each other to support summarization of routing information from the distribution layer to the core.

Avoid Too Much Redundancy


Where should the root switch be placed? With this design, it is not easy to determine where the root switch is located. What links should be in a blocking state? It is hard to determine how many ports will be in a blocking state. What are the implications of STP and RSTP convergence? The network convergence is definitely not deterministic. When something goes wrong, how do you find the source of the problem? The design is much harder to troubleshoot.

Avoid Single Point of Failure

Avoiding single points of failure is a key element of high availability. This is easy to implement at the core and distribution layers, but the access layer switch is a single point of failure. Outages in the access layer can be reduced to 1 to 3 seconds with:
- SSO in a Layer 2 environment
- Cisco NSF with SSO in a Layer 3 environment

Cisco NSF with SSO (Stateful Switchover)


SSO is a supervisor redundancy mechanism in Cisco IOS that enables supervisor switchover at Layers 2, 3, and 4. SSO enables the standby route processor (RP) to take control after a fault on the active RP. Cisco NSF is a Layer 3 function that works with SSO to minimize the time the network is unavailable following a switchover. It does this by continuing to forward IP packets after the RP switchover.

Routing Protocols and NSF (Cisco Nonstop Forwarding)


NSF enables continued forwarding of packets along known routes while routing protocol information is restored during a switchover. The switchover must complete before the routing protocol dead and hold timers expire. Otherwise, routing peers will reset their adjacencies and reroute traffic.

Implementing Network Monitoring

Logging Services

Events on networking devices can be logged, covering various events at various levels of severity. Events can be logged to:
- Console (default), shown on the console display
- Buffer
- Syslog server
Examples of logged events:
- Interfaces going up or down
- Configuration changes
- Routing protocol adjacency changes


Logging severity levels on Cisco Systems devices are as follows:
(0) Emergencies
(1) Alerts
(2) Critical
(3) Errors
(4) Warnings
(5) Notifications
(6) Informational
(7) Debugging
By default, all messages from level 0 to 7 are logged to the console.


Console

By default, all messages from level 0 to 7 are logged to the console. You can adjust the logging severity level of the console with the optional level parameter:
logging console level
This command limits the messages displayed on the console terminal to the specified level and numerically lower levels. You can enter either the level number or the level name.
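As an illustration, suppose you want only warnings and more severe messages on the console (the choice of level is an assumption). Either of the following forms should have the same effect:

```text
! Keep levels 0-4 (emergencies through warnings) on the console
Switch(config)# logging console warnings
! Equivalent form using the level number
Switch(config)# logging console 4
```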


Buffer
logging buffered [buffer-size | level]
Logging to the buffer may or may not be enabled by default, depending on the platform. By default, messages of all severity levels are logged to the buffer.
show logging displays the contents of the buffer.
The buffer is circular. When it reaches its maximum capacity, the oldest messages are discarded so that new messages can be logged.

Configuring Syslog
To configure logging to the buffer of the local switch, use the command logging buffered.
Switch(config)# logging buffered ?
  <0-7>               Logging severity level
  <4096-2147483647>   Logging buffer size
  alerts              Immediate action needed            (severity=1)
  critical            Critical conditions                (severity=2)
  debugging           Debugging messages                 (severity=7)
  discriminator       Establish MD-Buffer association
  emergencies         System is unusable                 (severity=0)
  errors              Error conditions                   (severity=3)
  informational       Informational messages             (severity=6)
  notifications       Normal but significant conditions  (severity=5)
  warnings            Warning conditions                 (severity=4)
  xml                 Enable logging in XML to XML logging buffer
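Here is a short sketch that uses the options listed above. The 16384-byte size and the informational level are illustrative assumptions. The size must fall within the 4096-2147483647 range shown in the help output.

```text
! Keep levels 0-6 in a 16384-byte circular buffer
Switch(config)# logging buffered 16384 informational
Switch(config)# end
! Display the buffered messages
Switch# show logging
```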


Server
logging ip-address (on some IOS versions, logging host ip-address)
By default, only messages of severity level 6 or lower (informational and more severe) are logged to the syslog server. You can change this with the logging trap level command.

To configure a syslog server, use the logging ip_addr global configuration command. To limit which severity levels of messages are sent to the syslog server, use the logging trap level global configuration command.
Switch(config)# logging trap ?
  <0-7>          Logging severity level
  alerts         Immediate action needed            (severity=1)
  critical       Critical conditions                (severity=2)
  debugging      Debugging messages                 (severity=7)
  emergencies    System is unusable                 (severity=0)
  errors         Error conditions                   (severity=3)
  informational  Informational messages             (severity=6)
  notifications  Normal but significant conditions  (severity=5)
  warnings       Warning conditions                 (severity=4)
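The following minimal sketch sends messages to a syslog server. It assumes a hypothetical server at 10.1.1.100. Older IOS versions use logging 10.1.1.100 instead of logging host.

```text
! Hypothetical syslog server address
Switch(config)# logging host 10.1.1.100
! Send only levels 0-4 (emergencies through warnings) to the server
Switch(config)# logging trap warnings
```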

Sample Syslog Messages


08:01:13: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/5, changed state to up
08:01:23: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 10.1.1.1 (Vlan1) is up: new adjacency
08:02:31: %LINK-3-UPDOWN: Interface FastEthernet0/8, changed state to up
08:18:20: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/5, changed state to down
08:18:22: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/5, changed state to up
08:18:24: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/2, changed state to down
08:18:24: %ILPOWER-5-IEEE_DISCONNECT: Interface Fa0/2: PD removed
08:18:26: %LINK-3-UPDOWN: Interface FastEthernet0/2, changed state to down
08:19:49: %ILPOWER-7-DETECT: Interface Fa0/2: Power Device detected: Cisco PD
08:19:53: %LINK-3-UPDOWN: Interface FastEthernet0/2, changed state to up
08:19:53: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/2, changed state to up

Syslog Severity Levels


Smaller numerical levels are the more critical syslog alarms.

Syslog Severity    Severity Level
Emergency          Level 0 (highest level)
Alert              Level 1
Critical           Level 2
Error              Level 3
Warning            Level 4
Notice             Level 5
Informational      Level 6
Debugging          Level 7

Syslog Facilities

Facilities are service identifiers. They identify and categorize system state data for error and event message reporting. Cisco IOS has more than 500 facilities. The most common syslog facilities are:
- IP
- OSPF
- SYS (operating system)
- IP Security (IPsec)
- Route Switch Processor (RSP)
- Interface (IF)

Syslog Message Format

System messages begin with a percent sign (%) and take the form %FACILITY-SEVERITY-MNEMONIC: Message-text. For example, in %LINK-3-UPDOWN the facility is LINK, the severity is 3, and the mnemonic is UPDOWN.
- Facility: A code of two or more uppercase letters that indicates the hardware device, protocol, or module of the system software.
- Severity: A single-digit code from 0 to 7 that reflects the severity of the condition. The lower the number, the more serious the situation.
- Mnemonic: A code that uniquely identifies the error message.
- Message-text: A text string describing the condition. This portion of the message sometimes contains detailed information about the event, including terminal port numbers, network addresses, or addresses that correspond to locations in the system memory address space.

Verifying Syslog Configuration


Use the show logging command to display the contents of the local log buffer. Use the pipe character (|) with keywords such as include or begin to filter the output.

Switch# show logging | include LINK-3
2d20h: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
2d20h: %LINK-3-UPDOWN: Interface FastEthernet0/2, changed state to up
2d20h: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to up
Switch# show logging | begin %DUAL
2d22h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(10) 10: Neighbor 10.1.253.13 (FastEthernet0/11) is down: interface down
2d22h: %LINK-3-UPDOWN: Interface FastEthernet0/11, changed state to down
2d22h: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/11, changed state to down

Cisco IP SLA

IP SLA is a feature of Cisco IOS software that allows you to configure a router to send synthetic traffic to either:
- A host computer
- A router that has been configured to respond (an IP SLA Responder)

IP SLA is very useful for performance measurement, monitoring, and network baselining. You can tie the results of IP SLA operations to other features of your router and trigger actions based on the results of the probe.

To implement IP SLA network performance measurement, perform the following tasks:
1. Enable the IP SLA Responder, if required.
2. Configure the required IP SLA operation type.
3. Configure any options available for the specified operation type.
4. Configure threshold conditions, if required.
5. Schedule the operation to run, and then let it run for a period of time to gather statistics.
6. Display and interpret the results of the operation using the Cisco IOS CLI or a network management system (NMS) with Simple Network Management Protocol (SNMP).

Depending on the type of probe you set up, you may or may not need to configure an IP SLA Responder. For example, a simple echo probe to an IP host does not need a responder. An IP SLA Responder allows more detailed information to be retrieved.

The IP SLA Responder is a component embedded in the destination Cisco routing device. It:
- Allows the system to anticipate and respond to IP SLA request packets.
- Provides accurate measurements without the need for dedicated probes, plus additional statistics that standard ICMP-based measurements cannot provide (see IP SLAs with Responder Time Stamps).
The IP SLA source (a Cisco device) uses the IP SLA Control Protocol to communicate with the IP SLA Responder. The source tells the responder which port to listen on and respond to. The responder then enables the specified UDP or TCP port for a specific duration.
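The following sketch shows a probe that relies on a responder. It uses the newer ip sla command syntax found in later IOS releases, not the ip sla monitor syntax used elsewhere in this chapter. The device names, the destination 10.2.2.2, operation number 5, and UDP port 16384 are hypothetical.

```text
! On the destination device: enable the IP SLA Responder
R2(config)# ip sla responder
! On the source device: UDP jitter operation toward the responder
R1(config)# ip sla 5
R1(config-ip-sla)# udp-jitter 10.2.2.2 16384
R1(config-ip-sla-jitter)# frequency 30
R1(config-ip-sla-jitter)# exit
R1(config)# ip sla schedule 5 life forever start-time now
```

Unlike an ICMP echo, a UDP jitter operation needs the responder to timestamp and return the test packets. This is what makes per-direction jitter and packet loss statistics possible.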


Example: Network Availability


Router(config)# ip route 0.0.0.0 0.0.0.0 fa0/0
Router(config)# ip route 0.0.0.0 0.0.0.0 fa0/1 5

Customer A is multihoming to two ISPs. Customer A is not using BGP with the ISPs. Instead, it uses static default routes.
- Two static default routes with different administrative distances are configured.
- The link to ISP-1 is the primary link (default AD of 1).
- The link to ISP-2 is the backup link (AD of 5).
The static default route with the lower administrative distance is preferred and installed in the routing table. However, a problem can occur within the ISP-1 domain while its interface to Customer A is still up. In that case, all traffic from Customer A still goes to that ISP and may then be lost within the ISP.


The solution to this issue is the Cisco IOS IP SLAs functionality. Configure the IP SLAs to:
- Continuously check the reachability of a specific destination, such as the provider edge (PE) router interface, the ISP's DNS server, or any other specific destination (here, 10.1.1.1 and 172.16.1.1).
- Conditionally announce the default route only if connectivity is verified.

R1(config)# ip sla monitor 11
R1(config-rtr)# type echo protocol ipIcmpEcho 10.1.1.1 source-interface fa0/0
R1(config-rtr)# frequency 10
R1(config)# ip sla monitor schedule 11 life forever start-time now
R1(config)# track 1 rtr 11 reachability
R1(config)# ip route 0.0.0.0 0.0.0.0 fa0/0 2 track 1


Defining the Probe:
- ip sla monitor 11: defines probe 11.
- type echo: specifies that ICMP echoes are sent to destination 10.1.1.1 to check connectivity, with a source interface of FastEthernet0/0.
- frequency 10: repeats the connectivity test every 10 seconds.
- ip sla monitor schedule 11 life forever start-time now: starts the probe now and runs it forever.
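To check that probe 11 is configured as intended and is collecting results, the ip sla monitor command family provides show commands such as the following. This is a sketch. Output varies by release and is not reproduced here.

```text
! Display the configured parameters of probe 11
R1# show ip sla monitor configuration 11
! Display the latest return code and success/failure counters of probe 11
R1# show ip sla monitor statistics 11
```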


Defining the Tracking Object:
- track 1 rtr 11 reachability: creates tracked object 1 and links it to probe 11 (defined in the first step), so that the reachability of 10.1.1.1 is tracked.
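A minimal way to watch the tracked object, assuming the configuration above is in place, is shown below. The output (not reproduced here) indicates whether reachability is currently Up or Down and how many times it has changed.

```text
! Display the state of tracked object 1 and the probe it follows
R1# show track 1
```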


Defining an action based on the status of the tracking object:
- ip route 0.0.0.0 0.0.0.0 fa0/0 2 track 1: conditionally announces the default route out fa0/0, with an administrative distance of 2. The route is announced only while tracking object 1 is up, that is, while the probe is successful.
To summarize: if 10.1.1.1 is reachable, a static default route out Fa0/0 with an administrative distance of 2 is installed in the routing table.
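The following sketch confirms which default route is currently installed, assuming the configuration above. While track 1 is up, the default route out fa0/0 should appear with administrative distance 2, shown as [2/0] in the route entry.

```text
! List static routes currently in the routing table
R1# show ip route static
```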

R1(config)# ip sla monitor 22
R1(config-rtr)# type echo protocol ipIcmpEcho 172.16.1.1 source-interface fa0/1
R1(config-rtr)# frequency 10
R1(config)# ip sla monitor schedule 22 life forever start-time now
R1(config)# track 2 rtr 22 reachability
R1(config)# ip route 0.0.0.0 0.0.0.0 fa0/1 3 track 2


Defining the Probe:
- ip sla monitor 22: defines probe 22.
- type echo: specifies that ICMP echoes are sent to destination 172.16.1.1 to check connectivity, with a source interface of FastEthernet0/1.
- frequency 10: repeats the connectivity test every 10 seconds.
- ip sla monitor schedule 22 life forever start-time now: starts the probe now and runs it forever.


Defining the Tracking Object:
- track 2 rtr 22 reachability: creates tracked object 2 and links it to probe 22 (defined in the first step), so that the reachability of 172.16.1.1 is tracked.


Defining an action based on the status of the tracking object:
- ip route 0.0.0.0 0.0.0.0 fa0/1 3 track 2: conditionally announces the default route out fa0/1, with an administrative distance of 3. The route is announced only while tracking object 2 is up, that is, while the probe is successful.
To summarize: if 172.16.1.1 is reachable, a static default route out fa0/1 with an administrative distance of 3 is offered to the routing table. Because this route has the higher AD of 3, it serves only as the backup path while the path via R2 is available.


R1(config)# ip sla monitor 11
R1(config-rtr)# type echo protocol ipIcmpEcho 10.1.1.1 source-interface fa0/0
R1(config-rtr)# frequency 10
R1(config)# ip sla monitor schedule 11 life forever start-time now
R1(config)# track 1 rtr 11 reachability
R1(config)# ip route 0.0.0.0 0.0.0.0 fa0/0 2 track 1
R1(config)# ip sla monitor 22
R1(config-rtr)# type echo protocol ipIcmpEcho 172.16.1.1 source-interface fa0/1
R1(config-rtr)# frequency 10
R1(config)# ip sla monitor schedule 22 life forever start-time now
R1(config)# track 2 rtr 22 reachability
R1(config)# ip route 0.0.0.0 0.0.0.0 fa0/1 3 track 2

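- If 10.1.1.1 is reachable, a static default route via R2 with an administrative distance of 2 is installed in the routing table.
- If 172.16.1.1 is reachable, a static default route via R3 with an administrative distance of 3 is available to the routing table as a backup path.
- If 10.1.1.1 becomes unreachable, tracking object 1 goes down and the AD 2 route is withdrawn. The AD 3 default route via R3 is then installed, provided 172.16.1.1 is still reachable.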
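Later IOS releases replace the ip sla monitor commands with the ip sla command family. The following hedged sketch shows roughly how the same design looks in that newer syntax. Check the exact keywords against your release.

```text
! Probe 11: ICMP echo to 10.1.1.1 via ISP-1
R1(config)# ip sla 11
R1(config-ip-sla)# icmp-echo 10.1.1.1 source-interface FastEthernet0/0
R1(config-ip-sla-echo)# frequency 10
R1(config-ip-sla-echo)# exit
R1(config)# ip sla schedule 11 life forever start-time now
! Tracked object 1 follows probe 11
R1(config)# track 1 ip sla 11 reachability
R1(config-track)# exit
R1(config)# ip route 0.0.0.0 0.0.0.0 FastEthernet0/0 2 track 1
! Probe 22: ICMP echo to 172.16.1.1 via ISP-2
R1(config)# ip sla 22
R1(config-ip-sla)# icmp-echo 172.16.1.1 source-interface FastEthernet0/1
R1(config-ip-sla-echo)# frequency 10
R1(config-ip-sla-echo)# exit
R1(config)# ip sla schedule 22 life forever start-time now
! Tracked object 2 follows probe 22
R1(config)# track 2 ip sla 22 reachability
R1(config-track)# exit
R1(config)# ip route 0.0.0.0 0.0.0.0 FastEthernet0/1 3 track 2
```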


Example: Type DNS

RouterB(config)# ip sla monitor 11
RouterB(config-rtr)# type dns target-addr www.cisco.com name-server 172.20.2.132
RouterB(config-rtr)# frequency 60
RouterB(config-rtr)# exit
RouterB(config)# ip sla monitor schedule 11 life forever start-time now

Use the IP SLAs DNS operation to measure the difference between the time a Cisco device sends a DNS request and the time it receives a reply. The configuration above defines an IP SLAs operation of type DNS that resolves the IP address of the hostname www.cisco.com. DNS operation number 11 is scheduled to start immediately and run indefinitely. To view and interpret the results of an IP SLAs operation, use the show ip sla monitor statistics command.
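For comparison, here is a sketch of the same DNS operation in the newer ip sla syntax, assuming a release that supports it:

```text
RouterB(config)# ip sla 11
RouterB(config-ip-sla)# dns www.cisco.com name-server 172.20.2.132
RouterB(config-ip-sla-dns)# frequency 60
RouterB(config-ip-sla-dns)# exit
RouterB(config)# ip sla schedule 11 life forever start-time now
RouterB(config)# end
! Newer equivalent of show ip sla monitor statistics
RouterB# show ip sla statistics 11
```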


Common IP SLA Issues



- Probes cause a burden if they are overscheduled. This happens, for example, when multiple senders overwhelm one receiver, or when the device is already a bottleneck and its CPU utilization is high.
- Senders generally suffer more than receivers from over-scheduling and high probe frequency.
- Probe scheduling can be problematic if the clock on the device is out of sync. For this reason, synchronizing clocks through Network Time Protocol (NTP) is highly recommended.
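The following minimal NTP sketch supports the recommendation above. It assumes a hypothetical NTP server at 10.1.1.5 that both the IP SLA sender and the responder can reach.

```text
! Synchronize the device clock to a common time source
Switch(config)# ntp server 10.1.1.5
Switch(config)# end
! Confirm the clock is synchronized before relying on scheduled probes
Switch# show ntp status
```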


Cisco Internetwork Performance Monitor (IPM)
Several Cisco network management applications use IP SLAs. One example is the Cisco Internetwork Performance Monitor (IPM) in the CiscoWorks2000 RWAN bundle.


Intro to Cisco IP SLA Operations - SolarWinds Video http://www.youtube.com/watch?v=x-fQr24kFKg


Network Performance Monitoring: Using IP SLA Monitor with Orion NPM http://www.youtube.com/watch?v=YKXoexOVsaE&feature=related


Implementing Redundant Supervisor Engines in Catalyst Switches

Redundancy Features on Catalyst 4500/6500


- RPR (Route Processor Redundancy) and RPR+ (RPR+ only on Catalyst 6500)
- SSO (Stateful SwitchOver)
- NSF (Nonstop Forwarding) with SSO


Route Processor Redundancy (RPR)


Redundancy    Catalyst 6500 Failover Time    Catalyst 4500 Failover Time
RPR           2-4 minutes                    Less than 60 seconds
RPR+          30-60 seconds                  --- (not supported)

With RPR, any of the following events triggers a switchover from the active to the standby Supervisor Engine:
- Route Processor (RP) or Switch Processor (SP) crash on the active Supervisor Engine
- A manual switchover from the CLI
- Removal of the active Supervisor Engine
- Clock synchronization failure between Supervisor Engines
In a switchover, the redundant Supervisor Engine becomes fully operational, and the following events occur on the remaining modules during an RPR failover:
- All switching modules are power-cycled.
- Remaining subsystems on the MSFC (including Layer 2 and Layer 3 protocols) are initialized on the previously standby, now active, Supervisor Engine.
- ACLs based on the new active Supervisor Engine are reprogrammed into the Supervisor Engine hardware.
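On Catalyst 4500/6500 switches with redundant Supervisor Engines, you can trigger the manual switchover listed above from privileged EXEC mode. The following sketch assumes the standby Supervisor is present and ready, which you can check with show redundancy states first.

```text
! Verify the peer Supervisor state before switching over
Switch# show redundancy states
! Force the standby Supervisor Engine to take over
Switch# redundancy force-switchover
```

With RPR this switchover power-cycles the switching modules, so it is disruptive and is best scheduled for a maintenance window.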

Route Processor Redundancy Plus (RPR+)



RPR+ enhances Supervisor redundancy compared to RPR by providing the following additional benefits:
- Reduced switchover time: Depending on the configuration, the switchover time is in the range of 30 to 60 seconds.
- No reloading of installed modules: Both the startup configuration and the running configuration stay continually synchronized from the active to the redundant Supervisor Engine. As a result, line modules are not reloaded during a switchover.
- Synchronization of Online Insertion and Removal (OIR) events between the active and standby: Modules in the online state remain online, and modules in the down state remain down, after a switchover.

Configuring and Verifying RPR+ Redundancy


Step 1. Use the redundancy command to start configuring redundancy modes.
Step 2. Use the mode rpr-plus command in redundancy configuration submode to configure RPR+.

Switch# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Switch(config)# redundancy
Switch(config-red)# mode rpr-plus
Switch(config-red)# end
Switch# show redundancy states
my state = 13 ACTIVE
peer state = 1 -DISABLED
Mode = Simplex
Unit = Primary
Unit ID = 1
Redundancy Mode (Operational) = Route Processor Redundancy Plus
Redundancy Mode (Configured) = Route Processor Redundancy Plus
Split Mode = Disabled
Manual Swact = Disabled Reason: Simplex mode
Communications = Down Reason: Simplex mode
<output omitted>

Stateful Switchover (SSO)


Provides minimal Layer 2 traffic disruption during Supervisor switchover. Redundant Supervisor starts up in fully initialized state and synchronizes with startup configuration and running configuration of active Supervisor. Standby Supervisor in SSO mode keeps in sync with active Supervisor for all changes in hardware and software states for features supported via SSO.

Protocols and Features Supported by SSO


- 802.3x (Flow Control)
- 802.3ad (LACP) and PAgP
- 802.1X (Authentication) and Port security
- 802.3af (Inline power)
- VTP
- Dynamic ARP Inspection/DHCP snooping/IP source guard
- IGMP snooping (versions 1 and 2)
- DTP (802.1Q and ISL)
- MST/PVST+/Rapid-PVST
- PortFast/UplinkFast/BackboneFast/BPDU Guard and filtering
- Voice VLAN
- Unicast MAC filtering
- ACL (VLAN ACLs, Port ACLs, Router ACLs)
- QoS (DBL)
- Multicast storm control/broadcast storm control

Configuring and Verifying SSO


Step 1. Enter the redundancy command to start configuring redundancy modes.
Step 2. Use the mode sso command in redundancy configuration submode to configure SSO.

Switch# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Switch(config)# redundancy
Switch(config-red)# mode sso
Changing to sso mode will reset the standby. Do you want to continue? [confirm]
Switch(config-red)# end
Switch# show redundancy states
my state = 13 ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit = Primary
Unit ID = 2
Redundancy Mode (Operational) = Stateful Switchover
Redundancy Mode (Configured) = Stateful Switchover
Split Mode = Disabled
Manual Swact = Enabled
Communications = Up
<output omitted>

NSF with SSO


- Available on Catalyst 4500 and 6500.
- Minimizes the time that the Layer 3 network is unavailable following a Supervisor switchover by continuing to forward IP packets using CEF entries built from the old active Supervisor.
- Zero or near-zero packet loss.
- Supports BGP, EIGRP, OSPF, and IS-IS.
- Routing protocol neighbor relationships are maintained during Supervisor failover.
- Prevents route flapping.
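The following sketch enables NSF for each supported routing protocol on a switch already running SSO. The process numbers and AS number are hypothetical, and the exact keywords can vary by platform and release.

```text
! OSPF: enable Cisco NSF
Switch(config)# router ospf 1
Switch(config-router)# nsf
Switch(config-router)# exit
! EIGRP: enable NSF
Switch(config)# router eigrp 100
Switch(config-router)# nsf
Switch(config-router)# exit
! BGP: enable graceful restart
Switch(config)# router bgp 65000
Switch(config-router)# bgp graceful-restart
Switch(config-router)# exit
! IS-IS: enable Cisco NSF
Switch(config)# router isis
Switch(config-router)# nsf cisco
```

Neighboring routers must be NSF-aware for their adjacencies to be preserved during the switchover.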
