
IBM Tivoli Switch Analyzer

Troubleshooting Guide
By Michael L. Webb
Version 2.0
Copyright Notice
Copyright IBM Corporation 2005. All rights reserved. May only be used pursuant
to a Tivoli Systems Software License Agreement, an IBM Software License
Agreement, or Addendum for Tivoli Products to IBM Customer or License Agreement.
No part of this publication may be reproduced, transmitted, transcribed, stored
in a retrieval system, or translated into any computer language, in any form or
by any means, electronic, mechanical, magnetic, optical, chemical, manual, or
otherwise, without prior written permission of IBM Corporation. IBM Corporation
grants you limited permission to make hardcopy or other reproductions of any
machine-readable documentation for your own use, provided that each such
reproduction shall carry the IBM Corporation copyright notice. No other rights
under copyright are granted without prior written permission of IBM Corporation.
The document is not intended for production and is furnished “as is” without
warranty of any kind. All warranties on this document are hereby disclaimed,
including the warranties of merchantability and fitness for a particular
purpose.
U.S. Government Users Restricted Rights -- Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.
Trademarks
IBM, the IBM logo, Tivoli, the Tivoli logo, AIX, NetView, Tivoli Enterprise,
Tivoli Enterprise Console are trademarks or registered trademarks of
International Business Machines Corporation or Tivoli Systems Inc. in the United
States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
Extreme Networks is a registered trademark of Extreme Networks, Inc. Summit,
Summit5iTx, and Summit48si are trademarks of Extreme Networks, Inc.
Catalyst, Cisco, Cisco IOS, Cisco Systems are registered trademarks or
trademarks of Cisco Systems, Inc.
NORTEL NETWORKS is a trademark of Nortel Networks. BayStack is a trademark of
Nortel Networks.
3Com, the 3Com logo, and SuperStack are registered trademarks of 3Com
Corporation.
Symbol Technologies is a registered trademark of Symbol Technologies, Inc.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other company, product, and service names may be trademarks or service marks of
others.
Notices
References in this publication to Tivoli Systems or IBM products, programs, or
services do not imply that they will be available in all countries in which
Tivoli Systems or IBM operates. Any reference to these products, programs, or
services is not intended to imply that only Tivoli Systems or IBM products,
programs, or services can be used. Subject to valid intellectual property or
other legally protectable right of Tivoli Systems or IBM, any functionally
equivalent product, program, or service can be used instead of the referenced
product, program, or service. The evaluation and verification of operation in
conjunction with other products, except those expressly designated by Tivoli
Systems or IBM, are the responsibility of the user. Tivoli Systems or IBM may
have patents or pending patent applications covering subject matter in this
document. The furnishing of this document does not give you any license to these
patents. You can send license inquiries, in writing, to the IBM Director of
Licensing, IBM Corporation, North Castle Drive, Armonk, New York 10504-1785,
U.S.A.

About the Tivoli Field Guides
Sponsor
Tivoli Customer Support sponsors the Tivoli Field Guide program.

Authors
Those who write field guides belong to one of these two groups:

Tivoli Support and Services Engineers who work directly with customers
Tivoli Customers and Business Partners who have experience using Tivoli software in a
production environment

Audience
The field guides are written for all customers, both new and existing. They are applicable to
external audiences including executives, project leads, technical leads, team members, and to
internal audiences as well.

Types of Field Guides


Two types of Tivoli Field Guides describe how Tivoli products work and how they are used in real
life situations:

Field Guides for technical issues are designed to address specific technical scenarios or
concepts that are often complex to implement or difficult to understand, for example:
endpoint mobility, migration, and heartbeat monitoring.
Field Guides for business issues are designed to address specific business practices that
have a high impact on the success or failure of an ESM project, for example: change
management, asset management, and deployment phases.

Purposes
The Field Guide program has two major purposes:

To empower customers & business partners to succeed with Tivoli software by
documenting and sharing product information that provides accurate and timely
information on Tivoli products and the business issues that impact an enterprise systems
management project
To leverage the internal knowledge within Tivoli Customer Support and Services and the
external knowledge of Tivoli customers and Business Partners

Availability
All completed field guides are available free to registered customers and internal IBM employees
at the following Web site:

http://www.ibm.com/software/sysmgmt/products/support/Field_Guides.html

Authors can submit proposals and access papers by e-mail:


Tivoli_eSupport_Feedback@us.ibm.com

Table of Contents

1 DAEMON STARTUP TROUBLESHOOTING
1.1 NETVIEW SERVER IS NOT DISCOVERED PROPERLY
1.2 NETVIEW SERVER IS MULTI-HOMED
1.3 NETVIEW SERVER INITIAL DISCOVERY
1.4 ORPHAN OR UN-PARENTED PROCESSES
2 DISCOVERY TROUBLESHOOTING
2.1 HOW DO I GENERATE SWITCH ANALYZER REPORTS?
2.1.1 How to generate a Discovery Report
2.1.2 How to generate a Summary Report
2.1.3 How to generate a Status Report
2.1.4 How to generate an Impact Analysis Report
2.1.5 I cannot generate the reports
2.2 HOW DO I ADD AN OID OF A NEW SWITCH TYPE?
2.3 HOW DO I REDISCOVER MY SWITCH AFTER AN UPDATE?
2.4 MY SWITCH DOES NOT SHOW UP IN NETVIEW AND/OR SWITCH ANALYZER
2.5 MY SWITCH IS NOT DISCOVERED COMPLETELY
2.5.1 Switch Requirements
2.5.2 Discovery Issues with Layer 3 Switches (as Routers)
2.5.3 Discovery Issues with Symbol Access Points
2.5.4 Discovery Issues with 3Com Switches
2.5.5 Discovery Issues with NorTel Networks Switches
2.5.6 Discovery Issues with Centillion Switches
2.5.7 Discovery Issues with Cisco 2900XL Switches
2.5.8 Discovery Issues with Cisco 2950 Switches
2.5.9 Discovery Issues with Extreme Networks Switches
2.6 HOW DO I SOLVE PROBLEMS FOUND IN THE SUMMARY REPORT?
2.6.1 Discovery is in progress for the following nodes
2.6.2 Discovery has been completed for the following nodes, but one or more errors occurred (# = retry count has been exceeded)
2.6.3 Discovery has been completed for the following nodes, but node was unreachable via layer 2 segments
2.6.4 Discovery has been turned off for the following nodes
2.6.5 Discovery has been completed for the following nodes
2.6.6 Discovery will not be done for the following nodes because they are located within a remote campus
2.6.7 The following (layer 2) nodes are located within a remote campus and are being monitored for status only
2.6.8 The following (layer 3) nodes are being monitored for status only
2.6.9 NetView Custom Links
2.6.10 IBM Tivoli Remote Campus Installation Service for Switch Analyzer (TWL)
2.7 HOW DO I SOLVE PROBLEMS FOUND IN THE DISCOVERY REPORT?
2.7.1 Nodes placed on the wrong switch
2.7.2 An asterisk (*) shows up after a node
2.7.3 Router-on-a-stick subinterfaces are not displayed
3 ROOT CAUSE / IMPACT ANALYSIS TROUBLESHOOTING
3.1 HOW DOES SWITCH ANALYZER WORK WITH NETVIEW?
3.1.1 What does the “[B]” symbolize in my root cause event?
3.1.2 What does the “[D]” symbolize in my root cause event?
3.2 WHY DID I GET SEVERAL ROOT CAUSE EVENTS AT ONCE?
3.2.1 Unmanaged Devices
3.2.2 Discovery Poll
3.3 WHY IS THE LAYER2STATUS FIELD EMPTY FOR SOME SWITCHES?
3.4 MY IMPACT ANALYSIS REPORT DOES NOT SHOW ANYTHING
3.5 IT TAKES A LONG TIME FOR ROOT CAUSE EVENTS TO SHOW UP
3.6 WHY IS THE ROOT CAUSE SOMETIMES THE SWITCH PORT AND SOMETIMES THE END NODE?
4 WEB CONSOLE TROUBLESHOOTING
4.1 WEB CONSOLE STARTUP PROBLEMS
4.1.1 Applet com.tivoli.netview.client.NetViewApplet notinited
4.1.2 Cannot start the NetView Web Console
4.1.3 Cannot connect to the NetView Web Console
4.1.4 Cannot launch submap explorer from Tivoli Event Console
4.2 LAYER 2 MENU MISSING
4.3 OUT OF MEMORY ERROR
4.4 PHYSICAL VIEW DOES NOT SHOW ATTACHED DEVICES
4.5 MY POINT-TO-POINT VIEW TIMES OUT
4.6 THERE ARE NO SWITCHES IN MY POINT-TO-POINT VIEW
4.6.1 Remote Campus
4.6.2 Router-on-a-Stick
4.7 THERE IS AN X SYMBOL IN MY POINT-TO-POINT VIEW
5 PORT STATUS MONITOR TROUBLESHOOTING
5.1 I SEE TWO ROOT CAUSE EVENTS
5.2 I SEE UNEXPECTED PORT STATUS
5.2.1 My switch ports are Interface Up/Correlated Down (impact)
5.2.2 My switch ports are Interface Up/Correlated Unmanaged
5.3 HIGH CPU USAGE FOR L2_EVENT_ADAPTER PROCESS

IBM Tivoli Switch Analyzer
Troubleshooting Guide
The IBM Tivoli Switch Analyzer Troubleshooting Guide v2.0 is designed to assist users in
identifying and resolving common technical problems that may occur with IBM Tivoli Switch
Analyzer v1.3. If using IBM Tivoli Switch Analyzer v1.2.1 or earlier, then please refer to the IBM
Tivoli Switch Analyzer Troubleshooting Guide v1.0.

This guide (v2.0) lists common symptoms and problem areas, and then provides a solution for the
user to implement in each case. To avoid referring to this guide too often, prior to installation of
the product the user is encouraged to read and follow the steps outlined in the IBM Tivoli Switch
Analyzer v1.3 Deployment Guide.

Problems are sometimes unavoidable in enterprise networks, and usage problems often occur
with network management products. Therefore, the user should consult this troubleshooting guide
when IBM Tivoli Switch Analyzer behaves abnormally. Abnormal behavior shows up as one or more
symptoms, and each symptom can be traced to one or more of the problems listed in the Table of
Contents of this guide. The user should read the Table of Contents thoroughly to identify the
problem area most closely related to the one being experienced. Once identified, each problem
can be resolved by implementing the solution in the corresponding section of this document.

If the user still experiences problems despite having exhausted all solutions covered in this guide,
then IBM Tivoli Support should be contacted for additional assistance. Please have current and
accurate information about the network topology and the NetView server available at all times.
Rapid and intelligent decisions can be made with complete information on hand about the
environment experiencing the problem. Providing relevant and accurate information to IBM Tivoli
Support will aid troubleshooting and allow it to be done quickly and easily.

About the Author


Michael L. Webb is currently a Software Verification Engineer in the IBM Tivoli Quality Assurance
area. He works as a member of the IBM Tivoli NetView and IBM Tivoli Switch Analyzer
verification teams.

Webb joined IBM in 1991 and has been a part of several IBM networking product areas since his
arrival. He has a general networking background that spans both development and test
organizations.

1 Daemon Startup Troubleshooting

1.1 NetView server is not discovered properly


Some users may find that they cannot start the itsl2 daemon. The most likely cause for this is
that the NetView server itself has not been discovered properly. If this is the case, then the
following message should be displayed in file correlator.log:

Abort : cannot find node for management system host: <hostname> [<IP address>]

Ensure the server itself is discovered with SNMP. Check the symbol on the NetView map for an
inner shape, and check that the ovwdb entry for this node has the field isSNMPSupported set to
TRUE. Also ensure there are no DNS discrepancies between the hostname and the IP address: the
hostname should resolve to the IP address, and the IP address should resolve back to the hostname.
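
The DNS requirement above can be checked with a short script. The following is a minimal sketch in Python and is not part of the product; the function name dns_round_trip_ok and its injectable forward and reverse parameters are illustrative assumptions. It uses only the standard resolver calls socket.gethostbyname and socket.gethostbyaddr.

```python
import socket


def dns_round_trip_ok(hostname, forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Return (ok, ip, reverse_name) for a forward/reverse DNS check.

    forward and reverse default to the standard resolver calls; they are
    parameters only so the check can be exercised without a live DNS
    (an assumption of this sketch, not something the guide prescribes).
    """
    ip = forward(hostname)             # hostname -> first IPv4 address
    reverse_name = reverse(ip)[0]      # IP address -> primary hostname
    # Compare case-insensitively and ignore a trailing dot on either name.
    ok = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return ok, ip, reverse_name
```

Calling dns_round_trip_ok with the fully qualified name of the NetView server returns False in the first position when the reverse lookup yields a different name. Note that the comparison is strict, so passing a short hostname when DNS returns the fully qualified name is also reported as a mismatch.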

1.2 NetView server is multi-homed


A multi-homed NetView server (one that has more than one network card connecting it to the
network) can prevent the itsl2 daemon from starting successfully. If this is the case, then
the following message should be in the correlator.log:

Abort : cannot find node for management system host: <hostname> [<IP address>]

The itsl2 daemon will use the first IP address for the server that it gets from DNS. This IP
address must match the IP address of an interface within the NetView topology database. If a
matching interface for the NetView server is not found, then the itsl2 daemon will not start.

To resolve this problem, the user should perform one of the following actions:

1. Demand poll the NetView server in order to discover the missing interface
2. Determine the IP address of one of the NetView server interfaces by entering the
command ovtopodump -r <hostname>
Then update the mgt_host field in /usr/OV/ITSL2/conf/correlator.ini as
follows:
[ManagementSystem]
mgt_host = <IP address of interface>

A system administrator can provide multiple entries in DNS for multi-homed servers. When the
correlator mgt_host field is blank, as it is by default, then the itsl2 daemon will use the first IP
address found in DNS. The IP address currently used by itsl2 is specified by the <IP address>
printed in the Abort log message shown earlier.
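
The address selection described in this section can be sketched as follows. This is an illustrative Python sketch, not the daemon's actual code: it assumes that correlator.ini can be read by the standard configparser module, and the function names read_mgt_host and effective_address are hypothetical. The interface addresses known to the NetView topology database are passed in as a plain list, because the output format of ovtopodump -r is not shown in this guide.

```python
import configparser


def read_mgt_host(ini_text):
    """Return the mgt_host value from correlator.ini text, or None if blank."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    value = parser.get("ManagementSystem", "mgt_host", fallback="").strip()
    return value or None


def effective_address(mgt_host, dns_addresses, topology_addresses):
    """Return (address, found).

    address is mgt_host when it is set, otherwise the first address that
    DNS returned; found says whether that address belongs to an interface
    in the NetView topology database.
    """
    address = mgt_host if mgt_host else dns_addresses[0]
    return address, address in set(topology_addresses)
```

When found is False for a blank mgt_host, the two actions listed above apply: either demand poll the server so that the missing interface is discovered, or set mgt_host to an address that is already in the topology database.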

1.3 NetView server initial discovery
One reason for the itsl2 daemon not to start is a timing problem that sometimes occurs during
NetView’s initial discovery. As noted in section 1.1, the itsl2 daemon will not start if
the server machine itself has not yet been discovered. If the user has performed a rediscovery of
the network (clearing the NetView databases and rediscovering), then the itsl2 daemon may
require information about the server machine before the NetView discovery has
obtained it. In this case, the user should wait until the bulk of the
new discovery has been completed and then restart the itsl2 daemon.

Some users may want to obtain layer 2 topology information before the NetView server has
completed its discovery of the enterprise network. In this case, the user should add the NetView
server to the top of the seed file so that the server is discovered early in the NetView discovery
phase and the itsl2 daemon can be started sooner.
Please keep in mind that the entire network needs to be discovered by the NetView server before
IBM Tivoli Switch Analyzer can put together the complete layer 2 topology. Therefore, the itsl2
daemon may need to be restarted later to pick up missing nodes. It is recommended that
the itsl2 daemon be started after the NetView discovery is complete.

Note that each time the itsl2 daemon starts, IBM Tivoli Switch Analyzer will rediscover the entire
layer 2 topology. Allow at least 15 minutes after startup for the topo_cache file to be created.
This file is generated for the reports viewable from the GUI and is refreshed every 15 minutes by
default. This setting can be modified in file /usr/OV/ITSL2/conf/correlator.ini via the
topo_cache_freq field.

1.4 Orphan or un-parented processes


Another reason for the itsl2 daemon not to start is the presence of orphaned (un-parented) IBM
Tivoli Switch Analyzer processes. These processes may remain after a previous itsl2 failure that
left one of the itsl2 processes, usually correlator, without a parent. Once the orphaned
process is terminated, the itsl2 daemon will start.
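
On UNIX, one way to look for such a process is to scan the output of ps -ef. The following Python sketch is an illustration under stated assumptions, not a documented procedure: it treats a parent PID of 1 as the sign of an orphan (the usual UNIX convention, which this guide does not state), it matches only the correlator name mentioned above, and the function name find_orphans is hypothetical.

```python
def find_orphans(ps_output, names=("correlator",)):
    """Return (pid, rest_of_line) for matching processes whose parent PID is 1.

    ps_output is the text printed by `ps -ef`, whose first columns are
    UID, PID and PPID. Only those columns are parsed strictly; the rest
    of the line is searched for any of the given names.
    """
    orphans = []
    for line in ps_output.splitlines()[1:]:       # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, rest = int(fields[1]), int(fields[2]), fields[3]
        if ppid == 1 and any(name in rest for name in names):
            orphans.append((pid, rest))
    return orphans
```

The result is only a list of candidates. Confirm that the itsl2 daemon is stopped and that each listed process really is a leftover before terminating it.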

2 Discovery Troubleshooting
This chapter focuses on general problems encountered during the IBM Tivoli Switch Analyzer
discovery process. Many discovery issues can be avoided by reading through
“Chapter 2. Installing Tivoli Switch Analyzer” of the IBM Tivoli Switch Analyzer Administrator’s
Guide, which covers the following:

1. Prerequisite information
2. Information on eliminating network islands
3. How to find missing switches
4. Verifying the installation

Once the above issues have been addressed, then you should proceed through this chapter of
this guide.

2.1 How do I generate Switch Analyzer reports?


The user can generate four reports with IBM Tivoli Switch Analyzer: the Discovery Report, the
Summary Report, the Status Report, and the Impact Analysis Report.

2.1.1 How to generate a Discovery Report


The Discovery Report can be generated as follows:

ITSL2_reports -r layer2 [-d]

Use this command to display the additional options:

ITSL2_reports -h

The Discovery Report is an ASCII layout of the customer’s enterprise network at layer 2, or the
switch level. Note that the layer 2 topology can also be displayed graphically via the layer 2
topology views in the Web Console. The ASCII Discovery Report generated via the command
above will list each discovered switch and the discovered devices connected to each port on each
switch. The user can also use the Discovery Report to verify the layer 2 topology at the port level
when working through the issues outlined in the Summary Report.

Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.

2.1.2 How to generate a Summary Report


The Summary Report can be generated as follows:

ITSL2_reports -r summary [-d]

The Summary Report will list nodes sorted by discovery return codes as seen in the Discovery
Report. Use the Summary Report to identify and resolve as many discovery discrepancies as
possible to ensure that the maximum amount of information is available during discovery. Keep
in mind that missing information for one switch may have an adverse effect on the accuracy of the
topology for the devices attached downstream from that switch. Section 2.6 “How do I solve
problems found in the Summary Report?” of this guide walks through each section of the report
and explains how to resolve the discovery problems listed there.

Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.

2.1.3 How to generate a Status Report

The Status Report can be generated as follows:

ITSL2_reports -r status [-d]

The Status Report will list switch nodes, sorted by name, that are currently managed by the port
status monitor function of IBM Tivoli Switch Analyzer. Use the Status Report to identify the
current port status for all switch devices discovered.

Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.

2.1.4 How to generate an Impact Analysis Report


The Impact Analysis Report can be generated as follows:

ITSL2_reports -r whatif -o <object ID>

The Impact Analysis Report lists the nodes that would be affected if the selected device indicated
by the object ID were to fail. The output displays all of the nodes downstream of the device,
downstream being from the viewpoint of the server station.

Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.
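
For users who script report generation, the four command lines in sections 2.1.1 through 2.1.4 can be assembled programmatically. The sketch below is a hypothetical Python helper (the names REPORT_TYPES and report_command are not part of the product) that uses only the options shown in this guide: -r, -o, and the optional -d.

```python
REPORT_TYPES = ("layer2", "summary", "status", "whatif")


def report_command(report, object_id=None, dump=False):
    """Build the argument list for one ITSL2_reports invocation."""
    if report not in REPORT_TYPES:
        raise ValueError("unknown report type: %r" % (report,))
    argv = ["ITSL2_reports", "-r", report]
    if report == "whatif":
        # The Impact Analysis Report is the only one that takes an object ID.
        if object_id is None:
            raise ValueError("the whatif report needs an object ID")
        argv += ["-o", str(object_id)]
    elif dump:
        # -d is shown in this guide as optional for the other three reports.
        argv.append("-d")
    return argv
```

The returned list can be passed to subprocess.run. This assumes that ITSL2_reports is on the PATH of the calling user; its installation directory is not given in this section.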

2.1.5 I cannot generate the reports


There may be times when the user cannot generate a report, in which case the following error
message may appear:

Error: cannot parse node record in topology file
[/usr/OV/ITSL2/cache/topo_db.out]

The most frequent cause of this error is that the user has recently cleared the NetView database
and rediscovered the network and the user is also managing remote networks via a services tool
called IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch Analyzer. This tool
allows the user to create campus links and is described further in section 2.6.10, “IBM Tivoli
Remote Campus Installation Service for Switch Analyzer (TWL).”

The reports cannot be generated in this situation because the object IDs in the NetView
database changed after the rediscovery. This may cause a problem with the artificial links
that were created with IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch
Analyzer. To solve this problem, the user needs to delete the old link entries via the services tool
and then add them back.

Another typical reason for this error is that the topo_db.out file is incomplete. This could
happen when a report was run against the file while the file was being written to. Some possible
solutions would be:

Wait a few seconds and re-run the report.
Use /usr/OV/ITSL2/bin/corrcl (option 2) to dump a new topology file.
Use '-d' option when running the report.
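
The first suggestion, waiting a few seconds and re-running the report, can be automated. The following Python sketch is illustrative only: run_with_retry and its parameters are hypothetical names, and run_report stands for any callable that runs a report and returns its combined output text.

```python
import time

PARSE_ERROR = "cannot parse node record in topology file"


def run_with_retry(run_report, attempts=3, delay=5.0, sleep=time.sleep):
    """Call run_report() until its output no longer contains PARSE_ERROR.

    Returns (output, succeeded). sleep is a parameter only so that the
    waiting can be replaced in tests.
    """
    output = ""
    for attempt in range(attempts):
        output = run_report()
        if PARSE_ERROR not in output:
            return output, True
        if attempt < attempts - 1:
            sleep(delay)     # give the topology file time to be rewritten
    return output, False
```

If the error persists after the last attempt, the cause is more likely the stale campus links described earlier in this section than a file that was read while being written.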

2.2 How do I add an OID of a new switch type?


In order to successfully add an OID for a switch, perform the steps in this section. Refer to the
following example ovtopodump -X output:

Node ID  Object           IP Status  L2 Status  IP Address  SNMP OID             Layer 2 OID?
267      cs3550b.lab.com  Up         Up         33.33.1.11  1.3.6.1.4.1.9.1.366  Yes
269      cs3550a.lab.com  Up         Up         33.33.1.10  1.3.6.1.4.1.9.1.366  Yes

The steps used to add an OID for a new switch type are:

1. Verify that the switch is not in the above output from the ovtopodump -X command.
If the switch already shows up in this output, no further steps are necessary.
2. Add the SNMP sysObjectID in the /usr/OV/conf/oid_to_type file with either a B, H,
or BH flag.
Note: Do not use the G flag.
3. Run ovstop netmon
4. Run ovstart netmon
5. Demand poll the switch from the server console.

If IBM Tivoli Switch Analyzer is not yet installed, the newly added OID will be incorporated
automatically during the install.

If IBM Tivoli Switch Analyzer was already installed before the OID was added to the oid_to_type
file, then perform the following steps:

1. Run ovstop itsl2
2. Run the following command:
/usr/OV/bin/importNvOids (for UNIX)
\usr\ov\ITSL2\bin\importnvoids.bat (for Windows)
3. Run ovstart itsl2

Rerun the ovtopodump -X command and verify that the switch is now in the list. After IBM
Tivoli Switch Analyzer is installed, the rightmost column, Layer 2 OID, should contain a Yes for
the switch.
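
The final verification can be scripted by parsing the ovtopodump -X output. The sketch below is a Python illustration that assumes every data row has exactly seven whitespace-separated fields, as in the example output in this section; the function names parse_topodump_x and layer2_state are hypothetical.

```python
def parse_topodump_x(text):
    """Parse `ovtopodump -X` rows into dictionaries keyed by column name.

    Header lines, blank lines and any row that does not have seven
    fields starting with a numeric node ID are skipped.
    """
    columns = ("node_id", "object", "ip_status", "l2_status",
               "ip_address", "snmp_oid", "layer2_oid")
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(columns) and fields[0].isdigit():
            rows.append(dict(zip(columns, fields)))
    return rows


def layer2_state(rows, name):
    """Return 'missing' if the switch is not listed, else its Layer 2 OID
    column in lower case (for example 'yes' or 'no')."""
    for row in rows:
        if row["object"] == name:
            return row["layer2_oid"].lower()
    return "missing"
```

A result of 'missing' means the steps in this section still apply to the switch, and a result of 'no' points to the importNvOids step.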

2.3 How do I rediscover my switch after an update?


After resolving a problem or a set of problems, the user should initiate a rediscovery of the layer 2
topology, wait the amount of time indicated by the topo_cache_freq period in the
correlator.ini file (default is 15 minutes), and regenerate the reports. For problems
involving missing information in the switch device’s forwarding tables, expect to see differences in
the topology each time the reports are generated due to the order of discovery of network
devices.

To initiate a rediscovery, the user can restart the itsl2 daemon, which will perform a completely
new discovery of the network. Or, if there are only a small number of changes, the user can
select each updated switch in the server submap and click the menu items Monitor → Layer 2
→ Rediscover, without incurring the overhead of restarting the itsl2 daemon. In this case the
switch itself plus those switches nearby will be rediscovered. These discovery requests are
logged in the /usr/OV/ITSL2/log/l2_topo_adapter.log file. View this file to see discovery
requests that are in progress.

The user can also use the following command to issue a switch rediscovery:

/usr/OV/ITSL2/L2_topo_req.sh -s <selection name>

In addition, the itsl2 daemon performs a periodic discovery poll that runs every 24
hours by default. The discovery poll forces a rediscovery of each switch. Therefore, the user can opt to
wait for the automatic rediscovery instead of manually rediscovering the switches. The user can
change the discovery interval via the discovery_interval field (measured
in minutes) in the /usr/OV/ITSL2/conf/l2_topo_adapter.ini file.
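
Because the field is measured in minutes, the default of 24 hours corresponds to a value of 1440. The following Python sketch shows how the current setting could be located and how a period in hours converts to the expected unit. It assumes that l2_topo_adapter.ini can be read by the standard configparser module. The guide does not name the section that holds discovery_interval, so the hypothetical helper find_option scans every section.

```python
import configparser


def find_option(ini_text, option):
    """Return (section, value) for the first section defining option, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, option):
            return section, parser.get(section, option)
    return None


def hours_to_minutes(hours):
    """Convert a polling period in hours to the minutes the field expects."""
    return int(round(hours * 60))
```

For example, hours_to_minutes(12) gives 720, the value to enter for a discovery poll every 12 hours.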

2.4 My switch does not show up in NetView and/or Switch Analyzer

For general troubleshooting guidelines when a switch does not show up in the NetView maps or
when it does not appear in the IBM Tivoli Switch Analyzer reports, perform the steps in
this section. Refer to the following example ovtopodump -X output:

Node ID  Object           IP Status  L2 Status  IP Address  SNMP OID             Layer 2 OID?
267      cs3550b.lab.com  Up         Up         33.33.1.11  1.3.6.1.4.1.9.1.366  Yes
269      cs3550a.lab.com  Up         Up         33.33.1.10  1.3.6.1.4.1.9.1.366  Yes

The troubleshooting steps are as follows:

1. Run the ovtopodump -X command and view the output as shown above. The output
provides a list of all devices that the NetView server has identified as layer 2 devices.
2. A switch may be missing from this list for any of the following reasons:
a. It is not discovered.
b. The SNMP agent is not running.
c. The NetView server does not have the community string (name) for the switch.
d. The SNMP sysObjectID for the switch is not in /usr/OV/conf/oid_to_type with
either a B, H, or BH flag (but not G).
e. The switch has more than one IP interface in the NetView database.
3. Resolve the problem that applies to the missing switch.
a. To discover a missing switch, try pinging it and then execute a demand poll of the
nearest router, or put an entry in the seed file for it.
b. If the problem is 2b, 2c, or 2d, have the NetView administrator correct the problem
and then demand poll the switch.
c. If the problem is 2e, have the network administrator correct the problem and then
demand poll the switch.
d. If modifications were made to the oid_to_type file, restart the netmon daemon and
demand poll the switch.

If IBM Tivoli Switch Analyzer is installed and the NetView server has discovered the switch as a
switch object, then the rightmost column of the ovtopodump -X output, Layer 2 OID, should
contain a Yes for the switch. If this column contains a No, or if modifications were made to
oid_to_type file after installing IBM Tivoli Switch Analyzer, the user must perform the following
steps:

1. Run ovstop itsl2.
2. Run the following command:
/usr/OV/bin/importNvOids (for UNIX)
\usr\ov\ITSL2\bin\importnvoids.bat (for Windows)
3. Run ovstart itsl2.

Note: If the switch device was initially discovered as a node (a blank object in the NetView map
with no switch symbol) and IBM Tivoli Switch Analyzer was already installed when the problem
was resolved (converting the node object to a switch object), then the user must ovstop and
ovstart the itsl2 daemon.

Finally, rerun the ovtopodump -X command; verify that the switch is now in the list and the
rightmost column, Layer 2 OID, contains a Yes for the switch.

2.5 My switch is not discovered completely


2.5.1 Switch Requirements
The accuracy and completeness of IBM Tivoli Switch Analyzer’s layer 2 discovery depend
primarily on how much accurate information is available from the Bridge MIB
forwarding tables and certain private MIBs. The switch must support the RFC 1493 Bridge MIB,
which is the minimum requirement for switch management.

Switches that support multiple VLANs often require an additional SNMP querying technique
referred to as Community String Indexing (CSI). CSI is used to support Cisco switches with
multiple VLANs. Other vendor switches that do not require CSI or access to private MIBs for
VLAN support are also supported. For non-Cisco vendors requiring CSI, discovery is restricted to
only those ports in the default VLAN (usually VLAN 1). CSI is not supported in IBM Tivoli Switch
Analyzer for non-Cisco switch devices.

2.5.2 Discovery Issues with Layer 3 Switches (as Routers)


Some customers use layer 3 switches as routers, as shown in the following diagram.

[Diagram: NetView/ITSA server, a layer 3 switch (L3 SW), and switches SW 01, SW 02, and SW 03]

When using a layer 3 switch as a router (such as the Cisco 3750), the downstream switch
devices may not be discovered properly if the only router for these switch devices is the layer 3
switch. This is a known limitation and will be addressed in a future release.

However, when the layer 3 switch is a Cisco 6509 configured in hybrid mode, discovery can
work when IBM Tivoli Switch Analyzer version 1.3 Interim Fix IY67325 has been installed.

2.5.3 Discovery Issues with Symbol Access Points


Some access point devices, such as the Symbol Technologies Spectrum access points, do not
support the Bridge MIB. These devices may be configured as switches in NetView, but should not
be configured as switches in IBM Tivoli Switch Analyzer. To check whether a device supports the
Bridge MIB, walk its dot1dBridge subtree:

snmpwalk -c <Community String of Device> <IP Address> dot1dBridge

If it does not return anything, then the device does not support the Bridge MIB.
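
When many devices need to be checked, the walk can be scripted. The following is a minimal sketch, assuming an snmpwalk binary on the PATH that accepts the arguments shown above. The function names and the timeout value are hypothetical. It applies the rule above literally (no output means no Bridge MIB); some snmpwalk builds print a "No Such Object" message instead of nothing, which this sketch would misread as support.

```python
import subprocess


def bridge_mib_walk_command(community, address):
    """Build the same command line shown above, as an argument list."""
    return ["snmpwalk", "-c", community, address, "dot1dBridge"]


def has_bridge_mib(walk_output):
    """True if the walk returned at least one non-blank line."""
    return any(line.strip() for line in walk_output.splitlines())


def check_device(community, address, timeout_s=60):
    """Run the walk and classify the device; timeout_s is an arbitrary choice."""
    try:
        result = subprocess.run(
            bridge_mib_walk_command(community, address),
            capture_output=True, text=True, timeout=timeout_s, check=False,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "supported" if has_bridge_mib(result.stdout) else "no Bridge MIB"
```

A device reported as "no Bridge MIB" is a candidate for being disabled as a switch in IBM Tivoli Switch Analyzer, as described next.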

In this case, IBM Tivoli Switch Analyzer can be configured to not discover the access points as
switch devices in the /usr/OV/ITSL2/conf/files/l2_oids.cfg file as follows (these
particular access points are used as an example):

From:
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.1|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.3|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.5|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.8|*|Y|

To:
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.1|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.3|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.5|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.8|*|N|
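
The same change can be scripted when many entries are involved. The following minimal sketch assumes that every entry has the pipe-delimited layout of the lines above (l2_oid, name, OID, wildcard, Y/N flag) and flips the flag for every OID under a given prefix. The function name is hypothetical, and the sketch works on the file contents as a string, so reading the file and writing it back (after taking a backup) is left to the caller.

```python
def disable_l2_discovery(cfg_text, oid_prefix):
    """Set the Y/N flag to N for every l2_oid entry at or under oid_prefix."""
    out = []
    for line in cfg_text.splitlines():
        fields = line.split("|")
        # Layout assumed from the example: l2_oid|<name>|<OID>|*|<Y or N>|
        if (
            len(fields) >= 5
            and fields[0] == "l2_oid"
            and (fields[2] == oid_prefix or fields[2].startswith(oid_prefix + "."))
            and fields[4] == "Y"
        ):
            fields[4] = "N"
            line = "|".join(fields)
        out.append(line)  # all other lines are passed through unchanged
    return "\n".join(out)
```

For the access points above, calling the function with the prefix .1.3.6.1.4.1.388.1 would change all four entries and leave entries for other vendors untouched.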

For complete information on how to disable devices from being discovered, please refer to the
section “Selectively disabling discovery” in Chapter 3 of the IBM Tivoli Switch Analyzer
Administrator’s Guide.

2.5.4 Discovery Issues with 3Com Switches
IBM Tivoli Switch Analyzer only supports Community String Indexing (CSI) for Cisco switches,
which allows the discovery of multiple VLANs on Cisco devices. Multiple VLAN support exists for
non-Cisco devices where CSI or access to private MIB information is not required to access the
port forwarding table information within various VLANs.

Although many non-Cisco switches fall into this category, the 3Com SuperStack II 1100 and 3300
switches are not in this category since they require CSI. If multiple VLANs are configured on
these devices (with the operating system code used in our lab), IBM Tivoli Switch Analyzer will
not discover ports that are not in the default VLAN.

The switch information for the devices that are not fully supported by IBM Tivoli Switch
Analyzer, but that have been used in Tivoli labs, is listed here:

3Com SuperStack II 1100 or 3300
Operational Version : 2.40 or 2.70
Hardware Version : 2
Boot Version : 1.00

There may be other 3Com switches that Tivoli does not support. The 1100 and 3300 are only
examples of switches tested within the Tivoli lab that have demonstrated discovery problems due
to their requirement of CSI, which is not supported for non-Cisco switches.

2.5.5 Discovery Issues with NorTel Networks Switches


The following NorTel Networks switches have been tested in the Tivoli test lab: BayStack 380
24PT, BayStack 470 48PT, and the Business Policy Switch 2000. Although accurate discovery
was observed for the BayStack 470 and the Business Policy Switch 2000 switches, discovery
problems were encountered with the BayStack 380.

Here is the information for the switch that is not supported:

BayStack 380
HW:R01 FW:2.0.0.12 SW:v2.0.0.46
OID: 1.3.6.1.4.1.45.3.45.1

This list is not meant to be a complete list of Tivoli non-supported NorTel Networks switches.

2.5.6 Discovery Issues with Centillion Switches


Tivoli does not support layer 2 discovery of some Centillion switches. In particular, the following
Centillion switch is not supported:

Centillion 100 MCP version 3.2.2 Advanced Image (9811201)
OID: .iso.org.dod.internet.private.enterprises.930.1.1

2.5.7 Discovery Issues with Cisco 2900XL Switches


IBM Tivoli Switch Analyzer supports Community String Indexing (CSI) for most Cisco switches,
which allows port discovery for ports in multiple VLANs. However, older releases of the Cisco
2900XL switch IOS (OID: .1.3.6.1.4.1.9.1.218) do not support CSI. This causes discovery
problems since the 2900XL does not return complete Bridge MIB information without CSI
capability.

Upgrading to a newer release of the code, such as IOS code level 12.0.5.WC7, dated 10-MAR-
2003, has proven to work successfully in the Tivoli test lab. This updated level of code does
support CSI and there have been no discovery issues when using it. Here is the information for
this particular switch when upgraded:

Cisco 2900XL with OID: .1.3.6.1.4.1.9.1.218


Cisco Internetwork Operating System Software
IOS (tm) C2900XL Software (C2900XL-C3H2S-M), Version 12.0(5)WC7
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Wed 05-Mar-03 10:26 by antonino
System image file is "flash:c2900xl-c3h2s-mz.120-5.WC7.bin"

2.5.8 Discovery Issues with Cisco 2950 Switches


Some levels of the Cisco 2950 switch IOS, specifically versions 12.1.19 and 12.1.20, appear to
have problems responding to SNMP queries. One solution that has worked is to downgrade the
IOS to the following level:

Cisco Internetwork Operating System Software
IOS (tm) C2950 Software (C2950-I6Q4L2-M), Version 12.1(13)EA1c,
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Tue 24-Jun-03 17:31 by yenanh
System image file is "flash:/c2950-i6q4l2-mz.121-13.EA1c.bin"

2.5.9 Discovery Issues with Extreme Networks Switches


The minimum requirement for IBM Tivoli Switch Analyzer discovery is complete access to the
Bridge MIB in RFC 1493. However, some Extreme Networks switches are defaulted to prevent
SNMP queries to the Bridge MIB forwarding table (disable snmp dot1dTpFdbTable). As a
result, IBM Tivoli Switch Analyzer will not discover ports on Extreme switches that have this
access disabled. To remedy this problem, the user must enable SNMP access to the Bridge
forwarding tables in order for these devices to be discovered completely.

Some of the Extreme switches that have default configurations that prevent SNMP queries of the
Bridge MIB forwarding tables include:

Description: Summit5i - Version 6.2.2 (Build 81) by Patch_Master 02/05/03
OID: .1.3.6.1.4.1.1916.2.15

Description: Summit5iTx - Version 6.2.2 (Build 81) by Patch_Master 02/05/03
OID: .1.3.6.1.4.1.1916.2.22

Description: Summit48si - Version 6.2.2 (Build 81) by Patch_Master 02/05/03
OID: .1.3.6.1.4.1.1916.2.28

The above list is not intended to be complete, and there may be other switches that fall into this
category. On the other hand, information is not available on all Extreme switches. Therefore,
there may in fact exist some Extreme switches that do not disable SNMP queries for the
forwarding tables by default and may, therefore, allow complete discovery out of the box.

2.6 How do I solve problems found in the Summary Report?
The switch devices in the Summary Report are sorted by discovery return code. This section will
cover these error categories and how to resolve each issue. The goal is to move as many switch
nodes as possible into the group described below in subsection 2.6.5, “Discovery has been
completed for the following nodes.”

2.6.1 Discovery is in progress for the following nodes


The Summary Report can group switch objects into many categories, one of which is, “Discovery
is in progress for the following nodes.” This indicates that IBM Tivoli Switch Analyzer is still in the
process of discovering these particular switch objects. Wait 15 minutes and regenerate the
report; repeat until this section of the Summary Report is empty.

Users can modify the SNMP discovery parameters used during this process by updating the
following fields in the l2_topo_adapter.ini file.

[Layer2]
retry_interval=900
retry_cnt=3

The retry_interval value is the time, in seconds, between SNMP query retries against a
switch; the default is 900 seconds (15 minutes). The retry_cnt value is the number of retries
attempted for a switch; the default is 3. See the next subsection for troubleshooting when the
retry count has been exceeded.
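
To see how the two settings combine, the following minimal sketch reads them from the [Layer2] section and computes how long the retries are spread over. It assumes the file can be parsed by Python's standard configparser module and that retries are evenly spaced; neither is stated in this guide. The function name is hypothetical. With the defaults, the result is 900 x 3 = 2700 seconds (45 minutes).

```python
import configparser


def retry_settings(ini_text):
    """Return (retry_interval, retry_cnt, total retry window in seconds)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["Layer2"]
    # Fall back to the documented defaults if a key is absent.
    interval = section.getint("retry_interval", fallback=900)
    count = section.getint("retry_cnt", fallback=3)
    return interval, count, interval * count
```

A switch that still appears under "Discovery is in progress" well after this window has passed is worth investigating rather than waiting on.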

2.6.2 Discovery has been completed for the following nodes, but one or more
errors occurred (# = retry count has been exceeded)
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been completed for the following nodes, but one or more errors occurred (# = retry count has
been exceeded).” See the following as an example:

--------------------------------------------------------------------
Discovery has been attempted for the following nodes, but
one or more errors occurred (# = retry count has been exceeded):
--------------------------------------------------------------------
172.10.10.100/172.10.10.100 [#]

When this occurs, the user should check the /usr/OV/ITSL2/log/l2_topo_adapter.log file
for errors against each of the switches in this section.

Typically, errors in this category will be caused by SNMP access problems or missing MIB
information, which is shown by the ”# = retry count has been exceeded” indicator in
the report file. The first step would be to simply walk the Bridge MIB to verify that it is available
for each switch in this category as follows:

snmpwalk -c <community string> <switch IP address> dot1dBridge

This command should not time out. If it does, the user should resolve as many of the SNMP
access problems as possible. Afterwards, the user can manually trigger rediscovery for each
switch that was found to have an SNMP problem. For information on rediscovery, please refer to
section 2.3, “How do I rediscover my switch after an update?”

The nodes in this category of the Summary Report will be automatically re-polled until either the
errors are resolved or the retry count is exceeded. See section 2.6.1, “Discovery is in progress
for the following nodes” for instructions on how to update the SNMP discovery parameters used
during this process.
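
When the Summary Report is long, it can help to group the nodes by category programmatically. The following minimal sketch assumes the whole report uses the layout of the excerpts in this chapter: a dashed rule, one or more heading lines, another dashed rule, and then one node per line, optionally followed by [#]. The function name is hypothetical.

```python
import re

RULE = re.compile(r"^-{10,}$")  # a line made only of dashes


def parse_summary_report(text):
    """Map each category heading to a list of (node, retry_exceeded) tuples."""
    sections = {}
    header_lines = []
    current = None
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if RULE.match(line):
            if in_header:
                # Second rule: the heading is complete.
                current = " ".join(header_lines).rstrip(":")
                sections.setdefault(current, [])
                in_header = False
            else:
                # First rule: a new heading starts.
                header_lines = []
                in_header = True
            continue
        if in_header:
            header_lines.append(line)
        elif current is not None and line:
            parts = line.split()
            sections[current].append((parts[0], "[#]" in parts[1:]))
    return sections
```

Nodes flagged with retry_exceeded set to True are the ones that will not be re-polled automatically and need a manual rediscovery after the underlying problem is fixed.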

2.6.3 Discovery has been completed for the following nodes, but node was
unreachable via layer 2 segments
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been completed for the following nodes, but node was unreachable via layer 2 segments.”
See the following as an example:

--------------------------------------------------------------------
Discovery has been completed for the following nodes, but
node was unreachable via layer 2 segments:
--------------------------------------------------------------------
172.10.20.101/172.10.20.101

In this case, IBM Tivoli Switch Analyzer was unable to trace a path from a router to each of the
switches in this section. This is typically caused by missing entries in the Bridge MIB forwarding
tables on the switch listed in this section or on one of the nearby switches in the network
topology. Consider the following suggestions to help resolve problems in this category.

1. Resolve the errors in the l2_topo_adapter.log file, as discussed in section 2.6.2,
“Discovery has been completed for the following nodes, but one or more errors occurred
(# = retry count has been exceeded)”. Doing so may also mitigate problems for the
switches found in this section.
2. See if you have a switch or network configuration with a known discovery problem as
documented in section 2.5, “My switch is not discovered completely”.
3. Modify access level switches in the network topology that are involved in discovery
inaccuracies to increase the Forwarding Table cache age from the default of 5 minutes to
15 minutes or more. This provides the itsl2 daemon more time to read important entries
before they are aged out. Look for downstream switches in the Discovery Report whose
end nodes appear on other switches. These downstream switches are candidates to
have their timeouts reconfigured.

2.6.4 Discovery has been turned off for the following nodes
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been turned off for the following nodes.” See the following as an example:

--------------------------------------------------------------------
Discovery has been turned off for the following nodes:
--------------------------------------------------------------------
172.10.40.102/172.10.40.102
172.10.40.103/172.10.40.103

Switches that do not support the required MIBs, such as the dot1dBridge MIB or the interfaces
MIB, will appear in this section of the Summary Report. In this case, you should prevent
discovery of these devices as switches. For complete information on how to disable devices from
being discovered, please refer to the section “Selectively disabling discovery” in Chapter 3 of the
IBM Tivoli Switch Analyzer Administrator’s Guide.

For switches in this section of the Summary Report, there will always be a corresponding
message in the l2_topo_adapter.log like the following:

L2 ERROR for node [x.x.x.x]: error in pdu [.1.3.6.1.2.1.17.1.1.0]:
MIB variable does not exist.

When switches appear in this section, it is always a matter of not finding a particular MIB for
these switches. The l2_topo_adapter.log should tell you which MIB OID is causing the
problem.
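
To collect these messages for many switches at once, the log can be scanned with a regular expression. The following minimal sketch assumes every such entry matches the sample message shown above; other message formats in the log are ignored. The function name is hypothetical. In the sample, the OID .1.3.6.1.2.1.17.1.1.0 is dot1dBaseBridgeAddress.0, the first object in the Bridge MIB.

```python
import re
from collections import defaultdict

# Pattern derived from the sample message; the real log may contain variants.
L2_ERROR = re.compile(
    r"L2 ERROR for node \[(?P<node>[^\]]+)\]: error in pdu \[(?P<oid>[^\]]+)\]"
)


def failing_oids_by_node(log_text):
    """Map each node address to the sorted list of OIDs it failed on."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = L2_ERROR.search(line)
        if match:
            found[match.group("node")].add(match.group("oid"))
    return {node: sorted(oids) for node, oids in found.items()}
```

The resulting map shows, for each switch, which MIB objects to walk manually when verifying SNMP access.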

If errors similar to the one above appear in the l2_topo_adapter.log, it may be possible that
an access list is configured on the switch, preventing the necessary SNMP access via a particular
community string. On Cisco devices, for example, there may be an access list to allow only
certain devices access to the community string as follows:

access-list 1300 permit ...
snmp-server community public RO 1300

In the example above, access list 1300 only allows devices with certain IP addresses access to
community string public. The user needs to make sure that the NetView server has SNMP
access for the community string that is configured for the switch.

Another problem occurs when switches are configured to disable SNMP queries to the Bridge
MIB. For example, some Extreme switches are configured to disable SNMP queries by default.
The user must enable SNMP access to the Bridge MIB in order for layer 2 discovery to work
properly. Try walking the Bridge MIB to verify that it is available for each switch in this section of
the Summary Report as follows:

snmpwalk -c <community string> <switch IP address> dot1dBridge

This command should not time out. If it does, the user should resolve as many of the SNMP
access errors as possible.

2.6.5 Discovery has been completed for the following nodes


The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been completed for the following nodes.” See the following as an example:

--------------------------------------------------------------------
Discovery has been completed for the following nodes:
--------------------------------------------------------------------
172.10.70.101/172.10.70.101
172.10.70.105/172.10.70.105

In this case, no errors were found for the switches in this section. The more switches in this
category, the more accurate the layer 2 discovered topology will be, as displayed in the Discovery
Report and the layer 2 topology views in the Web Console. Note that there could still be missing
critical entries in the forwarding tables for these nodes, causing topology problems in other areas
of the network as shown in other sections of the Summary Report.

2.6.6 Discovery will not be done for the following nodes because they are
located within a remote campus
The Summary Report can group switch objects into many categories, one of which is, “Discovery
will not be done for the following nodes because they are located within a remote campus.” See
the following as an example:

----------------------------------------------------------------
Discovery will not be done for the following nodes because
they are located within a remote campus:
----------------------------------------------------------------
172.10.70.110/172.10.70.110
172.10.70.111/172.10.70.111

Switches in this section have not been discovered for topology connections nor have they been
configured for port status monitoring.

If the user desires to have connected topology information for the switches in this section, then
the user must verify that the managed network is fully connected back to the NetView server on
the IP Internet map. All routers in this path must be SNMP enabled and the server must be
configured with their community strings. Please be sure to check for the discovery of the default
router for the server itself. Often it is just a matter of providing a community string in the
communityNames.conf file or the SNMP Configuration dialog for the unmanaged router node
and then restarting netmon with a subsequent demand poll of the router.

The goal is to eliminate islands on the IP Internet submap. IBM Tivoli Switch Analyzer will only
manage devices for connectivity in the contiguous network. This is a network where the NetView
server manages all devices for which there is a layer 3 connection path on the IP Internet
submap. In addition, also be sure that there is SNMP access to each switch device between the
server and the switch devices listed in this section of the Summary Report. As noted with the
routers above, it is often just a matter of supplying the server with a community string for each
switch device.

If the IP Internet map is fully connected, the user should check the file
/usr/OV/ITSL2/log/l2_topo_adapter.log for errors against each of the switches in this
section of the Summary Report. Typically, campus situations are caused by SNMP access
problems or missing MIB information for devices between the server and the switches listed in
this section of the Summary Report. Resolve as many of the SNMP access errors as possible.

The switches listed in this section of the Summary Report will be automatically re-polled until
either the errors are resolved or until the retry count is exceeded. By default the number of retries
is 3. This is set by the [Layer2] property retry_cnt in configuration file
/usr/OV/ITSL2/conf/l2_topo_adapter.ini. Afterwards, the user can manually trigger a
rediscover for each switch. Please refer to section 2.3, “How do I rediscover my switch after an
update?”

See sub-sections 2.6.9 and 2.6.10 on how to enable remote campus management to help further
eliminate campus situations.

2.6.7 The following (layer 2) nodes are located within a remote campus and are
being monitored for status only
The Summary Report can group switch objects into many categories, one of which is, “The
following (layer 2) nodes are located within a remote campus and are being monitored for status
only.” See the following as an example:

--------------------------------------------------------------------
The following (layer 2) nodes are located within a remote campus
and are being monitored for status only:
--------------------------------------------------------------------
172.10.70.110/172.10.70.110
172.10.70.111/172.10.70.111

Switches in this section have been identified as being in a remote campus and are discovered for
port status monitoring only. Connection information will not be available in the Discovery Report
or the Web Console layer 2 views. Depending on the outcome of the discovery, switches in this
section may appear in other sections of the Summary Report.

See sub-sections 2.6.9 and 2.6.10 on how to enable remote campus management to help
eliminate campus situations.

2.6.8 The following (layer 3) nodes are being monitored for status only
The Summary Report can group switch objects into many categories, one of which is, “The
following (layer 3) nodes are being monitored for status only.” See the following as an example:

--------------------------------------------------------------------
The following (layer 3) nodes are being monitored for status only:
--------------------------------------------------------------------
172.10.50.101/172.10.50.101
172.10.50.102/172.10.50.102

Switches in this section have been discovered for port status monitoring only, and connection
information will not be available in the Discovery Report or the Web Console layer 2 views.
Depending on the outcome of the discovery, switches in this section may appear in other sections
of the Summary Report.

2.6.9 NetView Custom Links

Resolving the remote campus (island) issues will result in more areas of the network that can be
managed by IBM Tivoli Switch Analyzer for connectivity discovery. There are times a network
administrator will want the NetView server to manage remote networks or campuses. In this
case, IBM Tivoli Switch Analyzer will not have the required connected layer 2 topology, as
demonstrated in the following diagram.

[Diagram: Managing Remote Campuses. Two managed networks, with routers RT A (10.10.10.1) and
RT B (30.30.30.1), are separated by an unmanaged network.]

When it is not possible to completely manage an end-to-end IP network, you should use the
custom links provided with NetView 7.1.4 Fix Pack 2 interim fix 143453. This fix was required in
order to install IBM Tivoli Switch Analyzer version 1.3 on NetView 7.1.4 Fix Pack 2. Future
NetView fix packs will include this fix and, therefore, will not require a separate installation.
Custom Links is a NetView construct that provides a connection path for IBM Tivoli Switch
Analyzer to use for correlation between remote islands for the purposes of layer 2 discovery and
root cause management. This mechanism is available on all supported NetView platforms.

The IBM Tivoli Switch Analyzer 1.3 Release Notes describe how to construct custom links. Further
instructions for installing and configuring this feature are included in this section.

[Diagram: Managing Remote Campuses with Custom Link. Routers RT A (10.10.10.1) and RT B
(30.30.30.1), each in a managed network, are joined by a custom link across the unmanaged
network.]

In the example instructions below (and depicted above), the user is adding a custom link to the
NetView database between two managed routers, where these managed routers are in two
different managed networks. Note that these custom links can only be added between router
devices.

Here are the configuration instructions for this example:

1. Add the following entries to your seed file:

@link 10.10.10.1:0:30.30.30.1:0

2. Run netmon -y (or restart netmon).

3. Demand poll routers RT A and RT B. This is not necessary if you clear your NetView database
and rediscover.

4. You should see dot-dashed lines between the two routers in your NetView maps.

Afterwards, IBM Tivoli Switch Analyzer will be able to provide connection information (in the
Discovery Report and the Physical Views of the Web Console) for the managed switches in the
remote campus due to the artificial links created. Please note that it is sometimes necessary to
restart the itsl2 daemon in order to completely discover the remote campus devices after
configuring these artificial links.
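
When several custom links are needed, the seed file entries can be generated rather than typed. The following minimal sketch reproduces the pattern of the example entry in step 1. The two ":0" fields are copied verbatim from that example; this guide does not state their meaning, so they are treated as fixed here. The function name is hypothetical.

```python
import ipaddress


def custom_link_entry(router_a, router_b):
    """Format a seed file entry in the same shape as the example in step 1."""
    # IPv4Address raises ValueError for anything that is not a valid IPv4 address.
    a = ipaddress.IPv4Address(router_a)
    b = ipaddress.IPv4Address(router_b)
    if a == b:
        raise ValueError("a custom link needs two different routers")
    return f"@link {a}:0:{b}:0"
```

Validating the addresses before writing the entry avoids a typing error in the seed file that would only surface after netmon is restarted.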

2.6.10 IBM Tivoli Remote Campus Installation Service for Switch Analyzer (TWL)
Resolving the remote campus (island) issues will result in more areas of the network that can be
managed by IBM Tivoli Switch Analyzer for connectivity. There are times a network administrator
will want the NetView server to manage remote networks or campuses.

It is suggested that you use the custom links documented in the previous section to eliminate
undiscovered island networks. However, a custom link can only be inserted into NetView
between 2 managed routers. If the NetView server (IBM Tivoli Switch Analyzer server) is in a
part of the network that is itself not being managed by NetView, or when there are no routers in
the local part of the network, then there is no router in the local network that can be artificially
connected to a router in the remote network. This is shown in the diagram below. In this case,
IBM Tivoli Switch Analyzer cannot have the required connected layer 2 topology via custom links.

[Diagram: Managing Remote Campuses. The Switch Analyzer server is in a managed network without
routers; an unmanaged network separates it from a managed network whose edge router A has IP
address 33.33.2.1.]

When it is not possible to completely manage an end-to-end IP network, and when using custom
links is not an option, ask IBM Software Support about the IBM Tivoli Remote Campus Installation
Service for IBM Tivoli Switch Analyzer. This is a services utility that provides a connection path
from the Switch Analyzer server to a device in a remote campus, which allows for correlation
between remote islands for the purposes of layer 2 discovery and root cause management. At
this time, the services tool is not available for Windows platforms.

Afterwards, IBM Tivoli Switch Analyzer will be able to provide connection information (in the
Discovery Report and the Physical Views of the Web Console) for the managed switches in the
remote campus due to the artificial links created. Please note that it is sometimes necessary to
restart the itsl2 daemon in order to completely discover the remote campus devices after
configuring these artificial links.

Instructions are included with the tool. Additional tips on installing and configuring this service are
provided in this section. In the example instructions below, the user is adding a link to the layer 2
topology that connects the edge router in the remote managed campus network, router A, back to
the IBM Tivoli Switch Analyzer server in the locally managed network. This edge router has IP
address 33.33.2.1.

[Diagram: Managing Remote Campuses with TWL Link. A TWL link connects the Switch Analyzer
server, in a managed network without routers, across the unmanaged network to edge router A
(33.33.2.1) in the remote managed network.]

The previous name for this product was Topology WAN LAN (TWL), and the file name has a
format like twlbuild-1.3.0.6.tar. Both of these names appear in the following example.

1. Install Switch Analyzer -> leave itsl2 daemon running
2. Untar TWL file in temporary directory:
tar -xvf twlbuild-1.3.0.6.tar
3. cd twlinstall
4. Install TWL: ./twlinst.sh
-> System Check OK
-> Installation complete
5. cd /usr/OV/ITSL2/bin
6. Run TWL: ./twl.sh
7. Setup link: select option 1
-> Please enter REMOTE END device name
(match exactly as in topology)
ex: 33.33.2.1
-> Please enter a Description for entry
ex: edge router
-> sent event Interface 0.0.0.0 Added
-> Done.
-> Press Enter to Continue -> Enter
-> x (Exit)

2.7 How do I solve problems found in the Discovery Report?


2.7.1 Nodes placed on the wrong switch
The user may find that discovery is still showing topology errors in the Discovery Report (and
layer 2 topology views in the Web Console) although all known errors have been resolved in the
Summary Report and the log files. In particular, the Discovery Report may, in rare situations,
show Cisco devices attached to other Cisco devices even though these devices are not
connected to one another in the actual enterprise network. In this case, the user may be
experiencing a problem with Cisco Discovery Protocol (CDP) usage in heterogeneous topologies.
Please refer to the following diagram.

[Diagram: Discovery Report and CDP. Switch Analyzer server, Cisco Switch A (port 1), non-Cisco
Switch B, and Cisco Switch C (port 3).]

The CDP cache is read in order to assist with layer 2 discovery when accurate information cannot
be obtained via other MIBs. The problem in this case is that the CDP cache on Switch A port 1
will show Switch C port 3 directly attached to it, and vice-versa. This will lead to incorrect
topology discoveries in the Discovery Report.

The problem can be resolved by making sure there is MIB access to all switches. This problem
can also be resolved by disabling CDP on the ports of Cisco switches or Cisco routers that are
directly connected to non-Cisco switches. In the example above, Cisco Switch A port 1 and
Cisco Switch C port 3 would need to have CDP disabled on a per port basis to prevent the
discovery problem from occurring.
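
The rule for choosing the ports can be expressed as a small check over an inventory of links. The following minimal sketch uses a hypothetical data model, a vendor label per switch and a list of physical links, that is not part of any IBM Tivoli Switch Analyzer or NetView interface. It returns every Cisco port that faces a non-Cisco device.

```python
def ports_needing_cdp_disabled(vendor_by_switch, links):
    """Return (switch, port) pairs on Cisco devices that face non-Cisco devices.

    vendor_by_switch: dict of switch name -> "cisco" or any other label
    links: list of ((switch, port), (switch, port)) physical connections
    """
    ports = []
    for (dev_a, port_a), (dev_b, port_b) in links:
        a_cisco = vendor_by_switch[dev_a] == "cisco"
        b_cisco = vendor_by_switch[dev_b] == "cisco"
        if a_cisco and not b_cisco:
            ports.append((dev_a, port_a))
        elif b_cisco and not a_cisco:
            ports.append((dev_b, port_b))
    return ports
```

For the topology in the diagram (A port 1 to B, and B to C port 3), the function returns the two Cisco ports named in the text. The port numbers on Switch B are not given in this guide and would have to be supplied by the user.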

2.7.2 An asterisk (*) shows up after a node


In some cases, a lack of information acquired from the network can prevent the discovery of a
path from a node (switch, router, or end node) to the upstream router for that node. When this
occurs, default logic is applied to re-connect the affected node to the same switch port that the
default router for the affected node is connected to. IBM Tivoli Switch Analyzer will assume this
default layer 3 connection for topology and correlation purposes. This default logic can also be
applied when the MAC address of an interface is not known when it is initially added to the
layer 2 topology.

The Discovery Report will indicate nodes that have been connected via this default logic with an
asterisk (*) at the end of the node entry. These entries are difficult to remove and sometimes can
only be resolved by restarting the itsl2 daemon. In fact, scenarios in which a user should restart
the itsl2 daemon include:

• When a large number of new nodes have been discovered, such as when managing networks
that were previously unmanaged, adding custom links or TWL links, or clearing the NetView
server databases and rediscovering. Any of these scenarios can result in nodes followed by
an asterisk in the Discovery Report, or in nodes missing from the report altogether.
• When the server does not know the community string information for devices, which can
also result in nodes with an asterisk in the Discovery Report.
• When new router interfaces have been added to the database and these interfaces show
up with an asterisk next to them in the Discovery Report.

The above scenarios indicate when the user should restart the itsl2 daemon. However, there are
steps the user must take prior to restarting the daemon, or restarting the daemon would not result
in an improved Discovery Report with fewer asterisk (*) default location indicators next to nodes.
These steps are listed here:

1. The user should make sure that the server has SNMP access to all switch devices that
the user wants to manage. Ideally, all switches within the IP managed network should be
managed by the server. It is often just a matter of supplying the server with the
community string of each switch device. The device located by default on a particular
switch (indicated by an asterisk) could actually be connected to another switch in the real
network. That other switch must be discovered as well.
2. The user should make sure that the NetView server knows the SNMP community strings
for each device with an asterisk in the Discovery Report, whether it is a router, switch, or
end node. If some community strings are missing, add them and demand poll each
appropriate device via the server console GUI. IBM Tivoli Switch Analyzer needs the
MAC address of the end nodes for accurate placement of the end nodes in the topology.
However, the NetView server can learn the MAC addresses for end nodes even if SNMP
queries fail for some reason, so it is not absolutely required that end nodes (non-routers
and non-switches) support SNMP.
3. If new router interfaces have been added to the network and they appear in the
Discovery Report with an asterisk next to them, demand poll the router through the server
console GUI.

Once the user has resolved as many of these items as possible, the itsl2 daemon should be
stopped and restarted.

2.7.3 Router-on-a-stick subinterfaces are not displayed


IBM Tivoli Switch Analyzer does not associate a router with the switch at layer 2. The switch
contains the layer 2 path, and the router introduces a layer 3 redundant path when a router-on-a-
stick topology is in use. The end result is that not all corresponding router subinterfaces are
depicted in the Discovery Report. However, note that root cause correlation still works.

[Diagram: Trunk Discovery. Switch CS3550B (33.33.1.11), port Fa0/15, is connected by a trunk to
interface Fa1/0 of router R3620C (33.33.1.5, 33.33.5.5, 33.33.7.5).]

Let’s look at an example to help illustrate the problem further. As seen in the diagram above,
switch CS3550B and router R3620C form a router-on-a-stick. The router is using interface Fa1/0
for the stick, connected to port Fa0/15 of the switch. The router is also using subinterfaces
Fa1/0.1, Fa1/0.5, and Fa1/0.7 with IP addresses 33.33.1.5, 33.33.5.5, and 33.33.7.5 respectively.
See the following output from a sample Discovery Report:

cs3550b.tivlab.itsa.com/33.33.1.11
[FastEthernet0/1/1/0.0.0.0] ===>
[cs1924a.tivlab.itsa.com/33.33.1.16 - A /26/0.0.0.0]

[FastEthernet0/3/3/0.0.0.0] ===>
[cs1912b.tivlab.itsa.com/33.33.1.18 - A /26/0.0.0.0]

[FastEthernet0/15/15/0.0.0.0] ===>
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.1/9/33.33.1.5]

From the output above, port 15 of CS3550B shows only a connection to Fa1/0.1 of router
R3620C, which happens to be the same subnet that the switch IP address is in for the
management interface of the switch (33.33.1.0/24). To help further illustrate the problem in the
Discovery Report, see the sample hand-generated output below that shows the missing
subinterfaces in the report for router R3620C.

[FastEthernet0/15/15/0.0.0.0] ===>
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.1/9/33.33.1.5]
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.5/9/33.33.5.5]
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.7/9/33.33.7.5]

The above output shows the desired discovery for this scenario. The current release of IBM
Tivoli Switch Analyzer does not provide this level of detail in the Discovery Report. However, not
displaying the router subinterfaces of a router trunk link does not interfere with the ability of the
root cause engine to correlate properly.

3 Root Cause / Impact Analysis Troubleshooting
The IBM Tivoli Switch Analyzer will generate an event for any outage in the managed, connected
network region detected by the NetView server. If there is a topology inaccuracy in that area, it is
likely to manifest in one of two ways.

1. Root cause events will show up for a device that is downstream (from the point of view of
the server) from the real problem.
2. There may be multiple root cause events, usually identifying some unreachable end
nodes.

To minimize the possibility of the two scenarios above occurring, review and resolve the
issues described in chapter 2, “Discovery Troubleshooting.”

IBM Tivoli Switch Analyzer will also generate events for switches in a remote campus when the
Port Status Monitor is active. Please refer to subsection “Port status monitoring” in Chapter 4 of
the IBM Tivoli Switch Analyzer Administrator’s Guide for introductory information. For
configuration information, see “Configuring the port status monitor” in Chapter 7 instead.

3.1 How does Switch Analyzer work with NetView?


Please refer to “Chapter 3. Layer 2 discovery” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for information on the working relationship between NetView and IBM Tivoli Switch
Analyzer.

3.1.1 What does the “[B]” symbolize in my root cause event?


The user may see an event with [B] appended to it in the NetView event browser. For example,
the browser may display the following event: Node Down [B]. An explanation for this appended
item is provided here with a description of the bounce algorithm.

The bounce algorithm works as follows. The interface_bounce_count and
interface_bounce_interval in configuration file Correlator.ini are used to identify
when a network device is "bouncing" and does not stay down, where bounce means that a down
event is followed by an up event. Let’s say the interface_bounce_count is set to 2 and the
interface_bounce_interval is set to 900 seconds (15 minutes), where the default settings
for these fields are 5 and 3600 respectively.

Correlator.ini file
[Correlation]
interface_bounce_count=2
interface_bounce_interval=900

In the following example, EndNode01 is attached to Switch01. If a network device such as
EndNode01 bounces two times within 15 minutes, the device will be treated as down. A device is
considered to have bounced twice when it goes down for a third time within the configured
interval.

EndNode01 N Node Down
Switch01 V Interface Down
Switch01 V Interface Up
EndNode01 N Node Up
EndNode01 N Node Down
Switch01 V Interface Down
Switch01 V Interface Up
EndNode01 N Node Up
EndNode01 V Node Down [B]

In this example, IBM Tivoli Switch Analyzer issued the earlier two root cause events as vendor
interface downs on the switch port and not as vendor node downs on the end node.
Nevertheless, the NetView server issued two node down events for the corresponding end node.
However, the bouncing algorithm is only tracked for NetView events, not IBM Tivoli Switch
Analyzer events. So the user will not see bouncing events for interfaces of a switch. As a result,
the bounce indicator [B] is applied to the device attached to the switch port and not applied to
the switch port itself.

Once a device is considered down via the bouncing algorithm, it will not be cleared, or declared to
have a status of up, until an up event is received for the device and the time indicated by
interface_bounce_interval has expired without any additional down events.
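
The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior as documented here, not the product's implementation: the class name BounceTracker and the method record_down are hypothetical, and the clearing logic (an up event followed by a quiet interval) is deliberately left out.

```python
from collections import deque


class BounceTracker:
    """Illustrative model of the documented bounce rule (hypothetical names)."""

    def __init__(self, bounce_count=5, bounce_interval=3600):
        # Defaults mirror the documented defaults for interface_bounce_count
        # and interface_bounce_interval (seconds).
        self.bounce_count = bounce_count
        self.bounce_interval = bounce_interval
        self._downs = deque()  # timestamps of recent down events

    def record_down(self, timestamp):
        """Record a down event; return True if it should carry the [B] flag."""
        # Forget down events that fall outside the configured interval.
        while self._downs and timestamp - self._downs[0] > self.bounce_interval:
            self._downs.popleft()
        self._downs.append(timestamp)
        # With bounce_count=2, the third down inside the interval is flagged.
        return len(self._downs) > self.bounce_count


tracker = BounceTracker(bounce_count=2, bounce_interval=900)
flags = [tracker.record_down(t) for t in (0, 200, 400)]
print(flags)  # [False, False, True]

slow = BounceTracker(bounce_count=2, bounce_interval=900)
slow_flags = [slow.record_down(t) for t in (0, 1000, 2000)]
print(slow_flags)  # [False, False, False]
```

The first run matches the EndNode01 example: the third Node Down within 15 minutes is the one marked [B]. In the second run the down events are more than 900 seconds apart, so none of them is treated as a bounce.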

3.1.2 What does the “[D]” symbolize in my root cause event?


The user may see an event with [D] appended to it in the NetView event browser. For example,
the browser may display the following event: Node Down [D]. An explanation for this appended
item is provided here.

During the discovery process, it may not be possible to determine all of the layer 2 connections
between switches and routers. As a result, there may not always be a discovered path to every
switch in the network. In order to connect the switch to the currently discovered topology when
the path is not known, the management interface of the switch is assigned to a default location in
the layer 2 topology. For additional information, refer to section 2.7.2, “An asterisk (*) shows up
after a node”.

The [D] in the event is used to denote that this correlated event has occurred on a switch that
was, in fact, placed in a default location according to its management interface during the
discovery process. It is important to note that these particular root cause events, indicated with
[D], may or may not be the true root cause of a network outage due to the default placement of
this switch in the internal database.

The switch referenced in an event such as this will have a corresponding entry in the Discovery
Report with a [~] indicator. For example, the Discovery Report entry for switch SS3300B that
was placed in the layer 2 topology via default logic would look as follows:

ss3300b.tivlab.itsa.com/33.50.1.11 [~]

The same switch will appear in the Summary Report as follows:

Discovery has been completed for the following nodes, but
node was unreachable via layer 2 segments:
---------------------------------------------------------------
ss3300b.tivlab.itsa.com/33.50.1.11

3.2 Why did I get several root cause events at once?


3.2.1 Unmanaged Devices
IBM Tivoli Switch Analyzer may generate many events if the user has unmanaged a device from
the NetView console. The algorithm for unmanaged devices is for IBM Tivoli Switch Analyzer to
also unmanage all downstream devices within the layer 2 topology database. As a result, any
downstream device with an open root cause event will have the event closed via unmanaged or
resolved events, which appear in the event browser. These events are expected.

3.2.2 Discovery Poll


IBM Tivoli Switch Analyzer may generate many events during the recurring discovery poll, which
is defaulted to run every 24 hours. The discovery poll forces a rediscovery for each switch and, in
turn, may cause several events to appear including node resolves, switch interface deletes, and
new root cause events.

The user can look in the l2_topo_adapter.log for entries around the same date and time
when the events were seen in the NetView event browser. If at the same time there are
discovery requests for each device in the network, then the recurring discovery poll most likely
took place. Therefore, the series of events seen in the event browser is expected behavior.

The user can modify the setting for the discovery interval value via the discovery_interval
field (measured in minutes) in file /usr/OV/ITSL2/conf/l2_topo_adapter.ini.

3.3 Why is the Layer2Status field empty for some switches?


Layer 2 status is calculated based on the detection of outages in the network. If a network
outage occurs at a switch, the Layer2Status field of the switch is updated and monitored
until the problem is resolved, at which time the Layer2Status field is updated as appropriate.
Before the first problem occurs for a device, the Layer2Status field is unset and does not appear in
object properties. An empty Layer2Status field is therefore expected for devices that have not
yet had a root cause assigned to them.

3.4 My Impact Analysis Report does not show anything


The Impact Analysis Report for any particular router or switch device shows a list of nodes
downstream that would be affected if the selected device experienced an outage. The impact
analysis for a device can be obtained via the NetView console by highlighting a device and
selecting menu options Monitor → Layer 2 → Impact Analysis. The following is the command line
equivalent:

ITSL2_reports -r whatif -o <object ID>

The option -o indicates the object ID of the device. Sample output from this command is shown
below:

--------------------------------------------------
Impact Analysis Report
--------------------------------------------------
cs1912g.tivlab.itsa.com
cs1912h.tivlab.itsa.com
cs1924d.tivlab.itsa.com
saen26.tivlab.itsa.com
--------------------------------------------------

There are times when the Impact Analysis Report does not show any impacted nodes
downstream. In this case, layer 2 network redundancy combined with the Cisco Discovery
Protocol (CDP) plays a role in this unusual, but expected, behavior. See the following diagram.

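[Figure: Layer 2 Redundant Paths and Impact Analysis. Core switches A and B and access switches C, D, and E form a fully connected mesh; IBM Tivoli Switch Analyzer is attached to access switch E.]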

Let’s say that IBM Tivoli Switch Analyzer is located on an access switch in the above fully
connected network mesh. If the network consists of Cisco switches, then only the access
switches C, D, and E will have a populated Impact Analysis Report for the end nodes hanging off
of them. However, the report for switch C will not show switches A, B, or D as impacted due to
the layer 2 redundancy. That’s because IBM Tivoli Switch Analyzer discovers the devices
attached to the blocked ports using the CDP cache in Cisco devices, causing a redundant path in
the layer 2 topology database. And when there is redundancy, there is no impact.

In addition, an outage occurring with switches A and B will not have an impact at all in relation to
other devices due to the redundancy for the same reason. Therefore, the Impact Analysis
Reports for these 2 switches will be completely blank. In contrast, the Impact Analysis Report for
switch E will have all downstream devices in it, including switches A, B, C, and D, as well as end
nodes hanging off of these switches.
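
The reasoning above (no impact wherever a redundant path exists) can be illustrated with a small reachability sketch. It is a simplified model under stated assumptions, not the product's algorithm: the helper names, the node called "server", and the end nodes enC, enD, and enE are all hypothetical, and the mesh is assumed to be fully connected with the server attached to switch E.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, removed=None):
    """Breadth-first search from start, optionally pretending one node is down."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def impact(graph, server, device):
    """Nodes the server can no longer reach if the given device fails."""
    before = reachable(graph, server)
    after = reachable(graph, server, removed=device)
    return sorted(before - after - {device})


# Hypothetical model of the diagram: full mesh of A-E, server on E,
# one illustrative end node on each of the access switches C, D, and E.
edges = list(combinations("ABCDE", 2))
edges += [("server", "E"), ("enC", "C"), ("enD", "D"), ("enE", "E")]
graph = build_graph(edges)

print(impact(graph, "server", "A"))  # []
print(impact(graph, "server", "C"))  # ['enC']
print(impact(graph, "server", "E"))  # ['A', 'B', 'C', 'D', 'enC', 'enD', 'enE']
```

In this model the reports for A and B are empty, the report for C lists only its own end node, and the report for E lists everything downstream, which matches the behavior described above.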

3.5 It takes a long time for root cause events to show up


When testing event correlation in the connected topology, some users may find that it takes a
while to return the correlated event, especially if sitting at the monitor waiting for it to happen.
That's because the root cause event is returned five minutes after an interface down event has
been issued by the NetView server.

When IBM Tivoli Switch Analyzer detects the interface down event, it begins processing and
looking for the root cause of the problem. The default timer value to return a root cause is five
minutes. This field is configurable in the correlator.ini file, in section Correlation, with
field interface_timeout, which is defaulted to 300 seconds. Lower this value when it is
desired to have the root cause returned sooner. When this value is changed, the user must
restart the itsl2 daemon to activate the changes.

There may be a negative impact when making this value too small, in that the root cause event
may be issued before the actual root cause can be fully determined. For instance, reducing this
value down to ten seconds would not be wise.
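
As a rough aid, a configured value can be checked before the daemon is restarted. The sketch below parses an inline sample rather than the real file, and the 60-second floor is purely an assumed threshold for illustration; the guide only states that ten seconds is too low. The function names are hypothetical.

```python
import configparser

# Inline sample standing in for the [Correlation] section of the file.
SAMPLE = """
[Correlation]
interface_timeout=300
"""

# Assumed lower bound for illustration only; not a documented limit.
ASSUMED_MIN_TIMEOUT_S = 60


def read_interface_timeout(text):
    """Return interface_timeout in seconds, or the documented default of 300."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("Correlation", "interface_timeout", fallback=300)


def is_too_small(timeout_s, floor_s=ASSUMED_MIN_TIMEOUT_S):
    """Flag values low enough that the root cause may be issued prematurely."""
    return timeout_s < floor_s


timeout = read_interface_timeout(SAMPLE)
print(timeout, is_too_small(timeout))  # 300 False
print(is_too_small(10))                # True
```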

3.6 Why is the root cause sometimes the switch port and sometimes the end node?
IBM Tivoli Switch Analyzer begins its root cause analysis when the NetView server issues an
interface down event. In the link down example shown in the diagram below, the link between the
switch and the end node went down. Therefore, the end node is down from the server’s point of
view. In one case, a root cause event will be issued for the switch port (interface down), causing
the switch to become marginal and turn yellow in the map. In the other case, the root cause
event will be issued for the end node, with no change to the switch icon color in the server map.

[Figure: Location of Root Cause: Race Condition. IBM Tivoli Switch Analyzer, an access switch, and an end node are connected in series.]

In this scenario, the link has gone down between the access switch and the end node. And, as
mentioned above, the root cause event could be issued for the switch port or for the end node
because of a timing issue due to a race condition. After the link goes down, the switch device
may take several seconds, and even up to twenty seconds, to resynchronize itself. Afterwards,
the switch port is updated in the switch’s MIB tables to reflect the down condition.

If IBM Tivoli Switch Analyzer queries the switch within the first few seconds of this
resynchronization, it will not see an error for the switch port, and may conclude that the end node
is at fault. If the switch is queried after resynchronization, the switch port will be detected as
down and an interface down root cause event is issued for the switch port. Either of these results
is reasonable, given that the real network failure lies in the link between the two devices.
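
The timing dependence can be summarized as a simple comparison. The function below is only a mental model with hypothetical names; the real query timing is internal to the product and is not documented here.

```python
def root_cause_location(query_delay_s, resync_delay_s):
    """Where the root cause lands, given when the switch is queried.

    query_delay_s:  seconds after the link failure at which the switch is queried
    resync_delay_s: seconds the switch needs before its MIB tables show the port down
    """
    if query_delay_s >= resync_delay_s:
        # The MIB already reflects the failure, so the port is seen as down.
        return "switch port (interface down)"
    # The MIB still shows the port up, so the end node is blamed.
    return "end node (node down)"


print(root_cause_location(query_delay_s=3, resync_delay_s=20))
print(root_cause_location(query_delay_s=30, resync_delay_s=20))
```

The first call models a query made during resynchronization and the second a query made afterwards.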

There are two other scenarios worth mentioning when it comes to the location of the root cause.
Please refer to the following two diagrams that illustrate these scenarios.

[Figure: Location of Root Cause: End Node. IBM Tivoli Switch Analyzer, an access switch, a hub, and an end node are connected in series.]

In the diagram above, a hub was inserted between the access switch and the end node. Note
that hub devices are not managed devices. In this case, the link went down between the hub and
the end node, causing a NetView node down event to be issued for the end node. In contrast to
the race condition mentioned earlier, the MIB information for the switch port will not be updated at
all by the switch device since its connection is still active with the hub. Therefore, the only root
cause event that can be generated is the one for the end node in the form of a node down. This
is also true if the hub is replaced with an unmanaged switch, that is, a switch where the
community string is not configured within the NetView server.

Next let’s look at a link that goes down between the access switch and the hub. In this case, the
user will experience the same race condition mentioned above. As stated earlier, if the switch is
queried within the first few seconds of resynchronization, the conclusion will be that the end node
is at fault since the switch MIB tables are not updated yet. If the switch is queried after
resynchronization is complete, the switch port will be detected as down and an interface down
root cause event is issued for the switch port.

[Figure: Location of Root Cause: Race Condition #2. IBM Tivoli Switch Analyzer, an access switch, a hub, and an end node are connected in series.]

4 Web Console Troubleshooting
For configuration information regarding the IBM Tivoli Switch Analyzer Web servlet, please refer
to “Configuring the Web servlet” in Chapter 7 of the IBM Tivoli Switch Analyzer Administrator’s Guide.
Please also see “Web servlet and console debugging” for information on how to
further debug possible Web Console issues.

4.1 Web Console Startup Problems

4.1.1 Applet com.tivoli.netview.client.NetViewApplet notinited


If you have the wrong JRE version, then you may see the following message in the bottom
message area of the web browser when using the NetView Web Console applet.

Applet com.tivoli.netview.client.NetViewApplet notinited

The Sun Java Console could show the following as an example:

Java(TM) Plug-in: Version 1.4.1_02
Using JRE version 1.4.1_02 Java HotSpot(TM) Client VM

This version of the JRE is not supported by the Web Console when using IBM Tivoli Switch
Analyzer version 1.3. If you have a version other than 1.3.1 of the JRE, please install version
1.3.1 instead.

4.1.2 Cannot start the NetView Web Console


In some circumstances, the NetView Web Console may not start successfully. Please check the
following field in your NetView configuration:

File: /usr/OV/conf/NetViewLargeDb.conf
Field: NETVIEW_LARGE_DB=FALSE

You will need to set this value to TRUE if you have a large database. Please consult IBM
Software Support regarding when this field should be set to TRUE.

Note: This is not an IBM Tivoli Switch Analyzer specific issue. However, our customers have run
into this problem, so we are documenting it here.

4.1.3 Cannot connect to the NetView Web Console


Customers may find that workstations that used to work with the NetView Web Console applet no
longer work after installing IBM Tivoli Switch Analyzer 1.3. Workstations using JRE 1.3 worked
with the Web Console before. However, after installing IBM Tivoli Switch Analyzer 1.3,
customers need to update their workstations to use JRE 1.3.1 so that their Web Consoles
connect to the server without problems. Also note that using JRE 1.4.x will not work either. JRE
1.3.1 must be used.

4.1.4 Cannot launch submap explorer from Tivoli Event Console
There are a few ways to launch the NetView Web Console from an event with Tivoli Event
Console. One way is to highlight a NetView generated event and launch the Web Console
submap explorer. However, there is a known problem where only the Web Console comes up
without the submap explorer. The problem can be resolved as follows.

Go to the \usr\ov\www\webapps\netview\scripts directory and edit the file
submapexplorer.js. On the fourth line from the bottom (framefactory.addFrame), you
should remove the fourth argument ", subexp", save it, and restart the Web Console. Then
you will be able to launch the Web Console submap explorer from the Tivoli Event Console event.

4.2 Layer 2 menu missing


Some users may notice that the Layer 2 menu is missing from the NetView Web Console
dropdown menus. The Layer 2 menu introduced in IBM Tivoli Switch Analyzer version 1.3 can be
found in the following directory:

/usr/OV/www/webapps/netview/warf/Templates/WebConsole

You should find a file called ITSL2Menu.xml, where this XML file defines all the menu entries. If
this file does not exist, then there was a problem encountered during the install, and you should
check the install logs (/usr/OV/tmp/itsl2 directory). If the file ITSL2Menu.xml does
exist, you should save your Web Console security settings again (even if you have made no
changes) and restart the webserver daemon. Then launch the Web Console, and the Layer 2
menu should now exist.

4.3 Out of memory error


When launching the Physical View for a large switch with 50 or more large switches connected to
it, the NetView Web Console may report an "out of memory" error, and no view will be created.

To fix this problem, edit the nvwc.bat (nvwc.sh) file and change each of the -Xmx64m options
to be -Xmx128m or more. Make sure to change all the occurrences of this option (there should
be two occurrences). Then re-launch the Web Console.
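
Because every occurrence must be changed, a scripted edit can be less error-prone than a manual one. The sketch below works on an inline sample; the two java command lines in it are invented for illustration and are not the real contents of nvwc.sh, and the function name is hypothetical.

```python
import re

# Invented stand-in for the launcher script; the real file contents differ.
SAMPLE_SCRIPT = (
    "java -Xmx64m -cp console.jar ExampleMain\n"
    "java -Xmx64m -cp helper.jar ExampleHelper\n"
)


def raise_heap(text, new_mb=128):
    """Raise every -Xmx<N>m option to at least new_mb megabytes.

    Returns the updated text and the number of options found.
    """
    def bump(match):
        current = int(match.group(1))
        return f"-Xmx{max(current, new_mb)}m"

    return re.subn(r"-Xmx(\d+)m", bump, text)


updated, count = raise_heap(SAMPLE_SCRIPT)
print(count)    # 2
print(updated)
```

Checking that the reported count is two is a quick way to confirm that no occurrence was missed.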

4.4 Physical View does not show attached devices


When the Physical View is selected for a device via the Web Console, a typical diagram may look
like the following:

IBM Tivoli Switch Analyzer is designed to display attached devices to the switch ports in Physical
Views. In this example, a Physical View was selected for a switch device, and the devices
attached to the switch are depicted in the diagram. The results of this view are generated from
the layer 2 discovery information.

However, some users may see the following instead:

When the Physical View does not display attached devices to the switch ports, the switch device
is most likely in a remote campus. If you feel that your switch chosen for the Physical View
should not be in a remote campus, please check the Summary Report and make sure your switch
device is not placed in a remote campus by Switch Analyzer.

Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.

4.5 My Point-to-Point View Times Out


Sometimes a user may run into a situation where a timeout occurs when launching the Point-to-
Point View via the Web Console. In this case, the user will need to increase the
ClientRequestTimeout value used, which defaults to 120 seconds.

To adjust the ClientRequestTimeout value, perform the following steps:

1. Open the web.xml file.

2. Look for the following block of lines and modify the number of seconds:

<init-param>
<param-name>ClientRequestTimeout</param-name>
<param-value>120</param-value>
</init-param>

3. Change the 120 value to some other higher value.

4. Restart the webserver daemon.
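
The edit in steps 2 and 3 can also be scripted. The following is a minimal sketch that operates on an inline fragment; the enclosing servlet element and the function name are assumptions, and a real web.xml may declare an XML namespace or contain several servlets, in which case the lookup would need adjusting.

```python
import xml.etree.ElementTree as ET

# Inline fragment standing in for the relevant part of web.xml.
SAMPLE = """<servlet>
  <init-param>
    <param-name>ClientRequestTimeout</param-name>
    <param-value>120</param-value>
  </init-param>
</servlet>"""


def set_init_param(root, name, value):
    """Set <param-value> for every <init-param> whose <param-name> matches.

    Returns the number of parameters updated.
    """
    changed = 0
    for param in root.iter("init-param"):
        if param.findtext("param-name") == name:
            param.find("param-value").text = str(value)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = set_init_param(root, "ClientRequestTimeout", 300)
new_value = root.find("init-param/param-value").text
print(count, new_value)  # 1 300
```

The value 300 is only an example; choose a timeout that suits the size of the network.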

For additional information about the Web servlet, refer to the “Configuring the Web servlet”
section of the IBM Tivoli Switch Analyzer Administrator’s Guide (in Chapter 7).

4.6 There are no switches in my Point-to-Point View


There are two reasons a Point-to-Point View may contain no switches: remote campuses and
router-on-a-stick network topology configurations.

4.6.1 Remote Campus


A user may select two end nodes for a Point-to-Point View and get the following results:

EN 01 RT 01 RT 02 RT 03 EN 02

There are no switches in this output, although you were expecting to see a few. When this
happens, your end nodes chosen for the Point-to-Point View are most likely attached to switches
that are in a remote campus as follows:

[Figure: End nodes EN 01 and EN 02 are separated from IBM Tivoli Switch Analyzer by an unmanaged network.]

This is the expected behavior when the end nodes are in the remote campus portion of the
managed network. If you feel that your end nodes chosen for the Point-to-Point View, along with
your switch devices, should not be in a remote campus, please check the Summary Report and
make sure your switch devices are not placed in a remote campus by Switch Analyzer.

Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.

4.6.2 Router-on-a-Stick
As noted in section 2.7.3, “Router-on-a-stick subinterfaces are not displayed”, the switch contains
the layer 2 path information, and the router introduces a layer 3 redundant path when a router-on-
a-stick topology is in use. The end result, as noted earlier, is that not all corresponding router
subinterfaces are depicted in the Discovery Report.

This lack of discovered information causes problems for the Point-to-Point View. For example, a
user may select two end nodes for a Point-to-Point View and get the following results:

EN 01 RT 02 EN 02

There are no switches in this output, although you were expecting to see a few. When this
happens, your end nodes chosen for the Point-to-Point View may be attached to switches in a
router-on-a-stick network topology as follows:

[Figure: Router RT 02 carries both Subnet #1 and Subnet #2 over a Cisco trunk; EN 01 is in Subnet #1 and EN 02 is in Subnet #2.]

In this topology, the two access switches and end nodes use router RT 02 as their gateway, via a
router-on-a-stick. Point-to-Point Views do not work well when the two end nodes selected for this
view pass through a router via a router-on-a-stick implementation as shown in the diagram above.
As a result, the switch objects downstream of the router-on-a-stick implementation do not appear
in the Point-to-Point View.

4.7 There is an X symbol in my Point-to-Point View


A user may select two end nodes for a Point-to-Point View and get the following results as shown
in this diagram:

EN 01 SW 01 RT 01 SW 02 RT 02 EN 02

X
There are a few reasons why this X may appear in your output.

Point-to-Point Views do not work over artificially created links. These links can be created with
IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch Analyzer (Topology WAN
LAN, or TWL) links or NetView custom links. Refer to sections 2.6.9 and 2.6.10 for information
on these links.

Customers are encouraged to create these artificial links in order to improve layer 2 discovery.
However, if the 2 selected points in the Point-to-Point View span one of these artificial links, then
the Point-to-Point diagram will show an X object at the point where the artificial link connects the
topology, and then connects the second point to the X object in the view.

If you are, in fact, not using artificial links and feel that neither of your end nodes chosen for the
Point-to-Point View should be in a remote campus, please check the Summary Report and make
sure your switch devices along the path are not placed in a remote campus by IBM Tivoli Switch
Analyzer.

Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.

5 Port Status Monitor Troubleshooting
In addition to finding the root cause events in the connected topology, IBM Tivoli Switch Analyzer
version 1.3 will generate events for switches in a remote campus. This is accomplished via the
Port Status Monitor function, which is on by default. Please refer to subsection “Port status
monitoring” in Chapter 4 of the IBM Tivoli Switch Analyzer Administrator’s Guide for introductory
information. For configuration information, see “Configuring the port status monitor” in Chapter 7
instead.

5.1 I see two root cause events


If a network is fully interconnected at layer 2, then IBM Tivoli Switch Analyzer versions 1.2.1 and
earlier will not provide a root cause correlated event until all paths are down between the NetView
server and some managed device in the network. This is no longer the case when using Port
Status Monitoring introduced in IBM Tivoli Switch Analyzer version 1.3. See the following
diagram as an example.

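[Figure: Layer 2 Redundant Paths and Root Cause. Core switches A and B and access switches C and D form a fully connected mesh, with IBM Tivoli Switch Analyzer located on an access switch; red dashed links are blocking.]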

Let’s say that IBM Tivoli Switch Analyzer version 1.2.1 is located on an access switch in the
above fully connected mesh network for redundancy. In this example, the red dashed links are
blocking. The blocking connections are between Switch A and Switch B, between Switch B and
Switch D, and between Switch A and Switch D.

When the forwarding link between Switch C and Switch D goes down, the link between Switch A
and Switch D becomes active, for example. An interface down event is not issued by NetView
since all nodes are accessible from the server during the NetView status poll. The server would
have never known that a problem occurred.

[Figure: Layer 2 Redundant Paths with Link Down. The same mesh of switches A, B, C, and D, with the link between Switch C and Switch D marked down (X).]

In this example, the customer would not be informed immediately about the outage between
Switch C and Switch D. All paths need to be down between the server and a managed device
before a root cause event would be generated.

However, with the Port Status Monitor (PSM) function introduced in version 1.3, the outage
between Switch C and Switch D will be found during the PSM status poll. So instead of receiving
no events for the outage as in previous releases, PSM will return 2 events for this single outage.
The reason is that PSM will SNMP status poll both switch C and switch D since it has IP access
to both switches. Each switch will indicate that a port has gone down, resulting in 2 (V) events in
the NetView event browser.

5.2 I see unexpected port status


The Status Report will list switch nodes, sorted by name, that are currently managed by the port
status monitor function of IBM Tivoli Switch Analyzer. The Status Report can be used to identify
the current port status for all switch devices discovered. The layer 2 topology views in the
NetView Web Console also display the status of switch ports.

When viewing the Status Report or the layer 2 topology views, some users may experience what
seems to be unexpected port status behavior, which is documented in this section.

5.2.1 My switch ports are Interface Up/Correlated Down (impact)


One common unexpected port status, as shown in the Status Report, is that of ports with status
Interface Up/Correlated Down (impact). When this happens, there is an outage at
an upstream port, and the outage is propagated to the downstream ports. Referring to the diagram below,
port 23 of switch 2950k has received an interface down (V) event.

[Figure: Impacted Downstream Ports. IBM Tivoli Switch Analyzer, 2950k, 2950j, and 1924e are connected in series; a network outage (X) is shown at port 23 of 2950k.]

The Status Report for the switches in the above diagram will contain the following information.

2950k.itec.lab.com/172.30.120.10/Reachable/Correlated Marginal (root cause)
[17/FastEthernet0/17/Interface Up/Correlated Up]
[18/FastEthernet0/18/Interface Up/Correlated Up]
[23/FastEthernet0/23/Interface Down/Correlated Down (root cause)]
[24/FastEthernet0/24/Interface Up/Correlated Up]

2950j.itec.lab.com/172.30.120.11/Unreachable/Correlated Down (not a root cause)
[2/FastEthernet0/2/Interface Up/Correlated Down (impact)]
[4/FastEthernet0/4/Interface Up/Correlated Down (impact)]
[5/FastEthernet0/5/Interface Up/Correlated Down (impact)]
[6/FastEthernet0/6/Interface Up/Correlated Down (impact)]

1924e.itec.lab.com/172.30.120.18/Unreachable/Correlated Down (not a root cause)
[17/17/Interface Up/Correlated Down (impact)]
[18/18/Interface Up/Correlated Down (impact)]
[19/19/Interface Up/Correlated Down (impact)]

Notice that all of the ports downstream of switch 2950k port 23 have the status Interface
Up/Correlated Down (impact), which is the expected behavior. This is because IBM
Tivoli Switch Analyzer marks all downstream ports as down, driven by its algorithm regarding
impact analysis. The code regards ports downstream of an outage as also being down
(correlated down with impact) since it cannot manage these ports again until the original root
cause problem is resolved.
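
When a Status Report is long, it can help to separate root cause ports from impacted ports programmatically. The sketch below assumes the port line layout shown in the sample above (index, interface name, interface status, correlated status, separated by slashes); the function name is hypothetical. Because interface names such as FastEthernet0/23 contain a slash themselves, the line is split from the right.

```python
def parse_port_line(line):
    """Split one bracketed Status Report port line into its four fields."""
    body = line.strip().strip("[]")
    # The last two fields never contain a slash, so split from the right.
    rest, interface_status, correlated_status = body.rsplit("/", 2)
    # The first field is the port index; the remainder is the interface name.
    index, _, name = rest.partition("/")
    return {
        "index": index,
        "name": name,
        "interface": interface_status,
        "correlated": correlated_status,
    }


SAMPLE_LINES = [
    "[17/FastEthernet0/17/Interface Up/Correlated Up]",
    "[23/FastEthernet0/23/Interface Down/Correlated Down (root cause)]",
    "[17/17/Interface Up/Correlated Down (impact)]",
]

ports = [parse_port_line(line) for line in SAMPLE_LINES]
root_causes = [p["name"] for p in ports if p["correlated"].endswith("(root cause)")]
impacted = [p["name"] for p in ports if p["correlated"].endswith("(impact)")]
print(root_causes)  # ['FastEthernet0/23']
print(impacted)     # ['17']
```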

5.2.2 My switch ports are Interface Up/Correlated Unmanaged


Another common unexpected port status, as shown in the Status Report, is that of a port that has
status Interface Up/Correlated Unmanaged. In this case we have a switch that
was unmanaged by the user, where the unmanaged condition is propagated to the downstream
ports. Referring to the diagram below, switch 2950j has been unmanaged.

[Figure: Impacted Downstream Ports. IBM Tivoli Switch Analyzer, 2950k, unmanaged switch 2950j, and 1924e are connected in series.]

The Status Report for the above switches will contain the following information.

2950k.itec.lab.com/172.30.120.10/Reachable/Correlated Up
[17/FastEthernet0/17/Interface Up/Correlated Up]
[18/FastEthernet0/18/Interface Up/Correlated Up]
[23/FastEthernet0/23/Interface Up/Correlated Up]
[24/FastEthernet0/24/Interface Up/Correlated Up]

2950j.itec.lab.com/172.30.120.11/Reachable/Correlated Up
[2/FastEthernet0/2/Interface Up/Correlated Unmanaged]
[4/FastEthernet0/4/Interface Up/Correlated Unmanaged]
[5/FastEthernet0/5/Interface Up/Correlated Unmanaged]
[6/FastEthernet0/6/Interface Up/Correlated Unmanaged]

1924e.itec.lab.com/172.30.120.18/Reachable/Correlated Up
[17/17/Interface Up/Correlated Unmanaged]
[18/18/Interface Up/Correlated Unmanaged]
[19/19/Interface Up/Correlated Unmanaged]

Notice that all of the ports downstream of switch 2950j, and including the ports of switch 2950j,
have the status Interface Up/Correlated Unmanaged, which is the expected behavior.
This is because IBM Tivoli Switch Analyzer has its own concept of unmanaged, driven by its
algorithm regarding impact analysis. The code regards ports downstream of an unmanaged port
or unmanaged switch as also being unmanaged (correlated unmanaged) since it cannot manage
these ports again until the original root cause problem is resolved.

5.3 High CPU usage for l2_event_adapter process


Some customers are monitoring hundreds or thousands of switches. In these environments, it is
normal to have multiple instances of the l2_event_adapter process. By default, each one
monitors 1000 switches. The number of switches is set via the following entry:

File: /usr/OV/ITSL2/conf/l2_event_receiver.ini

[Adapter]
req_cnt=1000

The default configuration for the Port Status Monitor (PSM) is to monitor all ports of all discovered
switch devices every 5 minutes. This is quite intense and should be scaled back for large
networks. Below are two suggested parameter changes to minimize the CPU utilization for the
l2_event_adapter, followed by a brief description of each change.

File: /usr/OV/ITSL2/conf/l2_event_adapter.ini

From:
[Polling]
poll_cycle=300

To:
[Polling]
poll_cycle=1800

In this case, the PSM polling cycle was changed from 5 minutes (300 seconds) to 30 minutes.
You can set this parameter at whatever value is best for your network. The reason for making
this change is that the PSM polling is SNMP based. This can be quite intense for some servers,
especially if the user is polling thousands of switch devices.
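
A back-of-the-envelope calculation shows the effect of a longer cycle. The sketch below only counts how many times per hour switches are visited by a status poll; it does not model the number of SNMP requests per switch, which is not documented here. The function name and the 1000-switch figure are illustrative.

```python
def switch_polls_per_hour(switch_count, poll_cycle_s):
    """Number of switch status polls per hour for a given poll_cycle (seconds)."""
    return switch_count * 3600 / poll_cycle_s


before = switch_polls_per_hour(1000, 300)   # default 5-minute cycle
after = switch_polls_per_hour(1000, 1800)   # 30-minute cycle
print(before, after, before / after)  # 12000.0 2000.0 6.0
```

For 1000 switches, moving from a 5-minute to a 30-minute cycle cuts the hourly polling work by a factor of six.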

Next, you should update the ports that PSM should examine by actually decreasing the ports to
be monitored as follows:

File: /usr/OV/ITSL2/conf/correlator.ini

From:
[PortStatus]
poll_all_ports=y

To:
[PortStatus]
poll_all_ports=n

With this configuration, the PSM function only status polls switch ports that do not have
connections to them, where a connection is a device attached to the switch port as determined by
IBM Tivoli Switch Analyzer. The Layer 2 Discovery Report, as well as the Physical View of a switch via the
Web Console, will indicate which switch ports have connections. These connected ports are
not monitored by PSM when the poll_all_ports value is set to n. For
these connected ports, the NetView Interface Down event
will trigger the correlation engine as before and result in the root cause event being generated for
these particular ports.

In general, when poll_all_ports is set to n, PSM will only manage (poll) switch ports that do
not have connections to them, which include the following ports:

1. Switch ports in a redundant layer 2 path
2. Switch ports on switches that are in a remote campus
3. Switch ports where the connected device (downstream) to that port is unmanaged
