Troubleshooting Guide
By Michael L. Webb
Version 2.0
Copyright Notice
Copyright IBM Corporation 2005. All rights reserved. May only be used pursuant
to a Tivoli Systems Software License Agreement, an IBM Software License
Agreement, or Addendum for Tivoli Products to IBM Customer or License Agreement.
No part of this publication may be reproduced, transmitted, transcribed, stored
in a retrieval system, or translated into any computer language, in any form or
by any means, electronic, mechanical, magnetic, optical, chemical, manual, or
otherwise, without prior written permission of IBM Corporation. IBM Corporation
grants you limited permission to make hardcopy or other reproductions of any
machine-readable documentation for your own use, provided that each such
reproduction shall carry the IBM Corporation copyright notice. No other rights
under copyright are granted without prior written permission of IBM Corporation.
The document is not intended for production and is furnished “as is” without
warranty of any kind. All warranties on this document are hereby disclaimed,
including the warranties of merchantability and fitness for a particular
purpose.
U.S. Government Users Restricted Rights -- Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.
Trademarks
IBM, the IBM logo, Tivoli, the Tivoli logo, AIX, NetView, Tivoli Enterprise,
Tivoli Enterprise Console are trademarks or registered trademarks of
International Business Machines Corporation or Tivoli Systems Inc. in the United
States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
Extreme Networks is a registered trademark of Extreme Networks, Inc. Summit,
Summit5iTx, and Summit48si are trademarks of Extreme Networks, Inc.
Catalyst, Cisco, Cisco IOS, and Cisco Systems are registered trademarks or
trademarks of Cisco Systems, Inc.
NORTEL NETWORKS is a trademark of Nortel Networks. BayStack is a trademark of
Nortel Networks.
3Com, the 3Com logo, and SuperStack are registered trademarks of 3Com
Corporation.
Symbol Technologies is a registered trademark of Symbol Technologies, Inc.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other company, product, and service names may be trademarks or service marks of
others.
Notices
References in this publication to Tivoli Systems or IBM products, programs, or
services do not imply that they will be available in all countries in which
Tivoli Systems or IBM operates. Any reference to these products, programs, or
services is not intended to imply that only Tivoli Systems or IBM products,
programs, or services can be used. Subject to valid intellectual property or
other legally protectable right of Tivoli Systems or IBM, any functionally
equivalent product, program, or service can be used instead of the referenced
product, program, or service. The evaluation and verification of operation in
conjunction with other products, except those expressly designated by Tivoli
Systems or IBM, are the responsibility of the user. Tivoli Systems or IBM may
have patents or pending patent applications covering subject matter in this
document. The furnishing of this document does not give you any license to these
patents. You can send license inquiries, in writing, to the IBM Director of
Licensing, IBM Corporation, North Castle Drive, Armonk, New York 10504-1785,
U.S.A.
About the Tivoli Field Guides
Sponsor
Tivoli Customer Support sponsors the Tivoli Field Guide program.
Authors
Those who write field guides belong to one of these two groups:
Tivoli Support and Services Engineers who work directly with customers
Tivoli Customers and Business Partners who have experience using Tivoli software in a
production environment
Audience
The field guides are written for all customers, both new and existing. They are applicable to
external audiences, including executives, project leads, technical leads, and team members, as
well as to internal audiences.
Field Guides for technical issues are designed to address specific technical scenarios or
concepts that are often complex to implement or difficult to understand, for example:
endpoint mobility, migration, and heartbeat monitoring.
Field Guides for business issues are designed to address specific business practices that
have a high impact on the success or failure of an ESM project, for example: change
management, asset management, and deployment phases.
Purposes
The Field Guide program has two major purposes:
Availability
All completed field guides are available free to registered customers and internal IBM employees
at the following Web site:
http://www.ibm.com/software/sysmgmt/products/support/Field_Guides.html
Table of Contents
2.6.10 IBM Tivoli Remote Campus Installation Service for Switch Analyzer (TWL) ........ 16
2.7 HOW DO I SOLVE PROBLEMS FOUND IN THE DISCOVERY REPORT? ....................... 18
2.7.1 Nodes placed on the wrong switch................................................................ 18
2.7.2 An asterisk (*) shows up after a node........................................................... 18
2.7.3 Router-on-a-stick subinterfaces are not displayed ....................................... 19
3 ROOT CAUSE / IMPACT ANALYSIS TROUBLESHOOTING ......... 21
3.1 HOW DOES SWITCH ANALYZER WORK WITH NETVIEW? ...................................... 22
3.1.1 What does the “[B]” symbolize in my root cause event? ............................. 22
3.1.2 What does the “[D]” symbolize in my root cause event?............................. 23
3.2 WHY DID I GET SEVERAL ROOT CAUSE EVENTS AT ONCE? .................................... 23
3.2.1 Unmanaged Devices ..................................................................................... 23
3.2.2 Discovery Poll............................................................................................... 23
3.3 WHY IS THE LAYER2STATUS FIELD EMPTY FOR SOME SWITCHES?........................ 24
3.4 MY IMPACT ANALYSIS REPORT DOES NOT SHOW ANYTHING ................................ 24
3.5 IT TAKES A LONG TIME FOR ROOT CAUSE EVENTS TO SHOW UP ............................. 25
3.6 WHY IS THE ROOT CAUSE SOMETIMES THE SWITCH PORT AND SOMETIMES THE END
NODE?............................................................................................................................. 26
IBM Tivoli Switch Analyzer
Troubleshooting Guide
The IBM Tivoli Switch Analyzer Troubleshooting Guide v2.0 is designed to assist users in
identifying and resolving common technical problems that may occur with IBM Tivoli Switch
Analyzer v1.3. If using IBM Tivoli Switch Analyzer v1.2.1 or earlier, then please refer to the IBM
Tivoli Switch Analyzer Troubleshooting Guide v1.0.
This guide (v2.0) lists common symptoms and problem areas, and then provides a solution for the
user to implement in each case. To minimize the need for this guide, the user is encouraged to
read and follow the steps outlined in the IBM Tivoli Switch Analyzer v1.3 Deployment Guide
before installing the product.
Problems are sometimes unavoidable with enterprise networks, and usage problems often occur
with network management products. Therefore, this troubleshooting guide should be consulted
when the user is experiencing abnormal behavior with IBM Tivoli Switch Analyzer. These
symptoms can each be traced to one or more problems identified in the Table of Contents of this
guide. Review the Table of Contents thoroughly to identify the problem area that most closely
matches the behavior being experienced. Once identified, each problem can be resolved by
implementing the solution in the appropriate section of this document.
If the user still experiences problems despite having exhausted all solutions covered in this guide,
then IBM Tivoli Support should be contacted for additional assistance. Please have current and
accurate information about the network topology and the NetView server available at all times.
Complete and accurate information about the affected environment allows IBM Tivoli Support to
make rapid and informed decisions and to troubleshoot quickly and effectively.
Webb joined IBM in 1991 and has been a part of several IBM networking product areas since his
arrival. He has a general networking background that spans both development and test
organizations.
vi
1 Daemon Startup Troubleshooting
Abort : cannot find node for management system host: <hostname> [<IP address>]
Ensure the server itself is discovered with SNMP. Check the symbol on the NetView map for an
inner shape, and check that the ovwdb field for this node has the field isSNMPSupported set to
TRUE. Also ensure there are no DNS discrepancies with the hostname and IP address. Each
should resolve to the other.
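These checks can be run from the server's command line. A sketch, where the hostname and IP
address are hypothetical placeholders for your own NetView server (ovobjprint is the NetView
object database query tool):

```
nslookup nvserver.example.com    # forward lookup: expect the server's IP address
nslookup 9.27.133.20             # reverse lookup: expect the server's hostname
ovobjprint -s nvserver.example.com | grep isSNMPSupported    # expect TRUE
```

If the forward and reverse lookups do not resolve to each other, correct DNS before restarting
the itsl2 daemon.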
Abort : cannot find node for management system host: <hostname> [<IP address>]
The itsl2 daemon will use the first IP address for the server that it gets from DNS. This IP
address must match the IP address of an interface within the NetView topology database. If a
matching interface for the NetView server is not found, then the itsl2 daemon will not start.
To resolve this problem, the user should perform one of the following actions:
1. Demand poll the NetView server in order to discover the missing interface
2. Determine the IP address of one of the NetView server interfaces by entering the
command ovtopodump -r <hostname>
Then update the mgt_host field in /usr/OV/ITSL2/conf/correlator.ini as
follows:
[ManagementSystem]
mgt_host = IP address of interface
A system administrator can provide multiple entries in DNS for multi-homed servers. When the
correlator mgt_host field is blank, as it is by default, then the itsl2 daemon will use the first IP
address found in DNS. The IP address currently used by itsl2 is specified by the <IP address>
printed in the Abort log message shown earlier.
1.3 NetView server initial discovery
One reason for the itsl2 daemon not to start is a timing problem that sometimes occurs during
NetView’s initial discovery. As noted in an earlier section, the itsl2 daemon will not start if
the server machine itself has not yet been discovered. If the user has performed a rediscovery of
the network (clearing the NetView databases and rediscovering), then it is possible that the itsl2
daemon is requiring additional information about the server machine itself before the NetView
discovery has completely obtained it. In this case, the user should simply wait until the bulk of the
new discovery has been completed and then restart the itsl2 daemon.
Some users may want to obtain layer 2 topology information before the NetView server has
completed its discovery of the enterprise network. In this case, the user should add the NetView
server to the top of the seed file for quicker discovery of the server. In this way, the itsl2 daemon can
be started sooner since the server will be discovered earlier in the NetView discovery phase.
Please keep in mind that the entire network needs to be discovered by the NetView server before
IBM Tivoli Switch Analyzer can put together the complete layer 2 topology. Therefore, the itsl2
daemon may need to be restarted again later to pick up missing nodes. It is recommended that
the itsl2 daemon be started after the NetView discovery is complete.
Note that each time the itsl2 daemon starts, IBM Tivoli Switch Analyzer will rediscover the entire
layer 2 topology. Allow at least 15 minutes after startup for the topo_cache file to be created.
This file is generated for the reports viewable from the GUI and is refreshed every 15 minutes by
default. This setting can be modified in file /usr/OV/ITSL2/conf/correlator.ini via the
topo_cache_freq field.
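For example, to refresh the topology cache every 30 minutes instead of 15, the field would look
like the following (the value is in minutes; place the field in the section of correlator.ini
where it already appears on your system):

```
topo_cache_freq = 30
```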
2 Discovery Troubleshooting
This chapter will focus on general problems encountered during the IBM Tivoli Switch Analyzer
discovery process. However, many discovery issues can be avoided by reading through
“Chapter 2. Installing Tivoli Switch Analyzer” of the IBM Tivoli Switch Analyzer Administrator’s
Guide. There you will find the following:
1. Prerequisite information
2. Information on eliminating network islands
3. How to find missing switches
4. Verifying the installation
Once the above issues have been addressed, then you should proceed through this chapter of
this guide.
ITSL2_reports -h
The Discovery Report is an ASCII layout of the customer’s enterprise network at layer 2, or the
switch level. Note that the layer 2 topology can also be displayed graphically via the layer 2
topology views in the Web Console. The ASCII Discovery Report generated via the command
above will list each discovered switch and the discovered devices connected to each port on each
switch. The user can also use the Discovery Report to verify the layer 2 topology at the port level
when working through the issues outlined in the Summary Report.
Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.
The Summary Report will list nodes sorted by discovery return codes as seen in the Discovery
Report. Use the Summary Report to identify and resolve as many discovery discrepancies as
possible to ensure that the maximum amount of information is available during discovery. Keep
in mind that missing information for one switch may have an adverse effect on the accuracy of the
topology for the devices attached downstream from that switch. Section 2.6 “How do I solve
problems found in the Summary Report?” of this guide will walk you through each section and
provide explanations regarding how to resolve discovery problems.
Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.
The Status Report will list switch nodes, sorted by name, that are currently managed by the port
status monitor function of IBM Tivoli Switch Analyzer. Use the Status Report to identify the
current port status for all switch devices discovered.
Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.
The Impact Analysis Report lists the nodes that would be affected if the selected device indicated
by the object ID were to fail. The output displays all of the nodes downstream of the device,
downstream being from the viewpoint of the server station.
Please see “Chapter 6. Generating reports” of the IBM Tivoli Switch Analyzer Administrator’s
Guide for additional information.
The most frequent cause of this error is that the user has recently cleared the NetView database
and rediscovered the network while also managing remote networks via a services tool called
IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch Analyzer. This tool
allows the user to create campus links and is described further in section 2.6.10, “IBM Tivoli
Remote Campus Installation Service for Switch Analyzer (TWL).”
The reports cannot be generated because the object IDs in the NetView database changed during
the rediscovery. This may cause a problem with the artificial links
that were created with IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch
Analyzer. To solve this problem, the user needs to delete the old link entries via the services tool
and then add them back.
Another typical reason for this error is that the topo_db.out file is incomplete. This could
happen when a report was run against the file while the file was being written to. Some possible
solutions would be:
Wait a few seconds and re-run the report.
Use /usr/OV/ITSL2/bin/corrcl (option 2) to dump a new topology file.
Use the '-d' option when running the report.
The steps used to add an OID for a new switch type are:
1. Verify that the switch is not in the above output from the ovtopodump -X command.
If the switch already shows up in this output, no further steps are necessary.
2. Add the SNMP sysObjectID in the /usr/OV/conf/oid_to_type file with either a B, H,
or BH flag.
Note: Do not use the G flag.
3. Run ovstop netmon
4. Run ovstart netmon
5. Demand poll the switch from the server console.
If IBM Tivoli Switch Analyzer was not installed already, the newly added OID will be incorporated
automatically during the install.
If IBM Tivoli Switch Analyzer was already installed prior to adding the OID to the oid_to_type
file, then perform the following commands:
Rerun the ovtopodump -X command and verify that the switch is now in the list. After IBM
Tivoli Switch Analyzer is installed, the rightmost column, Layer 2 OID, should contain a Yes for
the switch.
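As a sketch, step 2 amounts to appending one line to the oid_to_type file. The sysObjectID and
field layout below are hypothetical illustrations only; compare against the colon-separated
entries already present in your /usr/OV/conf/oid_to_type before editing. A scratch copy is
used here so the example is safe to run anywhere:

```shell
# Scratch copy standing in for /usr/OV/conf/oid_to_type
OID_FILE=$(mktemp)
# Hypothetical sysObjectID and vendor/agent fields; the trailing B flag
# marks the device as a bridge (do not use the G flag)
printf '1.3.6.1.4.1.9.1.227:Cisco:Example Switch:B\n' >> "$OID_FILE"
grep -c ':B$' "$OID_FILE"
# Then, on the real NetView server:
#   ovstop netmon && ovstart netmon
#   (demand poll the switch from the server console)
```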
the topology each time the reports are generated due to the order of discovery of network
devices.
To initiate a rediscovery, the user can restart the itsl2 daemon, which will perform a completely
new discovery of the network. Or, if there are only a small number of changes, the user can
select each updated switch in the server submap and click the menu items Monitor → Layer 2
→ Rediscover, without incurring the overhead of restarting the itsl2 daemon. In this case the
switch itself plus those switches nearby will be rediscovered. These discovery requests are
logged in the /usr/OV/ITSL2/log/l2_topo_adapter.log. View this file to see discovery
requests that are in progress.
The user can also use the following command to issue a switch rediscovery:
In addition, the itsl2 daemon performs a periodic discovery poll that by default runs every 24
hours. The discovery poll forces a rediscovery of each switch. Therefore, the user can opt to
wait for the automatic rediscovery instead of manually rediscovering the switches. The user can
modify the setting for the discovery interval value via the discovery_interval field (measured
in minutes) in the /usr/OV/ITSL2/conf/l2_topo_adapter.ini file.
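For example, to run the discovery poll every 12 hours instead of 24, the field would look like
the following (the value is in minutes; place the field in the section of l2_topo_adapter.ini
where it already appears on your system):

```
discovery_interval = 720
```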
1. Run the ovtopodump -X command and view the output as shown above. The output
provides a list of all devices that the NetView server has identified as layer 2 devices.
2. A switch may be missing from this list for any of the following reasons:
a. It is not discovered.
b. The SNMP agent is not running.
c. The NetView server does not have the community string (name) for the switch.
d. The SNMP sysObjectID for the switch is not in /usr/OV/conf/oid_to_type with
either a B, H, or BH flag (but not G).
e. The switch has more than one IP interface in the NetView database.
3. Resolve the problem that applies to the missing switch.
a. To discover a missing switch, try pinging it and then execute a demand poll of the
nearest router, or put an entry in the seed file for it.
b. If the problem is 2b, 2c, or 2d, have the NetView administrator correct the problem
and then demand poll the switch.
c. If the problem is 2e, have the network administrator correct the problem and then
demand poll the switch.
d. If modifications were made to the oid_to_type file, restart the netmon daemon and
demand poll the switch.
If IBM Tivoli Switch Analyzer is installed and the NetView server has discovered the switch as a
switch object, then the rightmost column of the ovtopodump -X output, Layer 2 OID, should
contain a Yes for the switch. If this column contains a No, or if modifications were made to
oid_to_type file after installing IBM Tivoli Switch Analyzer, the user must perform the following
steps:
Note: If the switch device was initially discovered as a node (a blank object in the NetView map
with no switch symbol) and IBM Tivoli Switch Analyzer was already installed when the problem
was resolved (converting the node object to a switch object), then the user must ovstop and
ovstart the itsl2 daemon.
Finally, rerun the ovtopodump -X command; verify that the switch is now in the list and the
rightmost column, Layer 2 OID, contains a Yes for the switch.
Switches that support multiple VLANs often require an additional SNMP querying technique
referred to as Community String Indexing (CSI). IBM Tivoli Switch Analyzer supports CSI only
for Cisco switches with multiple VLANs. Switches from other vendors are supported when they do
not require CSI or access to private MIBs for VLAN support. For non-Cisco switches that do
require CSI, discovery is restricted to the ports in the default VLAN (usually VLAN 1).
[Figure: the NetView server running IBM Tivoli Switch Analyzer (ITSA) connects through a
layer 3 switch (L3 SW) to downstream switches SW 01, SW 02, and SW 03]
When using a layer 3 switch as a router (such as the Cisco 3750), the downstream switch
devices may not be discovered properly if the only router for these switch devices is the layer 3
switch. This is a known limitation and will be addressed in a future release.
However, when the layer 3 switch is a Cisco 6509 configured in hybrid mode, discovery can
work when IBM Tivoli Switch Analyzer v1.3 Interim Fix IY67325 has been installed.
If it does not return anything, then the device does not support the Bridge MIB.
In this case, IBM Tivoli Switch Analyzer can be configured to not discover the access points as
switch devices in the /usr/OV/ITSL2/conf/files/l2_oids.cfg file as follows (these
particular access points are used as an example):
From:
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.1|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.3|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.5|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.8|*|Y|
To:
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.1|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.3|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.5|*|N|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.8|*|N|
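The Y-to-N edit above can be scripted with sed. A sketch, run here against a scratch copy so it
is safe to try anywhere (the real file is /usr/OV/ITSL2/conf/files/l2_oids.cfg):

```shell
# Scratch copy standing in for /usr/OV/ITSL2/conf/files/l2_oids.cfg
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.1|*|Y|
l2_oid|Spectrum24 Access Point|.1.3.6.1.4.1.388.1.3|*|Y|
EOF
# Flip the discovery flag from Y to N on the Spectrum24 entries only
sed -i '/Spectrum24 Access Point/ s/|Y|$/|N|/' "$CFG"
grep -c '|N|$' "$CFG"
```

Back up the real l2_oids.cfg before applying any scripted edit to it.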
For complete information on how to disable devices from being discovered, please refer to
section, “Selectively disabling discovery” of the IBM Tivoli Switch Analyzer Administrator’s Guide
(in Chapter 3).
2.5.4 Discovery Issues with 3Com Switches
IBM Tivoli Switch Analyzer only supports Community String Indexing (CSI) for Cisco switches,
which allows the discovery of multiple VLANs on Cisco devices. Multiple VLAN support exists for
non-Cisco devices where CSI or access to private MIB information is not required to access the
port forwarding table information within various VLANs.
Although many non-Cisco switches fall into this category, the 3Com SuperStack II 1100 and 3300
switches are not in this category since they require CSI. If multiple VLANs are configured on
these devices (with the operating system code used in our lab), IBM Tivoli Switch Analyzer will
not discover ports that are not in the default VLAN.
The switch information for these devices, which are not fully supported by IBM Tivoli Switch
Analyzer but have been used in Tivoli labs, is listed here:
There may be other 3Com switches that Tivoli does not support. The 1100 and 3300 are only
examples of switches tested within the Tivoli lab that have demonstrated discovery problems due
to their requirement of CSI, which is not supported for non-Cisco switches.
BayStack 380
HW:R01 FW:2.0.0.12 SW:v2.0.0.46
OID: 1.3.6.1.4.1.45.3.45.1
This list is not meant to be a complete list of the Nortel Networks switches that Tivoli does not support.
Upgrading to a newer release of the code, such as IOS code level 12.0.5.WC7, dated
10-MAR-2003, has proven to work successfully in the Tivoli test lab. This updated level of code does
support CSI and there have been no discovery issues when using it. Here is the information for
this particular switch when upgraded:
Some of the Extreme switches that have default configurations that prevent SNMP queries of the
Bridge MIB forwarding tables include:
The above list is not intended to be complete, and there may be other switches that fall into this
category. On the other hand, information is not available on all Extreme switches. Therefore,
there may in fact exist some Extreme switches that do not disable SNMP queries for the
forwarding tables by default and may, therefore, allow complete discovery out of the box.
2.6 How do I solve problems found in the Summary Report?
The switch devices in the Summary Report are sorted by discovery return code. This section will
cover these error categories and how to resolve each issue. The goal is to move as many switch
nodes as possible into the group described below in subsection 2.6.5, “Discovery has been
completed for the following nodes.”
Users can modify the SNMP discovery parameters used during this process by updating the
following fields in the l2_topo_adapter.ini file.
[Layer2]
retry_interval=900
retry_cnt=3
The retry_interval field is the time in seconds between successive SNMP query retries
against a switch; the default is 900 seconds (15 minutes). The retry_cnt field is the number
of retries performed for a switch; the default is 3. See the next subsection for
troubleshooting when the retry count has been exceeded.
2.6.2 Discovery has been completed for the following nodes, but one or more
errors occurred (# = retry count has been exceeded)
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been completed for the following nodes, but one or more errors occurred (# = retry count has
been exceeded).” See the following as an example:
--------------------------------------------------------------------
Discovery has been attempted for the following nodes, but
one or more errors occurred (# = retry count has been exceeded):
--------------------------------------------------------------------
172.10.10.100/172.10.10.100 [#]
Typically, errors in this category will be caused by SNMP access problems or missing MIB
information, which is shown by the “# = retry count has been exceeded” indicator in
the report file. The first step would be to simply walk the Bridge MIB to verify that it is available
for each switch in this category as follows:
This command should not time out, but if it does, the user should resolve as many of the SNMP
access problems as possible. Afterwards, the user can manually trigger rediscovery for each
switch that was found to have an SNMP problem. For information on rediscovery, please refer to
section 2.3, “How do I rediscover my switch after an update?”
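The Bridge MIB walk referred to above did not survive in this copy of the guide. A hedged
sketch in net-snmp syntax, where the host and community string are hypothetical placeholders
(the snmpwalk shipped with NetView takes slightly different arguments):

```
snmpwalk -v 1 -c public 172.10.10.100 .1.3.6.1.2.1.17
```

The OID .1.3.6.1.2.1.17 is the dot1dBridge subtree; a response containing forwarding-table
entries indicates the Bridge MIB is available on the switch.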
The nodes in this category of the Summary Report will be automatically re-polled until either the
errors are resolved or the retry count is exceeded. See section 2.6.1, “Discovery is in progress
for the following nodes” for instructions on how to update the SNMP discovery parameters used
during this process.
2.6.3 Discovery has been completed for the following nodes, but node was
unreachable via layer 2 segments
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been completed for the following nodes, but node was unreachable via layer 2 segments.”
See the following as an example:
--------------------------------------------------------------------
Discovery has been completed for the following nodes, but
node was unreachable via layer 2 segments:
--------------------------------------------------------------------
172.10.20.101/172.10.20.101
In this case, IBM Tivoli Switch Analyzer was unable to trace a path from a router to each of the
switches in this section. This is typically caused by missing entries in the Bridge MIB forwarding
tables on the switch listed in this section or on one of the nearby switches in the network
topology. Consider the following suggestions to help resolve problems in this category.
2.6.4 Discovery has been turned off for the following nodes
The Summary Report can group switch objects into many categories, one of which is, “Discovery
has been turned off for the following nodes.” See the following as an example:
--------------------------------------------------------------------
Discovery has been turned off for the following nodes:
--------------------------------------------------------------------
172.10.40.102/172.10.40.102
172.10.40.103/172.10.40.103
Switches that do not support the required MIBs, such as the dot1dBridge MIB or the interfaces
MIB, will appear in this section of the Summary Report. In this case, you should prevent
discovery of these devices as switches. For complete information on how to disable devices from
being discovered, please refer to section, “Selectively disabling discovery” of the IBM Tivoli
Switch Analyzer Administrator’s Guide (in Chapter 3).
For switches in this section of the Summary Report, there will always be a corresponding
message in the l2_topo_adapter.log like the following:
When switches appear in this section, it is always a matter of not finding a particular MIB for
these switches. The l2_topo_adapter.log should tell you which MIB OID is causing the
problem.
If errors similar to the one above appear in the l2_topo_adapter.log, it may be possible that
an access list is configured on the switch, preventing the necessary SNMP access via a particular
community string. On Cisco devices, for example, there may be an access list to allow only
certain devices access to the community string as follows:
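The IOS configuration excerpt was not reproduced in this copy of the guide. A hedged
reconstruction consistent with the surrounding description, where the permitted IP address is
a hypothetical placeholder:

```
access-list 1300 permit 9.27.133.20
snmp-server community public RO 1300
```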
In the example above, access list 1300 only allows devices with certain IP addresses access to
community string public. The user needs to make sure that the NetView server has SNMP
access for the community string that is configured for the switch.
Another problem occurs when switches are configured to disable SNMP queries to the Bridge
MIB. For example, some Extreme switches are configured to disable SNMP queries by default.
The user must enable SNMP access to the Bridge MIB in order for layer 2 discovery to work
properly. Try walking the Bridge MIB to verify that it is available for each switch in this section of
the Summary Report as follows:
This command should not time out. If it does, the user should resolve as many of the SNMP
access errors as possible.
--------------------------------------------------------------------
Discovery has been completed for the following nodes:
--------------------------------------------------------------------
172.10.70.101/172.10.70.101
172.10.70.105/172.10.70.105
In this case, no errors were found for the switches in this section. The more switches in this
category, the more accurate the layer 2 discovered topology will be, as displayed in the Discovery
Report and the layer 2 topology views in the Web Console. Note that there could still be missing
critical entries in the forwarding tables for these nodes, causing topology problems in other areas
of the network as shown in other sections of the Summary Report.
2.6.6 Discovery will not be done for the following nodes because they are
located within a remote campus
The Summary Report can group switch objects into many categories, one of which is, “Discovery
will not be done for the following nodes because they are located within a remote campus.” See
the following as an example:
----------------------------------------------------------------
Discovery will not be done for the following nodes because
they are located within a remote campus:
----------------------------------------------------------------
172.10.70.110/172.10.70.110
172.10.70.111/172.10.70.111
Switches in this section have not been discovered for topology connections nor have they been
configured for port status monitoring.
If the user desires to have connected topology information for the switches in this section, then
the user must verify that the managed network is fully connected back to the NetView server on
the IP Internet map. All routers in this path must be SNMP enabled and the server must be
configured with their community strings. Please be sure to check for the discovery of the default
router for the server itself. Often it is just a matter of providing a community string in the
communityNames.conf file or the SNMP Configuration dialog for the unmanaged router node
and then restarting netmon with a subsequent demand poll of the router.
The goal is to eliminate islands on the IP Internet submap. IBM Tivoli Switch Analyzer will only
manage devices for connectivity in the contiguous network. This is a network where the NetView
server manages all devices for which there is a layer 3 connection path on the IP Internet
submap. In addition, also be sure that there is SNMP access to each switch device between the
server and the switch devices listed in this section of the Summary Report. As noted with the
routers above, it is often just a matter of supplying the server with a community string for each
switch device.
If the IP Internet map is fully connected, the user should check the file
/usr/OV/ITSL2/log/l2_topo_adapter.log for errors against each of the switches in this
section of the Summary Report. Typically, campus situations are caused by SNMP access
problems or missing MIB information for devices between the server and the switches listed in
this section of the Summary Report. Resolve as many of the SNMP access errors as possible.
The switches listed in this section of the Summary Report will be automatically re-polled until
either the errors are resolved or until the retry count is exceeded. By default the number of retries
is 3. This is set by the [Layer2] property retry_cnt in configuration file
/usr/OV/ITSL2/conf/l2_topo_adapter.ini. Afterwards, the user can manually trigger a
rediscovery for each switch. Please refer to section 2.3, “How do I rediscover my switch after an
update?”
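As an illustration only, the retry count could be adjusted with a short script. The file path and the [Layer2] section name come from the text above, but the script itself is a hypothetical sketch, not a supported tool; always work on a backup of the file and restart the daemon as the product documentation describes.

```python
import configparser

def set_retry_count(ini_path, retries):
    """Set the [Layer2] retry_cnt property in l2_topo_adapter.ini.

    Hypothetical helper: the path and section name are taken from the
    guide's text; verify against your installation before use.
    """
    cfg = configparser.ConfigParser()
    cfg.read(ini_path)
    if not cfg.has_section("Layer2"):
        cfg.add_section("Layer2")
    cfg.set("Layer2", "retry_cnt", str(retries))
    with open(ini_path, "w") as f:
        cfg.write(f)

# Example (against a copy of /usr/OV/ITSL2/conf/l2_topo_adapter.ini):
# set_retry_count("/tmp/l2_topo_adapter.ini", 5)
```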
See sub-sections 2.6.9 and 2.6.10 on how to enable remote campus management to help further
eliminate campus situations.
2.6.7 The following (layer 2) nodes are located within a remote campus and are
being monitored for status only
The Summary Report can group switch objects into many categories, one of which is, “The
following (layer 2) nodes are located within a remote campus and are being monitored for status
only.” See the following as an example:
--------------------------------------------------------------------
The following (layer 2) nodes are located within a remote campus
and are being monitored for status only:
--------------------------------------------------------------------
172.10.70.110/172.10.70.110
172.10.70.111/172.10.70.111
Switches in this section have been identified as being in a remote campus and are discovered for
port status monitoring only. Connection information will not be available in the Discovery Report
or the Web Console layer 2 views. Depending on the outcome of the discovery, switches in this
section may appear in other sections of the Summary Report.
See sub-sections 2.6.9 and 2.6.10 on how to enable remote campus management to help
eliminate campus situations.
2.6.8 The following (layer 3) nodes are being monitored for status only
The Summary Report can group switch objects into many categories, one of which is, “The
following (layer 3) nodes are being monitored for status only.” See the following as an example:
--------------------------------------------------------------------
The following (layer 3) nodes are being monitored for status only:
--------------------------------------------------------------------
172.10.50.101/172.10.50.101
172.10.50.102/172.10.50.102
Switches in this section have been discovered for port status monitoring only, and connection
information will not be available in the Discovery Report or the Web Console layer 2 views.
Depending on the outcome of the discovery, switches in this section may appear in other sections
of the Summary Report.
Resolving the remote campus (island) issues will result in more areas of the network that can be
managed by IBM Tivoli Switch Analyzer for connectivity discovery. There are times a network
administrator will want the NetView server to manage remote networks or campuses. In this
case, IBM Tivoli Switch Analyzer will not have the required connected layer 2 topology, as
demonstrated in the following diagram.
[Diagram: a managed network with router RT A (10.10.10.1), an unmanaged network, and a
managed network with router RT B (30.30.30.1)]
When it is not possible to completely manage an end-to-end IP network, you should use the
custom links provided with NetView 7.1.4 Fix Pack 2 interim fix 143453. This fix was required in
order to install version IBM Tivoli Switch Analyzer 1.3 on NetView 7.1.4 Fix Pack 2. Future
NetView fix packs will have this fix included and, therefore, not require a separate install. Custom
Links is a NetView construct that provides a connection path for IBM Tivoli Switch Analyzer to use
for correlation between remote islands for the purposes of layer 2 discovery and root cause
management. This mechanism is available on all supported NetView platforms.
The IBM Tivoli Switch Analyzer 1.3 Release Notes provide instructions for constructing custom
links. Further instructions for installing and configuring this feature are included in this section.
[Diagram: a managed network with router RT A (10.10.10.1), an unmanaged network, and a
managed network with router RT B (30.30.30.1)]
In the example instructions below (and depicted above), the user is adding a custom link to the
NetView database between two managed routers, where these managed routers are in two
different managed networks. Note that these custom links can only be added between router
devices.
@link 10.10.10.1:0:30.30.30.1:0
3. Demand poll routers RT A and RT B. This is not necessary if you clear your NetView database
and rediscover.
4. You should see dot-dashed lines between the two routers in your NetView maps.
Afterwards, IBM Tivoli Switch Analyzer will be able to provide connection information (in the
Discovery Report and the Physical Views of the Web Console) for the managed switches in the
remote campus due to the artificial links created. Please note that it is sometimes necessary to
restart the itsl2 daemon in order to completely discover the remote campus devices after
configuring these artificial links.
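To illustrate the @link entry format from the sample above, a tiny helper can generate such lines. This is a hypothetical helper: the ip:index:ip:index layout is inferred from the single sample entry in this guide, not from a documented grammar, so verify it against your NetView release documentation before relying on it.

```python
def custom_link_entry(ip_a, ip_b, index_a=0, index_b=0):
    """Format a NetView custom-link line like the sample above.

    Hypothetical helper: the ip:index:ip:index layout is inferred
    from the one sample entry shown in this guide.
    """
    return f"@link {ip_a}:{index_a}:{ip_b}:{index_b}"

# Reproduces the sample entry:
# custom_link_entry("10.10.10.1", "30.30.30.1")
# -> "@link 10.10.10.1:0:30.30.30.1:0"
```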
2.6.10 IBM Tivoli Remote Campus Installation Service for Switch Analyzer (TWL)
Resolving the remote campus (island) issues will result in more areas of the network that can be
managed by IBM Tivoli Switch Analyzer for connectivity. There are times a network administrator
will want the NetView server to manage remote networks or campuses.
It is suggested that you use the custom links documented in the previous section to eliminate
undiscovered island networks. However, a custom link can only be inserted into NetView
between 2 managed routers. If the NetView server (IBM Tivoli Switch Analyzer server) is in a
part of the network that is itself not being managed by NetView, or when there are no routers in
the local part of the network, then there is no router in the local network that can be artificially
connected to a router in the remote network. This is shown in the diagram below. In this case,
IBM Tivoli Switch Analyzer cannot have the required connected layer 2 topology via custom links.
[Diagram: Managing Remote Campuses — the Switch Analyzer server in a local managed
network without routers, an unmanaged network, and a remote managed network whose edge
router has IP address 33.33.2.1]
When it is not possible to completely manage an end-to-end IP network, and when using custom
links is not an option, ask IBM Software Support about the IBM Tivoli Remote Campus Installation
Service for IBM Tivoli Switch Analyzer. This is a services utility that provides a connection path
from the Switch Analyzer server to a device in a remote campus, which allows for correlation
between remote islands for the purposes of layer 2 discovery and root cause management. At
this time, the services tool is not available for Windows platforms.
Afterwards, IBM Tivoli Switch Analyzer will be able to provide connection information (in the
Discovery Report and the Physical Views of the Web Console) for the managed switches in the
remote campus due to the artificial links created. Please note that it is sometimes necessary to
restart the itsl2 daemon in order to completely discover the remote campus devices after
configuring these artificial links.
Instructions are included with the tool. Additional tips on installing and configuring this service are
provided in this section. In the example instructions below, the user is adding a link to the layer 2
topology that connects the edge router in the remote managed campus network, router A, back to
the IBM Tivoli Switch Analyzer server in the locally managed network. This edge router has IP
address 33.33.2.1.
[Diagram: the Switch Analyzer server in a local managed network without routers, an unmanaged
network, and a remote managed network whose edge router has IP address 33.33.2.1]
The previous name for this product was Topology WAN LAN (TWL), and the file name has a
format like twlbuild-1.3.0.6.tar. Both of these references appear in the following
example.
1. Install Switch Analyzer -> leave itsl2 daemon running
2. Untar TWL file in temporary directory:
tar -xvf twlbuild-1.3.0.6.tar
3. cd twlinstall
4. Install TWL: ./twlinst.sh
-> System Check OK
-> Installation complete
5. cd /usr/OV/ITSL2/bin
6. Run TWL: ./twl.sh
7. Setup link: select option 1
-> Please enter REMOTE END device name
(match exactly as in topology)
ex: 33.33.2.1
-> Please enter a Description for entry
ex: edge router
-> sent event Interface 0.0.0.0 Added
-> Done.
-> Press Enter to Continue -> Enter
-> x (Exit)
The CDP cache is read in order to assist with layer 2 discovery when accurate information cannot
be obtained via other MIBs. The problem in this case is that the CDP cache on Switch A port 1
will show Switch C port 3 directly attached to it, and vice-versa. This will lead to incorrect
topology discoveries in the Discovery Report.
The problem can be resolved by making sure there is MIB access to all switches. This problem
can also be resolved by disabling CDP on the ports of Cisco switches or Cisco routers that are
directly connected to non-Cisco switches. In the example above, Cisco Switch A port 1 and
Cisco Switch C port 3 would need to have CDP disabled on a per port basis to prevent the
discovery problem from occurring.
default logic is applied to re-connect the affected node to the same port on the switch that the
router is also connected to (the default router for the affected node). IBM Tivoli Switch Analyzer
will assume this default layer 3 connection for topology and correlation purposes. This default
logic can also be applied when the MAC address of an interface is not known when initially added
to the layer 2 topology.
The Discovery Report will indicate nodes that have been connected via this default logic with an
asterisk (*) at the end of the node entry. These entries are difficult to remove and sometimes can
only be resolved by restarting the itsl2 daemon. In fact, scenarios in which a user should restart
the itsl2 daemon include:
- When there has been a large amount of new node discovery, such as managing networks that
were previously unmanaged, adding custom links or TWL links, or clearing the NetView server
databases and rediscovering. Any of these scenarios can result in nodes in the Discovery
Report with an asterisk following them, or in nodes missing from the report altogether.
- When the server does not know community string information for devices, which can also
result in nodes in the Discovery Report with an asterisk.
- When new router interfaces have been added to the database, and these interfaces show up
with an asterisk next to them in the Discovery Report.
The above scenarios indicate when the user should restart the itsl2 daemon. However, there are
steps the user must take before restarting the daemon; otherwise, restarting it will not result in an
improved Discovery Report with fewer asterisk (*) default-location indicators next to nodes.
These steps are listed here:
1. The user should make sure that the server has SNMP access to all switch devices that
the user wants to manage. Ideally, all switches within the IP managed network should be
managed by the server. It is often just a matter of supplying the server with the
community string of each switch device. The device located by default on a particular
switch (indicated by an asterisk) could actually be connected to another switch in the real
network. That other switch must be discovered as well.
2. The user should make sure that the NetView server knows the SNMP community strings
for each device with an asterisk in the Discovery Report, whether it is a router, switch, or
end node. If some community strings are missing, add them and demand poll each
appropriate device via the server console GUI. IBM Tivoli Switch Analyzer needs the
MAC address of the end nodes for accurate placement of the end nodes in the topology.
However, the NetView server can learn the MAC addresses for end nodes even if SNMP
queries fail for some reason, so it is not absolutely required that end nodes (non-routers
and non-switches) support SNMP.
3. If new router interfaces have been added to the network and they appear in the
Discovery Report with an asterisk next to them, demand poll the router through the server
console GUI.
Once the user has resolved as many of these items as possible, the itsl2 daemon should be
stopped and restarted.
[Diagram: Trunk Discovery — switch CS3550B (33.33.1.11) connected through trunk port Fa0/15
to interface Fa1/0 of router R3620C, whose subinterfaces use 33.33.1.5, 33.33.5.5, and 33.33.7.5]
Let’s look at an example to help illustrate the problem further. As seen in the diagram above,
switch CS3550B and router R3620C form a router-on-a-stick. The router is using interface Fa1/0
for the stick, connected to port Fa0/15 of the switch. The router is also using subinterfaces
Fa1/0.1, Fa1/0.5, and Fa1/0.7 with IP addresses 33.33.1.5, 33.33.5.5, and 33.33.7.5 respectively.
See the following output from a sample Discovery Report:
cs3550b.tivlab.itsa.com/33.33.1.11
[FastEthernet0/1/1/0.0.0.0] ===>
[cs1924a.tivlab.itsa.com/33.33.1.16 - A /26/0.0.0.0]
[FastEthernet0/3/3/0.0.0.0] ===>
[cs1912b.tivlab.itsa.com/33.33.1.18 - A /26/0.0.0.0]
[FastEthernet0/15/15/0.0.0.0] ===>
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.1/9/33.33.1.5]
From the output above, port 15 of CS3550B shows only a connection to Fa1/0.1 of router
R3620C, which happens to be the same subnet that the switch IP address is in for the
management interface of the switch (33.33.1.0/24). To help further illustrate the problem in the
Discovery Report, see the sample hand-generated output below that shows the missing
subinterfaces in the report for router R3620C.
[FastEthernet0/15/15/0.0.0.0] ===>
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.1/9/33.33.1.5]
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.5/9/33.33.5.5]
[r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.7/9/33.33.7.5]
The above output shows the desired discovery for this scenario. The current release of IBM
Tivoli Switch Analyzer does not provide this level of detail in the Discovery Report. However, not
displaying the router subinterfaces of a router trunk link does not interfere with the ability of the
root cause engine to correlate properly.
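For ad hoc analysis, the connection entries in the Discovery Report can be pulled out with a rough parser. The pattern below is inferred from the sample lines in this section, not from a documented report grammar, so treat it as an illustration only.

```python
import re

# Pattern inferred from sample report lines such as:
#   [FastEthernet0/15/15/0.0.0.0] ===>
#   [r3620c.tivlab.itsa.com/33.33.1.5 - FastEthernet1/0.1/9/33.33.1.5]
CONNECTION = re.compile(r"\[([^\]]+)\]\s*===>\s*\[([^\]]+)\]")

def parse_connections(report_text):
    """Return (local port descriptor, remote descriptor) pairs
    from Discovery Report text."""
    return CONNECTION.findall(report_text)
```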
3 Root Cause / Impact Analysis Troubleshooting
The IBM Tivoli Switch Analyzer will generate an event for any outage in the managed, connected
network region detected by the NetView server. If there is a topology inaccuracy in that area, it is
likely to manifest in one of two ways.
1. Root cause events will show up for a device that is downstream (from the point of view of
the server) from the real problem.
2. There may be multiple root cause events, usually identifying some unreachable end
nodes.
To minimize the possibility of the two scenarios above occurring, review and solve the issues
mentioned in chapter 2, “Discovery Troubleshooting.”
IBM Tivoli Switch Analyzer will also generate events for switches in a remote campus when the
Port Status Monitor is active. Please refer to subsection “Port status monitoring” in Chapter 4 of
the IBM Tivoli Switch Analyzer Administrator’s Guide for introductory information. For
configuration information, see “Configuring the port status monitor” in Chapter 7 instead.
Correlator.ini file
[Correlation]
interface_bounce_count=2
interface_bounce_interval=900
In this example, IBM Tivoli Switch Analyzer issued the earlier two root cause events as vendor
interface downs on the switch port and not as vendor node downs on the end node.
Nevertheless, the NetView server issued two node down events for the corresponding end node.
However, the bouncing algorithm tracks only NetView events, not IBM Tivoli Switch Analyzer
events, so the user will not see bouncing events for the interfaces of a switch. As a result, the
bounce indicator [B] is applied to the device attached to the switch port and not to the switch
port itself.
Once a device is considered down via the bouncing algorithm, it will not be cleared, or declared to
have a status of up, until an up event is received for the device and the time indicated by
interface_bounce_interval has expired without any additional down events.
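The bouncing rules just described can be sketched as a small state tracker. This is an illustration of the behavior described in the text, not the product's actual implementation; the two thresholds correspond to interface_bounce_count and interface_bounce_interval.

```python
class BounceTracker:
    """Illustrative model of the described bouncing rules: a device is
    flagged after `count` down events within `interval` seconds, and is
    cleared only once an up event has arrived and `interval` seconds
    pass with no further down events."""

    def __init__(self, count=2, interval=900):
        self.count = count          # interface_bounce_count
        self.interval = interval    # interface_bounce_interval (seconds)
        self.downs = []             # timestamps of recent down events
        self.last_up = None
        self.bouncing = False

    def down(self, t):
        self.downs = [d for d in self.downs if t - d < self.interval]
        self.downs.append(t)
        self.last_up = None         # a new down restarts the clear condition
        if len(self.downs) >= self.count:
            self.bouncing = True

    def up(self, t):
        self.last_up = t

    def is_bouncing(self, now):
        # Cleared only after an up event plus a quiet interval.
        if (self.bouncing and self.last_up is not None
                and now - self.last_up >= self.interval):
            self.bouncing = False
            self.downs = []
        return self.bouncing
```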
During the discovery process, it may not be possible to determine all of the layer 2 connections
between switches and routers. As a result, there may not always be a discovered path to every
switch in the network. In order to connect the switch to the currently discovered topology when
the path is not known, the management interface of the switch is assigned to a default location in
the layer 2 topology. For additional information, refer to section 2.7.2, “An asterisk (*) shows up
after a node”.
The [D] in the event is used to denote that this correlated event has occurred on a switch that
was, in fact, placed in a default location according to its management interface during the
discovery process. It is important to note that these particular root cause events, indicated with
[D], may or may not be the true root cause of a network outage due to the default placement of
this switch in the internal database.
The switch referenced in an event such as this will have a corresponding entry in the Discovery
Report with a [~] indicator. For example, the Discovery Report entry for switch SS3300B that
was placed in the layer 2 topology via default logic would look as follows:
ss3300b.tivlab.itsa.com/33.50.1.11 [~]
The user can look in the l2_topo_adapter.log for entries around the same date and time
when the events were seen in the NetView event browser. If at the same time there are
discovery requests for each device in the network, then the recurring discovery poll most likely
took place. Therefore, the series of events seen in the event browser is expected behavior.
The user can modify the setting for the discovery interval value via the discovery_interval
field (measured in minutes) in file /usr/OV/ITSL2/conf/l2_topo_adapter.ini.
The option -o indicates the object ID of the device. Sample output from this command is shown
below:
--------------------------------------------------
Impact Analysis Report
--------------------------------------------------
cs1912g.tivlab.itsa.com
cs1912h.tivlab.itsa.com
cs1924d.tivlab.itsa.com
saen26.tivlab.itsa.com
--------------------------------------------------
There are times when the Impact Analysis Report does not show any impacted nodes
downstream. In this case, layer 2 network redundancy combined with the Cisco Discovery
Protocol (CDP) plays a role in this unusual, but expected, behavior. See the following diagram.
[Diagram: Layer 2 Redundant Paths and Impact Analysis — core switches A and B fully meshed
with access switches E, C, and D; the Switch Analyzer server is attached to access switch E]
Let’s say that IBM Tivoli Switch Analyzer is located on an access switch in the above fully
connected network mesh. If the network consists of Cisco switches, then only the access
switches C, D, and E will have a populated Impact Analysis Report for the end nodes hanging off
of them. However, the report for switch C will not show switches A, B, or D as impacted due to
the layer 2 redundancy. That’s because IBM Tivoli Switch Analyzer discovers the devices
attached to the blocked ports using the CDP cache in Cisco devices, causing a redundant path in
the layer 2 topology database. And when there is redundancy, there is no impact.
In addition, an outage of switch A or B will have no impact at all on other devices, for the same
redundancy reason. Therefore, the Impact Analysis Reports for these 2 switches will be
completely blank. In contrast, the Impact Analysis Report for
switch E will have all downstream devices in it, including switches A, B, C, and D, as well as end
nodes hanging off of these switches.
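The "when there is redundancy, there is no impact" rule can be illustrated with a small reachability sketch. The node names follow the diagram above, but the mesh edges and the function itself are an invented approximation, not the product's correlation engine: a node counts as impacted only if no path to it remains from the server once the failed link is removed.

```python
from collections import deque

def impacted(links, server, failed_link):
    """Nodes unreachable from `server` after one link fails.

    Illustrative sketch: in a redundant mesh, removing a single
    link leaves everything reachable, so the impact set is empty.
    """
    live = {frozenset(l) for l in links} - {frozenset(failed_link)}
    adjacency = {}
    for link in live:
        a, b = tuple(link)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {server}, deque([server])
    while queue:
        for node in adjacency.get(queue.popleft(), ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return {n for link in links for n in link} - seen

# Approximation of the diagram: core A and B, access C, D, E (server at E).
mesh = [("E", "A"), ("E", "B"), ("A", "B"),
        ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]
```

With the server attached at E, impacted(mesh, "E", ("A", "C")) returns an empty set because C is still reachable through B, whereas the same failure in a non-redundant chain of links would report C as impacted.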
When IBM Tivoli Switch Analyzer detects the interface down event, it begins processing and
looking for the root cause of the problem. The default timer value to return a root cause is five
minutes. This field is configurable in the correlator.ini file, in section Correlation, with
field interface_timeout, which defaults to 300 seconds. Lower this value when it is
desired to have the root cause returned sooner. When this value is changed, the user must
restart the itsl2 daemon to activate the changes.
There may be a negative impact when making this value too small, in that the root cause event
may be issued before the actual root cause can be fully determined. For instance, reducing this
value down to ten seconds would not be wise.
3.6 Why is the root cause sometimes the switch port and
sometimes the end node?
IBM Tivoli Switch Analyzer begins its root cause analysis when the NetView server issues an
interface down event. In the link down example shown in the diagram below, the link between the
switch and the end node went down. Therefore, the end node is down from the server’s point of
view. In one case, a root cause event will be issued for the switch port (interface down), causing
the switch to become marginal and turn yellow in the map. In the other case, the root cause
event will be issued for the end node, with no change to the switch icon color in the server map.
[Diagram: the Switch Analyzer server connected through an access switch to an end node]
In this scenario, the link has gone down between the access switch and the end node. And, as
mentioned above, the root cause event could be issued for the switch port or for the end node
because of a timing issue due to a race condition. After the link goes down, the switch device
may take several seconds, and even up to twenty seconds, to resynchronize itself. Afterwards,
the switch port is updated in the switch’s MIB tables to reflect the down condition.
If IBM Tivoli Switch Analyzer queries the switch within the first few seconds of this
resynchronization, it will not see an error for the switch port, and may conclude that the end node
is at fault. If the switch is queried after resynchronization, the switch port will be detected as
down and an interface down root cause event is issued for the switch port. Either of these results
is reasonable, given that the real network failure lies in the link between the two devices.
There are two other scenarios worth mentioning when it comes to the location of the root cause.
Please refer to the following two diagrams that illustrate these scenarios.
[Diagram: the Switch Analyzer server connected through an access switch and a hub to an end
node]
In the diagram above, a hub was inserted between the access switch and the end node. Note
that hub devices are not managed devices. In this case, the link went down between the hub and
the end node, causing a NetView node down event to be issued for the end node. In contrast to
the race condition mentioned earlier, the MIB information for the switch port will not be updated at
all by the switch device since its connection is still active with the hub. Therefore, the only root
cause event that can be generated is the one for the end node in the form of a node down. This
is also true if the hub is replaced with an unmanaged switch, that is, a switch where the
community string is not configured within the NetView server.
Next let’s look at a link that goes down between the access switch and the hub. In this case, the
user will experience the same race condition mentioned above. As stated earlier, if the switch is
queried within the first few seconds of resynchronization, the conclusion will be that the end node
is at fault since the switch MIB tables are not updated yet. If the switch is queried after
resynchronization is complete, the switch port will be detected as down and an interface down
root cause event is issued for the switch port.
[Diagram: the Switch Analyzer server connected through an access switch and a hub to an end
node]
4 Web Console Troubleshooting
For configuration information regarding the IBM Tivoli Switch Analyzer Web servlet, please refer
to “Configuring the Web servlet” in Chapter 7 of the IBM Tivoli Switch Analyzer Administrator’s
Guide. Please also see “Web servlet and console debugging” for information on how to further
debug possible Web Console issues.
This version of the JRE is not supported by the Web Console when using IBM Tivoli Switch
Analyzer version 1.3. If you have a version other than 1.3.1 of the JRE, please install version
1.3.1 instead.
File: /usr/OV/conf/NetViewLargeDb.conf
Field: NETVIEW_LARGE_DB=FALSE
You will need to set this value to TRUE if you have a large database. Please consult IBM
Software Support regarding when this field should be set to TRUE.
Note: This is not an IBM Tivoli Switch Analyzer specific issue. However, our customers have run
into this problem, so we are documenting it here.
4.1.4 Cannot launch submap explorer from Tivoli Event Console
There are a few ways to launch the NetView Web Console from an event with Tivoli Event
Console. One way is to highlight a NetView generated event and launch the Web Console
submap explorer. However, there is a known problem where only the Web Console comes up
without the submap explorer. The problem can be resolved as follows.
/usr/OV/www/webapps/netview/warf/Templates/WebConsole
You should find a file called ITSL2Menu.xml, where this XML file defines all the menu entries. If
this file does not exist, then there was a problem encountered during the install, and you should
check the install logs (/usr/OV/tmp/itsl2 directory). If the file ITSL2Menu.xml does
exist, you should save your Web Console security settings again (even if you have made no
changes) and restart the webserver daemon. Then launch the Web Console, and the Layer 2
menu should now exist.
To fix this problem, edit the nvwc.bat (nvwc.sh) file and change each of the -Xmx64m options
to be -Xmx128m or more. Make sure to change all the occurrences of this option (there should
be two occurrences). Then re-launch the Web Console.
IBM Tivoli Switch Analyzer is designed to display attached devices to the switch ports in Physical
Views. In this example, a Physical View was selected for a switch device, and the devices
attached to the switch are depicted in the diagram. The results of this view are generated from
the layer 2 discovery information.
When the Physical View does not show the devices attached to the switch ports, the switch
device is most likely in a remote campus. If you believe that the switch chosen for the Physical
View should not be in a remote campus, please check the Summary Report and make sure
Switch Analyzer has not placed your switch device in a remote campus.
Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.
2. Look for the following block of lines and modify the number of seconds:
<init-param>
<param-name>ClientRequestTimeout</param-name>
<param-value>120</param-value>
</init-param>
4. Restart the webserver daemon.
For additional information about the Web servlet, refer to the “Configuring the Web servlet”
section of the IBM Tivoli Switch Analyzer Administrator’s Guide (in Chapter 7).
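The same init-param block can be updated programmatically. The sketch below uses ElementTree on the fragment shown in the steps above; the enclosing servlet descriptor and any XML namespaces in the real file are not handled here, so treat this as an illustration.

```python
import xml.etree.ElementTree as ET

def set_client_request_timeout(xml_text, seconds):
    """Set <param-value> for the ClientRequestTimeout <init-param>.

    Operates on an XML fragment like the one shown above; a real
    web.xml may need namespace-aware handling.
    """
    root = ET.fromstring(xml_text)
    for param in root.iter("init-param"):
        name = param.find("param-name")
        if name is not None and name.text == "ClientRequestTimeout":
            param.find("param-value").text = str(seconds)
    return ET.tostring(root, encoding="unicode")
```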
EN 01 RT 01 RT 02 RT 03 EN 02
There are no switches in this output, although you were expecting to see a few. When this
happens, your end nodes chosen for the Point-to-Point View are most likely attached to switches
that are in a remote campus as follows:
[Diagram: the Switch Analyzer server separated by an unmanaged network from a remote
campus containing end nodes EN 01 and EN 02]
This is the expected behavior when the end nodes are in the remote campus portion of the
managed network. If you feel that your end nodes chosen for the Point-to-Point View, along with
your switch devices, should not be in a remote campus, please check the Summary Report and
make sure your switch devices are not placed in a remote campus by Switch Analyzer.
Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.
4.6.2 Router-on-a-Stick
As noted in section 2.7.3, “Router-on-a-stick subinterfaces are not displayed”, the switch contains
the layer 2 path information, and the router introduces a layer 3 redundant path when a router-on-
a-stick topology is in use. The end result, as noted earlier, is that not all corresponding router
subinterfaces are depicted in the Discovery Report.
This lack of discovered information causes problems for the Point-to-Point View. For example, a
user may select two end nodes for a Point-to-Point View and get the following results:
EN 01 RT 02 EN 02
There are no switches in this output, although you were expecting to see a few. When this
happens, your end nodes chosen for the Point-to-Point View may be attached to switches in a
router-on-a-stick network topology as follows:
[Diagram: router RT 02 in a router-on-a-stick topology, with EN 01 in subnet #1 and EN 02 in
subnet #2]
In this topology, the two access switches and end nodes use router RT 02 as their gateway, via a
router-on-a-stick. Point-to-Point Views do not work well when the two end nodes selected for this
view pass through a router via a router-on-a-stick implementation as shown in the diagram above.
As a result, the switch objects downstream of the router-on-a-stick implementation do not appear
in the Point-to-Point View.
EN 01 SW 01 RT 01 SW 02 RT 02 EN 02
X
There are a few reasons why this X may appear in your output.
Point-to-Point Views do not work over artificially created links. These links can be created with
IBM Tivoli Remote Campus Installation Service for IBM Tivoli Switch Analyzer (Topology WAN
LAN, or TWL) links or NetView custom links. Refer to sections 2.6.9 and 2.6.10 for information
on these links.
Customers are encouraged to create these artificial links in order to improve layer 2 discovery.
However, if the 2 selected points in the Point-to-Point View span one of these artificial links, then
the Point-to-Point diagram will show an X object at the point where the artificial link joins the
topology, with the second point connected to the X object in the view.
If you are, in fact, not using artificial links and feel that neither of your end nodes chosen for the
Point-to-Point View should be in a remote campus, please check the Summary Report and make
sure your switch devices along the path are not placed in a remote campus by IBM Tivoli Switch
Analyzer.
Please see sections 2.6.9 and 2.6.10 to minimize remote campuses in your network.
5 Port Status Monitor Troubleshooting
In addition to finding the root cause events in the connected topology, IBM Tivoli Switch Analyzer
version 1.3 will generate events for switches in a remote campus. This is accomplished via the
Port Status Monitor function, which is on by default. Please refer to subsection “Port status
monitoring” in Chapter 4 of the IBM Tivoli Switch Analyzer Administrator’s Guide for introductory
information. For configuration information, see “Configuring the port status monitor” in Chapter 7
instead.
[Diagram: core switches A and B meshed with access switches C and D; the Switch Analyzer
server is attached to an access switch]
Let’s say that IBM Tivoli Switch Analyzer version 1.2.1 is located on an access switch in the
above fully connected mesh network for redundancy. In this example, the red dashed links are
blocking. The blocking connections are between Switch A and Switch B, between Switch B and
Switch D, and between Switch A and Switch D.
When the forwarding link between Switch C and Switch D goes down, the link between Switch A
and Switch D becomes active, for example. An interface down event is not issued by NetView
since all nodes are accessible from the server during the NetView status poll. The server would
have never known that a problem occurred.
[Diagram: Layer 2 Redundant Paths with Link Down — the same mesh, with the forwarding link
between switch C and switch D down]
In this example, the customer would not be informed immediately about the outage between
Switch C and Switch D. All paths between the server and a managed device would need to be down
before a root cause event was generated.
However, with the Port Status Monitor (PSM) function introduced in version 1.3, the outage
between Switch C and Switch D is found during the PSM status poll. So instead of receiving no
events for the outage, as in previous releases, PSM returns two events for this single outage.
The reason is that PSM performs an SNMP status poll of both Switch C and Switch D, since it has
IP access to both switches. Each switch indicates that a port has gone down, resulting in two (V)
events in the NetView event browser.
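The duplicated events can be sketched as follows. This is a minimal Python sketch, not IBM Tivoli
Switch Analyzer code; the switch names and poll results are hypothetical and stand in for SNMP
interface-status responses from each side of the failed link.

```python
# Minimal sketch of why one link outage produces two PSM events:
# PSM has IP access to both switches, so it polls each side of the
# failed link independently, and each side reports its own port down.

# Hypothetical poll results: operational status per port (1 = up, 2 = down),
# standing in for SNMP IF-MIB ifOperStatus responses.
poll_results = {
    "switchC": {23: 1, 24: 2},  # port 24 faces the failed link
    "switchD": {1: 2, 2: 1},    # port 1 faces the failed link
}

def psm_events(results):
    """Emit one event per switch port whose operational status is down."""
    events = []
    for switch, ports in results.items():
        for port, status in ports.items():
            if status == 2:  # ifOperStatus down
                events.append(f"{switch}: port {port} down")
    return events

events = psm_events(poll_results)
# One outage, two events -- one from each side of the link.
print(events)
```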
When viewing the Status Report or the Layer 2 topology views, some users may see what appears to
be unexpected port status behavior; these cases are documented in this section.
The Status Report for the switches in the above diagram will contain the following information.
Notice that all of the ports downstream of switch 2950k port 23 have the status Interface
Up/Correlated Down (impact), which is the expected behavior. IBM Tivoli Switch Analyzer
marks all downstream ports as down as part of its impact analysis algorithm: the code regards
ports downstream of an outage as also being down (correlated down with impact), since it cannot
manage those ports again until the original root cause problem is resolved.
The Status Report for the above switches will contain the following information.
2950k.itec.lab.com/172.30.120.10/Reachable/Correlated Up
[17/FastEthernet0/17/Interface Up/Correlated Up]
[18/FastEthernet0/18/Interface Up/Correlated Up]
[23/FastEthernet0/23/Interface Up/Correlated Up]
[24/FastEthernet0/24/Interface Up/Correlated Up]
2950j.itec.lab.com/172.30.120.11/Reachable/Correlated Up
[2/FastEthernet0/2/Interface Up/Correlated Unmanaged]
[4/FastEthernet0/4/Interface Up/Correlated Unmanaged]
[5/FastEthernet0/5/Interface Up/Correlated Unmanaged]
[6/FastEthernet0/6/Interface Up/Correlated Unmanaged]
1924e.itec.lab.com/172.30.120.18/Reachable/Correlated Up
[17/17/Interface Up/Correlated Unmanaged]
[18/18/Interface Up/Correlated Unmanaged]
[19/19/Interface Up/Correlated Unmanaged]
Notice that all of the ports downstream of switch 2950j, including the ports of switch 2950j
itself, have the status Interface Up/Correlated Unmanaged, which is the expected behavior.
IBM Tivoli Switch Analyzer has its own concept of unmanaged, driven by its impact analysis
algorithm: the code regards ports downstream of an unmanaged port or unmanaged switch as also
being unmanaged (correlated unmanaged), since it cannot manage those ports again until the
original root cause problem is resolved.
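The port lines in the Status Report excerpts above can be split into their fields. The sketch
below is a hypothetical Python parser, not part of the product; the field layout (port index,
interface name, interface status, correlated status) is inferred from the excerpts, and since the
interface name itself may contain "/" (for example FastEthernet0/2), the line is split from both
ends rather than naively on every "/".

```python
# Parse a Status Report port line such as
#   [2/FastEthernet0/2/Interface Up/Correlated Unmanaged]
# into its four fields. Field layout is inferred from the report excerpts.

def parse_port_line(line):
    inner = line.strip().strip("[]")
    port, rest = inner.split("/", 1)        # leading port index
    rest, correlated = rest.rsplit("/", 1)  # trailing correlated status
    name, status = rest.rsplit("/", 1)      # interface name may contain "/"
    return {"port": int(port), "name": name,
            "status": status, "correlated": correlated}

line = "[2/FastEthernet0/2/Interface Up/Correlated Unmanaged]"
print(parse_port_line(line))
```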
File: /usr/OV/ITSL2/conf/l2_event_receiver.ini
[Adapter]
req_cnt=1000
The default configuration for the Port Status Monitor (PSM) is to monitor all ports of all
discovered switch devices every 5 minutes. This is quite intensive and should be scaled back for
large networks. Below are two suggested parameter changes to minimize the CPU utilization of the
l2_event_adapter, followed by a brief description of each change.
File: /usr/OV/ITSL2/conf/l2_event_adapter.ini
From:
[Polling]
poll_cycle=300
To:
[Polling]
poll_cycle=1800
In this case, the PSM polling cycle was changed from 5 minutes (300 seconds) to 30 minutes.
You can set this parameter to whatever value is best for your network. The reason for making
this change is that PSM polling is SNMP based, which can be quite demanding for some servers,
especially when thousands of switch devices are being polled.
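To see why the cycle length matters, here is a rough back-of-the-envelope estimate. The switch
and port counts are hypothetical, and one SNMP request per port is a simplifying assumption.

```python
# Rough estimate of PSM SNMP load; the switch and port counts are
# hypothetical, and one request per port is a simplifying assumption.
switches = 2000
ports_per_switch = 48
requests_per_cycle = switches * ports_per_switch  # 96,000 requests

def requests_per_second(poll_cycle_seconds):
    return requests_per_cycle / poll_cycle_seconds

print(requests_per_second(300))   # default 5-minute cycle
print(requests_per_second(1800))  # 30-minute cycle
```

Lengthening poll_cycle from 300 to 1800 seconds spreads the same number of requests over six
times the interval, cutting the sustained request rate to one sixth.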
Next, reduce the set of ports that PSM examines, as follows:
File: /usr/OV/ITSL2/conf/correlator.ini
From:
[PortStatus]
poll_all_ports=y
To:
[PortStatus]
poll_all_ports=n
With this configuration, the PSM function status polls only switch ports that have no discovered
connections to them; these are typically ports with end devices attached, as determined by IBM
Tivoli Switch Analyzer. The Layer 2 Discovery Report, as well as the Physical View of a switch in
the Web Console, indicates which switch ports have connections. Those connected ports are the
ports not actually monitored by PSM when the poll_all_ports value is set to n. For these
connected ports not monitored by PSM, the NetView Interface Down event will trigger the
correlation engine as before, resulting in a root cause event being generated for these
particular ports.
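This division of labor can be sketched as follows. This is a simplified model, not the product's
internal logic, and the port and connection data are hypothetical: connected ports are left to
NetView interface events and the correlation engine, while PSM status polls only the unconnected
edge ports.

```python
# Simplified model of port selection when poll_all_ports=n:
# ports with a discovered connection are left to NetView interface
# events; PSM status polls only the unconnected (edge) ports.
# The port and connection data here are hypothetical.
ports = {
    23: {"connected": True},   # uplink to another switch
    24: {"connected": True},
    2:  {"connected": False},  # end device attached
    4:  {"connected": False},
}

def psm_polled_ports(ports, poll_all_ports=False):
    if poll_all_ports:
        return sorted(ports)
    return sorted(p for p, info in ports.items() if not info["connected"])

print(psm_polled_ports(ports))                       # edge ports only
print(psm_polled_ports(ports, poll_all_ports=True))  # every port
```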
In general, when poll_all_ports is set to n, PSM will only manage (poll) switch ports that do
not have connections to them, which include the following ports: