• Available on
ftp://ftp-eng.cisco.com/pfs/seminars/NANOG29-BGP-Troubleshooting.pdf
• Fundamentals of Troubleshooting
• Local Configuration Problems
• Internet Reachability Problems
Human error
Typos, using wrong commands, accidents, poorly
planned maintenance activities
NANOG29 © 2003, Cisco Systems, Inc. All rights reserved. 5
Fundamentals:
Problem Areas
Interoperability issues
Differences in interpretation of RFC1771 and its
developments
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues
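[Figure: R1 (1.1.1.1) and R2 (2.2.2.2) peer via iBGP inside AS 1; R2 peers via eBGP with R3 (3.3.3.3) in AS 2. Question marks flag the sessions being checked.]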
• Is the Local AS configured correctly?
• Is the remote-as assigned correctly?
• Verify with your diagram or other
documentation!
Peer Establishment:
iBGP Problems
• Common problem:
iBGP is run between loopback interfaces on the routers (for
stability), but the source-address configuration is missing from
the router ⇒ iBGP fails to establish
Remember that the source address is the IP address of the
outgoing interface unless otherwise specified
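The source-address problem above can be illustrated with a minimal Python sketch. It models only one thing: a BGP speaker accepts a session when the TCP source address matches a configured neighbor address. The function name and the addresses (loopback 1.1.1.1, link address 10.0.0.1) are assumptions made for illustration, not any vendor's implementation.

```python
def session_accepted(configured_neighbors, tcp_source_address):
    """Return True if the incoming connection comes from a configured neighbor.

    Simplified model: the session is only accepted when the TCP source
    address matches one of the configured neighbor addresses.
    """
    return tcp_source_address in configured_neighbors


# Hypothetical addressing: R2 is configured to peer with R1's loopback.
r2_neighbors = {"1.1.1.1"}

# R1 sources the session from its outgoing interface (assumed 10.0.0.1)
# because no source address was specified, so R2 rejects it.
print(session_accepted(r2_neighbors, "10.0.0.1"))  # False

# R1 sources the session from its loopback, so R2 accepts it.
print(session_accepted(r2_neighbors, "1.1.1.1"))  # True
```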
[Figure: R1 in AS 1 and R2 in AS 2 peer via eBGP across a Layer 2 network.]
• Diagnosis
Keepalives get lost because they get stuck in the router’s
queue behind BGP update packets.
BGP update packets are packed to the size of the MTU –
keepalives and BGP OPEN packets are not packed to the size
of the MTU ⇒ Path MTU problems
Use ping with different packet sizes to confirm the above –
if a 100-byte ping succeeds and a 1500-byte ping fails, there is
an MTU problem somewhere on the path
• Solution
Pass the problem to the L2 folks – but be helpful, try and
pinpoint using ping where the problem might be in the
network
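The ping-based pinpointing described above amounts to a search over packet sizes. The sketch below shows one way to narrow down the largest size that gets through. The probe callable is a hypothetical stand-in for a real ping with fragmentation disallowed, and the 1400-byte limit in the example is invented.

```python
def largest_passing_size(probe, low=100, high=1500):
    """Binary-search the largest packet size for which probe(size) succeeds.

    probe is a caller-supplied callable (hypothetical here) that sends one
    ping of the given size with fragmentation disallowed and returns True
    on a reply. Assumes that if a size passes, every smaller size passes.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed; not an MTU problem")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid       # mid fits; look for something larger
        else:
            high = mid - 1  # mid is too big; look lower
    return low


# Stand-in for a real ping: pretend a Layer 2 hop silently drops
# anything larger than 1400 bytes.
def fake_probe(size):
    return size <= 1400


print(largest_passing_size(fake_probe))  # 1400
```

Running the same search towards successive hops along the path is one way to hand the L2 team a concrete location rather than just a symptom.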
[Figure: R1 in AS 1 and R2 in AS 2 peer via eBGP across a Layer 2 network; the paths taken by small packets and by large packets are marked.]
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems
• Common issues
Clashing router IDs
Clashing cluster IDs
• Two RR clusters
• R1 is a RR for R3
• R2 is a RR for R4
• R4 is advertising 7.0.0.0/8
[Figure: route reflectors R1 and R2, with clients R3 and R4 below them.]
• Solution
Do NOT set the router ID by hand unless you have a very
good reason to do so and a very good plan for
deployment
The router ID is usually calculated automatically by the router
• One RR cluster
• R1 and R2 are RRs
• R3 and R4 are RRCs
• R4 is advertising 7.0.0.0/8
R2 has it
R1 and R3 do not
[Figure: route reflectors R1 and R2, with clients R3 and R4 below them.]
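One way clashing IDs can produce this kind of missing route is through the route-reflection loop checks described in RFC 4456: a router discards a reflected route whose ORIGINATOR_ID equals its own router ID, and a route reflector discards one whose CLUSTER_LIST already contains its own cluster ID. The sketch below models just those two checks. The function name and every ID value are illustrative assumptions, not taken from the slides.

```python
def accepts_reflected_route(local_router_id, local_cluster_id,
                            originator_id, cluster_list):
    """Apply the two route-reflection loop checks to a received iBGP route.

    Returns a tuple (accepted, reason).
    """
    if originator_id == local_router_id:
        return False, "ORIGINATOR_ID matches local router ID"
    if local_cluster_id in cluster_list:
        return False, "CLUSTER_LIST contains local cluster ID"
    return True, "accepted"


# Clashing router IDs (hypothetical values): R1 carries the same router ID
# as R4, so a route originated by R4 looks to R1 like its own.
print(accepts_reflected_route("4.4.4.4", "1.1.1.1", "4.4.4.4", ["2.2.2.2"]))

# Clashing cluster IDs: R1 and R2 share cluster ID 10.10.10.10, so a route
# already reflected by R2 is dropped when it reaches R1.
print(accepts_reflected_route("1.1.1.1", "10.10.10.10", "4.4.4.4",
                              ["10.10.10.10"]))

# Distinct IDs everywhere: the route is accepted.
print(accepts_reflected_route("1.1.1.1", "1.1.1.1", "4.4.4.4", ["2.2.2.2"]))
```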
• Type of filters
Prefix filters
AS_PATH filters
Community filters
Policy/Attribute manipulation
• Applied incoming and/or outgoing
• Communities
Each implementation has different defaults for when
communities are sent
Some don’t send communities by default
Others do for iBGP and not for eBGP by default
Others do for all BGP peers by default
Watch how your implementation handles communities
There may be implicit filtering rules
Each ISP has different policies – never assume that,
because communities exist, people will use them
or pay attention to the ones you send
[Figure: AS 1 contains R1 (1.1.1.1), R2 (2.2.2.2), R3 (3.3.3.3), R4 (4.4.4.4) and R5, connected by iBGP, with eBGP sessions to AS 2 and AS 3. Nodes A and B and the prefix 10.10.0.0/24 are also shown.]
• Customer connected to R1 can see
AS3, but not AS2
• Also complains about not being able
to see sites connected to R5
• No complaints from other customers
Missing Routes—iBGP
[Figure: AS 10 originates 10.0.0.0/8. RouterA in AS 2 learns it from R3 (AS 3, MED 30), R2 (AS 3, MED 20) and R1 (AS 1, MED 0).]
• RouterA will have three paths
• MEDs from AS 3 will not be compared
with MEDs from AS 1
• RouterA will sometimes select the path from R1 as best but may
also select the path from R3 as best
Deterministic MED—Operation
[Figure: AS 10 originates 10.0.0.0/8. RouterA in AS 2 learns it from R3 (AS 3, MED 30), R2 (AS 3, MED 20) and R1 (AS 1, MED 0).]
• RouterA will have three paths
• RouterA will consistently select the path from R1 as best!
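The difference between the two behaviours can be reproduced with a small Python model. It is a sketch under stated assumptions: the MEDs and neighbor ASes follow the example, but the tiebreak values are hypothetical stand-ins for the later steps of path selection (lower wins), chosen so that R3 beats R1 and R1 beats R2 when MED does not apply. The function names are illustrative only.

```python
from collections import namedtuple

Path = namedtuple("Path", "name neighbor_as med tiebreak")

# MEDs and neighbor ASes follow the example; tiebreak values are
# hypothetical stand-ins for the later selection steps (lower wins).
R1 = Path("R1", neighbor_as=1, med=0, tiebreak=2)
R2 = Path("R2", neighbor_as=3, med=20, tiebreak=3)
R3 = Path("R3", neighbor_as=3, med=30, tiebreak=1)


def better(a, b):
    """Pairwise comparison: MED only counts between paths from the same AS."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.tiebreak < b.tiebreak else b


def select_sequential(paths):
    """Compare paths one after another in stored order (order-dependent)."""
    best = paths[0]
    for candidate in paths[1:]:
        best = better(best, candidate)
    return best.name


def select_deterministic(paths):
    """Pick the best path per neighbor AS first, then compare the winners."""
    winners = {}
    for p in paths:
        current = winners.get(p.neighbor_as)
        winners[p.neighbor_as] = p if current is None else better(current, p)
    return select_sequential([winners[asn] for asn in sorted(winners)])


print(select_sequential([R1, R2, R3]))     # R3
print(select_sequential([R3, R2, R1]))     # R1
print(select_deterministic([R1, R2, R3]))  # R1
print(select_deterministic([R3, R2, R1]))  # R1
```

With sequential comparison the answer depends on the order in which the paths are stored. Grouping by neighbor AS first lets MED 20 eliminate MED 30 inside AS 3 before anything is compared against the AS 1 path, so the result no longer depends on that order.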
[Figure: R1 in AS 10 connected to R2 in AS 20.]
• Summary:
RFC1771 isn’t perfect when it comes to path selection –
years of operational experience have shown this
Vendors and ISPs have worked to put in stability
enhancements
But these can lead to interesting problems
And of course some defaults linger much longer than
they ought to – so never assume that an out of the box
default configuration will be perfect for your network
[Figure: R1, R2 and R3 in AS 3, with external paths via AS 4 and AS 12; the address 142.108.10.2 is marked.]
• R3 prefers routes via AS 4 one minute
• 1 minute later R3 prefers routes via AS 12
• And 1 minute after that R3 prefers AS 4 again
Route Oscillation—Symptom
• Community confusion
set community does just that – it overwrites any other
community set on the prefix
Use additive keyword to add community to existing list
Use Internet format for community (AS:xx) not the 32-
bit IETF format
Cisco IOS never sends community by default
Other implementations may send community by default
for iBGP and/or eBGP
Never assume that your neighbouring AS will honour
your no-export community – ask first!
• AS-PATH prepends
20 prepends won’t lessen the priority of your path any
more than 10 prepends will – check it out at a Looking
Glass
The Internet is on average only 5 ASes deep; the maximum AS
prepend most ISPs need to use is around this too
Know your BGP path selection algorithm
Some ISPs use bgp maxas-limit 15 to drop prefixes
with ridiculously long AS-paths
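The community points above (overwrite versus additive, and the AS:xx notation) can be modelled in a few lines of Python. This is a sketch of the semantics only; the function names and the community values 100:50 and 100:200 are invented for illustration.

```python
def set_community(existing, new, additive=False):
    """Model of a 'set community' action applied to a prefix.

    Without additive the new values replace whatever was there;
    with additive they are appended to the existing list.
    """
    if not additive:
        return list(new)
    return list(existing) + [c for c in new if c not in existing]


def to_as_colon_format(value):
    """Render a 32-bit community value as AS:xx (high 16 bits : low 16 bits)."""
    return f"{value >> 16}:{value & 0xFFFF}"


existing = ["100:50"]  # hypothetical community already on the prefix
print(set_community(existing, ["100:200"]))                 # ['100:200']
print(set_community(existing, ["100:200"], additive=True))  # ['100:50', '100:200']
print(to_as_colon_format(6553800))                          # 100:200
```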
[Figure: R1 in AS 1 announces 192.168.1.0/24; AS 1 reaches R3 in AS 3 through R2 in AS 2.]
• Checklist:
AS1 announces, but does AS2 see it?
We are checking eBGP filters on R1 and R2. Remember
that R2 access will require cooperation and assistance
from your peer
• Checklist:
Does AS2 send it to AS3?
We are checking eBGP configuration on R2. There may be
a configuration error with as-path filters, or prefix-lists, or
communities such that only local prefixes get out
[Figure: R1 in AS 1 announces 203.51.206.0 and reaches R3 in AS 3 across the Internet.]
• Checklist:
AS1 announces, but do its upstreams see it?
We are checking eBGP filters on R1 and upstreams.
Remember that upstreams will need to be able to help
you with this
• Hmmm….
• Looking Glass can see 203.48.0.0/14
This includes 203.51.206.0/24
So the problem must be with AS3, or AS3’s
upstream
• A traceroute confirms the connectivity
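The claim that the /14 seen at the Looking Glass includes the /24 is easy to verify with Python's standard ipaddress module. The helper name below is an assumption made for illustration.

```python
import ipaddress


def covered_by(prefix, visible_prefixes):
    """Return the visible prefixes that contain the given prefix."""
    target = ipaddress.ip_network(prefix)
    return [
        candidate
        for candidate in visible_prefixes
        if target.subnet_of(ipaddress.ip_network(candidate))
    ]


# Prefixes from the example: the Looking Glass shows only the aggregate.
print(covered_by("203.51.206.0/24", ["203.48.0.0/14"]))  # ['203.48.0.0/14']
```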
• Checklist:
Does AS3’s upstream send it to AS3?
We are checking eBGP configuration on AS3’s upstream.
There may be a configuration error with as-path filters, or
prefix-lists, or communities such that only local prefixes get
out. This needs AS3’s assistance.
[Figure: R1 in AS 1 connects to R2 in AS 2 and to R3 in AS 3; AS 2 and AS 3 both connect to the Internet.]
• Checklist:
What does “trouble” mean?
• Is outbound traffic loadsharing okay?
Can usually fix this by selectively rejecting prefixes
and using local preference
Generally easy to fix, local problem, simple application
of policy
• Is inbound traffic loadsharing okay?
Errummm, bigger problem if not
Need to do some troubleshooting if configuration with
communities, AS-PATH prepends, MEDs and selective
leaking of subprefixes doesn’t seem to help
Troubleshooting Connectivity –
Example III
• Checklist:
AS1 announces, but does AS2 see it?
We are checking eBGP filters on R1 and R2. Remember
that R2 access will require cooperation and assistance
from your peer
• Checklist:
Does AS2 send it to its upstream?
We are checking eBGP configuration on R2. There may be
a configuration error with as-path filters, or prefix-lists, or
communities such that only local prefixes get out
• Checklist:
Repeat all of the above for AS3
[Figure: R1 in AS 1 connects to R2 in AS 2 and to R3 in AS 3; AS 2 and AS 3 both connect to the Internet.]
• Checklist:
Assume AS1 has done everything in this
tutorial so far
All the configurations look fine, the Looking Glass
outputs look fine, life is wonderful… Apart from those
annoying traffic swings every hour or so