Content
Objective:
Know the tools for troubleshooting and reporting, and how to use them with Exchange
What you will find:
General presentation
Practical examples
Q&A
To begin with
General Presentation
Scope: Exchange 2003, 2007 & 2010
Structure: CAS, HUB, MBX
Specific troubleshooting tools and
Interactive!
Exchange Services
First thing to do when something goes wrong:
Check that all Exchange services configured with the Automatic startup type are running
A very simple task that can sometimes save a lot of troubleshooting time
Exchange Services
Exchange 2003:
Exchange Services
Exchange 2007:
Exchange Services
Exchange 2010:
Exchange Services
If there is an Automatic Exchange
Exchange Services
Tool that I find most useful in case I
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
Exchange Services Example 1
ISSUE:
After patching, Outlook clients could no longer
connect to one specific CAS server
TROUBLESHOOTING:
Ran Test-ServiceHealth and noticed that
Exchange Services Example 1
TROUBLESHOOTING:
Tried to start the RCA service but it failed
with:
Exchange Services Example 1
TROUBLESHOOTING:
Ran ProcMon and filtered for Process and Thread
Exchange Services Example 1
TROUBLESHOOTING:
Added File Activity to the view and started looking
there
Exchange Services Example 1
SOLUTION:
Copy the DLL from another CAS server
Exchange Services Example 2
ISSUE:
An Exchange service was no longer starting.
For this example, I used the RCA service to illustrate the issue
that I saw although another Exchange service was involved at
that time.
Exchange Services Example 2
TROUBLESHOOTING:
Exchange Services Example 2
The security team had done some hardening:
SOLUTION:
Gave the appropriate permissions back to Network Service and started the service
CAS
Troubleshooting
CAS
In Exchange 2010, the CAS server is
CAS
CAS servers have gotten more
CAS
When discussing client connectivity,
kind of issues
Client Access\"
Max size in KB that a single log file can grow to before a new one is generated: 10240 (10 MB)
Max size in KB that the entire directory of logs can grow to before the oldest log is deleted: 1048576 (1 GB)
Length of time in hours a log will be kept before being deleted: 720 (30 days)
Log type tags to be logged: ConnectDisconnect,
Logon,
change
The 2 most useful changes are the Default Folder and the Log Tags
Default Folder Setting
You may want to change the location where the log files are stored if you want to move
OperationSpecific
This option shows more detail for each ROP (for example, SetProps on CreateMessage
operation)
Throttling
2012-04-01T19:43:55.197Z,2,1,/o=Contoso/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=UserA,,OUTLOOK.EXE,14.0.4760.1000,Cached,,,ncacn_ip_tcp,,,,00:00:00.1399752,"BS=Conn:2,HangingConn:0,AD:3000/3000/0%,CAS:123000/122722/1%,AB:3000/3000/0%,RPC:120000/120000/0%,FC:1000/0,Policy:DefaultThrottlingPolicy_766c8f13-bb90-4fee-912d-e1aec48df1ab,Norm",
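The throttling field that starts with BS= in the sample entry above can be split into its individual counters for easier reading. The following is a minimal Python sketch, not part of any Exchange tooling. The function name parse_budget_snapshot is hypothetical, the sample string is abridged from the entry above, and the code only separates the values without interpreting what each number means.

```python
def parse_budget_snapshot(field):
    """Split a 'BS=...' throttling field into counters plus bare flags."""
    text = field.strip().strip('"')
    if text.startswith("BS="):
        text = text[len("BS="):]
    parsed = {"flags": []}
    for item in text.split(","):
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            # Items without a colon (for example "Norm") are kept as flags.
            parsed["flags"].append(item)
        elif "/" in value:
            # Slash-separated values are kept as a list of raw strings.
            parsed[key] = value.split("/")
        else:
            parsed[key] = value
    return parsed


# Abridged from the sample RCA log entry (Policy item left out for brevity).
sample = '"BS=Conn:2,HangingConn:0,AD:3000/3000/0%,CAS:123000/122722/1%,FC:1000/0,Norm"'
snapshot = parse_budget_snapshot(sample)
print(snapshot["CAS"])  # ['123000', '122722', '1%']
```

Keeping the values as raw strings avoids guessing at units; anything beyond splitting would need the official description of the field.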
CAS Example 1
ISSUE:
Some users are not able to access any
of the Exchange Web Services:
Autodiscover, OWA, availability, OAB
Not all users are having this issue
CAS Example 1
TROUBLESHOOTING:
Reproduced the issue with an affected user: tried to access OWA, but got error 400 Bad Request in the web browser
Looked into the IIS logs of the CAS
servers, but there was no sign of the
request
CAS Example 1
TROUBLESHOOTING:
Need to see if the request made by the user
CAS Example 1
EXPLANATION:
User making the request has a large
RESOLUTION:
Applied the following KBs:
http://support.microsoft.com/kb/2491354
http://support.microsoft.com/kb/2020943
CAS Example 2
ISSUE:
Exchange Server 2010 SP1
TMG reverse proxy
OOF does not work in Outlook and OWA
CAS Example 2
TROUBLESHOOTING:
Reproduced the issue from the user's workstation
Looked into the IIS log of the CAS:
CAS Example 2
EXPLANATION:
SSL is activated for both OWA and ECP
The AutomaticReplies request is forwarded to port 80 (HTTP)
The TMG OWA rule is configured to use HTTP,
not HTTPS for the traffic between TMG and
Exchange
SOLUTION:
Disabled SSL for OWA and ECP, did IISRESET,
CAS Example 3
ISSUE:
Exchange Server 2010
Hardware Load-balanced CAS array
Outlook 2010
OAB download is not possible
CAS Example 3
TROUBLESHOOTING:
Made an attempt to download the OAB
Checked IIS log on the CAS and saw the
following:
CAS Example 3
TROUBLESHOOTING:
Forced Outlook to connect directly to the
EXPLANATION:
Issue with the caching on the load balancer
CAS Example 4
ISSUE:
After moving the mailbox from
Exchange 2007 to Exchange 2010,
user cannot connect anymore
He gets the following error:
Cannot open your default e-mail folders, Microsoft Exchange
is not available. Either there are network problems or the
Exchange server is down for maintenance.
CAS Example 4
TROUBLESHOOTING:
Looked into the RCA logs and found the
following:
2011-05-10T18:09:17.507Z,41254,1'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,0x6BA (rpc::Exception),00:00:00.0156002,SessionDropped,RpcEndPoint: [ServerUnavailableException] Connection must be re-established -> [SessionDeadException] The primary owner logon has failed. Dropping a connection.
2011-05-10T18:09:17.881Z,41255,0'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,NN.NN.110.192,fe80::88c4:5649:7cca:ad19%12,ncacn_ip_tcp,,Connect,0,00:00:00,"SID=XXXXXXXXXXXXXXXXX, Flags=None",
2011-05-10T18:09:17.881Z,41255,1'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,-2147024809 (rop::InvalidParam),00:00:00,,RopHandler: Logon: [RopExecutionException] Invalid LegacyDN syntax.. Error code = InvalidParam
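When an RCA log is large, the failing entries can be picked out by looking for the error markers visible in the sample above, such as (rpc::Exception) and (rop::InvalidParam). The following Python sketch is an illustration under stated assumptions: the helper name failed_entries is hypothetical, the pattern is derived only from these sample entries, and the test lines are abridged versions of them.

```python
import re

# Matches "<code> (rpc::Name)" or "<code> (rop::Name)" as seen in the sample entries.
FAILURE = re.compile(r"([^,\s]+)\s*\((rpc|rop)::(\w+)\)")


def failed_entries(lines):
    """Return (timestamp, code, error name) for every line carrying an error marker."""
    failures = []
    for line in lines:
        match = FAILURE.search(line)
        if match:
            # The timestamp is the first comma-separated field of each entry.
            timestamp = line.split(",", 1)[0]
            error_name = match.group(2) + "::" + match.group(3)
            failures.append((timestamp, match.group(1), error_name))
    return failures


# Abridged versions of the three sample entries (several fields removed).
sample_lines = [
    "2011-05-10T18:09:17.507Z,41254,OUTLOOK.EXE,ncacn_ip_tcp,,,0x6BA (rpc::Exception),00:00:00.0156002,SessionDropped",
    "2011-05-10T18:09:17.881Z,41255,OUTLOOK.EXE,ncacn_ip_tcp,,Connect,0,00:00:00",
    "2011-05-10T18:09:17.881Z,41255,OUTLOOK.EXE,ncacn_ip_tcp,,,-2147024809 (rop::InvalidParam),00:00:00,,RopHandler: Logon",
]
failures = failed_entries(sample_lines)
for entry in failures:
    print(entry)
```

The successful Connect entry is skipped, which leaves only the two failures to read in detail.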
CAS Example 4
EXPLANATION:
Incorrect Legacy Exchange DN due to a special character
SOLUTION:
Changed the Legacy Exchange DN from:
'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT'
to '/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
Then we added the X500 address as '/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
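A quick way to spot similar entries before a migration is to scan Legacy Exchange DN values for segments that look like the one in this example. The following Python sketch assumes that the embedded "=" inside the cn value was the offending character; the slide only says "special character", so this rule is an assumption and not the validation Exchange itself performs. The function name suspicious_segments is hypothetical.

```python
def suspicious_segments(legacy_dn):
    """Return DN segments whose value contains an extra '=' or that lack '=' entirely."""
    flagged = []
    for segment in legacy_dn.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        # Assumption: a second '=' inside the value is what breaks the syntax.
        if not sep or "=" in value:
            flagged.append(segment)
    return flagged


before = "/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT"
after = "/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT"
print(suspicious_segments(before))  # ['cn=abc = TTTT']
print(suspicious_segments(after))   # []
```

Spaces alone are deliberately not flagged, since values such as "Exchange Administrative Group (FYDIBOHF23SPDLT)" contain them legitimately.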
CAS Example 5
ISSUE:
Exchange 2007/2010 mixed
environment
TMG publishing rules for ActiveSync
Users with a certain type of mobile
phone are complaining that from time
to time their Inbox does a full reload
CAS Example 5
TROUBLESHOOTING:
Raised ActiveSync logging to the maximum level on the Exchange 2010 CAS servers
Asked users to call the Help Desk as soon as they hit the issue and to mention the time as accurately as possible
Did that for one day and then collected the logs (Event logs, IIS and HTTPERR logs)
CAS Example 5
ANALYSIS:
When the users reported the issue, we have
CAS Example 5
ANALYSIS:
When we match the IDs 1008 with the IIS
CAS Example 5
ANALYSIS:
So, the error is coming from the HTTP layer
Looked at the HTTPERR logs and saw the
following:
Connection_Dropped MSExchangeSyncAppPool
CAS Example 5
SOLUTION:
Involved the TMG team since we could not find anything wrong with the CAS servers
They noticed that TMG was still at the RTM version
This was a known issue that was fixed in SP2
Installing SP2 for TMG solved the issue
CAS Example 6
ISSUE:
Exchange Server 2003 BE cluster
hosting approx. 3000 mailboxes
The W3WP.EXE process runs at almost
100% CPU. As a consequence,
ActiveSync synchronization is slow or
does not work at all
CAS Example 6
TROUBLESHOOTING:
Asked for the IIS logs from that day
Used the following parser to process the logs:
http://blogs.technet.com/b/exchange/archive/2012/01/31/a-script-to-troubleshoot-issues-with-exchange-activesync.aspx
Listed only hits bigger than 1000 for the day:
.\ActiveSyncReport.ps1 -IISLog "C:\Logs" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -Date 05-30-2012
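The idea behind the minimum-hits filter can be illustrated without Log Parser. The following Python sketch is a simplified stand-in, not the ActiveSyncReport.ps1 script: it counts ActiveSync requests per device in IIS log lines and keeps only the devices at or above a threshold. It assumes that ActiveSync requests contain /Microsoft-Server-ActiveSync in the URL and a DeviceId= parameter in the query string; the function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

DEVICE_ID = re.compile(r"DeviceId=([^&\s]+)", re.IGNORECASE)


def devices_over_threshold(log_lines, minimum_hits):
    """Count ActiveSync hits per DeviceId and keep devices with enough hits."""
    hits = Counter()
    for line in log_lines:
        if line.startswith("#"):
            continue  # skip W3C header lines such as "#Fields: ..."
        if "/microsoft-server-activesync" not in line.lower():
            continue
        match = DEVICE_ID.search(line)
        if match:
            hits[match.group(1)] += 1
    return {device: count for device, count in hits.items() if count >= minimum_hits}


# Synthetic lines for illustration only.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query",
    "2012-05-30 08:00:01 POST /Microsoft-Server-ActiveSync Cmd=Sync&User=usera&DeviceId=DEV1&DeviceType=Phone",
    "2012-05-30 08:00:02 POST /Microsoft-Server-ActiveSync Cmd=Sync&User=usera&DeviceId=DEV1&DeviceType=Phone",
    "2012-05-30 08:00:03 POST /Microsoft-Server-ActiveSync Cmd=Ping&User=userb&DeviceId=DEV2&DeviceType=Phone",
    "2012-05-30 08:00:04 GET /owa/ -",
]
busy_devices = devices_over_threshold(sample_log, minimum_hits=2)
print(busy_devices)  # {'DEV1': 2}
```

With a real day of logs, a threshold of 1000 as used above would isolate the few devices that generate most of the load.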
CAS Example 6
TROUBLESHOOTING:
Exchange 2003/2007/2010
Can also be used for reporting, for
example:
.\ActiveSyncReport.ps1 -IISLog "C:\inetpub\logs\LogFiles\W3SVC1" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -SendEmailReport -SMTPRecipient user@contoso.com
Log Parser Studio
http://blogs.technet.com/b/exchange/archive/2012/03/07/introducing-log-parser-studio.aspx
In addition to IIS logs, it can browse RCA
Transport
Troubleshooting
Transport - Exchange 2007/2010 logs
Connectivity log
Logs SMTP connection activity of the outbound message delivery
Protocol log
Disabled by default on all SMTP Send and Receive connectors
Protocol logging is enabled or disabled on a per-connector basis
Transport - Tools
Logs (activation if needed) and analysis
NetMon / Wireshark: message routing, latencies, general communication
TELNET on port 25 to test SMTP communication between email servers
Queue Viewer (Get-Queue PowerShell cmdlets) in case of stuck
Transport - Tools
Process Tracking Log (PTL)
Works only with Exchange 2007 and 2010
Allows parsing, monitoring and analyzing
Message Tracking logs
Powerful tool for troubleshooting as well as
for generating statistics for message traffic
Transport - Tools
Process Tracking Log (PTL) useful
for:
Message Looping
Message failures, such as in Delivery Status
Notifications (DSNs)
List of top mail senders
List of top mail recipients
Top large message size generators
Queues backing up
Performance issues due to message load
Transport - Tools
Process Tracking Log (PTL)
To parse one file:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking\MSGTRKxx.LOG" 1 all
logged yesterday:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking" 0 all yesterday
MTRecipientStatistics.csv
Transport Example 1
ISSUE:
After migrating around 1700 users to
Exchange 2010, there are constantly
400-500 messages in the E2003-E2010
routing group queue of the Exchange
2003 bridgehead.
Transport Example 1
TROUBLESHOOTING:
Exchange 2003 bridgehead:
Increased logging for Transport\SMTP to
max
Activated SMTP logging
Exchange 2010 HUB:
Connectivity logging was running already
Activated Protocol log for the Default Receive connector
Transport Example 1
ANALYSIS:
Transport Example 1
EXPLANATION:
The MaxAcknowledgementDelay parameter specifies the period the transport
server delays acknowledgement when receiving messages from a host that
doesn't support shadow redundancy. When receiving messages from a host that
doesn't support shadow redundancy, a Microsoft Exchange Server 2010 transport
server delays issuing an acknowledgement until it verifies that the message has
been successfully delivered to all recipients. However, if it takes too long to verify
successful delivery, the transport server times out and issues an
acknowledgement anyway. The default value is 30 seconds.
The TarpitInterval parameter specifies the period of time to delay an SMTP
response to a remote server that Exchange determines may be abusing the
connection. Authenticated connections are never delayed in this manner. The
default value is 5 seconds. To specify a value, enter the value as a time span:
hh:mm:ss, where h = hours, m = minutes, and s = seconds. The valid input range
Transport Example 1
SOLUTION:
On all the HUB servers in the organization, disabled TarpitInterval and MaxAcknowledgementDelay by setting them to zero:
Set-ReceiveConnector -Identity "Default <HUB>" -TarpitInterval 00:00:00
Set-ReceiveConnector -Identity "Default <HUB>" -MaxAcknowledgementDelay 00:00:00
Transport Example 2
ISSUE:
The Exchange 2010 EDGE server
Transport Example 2
TROUBLESHOOTING:
Activated Protocol logging for the SMTP
send connector
Took a network trace on the Edge
server
Reproduced the issue
Collected the traces
Transport Example 2
ANALYSIS:
Used www.hscripts.com/tools/HDNT/dns-record.php to resolve the IP addresses:
Domain            Type   Class   Result
contoso.com.      MX     IN      5 mta.contoso.com
mta.contoso.com.  AAAA   IN      ****:****:0:aaaa::1:c
mta.contoso.com.  A      IN      **.**.68.125
mta.contoso.com.  A      IN      **.**.68.126
Transport Example 2
ANALYSIS:
SMTP log:
Transport Example 2
ANALYSIS and EXPLANATION:
Network trace on the Edge server:
IPv6 address ****:****:0:aaaa::1:c" for
Transport Example 2
SOLUTION:
Prioritize IPv4 communication over IPv6 as
Transport Example 3
ISSUE:
Some users are receiving emails with
Transport Example 3
TROUBLESHOOTING:
Asked for a message in PST/MSG format
ANALYSIS:
The message is multipart
Noticed that the only part of the email being
Transport Example 3
EXPLANATION:
This can happen if the charset of the
Transport Example 3
FURTHER TROUBLESHOOTING:
Asked for pipeline tracing to be
activated for an external user
Got the result and at that moment it
became clear that the disclaimer was
the cause:
Email body for all message snapshots prior
Transport Example 3
SOLUTION:
Investigated the disclaimer further
It turned out to be configured incorrectly, since it was being applied to messages coming from external senders
Transport Example 4
ISSUE:
The inheritance was cut in AD for one
OU
As a result, emails sent to those users
were queuing up in the Unreachable
Queue on the Exchange 2007 HUB
server
Transport Example 4
SOLUTION:
Re-enabled the inheritance
New emails were delivered to the users again
Transport Example 5
ISSUE:
Root cause analysis (Exchange 2003)
Different change operations had been
done over the weekend, overlapping
On Monday, they noticed that the
queue Messages awaiting directory
lookup was growing and growing
Situation got back to normal on
Tuesday
Transport Example 5
TO DOs:
Find out as accurately as possible when
the issue started
Same requirement for the end of the
downtime
Explain what happened
Transport Example 5
TROUBLESHOOTING:
Started by looking into the event logs
During the downtime, Expert logging for the Transport components was not enabled, so the event logs did not provide enough information
Asked for the message tracking logs from before, during and after the issue
Transport Example 5
ANALYSIS:
As this is Exchange 2003, we have IDs
for the Event Types
ID 1031 is SMTP end outbound
transfer
If we can build statistics to see how
many messages are getting sent per
minute, we can determine when the
issue started/ended
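The per-minute statistic described above was built with an Excel filter; the same counting can be sketched in Python. This is a minimal illustration that assumes the message tracking log has already been reduced to (timestamp, event ID) pairs; reading the actual Exchange 2003 tracking log format is left out, and the function name and sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime


def messages_per_minute(events, event_id=1031):
    """Count events with the given ID (1031 = SMTP end outbound transfer) per minute."""
    counts = Counter()
    for timestamp, current_id in events:
        if current_id == event_id:
            # Truncate the timestamp to the minute so events group together.
            counts[timestamp.replace(second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


# Synthetic events for illustration only.
events = [
    (datetime(2012, 1, 2, 9, 0, 5), 1031),
    (datetime(2012, 1, 2, 9, 0, 40), 1031),
    (datetime(2012, 1, 2, 9, 0, 50), 1024),
    (datetime(2012, 1, 2, 9, 1, 10), 1031),
]
rates = messages_per_minute(events)
for minute, count in rates.items():
    print(minute.strftime("%H:%M"), count)
```

A sharp, sustained drop in the per-minute count marks the start of the issue, and the return to the usual rate marks its end.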
Transport Example 5
ANALYSIS:
During the downtime, we have
between 20 and 40 messages being
sent per minute
Identified the beginning and the end
easily:
Before it started, we had 300 mails/minute,
Transport Example 5
ANALYSIS:
By adding ID 1024 SMTP submit
message to categorizer to the Excel
filter, we can clearly see that during
the downtime it takes between 30
minutes and 2 hours to categorize a
message
On the DCs there was no sign of
performance problems
Transport Example 5
EXPLANATION:
The trigger was a proxy change on
Saturday
Following this change, the antivirus could no longer contact its update server for updates
In addition, the engine version of the antivirus was very old and no longer supported
Mailbox
Troubleshooting
Mailbox
Probably the most sensitive part since
Mailbox Dismounted DB
Check the database status:
ESEUTIL /MH
Check log status:
ESEUTIL /ML
Mailbox Dismounted DB
Replay the logs into the database (soft recovery):
ESEUTIL /R
Mailbox Dismounted DB
What NOT to do if a database is down:
ESEUTIL /P
This is the last resort if everything else has failed
The following KB article, although old, gives an excellent explanation of what happens when doing a hard repair:
259851 Ramifications of running the eseutil /p or edbutil /d /r command in Exchange
Mailbox Log/Database
Growth
1st thing to do: use ExMon (Exchange
Mailbox Log/Database
Growth
Before moving forward, here is an ExMon
example:
Mailbox Log/Database
Growth
If ExMon cannot help, next thing to do on the
Mailbox Log/Database
Growth
If no problem message was detected in
command:
Get-LogonStatistics | select-object username,Windows2000account,identity,messagingoperationcount,otheropera
Mailbox Log/Database
Growth
Here is an example of how operation rates can help: the *** account is flooding the ### account with mails
UserName : ***
Windows2000Account : <domain>\***
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=*** (account causing the flood)
MessagingOperationCount : 1453 (mail storm)
UserName : (blank)
Windows2000Account : <domain>\<HUB>$ (machine account of a HUB)
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=### (reference to the user account)
MessagingOperationCount : 765
Mailbox Log/Database
Growth
If the previous troubleshooting steps on the mailbox server did not help, move on to the HUBs to check for:
Messages queued or in retry:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq $true} | Get-Queue | where {$_.DeliveryType -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | ft -auto
Mailbox Log/Database
Growth
Still having the issue after all that troubleshooting?
Don't forget PTL for both Mailbox & HUB!
Still no solution in sight?
Call support and tell us what you have already tried so that we don't ask for it again
Mailbox
Checking and repairing database integrity
ISINTEG for Exchange 2003 & 2007
Repair requests replaced ISINTEG in Exchange
2010
New-MailboxRepairRequest for mailbox DBs
You can run this command against a specific mailbox or against a database
While this task is running, mailbox access is disrupted only for the mailbox being
repaired
If you're running this command against a database, only the mailbox being
repaired is disrupted - all other mailboxes on the database remain operational
request is running
However, access isn't available to the public folder currently being repaired.
Mailbox
Mailbox
In Exchange 2010, we have the
transaction logs
ID: 518
Level: Error
Source: ESE
Message: eseutil (8052) JetDBUtilities - 11416: The log file \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy97\REP365\E030002C510.log is missing (error -528) and cannot be used. If this log file is required for recovery, a good copy of the log file will be needed for recovery to complete successfully.
Q&A
ExMon
Tool for gathering statistics about how the Outlook
http://www.microsoft.com/downloads/details.aspx?familyid=9A49C22E-E0C7-4B7C-ACEF-729D48AF7BC9&displaylang=en
ExMon
What ExMon can do:
For each user, show the consumption on the server for: CPU,
disk, network traffic
Measure server-side CPU latency
Give the following information about the clients: user name, version, IP address
Process data sent to the Exchange server by MAPI clients and expose the user experience by showing the actual RPC (network & server) latency
Give some data about ActiveSync traffic
What ExMon cannot do:
Measure SMTP, OWA, DAV, POP/IMAP traffic
Give information about Spam
ExMon
Start the installation of the Exchange
ExMon
Once installed, ExMon can be used either live to see the real-
ExMon
Error when running ExMon: Unknown StartTrace error (183)
Code 183 translates to: ERROR_ALREADY_EXISTS
This happens when ExMon crashes or gets killed when
collecting data, but the Exchange trace continues to run (there
is a 512 MB limit, when reached it stops)
After a crash, ExMon wants to start a new trace session, but
cannot do so because the Exchange Event Trace provider
only supports one instance at a time
Solution:
logman query -ets to see if Exchange Event Trace is running
logman stop "Exchange Event Trace" -ets to stop it
When using tracelog.exe on Windows Server 2003:
tracelog -stop "Exchange Event Trace"
2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered
trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this
presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.