Exchange Troubleshooting and Reporting Basics

November 2nd 2012
Chalk & Talk Online
Exchange Troubleshooting and
Reporting basics (tools and how-to)
Mihai Bobu
Support Engineer
Microsoft France
Content
Objective:
Know tools to troubleshoot and provide
reporting and how to use them with
Exchange
What you will find:
General presentation
When to use what
Tools and techniques for different kinds of
issues
Practical examples
Q&A
To begin with
General Presentation
Scope: Exchange 2003, 2007 & 2010
Structure: CAS, HUB, MBX
Specific troubleshooting tools and
techniques for each role as well as
common ones that can be used across
Interactive!
Exchange Services
First thing to do when something goes
wrong:
Check that all Exchange services
configured for Automatic startup are
running
A very simple check that can save a lot
of troubleshooting time
Exchange Services
Exchange 2003:
Exchange Services
Exchange 2007:
Exchange Services
Exchange 2010:
Exchange Services
If an Exchange service set to Automatic
is stopped, try to restart it
Restart OK? Check if the issue is solved
Restart failed? Note the error
message and check the Application and
System event logs
Many possible reasons why a restart
fails:
Local server configuration
Exchange Services
The tool I find most useful when
Exchange services do not want to
start:
Process Monitor (ProcMon)
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
Exchange Services
Example 1
ISSUE:
After patching, Outlook clients could no longer
connect to one specific CAS server
TROUBLESHOOTING:
Ran Test-ServiceHealth and noticed that
the RPC Client Access service was not
running
Exchange Services
Example 1
TROUBLESHOOTING:
Tried to start the RCA service, but it failed
with an error
Nothing more in the Event logs
Exchange Services
Example 1
TROUBLESHOOTING:
Ran ProcMon and, as a first step, filtered for Process and Thread
Activity only for the RCA executable:
Exchange Services
Example 1
TROUBLESHOOTING:
Added File Activity to the view and started looking
for Results that were not Success
Went into the \Bin folder
to look for the DLL
The file was no longer
there
Exchange Services
Example 1
SOLUTION:
Copy the DLL from another CAS server
(same SP and RU level)
The service could be started afterwards
It turned out that there was a problem
during patching while upgrading that
specific DLL
Exchange Services
Example 2
ISSUE:
An Exchange service was no longer
starting.
For this example, I used the RCA service to illustrate the issue,
although a different Exchange service was involved at
that time.
Exchange Services
Example 2
TROUBLESHOOTING:
Again, nothing useful in the Event logs
Filtering step by step in ProcMon, we hit the
issue:
Exchange Services
Example 2
The security team had
done some hardening:
SOLUTION:
Gave back the
appropriate permissions
to Network Service and
started the service
CAS
Troubleshooting
CAS
In Exchange 2010, the CAS server is
the entry point for all client
connections, with one exception: PF
referral
In Exchange 2007, MAPI clients would
connect directly to the mailbox server
Another difference between Exchange
2007 and 2010: Directory referral is
now handled by 2010 CAS instead of
CAS
CAS servers have become more
capable and handle additional
tasks (OAB distribution, mailbox moves,
etc.)
Still, the first thing that comes to
mind when we think of CAS is client
connectivity
Plays a decisive role in the end-user
experience
CAS
When discussing client connectivity,
there are 2 main types of connections:
MAPI connections such as Outlook
Web connections such as OWA,
ActiveSync and all the other Web services
which are handled by the IIS logic
(virtual directories)
Different logs and tests for different
kinds of issues
CAS logging: RCA

The default settings for this logging are stored
in the file
Microsoft.Exchange.RpcClientAccess.Service.exe.config
and are as follows:
Protocol logging is enabled by default
Default folder setting: "%ExchangeInstallDir%\Logging\RPC Client Access\"
Max size in KB that a single log file can grow to before a new one is
generated: "10240"
Max size in KB that the entire directory of logs can grow to before
the oldest log is deleted: "1048576"
Length of time in hours a log will be kept before being deleted:
"720"
Log type tags to be logged: ConnectDisconnect,
Logon,
CAS logging: RCA

You can change the default values as needed
The RPC Client Access service must be restarted to pick up this
change
The 2 most useful changes are the Default Folder and the Log Tags
Default Folder Setting
You may want to change the location where the log files are stored if you want to move
them to a different drive etc.
Log Tags to be logged

Here you can specify more details to log; your other options are:
Rops (remote operations)
Rops option offers you a top level mention of the operation being performed (Logon,
OpenFolder, Release, GetContentsTable, CreateMessage)
OperationSpecific
This option shows more detail for each ROP (for example, SetProps on CreateMessage
operation)
Throttling
CAS logging: RCA

This logging can be very helpful to verify whether a user
has connected in the past at all. An administrator
might receive a complaint from a user who says they
are not able to connect at all. With protocol logging
turned on by default, the administrator can check
when the user last connected successfully
to the server, or whether the user has ever
connected before
Even though this logging is designed simply to keep
track of who is connecting and disconnecting, it can
provide valuable troubleshooting information
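The "when did this user last connect" check described above can be sketched in a few lines of Python. This is only an illustration: the `#Fields:` header and the rows in `SAMPLE_LOG` are hypothetical and shortened, and the column names (`client-name`, `operation`, `rpc-status`) are assumptions. The sketch reads column names from the log's own header line, so check them against a real RCA log before relying on it.

```python
import csv
from io import StringIO

# Hypothetical, shortened sample. Real RCA logs carry their own "#Fields:"
# header with more columns; the names used here are assumptions.
SAMPLE_LOG = """#Fields: date-time,session-id,seq-number,client-name,operation,rpc-status
2012-04-01T08:00:01.000Z,1,0,/o=Contoso/cn=Recipients/cn=UserA,Connect,0
2012-04-01T08:05:00.000Z,1,1,/o=Contoso/cn=Recipients/cn=UserA,Disconnect,0
2012-04-02T09:30:00.000Z,7,0,/o=Contoso/cn=Recipients/cn=UserA,Connect,0
2012-04-02T09:31:00.000Z,8,0,/o=Contoso/cn=Recipients/cn=UserB,Connect,0x6BA
"""


def last_successful_connect(lines, user_suffix):
    """Return the latest timestamp of a successful Connect for the user, or None."""
    fields = None
    latest = None
    for row in csv.reader(lines):
        if not row:
            continue
        if row[0].startswith("#Fields:"):
            # Take the column names from the log's own header line.
            row[0] = row[0][len("#Fields:"):].strip()
            fields = row
            continue
        if row[0].startswith("#") or fields is None:
            continue
        entry = dict(zip(fields, row))
        if (entry.get("client-name", "").lower().endswith(user_suffix.lower())
                and entry.get("operation") == "Connect"
                and entry.get("rpc-status") == "0"):
            # Timestamps in the same ISO 8601 format compare correctly as strings.
            if latest is None or entry["date-time"] > latest:
                latest = entry["date-time"]
    return latest


print(last_successful_connect(StringIO(SAMPLE_LOG), "cn=UserA"))
print(last_successful_connect(StringIO(SAMPLE_LOG), "cn=UserB"))
```

UserB's only Connect in the sample carries a non-zero status, so the function returns None for that user: the "never connected successfully" case.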
CAS logging: RCA Example 1
Here is a trace where UserA is creating
a new message and sending the
message to UserB:
2012-04-01T16:07:20.069Z,69,94,/o=Contoso/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=UserA,,,192.168.0.20,::1,ncacn_ip_tcp,,,0,,>CreateMessage;>SetProps;>Release;>Release;>RemoveAllRecipients;>FlushRecipients;>SetProps;>SubmitMessage;>GetPropertiesSpecific;<CreateMessage;<SetProps;<Release;<Release;<RemoveAllRecipients;<FlushRecipients;<SetProps;<SubmitMessage;<GetPropertiesSpecific
CAS logging: RCA Example 2
Here is a trace with throttling
information for UserA:
2012-04-01T19:43:55.197Z,2,1,/o=Contoso/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=UserA,,OUTLOOK.EXE,14.0.4760.1000,Cached,,,ncacn_ip_tcp,,,,00:00:00.1399752,"BS=Conn:2,HangingConn:0,AD:3000/3000/0%,CAS:123000/122722/1%,AB:3000/3000/0%,RPC:120000/120000/0%,FC:1000/0,Policy:DefaultThrottlingPolicy_766c8f13-bb90-4fee-912d-e1aec48df1ab,Norm",
CAS logging: IIS & HTTPERR

IIS logs contain essential and detailed
information logged for Web
connections: OWA, ActiveSync, OAB,
Autodiscover, EWS
When there is an issue with a Web
service, IIS logs will normally contain
an HTTP error code which can help with
further troubleshooting
HTTPERR logs are one level below on
CAS: NTLM vs Kerberos

Scenario: Both Outlook Anywhere and Outlook TCP
clients are unable to connect. They get prompted
multiple times and still cannot connect. The
connection status window may show Connecting.
Possible answer: By default, Netlogon only allows
2 concurrent API calls for authentication requests. If
your clients are connecting with NTLM, a large
number of users could cause these
authentication requests to time out if the DCs are
overloaded.
What to do:
Prioritize Kerberos authentication over NTLM
Change the MaxConcurrentAPI parameter on DCs and CAS
CAS Example 1
ISSUE:
Some users are not able to access any
of the Exchange Web Services:
Autodiscover, OWA, availability, OAB
Not all users are having this issue
CAS Example 1
TROUBLESHOOTING:
Reproduced the issue with an affected
user who tried to access OWA, but got
error 400 Bad Request in the web
browser
Looked into the IIS logs of the CAS
servers, but there was no sign of the
request
CAS Example 1
TROUBLESHOOTING:
Need to see if the request made by the user
actually reaches the CAS server
So, went down one level and checked the
HTTPERR logs
Error 400 is visible here with a clear indicator
of the issue:
2012-10-02 18:36:25 ##.##.180.216 2866 ##.##.56.21 443 HTTP/1.1 POST /autodiscover/autodiscover.xml 400 RequestLength
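To scan many HTTPERR entries for requests rejected before they reach IIS, a small parser is enough. The sketch below assumes the column order visible in the sample line above (date, time, client IP and port, server IP and port, version, method, URI, status, reason). Real HTTPERR files may carry additional columns, so treat the field list as an assumption.

```python
# Column order assumed from the sample line in the slide; a full HTTPERR
# line may contain further columns that are not present in that sample.
HTTPERR_FIELDS = ["date", "time", "c-ip", "c-port", "s-ip", "s-port",
                  "cs-version", "cs-method", "cs-uri", "sc-status", "s-reason"]


def parse_httperr(line):
    """Map one space-separated HTTPERR line onto the assumed field names."""
    return dict(zip(HTTPERR_FIELDS, line.split()))


def rejected_requests(lines, status="400"):
    """Return the parsed entries whose status code matches, skipping comment lines."""
    entries = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        entry = parse_httperr(line)
        if entry.get("sc-status") == status:
            entries.append(entry)
    return entries


SAMPLE = [
    "#Software: hypothetical header line",
    "2012-10-02 18:36:25 ##.##.180.216 2866 ##.##.56.21 443 "
    "HTTP/1.1 POST /autodiscover/autodiscover.xml 400 RequestLength",
]

for entry in rejected_requests(SAMPLE):
    print(entry["cs-uri"], entry["sc-status"], entry["s-reason"])
```

Grouping the result by `s-reason` shows quickly whether one cause, such as RequestLength here, dominates the rejected requests.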
CAS Example 1
EXPLANATION:
The user making the request has a large
Kerberos ticket, as he is a member of more than 200
security groups
RESOLUTION:
Applied the following KBs:
http://support.microsoft.com/kb/2491354
http://support.microsoft.com/kb/2020943
Please note that the regkey
CAS Example 2
ISSUE:
Exchange Server 2010 SP1
TMG reverse proxy
OOF does not work in Outlook and OWA
CAS Example 2
TROUBLESHOOTING:
Reproduced the issue from the user's
workstation
Looked into the IIS log of the CAS:
GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab 80 ....... 403 4
403.4 translates to SSL required
Asked for a test from OWA performed directly
on the CAS to bypass TMG; it succeeded:
GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab 443 .......
CAS Example 2
EXPLANATION:
SSL is required for both OWA and ECP
The AutomaticReplies request is forwarded to
port 80 = HTTP
The TMG OWA rule is configured to use HTTP,
not HTTPS, for the traffic between TMG and
Exchange
SOLUTION:
Disabled SSL for OWA and ECP, ran IISRESET,
OOF is working now
CAS Example 3
ISSUE:
Exchange Server 2010
Hardware Load-balanced CAS array
Outlook 2010
OAB download is not possible
CAS Example 3
TROUBLESHOOTING:
Made an attempt to download the OAB
Checked IIS log on the CAS and saw the
following:
404.0 translates to Not found
Looked on the CAS server and found another
OAB file for that date: 24/05/2012
CAS Example 3
TROUBLESHOOTING:
Forced Outlook to connect directly to the
CAS (CAS IP in the hosts file), bypassing the
load balancer
OAB download was successful
EXPLANATION:
Issue with the caching on the load balancer
CAS Example 4
ISSUE:
After moving the mailbox from
Exchange 2007 to Exchange 2010, the
user cannot connect anymore
The user gets the following error:
Cannot open your default e-mail folders, Microsoft Exchange
is not available. Either there are network problems or the
Exchange server is down for maintenance.
CAS Example 4
TROUBLESHOOTING:
Looked into the RCA logs and found the
following:
2011-05-10T18:09:17.507Z,41254,1'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT'
,,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,0x6BA (rpc::Exception),00:00:00.0156002,SessionDropped,RpcEndPoint: [ServerUnavailableException] Connection must be re-established -> [SessionDeadException] The primary owner logon has failed. Dropping a connection.
,,OUTLOOK.EXE,14.0.4760.1000,Classic,NN.NN.110.192,fe80::88c4:5649:7cca:ad19%12,ncacn_ip_tcp,,Connect,0,00:00:00,"SID=XXXXXXXXXXXXXXXXX, Flags=None",
,,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,-2147024809 (rop::InvalidParam),00:00:00,,RopHandler: Logon: [RopExecutionException] Invalid LegacyDN syntax.. Error code = InvalidParam
CAS Example 4
EXPLANATION:
Incorrect Legacy Exchange DN due to special
characters (" = ")
SOLUTION:
Changed the Legacy Exchange DN from:
'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT'
to '/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
Then we added the X500 address as '/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
CAS Example 5
ISSUE:
Exchange 2007/2010 mixed
environment
TMG publishing rules for ActiveSync
Users with a certain type of mobile
phone are complaining that from time
to time their Inbox does a full reload
CAS Example 5
TROUBLESHOOTING:
Set ActiveSync logging to the maximum level on
the Exchange 2010 CAS servers
Asked users to call the Help Desk as soon
as they hit the issue and to mention the
time as accurately as possible
Did that for one day and then collected the
logs (Event logs, IIS and HTTPERR logs)
CAS Example 5
ANALYSIS:
At the times the users reported the issue, we see
Warning 1008 in the Application event log
This event alone does not tell us what happened,
and it is known to give false positives
ID: 1008
Level: Warning
Source: MSExchange ActiveSync
Message: An exception occurred and was handled by Exchange ActiveSync. This may have been caused by an outdated or corrupted Exchange ActiveSync device partnership. This can occur if a user tries to modify the same item from multiple computers. If this is the case, Exchange ActiveSync will re-create the partnership with the device. Items will be updated at the next synchronization.
CAS Example 5
ANALYSIS:
When we match the 1008 events with the IIS
logs, we find the same pattern at the times
reported by users:
2012-09-24 13:59:37 ##.##.215.71 POST /Microsoft-Server-ActiveSync/default.eas User=****&DeviceId=Appl81035YZJA4T&DeviceType=iPhone&Cmd=Sync&Log=V140_LdapC1_LdapL16_S110_Error:HttpLayerFailure_Mbx:MBX01.contoso.com_Throttle0_Budget:(A)Conn%3a0%2cHangingConn%3a0%2cAD%3a%24null%2f%24null%2f1%25%2cCAS%3a%24null%2f%24null%2f1%25%2cAB%3a%24null%2f%24null%2f0%25%2cRPC%3a%24null%2f%24null%2f1%25%2cFC%3a1000%2f0%2cPolicy%3aDefaultThrottlingPolicy%5F1b07898a-e904-4fe4-8a9b-77ec7d9f4e18%2cNorm_ 443 CORP\**** 205.223.229.89 Apple-iPhone3C1/902.206 200 0 64 128748
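The interesting part of such an entry is buried in the `Log=` parameter of the query string. A minimal Python sketch for pulling out the `Error:` token and URL-decoding the throttling budget could look like this. The `QUERY` value is a shortened copy of the entry above, and the function name is made up for the example.

```python
from urllib.parse import unquote


def parse_eas_query(query):
    """Split an ActiveSync query string into parameters and decoded Log tokens."""
    params = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        params[key] = value
    # Split on "_" before URL-decoding: the Budget token can contain an
    # encoded underscore (%5F) that must not be treated as a separator.
    tokens = [unquote(token) for token in params.get("Log", "").split("_") if token]
    details = {}
    for token in tokens:
        name, sep, value = token.partition(":")
        if sep:
            details[name] = value
    return params, details


# Shortened copy of the query string from the IIS entry in the slide.
QUERY = ("User=****&DeviceId=Appl81035YZJA4T&DeviceType=iPhone&Cmd=Sync"
         "&Log=V140_LdapC1_LdapL16_S110_Error:HttpLayerFailure"
         "_Mbx:MBX01.contoso.com_Throttle0"
         "_Budget:(A)Conn%3a0%2cHangingConn%3a0%2cFC%3a1000%2f0%2cNorm_")

params, details = parse_eas_query(QUERY)
print(params["DeviceId"], params["Cmd"], details["Error"], details["Mbx"])
print(details["Budget"])
```

Filtering a day of IIS logs for entries where `details` contains an `Error` key gives a list that can be matched against the times reported by users.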
CAS Example 5
ANALYSIS:
So, the error is coming from the HTTP layer
Looked at the HTTPERR logs and saw the
following:
Connection_Dropped MSExchangeSyncAppPool
When the iPhone/iPad receives two 500
errors consecutively, it resets the connection
and proceeds to re-download the user's
entire mailbox
This could come either from the CAS servers,
CAS Example 5
SOLUTION:
Involved the TMG team since we could not
find anything wrong with the CAS
servers
They noticed that TMG was still at RTM level
This issue was known and had been fixed
in SP2
Installed SP2 for TMG, which solved the
issue
CAS Example 6
ISSUE:
Exchange Server 2003 BE cluster
hosting approx. 3000 mailboxes
The W3WP.EXE process runs at almost
100% CPU. As a consequence,
ActiveSync synchronization is slow or
does not work at all
CAS Example 6
TROUBLESHOOTING:
Asked for the IIS logs from that day
Used the following parser to process the logs:
http://blogs.technet.com/b/exchange/archive/2012/01/31/a-script-to-troubleshoot-issues-with-exchange-activesync.aspx
Listed only devices with more than 1000 hits for the day:
.\ActiveSyncReport.ps1 -IISLog "C:\Logs" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -Date 05-30-2012
CAS Example 6
TROUBLESHOOTING:
SOLUTION: Moved the top talkers to a
more powerful Exchange 2010 platform
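The idea behind a "minimum hits" filter can be illustrated without the script itself. The following Python sketch counts ActiveSync requests per DeviceId in W3C-format IIS log lines and keeps only the devices at or above a threshold. It does not reproduce ActiveSyncReport.ps1. The sample log, its reduced `#Fields:` header and the device names are hypothetical.

```python
from collections import Counter

# Hypothetical IIS log excerpt with a reduced "#Fields:" header.
SAMPLE_IIS_LOG = """#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status
2012-05-30 08:00:01 POST /Microsoft-Server-ActiveSync/default.eas User=alice&DeviceId=DEV1&DeviceType=iPhone&Cmd=Sync 200
2012-05-30 08:00:02 POST /Microsoft-Server-ActiveSync/default.eas User=alice&DeviceId=DEV1&DeviceType=iPhone&Cmd=Ping 200
2012-05-30 08:00:03 POST /Microsoft-Server-ActiveSync/default.eas User=bob&DeviceId=DEV2&DeviceType=Android&Cmd=Sync 200
2012-05-30 08:00:04 GET /owa/ - 200
""".splitlines()


def hits_per_device(lines, minimum_hits=1):
    """Count ActiveSync requests per DeviceId and keep devices at or above the threshold."""
    fields = []
    hits = Counter()
    for line in lines:
        if line.startswith("#Fields:"):
            # Column names come from the log's own header line.
            fields = line.split()[1:]
            continue
        if line.startswith("#") or not line.strip():
            continue
        entry = dict(zip(fields, line.split()))
        if "activesync" not in entry.get("cs-uri-stem", "").lower():
            continue
        for pair in entry.get("cs-uri-query", "").split("&"):
            key, _, value = pair.partition("=")
            if key == "DeviceId":
                hits[value] += 1
    return [(device, count) for device, count in hits.most_common()
            if count >= minimum_hits]


print(hits_per_device(SAMPLE_IIS_LOG))
print(hits_per_device(SAMPLE_IIS_LOG, minimum_hits=2))
```

With a real day of logs and a threshold of 1000, the first entries of the result are the top talkers.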
CAS Troubleshooting Tools

http://blogs.technet.com/b/exchange/archive/2012/01/31/a-script-to-troubleshoot-issues-with-exchange-activesync.aspx
Very useful for parsing IIS logs on
Exchange 2003/2007/2010
Can also be used for reporting, for
example:
.\ActiveSyncReport.ps1 -IISLog "C:\inetpub\logs\LogFiles\W3SVC1" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -SendEmailReport -SMTPRecipient user@contoso.com
A more advanced tool is Log Parser
Studio
http://blogs.technet.com/b/exchange/archive/2012/03/07/introducing-log-parser-studio.aspx
In addition to IIS logs, it can browse RCA
logs, HTTPERR logs, event logs, address book
logs and even the registry
There is a library of predefined queries and it
is possible to enrich it with own configured
Transport
Troubleshooting
Transport - Exchange 2003 logs
Transport - Exchange 2007/2010 logs
Transport - Exchange 2007/2010 logs
Connectivity log
Logs SMTP connection activity of the outbound message delivery
queues to the destination MBX server, smart host, or domain
Disabled by default
Protocol log
Disabled by default on all SMTP Send and Receive connectors
Enabled or disabled on a per-connector basis
Protocol logging level must be configured to Verbose in order to record
Circular logging is used to limit file size
There is a special, invisible, intra-organization Send connector (used
for HUB to HUB, HUB to EDGE, HUB to Exchange 2003 relay) for
which we can enable protocol logging only via PowerShell:
Set-TransportServer <server> -IntraOrgConnectorProtocolLoggingLevel Verbose
Transport - Exchange logs

Message tracking logs
Record detailed activity as the message
moves within Exchange
Enabled by default
On Exchange 2003 there is less information
and the Event ID is not in text format
On Exchange 2007/2010, they exist on
HUB, EDGE (MSGTRKyyyymmdd-n.log) and
MBX (MSGTRKMyyyymmdd-n.log) servers

Message tracking logs
Search only in graphical mode in Exchange
2003
In EMC 2007/2010, search only works for
the HUB/MBX on which it is being launched
PowerShell command to search in logs from
all HUB and MBX servers:
Get-ExchangeServer | where {$_.isHubTransportServer -eq $true -or $_.isMailboxServer -eq $true} | Get-MessageTrackingLog -MessageId <id>

Pipeline Tracing (Exchange 2007/2010)
Disabled by default
Verbose logging, not recommended over long periods of time
Useful for tracking down issues with transport agents and rules as the
message moves through the transport pipeline
Takes message snapshots to capture message changes as a result of
transport agents and rules being applied
By default, the pipeline tracing log directory is
<Exchange>\Server\TransportRoles\Logs\PipelineTracing
Can be activated and managed only via EMS
Set-TransportServer <server> -PipelineTracingEnabled $True
Designed to log only messages that are sent from a specific SMTP address
(mailbox inside or outside your Exchange organization)
Set-TransportServer <server> -PipelineTracingSenderAddress x@y.com
Setting the PipelineTracingSenderAddress parameter to <> captures all
server-generated messages that are received by the HUB or EDGE
being configured: huge log files!
Transport - Tools
Logs (activation if needed) and
analysis
NetMon / Wireshark: message
routing, latencies, general
communication
TELNET on port 25 to test SMTP
communication between email servers
Queue Viewer (Get-Queue PowerShell
cmdlets) in case of stuck
Transport - Tools
Process Tracking Log (PTL)
Works only with Exchange 2007 and 2010
Allows parsing, monitoring and analyzing
Message Tracking logs
Powerful tool for troubleshooting as well as
for generating statistics for message traffic
Transport - Tools
Process Tracking Log (PTL) useful
for:
Message looping
Message failures, such as in Delivery Status
Notifications (DSNs)
List of top mail senders
List of top mail recipients
Top large message size generators
Queues backing up
Performance issues due to message load
Transport - Tools
Process Tracking Log (PTL)
To parse one file:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking\MSGTRKxx.LOG" 1 all
To parse all files in each subdirectory that were
logged yesterday:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking" 0 all yesterday
Read more about it at
http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.a
Transport PTL examples

MTDsnFailureResults.csv
MTRecipientStatistics.csv
Transport Example 1
ISSUE:
After migrating around 1700 users to
Exchange 2010, there are constantly
400-500 messages in the E2003-E2010
routing group queue of the Exchange
2003 bridgehead.
Transport Example 1
TROUBLESHOOTING:
Exchange 2003 bridgehead:
Increased logging for Transport\SMTP to
max
Activated SMTP logging
Exchange 2010 HUB:
Connectivity logging was running already
Activated Protocol log for the Default
Receive connector
Transport Example 1
ANALYSIS:
Get-ReceiveConnector -server <HUB> | fl name,maxack*,tarpit*
Name : Default****E0024
MaxAcknowledgementDelay : 00:00:30
TarpitInterval : 00:00:05
Transport Example 1
EXPLANATION:
The MaxAcknowledgementDelay parameter specifies the period the transport
server delays acknowledgement when receiving messages from a host that
doesn't support shadow redundancy. When receiving messages from a host that
doesn't support shadow redundancy, a Microsoft Exchange Server 2010 transport
server delays issuing an acknowledgement until it verifies that the message has
been successfully delivered to all recipients. However, if it takes too long to verify
successful delivery, the transport server times out and issues an
acknowledgement anyway. The default value is 30 seconds.
The TarpitInterval parameter specifies the period of time to delay an SMTP
response to a remote server that Exchange determines may be abusing the
connection. Authenticated connections are never delayed in this manner. The
default value is 5 seconds. To specify a value, enter the value as a time span:
hh:mm:ss, where h = hours, m = minutes, and s = seconds. The valid input range
Transport Example 1
SOLUTION:
On all the HUB servers in the
organization, deactivated TarpitInterval
& MaxAcknowledgementDelay:
Set-ReceiveConnector -Identity "Default <HUB>" -TarpitInterval 00:00:00
Set-ReceiveConnector -Identity "Default <HUB>" -MaxAcknowledgementDelay 00:00:00
Microsoft Exchange Transport
service needs to be restarted to
Transport Example 2
ISSUE:
The Exchange 2010 EDGE server
cannot send emails to a certain domain,
contoso.com.
It works for some time, but then it
breaks and we need to restart
Transport to make it work again.
Actually, it looks like a few other
domains are having this issue, this is
Transport Example 2
TROUBLESHOOTING:
Activated Protocol logging for the SMTP
send connector
Took a network trace on the Edge
server
Reproduced the issue
Collected the traces
Transport Example 2
ANALYSIS:
Used
www.hscripts.com/tools/HDNT/dns-record.php
to resolve the IP addresses:
Domain            Type  Class  Result
contoso.com.      MX    IN     5 mta.contoso.com
mta.contoso.com.  AAAA  IN     ****:****:0:aaaa::1:c
mta.contoso.com.  A     IN     **.**.68.125
mta.contoso.com.  A     IN     **.**.68.126
Transport Example 2
ANALYSIS:
SMTP log:
From the log alone, we can only say
that we fail when we contact
contoso.com over IPv6 and that the
request never gets through over IPv4.
Transport Example 2
ANALYSIS and EXPLANATION:
Network trace on the Edge server:
IPv6 address "****:****:0:aaaa::1:c" for
"mta.contoso.com" appears in the trace
The only thing we have for it is SynReTransmit,
contoso.com does not respond
IPv4 addresses **.**.68.125 and **.**.68.126 for
"mta.contoso.com" do not appear in the trace
whatsoever
The attempts made in IPv4
never leave the Edge
Transport Example 2
SOLUTION:
Prioritize IPv4 communication over IPv6 as
described in the following KB article:
929852 How to disable IP version 6 (IPv6) or its specific components in
Windows 7, in Windows Vista, in Windows Server 2008 R2, and in Windows
Server 2008
http://support.microsoft.com/default.aspx?scid=kb;EN-US;929852
Attention: we do not deactivate IPv6
(although that is possible), as it is required by
Windows core components
As a result, we never switch to IPv6 when
talking to domains publishing
IPv6
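The effect of "prefer IPv4" can be pictured as a reordering of the candidate addresses before a connection is attempted. The Python sketch below only illustrates that idea and is not how Windows or Exchange implement it (the KB setting changes the operating system's address selection). The addresses are documentation-range placeholders standing in for the masked ones in the slides.

```python
import ipaddress


def prefer_ipv4(addresses):
    """Return the addresses with IPv4 entries first, otherwise keeping their order."""
    parsed = [ipaddress.ip_address(address) for address in addresses]
    # sorted() is stable, so addresses of the same family keep their relative order.
    return [str(address) for address in sorted(parsed, key=lambda a: a.version)]


# Placeholder records: one AAAA and two A results for the same mail host.
records = ["2001:db8:0:aaaa::1:c", "192.0.2.125", "192.0.2.126"]
print(prefer_ipv4(records))
```

With this ordering, the unreachable IPv6 address would only be tried after both IPv4 addresses.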
Transport Example 3
ISSUE:
Some users are receiving emails with
the body in non-European characters
These emails are coming from outside
the Exchange organization
Transport Example 3
TROUBLESHOOTING:
Asked for a message in PST/MSG format
ANALYSIS:
The message is multipart
Noticed that the only part of the email being
displayed in European characters was the
disclaimer:
Transport Example 3
EXPLANATION:
This can happen if the charset of the
message body is different from the charset
of the disclaimer
Exchange cannot include both and needs to
select one (all Exchange versions have the
same behavior: http://support.microsoft.com/kb/916299)
Bad luck: Exchange chooses the charset of
the disclaimer (utf-8), displaying it correctly
but garbling the message
body by doing so
Transport Example 3
FURTHER TROUBLESHOOTING:
Asked for pipeline tracing to be
activated for an external user
Got the result and at that moment it
became clear that the disclaimer was
the cause:
Email body for all message snapshots prior
to the disclaimer being applied is displayed
correctly
Transport Example 3
SOLUTION:
Investigated the
disclaimer further
It turned out that it was configured
incorrectly, since it was being applied
to messages coming from external
senders
Transport Example 4
ISSUE:
The inheritance was cut in AD for one
OU
As a result, emails sent to those users
were queuing up in the Unreachable
Queue on the Exchange 2007 HUB
server
Transport Example 4
SOLUTION:
Put back the inheritance
New emails are delivered to users
The Unreachable queue is resubmitted
automatically only if the new routing
tables are not the same as the old ones
Resubmit the messages still blocked in
the Unreachable queue to the
categorizer:
Transport Example 5
ISSUE:
Root cause analysis (Exchange 2003)
Several overlapping change operations had been
carried out over the weekend
On Monday, they noticed that the
queue "Messages awaiting directory
lookup" was growing and growing
The situation got back to normal on
Tuesday
Transport Example 5
TO DOs:
Find out as accurately as possible when
the issue started
Same requirement for the end of the
downtime
Explain what happened
Transport Example 5
TROUBLESHOOTING:
Started by looking into the event logs
During the downtime, Expert logging
for the Transport components was not
activated, so there was not enough
information
Asked for the message tracking logs
before, during and after the issue
Transport Example 5
ANALYSIS:
As this is Exchange 2003, we have IDs
for the Event Types
ID 1031 is "SMTP end outbound
transfer"
If we can build statistics to see how
many messages are getting sent per
minute, we can determine when the
issue started/ended
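The per-minute statistics can be built in a spreadsheet, as was done here, or with a few lines of code. The Python sketch below counts events of one ID per minute. The rows are hypothetical and reduced to three columns (date, time, event ID), whereas a real Exchange 2003 tracking log has many more columns that would need to be mapped first.

```python
from collections import Counter

# Hypothetical, simplified rows: (date, time, event_id).
ROWS = [
    ("2012-06-04", "8:59:10", 1031),
    ("2012-06-04", "8:59:40", 1031),
    ("2012-06-04", "8:59:45", 1024),
    ("2012-06-04", "9:00:05", 1031),
]


def events_per_minute(rows, event_id=1031):
    """Count rows with the given event ID, grouped by (date, HH:MM)."""
    counts = Counter()
    for date, time, row_event_id in rows:
        if row_event_id == event_id:
            hour, minute, _seconds = time.split(":")
            counts[(date, f"{int(hour):02d}:{int(minute):02d}")] += 1
    return dict(sorted(counts.items()))


for (date, minute), count in events_per_minute(ROWS).items():
    print(date, minute, count)
```

A sharp drop in the per-minute count marks the start of the incident, and the return to the usual rate marks its end.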
Transport Example 5
ANALYSIS:
During the downtime, we have
between 20 and 40 messages being
sent per minute
Identified the beginning and the end
easily:
Before it started, we had 300 mails/minute,
then 100, then 20-40
When it stopped, we start having 150, then
Transport Example 5
ANALYSIS:
By adding ID 1024 "SMTP submit
message to categorizer" to the Excel
filter, we can clearly see that during
the downtime it takes between 30
minutes and 2 hours to categorize a
message
On the DCs there was no sign of
performance problems
Transport Example 5
EXPLANATION:
The trigger was a proxy change on
Saturday
Following this change, the antivirus
could no longer contact its update
server for updates
In addition to that, the engine version
of the AV was very old and no longer
supported
Mailbox
Troubleshooting
Mailbox
Probably the most sensitive part, since
this is where the databases reside
Losing data (mails, appointments) is
probably the administrator's worst
nightmare
2 main issues for this section:
Dismounted database
Log/Database growth issues
Mailbox Dismounted DB
Check the database status:
ESEUTIL /MH
Check log status:
ESEUTIL /ML
Replay the logs into the database (soft recovery):
ESEUTIL /R
What NOT to do if a database is down:
ESEUTIL /P
This is the last resort if everything else
has failed
The following KB, although old, gives an
excellent explanation of what happens
when doing a hard repair:
259851 Ramifications of running the eseutil /p or
edbutil /d /r command in Exchange
Mailbox Log/Database Growth
1st thing to do: use ExMon (Exchange
User Monitor, works with Exchange
2003, 2007 & 2010) on the mailbox
server with the issue to see if a user
is causing the growth
Sort on the CPU (%) column and see if the
same user(s) is always on top
Sort by Bytes Out and check for top talkers
If the top talker is ? (a question mark), then
Growth
Before moving forward, here is an ExMon
example:
The user who causes the issue on the server
(transaction log growth) is the one consuming the
most CPU. It is the first user listed (Jean).
The client version column tells which version of
MAPI client it is. For example, a BlackBerry is a 6.x
version, whereas 11.x is for
Outlook 2003.
Growth
If ExMon cannot help, the next thing to do on the
mailbox server is to check for large, stranded
or looping message(s) in users' Outbox
In Exchange 2003, use a network sniffer (NetMon,
Wireshark) to identify the client
In Exchange 2007/2010, for users running in online
mode:
Get-Mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -folderscope Outbox | Sort-Object Foldersize -Descending | select-object identity,name,foldertype,itemsinfolder,@{Name="FolderSize MB";expression={$_.folderSize.toMB()}} | export-csv OutboxItems.csv
Growth
If no problem message was detected in
Outbox, check the operation rates for users:
Exchange 2003: add Ops columns to Mailbox Store\ Logons
Exchange 2007/2010: run the following PowerShell
command:
Get-LogonStatistics | select-object username,Windows2000account,identity,messagingoperationcount,otheropera
Growth
Here is an example of how operation rates can help:
the *** account is flooding the ### account with
mails
UserName : ***
Windows2000Account : <domain>\***
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=***   (account causing the flood)
MessagingOperationCount : 1453   (mail storm)
UserName :   (blank)
Windows2000Account : <domain>\<HUB>$   (machine account of a HUB)
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=###   (reference to the user account)
MessagingOperationCount : 765
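When the logon statistics cover hundreds of sessions, ranking them by operation count makes a mail storm stand out. The Python sketch below parses "Name : Value" list output that has been saved to text and returns the top accounts. The sample text and account names are hypothetical, and only the property names shown above are assumed.

```python
# Hypothetical list-style output saved to a text file; a raw string keeps
# the backslashes in the account names literal.
SAMPLE_OUTPUT = r"""UserName : user1
Windows2000Account : DOMAIN\user1
MessagingOperationCount : 1453

UserName :
Windows2000Account : DOMAIN\HUB01$
MessagingOperationCount : 765

UserName : user2
Windows2000Account : DOMAIN\user2
MessagingOperationCount : 12
"""


def parse_list_output(text):
    """Turn blank-line-separated 'Name : Value' blocks into a list of dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def top_talkers(records, limit=5):
    """Return (account, operation count) pairs, highest count first."""
    def count(record):
        return int(record.get("MessagingOperationCount") or 0)
    ranked = sorted(records, key=count, reverse=True)
    return [(r.get("Windows2000Account", ""), count(r)) for r in ranked[:limit]]


for account, operations in top_talkers(parse_list_output(SAMPLE_OUTPUT), limit=2):
    print(account, operations)
```

A HUB machine account near the top, as in the example above, points to mail delivered through transport rather than to a single client session.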
Growth
If the previous troubleshooting steps on the
mailbox server did not help, move on to the
HUBs to check for:
Messages queued or in retry:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.Deliverytype -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | ft -auto
Large queued messages:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "true"} | Get-Message -resultsize unlimited | Select-Object Identity,Subject,status,LastError,RetryCount,queue,size | sort-object -property size -descending | ft -auto
Growth
Still having the issue after all that
troubleshooting?
Don't forget PTL for both Mailbox &
HUB!
Still no solution in sight?
Call support and tell us what you have
already tried, so that we don't ask for it
again
Mailbox
Checking and repairing database integrity
ISINTEG for Exchange 2003 & 2007
Repair requests replaced ISINTEG in Exchange
2010
New-MailboxRepairRequest for mailbox DBs
You can run this command against a specific mailbox or against a database
While this task is running, mailbox access is disrupted only for the mailbox being
repaired
If you're running this command against a database, only the mailbox being
repaired is disrupted - all other mailboxes on the database remain operational
New-PublicFolderDatabaseRepairRequest for PF DBs
Use this cmdlet to detect and fix replication issues in the public folder database
Public folders on the public folder database can still be accessed while the
request is running
However, access isn't available to the public folder currently being repaired.
Mailbox
In Exchange 2010, we have the
possibility to configure the Calendar
Repair Assistant
Not set to run by default, needs to be
activated with the cmdlet Set-MailboxServer
It detects and corrects inconsistencies
with single and recurring meeting
items for mailboxes located on that
Mailbox - One simple example
ISSUE:
Exchange Server 2010
Backup software cannot purge
transaction logs
Mailbox - One simple example
TROUBLESHOOTING:
ID: 519
Level: Error
Source: ESE
Message: eseutil (11900) JetDBUtilities - 9384: The range of log files \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy48\INFO365b\E020000435B.log to \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy48\INFO365b\E0200004401.log is missing (error -528) and cannot be used. If these log files are required for recovery, a good copy of these log files will be needed for recovery to complete successfully.
ID: 518
Level: Error
Source: ESE
Message: eseutil (8052) JetDBUtilities - 11416: The log file \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy97\REP365\E030002C510.log is missing (error -528) and cannot be used. If this log file is required for recovery, a good copy of the log file will be needed for recovery to complete successfully.
Mailbox
One simple example
SOLUTION:
We managed to find missing logs in
older file-level backups
Restored the missing transaction logs
into the Exchange folder
Backup was able to purge the logs
afterwards
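The missing range reported in the event 519 message can be expanded into the individual file names to look for in older backups. What follows is a minimal Python sketch, not part of any Exchange tooling. It assumes the log naming scheme seen in the events above: a three-character prefix followed by an eight-digit hexadecimal generation number. The helper name `missing_log_files` is hypothetical.

```python
import re

# Assumed naming scheme: 3-character prefix (e.g. "E02") followed by an
# 8-digit hexadecimal generation number and the ".log" extension.
LOG_NAME = re.compile(r"^(?P<prefix>[A-Za-z0-9]{3})(?P<gen>[0-9A-Fa-f]{8})\.log$")


def missing_log_files(first, last):
    """Return every log file name from first to last, inclusive."""
    a, b = LOG_NAME.match(first), LOG_NAME.match(last)
    if a is None or b is None:
        raise ValueError("unexpected log file name")
    if a["prefix"].upper() != b["prefix"].upper():
        raise ValueError("log files belong to different log streams")
    start, end = int(a["gen"], 16), int(b["gen"], 16)
    if start > end:
        raise ValueError("range is reversed")
    prefix = a["prefix"].upper()
    return [f"{prefix}{gen:08X}.log" for gen in range(start, end + 1)]


# Range taken from the event 519 message.
names = missing_log_files("E020000435B.log", "E0200004401.log")
print(len(names), names[0], names[-1])
```

The resulting list can be compared against the contents of a file-level backup to confirm that every generation in the gap is available before restoring.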
112
Q&A
ExMon
Tool for gathering statistics about how the Outlook
clients are making use of the Exchange Server
resources such as CPU and network bandwidth
It was designed and used to help understand the
way MAPI applications such as Outlook were
affecting the performance on Exchange Server 2003
The latest version of the tool now also works with
Exchange 2007 and 2010:
http://www.microsoft.com/downloads/details.aspx?familyid=9A49C22E-E0C7-4B7C-ACEF-729D48AF7BC9&displaylang=en
114
ExMon
What ExMon can do:
For each user, show the consumption on the server for: CPU,
disk, network traffic
Measure server-side CPU latency
Give the following information about the clients: user
name, version, IP address
Process data sent to the Exchange server by MAPI clients and
expose the user experience by showing the actual RPC
(network & server) latency
Give some data about ActiveSync traffic
What ExMon cannot do:
Measure SMTP, OWA, DAV, POP/IMAP traffic
Give information about spam
115
ExMon
Start the installation of the Exchange
Server User Monitor, then finish the installation
In the ExMon installation folder,
double-click ExMon.reg, which will add the
following registry keys:
116
ExMon
Once installed, ExMon can be used either live, to see the real-time
consumption on the server, or in tracing mode, to get statistics
Live mode is mostly used with small update
intervals, the minimum being 1 minute
Use the ExMon GUI for this type of monitoring
The maximum snapshot interval is 30 minutes, which is
useful for captures over longer periods of time
Traces > 30 minutes cannot be created from the GUI
Use System Monitor or tracelog.exe on
Windows Server 2003
Use logman.exe on Windows Server 2008 & R2
117
ExMon
Error when running ExMon: Unknown StartTrace error (183)
Code 183 translates to: ERROR_ALREADY_EXISTS
This happens when ExMon crashes or gets killed when
collecting data, but the Exchange trace continues to run (there
is a 512 MB limit, when reached it stops)
After a crash, ExMon wants to start a new trace session, but
cannot do so because the Exchange Event Trace provider
only supports one instance at a time
Solution:
logman query -ets to see if Exchange Event Trace runs
logman stop "Exchange Event Trace" -ets to stop it
When using tracelog.exe on Windows Server 2003:
tracelog -stop "Exchange Event Trace"
118
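The session name contains spaces, so it has to reach logman as a single argument. What follows is a minimal Python sketch of a cleanup helper. It assumes it is run on the Exchange server with administrative rights and that the output of logman query -ets lists the session by name. The function names are hypothetical, and only the two logman invocations from the slide are used.

```python
import subprocess

SESSION = "Exchange Event Trace"  # session name as given on the slide


def query_command():
    """Argument list for listing the running event trace sessions."""
    return ["logman", "query", "-ets"]


def stop_command(session=SESSION):
    """Argument list for stopping one session; the name stays one argument."""
    return ["logman", "stop", session, "-ets"]


def stop_stale_trace():
    """Stop the trace only if it is listed as running (Windows only).

    Assumption: the session name appears verbatim in the query output.
    """
    listing = subprocess.run(query_command(), capture_output=True, text=True)
    if SESSION in listing.stdout:
        subprocess.run(stop_command(), check=True)
        return True
    return False


# Show the stop command as it would be typed in a console.
print(subprocess.list2cmdline(stop_command()))
```

Passing the arguments as a list avoids the quoting problem entirely. The printed line only shows how the same command looks when typed by hand.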
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered
trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this
presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
