
November 2nd 2012

Chalk & Talk Online

Exchange Troubleshooting and Reporting Basics (Tools and How-To)
Mihai Bobu
Support Engineer
Microsoft France

Content
Objective:
Know which tools to use for troubleshooting and reporting, and how to use them with Exchange
What you will find:

General presentation

When to use what

Tools and techniques for different kinds of issues

Practical examples

Q&A

To begin with

General Presentation
Scope: Exchange 2003, 2007 & 2010
Structure: CAS, HUB, MBX
Specific troubleshooting tools and techniques for each role, as well as common ones that can be used across roles
Interactive!

Exchange Services
First thing to do when something goes wrong:
Check that all Exchange services configured with the Automatic startup type are running
A very simple task that can occasionally save a lot of troubleshooting time
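The same check can be scripted. Below is a minimal Python sketch that assumes the service list has already been exported (for example from Get-Service or the Services console) as (name, startup type, state) tuples. The input format and the "MSExchange" name filter are assumptions; some Exchange-related services do not follow that naming.

```python
from typing import Iterable, List, Tuple

# (service name, startup type, current state): input format is an assumption
ServiceRow = Tuple[str, str, str]


def stopped_automatic_services(rows: Iterable[ServiceRow]) -> List[str]:
    """Return Exchange services set to Automatic that are not running."""
    return [
        name
        for name, startup, state in rows
        if name.startswith("MSExchange")  # simplistic name filter (assumption)
        and startup.lower() == "automatic"
        and state.lower() != "running"
    ]


sample = [
    ("MSExchangeRPC", "Automatic", "Stopped"),
    ("MSExchangeIS", "Automatic", "Running"),
    ("MSExchangeMonitoring", "Manual", "Stopped"),
]
print(stopped_automatic_services(sample))  # ['MSExchangeRPC']
```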

Exchange Services
Exchange 2003:

Exchange Services
Exchange 2007:

Exchange Services
Exchange 2010:

Exchange Services
If an Automatic Exchange service is stopped, try to restart it
Restart OK? Check whether the issue is solved
Restart fails? Note the error message and check the Application and System event logs
There are many possible reasons why a restart fails:
Local server configuration

Exchange Services
The tool I find most useful when Exchange services refuse to start:
Process Monitor (ProcMon)
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

Exchange Services
Example 1
ISSUE:
After patching, Outlook clients could no longer connect to one specific CAS server

TROUBLESHOOTING:
Ran Test-ServiceHealth and noticed that the RPC Client Access (RCA) service was not running

Exchange Services
Example 1
TROUBLESHOOTING:
Tried to start the RCA service, but it failed with an error (screenshot not reproduced)
Nothing more in the event logs

Exchange Services
Example 1
TROUBLESHOOTING:
Ran ProcMon and, as a first step, filtered on Process and Thread Activity only for the RCA executable:

Exchange Services
Example 1
TROUBLESHOOTING:
Added File Activity to the view and started looking for results other than SUCCESS
Went into the \Bin folder to look for the DLL
The file was no longer there

Exchange Services
Example 1
SOLUTION:
Copied the DLL from another CAS server (same SP and RU level)
The service could be started afterwards
It turned out that there had been a problem during patching while that specific DLL was being upgraded

Exchange Services
Example 2
ISSUE:
An Exchange service would no longer start.
For this example, I used the RCA service to illustrate the issue, although a different Exchange service was involved at the time.

Exchange Services
Example 2
TROUBLESHOOTING:
Again, nothing useful in the event logs
Filtering step by step in ProcMon, we found the issue:

Exchange Services
Example 2
The security team had applied some security hardening:

SOLUTION:
Restored the appropriate permissions for Network Service and started the service

CAS
Troubleshooting

CAS
In Exchange 2010, the CAS server is the entry point for all client connections, with one exception: PF referral
In Exchange 2007, MAPI clients connected directly to the mailbox server
Another difference between Exchange 2007 and 2010: directory referral is now handled by the 2010 CAS instead of

CAS
CAS servers have become more capable and handle various tasks (OAB distribution, mailbox moves, etc.)
Still, the first thing that comes to mind when we think of CAS is client connectivity
It plays a decisive role in the end-user experience

CAS
When discussing client connectivity, there are 2 main types of connections:
MAPI connections, such as Outlook
Web connections, such as OWA, ActiveSync and all the other Web services, which are handled by IIS (virtual directories)
Different kinds of issues call for different logs and tests

CAS logging: RCA


The default settings for this logging are as follows; they are stored in the file Microsoft.Exchange.RpcClientAccess.Service.exe.config
Protocol logging is enabled by default
Default folder setting: "%ExchangeInstallDir%\Logging\RPC Client Access\"
Max size in KB that a single log file can grow to before a new one is generated: "10240" (10 MB)
Max size in KB that the entire directory of logs can grow to before the oldest log is deleted: "1048576" (1 GB)
Length of time in hours a log will be kept before being deleted: "720" (30 days)
Log type tags to be logged: ConnectDisconnect, Logon,

CAS logging: RCA


You can change the default values as needed
The RPC Client Access service must be restarted to pick up the change
The 2 most useful settings to change are the Default Folder and the Log Tags
Default Folder Setting
You may want to change where the log files are stored, for example to move them to a different drive
Log Tags to be logged
Here you can specify more details to log; the other options are:
Rops (remote operations)
The Rops option logs a top-level entry for each operation being performed (Logon, OpenFolder, Release, GetContentsTable, CreateMessage)
OperationSpecific
This option shows more detail for each ROP (for example, SetProps on a CreateMessage operation)
Throttling

CAS logging: RCA


This logging can be very helpful to verify whether a user has ever connected at all. The administrator might receive a complaint from a user who says they are not able to connect at all. Since protocol logging is turned on by default, the administrator can check when the user last connected successfully to the server, or whether the user has ever connected before
Even though this logging is designed simply to keep track of who is connecting and disconnecting, it can provide valuable troubleshooting information
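As a sketch of the "when did this user last connect?" check, the Python below scans RCA log lines for one LegacyExchangeDN and returns the newest timestamp. It assumes the layout of the RCA example traces in this section: comma-separated fields, an ISO-8601 UTC timestamp in the first field and the client LegacyExchangeDN in the fourth, with header lines starting with '#'. Verify the column positions against the #Fields header of your own logs; the sample lines are shortened and made up.

```python
import csv
from typing import Iterable, Optional


def last_seen(lines: Iterable[str], legacy_dn: str) -> Optional[str]:
    """Return the newest timestamp logged for legacy_dn, or None if absent.

    Assumed layout: field 0 = ISO-8601 UTC timestamp, field 3 = client
    LegacyExchangeDN; lines starting with '#' are headers.
    """
    latest: Optional[str] = None
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for fields in csv.reader(data):
        if len(fields) > 3 and fields[3].lower() == legacy_dn.lower():
            # Timestamps in the same ISO-8601 format sort as plain strings
            if latest is None or fields[0] > latest:
                latest = fields[0]
    return latest


log = [
    "#Software: Microsoft Exchange Server",
    "2012-04-01T16:07:20.069Z,69,94,/o=Contoso/cn=UserA,,,192.168.0.20",
    "2012-04-01T19:43:55.197Z,2,1,/o=Contoso/cn=UserA,,OUTLOOK.EXE",
    "2012-04-01T17:00:00.000Z,5,1,/o=Contoso/cn=UserB,,OUTLOOK.EXE",
]
print(last_seen(log, "/o=Contoso/cn=UserA"))  # 2012-04-01T19:43:55.197Z
print(last_seen(log, "/o=Contoso/cn=UserC"))  # None
```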

CAS logging: RCA Example 1
Here is a trace where UserA creates a new message and sends it to UserB:
2012-04-01T16:07:20.069Z,69,94,/o=Contoso/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=UserA,,,192.168.0.20,::1,ncacn_ip_tcp,,,0,,>CreateMessage;>SetProps;>Release;>Release;>RemoveAllRecipients;>FlushRecipients;>SetProps;>SubmitMessage;>GetPropertiesSpecific;<CreateMessage;<SetProps;<Release;<Release;<RemoveAllRecipients;<FlushRecipients;<SetProps;<SubmitMessage;<GetPropertiesSpecific
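The operations field in the trace above can be split mechanically. This Python sketch assumes, from the trace, that '>' marks an operation sent by the client and '<' the matching response, with entries separated by ';'. A request without a matching response may be worth a closer look.

```python
from typing import List, Tuple


def split_rops(operations: str) -> Tuple[List[str], List[str]]:
    """Split an RCA operations field into client requests and responses.

    Assumption (from the trace): '>' = request, '<' = response, ';' separator.
    """
    requests: List[str] = []
    responses: List[str] = []
    for entry in filter(None, operations.split(";")):
        if entry.startswith(">"):
            requests.append(entry[1:])
        elif entry.startswith("<"):
            responses.append(entry[1:])
    return requests, responses


ops = ">CreateMessage;>SetProps;>SubmitMessage;<CreateMessage;<SetProps;<SubmitMessage"
req, resp = split_rops(ops)
print(req)                   # ['CreateMessage', 'SetProps', 'SubmitMessage']
print(set(req) - set(resp))  # set(): every request has a response
```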

CAS logging: RCA Example 2
Here is a trace with throttling information for UserA:
2012-04-01T19:43:55.197Z,2,1,/o=Contoso/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=UserA,,OUTLOOK.EXE,14.0.4760.1000,Cached,,,ncacn_ip_tcp,,,,00:00:00.1399752,"BS=Conn:2,HangingConn:0,AD:3000/3000/0%,CAS:123000/122722/1%,AB:3000/3000/0%,RPC:120000/120000/0%,FC:1000/0,Policy:DefaultThrottlingPolicy_766c8f13-bb90-4fee-912de1aec48df1ab,Norm",
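The throttling field (the quoted "BS=..." value) can be turned into a dictionary for filtering or for comparison across users. A minimal Python sketch, assuming the field has already been extracted from the CSV line (it contains commas, so use a CSV reader rather than a plain split). The meaning of the three slash-separated numbers is not described here, so values are kept as raw strings, and the 'State' key for the trailing 'Norm' token is a hypothetical name.

```python
from typing import Dict


def parse_budget(field: str) -> Dict[str, str]:
    """Turn 'BS=Conn:2,AD:3000/3000/0%,...,Norm' into a dict.

    Values are kept as raw strings; a token without ':' (the trailing
    'Norm') is stored under the hypothetical key 'State'.
    """
    body = field[len("BS="):] if field.startswith("BS=") else field
    result: Dict[str, str] = {}
    for item in body.split(","):
        key, sep, value = item.partition(":")
        if sep:
            result[key] = value
        elif item:
            result["State"] = item
    return result


budget = parse_budget(
    "BS=Conn:2,HangingConn:0,AD:3000/3000/0%,CAS:123000/122722/1%,"
    "FC:1000/0,Policy:DefaultThrottlingPolicy_x,Norm"
)
print(budget["CAS"], budget["State"])  # 123000/122722/1% Norm
```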

CAS logging: IIS & HTTPERR
IIS logs contain essential, detailed information about Web connections: OWA, ActiveSync, OAB, Autodiscover, EWS
When there is an issue with a Web service, the IIS logs will normally contain an HTTP error code that can help with further troubleshooting
HTTPERR logs are one level below on

CAS: NTLM vs Kerberos


Scenario: Both Outlook Anywhere and Outlook TCP clients are unable to connect. They get prompted multiple times and still cannot connect. The connection status window may show Connecting.
Possible answer: By default, Netlogon only allows 2 concurrent API calls for authentication requests. If your clients are connecting with NTLM, a large number of users could cause these authentication requests to time out if the DCs are overloaded.
What to do:
Prioritize Kerberos authentication over NTLM
Change the MaxConcurrentAPI parameter on DCs and CAS
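Why a limit of 2 concurrent calls hurts can be shown with a back-of-envelope calculation based on Little's law (throughput is at most concurrency divided by latency). The Python sketch below assumes each NTLM pass-through request holds one Netlogon slot for the full round-trip to the DC; the latency values are illustrative, not measurements.

```python
def max_ntlm_auths_per_second(max_concurrent_api: int, dc_latency_s: float) -> float:
    """Ceiling on NTLM pass-through authentications per second.

    Little's law: throughput <= concurrency / latency. Assumes each request
    holds one Netlogon slot for dc_latency_s seconds.
    """
    if max_concurrent_api <= 0 or dc_latency_s <= 0:
        raise ValueError("both values must be positive")
    return max_concurrent_api / dc_latency_s


print(max_ntlm_auths_per_second(2, 0.05))  # healthy DC, 50 ms: 40.0
print(max_ntlm_auths_per_second(2, 0.5))   # overloaded DC, 500 ms: 4.0
print(max_ntlm_auths_per_second(10, 0.5))  # raised MaxConcurrentApi: 20.0
```

If the DC slows from 50 ms to 500 ms per request, the ceiling in this model drops from 40 to 4 authentications per second, which is one way a busy DC can turn into repeated password prompts.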

CAS Example 1
ISSUE:
Some users are not able to access any of the Exchange Web services: Autodiscover, OWA, Availability, OAB
Not all users are affected

CAS Example 1
TROUBLESHOOTING:
Reproduced the issue with an affected user: tried to access OWA, but got error 400 Bad Request in the web browser
Looked into the IIS logs of the CAS servers, but there was no sign of the request

CAS Example 1
TROUBLESHOOTING:
Needed to see whether the request made by the user actually reaches the CAS server
So, went down one level and checked the HTTPERR logs
Error 400 is visible there, with a clear indicator of the issue:
2012-10-02 18:36:25 ##.##.180.216 2866 ##.##.56.21 443 HTTP/1.1 POST /autodiscover/autodiscover.xml 400 RequestLength
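HTTPERR files use the W3C format, with column names declared in a #Fields: header. The Python sketch below parses any such file (IIS logs too) and counts entries per status code and reason, which makes entries like RequestLength stand out. The sample lines are made up, and the column list in the sample is the usual HTTPERR header as we understand it; the parser itself always follows whatever #Fields: line the file contains.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_w3c(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a W3C-format log (HTTPERR or IIS): column names come from the
    latest '#Fields:' directive, values are space-separated."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            records.append(dict(zip(fields, line.split())))
    return records


def httperr_reasons(lines: Iterable[str]) -> Counter:
    """Count HTTPERR entries per (sc-status, s-reason) pair."""
    return Counter((r.get("sc-status"), r.get("s-reason")) for r in parse_w3c(lines))


sample = [
    "#Software: Microsoft HTTP API 2.0",
    "#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri "
    "sc-status s-siteid s-reason s-queuename",
    "2012-10-02 18:36:25 192.0.2.216 2866 192.0.2.21 443 HTTP/1.1 POST "
    "/autodiscover/autodiscover.xml 400 - RequestLength -",
    "2012-10-02 18:37:02 192.0.2.217 2870 192.0.2.21 443 HTTP/1.1 GET /owa/ 400 - RequestLength -",
    "2012-10-02 18:40:11 192.0.2.30 3001 192.0.2.21 443 - - - - - Timer_ConnectionIdle -",
]
print(httperr_reasons(sample).most_common(1))  # [(('400', 'RequestLength'), 2)]
```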

CAS Example 1
EXPLANATION:
The user making the request has a large Kerberos ticket, as he is a member of more than 200 security groups
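To judge whether a user is at risk, the token size can be estimated with the formula published in Microsoft KB 327825 (1200 + 40d + 8s); Base64 encoding in the Negotiate header then adds about a third. The Python sketch below applies both steps. The group counts are illustrative, and the 16384-byte figure used for comparison is the HTTP.sys default as we understand it, so check the limits described in the KBs listed under RESOLUTION before changing anything.

```python
import math


def estimated_token_bytes(d: int, s: int) -> int:
    """Kerberos token size estimate per Microsoft KB 327825: 1200 + 40d + 8s.

    d: domain-local groups + universal groups from other domains + SIDHistory
    s: global groups + universal groups from the user's own domain
    """
    return 1200 + 40 * d + 8 * s


def negotiate_header_bytes(token_bytes: int) -> int:
    """Approximate length of 'Authorization: Negotiate <base64 token>'."""
    base64_len = 4 * math.ceil(token_bytes / 3)
    return len("Authorization: Negotiate ") + base64_len


ASSUMED_HTTP_SYS_LIMIT = 16384  # assumed default, check your own settings

token = estimated_token_bytes(d=350, s=20)  # illustrative group counts
header = negotiate_header_bytes(token)
print(token, header, header > ASSUMED_HTTP_SYS_LIMIT)  # 15360 20505 True
```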

RESOLUTION:
Applied the following KBs:
http://support.microsoft.com/kb/2491354
http://support.microsoft.com/kb/2020943

Please note that the regkey



CAS Example 2
ISSUE:
Exchange Server 2010 SP1
TMG reverse proxy
OOF does not work in Outlook and OWA


CAS Example 2
TROUBLESHOOTING:
Reproduced the issue from the user's workstation
Looked into the IIS log of the CAS:
GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab 80 ....... 403 4
403.4 translates to SSL required
Asked for a test from OWA performed directly on the CAS, bypassing TMG, and got success:
GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab 443 .......
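Status and substatus sit in separate IIS columns (403 and 4 above). This Python sketch joins them into the familiar 403.4 form and counts results per port and URI stem, which makes the port 80 versus 443 difference visible at a glance. It assumes a W3C log with a #Fields: header; the sample lines are invented to mirror the excerpts above.

```python
from collections import Counter
from typing import Iterable, List


def status_counts(lines: Iterable[str]) -> Counter:
    """Count 'status.substatus' per (s-port, cs-uri-stem) in an IIS W3C log,
    locating columns through the '#Fields:' directive."""
    fields: List[str] = []
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        rec = dict(zip(fields, line.split()))
        code = f"{rec.get('sc-status')}.{rec.get('sc-substatus')}"
        counts[(rec.get("s-port"), rec.get("cs-uri-stem"), code)] += 1
    return counts


iis = [
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken",
    "2012-06-01 09:00:00 192.0.2.10 GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab "
    "80 - 192.0.2.50 Mozilla/4.0 403 4 5 15",
    "2012-06-01 09:05:00 192.0.2.10 GET /ecp/ rfr=owa&p=Organize/AutomaticReplies.slab "
    "443 - 192.0.2.50 Mozilla/4.0 200 0 0 120",
]
for key, n in sorted(status_counts(iis).items()):
    print(key, n)
# ('443', '/ecp/', '200.0') 1
# ('80', '/ecp/', '403.4') 1
```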

CAS Example 2
EXPLANATION:
SSL is required for both OWA and ECP
The AutomaticReplies request is forwarded to port 80 = HTTP
The TMG OWA rule is configured to use HTTP, not HTTPS, for the traffic between TMG and Exchange

SOLUTION:
Disabled the SSL requirement for OWA and ECP and ran IISRESET; OOF works now

CAS Example 3
ISSUE:
Exchange Server 2010
Hardware Load-balanced CAS array
Outlook 2010
OAB download is not possible


CAS Example 3
TROUBLESHOOTING:
Made an attempt to download the OAB
Checked the IIS log on the CAS and saw the following:
404.0 translates to Not Found
Looked on the CAS server and found another OAB file for that date: 24/05/2012

CAS Example 3
TROUBLESHOOTING:
Forced Outlook to connect directly to the CAS (CAS IP in the hosts file), bypassing the load balancer
OAB download was successful

EXPLANATION:
Issue with caching on the load balancer

CAS Example 4
ISSUE:
After the mailbox was moved from Exchange 2007 to Exchange 2010, the user can no longer connect
He gets the following error:
Cannot open your default e-mail folders, Microsoft Exchange is not available. Either there are network problems or the Exchange server is down for maintenance.

CAS Example 4
TROUBLESHOOTING:
Looked into the RCA logs and found the following:
2011-05-10T18:09:17.507Z,41254,1'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,0x6BA (rpc::Exception),00:00:00.0156002,SessionDropped,RpcEndPoint: [ServerUnavailableException] Connection must be re-established -> [SessionDeadException] The primary owner logon has failed. Dropping a connection.
2011-05-10T18:09:17.881Z,41255,0'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,NN.NN.110.192,fe80::88c4:5649:7cca:ad19%12,ncacn_ip_tcp,,Connect,0,00:00:00,"SID=XXXXXXXXXXXXXXXXX, Flags=None",
2011-05-10T18:09:17.881Z,41255,1'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT',,OUTLOOK.EXE,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,-2147024809 (rop::InvalidParam),00:00:00,,RopHandler: Logon: [RopExecutionException] Invalid LegacyDN syntax.. Error code = InvalidParam

CAS Example 4
EXPLANATION:
Incorrect Legacy Exchange DN due to a special character

SOLUTION:
Changed the Legacy Exchange DN from:

'/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT'
to '/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
Then we added the X500 address as '/o=XYZ/ou=ABCXXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT'
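A quick check can catch this kind of value before or after a move. The Python sketch below is a heuristic, not the official LegacyExchangeDN grammar: it splits the DN on '/' and flags any component whose value contains an extra '=' or leading/trailing spaces, which is what went wrong here. The function name is hypothetical.

```python
from typing import List


def suspicious_rdns(legacy_dn: str) -> List[str]:
    """Return '/'-separated components whose value looks malformed.

    Heuristic only (not the official LegacyExchangeDN syntax): flags a
    component with no '=', a second '=' in its value, or leading/trailing
    spaces around the value.
    """
    flagged = []
    for rdn in filter(None, legacy_dn.split("/")):
        _attr, sep, value = rdn.partition("=")
        if not sep or "=" in value or value != value.strip():
            flagged.append(rdn)
    return flagged


print(suspicious_rdns("/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc = TTTT"))
# ['cn=abc = TTTT']
print(suspicious_rdns("/o=XYZ/ou=ABC-XXX/cn=ZZZZZZ/cn=VVVVV/cn=abc-TTTT"))
# []
```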

CAS Example 5
ISSUE:
Exchange 2007/2010 mixed environment
TMG publishing rules for ActiveSync
Users with a certain type of mobile phone complain that, from time to time, their Inbox does a full reload

CAS Example 5
TROUBLESHOOTING:
Raised ActiveSync logging to the maximum level on the Exchange 2010 CAS servers
Asked users to call the Help Desk as soon as they hit the issue and to give the time as accurately as possible
Did that for one day and then collected the logs (event logs, IIS and HTTPERR logs)

CAS Example 5
ANALYSIS:
Whenever users reported the issue, there was a Warning 1008 in the Application event log
This event alone does not tell us what happened; it is known to give false positives
ID: 1008
Level: Warning
Source: MSExchange ActiveSync
Message: An exception occurred and was handled by Exchange ActiveSync. This may have been caused by an outdated or corrupted Exchange ActiveSync device partnership. This can occur if a user tries to modify the same item from multiple computers. If this is the case, Exchange ActiveSync will re-create the partnership with the device. Items will be updated at the next synchronization.

CAS Example 5
ANALYSIS:
When we match the 1008 events with the IIS logs, we find the same pattern at the times reported by users:
2012-09-24 13:59:37 ##.##.215.71 POST /Microsoft-Server-ActiveSync/default.eas User=****&DeviceId=Appl81035YZJA4T&DeviceType=iPhone&Cmd=Sync&Log=V140_LdapC1_LdapL16_S110_Error:HttpLayerFailure_Mbx:MBX01.contoso.com_Throttle0_Budget:(A)Conn%3a0%2cHangingConn%3a0%2cAD%3a%24null%2f%24null%2f1%25%2cCAS%3a%24null%2f%24null%2f1%25%2cAB%3a%24null%2f%24null%2f0%25%2cRPC%3a%24null%2f%24null%2f1%25%2cFC%3a1000%2f0%2cPolicy%3aDefaultThrottlingPolicy%5F1b07898a-e904-4fe48a9b-77ec7d9f4e18%2cNorm_ 443 CORP\**** 205.223.229.89 AppleiPhone3C1/902.206 200 0 64 128748
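The interesting part of such a line is buried in the Log parameter of the query string. This Python sketch, assuming the cs-uri-query value has already been isolated, extracts the device type, the command and the Error: token using the standard library's query-string parser. The 'Error:<word>' pattern is inferred from the line above, not from a documented format.

```python
import re
from typing import Dict
from urllib.parse import parse_qs

# Inferred from the log line above: the Log value contains '..._Error:<Word>_...'
ERROR_RE = re.compile(r"Error:([A-Za-z0-9]+)")


def eas_summary(uri_query: str) -> Dict[str, str]:
    """Extract DeviceType, Cmd and the Error token from an ActiveSync
    cs-uri-query value ('User=...&DeviceId=...&Log=...')."""
    qs = parse_qs(uri_query)
    log = qs.get("Log", [""])[0]
    match = ERROR_RE.search(log)
    return {
        "DeviceType": qs.get("DeviceType", [""])[0],
        "Cmd": qs.get("Cmd", [""])[0],
        "Error": match.group(1) if match else "",
    }


query = ("User=****&DeviceId=Appl81035YZJA4T&DeviceType=iPhone&Cmd=Sync"
         "&Log=V140_LdapC1_LdapL16_S110_Error:HttpLayerFailure_Mbx:MBX01.contoso.com_")
print(eas_summary(query))
# {'DeviceType': 'iPhone', 'Cmd': 'Sync', 'Error': 'HttpLayerFailure'}
```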

CAS Example 5
ANALYSIS:
So, the error comes from the HTTP layer
Looked at the HTTPERR logs and saw the following:
Connection_Dropped MSExchangeSyncAppPool
When an iPhone/iPad receives two consecutive 500 errors, it resets the connection and proceeds to re-download the user's entire mailbox
This could come either from the CAS servers,

CAS Example 5
SOLUTION:
Involved the TMG team, since we could not find anything wrong with the CAS servers
They noticed that TMG was at the RTM level
This was a known issue, fixed in SP2
Installing SP2 for TMG solved the issue

CAS Example 6
ISSUE:
Exchange Server 2003 back-end (BE) cluster hosting approx. 3000 mailboxes
The W3WP.EXE process runs at almost 100% CPU. As a consequence, ActiveSync synchronization is slow or does not work at all

CAS Example 6
TROUBLESHOOTING:
Asked for the IIS logs from that day
Used the following parser to process the logs:
http://blogs.technet.com/b/exchange/archive/2012/01/31/a-script-to-troubleshoot-issues-with-exchange-activesync.aspx
Listed only hits greater than 1000 for the day:
.\ActiveSyncReport.ps1 -IISLog "C:\Logs" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -Date 05-30-2012
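The report above is essentially a per-user, per-device hit count with a threshold. A minimal Python sketch of the same idea, assuming the (cs-username, DeviceId) pairs have already been pulled from the IIS log lines; it is not a replacement for the script, which produces much richer reports, and the user names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

UserDevice = Tuple[str, str]  # (cs-username, DeviceId)


def top_talkers(requests: Iterable[UserDevice], threshold: int) -> List[Tuple[UserDevice, int]]:
    """Return (user, device) pairs with more than `threshold` requests,
    busiest first. Each input item stands for one IIS log line."""
    counts = Counter(requests)
    return [(pair, n) for pair, n in counts.most_common() if n > threshold]


reqs = ([("alice", "dev1")] * 1500 + [("bob", "dev2")] * 200
        + [("carol", "dev3")] * 1001)
print(top_talkers(reqs, 1000))
# [(('alice', 'dev1'), 1500), (('carol', 'dev3'), 1001)]
```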

CAS Example 6
TROUBLESHOOTING:

SOLUTION: Moved the top talkers to a more powerful Exchange 2010 platform

CAS Troubleshooting Tools


http://blogs.technet.com/b/exchange/archive/2012/01/31/a-script-to-troubleshoot-issues-with-exchange-activesync.aspx
Very useful for parsing IIS logs on Exchange 2003/2007/2010
Can also be used for reporting, for example:
.\ActiveSyncReport.ps1 -IISLog "C:\inetpub\logs\LogFiles\W3SVC1" -LogparserExec "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe" -ActiveSyncOutputFolder C:\EASReports -MinimumHits 1000 -SendEmailReport -SMTPRecipient user@contoso.com

CAS Troubleshooting Tools


A more advanced tool is Log Parser Studio
http://blogs.technet.com/b/exchange/archive/2012/03/07/introducing-log-parser-studio.aspx
In addition to IIS logs, it can browse RCA logs, HTTPERR logs, event logs, address book logs and even the registry
There is a library of predefined queries, and it is possible to enrich it with your own configured

CAS Troubleshooting Tools

Transport
Troubleshooting

Transport - Exchange 2003 logs

Transport - Exchange 2007/2010 logs

Transport - Exchange 2007/2010 logs
Connectivity log
Logs SMTP connection activity of the outbound message delivery queues to the destination MBX server, smart host, or domain
Disabled by default
Protocol log
Disabled by default on all SMTP Send and Receive connectors
Enabled or disabled on a per-connector basis
Protocol logging level must be set to Verbose in order to record
Circular logging is used to limit file size
There is a special, invisible, intra-organization Send connector (used for HUB-HUB, HUB-EDGE and HUB-Exchange 2003 relay) for which protocol logging can be enabled only via PowerShell:
Set-TransportServer <server> -IntraOrgConnectorProtocolLoggingLevel Verbose

Transport - Exchange logs


Message tracking logs
Record detailed activity as the message moves within Exchange
Enabled by default
On Exchange 2003 there is less information, and the Event ID is numeric rather than text
On Exchange 2007/2010, they exist on HUB, EDGE (MSGTRKyyyymmdd-n.log) and MBX (MSGTRKMyyyymmdd-n.log) servers

Transport - Exchange logs


Message tracking logs
In Exchange 2003, searching is only possible in graphical mode
In the 2007/2010 EMC, search only works on the HUB/MBX server on which it is launched
PowerShell command to search the logs of all HUB and MBX servers:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq $true -or $_.IsMailboxServer -eq $true} | Get-MessageTrackingLog -MessageId <id>
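When PowerShell is not at hand (for example, when the logs were copied off the server), the same search can be run on the files themselves. This Python sketch assumes the Exchange 2007/2010 message tracking format: comma-separated rows with column names in a #Fields: header. The sample uses a reduced, made-up set of columns and values for brevity.

```python
import csv
from typing import Dict, Iterable, List


def read_tracking_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse message tracking log lines: column names come from the
    '#Fields:' header, data rows are CSV (quoted values allowed)."""
    fields: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].strip().split(",")
        elif line.strip() and not line.startswith("#") and fields:
            values = next(csv.reader([line]))
            rows.append(dict(zip(fields, values)))
    return rows


def events_for_message(rows: List[Dict[str, str]], message_id: str) -> List[str]:
    """Return 'date-time event-id' strings for one Message-ID, in log order."""
    return [f"{r.get('date-time', '')} {r.get('event-id', '')}"
            for r in rows if r.get("message-id") == message_id]


log = [
    "#Software: Microsoft Exchange Server",
    "#Fields: date-time,event-id,message-id,sender-address,recipient-address",
    "2012-05-14T10:00:01.000Z,RECEIVE,<abc@contoso.com>,a@contoso.com,b@contoso.com",
    "2012-05-14T10:00:02.000Z,DELIVER,<abc@contoso.com>,a@contoso.com,b@contoso.com",
    "2012-05-14T10:00:03.000Z,RECEIVE,<xyz@contoso.com>,c@contoso.com,b@contoso.com",
]
print(events_for_message(read_tracking_log(log), "<abc@contoso.com>"))
# ['2012-05-14T10:00:01.000Z RECEIVE', '2012-05-14T10:00:02.000Z DELIVER']
```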

Transport - Exchange logs


Pipeline Tracing (Exchange 2007/2010)
Disabled by default
Verbose logging not recommended over long periods of time
Useful for tracking down issues with transport agents and rules as the message moves through the transport pipeline
Takes message snapshots to capture message changes as a result of transport agents and rules being applied
By default, the pipeline tracing log directory is <Exchange>\Server\TransportRoles\Logs\PipelineTracing
Can be activated and managed only via EMS:
Set-TransportServer <server> -PipelineTracingEnabled $True
Designed to log only messages sent from a specific SMTP address (a mailbox inside or outside your Exchange organization):
Set-TransportServer <server> -PipelineTracingSenderAddress x@y.com
Setting the PipelineTracingSenderAddress parameter to <> captures all server-generated email messages received by the HUB or EDGE server being configured: huge log files!!!

Transport - Tools
Logs (activated if needed) and their analysis
NetMon / Wireshark: message routing, latencies, general communication
TELNET on port 25 to test SMTP communication between email servers
Queue Viewer (Get-Queue PowerShell cmdlets) in case of stuck messages

Transport - Tools
Process Tracking Log (PTL)
Works only with Exchange 2007 and 2010
Allows parsing, monitoring and analyzing Message Tracking logs
Powerful tool for troubleshooting as well as for generating statistics on message traffic

Transport - Tools
Process Tracking Log (PTL) is useful for:
Message looping
Message failures, such as Delivery Status Notifications (DSNs)
List of top mail senders
List of top mail recipients
Top large message size generators
Queues backing up
Performance issues due to message load
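As an illustration of the "top mail senders" statistic, here is a minimal Python sketch working on (event-id, sender-address) pairs taken from message tracking logs. Counting only RECEIVE events is an assumption made to avoid counting the same message once per event type; it is not how PTL itself works, and a message relayed through several HUBs is still counted once per HUB.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_senders(events: Iterable[Tuple[str, str]], n: int = 10,
                event_id: str = "RECEIVE") -> List[Tuple[str, int]]:
    """Count messages per sender address (case-insensitive), looking at a
    single event type; the RECEIVE default is an assumption, not PTL's logic."""
    counts = Counter(sender.lower() for eid, sender in events if eid == event_id)
    return counts.most_common(n)


events = [
    ("RECEIVE", "a@contoso.com"),
    ("DELIVER", "a@contoso.com"),
    ("RECEIVE", "A@contoso.com"),
    ("RECEIVE", "b@contoso.com"),
]
print(top_senders(events, n=1))  # [('a@contoso.com', 2)]
```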

Transport - Tools
Process Tracking Log (PTL)
To parse one file:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking\MSGTRKxx.LOG" 1 all

To parse all files in each subdirectory that were logged yesterday:
cscript ProcessTrackingLog.vbs "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking" 0 all yesterday

Read more about it at:
http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.a

Transport PTL examples


MTDsnFailureResults.csv

MTRecipientStatistics.csv


Transport Example 1
ISSUE:
After migrating around 1700 users to Exchange 2010, there are constantly 400-500 messages in the E2003-E2010 routing group queue of the Exchange 2003 bridgehead.

Transport Example 1
TROUBLESHOOTING:
Exchange 2003 bridgehead:
Increased diagnostic logging for Transport\SMTP to maximum
Activated SMTP logging
Exchange 2010 HUB:
Connectivity logging was already running
Activated the Protocol log for the Default Receive connector

Transport Example 1
ANALYSIS:

Get-ReceiveConnector -Server <HUB> | fl Name,MaxAck*,Tarpit*
Name : Default****E0024
MaxAcknowledgementDelay : 00:00:30
TarpitInterval : 00:00:05

Transport Example 1
EXPLANATION:
The MaxAcknowledgementDelay parameter specifies the period the transport
server delays acknowledgement when receiving messages from a host that
doesn't support shadow redundancy. When receiving messages from a host that
doesn't support shadow redundancy, a Microsoft Exchange Server 2010 transport
server delays issuing an acknowledgement until it verifies that the message has
been successfully delivered to all recipients. However, if it takes too long to verify
successful delivery, the transport server times out and issues an
acknowledgement anyway. The default value is 30 seconds.
The TarpitInterval parameter specifies the period of time to delay an SMTP
response to a remote server that Exchange determines may be abusing the
connection. Authenticated connections are never delayed in this manner. The
default value is 5 seconds. To specify a value, enter the value as a time span: hh:mm:ss, where h = hours, m = minutes, and s = seconds. The valid input range
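The effect of MaxAcknowledgementDelay on a sender that does not support shadow redundancy, such as the Exchange 2003 bridgehead here, can be bounded with simple arithmetic. The Python sketch assumes the worst case, where each message on a connection waits the full delay before the next one is sent, and ignores transfer and processing time.

```python
def max_messages_per_hour(per_message_delay_s: float, connections: int = 1) -> float:
    """Worst-case ceiling on messages/hour when every message on a connection
    waits per_message_delay_s seconds for its acknowledgement.

    Ignores transfer and processing time, so real throughput is lower.
    """
    if per_message_delay_s <= 0 or connections <= 0:
        raise ValueError("delay and connections must be positive")
    return connections * 3600 / per_message_delay_s


print(max_messages_per_hour(30))                 # 120.0
print(max_messages_per_hour(30, connections=5))  # 600.0
```

At 30 seconds per message, one connection moves at most 120 messages per hour in this model, which is consistent with a queue of several hundred messages that never drains.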

Transport Example 1
SOLUTION:
On all the HUB servers in the organization, disabled TarpitInterval and MaxAcknowledgementDelay:
Set-ReceiveConnector -Identity "Default <HUB>" -TarpitInterval 00:00:00
Set-ReceiveConnector -Identity "Default <HUB>" -MaxAcknowledgementDelay 00:00:00
The Microsoft Exchange Transport service needs to be restarted to

Transport Example 2
ISSUE:
The Exchange 2010 EDGE server cannot send emails to a certain domain, contoso.com.
It works for some time, but then it breaks and we need to restart Transport to make it work again.
Actually, it looks like a few other domains are having this issue, this is

Transport Example 2
TROUBLESHOOTING:
Activated Protocol logging for the SMTP Send connector
Took a network trace on the Edge server
Reproduced the issue
Collected the traces

Transport Example 2
ANALYSIS:
Used www.hscripts.com/tools/HDNT/dns-record.php to resolve the IP addresses:
Domain             Type   Class   Result
contoso.com.       MX     IN      5 mta.contoso.com
mta.contoso.com.   AAAA   IN      ****:****:0:aaaa::1:c
mta.contoso.com.   A      IN      **.**.68.125
mta.contoso.com.   A      IN      **.**.68.126

Transport Example 2
ANALYSIS:
SMTP log:
By looking at the log, we can only say that we fail when we contact contoso.com over IPv6 and that the request does not go all the way over IPv4.

Transport Example 2
ANALYSIS and EXPLANATION:
Network trace on the Edge server:
IPv6 address "****:****:0:aaaa::1:c" for "mta.contoso.com" appears in the trace
The only thing we see for it is SynReTransmit; contoso.com does not respond
IPv4 addresses **.**.68.125 and **.**.68.126 for "mta.contoso.com" do not appear in the trace whatsoever
The attempts made over IPv4 never leave the Edge

Transport Example 2
SOLUTION:
Prioritize IPv4 communication over IPv6 as described in the following KB article:
929852 How to disable IP version 6 (IPv6) or its specific components in Windows 7, in Windows Vista, in Windows Server 2008 R2, and in Windows Server 2008
http://support.microsoft.com/default.aspx?scid=kb;EN-US;929852
Attention: we do not disable IPv6 (although it is possible), as it is required by Windows core components
As a result, we never switch to IPv6 when talking to domains publishing IPv6
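The fix itself is an operating-system setting described in the KB. To illustrate what "prefer IPv4" means for an SMTP client, the Python sketch below reorders resolved (family, address) pairs so that IPv4 addresses are tried first, while keeping the resolver's order within each family. The function is hypothetical and the addresses come from documentation ranges.

```python
import socket
from typing import List, Tuple

Address = Tuple[int, str]  # (address family, IP address)


def prefer_ipv4(addresses: List[Address]) -> List[Address]:
    """Put IPv4 addresses before IPv6 ones; sorted() is stable, so the
    resolver's order is kept within each family."""
    return sorted(addresses, key=lambda item: 0 if item[0] == socket.AF_INET else 1)


# In practice the list could come from socket.getaddrinfo(host, 25), using
# (info[0], info[4][0]) from each result; documentation addresses below.
resolved = [
    (socket.AF_INET6, "2001:db8::1"),
    (socket.AF_INET, "192.0.2.125"),
    (socket.AF_INET, "192.0.2.126"),
]
for family, address in prefer_ipv4(resolved):
    print(family == socket.AF_INET, address)
```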

Transport Example 3
ISSUE:
Some users are receiving emails with the body in non-European characters
These emails are coming from outside the Exchange organization

Transport Example 3
TROUBLESHOOTING:
Asked for a message in PST/MSG format

ANALYSIS:
The message is multipart
Noticed that the only part of the email displayed in European characters was the disclaimer:

Transport Example 3
EXPLANATION:
This can happen if the charset of the message body is different from the charset of the disclaimer
Exchange cannot include both and needs to select one (all Exchange versions have the same behavior: http://support.microsoft.com/kb/916299)
Bad luck: Exchange chooses the charset of the disclaimer (UTF-8), displaying it correctly but garbling the message body in doing so

Transport Example 3
FURTHER TROUBLESHOOTING:
Asked for pipeline tracing to be activated for an external user
Got the result, and at that moment it became clear that the disclaimer was the cause:
The email body is displayed correctly in all message snapshots taken before the disclaimer is applied

Transport Example 3
SOLUTION:
Investigated the disclaimer further
It turned out that it was configured incorrectly, since it was being applied to messages coming from external users

Transport Example 4
ISSUE:
Permission inheritance was disabled in AD for one OU
As a result, emails sent to the users in that OU were queuing up in the Unreachable queue on the Exchange 2007 HUB server

Transport Example 4
SOLUTION:
Restored the inheritance
New emails are delivered to the users
The Unreachable queue is resubmitted automatically only if the new routing tables differ from the old ones
Resubmit the messages still blocked in the Unreachable queue to the categorizer:

Transport Example 5
ISSUE:
Root cause analysis (Exchange 2003)
Several overlapping change operations had been carried out over the weekend
On Monday, they noticed that the queue "Messages awaiting directory lookup" was growing steadily
The situation got back to normal on Tuesday

Transport Example 5
TO DOs:
Find out as accurately as possible when the issue started
Same requirement for the end of the downtime
Explain what happened

Transport Example 5
TROUBLESHOOTING:
Started by looking into the event logs
During the downtime, Expert diagnostic logging for the Transport components was not activated, so there was not enough information
Asked for the message tracking logs from before, during and after the issue

Transport Example 5
ANALYSIS:
As this is Exchange 2003, we have numeric IDs for the event types
ID 1031 is SMTP End Outbound Transfer
If we can build statistics showing how many messages are sent per minute, we can determine when the issue started and ended
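The per-minute statistics were built in Excel; the same bucketing can be sketched in Python. The sketch assumes the date and time columns of the Exchange 2003 tracking log have been combined into one "YYYY-MM-DD HH:MM:SS" string per event, with the numeric event ID alongside. Minutes with no matching events do not appear in the result.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def per_minute(events: Iterable[Tuple[str, int]], event_id: int = 1031,
               fmt: str = "%Y-%m-%d %H:%M:%S") -> Counter:
    """Count events with the given ID per minute (timestamp format assumed)."""
    counts: Counter = Counter()
    for stamp, eid in events:
        if eid == event_id:
            counts[datetime.strptime(stamp, fmt).replace(second=0)] += 1
    return counts


def slow_minutes(counts: Counter, threshold: int) -> List[datetime]:
    """Minutes (among those present) with fewer events than threshold."""
    return sorted(minute for minute, n in counts.items() if n < threshold)


events = [
    ("2012-05-14 10:31:07", 1031),
    ("2012-05-14 10:31:50", 1031),
    ("2012-05-14 10:31:55", 1024),
    ("2012-05-14 10:32:10", 1031),
]
counts = per_minute(events)
print(counts[datetime(2012, 5, 14, 10, 31)])  # 2
print(slow_minutes(counts, threshold=2))      # [datetime.datetime(2012, 5, 14, 10, 32)]
```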


Transport Example 5
ANALYSIS:
During the downtime, between 20 and 40 messages were being sent per minute
Identified the beginning and the end easily:
Before it started, we had 300 mails/minute, then 100, then 20-40
When it stopped, we started having 150, then

Transport Example 5
ANALYSIS:
By adding ID 1024 (SMTP submit message to categorizer) to the Excel filter, we can clearly see that during the downtime it takes between 30 minutes and 2 hours to categorize a message
On the DCs, there was no sign of performance problems

Transport Example 5
EXPLANATION:
The trigger was a proxy change on Saturday
Following this change, the antivirus could no longer contact its update server
In addition, the AV engine version was very old and no longer supported

Mailbox
Troubleshooting

Mailbox
Probably the most sensitive part, since this is where the databases reside
Losing data (mails, appointments) is probably the administrator's worst nightmare
2 main issues for this section:
Dismounted database
Log/Database growth issues

Mailbox Dismounted DB
Check the database status:
ESEUTIL /MH
Check log status:
ESEUTIL /ML
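When several databases are involved, the two lines that matter most in the ESEUTIL /MH output can be extracted with a few lines of Python. The sketch assumes the output was saved as text; "Dirty Shutdown" together with a non-zero Log Required range means those logs are needed for soft recovery. The sample output is abbreviated and made up.

```python
import re
from typing import Dict


def mh_summary(output: str) -> Dict[str, str]:
    """Pick the 'State' and 'Log Required' values out of saved
    'eseutil /mh' text output."""
    summary: Dict[str, str] = {}
    for key in ("State", "Log Required"):
        match = re.search(rf"^\s*{re.escape(key)}:\s*(.+?)\s*$", output, re.MULTILINE)
        if match:
            summary[key] = match.group(1)
    return summary


sample = """
Initiating FILE DUMP mode...
         Database: MailboxDB01.edb
               State: Dirty Shutdown
        Log Required: 12-14 (0xc-0xe)
"""
print(mh_summary(sample))
# {'State': 'Dirty Shutdown', 'Log Required': '12-14 (0xc-0xe)'}
```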

Mailbox Dismounted DB
Replay the logs into the database (soft recovery):
ESEUTIL /R

Mailbox Dismounted DB
What NOT to do if a database is down:

ESEUTIL /P
This is the last resort, once everything else has failed
The following KB, although old, gives an excellent explanation of what happens when doing a hard repair:
259851 Ramifications of running the eseutil /p or edbutil /d /r command in Exchange

Mailbox Log/Database
Growth
1st thing to do: use ExMon (Exchange User Monitor; works with Exchange 2003, 2007 & 2010) on the mailbox server with the issue, to see whether a user is causing the growth
Sort on the CPU (%) column and see whether the same user(s) are always on top
Sort by Bytes Out and check for top talkers
If the top talker is ? (a question mark), then

Mailbox Log/Database
Growth
Before moving forward, here is an ExMon example:
The user who causes the issue on the server (transaction log growth) is the one consuming the most CPU. It is the first user listed (Jean).
The Client Version column tells you which MAPI client version is being used. For example, a BlackBerry is a 6.x version, whereas 11.x is Outlook 2003.

Mailbox Log/Database
Growth
If ExMon cannot help, the next thing to do on the mailbox server is to check for large, stranded or looping message(s) in users' Outbox
In Exchange 2003, use a network sniffer (NetMon, Wireshark) to identify the client
In Exchange 2007/2010, for users running in online mode:
Get-Mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -FolderScope Outbox | Sort-Object FolderSize -Descending | Select-Object Identity,Name,FolderType,ItemsInFolder,@{Name="FolderSize MB";Expression={$_.FolderSize.ToMB()}} | Export-Csv OutboxItems.csv

Mailbox Log/Database
Growth
If no problem message was found in the Outbox, check the operation rates for users:
Exchange 2003: add the Ops columns to Mailbox Store\Logons
Exchange 2007/2010: run the following PowerShell command:
Get-LogonStatistics | Select-Object username,Windows2000account,identity,messagingoperationcount,otheropera

Mailbox Log/Database
Growth
Here is an example of how operation rates can help: the *** account is flooding the ### account with mails

UserName : ***
Windows2000Account : <domain>\***
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=***   <- account causing the flood
MessagingOperationCount : 1453   <- mail storm

UserName :   <- blank
Windows2000Account : <domain>\<HUB>$   <- machine account of a HUB
Identity : /o=<Org>/ou=<Admin_Group>/cn=Recipients/cn=###   <- reference to the user account
MessagingOperationCount : 765

Mailbox Log/Database
Growth
If the previous troubleshooting steps on the mailbox server did not help, move on to the HUBs to check for:
Messages queued or in retry:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq $true} | Get-Queue | where {$_.DeliveryType -eq "MapiDelivery"} | Select-Object Identity,NextHopDomain,Status,MessageCount | ft -auto

Large queued messages:
Get-ExchangeServer | where {$_.IsHubTransportServer -eq $true} | Get-Message -ResultSize Unlimited | Select-Object Identity,Subject,Status,LastError,RetryCount,Queue,Size | Sort-Object -Property Size -Descending | ft -auto

Mailbox Log/Database
Growth
Still having the issue after all that troubleshooting?
Don't forget PTL for both Mailbox & HUB!
Still no solution on the radar?
Call support and tell us what you have already tried, so that we don't ask for it again

Mailbox
Checking and repairing database integrity
ISINTEG for Exchange 2003 & 2007
Repair requests replaced ISINTEG in Exchange 2010
New-MailboxRepairRequest for mailbox DBs
You can run this command against a specific mailbox or against a database
While this task is running, mailbox access is disrupted only for the mailbox being repaired
If you're running this command against a database, only the mailbox being repaired is disrupted; all other mailboxes on the database remain operational
New-PublicFolderDatabaseRepairRequest for PF DBs
Use this cmdlet to detect and fix replication issues in the public folder database
Public folders on the public folder database can still be accessed while the request is running
However, access isn't available to the public folder currently being repaired.


Mailbox
In Exchange 2010, we have the possibility to configure the Calendar Repair Assistant
Not set to run by default; it needs to be activated with the Set-MailboxServer cmdlet
It detects and corrects inconsistencies in single and recurring meeting items for mailboxes located on that server

Mailbox: One simple example
ISSUE:
Exchange Server 2010
Backup software cannot purge transaction logs


TROUBLESHOOTING:
ID: 519
Level: Error
Source: ESE
Message: eseutil (11900) JetDBUtilities - 9384: The range of log files \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy48\INFO365b\E020000435B.log to \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy48\INFO365b\E0200004401.log is missing (error -528) and cannot be used. If these log files are required for recovery, a good copy of these log files will be needed for recovery to complete successfully.

ID: 518
Level: Error
Source: ESE
Message: eseutil (8052) JetDBUtilities - 11416: The log file \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy97\REP365\E030002C510.log is missing (error -528) and cannot be used. If this log file is required for recovery, a good copy of the log file will be needed for recovery to complete successfully.
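Before searching older backups, it helps to know how many logs are missing, and the file names in these events encode that. In Exchange 2007 and 2010, a log file name is a three-character prefix (E02 here) followed by an eight-digit hexadecimal generation number. Error -528 is JET_errMissingLogFile. The helper below, `Measure-LogGap`, is a hypothetical name and not an Exchange tool. It is a sketch that assumes this naming scheme and treats both ends of the reported range as inclusive.

```powershell
# Hypothetical helper: decode two Exchange 2007/2010 log file names and
# count the log generations between them (both ends inclusive).
# Assumes the name format <3-char prefix><8 hex digits>.log, e.g. E020000435B.log.
function Measure-LogGap {
    param(
        [Parameter(Mandatory)] [string] $FirstLog,
        [Parameter(Mandatory)] [string] $LastLog
    )

    # Parse prefix and generation from the end of a path or file name.
    # A regex is used so that \\?\GLOBALROOT\... shadow copy paths work too.
    $parse = {
        param([string] $Path)
        $isLogName = $Path -match '(?<prefix>\w{3})(?<gen>[0-9A-Fa-f]{8})\.log$'
        if (-not $isLogName) {
            throw "Unexpected log file name: $Path"
        }
        [pscustomobject]@{
            Prefix     = $Matches['prefix']
            Generation = [Convert]::ToInt64($Matches['gen'], 16)
        }
    }

    $first = & $parse $FirstLog
    $last  = & $parse $LastLog
    if ($first.Prefix -ne $last.Prefix) {
        throw "Log files belong to different log streams: $($first.Prefix) vs $($last.Prefix)"
    }

    [pscustomobject]@{
        Prefix          = $first.Prefix
        FirstGeneration = $first.Generation
        LastGeneration  = $last.Generation
        LogCount        = $last.Generation - $first.Generation + 1
    }
}
```

For the event 519 range above, E020000435B.log to E0200004401.log, this gives generations 17243 to 17409, i.e. 167 log files. Exchange 2007/2010 log files are 1 MB each, so that is roughly 167 MB of logs to find in older backups.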

SOLUTION:
We found the missing logs in older file-level backups
Restored the missing transaction logs into the Exchange log folder
Afterwards, the backup was able to purge the logs

Q&A

ExMon
Tool for gathering statistics about how Outlook clients use Exchange Server resources such as CPU and network bandwidth
It was designed to help understand how MAPI applications such as Outlook affect the performance of Exchange Server 2003
The latest version of the tool also works with Exchange 2007 and 2010:
http://www.microsoft.com/downloads/details.aspx?familyid=9A49C22E-E0C7-4B7C-ACEF-729D48AF7BC9&displaylang=en

ExMon
What ExMon can do:
For each user, show the consumption on the server for CPU, disk and network traffic
Measure server-side CPU latency
Give the following information about the clients: user name, version, IP address
Process the data sent to the Exchange server by MAPI clients and expose the user experience by showing the actual RPC (network & server) latency
Give some data about ActiveSync traffic
What ExMon cannot do:
Measure SMTP, OWA, DAV or POP/IMAP traffic
Give information about spam

ExMon
Run the Exchange Server User Monitor installer and finish the installation
In the ExMon installation folder, double-click ExMon.reg, which adds the registry keys ExMon needs

ExMon
Once installed, ExMon can be used either live, to see real-time consumption on the server, or in tracing mode, to collect statistics
Live mode is mostly used with short update intervals, the minimum being 1 minute
Use the ExMon GUI for this type of monitoring
The maximum snapshot interval is 30 minutes, which is useful for captures over longer periods of time
Traces longer than 30 minutes cannot be created from the GUI; use instead:
System Monitor or tracelog.exe on Windows Server 2003
logman.exe on Windows Server 2008 & R2

ExMon
Error when running ExMon: Unknown StartTrace error (183)
Code 183 translates to ERROR_ALREADY_EXISTS
This happens when ExMon crashes or is killed while collecting data but the Exchange trace keeps running (the trace stops on its own only when it reaches its 512 MB limit)
After a crash, ExMon tries to start a new trace session but cannot, because the Exchange Event Trace provider supports only one instance at a time
Solution:
logman query -ets to see whether Exchange Event Trace is running
logman stop "Exchange Event Trace" -ets to stop it
When using tracelog.exe on Windows Server 2003:
tracelog -stop "Exchange Event Trace"

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered
trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this
presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
