User Guide
June 22, 2016
USG-LogRhythm_Diagnostics-revA
LogRhythm Diagnostics Module User Guide
Disclaimer
The information contained in this document is subject to change without notice. LogRhythm, Inc. makes no
warranty of any kind with respect to this information. LogRhythm, Inc. specifically disclaims the implied warranty
of merchantability and fitness for a particular purpose. LogRhythm, Inc. shall not be liable for any direct, indirect,
incidental, consequential, or other damages alleged in connection with the furnishing or use of this information.
Trademark
LogRhythm is a registered trademark of LogRhythm, Inc. All other company or product names mentioned may be
trademarks, registered trademarks, or service marks of their respective holders.
LogRhythm Inc.
4780 Pearl East Circle
Boulder, CO 80301
(303) 413-8745
www.logrhythm.com
Contents
Introduction
Module Contents
  Alarm Rules
  Reports
  Investigations
  Tails
Troubleshooting Guidance
Appendix: Summary of Changes
  Renamed Alarm Rules
  New Alarm Rules
LogRhythm Diagnostics Module User Guide
Introduction
The LogRhythm Diagnostics module is provided as part of the LogRhythm Knowledge Base and includes content
intended to monitor the health of the LogRhythm deployment and generate alarms when key health-impacting
events occur. The module contains tails, reports, and investigations to monitor all diagnostic events, as well as
alarms triggered by specific conditions.
NOTE: The LogRhythm Diagnostics Module replaces content currently available in the QsEMP module and
will be automatically synchronized on all deployments. Existing rules have been modified for accuracy,
updated with new components where necessary, and have undergone settings changes to reduce alarm
volume. With the exception of LogRhythm Component Critical Condition, which is suppressed by default
for one hour, the default suppression for all alarms has been updated to two hours.
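The suppression defaults above can be illustrated with a minimal Python sketch of a per-rule suppression window: after a rule raises an alarm, further alarms from that rule are suppressed until its window expires. The one-hour and two-hour durations come from the note; the class and function names are hypothetical, and keying suppression on the rule name alone is a simplifying assumption, not a description of how LogRhythm applies suppression internally.

```python
from datetime import datetime, timedelta

# Suppression defaults described in this guide; the data structure is hypothetical.
DEFAULT_SUPPRESSION = timedelta(hours=2)
SUPPRESSION_OVERRIDES = {
    "LogRhythm Component Critical Condition": timedelta(hours=1),
}


def suppression_window(rule_name: str) -> timedelta:
    """Return the suppression period for a rule (two hours unless overridden)."""
    return SUPPRESSION_OVERRIDES.get(rule_name, DEFAULT_SUPPRESSION)


class AlarmSuppressor:
    """Remembers when each rule last alarmed (simplified: keyed by rule name only)."""

    def __init__(self) -> None:
        self._last_alarm = {}  # rule name -> datetime of the last alarm raised

    def should_alarm(self, rule_name: str, now: datetime) -> bool:
        """Return True and record the alarm if the rule is outside its suppression window."""
        last = self._last_alarm.get(rule_name)
        if last is not None and now - last < suppression_window(rule_name):
            return False  # still suppressed; the window is not extended
        self._last_alarm[rule_name] = now
        return True
```

With these defaults, a second LogRhythm Agent Heartbeat Missed alarm raised 90 minutes after the first is suppressed, while a second LogRhythm Component Critical Condition alarm raised 61 minutes after the first is not.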
Module Contents
Alarm Rules
Alarm Rule Name | Alarm Description | Alarm Rule ID
LogRhythm Mediator Database Capacity Error | Alarms on the occurrence of the Mediator Database reaching 90% capacity. The Mediator Server inserting log data into the affected Mediator database will cease accepting new log messages from connected agents and will force agents to disconnect. | 96
LogRhythm Mediator Database Capacity Warning | Alarms on the occurrence of the Mediator Database reaching 80% capacity. At 90% capacity, the Mediator Server inserting data into the affected Mediator database will cease accepting new log messages. | 97
LogRhythm Agent Heartbeat Missed | Alarms on the occurrence of a LogRhythm Agent Heartbeat Missed event, which could indicate that a LogRhythm Agent is going down. | 98
LogRhythm Component Critical Condition | Alarms on the occurrence of any critical LogRhythm component event, which indicates the failure of a LogRhythm component. | 99
LogRhythm Component Successive Errors | Alarms on successive occurrences of critical and error LogRhythm component events, which likely indicate the failure of a LogRhythm component. | 100
LogRhythm Component Excessive Warnings | Alarms on excessive occurrences of critical, error, or warning LogRhythm component events, which could indicate pending failures of the LogRhythm solution. | 101
LogRhythm Mediator Heartbeat Missed | Alarms on the occurrence of a LogRhythm Mediator Heartbeat Missed event, which could indicate that a Log Manager has gone down. | 102
LogRhythm MPE Rule Disabled | Alarms on the occurrence of a LogRhythm MPE Rule Disabled event. | 103
LogRhythm Silent Log Source Error | Alarms on a LogRhythm Silent Log Source Error event, which could indicate a log source that has gone silent. | 104
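The two Mediator database capacity alarms differ only in their threshold. The following minimal Python sketch maps a database fill level to the corresponding alarm using the 80% and 90% figures from the table. The function name and the idea of passing used and total sizes in bytes are assumptions for illustration, and "reaching" a threshold is treated as greater than or equal to it.

```python
from typing import Optional

WARNING_PERCENT = 80  # LogRhythm Mediator Database Capacity Warning (rule 97)
ERROR_PERCENT = 90    # LogRhythm Mediator Database Capacity Error (rule 96)


def capacity_alarm(used_bytes: int, total_bytes: int) -> Optional[str]:
    """Return the capacity alarm a given database fill level corresponds to, if any.

    Integer arithmetic avoids floating-point edge cases exactly at a threshold.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if used_bytes * 100 >= total_bytes * ERROR_PERCENT:
        return "LogRhythm Mediator Database Capacity Error"
    if used_bytes * 100 >= total_bytes * WARNING_PERCENT:
        return "LogRhythm Mediator Database Capacity Warning"
    return None
```

Checking the higher threshold first ensures that a database at 90% or above reports the Error alarm rather than the Warning.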
Reports
Report Name | Report Description | Report ID
LogRhythm Diagnostic Events | Provides a detailed account of critical and error conditions experienced by LogRhythm components. | 431
Investigations
Investigation Name | Investigation Description | Investigation ID
LogRhythm Diagnostic Events | Returns all diagnostic events from any LogRhythm component (Agent, AI Engine, ARM, Mediator, etc.). | 12
Tails
Tail Name | Tail Description | Tail ID
LogRhythm Diagnostic Events | This tail returns all diagnostic events from any LogRhythm component (System Monitor Agent, AI Engine, ARM, Mediator, etc.). | 1
Troubleshooting Guidance
This section describes steps you can take to further analyze specific alarms and to gather additional information to provide to LogRhythm Customer Support.
97: LogRhythm Mediator Database Capacity Warning
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Check Data Management settings for proper tuning, ensuring that only required logs are brought online (use Classification-Based Data Management, or run an investigation to identify major producers of white noise).
3. Using the Log Volume Reports, investigate top talkers, log counts, and summary totals for spikes in the environment that could point to a misconfiguration or a potential threat.
4. Ensure that only required logs are being kept online.
5. If the LMDB is oversubscribed and the volume of logs being brought online is justified, the LMDB can be grown manually (drive space permitting), or the appliance or deployment may need to be re-scoped for additional resources.
6. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: This alarm provides early warning to administrators that action may be necessary to maintain the LMDB and prevent the system from going into suspend mode. The condition is caused by oversubscription of the LMDB. This alarm does not apply to the Data Processor.
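If you export per-source log counts (for example, from the Log Volume Reports) for step 3, a short script can rank top talkers and flag spikes. A minimal sketch, assuming a hypothetical mapping of log source name to a list of daily counts, oldest first: it ranks sources by their latest count and flags any source whose latest count exceeds a multiple of its own earlier average. The 2x spike factor is an arbitrary illustrative value, not a LogRhythm default.

```python
from statistics import mean
from typing import Dict, List, Tuple


def top_talkers(daily_counts: Dict[str, List[int]], n: int = 5) -> List[Tuple[str, int]]:
    """Rank log sources by their most recent daily count, highest first."""
    latest = {src: counts[-1] for src, counts in daily_counts.items() if counts}
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)[:n]


def spiking_sources(daily_counts: Dict[str, List[int]], factor: float = 2.0) -> List[str]:
    """Flag sources whose latest count exceeds `factor` times their prior average.

    Sources with fewer than two data points, or with a zero prior average,
    are skipped because there is no baseline to compare against.
    """
    flagged = []
    for src, counts in daily_counts.items():
        if len(counts) < 2:
            continue
        baseline = mean(counts[:-1])
        if baseline > 0 and counts[-1] > factor * baseline:
            flagged.append(src)
    return sorted(flagged)
```

A source that is both a top talker and spiking is a good first candidate for the misconfiguration or threat review described in step 3.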
99: LogRhythm Component Critical Condition
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Review the status of the component listed in the alarm, and restart the component if necessary.
3. If an issue is identified, remediation steps will vary according to the affected component and the error observed.
4. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: This alarm detects critical problems at an early stage on any LogRhythm component. It will most likely require analysis to verify the scope, validity, and priority of any source issue identified.
100: LogRhythm Component Successive Errors
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding evidence from affected sources.
2. Review the status of the component listed in the alarm, and restart the component if necessary.
3. If an issue is identified, remediation steps will vary according to the affected component and the error observed.
4. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: This alarm detects potential problems at an early stage on any LogRhythm component. It will most likely require analysis to verify the scope, validity, and priority of any source issue identified.
101: LogRhythm Component Excessive Warnings
1. Investigate the event and related logs from the problem host around the time frame of the alarm, looking for common events that match the alarm criteria.
2. Review the status of the component listed in the alarm, and restart the component if necessary.
3. If an issue is identified, remediation steps will vary according to the affected component and the error observed.
4. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: This alarm detects potential problems at an early stage on any LogRhythm component. It will most likely require analysis to verify the scope, validity, and priority of any source issue identified.
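Alarms such as LogRhythm Component Excessive Warnings fire on a count of events rather than on a single event. This guide does not state the actual count or time window the rule uses, so the minimal sketch below uses hypothetical values (10 events within 15 minutes per host) purely to show how a sliding-window count works when reviewing exported events for step 1. The function name and the (timestamp, host) input shape are also assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Tuple


def excessive_event_hosts(
    events: Iterable[Tuple[datetime, str]],
    threshold: int = 10,                         # hypothetical event count
    window: timedelta = timedelta(minutes=15),   # hypothetical time window
) -> List[str]:
    """Return hosts that logged at least `threshold` events inside any `window`.

    `events` are (timestamp, host) pairs and must be sorted by timestamp.
    """
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    flagged = set()
    for ts, host in events:
        queue = recent[host]
        queue.append(ts)
        # Drop this host's events that fall outside the window ending at `ts`.
        while ts - queue[0] > window:
            queue.popleft()
        if len(queue) >= threshold:
            flagged.add(host)
    return sorted(flagged)
```

A deque per host keeps each update cheap: every event is appended once and removed at most once.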
103: LogRhythm MPE Rule Disabled
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Check scmpe.log and lps_detail.log for policy processing issues, statistics, stalled threads, further issues with the same rule, and/or subsequent issues with MPE rules in the same policy (which may indicate a change or update to a log source, log source type, or logging operations).
3. Investigate the associated MPE rules. It is recommended that you contact LogRhythm Support for this step.
4. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: The MPE service now detects when individual rules repeatedly generate processing warnings, and it may disable such rules if hung processes jeopardize the health of the system.
This capability increases the reliability of each Log Manager by allowing the MPE to more accurately identify and gracefully handle parsing rules that put the health of the overall system at risk.
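When checking scmpe.log and lps_detail.log in step 2, a quick keyword scan can narrow down where to look. This guide does not document the format of these logs, so the keywords below are placeholders to adjust after inspecting your own files, and reading the files as UTF-8 is an assumption. The sketch simply reports matching lines with their line numbers.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Placeholder keywords: the exact wording used in scmpe.log and lps_detail.log
# is not documented here, so adjust these after inspecting your own logs.
DEFAULT_KEYWORDS = ("disabled", "warning", "timeout")


def scan_lines(lines: Iterable[str], keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs for lines containing any keyword, case-insensitively."""
    needles = [k.lower() for k in keywords]
    hits = []
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            hits.append((lineno, line.rstrip("\n")))
    return hits


def scan_log(path: Path, keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[Tuple[int, str]]:
    """Scan a log file on disk (UTF-8 assumed); undecodable bytes are replaced, not raised."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle, keywords)
```

For example, `scan_log(Path(log_dir) / "scmpe.log", ["rule"])` lists every line mentioning a rule, where `log_dir` is a hypothetical variable holding the directory that contains the MPE logs on your Log Manager.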
104: LogRhythm Silent Log Source Error
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Examine the log source in the Client Console to check whether the log source record has updated the "last log message" field since the alarm was received.
3. Verify that the Silent Log Message Source Settings in the log source properties are configured correctly.
4. Investigate the log source host to verify its health and to identify any issues that may be interrupting communications to the collecting agent.
5. Ensure that no configuration changes (security or administrative) have been made to the log source, and verify that no changes in the communications path are preventing logging to LogRhythm.
6. If this is expected behavior from the log source, the silent log source settings can be tuned in the advanced properties of each log source.
7. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: Silent log source alarms can be extremely valuable and can be set individually per log source. However, environmental factors, such as a log source that is not very chatty, can cause this alarm to flood. Because the alarm is tuned per log source, tuning it can have a high administrative cost. Once tuned, however, these alarms can provide valuable insight into each log source's normal behavior.
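Step 2 amounts to comparing the log source's "last log message" time with the time the alarm was received. A minimal sketch, assuming you have copied both timestamps from the Client Console; the function name and return shape are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def silence_report(last_log_message: datetime, alarm_time: datetime,
                   now: datetime) -> Tuple[bool, timedelta]:
    """Summarize a Silent Log Source Error alarm.

    Returns (resumed, silent_for):
      resumed: True if the last log message is later than the alarm time
      silent_for: time elapsed since the log source last logged
    """
    resumed = last_log_message > alarm_time
    return resumed, now - last_log_message
```

If `resumed` is True, the source recovered on its own and the remaining steps focus on why it paused; if not, `silent_for` indicates how long communications have been interrupted.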
212: LogRhythm Failed To Submit Batch Job To DB
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Check SQL and Mediator service health.
3. Check logmsgprocessor.log, evtmsgprocessor.log, scmedsvr.log, and the SQL Server error and history logs around the relevant time frame for the cause of the failure and/or related SQL maintenance errors.
4. Investigate volume reports and log spikes that may have caused a temporary failure, and monitor for subsequent failures.
5. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: Processed logs, together with their respective insert instructions, are batched and submitted to the respective destination database.
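Step 3 asks you to review several logs around the relevant time frame. A minimal sketch that keeps only lines whose leading timestamp falls within a window around the alarm time. The timestamp format (`%Y-%m-%d %H:%M:%S` at the start of each line) and the 10-minute default window are assumptions; adjust them to match the actual format of logmsgprocessor.log, evtmsgprocessor.log, or scmedsvr.log.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format; verify against your log files
TS_LENGTH = 19  # length of a timestamp in TS_FORMAT, e.g. "2016-06-22 10:00:00"


def lines_near(lines: Iterable[str], alarm_time: datetime,
               window: timedelta = timedelta(minutes=10)) -> List[str]:
    """Return lines stamped within +/- `window` of `alarm_time`.

    Lines without a parseable leading timestamp (such as stack-trace
    continuation lines) are skipped by this simple filter.
    """
    start, end = alarm_time - window, alarm_time + window
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

Because continuation lines are dropped, open the full log around any matching line before drawing conclusions about the cause of the failure.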
231: LogRhythm Excessive Processed Logs Spooled to Disk
1. Using the Log Volume Reports, investigate top talkers, log counts, and summary totals for spikes in the environment that could point to a misconfiguration, oversubscription, or a potential threat.
2. Monitor whether the condition continues by checking the Deployment Monitor and related Performance Counters to gather usage statistics. Counter descriptions are available in the LogRhythm Help file.
3. Check logmsgprocessor.log, evtmsgprocessor.log, and scmedsvr.log around the relevant time frame for clues.
4. Check SQL maintenance jobs for successes and failures.
5. Check resource utilization.
6. Check Data Management and RBP settings for improper tuning (identified through investigations), and ensure that insert rates adhere to best practices.
7. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: Spooling logs to disk is an expected condition during periods of peak load. Excessive spooling, however, can result in disk starvation and cause the Mediator to go into a suspend state. On high-volume deployments, this rule can be tuned to exclude lower-volume diagnostic events (e.g., Unprocessed Log Spooled Count Exceeds 1 Million).
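Because excessive spooling can starve the disk, it helps to watch free space on the volume that holds the spool while you monitor the condition. A minimal sketch using Python's standard `shutil.disk_usage`; the path and the 15% free-space floor are hypothetical values, not LogRhythm guidance.

```python
import shutil


def free_space_percent(path: str) -> float:
    """Return the percentage of free space on the volume containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def spool_disk_at_risk(path: str, min_free_percent: float = 15.0) -> bool:
    """True when free space falls below a hypothetical safety floor."""
    return free_space_percent(path) < min_free_percent


if __name__ == "__main__":
    # Hypothetical location: point this at the drive that holds the Mediator spool.
    spool_path = "."
    print(f"{free_space_percent(spool_path):.1f}% free, at risk: {spool_disk_at_risk(spool_path)}")
```

Running this periodically alongside the Performance Counters in step 2 shows whether spooling is trending toward disk starvation.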
408: LogRhythm GLPR Error
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Review GLPR configurations.
3. Check GLPR Performance Counters.
4. Check scmpe.log for errors and disabled GLPRs.
5. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: The following error types may trigger this alarm:
Collection Update Error: Error updating the entire rule base of GLPRs after a configuration change.
Preparation Error: Error observed while preparing a rule for processing.
Processing Errors: Error observed while processing against a specific GLPR.
676: LogRhythm AI Engine Heartbeat Missed
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Review the AI Engine Communication Manager service on the AIE appliance.
3. Check network connectivity between the AIE and the Platform Manager.
4. Check LRAIEEngine.log.
5. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
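For the network connectivity check, a basic TCP reachability test run from the AIE host can rule out routing or firewall problems. A minimal sketch using the standard `socket` module; the helper name is hypothetical, and you must supply the Platform Manager address and whichever port your deployment actually uses, because this guide does not specify it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("platform-manager.example.local", 1433)` tests a hypothetical host name on port 1433, which is shown only because it is SQL Server's default port; substitute the values that apply to your AIE.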
947: LogRhythm CMDB Database Warning
Collect SQL error logs, note database and disk sizes, and then raise a ticket with LogRhythm Support.
948: LogRhythm CMDB Stats Warning
Collect SQL error logs, note database and disk sizes, and then raise a ticket with LogRhythm Support.
949: LogRhythm CMDB Database Error
Collect SQL error logs, note database and disk sizes, and then raise a ticket with LogRhythm Support.
1002: LogRhythm Agent Cannot Update
1. Determine the affected Agent by investigating logs and events.
2. Ensure that the Agent is communicating and that the last Agent Heartbeat is up to date.
3. Ensure that the Agent is collecting logs as expected.
4. Collect scmedsvr.log and scsm.log, and then raise a ticket with LogRhythm Support.
1084: LogRhythm Network Monitor Heartbeat Missed
1. Verify that you can log in to Network Monitor and access the UI.
2. Check the diagnostics page for clues.
3. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
1093: LogRhythm Data Indexer Stopped
Attempt to start the Indexer services using the start script:
Windows: C:\Program Files\LogRhythm\Data Indexer\tools\start-all-services.bat
Linux: /usr/local/logrhythm/tools/start-all-services-linux.sh
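If you want to script alarm 1093's remediation, the following minimal sketch picks the start script documented above based on the operating system and runs it. The script paths come from this guide; the helper names are hypothetical, running the Linux script through `bash` is an assumption about how it is meant to be invoked, and the sketch assumes it runs with sufficient privileges on the Data Indexer host.

```python
import platform
import subprocess
from typing import List

# Start scripts as documented in this guide, keyed by platform.system() names.
START_SCRIPTS = {
    "Windows": r"C:\Program Files\LogRhythm\Data Indexer\tools\start-all-services.bat",
    "Linux": "/usr/local/logrhythm/tools/start-all-services-linux.sh",
}


def start_script_for(system: str) -> str:
    """Return the documented start script for an OS name, or raise ValueError."""
    try:
        return START_SCRIPTS[system]
    except KeyError:
        raise ValueError(f"No Data Indexer start script documented for {system!r}") from None


def start_command(system: str) -> List[str]:
    """Build the command line used to run the start script on the given OS."""
    script = start_script_for(system)
    if system == "Windows":
        return ["cmd.exe", "/c", script]  # .bat files run through cmd.exe
    return ["bash", script]  # assumption: the script is compatible with bash


def start_data_indexer() -> int:
    """Run the start script for the current OS and return its exit code."""
    return subprocess.run(start_command(platform.system()), check=False).returncode
```

Separating command construction from execution lets you print and review `start_command(platform.system())` before actually starting any services.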
1139: LogRhythm Mediator Recycling - Hung MPE Threads
1. Use LogRhythm to analyze and collect all information regarding the alarm, related events/logs, and surrounding logs from affected sources.
2. Check scmpe.log and lps_detail.log for policy processing issues, statistics, stalled threads, further issues with the same rule, and/or subsequent issues with MPE rules in the same policy (which may indicate a change or update to a log source, log source type, or logging operations).
3. Investigate the associated MPE rules. It is recommended that you contact LogRhythm Support for this step.
4. If the steps above do not provide a solution or if you require assistance, please contact LogRhythm Support.
Note: The MPE service now detects when individual rules repeatedly generate processing warnings, and it may disable such rules if hung processes jeopardize the health of the system.
This capability increases the reliability of each Log Manager by allowing the MPE to more accurately identify and gracefully handle parsing rules that put the health of the overall system at risk.
1140: LogRhythm Scheduled Report Failure
Collect the lrjobmgr.log file, and then raise a ticket with LogRhythm Support.
1141: LogRhythm AD Sync Failure
1. Investigate the events and logs that produced the alarm.
2. Ensure that the credentials used for AD Sync are still valid by performing an AD sync validation test via the Active Directory Domain Manager on the Platform Manager tab of Deployment Manager.
3. Determine if any recent network changes may have affected the communication between LogRhythm and the AD server.
4. Collect the lrjobmgr.log file, and then raise a ticket with LogRhythm Support.
Appendix: Summary of Changes
New Alarm Rules
Alarm Rule Name | Alarm Rule ID
LogRhythm Mediator Recycling - Hung MPE Threads | 1139
LogRhythm Scheduled Report Failure | 1140
LogRhythm AD Sync Failure | 1141