Copyright © 2014 Splunk Inc.

Data On-Boarding
Andrew Duca
Sr. Professional Services Consultant, Splunk
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the
expected performance of the company. We caution you that such statements reflect our current expectations and
estimates based on factors currently known to us and that actual events or results could differ materially. For
important factors that may cause actual results to differ from those contained in our forward-looking statements,
please review our filings with the SEC. The forward-looking statements made in this presentation are being made
as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain
current or accurate information. We do not assume any obligation to update any forward-looking statements we may
make. In addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or
other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to
include any such feature or functionality in a future release.

2
About Me
! Senior Professional Services Consultant based in Boston, MA
! 14+ years of worldwide Professional Services consulting,
with the last two at Splunk
! Involved in 20+ deployments from 1GB to 5TB

3
Agenda
! Data
! Splunk Components
! Index Data
! Proper Parsing
! Challenging Data
! Advanced Inputs

4
Are You in The Right Room?
! You have used Splunk at least once, or at least read about it
! You are interested in Splunk best practices
! You like to use Splunk's default parsing rules
! You just took over a Splunk deployment and you're not
sure what to do
! This is not an education class; it's best practice

5
Data
Splunk is the engine for machine data
! Machine data is more than just logs: it's configuration data, data
from APIs and message queues, change events, the output of
diagnostic commands, and more
! Log types: Application, Web Access and Proxy, Call Detail Records
(CDR), Clickstream, Message Queues, Packet, Database audit and
tables, File audit, Syslog, WMI, PerfMon
! Manual: Getting Data In
http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
6
Splunk Apps
! Look to Splunk Apps first and utilize a Technology Add-on (TA)
! TAs apply the Common Information Model (CIM)
! CIM details the standard fields, event type tags, and host
tags that Splunk uses when it processes most IT data
! Example TAs:
Windows
Unix
Exchange
Active Directory
VMware vCenter
WebSphere

7
Splunk Distributed Components

Search Head

Deployment Server

Indexer

Forwarder

8
Test Environment
! Every Splunk deployment should have a test environment
! It can be a laptop, virtual machine, or spare server
! It should run the same version of Splunk as production
! It should be accessible to other Splunk developers and administrators

9
One Shot
! Easiest way to get data into your test environment
! Components of the oneshot:
./splunk add oneshot user_conf.txt -index indexname -sourcetype sourcetypename
! Where to find more information:
http://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorfilesanddirectoriesusingtheCLI

10
Data - Broken

11
Props
! Always set these six parameters

# USER CONFERENCE
[user_conf_2014]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
SHOULD_LINEMERGE = False
LINE_BREAKER = ([\n\r]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
TRUNCATE = 10000

! TIME_PREFIX: defaults to empty
! TIME_FORMAT: strptime-style format
! MAX_TIMESTAMP_LOOKAHEAD: 150 characters by default
! SHOULD_LINEMERGE: set to True by default
! LINE_BREAKER: set to ([\r\n]+) by default; change to positive lookahead
! TRUNCATE: set to 10000 bytes by default; set to 0 to never truncate

18
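
The effect of the six props settings can be approximated outside Splunk. The following is a minimal Python sketch, not Splunk's parser. It assumes the timestamp sits at the very start of each event (TIME_PREFIX = ^), moves the timestamp part of LINE_BREAKER into a lookahead so that Python's re.split keeps it, and counts TRUNCATE in characters rather than bytes. The sample text and the names break_events and extract_time are made up for illustration.

```python
import re
from datetime import datetime

# Values copied from the [user_conf_2014] stanza.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 19
TRUNCATE = 10000

# re.split would drop the timestamp if it were part of the delimiter, so the
# timestamp is placed in a lookahead: break only on newlines that are
# followed by a timestamp.
EVENT_BOUNDARY = re.compile(r"[\n\r]+(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})")


def break_events(raw):
    """Split raw text into events and cut each one at TRUNCATE characters."""
    chunks = EVENT_BOUNDARY.split(raw)
    return [chunk.rstrip("\r\n")[:TRUNCATE] for chunk in chunks if chunk.strip()]


def extract_time(event):
    """Parse the timestamp in the first MAX_TIMESTAMP_LOOKAHEAD characters."""
    return datetime.strptime(event[:MAX_TIMESTAMP_LOOKAHEAD], TIME_FORMAT)


# Hypothetical sample data: the second event spans two lines.
SAMPLE = (
    "2014-10-07 09:00:00 INFO session started\n"
    "2014-10-07 09:00:05 ERROR stack trace follows\n"
    "    at line 42\n"
    "2014-10-07 09:00:09 INFO session ended\n"
)

if __name__ == "__main__":
    for event in break_events(SAMPLE):
        print(extract_time(event).isoformat(), repr(event))
```

The continuation line stays attached to the ERROR event because no timestamp follows the newline in front of it, which is the behavior the LINE_BREAKER setting is meant to produce.
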
Data - Fixed

19
6.2 Splunk Web Data On-Boarding
Why Use Splunk Web to On-Board?
Quick and easy way to
! Easily visualize the data as events rather than lines of text
! Quickly get the data properly broken into events
! Accurately get the timestamp extracted

All in a wicked cool GUI

Once everything looks good, take your props.conf settings and deploy them

21
Splunk Web Data On-Boarding
! Locate the source file on the Splunk server's file system

22
Splunk Web Data On-Boarding
! Validate event breaking and timestamp recognition

23
Splunk Web Data On-Boarding
! Resolve event breaking

24
Splunk Web Data On-Boarding
! Set the timestamp format even if Splunk figures it out automatically

25
Splunk Web Data On-Boarding
! Copy the props.conf settings and deploy them in a custom app

26
Challenging Data
Limit Indexed Data
! Anonymize data:
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

! Rewrite raw data:
[source::.../sql.log]
SEDCMD-sqllog = s/(.*?)Command:EXECUTE[.\d\D\w\W]*/\1/g

! Discard events:
props.conf
[source::/var/log/user_conf.txt]
TRANSFORMS-null = setnull

transforms.conf
[setnull]
REGEX = (?i)DEBUG
DEST_KEY = queue
FORMAT = nullQueue

30
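
Before deploying the anonymize and discard rules, it can help to try the same regular expressions on sample events. The sketch below is a Python approximation under the assumption that Python's re module treats these particular patterns the same way Splunk does; it is not Splunk code. The function names anonymize and keep, and the sample events, are hypothetical.

```python
import re

# Python equivalents of the two SEDCMD-accounts substitutions.
SSN = re.compile(r"ssn=\d{5}(\d{4})")
CC = re.compile(r"cc=(\d{4}-){3}(\d{4})")
# Same pattern as REGEX = (?i)DEBUG in the [setnull] transform.
DISCARD = re.compile(r"(?i)DEBUG")


def anonymize(event):
    """Mask all but the last four digits of ssn= and cc= values."""
    event = SSN.sub(r"ssn=xxxxx\1", event)
    return CC.sub(r"cc=xxxx-xxxx-xxxx-\2", event)


def keep(event):
    """Return False for events the nullQueue transform would discard."""
    return DISCARD.search(event) is None


# Hypothetical sample events.
SAMPLE_EVENTS = [
    "user=bob ssn=123456789 cc=1234-5678-9012-3456",
    "2014-10-07 09:00:01 debug cache warmed",
]

if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        if keep(event):
            print(anonymize(event))
```

Because the discard pattern is case-insensitive and unanchored, any event containing "debug" anywhere in its text is dropped, so it is worth checking that legitimate events do not mention the word.
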
Limit Indexed Data
6.x or later Windows forwarders
! Whitelist events or blacklist specific events
! inputs.conf configuration

31
Indexed Extractions
! Provides reliable and consistent indexing of data with headers
! Address the issue on the forwarder:
INDEXED_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}
! Supports custom header parsing and an easy mode for common formats
! Extract IIS fields using props.conf on the Windows forwarder:

[iis]
INDEXED_EXTRACTIONS = w3c

32
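
The idea behind header-driven extraction can be shown with a few lines of Python. This is only a sketch of the concept for W3C-style logs, where a "#Fields:" line names the columns; it is not how Splunk implements the feature. The function name parse_w3c and the sample log lines are hypothetical.

```python
def parse_w3c(lines):
    """Yield one dict per data line, keyed by the names in the #Fields: header."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for every data line that follows.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split()))


# Hypothetical IIS-style sample in W3C extended log format.
SAMPLE = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2014-10-07 09:00:00 GET /index.html 200",
    "2014-10-07 09:00:01 GET /missing.html 404",
]

if __name__ == "__main__":
    for event in parse_w3c(SAMPLE):
        print(event)
```

Field names come from the file itself, so a change in the logged columns is picked up without editing any extraction rules.
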
Multiple Timestamps
12-Sep-2014,09:01:00,12-Sep-2014,09:02:00,-4 INFO title="User Conference" msg="Splunk hosted user conference in Las Vegas."
12-Sep-2014,19:01:00,12-Sep-2014,19:02:00,-5 DEBUG title="User Conference" msg="Getting Data In, Correctly is a solid session."

datetime.xml
<datetime>
<define name="two_tz" extract="day, litmonth, year, hour, minute, second, zone">
<text><![CDATA[^(\d+)-(\w+)-(\d+),(\d+):(\d+):(\d+),(?:[^,]*,){2}([\w\-]*)]]></text>
</define>
<timePatterns>
<use name="two_tz"/>
</timePatterns>
<datePatterns>
<use name="two_tz"/>
</datePatterns>
</datetime>

props.conf
# USER CONF
[user_conf]
DATETIME_CONFIG = /etc/apps/splk_ps_user_conf_props/local/datetime.xml
* Do not set TIME_FORMAT

33
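
To see what the two_tz pattern captures, the same regular expression can be run against the two sample events in Python. This is a rough sketch, not Splunk's timestamp processor. It assumes the trailing field (-4, -5) is a whole-hour offset from UTC, which the slide does not state, and the function name parse_event_time is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Same regex as the <text> element of the two_tz pattern.
TWO_TZ = re.compile(r"^(\d+)-(\w+)-(\d+),(\d+):(\d+):(\d+),(?:[^,]*,){2}([\w\-]*)")

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_event_time(event):
    """Return the first timestamp of the event as a timezone-aware datetime."""
    match = TWO_TZ.match(event)
    if match is None:
        raise ValueError("event does not start with the expected timestamp")
    day, litmonth, year, hour, minute, second, zone = match.groups()
    # Assumption: the zone field is a whole-hour offset from UTC.
    tz = timezone(timedelta(hours=int(zone)))
    return datetime(int(year), MONTHS[litmonth], int(day),
                    int(hour), int(minute), int(second), tzinfo=tz)


# The two sample events from the slide.
SAMPLE_EVENTS = [
    '12-Sep-2014,09:01:00,12-Sep-2014,09:02:00,-4 INFO title="User Conference" '
    'msg="Splunk hosted user conference in Las Vegas."',
    '12-Sep-2014,19:01:00,12-Sep-2014,19:02:00,-5 DEBUG title="User Conference" '
    'msg="Getting Data In, Correctly is a solid session."',
]

if __name__ == "__main__":
    for event in SAMPLE_EVENTS:
        print(parse_event_time(event).isoformat())
```

The non-capturing group skips the second date and time, so the first timestamp supplies the event time and only the zone is taken from the end of the prefix.
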
Database Connect
! Allows for indexing data from database sources directly
! Allows for adding metadata to events from database sources using
lookups

Caveats
! Java required on Splunk server
! Search head pooling requires custom configuration to share the DB
connection passwords. Not meant for data input sources

35

Database Connect Best Practices
! Normalize timestamps natively inside the SQL query
! Filter results down in the SQL query to reduce garbage in the Splunk index
! Repeated DBLookups should be converted to static lookups
! Search head pooling requires encrypted password replication

36
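
The first two best practices can be illustrated with a query that formats the timestamp and filters rows before anything leaves the database. The sketch below uses Python's built-in sqlite3 module as a stand-in database only because it needs no setup; the table app_log, its columns, and the rows are hypothetical, and the timestamp function will differ in other database engines.

```python
import sqlite3

# Stand-in database; SQLite is used only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_log (ts_epoch INTEGER, level TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO app_log VALUES (?, ?, ?)",
    [
        (1412672400, "INFO", "session started"),
        (1412672405, "DEBUG", "cache miss"),
        (1412672409, "ERROR", "login failed"),
    ],
)

# Normalize the timestamp and drop unwanted rows inside the query itself,
# so only clean, relevant rows would be handed over for indexing.
QUERY = """
    SELECT strftime('%Y-%m-%d %H:%M:%S', ts_epoch, 'unixepoch') AS event_time,
           level,
           message
    FROM app_log
    WHERE level <> 'DEBUG'
    ORDER BY ts_epoch
"""

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Doing this work in SQL keeps the timestamp format consistent with the props settings and avoids paying to index rows nobody will search.
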
Modular and Scripted Inputs
Benefits
! Almost any program that can output text can be used to index data
! Modular inputs allow for configuration files and configuration settings inside Splunk
Differences
! Scripted inputs require configuration to be done in the script
! Modular inputs can be configured via deployed .conf files and accessed via the REST API
! Scripted inputs are specific to the OS they are deployed on, whereas modular inputs can
support multiple operating systems
Examples
vmstat, iostat, Checkpoint Opsec, Twitter, Stream, Amazon S3 online storage, and more


38
Scripted Inputs Example
! Shell script saved in /opt/splunk/bin/scripts/ OR in a specific app
! Allows you to execute any program on a Splunk forwarder and index
the STDOUT data
! Utilizing key-value pairs makes for easier searching

Sample output from custom script /Applications/Splunk/bin/scripts/FantasyFootball.sh

39
Scripted Inputs Example
Shell script calls local system binary programs and can provide configuration options.

Use inputs.conf to define the index, sourcetype, and interval for the scripted input

40
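
A scripted input only has to write events to STDOUT, so the key-value advice is easy to try. The sketch below is written in Python rather than shell purely for illustration; the function format_event, the epoch-seconds prefix, and the team and points fields are all hypothetical and not taken from the FantasyFootball.sh script.

```python
import sys
import time


def format_event(fields, now=None):
    """Render one event as an epoch timestamp followed by key="value" pairs."""
    timestamp = int(time.time() if now is None else now)
    pairs = " ".join('{}="{}"'.format(key, value) for key, value in fields.items())
    return "{} {}".format(timestamp, pairs)


def main():
    # Hypothetical measurements; a real script would collect these from the
    # system or from another program.
    sys.stdout.write(format_event({"team": "splunkers", "points": 42}) + "\n")


if __name__ == "__main__":
    main()
```

Quoting every value keeps values with spaces intact, and leading with a timestamp gives the props settings something predictable to anchor on.
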
Production Deployment
Production Environment
! Complexity of managing configurations across tens,
hundreds, or thousands of forwarders
! Not all indexers and search heads receive the same
configurations
! Should think about version control for deployment apps,
e.g., GitHub

42
Deployment Server Terminology
! Deployment Server - A Splunk instance that acts as a centralized configuration manager,
grouping together and collectively managing any number of Splunk instances. Any Splunk
instance can act as a deployment server, even one that is indexing data locally. Splunk
instances that are remotely configured by deployment servers are called deployment
clients.
! Deployment Client - A Splunk instance that is remotely configured by a deployment server.
! Server Class - Represents a configuration of Splunk deployment clients. Server classes
enable the management of a group of deployment clients as a single unit. A server class can
be used to group deployment clients together by application, OS, data type to be indexed,
or any other feature of your Splunk deployment.

43
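
The relationship between deployment clients, server classes, and apps can be modeled in a few lines. This is a conceptual Python sketch only: it is not the deployment server's algorithm and not serverclass.conf syntax. The server class names, the wildcard patterns, the client names, and the function apps_for_client are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical server classes: each maps client-name patterns to apps.
SERVER_CLASSES = {
    "all_forwarders": {
        "whitelist": ["*"],
        "apps": ["splk_all_forwarder_outputs"],
    },
    "user_conf": {
        "whitelist": ["web-*"],
        "apps": ["splk_ps_user_conf_inputs"],
    },
}


def apps_for_client(client_name, server_classes=SERVER_CLASSES):
    """Return the sorted app names of every server class the client matches."""
    apps = set()
    for server_class in server_classes.values():
        if any(fnmatch(client_name, pattern) for pattern in server_class["whitelist"]):
            apps.update(server_class["apps"])
    return sorted(apps)


if __name__ == "__main__":
    for client in ["web-01", "db-01"]:
        print(client, apps_for_client(client))
```

A client can belong to several server classes at once, which is why small single-purpose apps combine more cleanly than one large app per host.
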
Deployment App
! A deployment app (configuration bundle) is a set of deployment
content (including configuration files) deployed as a unit to clients of
a server class
! Located in $SPLUNK_HOME/etc/deployment-apps and pushed to the
deployment client's $SPLUNK_HOME/etc/apps folder
! DO NOT store configurations in $SPLUNK_HOME/etc/system/local
! Use deployment apps regardless of your deployment tool

44
Deployment App - Naming Convention

org    group      application  configuration
acme   finance    apache       inputs
acme   marketing  iis          props
splk   all        indexer      base
splk   ps         user_conf    inputs

Resulting app name: splk_ps_user_conf_inputs

50
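
The convention is simple enough to enforce with a helper. The following is a minimal Python sketch; the function name deployment_app_name is hypothetical, and lowercasing every part is an assumption based on the app names shown in this deck.

```python
def deployment_app_name(org, group, application, configuration):
    """Join the four naming parts with underscores, lowercased."""
    parts = [org, group, application, configuration]
    if any(not part.strip() for part in parts):
        raise ValueError("all four parts are required")
    # Assumption: app names are kept lowercase for predictability.
    return "_".join(part.strip().lower() for part in parts)


if __name__ == "__main__":
    print(deployment_app_name("splk", "ps", "user_conf", "inputs"))
    print(deployment_app_name("acme", "finance", "apache", "inputs"))
```

Generating names this way keeps every app sortable by owner and purpose in a directory listing.
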
Deployment Apps
mba13:apps $ ls -la
! SplunkForwarder
! SplunkLightForwarder
! Splunk_for_ActiveDirectory
! Splunk_for_Exchange
! splk_all_deploymentclient
! splk_all_forwarder_outputs
! splk_all_indexer_base
! splk_all_search_base
! splk_ps_user_conf_inputs
! splk_ps_user_conf_props
! splk_ps_user_conf_web
! splunk_app_was
user-prefs

51
Collecting Syslog
! Send device logs, e.g., from routers and firewalls,
to a syslog collector
! Write files to this directory
structure: /sourcetype/host/log.txt
! Monitor the sourcetype level, e.g., cisco_asa/my.firewall.name

# CISCO ASA
[monitor:///data/cisco_asa//]
sourcetype = cisco_asa
host_segment = 3
index = firewall

52
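
With host_segment = 3, the host is taken from the third "/"-separated segment of the monitored file's path. The sketch below mimics that lookup in Python so a directory layout can be checked before deployment; it is an illustration, not Splunk code, and the function name host_from_path and the example path are hypothetical.

```python
def host_from_path(path, host_segment):
    """Return the 1-based path segment that host_segment points at."""
    segments = [segment for segment in path.split("/") if segment]
    if not 1 <= host_segment <= len(segments):
        raise ValueError("host_segment is outside the path")
    return segments[host_segment - 1]


if __name__ == "__main__":
    # Segments: data (1), cisco_asa (2), my.firewall.name (3), log.txt (4)
    print(host_from_path("/data/cisco_asa/my.firewall.name/log.txt", 3))
```

If the collector writes to a deeper base directory, the segment number has to grow with it, so it is worth re-checking whenever the path changes.
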
Summary
! Test in a non-production environment
! Always use key props parameters:

TIME_PREFIX
TIME_FORMAT
MAX_TIMESTAMP_LOOKAHEAD
SHOULD_LINEMERGE
LINE_BREAKER
TRUNCATE
! Deploy apps to /etc/apps, not /etc/system/local
! Use a clear, predictable naming convention
! When you're stuck, use Answers and reuse apps from apps.splunk.com

53
Resources
! Get educated: http://www.splunk.com/view/education/SP-CAAAAH9
! Download Splunk applications: http://apps.splunk.com/
! Hire Splunk Professional Services:
http://www.splunk.com/view/professional-services/SP-CAAABH9
! Watch some videos: http://www.splunk.com/videos

54
THANK YOU
