
Copyright

2013 Splunk Inc.

How to Write Modular Inputs


Igor Stojanovski
Server Engineer at Splunk
#splunkconf

Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of
Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective
owners.
2013 Splunk Inc. All rights reserved.

Who am I?

Igor Stojanovski
I have worked on:

Windows port
Data inputs
Indexer
Search commands
Clustering
Modular Inputs

AGENDA

Bird's-eye view of inputs in Splunk
Scripted Inputs
Introduction to Modular Inputs
Developing a Modular Input
Amazon S3 input
Questions

Inputs in Splunk

Inputs are managed by the Data Inputs page:

Inputs in Splunk (cont'd)

Reside in inputs.conf:

[monitor://$SPLUNK_HOME/etc/splunk.version]
_TCP_ROUTING = *
index = _internal
sourcetype = splunk_version

Managed by endpoints
Accessed via the management port:

https://localhost:8089/services/data/inputs
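inputs.conf uses an INI-like syntax. As a rough illustration only (splunkd has its own parser and layering rules), Python's standard configparser can read the stanza above if option names are kept case-sensitive and interpolation is turned off:

```python
import configparser

# The inputs.conf stanza from the slide above, as a string.
INPUTS_CONF = """
[monitor://$SPLUNK_HOME/etc/splunk.version]
_TCP_ROUTING = *
index = _internal
sourcetype = splunk_version
"""


def parse_inputs_conf(text):
    """Parse inputs.conf-style text into {stanza: {key: value}}.

    Approximation only: this is not how splunkd itself reads config.
    """
    # interpolation=None so '%' and '$' in values are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep '_TCP_ROUTING' case-sensitive
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    for stanza, params in parse_inputs_conf(INPUTS_CONF).items():
        print(stanza, params)
```

Note that configparser treats [default] as an ordinary section, so Splunk's inheritance from [default] is not modeled here.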

Scripted Inputs

A way to extend input capabilities:

[script://./bin/my_script.sh]

Great for running simple scripts (e.g. the UNIX app)
Easy to index the output of the script
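A scripted input can be written in any language Splunk can execute. A minimal Python sketch of such a script; the file name (e.g. [script://./bin/my_script.py]) and the load-average payload are illustrative assumptions:

```python
import os
import sys
import time


def emit_status(out=sys.stdout):
    """Write one event line; Splunk indexes whatever the script prints."""
    try:
        # 1-minute load average; not available on every platform
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = 0.0
    line = "%s load1=%.2f\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), load1)
    out.write(line)
    out.flush()  # don't leave the event sitting in Python's buffer
    return line


if __name__ == "__main__":
    emit_status()
```

Splunk runs the script on its configured interval; each run's STDOUT becomes indexed data.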

What is Wrong with Scripted Inputs?

No configuration support
Data processing issues
Event boundaries
Universal Forwarder issues
Managing multiple source streams
Creating a UI is hard

What are Modular Inputs?

A better way to add new inputs to Splunk
An interface
Uses a script for input

Inputs with a Native Look and Feel

Defined inputs become first-class citizens
inputs.conf (e.g. [twitter://#splunk])
Auto-generated endpoint in /services/data/inputs
CRUD actions
Support for setting source/sourcetype/host/index
Support for enable/disable actions
Splunk provides a default UI
Can be customized
Can be managed with an SDK


10

Amazon S3 Input App

Online storage web service from Amazon Web Services
Identifies resources with a URI, such as:

s3://bucket-name/dir/file.txt

Consists of buckets with files and directories
Note: S3 buckets are unrelated to Splunk indexer buckets
Needs a key ID and a secret key for access

11

Defining the UI for the S3 App

The goal is for this input to blend in with the other inputs

12

Configuration for Our S3 Input

Sample inputs.conf:

[s3://bucket2/http_logs/access.log]
key_id = AKQWERRQWWAG5J2Y6HGA
secret_key = 0UrnXj2D6YvDio/xwvoCrikEjCKbXV8V5casdfQ6
index = test1
sourcetype = access_combined
disabled = true

13
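The stanza name carries the S3 resource itself, so the script will typically split it into a bucket and an object key. A minimal sketch using the standard library; the helper name split_s3_stanza is hypothetical:

```python
from urllib.parse import urlparse


def split_s3_stanza(stanza):
    """Split an 's3://bucket/key' stanza name into (bucket, key).

    Hypothetical helper for the S3 example; key is '' for a whole bucket.
    """
    parsed = urlparse(stanza)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// resource: %r" % stanza)
    # netloc holds the bucket name; path keeps a leading '/' we drop
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, split_s3_stanza("s3://bucket2/http_logs/access.log") gives ("bucket2", "http_logs/access.log"). Keys containing '#' or '?' would need extra care, since urlparse treats those as fragment and query separators.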

Defining a Modular Input (Requirements)

Create a Modular Input
Provided inside an app
App name: s3
Input scheme: s3
$SPLUNK_HOME/etc/apps/s3/README/inputs.conf.spec
Input scheme name (s3)
Configuration parameters (key_id, secret_key)
Endpoint arguments
$SPLUNK_HOME/etc/apps/s3/bin/s3.py

14

Locating a Script

Splunkd searches for a suitable script in the app's bin directory:

$SPLUNK_HOME/etc/apps/s3/bin/s3.py

The name must match the scheme name
On *nix: s3.sh, s3.py, s3
On Windows: s3.bat, s3.cmd, s3.py, s3.exe

15
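The naming rule above can be pictured as a lookup over the listed extensions. This is an illustrative sketch only; the helper and the order in which extensions are tried are assumptions, not splunkd's actual search algorithm:

```python
import os

# Extensions listed on the slide, per platform (os.name values).
# The order here is an assumption; splunkd's real order may differ.
CANDIDATE_EXTENSIONS = {
    "posix": [".sh", ".py", ""],
    "nt": [".bat", ".cmd", ".py", ".exe"],
}


def find_modinput_script(app_dir, scheme, platform=os.name):
    """Return the first existing bin/<scheme><ext> under app_dir, or None.

    Hypothetical helper illustrating the naming rule only.
    """
    bin_dir = os.path.join(app_dir, "bin")
    for ext in CANDIDATE_EXTENSIONS.get(platform, []):
        path = os.path.join(bin_dir, scheme + ext)
        if os.path.isfile(path):
            return path
    return None
```

Because s3.py appears in both lists, a single Python script can serve *nix and Windows.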

Splunkd Runs the Script in Three Scenarios

Introspection

$SPLUNK_HOME/etc/apps/s3/bin/s3.py --scheme

Input mode

$SPLUNK_HOME/etc/apps/s3/bin/s3.py

Configuration passed via the script's STDIN
One script invocation per input stanza

External validation (optional)

$SPLUNK_HOME/etc/apps/s3/bin/s3.py --validate-arguments

16

Steps for Initializing a Modular Input

Searches all apps for the presence of README/inputs.conf.spec
Reads the spec file
Contains the scheme name and parameters
Searches for a suitable script
Runs introspection

$SPLUNK_HOME/etc/apps/s3/bin/s3.py --scheme

Runs the script in input mode

$SPLUNK_HOME/etc/apps/s3/bin/s3.py

17

Spec File for Our S3 App


$ cat etc/apps/s3/README/inputs.conf.spec

[s3://<name>]

key_id = <value>
* This is Amazon key ID.

secret_key = <value>
* This is the secret key.

18

S3 Script Template (etc/apps/s3/bin/s3.py)


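import sys

def print_scheme():
    # Introspection (--scheme): the XML scheme will be printed here
    sys.exit(0)

def run():
    # Input mode: read configuration from STDIN, write data to STDOUT
    pass

if __name__ == "__main__":
    if len(sys.argv) > 1:
        if sys.argv[1] == "--scheme":
            print_scheme()
    else:
        run()
    sys.exit(0)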
19

Let's see it
20

Specializing the Input via Introspection

When run with --scheme, the script can return an XML document that:
Adds descriptive text for the UI
Declares whether endpoint arguments are required
Sets the data streaming mode (simple or xml)

21

Endpoint Parameter Validation

Introspection can contain validation expressions
The endpoint returns HTTP status 400 on invalid data
Validation examples:

<validation>
is_port('port_num')
</validation>

<validation>
validate(is_pos_int('param1') AND 'param1' > 100, "param1 must be > 100.")
</validation>
22

Running the Introspection

$ python splunk/etc/apps/s3/bin/s3.py --scheme
<scheme>
<title>Amazon S3</title>
<description>Get data from Amazon S3.</description>
<use_external_validation>true</use_external_validation>
<streaming_mode>xml</streaming_mode>

[to be continued]

23

Running the Introspection (cont'd)

[continued]
<endpoint>
<args><arg name="name">
<title>Resource name</title>
<description>An S3 resource ...</description>
</arg>
<arg name="key_id">
<title>Key ID</title>
<description>Your Amazon key.</description>
</arg>
[skipped]
</args></endpoint>
</scheme>
24
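A sketch of a print_scheme() that could replace the empty one in the template, emitting the scheme shown above. The args elided as [skipped] on the slide are left out rather than guessed, and the ElementTree parse is only a local sanity check:

```python
import sys
import xml.etree.ElementTree as ET

# Introspection document from the slides (skipped args omitted).
SCHEME = """<scheme>
    <title>Amazon S3</title>
    <description>Get data from Amazon S3.</description>
    <use_external_validation>true</use_external_validation>
    <streaming_mode>xml</streaming_mode>
    <endpoint>
        <args>
            <arg name="name">
                <title>Resource name</title>
                <description>An S3 resource ...</description>
            </arg>
            <arg name="key_id">
                <title>Key ID</title>
                <description>Your Amazon key.</description>
            </arg>
        </args>
    </endpoint>
</scheme>
"""


def print_scheme(out=sys.stdout):
    """Answer 's3.py --scheme' by writing the introspection XML."""
    ET.fromstring(SCHEME)  # fail fast here if the XML is malformed
    out.write(SCHEME)
    out.flush()
```

In the template, the --scheme branch would call this and then sys.exit(0).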

More helpful UI

25

How Splunk Runs the Script

26

Passing Configuration to the Script

Configuration is serialized into XML and passed to the script's STDIN
Given the following inside etc/system/local/inputs.conf:

[s3://splunk-2/access.common.log]
key_id = AKIAJIYU5KG35WTX5G6Q
secret_key = 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

27

Printing Script Configuration Offline

$ splunkd print-modinput-config s3 s3://splunk-2/access.common.log

<input>
<session_key>b2bf1835dea8782e29e6b8ca33b42ea7</session_key>
<checkpoint_dir>/opt/splunk/var/lib/splunk/modinputs/s3</checkpoint_dir>
<configuration>
<stanza name="s3://splunk-2/access.common.log">
<param name="host">tiny</param>
<param name="index">default</param>
<param name="key_id">AKIAJIYU5KG35WTX5G6Q</param>
<param name="secret_key">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</param>
</stanza>
</configuration></input>

session_key: key for accessing endpoints
checkpoint_dir: directory for saving state
configuration: runtime configuration
28
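In input mode the script has to parse this document itself. A minimal sketch with the standard library's ElementTree (read_config is a hypothetical name); it assumes the <input>/<configuration>/<stanza>/<param> layout shown above:

```python
import sys
import xml.etree.ElementTree as ET


def read_config(stream=sys.stdin):
    """Parse the <input> document splunkd writes to the script's STDIN.

    Returns (session_key, checkpoint_dir, {stanza_name: {param: value}}).
    """
    root = ET.fromstring(stream.read())
    session_key = root.findtext("session_key")
    checkpoint_dir = root.findtext("checkpoint_dir")
    stanzas = {}
    for stanza in root.iter("stanza"):
        # Empty <param/> elements have text None; normalize to ''
        params = {p.get("name"): (p.text or "") for p in stanza.iter("param")}
        stanzas[stanza.get("name")] = params
    return session_key, checkpoint_dir, stanzas
```

The template's run() could start with session_key, checkpoint_dir, stanzas = read_config().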

Running Your Script Offline

Running the script:

splunkd print-modinput-config s3 s3://splunk-1 |
python $SPLUNK_HOME/etc/apps/s3/bin/s3.py

More insight into what's going on with the --debug flag:

splunkd print-modinput-config --debug s3 s3://splunk-1

29

Sending Data

The script's STDOUT gets indexed by Splunk
The old scripted input style is still available
STDOUT is the raw data for indexing
"simple" streaming mode
An alternate way of sending data: xml mode

<streaming_mode>xml</streaming_mode>

30

Sending Data (cont'd)

<stream>
<event>
<data>09/08/2009 14:01:59.0398
event_status="(0)The operation completed."</data>
<source>my_source</source>
<index>test1</index>
</event>
[...]
</stream>

31

Sending Data (cont'd)

<stream>
<event>
<time>1326831964.1</time>
<data>event_status="(0)The operation completed."</data>
</event>
<event>
<time>1326831964.2</time>
<data>event_status="(0)The operation completed."</data>
</event>
</stream>

32
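Building events with an XML library, rather than string formatting, ensures that characters such as < and & in the data are escaped. A sketch that produces events like the ones above (make_event and write_stream are hypothetical helpers); it writes one complete <stream> at a time, whereas a long-running script might keep the stream open and emit events as they arrive:

```python
import sys
import xml.etree.ElementTree as ET


def make_event(data, event_time=None, source=None, index=None):
    """Build one <event> element for xml streaming mode."""
    event = ET.Element("event")
    if event_time is not None:
        # Epoch seconds, as in <time>1326831964.1</time> above
        ET.SubElement(event, "time").text = str(event_time)
    ET.SubElement(event, "data").text = data
    if source is not None:
        ET.SubElement(event, "source").text = source
    if index is not None:
        ET.SubElement(event, "index").text = index
    return event


def write_stream(events, out=sys.stdout):
    """Wrap events in <stream> and write them; ElementTree escapes text."""
    stream = ET.Element("stream")
    stream.extend(events)
    out.write(ET.tostring(stream, encoding="unicode"))
    out.flush()
```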

Sending Data (cont'd)

<stream>
<event unbroken="1">
<data>part of the event ...</data>
</event>
<event unbroken="1">
<data>final part of the event.</data>
<done/>
</event>
</stream>
33

Sending Data From the S3 App

$ splunkd print-modinput-config s3 s3://splunk-3/ | python splunk/etc/apps/s3/bin/s3.py
<stream>
<event unbroken="1">
<source>s3://splunk-3/file1.txt</source><data>File 1 contents.</data></event>
<event unbroken="1">
<source>s3://splunk-3/file1.txt</source><done /></event>
<event unbroken="1">
<source>s3://splunk-3/file2.txt</source><data>File 2 contents.</data></event>
<event unbroken="1">
<source>s3://splunk-3/file2.txt</source><done /></event>
</stream>
34
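The S3 output above follows one pattern per object: one or more unbroken chunks, then a <done /> event marking the end of that source's data. A sketch of that pattern (stream_file_unbroken is hypothetical; the caller is assumed to write the enclosing <stream> and </stream>):

```python
import sys
from xml.sax.saxutils import escape


def stream_file_unbroken(source, chunks, out=sys.stdout):
    """Write one object's contents as unbroken events, then <done />.

    `chunks` is any iterable of text pieces (e.g. reads from an S3 object).
    Chunks need not align with event boundaries; Splunk does the breaking.
    """
    src = "<source>%s</source>" % escape(source)
    for chunk in chunks:
        out.write('<event unbroken="1">%s<data>%s</data></event>\n'
                  % (src, escape(chunk)))
    # Marks the end of this source's data
    out.write('<event unbroken="1">%s<done /></event>\n' % src)
    out.flush()
```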

Logging

Any STDERR output from the script goes into splunkd.log
By default, messages end up at level ERROR
The script can optionally specify a logging level:

INFO Connecting to the endpoint...
ERROR Unable to connect

Search Splunk for these messages in splunkd.log:

index=_internal source=*splunkd.log (component=ModularInputs stderr) OR component=ExecProcessor

35
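A small helper for level-prefixed STDERR messages, following the "LEVEL message" form above. The slide shows only INFO and ERROR; the other names in LEVELS are an assumption based on Splunk's usual logging levels:

```python
import sys

# INFO and ERROR come from the slide; the rest are assumed level names.
LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def log(level, message, err=sys.stderr):
    """Write one 'LEVEL message' line to STDERR for splunkd.log."""
    if level not in LEVELS:
        raise ValueError("unknown level: %r" % level)
    err.write("%s %s\n" % (level, message))
    err.flush()
```

For example, log("INFO", "Connecting to the endpoint...") writes the first sample line above.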

Configuration Layering
etc/system/local/inputs.conf
[default]
x = y
host = myhost
index = default

etc/apps/search/local/inputs.conf
[monitor:///data/dir/]
sourcetype = access_combined

Layered configuration outcome

[monitor:///data/dir/]
x = y
host = myhost
index = default
sourcetype = access_combined

36
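The layering above behaves like a dictionary merge from the most general stanza to the most specific. The sketch below reproduces the outcome shown; it deliberately ignores Splunk's file and directory precedence rules, so treat it as a mental model only (layer_settings is a hypothetical name):

```python
def layer_settings(*layers):
    """Merge settings dicts ordered from most general to most specific.

    Later layers win, so a stanza's own keys override [default] ones.
    Simplification: real Splunk precedence also depends on which
    directory (system/local, an app's local or default, ...) holds a file.
    """
    merged = {}
    for settings in layers:
        merged.update(settings)
    return merged
```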

Configuration Layering
etc/system/local/inputs.conf
[default]
x = y
host = myhost
index = default

etc/apps/app1/default/inputs.conf
[s3]
key_id = AKQWERRQWWAG5J2Y6HGA

etc/apps/search/local/inputs.conf
[s3://data-bucket/]
secret_key = CrikEjCKbXV8V5casdfQ6

Layered configuration outcome

[s3://data-bucket/]
key_id = AKQWERRQWWAG5J2Y6HGA
secret_key = CrikEjCKbXV8V5casdfQ6
host = myhost
index = default

37

External Script Validation

38
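The slide gives no details, so the following rests on assumptions: that splunkd writes an <items> document with an <item> holding <param> elements to STDIN, and that the script rejects input by printing <error><message>...</message></error> and exiting non-zero. Check both against the Splunk documentation for your version. The key_id rule is hypothetical:

```python
import re
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rule: an AWS access key ID is 20 uppercase letters/digits.
KEY_ID_RE = re.compile(r"[A-Z0-9]{20}")


def validate_arguments(stream=sys.stdin, out=sys.stdout):
    """Handle 's3.py --validate-arguments'; return the exit code to use.

    Assumed input shape: <items>...<item name="..."><param name="...">
    """
    root = ET.fromstring(stream.read())
    item = root.find("item")
    params = {}
    if item is not None:
        params = {p.get("name"): (p.text or "") for p in item.iter("param")}
    key_id = params.get("key_id")
    if key_id is not None and not KEY_ID_RE.fullmatch(key_id):
        # Assumed error format for reporting the failure back to splunkd
        msg = "Invalid key_id: expected 20 letters or digits."
        out.write("<error><message>%s</message></error>" % escape(msg))
        return 1
    return 0
```

In the template's dispatch, a --validate-arguments branch would then call sys.exit(validate_arguments()).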

Let's see it in action


39

Saving State

Ability to save state (checkpoints)
The location is provided by checkpoint_dir as part of the configuration
The directory is managed by Splunk and can be deleted by:
splunk clean inputdata
splunk clean all

40
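One way to use checkpoint_dir in the S3 app is to record which objects have already been indexed, so that a restarted script skips them. A sketch with hypothetical helpers is_done and mark_done; hashing the resource name into a file name is a design choice, not something Splunk requires:

```python
import hashlib
import os


def _checkpoint_path(checkpoint_dir, resource):
    # Hash the name so '/' and ':' in s3:// URIs are safe as a file name
    digest = hashlib.sha1(resource.encode("utf-8")).hexdigest()
    return os.path.join(checkpoint_dir, digest)


def is_done(checkpoint_dir, resource):
    """True if `resource` was fully indexed in an earlier run."""
    return os.path.exists(_checkpoint_path(checkpoint_dir, resource))


def mark_done(checkpoint_dir, resource):
    """Record `resource` as indexed by creating a marker file."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(_checkpoint_path(checkpoint_dir, resource), "w") as f:
        f.write(resource)  # keep the original name for debugging
```

Since splunk clean inputdata removes this directory, a cleaned input would re-index every object.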

Input Status

Use input status to find out:
Is the input script running?
Why is there no searchable data?
How much data did the script send?
When did it exit and why?

Input status endpoint:

https://localhost:8089/services/admin/inputstatus

41

Input Status (cont'd)

The input status output includes:
Descriptive exit status
Script path
Start and stop time
Total bytes sent
Config stanza that it serves
42

One Script Instance Per Input Stanza Mode

43

Single Script Instance Mode

44

Single Script Instance Mode (cont'd)

Turning on single script instance mode in introspection:

<scheme>
<title>Foobar monitoring</title>
<use_single_instance>true</use_single_instance>
[...]

45

Sending Data in Single Script Instance Mode

<stream>
<event stanza="s3://bucket-2/dir/file.txt">
<data>09/08/2009 14:01:59.0398 file line</data>
</event>
<event stanza="s3://bucket-3/some_file.txt">
<data>some file text</data>
</event>
</stream>
46
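In single script instance mode one process serves every stanza, so each event names the stanza it belongs to. A sketch that emits a stream like the one above from (stanza, data) pairs; write_events_for_stanzas is a hypothetical name:

```python
import sys
import xml.etree.ElementTree as ET


def write_events_for_stanzas(events, out=sys.stdout):
    """Write (stanza, data) pairs as one <stream> of tagged events."""
    stream = ET.Element("stream")
    for stanza, data in events:
        # The stanza attribute routes the event to its configured input
        event = ET.SubElement(stream, "event", stanza=stanza)
        ET.SubElement(event, "data").text = data
    out.write(ET.tostring(stream, encoding="unicode"))
    out.flush()
```

Using ElementTree also guarantees the stanza attribute value is quoted and escaped.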

Summary

General overview of inputs
Scripted Inputs and their role
Introduction to Modular Inputs
Building a Modular Input

47

Where Next?

Documentation:

http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro

Existing apps that implement Modular Inputs:
S3
Twitter
HDFS file monitor, part of the Splunk Hadoop Connect app
Windows Inputs (starting with 6.0): perfmon, WinEventLog, WinRegMon

48

THANK YOU