Logo

[]

ElasticSearch

[1.0]

[2014.1.21]

1.3.1. Cluster
1.3.2. Shards
1.3.3. Replicas
1.3.4. Recovery
1.3.5. River
1.3.6. Gateway
1.3.7. discovery.zen
1.3.8. Transport
Java API
3.1.1. Node
3.1.2. TransportClient
3.2. put Mapping
3.7. MongoDB
3.8. More like this

1.0

2014.1.21

1.

Elasticsearch http://www.elasticsearch.cn/guide/

ElasticSearch

ES

Elasticsearch

1.
1.1.
ElasticSearch is a search server built on Lucene. It exposes a RESTful interface over HTTP and exchanges data as JSON.

1.2.
Github
Github Elasticsearch 20TB 13 1300
Github 2013 1
solr elasticsearch 26 8
https://github.com/blog/1381-a-whole-new-code-search
Foursquare
5 Foursquare Elasticsearch
Foursquare
Foursquare

SoundCloud
SoundCloud Elasticsearch 1.8
SoundCloud Alexa
236 SoundCloud

2 100 SoundCloud Flash

Fog Creek
Elasticsearch Fog Creek 400 3
StumbleUpon
Elasticsearch StumbleUpon
StumbleUpon
stumble

25 HBase elasticsearch
elasticsearch solr
solr elasticsearch
Mozilla
Mozilla WarOnOrange
json elasticsearch bug
Socorro Mozilla Hbase Postgres
Hbase elasticsearch
Sony
Sony elasticsearch
Infochimps
Infochimps 25 4TB
Infochimps
hadoop

1.3. Scaling Lucene


Lucene ?
lucene
elasticsearch
building blocks
partitioning
lucene
replication
transaction log elasticsearch

1.3.1. Building Blocks


lucene
lucene .

Directory

Lucene Directory lucene

lucene Directory

IndexWriter

IndexWriter lucene
flush
flush commit
io
, IndexWriter
.

Index Segments

Lucene segments lucene



commit lucene
Because segments need to be kept at bay, they are merged continuously by internal Lucene processes (MergePolicy, MergeScheduler, etc.).
caching
term skip lists ) FieldCache

IndexReader

IndexReader IndexWriter
IndexReader
IndexWriter IndexReader flush
.

Near Real-Time Search

IndexReader
IndexReader
( near real-time ).

1.3.2. Partitioning
Possible approaches to scaling Lucene:

Distributed Directory

Lucene chunks
(Coherence, Terracotta, GigaSpaces or Infinispan)
IndexWriter IndexReader Directory
.

lucene
ps solandra
IndexReader IndexReader
term
IndexWriter

Partitioning

There are 2 ways to scale by partitioning: document based partitioning and term based partitioning. Elasticsearch uses document based partitioning.

Document Based Partitioning

ID

A query with K terms has to be processed by each of the N shards, so it costs O(K*N) disk seeks (see the talk by Jeffrey Dean, Google).

Term Based Partitioning
index
Riak Search (built on top of Riak key-value store engine)
Lucandra/Solandra (on top of Cassandra).

5 term
5 5 term
50
A query with K terms is handled by at most K shards: O(K) disk seeks.

Lucene Segment
The main problem is that the whole notion of a Lucene segment, which is inherent to many constructs in Lucene, is lost.

expand term fuzzy prefix

google PageRank

faceting

1.3.3. Replication
Replication serves 2 purposes:

High Availability (HA)

Scalability: reads can be served by slave nodes

There are two ways to replicate: Push Replication and Pull Replication.

Elasticsearch uses Push Replication.

Push Replication

[master]
(document) [replica]

(
Lucene )
You index the same document several times but we transfer much less data compared to
Pull replication (and Lucene is known to index very fast)

versioning


(:
) refresh IndexReader

primary
shard

Pull Replication

master slave(Solr ) master


commit segment

slavesegments
lucene

master commit segment slave


pull lucene commit
commit

( EC2 $$$)
2 stored
fields Lucene 2

slaves master
commit slave
high availability slavea real time high available
slave 1
lucene

1.3.4. Transaction Log


commit

Data Persistency

ElasticSearch keeps a transaction log (a write-ahead log).


commit
push replication
shard transaction log
replay
Transaction log (flushed)

Because of the elasticsearch transaction log, es does not lose data even when the process is stopped with kill -9.
Transaction log
shared gateway snapshot peer shard recovery shard
Hot relocation
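
The write-ahead idea can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the Elasticsearch Translog class: TranslogSketch and its methods are hypothetical names, the maps stand in for Lucene's committed segments and uncommitted buffer, and the list stands in for a durable log file that survives a crash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative write-ahead log: each operation is recorded before it is applied,
// so operations that were not yet committed can be replayed after a crash.
final class TranslogSketch {
    private final List<String[]> log = new ArrayList<String[]>();                 // operations since the last flush
    private final Map<String, String> committed = new HashMap<String, String>();  // stands in for committed segments
    private Map<String, String> inMemory = new HashMap<String, String>();         // stands in for the uncommitted buffer

    void index(String id, String source) {
        log.add(new String[] {id, source}); // 1. record the operation first
        inMemory.put(id, source);           // 2. then apply it
    }

    // A flush commits the current state, after which the log can be cleared.
    void flush() {
        committed.putAll(inMemory);
        log.clear();
    }

    // Simulates a crash (the uncommitted buffer is lost) followed by recovery:
    // start from the last commit and replay the log on top of it.
    void crashAndRecover() {
        inMemory = new HashMap<String, String>(committed);
        for (String[] op : log) {
            inMemory.put(op[0], op[1]);
        }
    }

    String get(String id) {
        return inMemory.get(id);
    }

    public static void main(String[] args) {
        TranslogSketch shard = new TranslogSketch();
        shard.index("1", "committed doc");
        shard.flush();
        shard.index("2", "doc written after the last commit");
        shard.crashAndRecover();
        System.out.println(shard.get("2"));
    }
}
```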

Shared Gateway Snapshot

gateway (changes) ( snapshots )


shared storage) transaction log

Peer Shard Recovery

( )
gateway
Lucene segment files)
flushing commit
transaction log ie
transaction log replica replay

blocking

2.
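Download the latest elasticsearch from http://www.elasticsearch.org/download/. The latest version at the time of writing is 0.20.5; using the latest version is recommended because each es release fixes bugs.
After unpacking there are three directories: bin, config and lib. To install plugins, create an additional plugins directory.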

2.1.
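Running elasticsearch is simple: on linux run bin/elasticsearch, on windows run bin/elasticsearch.bat. To form a cluster, start several elasticsearch instances on the same network segment with the same cluster.name; es discovers them automatically and groups them into a cluster.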

2.2.
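elasticsearch-servicewrapper is an es plugin that makes es easier to manage (starting es, stopping es and so on). When es is run directly it can only be shut down with ctrl+c. The git address is https://github.com/elasticsearch/elasticsearch-servicewrapper
After downloading, copy the service directory into the es bin directory.
Then run bin/service/elasticsearch + a parameter:
console   run es in the foreground
start     run es in the background
stop      stop es
install   start es automatically as a service when the server boots
remove    cancel the automatic start
The service directory contains an elasticsearch.conf file that mainly sets java runtime parameters. The most important ones are: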

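# es home path
set.default.ES_HOME=<Path to ElasticSearch Home>
# minimum memory allocated to es (MB)
set.default.ES_MIN_MEM=256
# maximum memory allocated to es (MB)
set.default.ES_MAX_MEM=1024
# startup timeout (seconds)
wrapper.startup.timeout=300
# shutdown timeout (seconds)
wrapper.shutdown.timeout=300
# ping timeout (seconds)
wrapper.ping.timeout=300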

2.3.
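The only Chinese analysis plugin provided officially for elasticsearch is smartcn. medcl has written two further Chinese analysis plugins for es: ik and mmseg. Their installation is described below.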

Installing ik
plugin -install medcl/elasticsearch-analysis-ik/1.1.0
The jar can also be downloaded from github:
https://github.com/medcl/elasticsearch-rtf/blob/master/elasticsearch/plugins/analysisik/elasticsearch-analysis-ik-1.2.5.jar
The general forms of the install command are:
plugin --install //
plugin --url file://path/to/plugin --install plugin-name

Then download the ik dictionary files into the config directory:
cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
unzip ik.zip
rm ik.zip
Installing mmseg
bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0
Then download the dictionary files into the config directory:
cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate
unzip mmseg.zip
rm mmseg.zip

ik configuration in elasticsearch.yml:
index:
    analysis:
        analyzer:
            ik:
                alias: [ik_analyzer]
                type: org.elasticsearch.index.analysis.IkAnalyzerProvider

or
index.analysis.analyzer.ik.type : ik

mmseg configuration in elasticsearch.yml:
index:
    analysis:
        analyzer:
            mmseg:
                alias: [news_analyzer, mmseg_analyzer]
                type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

or
index.analysis.analyzer.default.type : "mmseg"
mmseg tokenizers:
index:
    analysis:
        tokenizer:
            mmseg_maxword:
                type: mmseg
                seg_type: "max_word"
            mmseg_complex:
                type: mmseg
                seg_type: "complex"
            mmseg_simple:
                type: mmseg
                seg_type: "simple"
es
mapping
mapping
{
    "page": {
        "properties": {
            "title": {
                "type": "string",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik"
            },
            "content": {
                "type": "string",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik"
            }
        }
    }
}
indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.
The same mapping built in java:

XContentBuilder content = XContentFactory.jsonBuilder().startObject()
    .startObject("page")
        .startObject("properties")
            .startObject("title")
                .field("type", "string")
                .field("indexAnalyzer", "ik")
                .field("searchAnalyzer", "ik")
            .endObject()
            .startObject("content")
                .field("type", "string")
                .field("indexAnalyzer", "ik")
                .field("searchAnalyzer", "ik")
            .endObject()
        .endObject()
    .endObject()
.endObject();

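The analyzer can be tested with the api below, where indexname is the name of any existing index:
http://localhost:9200/indexname/_analyze?analyzer=ik&text= elasticsearch

ik project: https://github.com/medcl/elasticsearch-analysis-ik
mmseg project: https://github.com/medcl/elasticsearch-analysis-mmseg
A preconfigured es distribution is available at:
https://github.com/medcl/elasticsearch-rtf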

2.4.
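The elasticsearch config folder contains two configuration files: elasticsearch.yml and logging.yml. The first is the basic es configuration file; the second is the logging configuration file. es uses log4j for logging, so logging.yml is set up like an ordinary log4j configuration file.
The settings in elasticsearch.yml are explained below.
cluster.name: elasticsearch
The es cluster name; elasticsearch by default. es automatically discovers other es nodes on the same network segment, so when several clusters share a segment this property tells them apart.

node.name: "Franz Kafka"
The node name. By default a name is picked at random from a list stored in name.txt in the config folder of the es jar.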
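node.master: true
Whether this node is eligible to be elected master; true by default. By default es makes the first machine in the cluster the master, and a new master is elected if that machine fails.
node.data: true
Whether this node stores index data; true by default.
index.number_of_shards: 5
The default number of index shards; 5 by default.
index.number_of_replicas: 1
The default number of index replicas; 1 by default.
path.conf: /path/to/conf
Where the configuration files are stored; by default the config folder under the es root directory.
path.data: /path/to/data
Where the index data is stored; by default the data folder under the es root directory. Several paths can be given, separated by commas:
path.data: /path/to/data1,/path/to/data2
path.work: /path/to/work
Where temporary files are stored; by default the work folder under the es root directory.
path.logs: /path/to/logs
Where log files are stored; by default the logs folder under the es root directory.
path.plugins: /path/to/plugins
Where plugins are stored; by default the plugins folder under the es root directory.
bootstrap.mlockall: true
Set to true to lock the memory. es slows down once the jvm starts swapping, so make sure it does not swap: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for es. The elasticsearch process must also be allowed to lock memory; on linux this is done with `ulimit -l unlimited`.
network.bind_host: 192.168.0.1
The ip address to bind to, ipv4 or ipv6; 0.0.0.0 by default.
network.publish_host: 192.168.0.1
The ip address other nodes use to talk to this node. If it is not set it is determined automatically; the value must be a real ip address.
network.host: 192.168.0.1
Sets both bind_host and publish_host at once.
transport.tcp.port: 9300
The tcp port for communication between nodes; 9300 by default.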
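transport.tcp.compress: true
Whether data is compressed for tcp transport; false (no compression) by default.
http.port: 9200
The http port for external services; 9200 by default.
http.max_content_length: 100mb
The maximum content length; 100mb by default.
http.enabled: false
Whether to serve over http; true (enabled) by default.
gateway.type: local
The gateway type; local (the local file system) by default. It can be set to the local file system, a distributed file system, hadoop HDFS or amazon s3.

gateway.recover_after_nodes: 1
Start data recovery once N nodes of the cluster have started; 1 by default.
gateway.recover_after_time: 5m
The time to wait before the initial data recovery starts; 5 minutes by default.
gateway.expected_nodes: 2
The number of nodes expected in the cluster; 2 by default. Recovery starts immediately once these N nodes have started.

cluster.routing.allocation.node_initial_primaries_recoveries: 4
The number of concurrent recovery threads during initial data recovery; 4 by default.
cluster.routing.allocation.node_concurrent_recoveries: 2
The number of concurrent recovery threads when nodes are added or removed or when shards are rebalanced; 2 by default.
indices.recovery.max_size_per_sec: 0
The bandwidth limit for data recovery, for example 100mb; 0 (unlimited) by default.
indices.recovery.concurrent_streams: 5
The maximum number of concurrent streams opened when recovering data from other shards; 5 by default.
discovery.zen.minimum_master_nodes: 1
Makes sure a node can see N other master-eligible nodes in the cluster; 1 by default. Larger clusters can use a higher value (2-4).
discovery.zen.ping.timeout: 3s
The ping timeout when discovering other nodes in the cluster; 3 seconds by default. A higher value helps to avoid discovery errors on poor networks.

discovery.zen.ping.multicast.enabled: false
Whether multicast node discovery is enabled; true by default.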
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
The initial list of master nodes in the cluster; nodes that newly join the cluster are discovered through these nodes.

The following settings configure the slow log for searches:
index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

2.5.

2.5.1. elasticsearch-head
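elasticsearch-head is a cluster management tool for elasticsearch. It is a standalone web application written entirely in html5. It can be integrated into es as a plugin, or you can download the source and open index.html locally.
The git address is https://github.com/Aconex/elasticsearch-head

Installing it as a plugin:
1. elasticsearch/bin/plugin -install Aconex/elasticsearch-head
2. Start es
3. Open http://localhost:9200/_plugin/head/
If you do not want to integrate it into es, download the source from git and run it locally.
Enter the ip address and port of the es server and click connect to connect to the es cluster.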

node stats cluster nodes es


api json

info action info mapping


action

browser

Structured Query
product boolquerytitle price 10
100

Any Request rest es


api product 1
api es

2.5.2. elasticsearch-bigdesk
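bigdesk is a cluster monitoring tool for elasticsearch. It shows the state of the es cluster, such as cpu and memory usage, indexing and search activity, and the number of http connections. The git address is https://github.com/lukas-vlcek/bigdesk
Like head, it is a standalone web application and is used the same way as head.

Installing it as a plugin:
1. bin/plugin -install lukas-vlcek/bigdesk
2. Start es
3. Open http://localhost:9200/_plugin/bigdesk/
It can also be run locally by opening index.html. Enter the ip address and port to connect.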

cpu

jvm
jvm jvm heap
heap gc

es
cpu cpu

ps

Total virtual linux virtual memory map


++jar +jre
resident memory

tcp http

3. Modules
3.1.1. Cluster

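A cluster consists of several nodes, one of which is the master node; the master is chosen by election. The distinction between master and slave nodes only exists inside the es cluster. One of the ideas behind es is decentralization: seen from outside, the es cluster is logically a single whole, and talking to any one node is equivalent to talking to the whole es cluster.

3.1.2. Shards
An index shard. es can split a complete index into several shards, so that a large index can be distributed across different nodes to form a distributed search. The number of shards must be specified before the index is created and cannot be changed afterwards.

3.1.3. Replicas
An index replica. es can keep several replicas of an index. Replicas improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. They also improve query efficiency, because es load-balances search requests automatically.

3.1.4. Recovery
Data recovery, also called data redistribution. When a node joins or leaves, es reallocates the index shards according to machine load; a failed node also recovers its data when it restarts.

3.1.5. River
A river is a data source for es, and a way to synchronize data from other storage (such as a database) into es. It is an es service that exists as a plugin: it reads the data in the river and indexes it into es. Official rivers exist for couchDB, RabbitMQ, Twitter and Wikipedia. Rivers are covered in more detail later.

3.1.6. Gateway
The gateway is the persistent storage for es indices. By default es keeps the index in memory first and persists it to disk when memory is full. When the es cluster is shut down and restarted, the index data is read back from the gateway. es supports several gateway types: the local file system (default), a distributed file system, Hadoop HDFS and amazon s3.

3.1.7. discovery.zen
The automatic node discovery mechanism of es. es is a p2p based system in which nodes find each other automatically.

3.1.8. Transport
The way nodes inside es, or the cluster and its clients, talk to each other. Internally tcp is used by default; http (json), thrift, servlet, memcached and zeroMQ are also supported (integrated as plugins).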

4. Java API
4.1.
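Before using the elasticsearch java api you need a connection to es. There are two ways to connect: create a Node that joins the es cluster and use it as a client, or use a TransportClient to connect to es remotely.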

4.1.1. Node

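import static org.elasticsearch.node.NodeBuilder.*;

// start the node; it joins the cluster
Node node = nodeBuilder().node();
Client client = node.client();
// close the node when it is no longer needed
node.close();
When a node is started it joins an es cluster; which cluster is decided by the cluster.name setting. If the node should only act as a client and hold no es data, set node.data to false or node.client to true:
Node node = nodeBuilder().clusterName(clusterName).client(true).node();

For unit tests it is common to start es inside the test jvm. To keep the node from joining other nodes outside that JVM, set local to true:
Node node = nodeBuilder().local(true).node();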

4.1.2. TransportClient
The TransportClient connects to an es cluster remotely and does not join it. Give it the ip address of at least one es node:
Client client = new TransportClient()
    .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
    .addTransportAddress(new InetSocketTransportAddress("host2", 9300));
client.close();
If the cluster name is not the default elasticsearch, set it explicitly:
Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
Setting client.transport.sniff to true lets the client sniff the rest of the cluster and add the ip addresses of the other nodes to its own list:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);

4.2. put Mapping


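A mapping defines the field names of an index and their data types, much like defining columns when creating a table in a relational database. An es mapping is more flexible than a database schema, because fields can be added dynamically. Usually no mapping has to be specified at all: es derives the types from the data. A mapping has to be added by hand only when some fields need special attributes (another analyzer, whether the field is analyzed, whether it is stored, and so on).

A mapping can be defined in a configuration file or submitted at run time; either way is enough.
To define a mapping in a configuration file, put a [mapping name].json file into the directory config/mappings/[index name]. One mapping corresponds to one index. A default mapping can be defined by putting default-mapping.json into the config directory.
The json format is: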
{
    "mappings": {
        "properties": {
            "title": {
                "type": "string",
                "store": "yes"
            },
            "description": {
                "type": "string",
                "index": "not_analyzed"
            },
            "price": {
                "type": "double"
            },
            "onSale": {
                "type": "boolean"
            },
            "type": {
                "type": "integer"
            },
            "createDate": {
                "type": "date"
            }
        }
    }
}
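A mapping can also be submitted with a request. Below is the json for adding a mapping to the productIndex index. In this json, productIndex is the index type, the entries under properties are the fields, type is the data type, store says whether the field is stored, and "index":"not_analyzed" means the field is not analyzed.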
{
    "productIndex": {
        "properties": {
            "title": {
                "type": "string",
                "store": "yes"
            },
            "description": {
                "type": "string",
                "index": "not_analyzed"
            },
            "price": {
                "type": "double"
            },
            "onSale": {
                "type": "boolean"
            },
            "type": {
                "type": "integer"
            },
            "createDate": {
                "type": "date"
            }
        }
    }
}
With the java api, first create an empty index:

client.admin().indices().prepareCreate("productIndex").execute().actionGet();
Then put the mapping:
XContentBuilder mapping = jsonBuilder()
    .startObject()
        .startObject("productIndex")
            .startObject("properties")
                .startObject("title").field("type", "string").field("store", "yes").endObject()
                .startObject("description").field("type", "string").field("index", "not_analyzed").endObject()
                .startObject("price").field("type", "double").endObject()
                .startObject("onSale").field("type", "boolean").endObject()
                .startObject("type").field("type", "integer").endObject()
                .startObject("createDate").field("type", "date").endObject()
            .endObject()
        .endObject()
    .endObject();
PutMappingRequest mappingRequest =
    Requests.putMappingRequest("productIndex").type("productIndex").source(mapping);
client.admin().indices().putMapping(mappingRequest).actionGet();

4.3.
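Documents are indexed into es as json. The es java api provides XContentBuilder to build the json:

XContentBuilder doc = jsonBuilder()
    .startObject()
        .field("title", "this is a title!")
        .field("description", "descript what?")
        .field("price", 100)
        .field("onSale", true)
        .field("type", 1)
        .field("createDate", new Date())
    .endObject();
client.prepareIndex("productIndex", "productType").setSource(doc).execute().actionGet();
productIndex is the index name; one es cluster can hold several indices. productType is the index type, which separates different kinds of data within the same index.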

4.4.
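The delete api removes a json document from a specific index by id. Documents can be deleted in two ways: by id, or by a Query, in which case every document matching the query is deleted.
Deleting by id
The following example deletes the document with id 1 of type tweet from the index twitter:
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
    .execute()
    .actionGet();
Deleting by Query
The following example deletes every document in productIndex whose title contains query:
QueryBuilder query = QueryBuilders.fieldQuery("title", "query");
client.prepareDeleteByQuery("productIndex").setQuery(query).execute().actionGet();

Threading
When the delete api is executed on the same node that receives the request, the operationThreaded option decides whether the operation runs on a different thread or on the calling thread (the api is still asynchronous). By default operationThreaded is true, which means the operation runs on a different thread. The following sets it to false:
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
    .setOperationThreaded(false)
    .execute()
    .actionGet();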

http://www.elasticsearch.org/guide/reference/api/delete.html
http://www.elasticsearch.org/guide/reference/java-api/delete.html

4.5.
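An elasticsearch search executes a query expressed as json. In the java api this means building a QueryBuilder object. elasticsearch fully supports queryDSL style queries; QueryBuilder objects are built with QueryBuilders, and filters with FilterBuilders. Some QueryBuilder examples:
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb1 = termQuery("name", "kimchy");
QueryBuilder qb2 = boolQuery()
    .must(termQuery("content", "test1"))
    .must(termQuery("content", "test4"))
    .mustNot(termQuery("content", "test2"))
    .should(termQuery("content", "test3"));
QueryBuilder qb3 = filteredQuery(
    termQuery("name.first", "shay"),
    rangeFilter("age")
        .from(23)
        .to(54)
        .includeLower(true)
        .includeUpper(false)
);
qb1 builds a TermQuery on the name field; it corresponds to the lucene TermQuery. qb2 builds a BoolQuery, which corresponds to the lucene BooleanQuery; must, should and mustNot combine several QueryBuilder objects into a multi-condition query. qb3 builds a filtered query: a TermQuery with a RangeFilter on top, which restricts results to documents whose age is at least 23 and below 54.
elasticsearch supports many more kinds of query besides these three.
Once the Query is built it is passed to elasticsearch for execution:
SearchResponse response = client.prepareSearch("test")
    .setQuery(query)
    .setFrom(0).setSize(60).setExplain(true)
    .execute()
    .actionGet();
This searches the index test with the query query, starting at record 0 and returning at most 60 records.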

The result is returned as a SearchResponse, which can be traversed through its SearchHits:
SearchHits hits = response.hits();
for (SearchHit hit : hits) {
    System.out.println(hit.getSource().get("field"));
}
hit.getSource().get("field") returns the value of the field named field. Iterating over the SearchHits is safe even when the search returns fewer than 60 documents.

4.6.
elasticsearch supports adding or deleting documents in bulk. In the java api this is done by creating a BulkRequestBuilder, adding index/delete requests to the BulkRequestBuilder, and then executing the BulkRequestBuilder:
import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
    .setSource(jsonBuilder()
        .startObject()
            .field("user", "kimchy")
            .field("postDate", new Date())
            .field("message", "trying out Elastic Search")
        .endObject()
    )
);
bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
    .setSource(jsonBuilder()
        .startObject()
            .field("user", "kimchy")
            .field("postDate", new Date())
            .field("message", "another post")
        .endObject()
    )
);
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // handle the failures
}

4.7. MongoDB
elasticsearch river es es
couchDB mongodb mongodb
git elasticsearch-river-mongodb
aparo mongodb
id mongodb id

richardwilly98 mongodb oplog mongodb


oplog es
mongodb mongodb oplog
monogodb mongodb gridfs
local oplog
local
mongodb
local

Elasticsearch 0.19.X
MongoDB 2.X
mongodb mongodb oplog

elasticsearch-mapper-attachments gridfs
%ES_HOME%\bin\plugin.bat -install elasticsearch/elasticsearch-mapper-attachments/1.4.0
elasticsearch-river-mongodb
%ES_HOME%\bin\plugin.bat -install laigood/elasticsearch-river-mongodb/laigoodv1.0.0
river
curl
$ curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
    type: "mongodb",
    mongodb: {
        db: "test",
        host: "localhost",
        port: "27017",
        collection: "testdb",
        fields: "title,content",
        gridfs: "true",
        local_db_user: "admin",
        local_db_password: "admin",
        db_user: "user",
        db_password: "password"
    },
    index: {
        name: "test",
        type: "type",
        bulk_size: "1000",
        bulk_timeout: "30"
    }
}'
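db: the database to synchronize
host: the ip address of mongodb; localhost by default
port: the mongodb port
collection: the collection to synchronize
fields: the fields to synchronize
gridfs: whether the collection is a gridfs collection; set it to true for gridfs
local_db_user: the user name for the local database
local_db_password: the password for the local database
db_user: the user name for the database being synchronized
db_password: the password for the database being synchronized
name: the index name
type: the index type
bulk_size: the bulk size
bulk_timeout: the bulk timeout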
The same river created through the java api:
client.prepareIndex("_river", "testriver", "_meta")
    .setSource(
        jsonBuilder().startObject()
            .field("type", "mongodb")
            .startObject("mongodb")
                .field("host", "localhost")
                .field("port", 27017)
                .field("db", "testdb")
                .field("collection", "test")
                .field("fields", "title,content")
                .field("db_user", "user")
                .field("db_password", "password")
                .field("local_db_user", "admin")
                .field("local_db_password", "admin")
            .endObject()
            .startObject("index")
                .field("name", "test")
                .field("type", "test")
                .field("bulk_size", "1000")
                .field("bulk_timeout", "30")
            .endObject()
        .endObject()
    ).execute().actionGet();
The git address of the plugin is https://github.com/laigood/elasticsearch-river-mongodb

4.8. More like this



Lucene api MoreLikeThisElasticsearch
Elasticsearch More like this

json
{
    "more_like_this" : {
        "fields" : ["title", "content"],
        "like_text" : "text like this one"
    }
}

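fields: the fields to run the query against; defaults to _all
like_text: the text to find similar documents for

percent_terms_to_match: the percentage of terms that must match; defaults to 0.3
min_term_freq: the minimum frequency of a term in the source text, below which the term is ignored; defaults to 2
max_query_terms: the maximum number of terms in the generated query; defaults to 25
stop_words: a list of stop words
min_doc_freq: the minimum number of documents a term must appear in; terms below it are ignored
max_doc_freq: the maximum number of documents a term may appear in; terms above it are ignored
min_word_len: the minimum word length; defaults to 0
max_word_len: the maximum word length
boost_terms: the boost factor for terms; defaults to 1
boost: the boost of the query; defaults to 1
analyzer: the analyzer used to analyze the text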
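With the java api, documents similar to an existing document can be requested by id:

MoreLikeThisRequestBuilder mlt = new MoreLikeThisRequestBuilder(client,
    "indexName", "indexType", "id");
mlt.setField("title"); // the field to match on
SearchResponse response = client.moreLikeThis(mlt.request()).actionGet();
This looks up the document by id and is executed directly through the client.
As a Query:
MoreLikeThisQueryBuilder query = QueryBuilders.moreLikeThisQuery();
query.boost(1.0f).likeText("xxx").minTermFreq(10);
boost, likeText and the other parameters described above are set on the MoreLikeThisQueryBuilder.
To query a single field:
MoreLikeThisFieldQueryBuilder query =
    QueryBuilders.moreLikeThisFieldQuery("fieldName");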

5.
5.1.

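cluster.routing.allocation.allow_rebalance
Controls when rebalancing may happen, based on the state of all index shards in the cluster. Allowed values are always, indices_primaries_active and indices_all_active; the default is indices_all_active.

cluster.routing.allocation.cluster_concurrent_rebalance
How many shards may be rebalanced concurrently across the cluster; 2 by default.

cluster.routing.allocation.node_initial_primaries_recoveries
The number of initial primary recoveries allowed per node. Since the local gateway is used most of the time, these recoveries are fast and a node can handle more of them.
cluster.routing.allocation.node_concurrent_recoveries
How many concurrent recoveries are allowed on a node; 2 by default.
cluster.routing.allocation.disable_allocation
Disables allocation. This mainly makes sense when updated dynamically through the cluster update settings api.
cluster.routing.allocation.disable_replica_allocation
Disables replica allocation only. Like the previous setting, it is mainly useful through the api.
indices.recovery.concurrent_streams
The number of streams opened on a node to recover a shard from a peer shard; 5 by default.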

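Allocation awareness with rack_id
Shard and replica allocation can be made aware of generic attributes attached to nodes. Suppose there are several racks. When a node is started it can be given an attribute called rack_id (any attribute name works):
node.rack_id: rack_one
This sets the rack_id attribute of that node to rack_one. Next, rack_id has to be declared as an awareness attribute:
cluster.routing.allocation.awareness.attributes: rack_id
Now rack_id is used for awareness-based allocation of shards and their replicas. Suppose nodes are started with node.rack_id set to rack_one and an index with 5 shards and 1 replica is created: the index is fully deployed on those nodes. If more nodes are then started with node.rack_id set to rack_two, shards are relocated to even out the nodes, but a shard and its replica are never allocated under the same rack_id value.

Several awareness attributes can be given:
cluster.routing.allocation.awareness.attributes: rack_id,zone

Awareness can also be forced. Say there is an awareness attribute called zone with two known values, zone1 and zone2.

cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
cluster.routing.allocation.awareness.attributes: zone
If nodes are started with node.zone set to zone1 and an index with 5 shards and 1 replica is created, only the 5 primary shards are allocated. The replicas are allocated only once nodes with node.zone set to zone2 are started.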

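Shard allocation filtering with include/exclude
The allocation of an index to nodes can be controlled with include/exclude filters. Say every node carries an attribute called tag, for example node.tag: value1 on one node and node.tag: value2 on another.

An index that should only be deployed on nodes whose tag is value1 or value2 sets index.routing.allocation.include.tag to value1,value2:
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.tag" : "value1,value2"
}'
Conversely, setting index.routing.allocation.exclude.tag to value3 deploys the index on all nodes except those whose tag is value3:
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.exclude.tag" : "value3"
}'
The include and exclude values accept simple wildcards, for example value*. The special attribute name _ip matches on node ip addresses.

A node can carry several attributes:
node.group1: group1_value1
node.group2: group2_value4
and include and exclude can filter on several attributes in the same way:
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz"
}'
These settings can be updated in real time through the update settings api. Cluster-wide filters can also be defined and updated through the cluster update settings api. For example, to exclude a node by ip:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'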

5.2.
An Elasticsearch node holds several thread pools. The important ones, with their default types, are:
index: cached
search: cached
bulk: cached
refresh: cached
type
blocking
min: 1

size: 30
wait_time: 30s

cache
The cache thread pool is unbounded and spawns a thread whenever there are pending requests.

threadpool:
    index:
        type: cached
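fixed
The fixed thread pool holds a fixed number of threads, with a queue for pending requests that have no thread to serve them.
size sets the number of threads; it defaults to the number of cpu cores times 5.
queue_size sets the size of the queue of pending requests; it defaults to -1 (unbounded).
reject_policy decides what happens when the queue is full: abort (the default) fails the request, caller runs the request on the calling io thread.
threadpool:
    index:
        type: fixed
        size: 30
        queue: 1000
        reject_policy: caller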
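
For readers who know java.util.concurrent, the fixed configuration above maps roughly onto a ThreadPoolExecutor with a bounded queue. This is a sketch of the analogy only, assuming nothing about how Elasticsearch builds its pools internally; FixedPoolSketch and newFixedPool are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class FixedPoolSketch {

    // Rough JDK analogue of: type fixed, size <size>, queue <queueSize>, reject_policy caller.
    static ThreadPoolExecutor newFixedPool(int size, int queueSize) {
        return new ThreadPoolExecutor(
                size, size,                                  // core == max: a fixed number of threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize), // bounded queue of pending requests
                new ThreadPoolExecutor.CallerRunsPolicy());  // when full, run on the submitting thread
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newFixedPool(30, 1000);
        for (int i = 0; i < 5; i++) {
            final int n = i;
            pool.execute(new Runnable() {
                public void run() {
                    System.out.println("handling request " + n);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Swapping CallerRunsPolicy for ThreadPoolExecutor.AbortPolicy gives the behaviour described for abort: a request submitted to a full queue is rejected with an exception instead of being run by the caller.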
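blocking
The blocking pool takes a min (defaults to 1) and a size (defaults to the number of cpu cores times 5) for the number of threads, and a queue_size that defaults to 1000. Once the queue is full, the calling io thread waits for wait_time (defaults to 60 seconds) and the request fails if it has not been executed by then.

threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s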

5.3.

Java6Mustang 2006
Java7(Dolphin)

ElasticSearch Java6 7
Elasticsearch Java
ElasticSearch
ElasticSearch

Elasticsearch

Elasticsearch JVM
Elasticsearch 0.19.11
JVM

JVM parameter                          Elasticsearch default    Environment variable
-Xms                                   256m                     ES_MIN_MEM
-Xmx                                   1g                       ES_MAX_MEM
-Xms and -Xmx                                                   ES_HEAP_SIZE
-Xmn                                                            ES_HEAP_NEWSIZE
-XX:MaxDirectMemorySize                                         ES_DIRECT_SIZE
-Xss                                   256k
-XX:UseParNewGC
-XX:UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction     75
-XX:UseCMSInitiatingOccupancyOnly
-XX:UseCondCardMark                    (commented out)
Elasticsearch 256M 1GB

./bin/elasticsearch -f Elasticsearch
Elasticsearch
2GB RAM
ES_MIN_MEM/ES_MAX_MEM ES_HEAP_SIZE

ES_HEAP_NEWSIZE
ES_DIRECT_SIZE JVM NIO
64
Elasticsearch ( OOM)
Java JVM
JVM parameter              Garbage collector
-XX:+UseSerialGC           serial collector
-XX:+UseParallelGC         parallel collector
-XX:+UseParallelOldGC      Parallel compacting collector
-XX:+UseConcMarkSweepGC    Concurrent-Mark-Sweep (CMS) collector
-XX:+UseG1GC               Garbage-First collector (G1)

UseParNewGC UseConcMarkSweepGC
UseConcMarkSweepGC UseParNewGC Serial collector
Java6
CMSInitiatingOccupancyFraction CMSConcurrent-Mark-Sweep
75.
JVM 75%
GC
UseCondCardMark card table marking
store UseCondCardMark Garbage-First
card table marking
ElasticSearch
Apache Cassandra JVM


ElasticSearch
1. 1GB
2. gc 75%

3. Java7 G1 ElasticSearch Java7u4


JVM
JVM
Java

I/O
Java JVM
JVM
OOM

JVM
CMS Java

Java Java

Elasticsearch 128K
256K Java7 Java6 Java7
continuations Continuations

green threadfiber I/O



I/OElasticsearch Netty I/O Guava Elasticsearch
Java7
CPU
JVM CPU
The default Xss depends on the JVM and the platform: Solaris Sparc 64-bit 512K, Solaris X86 320K, Linux 256K, Windows 32-bit Java6 320K, Windows 64-bit 1024K.

GB G

Java 2006 Java6


GC stop - the - world CMS
GC
Prateek Khanna Aaron Morton CMS
Stop-the-world
Elasticsearch CMS GC
CMS
MB CMS

MB Lucene segment-based
CMS Lucene
index.merge.policy.segments_per_tier

Java JVM


GC

Java JDK 7u4 Garbage-FirstG1 Java7


G1

1. 50% Java
2. promotion
3. gc compaction 0.5 1s
G1
G1 CPU
CPU CMS
Elasticsearch G1 stop-the-world
buffer memory I/O
G1 CPU

1.
2. log everything
3.
4.
5.
6.
7.
Elasticsearch
Elasticsearch GC warns
[2012-11-26 18:13:53,166][WARN ][monitor.jvm ] [Ectokid] [gc][ParNew][1135087][11248] duration [2.6m], collections [1]/[2.7m], total [2.6m]/[6.8m], memory [2.4gb]->[2.3gb]/[3.8gb], all_pools {[Code Cache] [13.7mb]->[13.7mb]/[48mb]}{[Par Eden Space] [109.6mb]->[15.4mb]/[1gb]}{[Par Survivor Space] [136.5mb]->[0b]/[136.5mb]}{[CMS Old Gen] [2.1gb]->[2.3gb]/[2.6gb]}{[CMS Perm Gen] [35.1mb]->[34.9mb]/[82mb]}
The warning is written by JvmMonitorService. Its fields read as follows:

Logfile                                              Explanation
gc                                                   a gc run
ParNew                                               new parallel garbage collector
duration 2.6m                                        the gc took 2.6 minutes
collections [1]/[2.7m]                               one collection, 2.7 minutes in total
memory [2.4gb]->[2.3gb]/[3.8gb]                      memory usage: 2.4gb before, 2.3gb now, 3.8gb in total
Code Cache [13.7mb]->[13.7mb]/[48mb]                 memory used by the code cache
Par Eden Space [109.6mb]->[15.4mb]/[1gb]             memory used by Par Eden Space
Par Survivor Space [136.5mb]->[0b]/[136.5mb]         memory used by Par Survivor Space
CMS Old Gen [2.1gb]->[2.3gb]/[2.6gb]                 memory used by CMS Old Gen
CMS Perm Gen [35.1mb]->[34.9mb]/[82mb]               memory used by CMS Perm Gen
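
When many such warnings have to be scanned, the memory field can be pulled out with a regular expression. The helper below is a hypothetical sketch written for this log format as shown above; GcLogSketch and memory are illustrative names, not part of Elasticsearch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogSketch {
    // Matches the "memory [before]->[after]/[total]" part of the warning.
    private static final Pattern MEMORY =
            Pattern.compile("memory \\[([^\\]]+)\\]->\\[([^\\]]+)\\]/\\[([^\\]]+)\\]");

    // Returns {before, after, total}, or null when the line has no memory field.
    static String[] memory(String logLine) {
        Matcher m = MEMORY.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String line = "duration [2.6m], collections [1]/[2.7m], total [2.6m]/[6.8m], "
                + "memory [2.4gb]->[2.3gb]/[3.8gb]";
        String[] mem = memory(line);
        System.out.println("before=" + mem[0] + " after=" + mem[1] + " total=" + mem[2]);
    }
}
```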
JvmMonitorSer

1. Java 6u22 Elasticsearch bug


bug Elasticsearch OpenJDK 6
Sun/Oracle bug
2. Java6 Java7Oracle Java6 2013 2
Elasticsearch JVM
Java

3. Java sa Java
Java
4. Elasticsearch Elasticsearch
3
5. JVM
Elasticsearch
index.merge.policy.segments_per_tier parameter
6.
7.
8. CMS -XX:CMSWaitDuration
9. 6-8GB CMS
stop-the-world CMSInitiatingOccupancyFraction
GC G1
10. JVM java


java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version

6.
6.1. Guice
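elasticsearch uses guice, the open source dependency injection framework from google. The project claims to be 100 times faster than spring; since its code is much leaner than spring, being faster is plausible.
guice injects and manages dependencies through code.
elasticsearch does not depend on guice as an external library: es bundles the source of many open source projects in its own jar (which is why the es jar is about 10M). The guice source lives in
org.elasticsearch.common.inject
In Guice, bindings are declared in a Module. Inside the Module,
bind(A).to(B) tells Guice which implementation to inject for a type: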
public class BillingModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(TransactionLog.class).to(DatabaseTransactionLog.class);
        bind(CreditCardProcessor.class).to(PaypalCreditCardProcessor.class);
        bind(BillingService.class).to(RealBillingService.class);
    }
}
AbstractModule
bind("interface").to("implement")
public class RealBillingService implements BillingService {
    private final CreditCardProcessor processor;
    private final TransactionLog transactionLog;

    @Inject
    public RealBillingService(CreditCardProcessor processor,
                              TransactionLog transactionLog) {
        this.processor = processor;
        this.transactionLog = transactionLog;
    }

    public Receipt chargeOrder(PizzaOrder order, CreditCard creditCard) {
        try {
            ChargeResult result = processor.charge(creditCard, order.getAmount());
            transactionLog.logChargeResult(result);
            return result.wasSuccessful()
                ? Receipt.forSuccessfulCharge(order.getAmount())
                : Receipt.forDeclinedCharge(result.getDeclineMessage());
        } catch (UnreachableException e) {
            transactionLog.logConnectException(e);
            return Receipt.forSystemFailure(e.getMessage());
        }
    }
}
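The constructor of RealBillingService is annotated with @Inject. When an instance is requested, the Guice Injector sees the @Inject constructor and passes in the implementations bound to CreditCardProcessor and TransactionLog.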
public static void main(String[] args) {
    Injector injector = Guice.createInjector(new BillingModule());
    BillingService billingService = injector.getInstance(BillingService.class);
    ...
}
main Injector Guice
elasticsearch

elasticsearch guice ModulesBuilder es es

PluginsModule
SettingsModule
NodeModule
NetworkModule
NodeCacheModule
ScriptModule
JmxModule (jmx)
EnvironmentModule
NodeEnvironmentModule
ClusterNameModule
ThreadPoolModule
DiscoveryModule
ClusterModule
RestModule (rest)
TransportModule (tcp)
HttpServerModule (http)
RiversModule (river)
IndicesModule
SearchModule
ActionModule
MonitorModule
GatewayModule
NodeClientModule

6.2.
elasticsearch

java api es json es


XContent IndexRequest
indextype id ides UUID
uuid IndexRequest process
if (allowIdGeneration) {
    if (id == null) {
        id(UUID.randomBase64UUID());
        opType(IndexRequest.OpType.CREATE);
    }
}
netty TransportService tcp es rest
http
TransportAction
TransportShardReplicationOperationAction AsyncShardOperationAction.start()
id

private int shardId(ClusterState clusterState, String index, String type, @Nullable String id, @Nullable String routing) {
    if (routing == null) {
        if (!useType) {
            return Math.abs(hash(id) % indexMetaData(clusterState, index).numberOfShards());
        } else {
            return Math.abs(hash(type, id) % indexMetaData(clusterState, index).numberOfShards());
        }
    }
    return Math.abs(hash(routing) % indexMetaData(clusterState, index).numberOfShards());
}
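
The routing rule in shardId() can be tried out in isolation. The sketch below is a simplified stand-in, not the Elasticsearch implementation: ShardRoutingSketch is a hypothetical class, String.hashCode replaces the hash function (which the excerpt does not show), and the useType branch is left out.

```java
final class ShardRoutingSketch {

    // Stand-in hash; the real hash function is not part of the excerpt above.
    static int hash(String value) {
        return value.hashCode();
    }

    // Same shape as shardId(): an explicit routing value takes precedence over the id.
    static int shardId(String id, String routing, int numberOfShards) {
        String key = (routing != null) ? routing : id;
        return Math.abs(hash(key) % numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (String id : new String[] {"1", "2", "3"}) {
            System.out.println("doc " + id + " -> shard " + shardId(id, null, shards));
        }
        // Documents that share a routing value always land on the same shard.
        System.out.println(shardId("1", "user42", shards) == shardId("2", "user42", shards));
    }
}
```

Because the shard is derived from the hash modulo the number of shards, changing the number of shards would send existing ids to different shards, which is consistent with the shard count being fixed once an index is created.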

TransportIndexAction.shardOperationOnPrimary
routing
MappingMetaData mappingMd =
    clusterState.metaData().index(request.index()).mappingOrDefault(request.type());
if (mappingMd != null && mappingMd.routing().required()) {
    if (request.routing() == null) {
        throw new RoutingMissingException(request.index(), request.type(), request.id());
    }
}
INDEX id
CREATE id

if (request.opType() == IndexRequest.OpType.INDEX)
InternalIndexShard
Engine.Index index = indexShard.prepareIndex(sourceToParse)
    .version(request.version())
    .versionType(request.versionType())
    .origin(Engine.Operation.Origin.PRIMARY);
indexShard.index(index);
InternalIndexShardtype mapping
json mapping ParsedDocument
public Engine.Index prepareIndex(SourceToParse source) throws ElasticSearchException {
    long startTime = System.nanoTime();
    DocumentMapper docMapper = mapperService.documentMapperWithAutoCreate(source.type());
    ParsedDocument doc = docMapper.parse(source);
    return new Engine.Index(docMapper, docMapper.uidMapper().term(doc.uid()), doc).startTime(startTime);
}
RobinEngine then writes the document to lucene. The lucene calls are made in RobinEngine.innerIndex:
    if (currentVersion == -1) {
        // document does not exists, we can optimize for create
        if (index.docs().size() > 1) {
            writer.addDocuments(index.docs(), index.analyzer());
        } else {
            writer.addDocument(index.docs().get(0), index.analyzer());
        }
    } else {
        if (index.docs().size() > 1) {
            writer.updateDocuments(index.uid(), index.docs(), index.analyzer());
        } else {
            writer.updateDocument(index.uid(), index.docs().get(0), index.analyzer());
        }
    }

Finally the operation is written to the Translog. The Translog records every operation made since the last flush:

    Translog.Location translogLocation = translog.add(new Translog.Create(create));
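
The role of the Translog can be pictured with a toy write-ahead log. The sketch below is an assumption-laden model and not the es implementation: operations are plain strings, everything lives in memory, and flush simply moves the pending operations to a committed list. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a transaction log: record first, make durable at flush.
class TranslogSketch {
    private final List<String> pending = new ArrayList<>();   // operations since the last flush
    private final List<String> committed = new ArrayList<>(); // operations covered by a flush

    // Records an operation and returns its position in the pending log.
    int add(String operation) {
        pending.add(operation);
        return pending.size() - 1;
    }

    // A flush commits everything recorded so far and empties the pending log.
    void flush() {
        committed.addAll(pending);
        pending.clear();
    }

    // After a crash, only operations that were never flushed need to be replayed.
    List<String> operationsToReplay() {
        return new ArrayList<>(pending);
    }

    int committedCount() {
        return committed.size();
    }
}
```

In this model the pending list plays the part of the log: anything still in it at restart is what a recovery would have to apply again.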

7.
7.1.
Corrupted lucene index files are a problem that elasticsearch and solr share. The symptom is an exception such as:

Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")
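The exception means that a lucene index file is damaged.

elasticsearch is built on lucene, so repairing the lucene index also repairs the elasticsearch index.

luke can be used to inspect a lucene index. lucene itself ships the CheckIndex tool in the lucene-core jar, in the package org.apache.lucene.index. CheckIndex tests every segment of an index. With the -fix option it writes a new segments file that no longer references the broken segments, so the documents in those segments are lost. The jar is already in the es lib directory: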
cd es_home/lib
java -cp lucene-core-3.6.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/

Segments file=segments_2cg numSegments=26 version=3.6.1 format=FORMAT_3_1 [Lucene 3.1+] userData={translog_id=1347536741715}

1 of 26: name=_59ct docCount=4711242
compound=false
hasProx=true
numFiles=9
size (MB)=6,233.694
diagnostics = {mergeFactor=13, os.version=2.6.32-71.el6.x86_64, os=Linux, lucene.version=3.6.1 1362471 - thetaphi - 2012-07-17 12:40:12, source=merge, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_59ct_1b.del]
test: open reader.........OK [3107 deleted docs]
test: fields..............OK [25 fields]
test: field norms.........OK [10 fields]
test: terms, freq, prox...OK [36504908 terms; 617641081 terms/docs pairs; 742052507 tokens]
test: stored fields.......ERROR [read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")]
java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")
at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java:307)
at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:400)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:253)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:492)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:1138)
at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:852)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:581)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1064)
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
FAILED

WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Stored Field test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:593)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1064)
WARNING: 1 broken segments (containing 4708135 documents) detected
WARNING: 4708135 documents will be lost
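The output shows that the file _59ct.fdt in shard 5 is damaged. A .fdt file is where lucene keeps stored fields, which is why the failure is reported under test: stored fields. The broken segment contains 4708135 live documents (4711242 minus 3107 deleted). Running CheckIndex again with -fix repairs the index: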

java -cp lucene-core-3.6.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/ -fix

NOTE: will write new segments file in 5 seconds; this will remove 4708135 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_2ch"
The repair removes 4708135 documents from the index. To restore these 4708135 documents, index them again by id.

7.2.

7.2.1. gc
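A long gc pause stops the whole jvm. While a node is stuck in gc, the pings from the master go unanswered, and after 3 failed pings zen discovery removes the node from the cluster (3 ping failures is the default).

Two remedies: 1) tune gc so that gc pauses become shorter; 2) adjust the zen discovery settings of es by raising ping_retries and ping_timeout.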
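
To judge how far ping_retries and ping_timeout need to be raised, a rough budget helps. The sketch below assumes that a node is dropped once ping_retries consecutive pings have each waited the full ping_timeout, and it ignores the interval between pings; the real fault detection in es may behave differently. The class name and the numbers in the usage note are hypothetical.

```java
import java.time.Duration;

// Hypothetical estimate of the longest gc pause a node can survive without being dropped.
class PingBudgetSketch {

    // Assumption: the node is removed after pingRetries pings that each time out.
    static Duration tolerablePause(int pingRetries, Duration pingTimeout) {
        return pingTimeout.multipliedBy(pingRetries);
    }

    // True when the pause ends before the last retry has timed out.
    static boolean survives(Duration gcPause, int pingRetries, Duration pingTimeout) {
        return gcPause.compareTo(tolerablePause(pingRetries, pingTimeout)) < 0;
    }
}
```

With an assumed timeout of 30 seconds, 3 retries tolerate a pause of up to 90 seconds under this model, so a 120 second pause is survived only after the retries or the timeout are raised.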

7.2.2. out of memory


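es keeps field values in the Field Data Cache, which sorting and facet requests read from. By default entries are never evicted, so on a large index the cache can grow until the node runs out of memory.

1) Make the cache in es use Soft Reference. When memory runs low, the Java garbage collector clears softly referenced objects before it throws OutOfMemory, and the reference then returns null. A Cache built on soft references therefore gives its memory back before an OutOfMemory occurs. In es, set index.cache.field.type: soft.
2) Bound the cache in es. index.cache.field.max_size: 50000 caps the field cache at 50000 entries, and index.cache.field.expire: 10m expires entries after 10 minutes.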
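
The Soft Reference behaviour described in remedy 1 can be seen in a few lines of plain Java. This is a minimal sketch of a soft-reference cache, not the field cache of es; the class name is hypothetical.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache whose values the garbage collector may reclaim under memory pressure.
class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns null when the key was never cached or its value has been reclaimed.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key); // drop the cleared reference
        }
        return value;
    }
}
```

A caller has to treat a null result as a cache miss and reload the value, which is the price of letting the jvm reclaim the memory.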

7.2.3.
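es reports RecoverFilesRecoveryException[[index][3] Failed to transfer [215] files with total size of [9.4gb]]; nested: OutOfMemoryError[unable to create new native thread]; ]]

As with too many open file, the limit that is hit here is set by the operating system. The number of threads a jvm can create depends on the memory left outside the heap divided by the thread stack size.

On linux the number of threads is also capped by max user processes, which is 1024 by default.

1) Reduce the jvm heap, or lower the thread stack size with xss, for example to 512K.

2) In /etc/security/limits.d/90-nproc.conf, raise the soft nproc limit above its default of 1024.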

7.2.4.
[7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]
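The message says that the shard has 4 copies (shardIt) and that only 2 of them are active.

es writes with quorum consistency by default, and the quorum is (number of copies)/2+1 in integer arithmetic.

With 2 copies, 2/2+1=2, so the quorum is no longer met as soon as 1 copy is unavailable.

One fix is to set the write consistency to one.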
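
The quorum rule can be checked against the numbers in the error message. The sketch below assumes exactly the formula given in this section, copies/2+1 with integer division; es may special-case small copy counts, which this model ignores. The class and method names are hypothetical.

```java
// Hypothetical model of the quorum check applied before a write.
class WriteQuorumSketch {

    // Quorum as given in the text: copies / 2 + 1, with integer division.
    static int quorum(int shardCopies) {
        return shardCopies / 2 + 1;
    }

    // A write goes ahead only when enough copies are active.
    static boolean writeAllowed(int shardCopies, int activeCopies) {
        return activeCopies >= quorum(shardCopies);
    }
}
```

For the message above, quorum(4) is 3, so 2 active copies are not enough and the bulk request waits until it times out. With write consistency one, a single active copy would be sufficient.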

7.2.5. jvm
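With bootstrap.mlockall: true set, es logs the warning Unknown mlockall error 0 at startup. The cause is the linux limit on the memory a process may lock, which is 45k here. On linux, lift the limit with ulimit -l unlimited before starting es.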

7.2.6. api

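When documents are deleted by id with deleteByQuery, the ids are combined in a BoolQuery. A BoolQuery accepts at most 1024 clauses, so the ids have to be split into batches, for example 100 per query. For deleting many ids, es also offers bulkRequest.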
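
Splitting a long id list into batches that stay under the clause limit needs no es code. The helper below is a hypothetical sketch; the batch size of 100 in the usage note follows the figure mentioned above, and the actual delete call for each batch is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that cuts a list of ids into batches of at most batchSize.
class IdBatchSketch {

    static List<List<String>> batches(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            result.add(new ArrayList<>(ids.subList(from, to))); // copy, so batches are independent
        }
        return result;
    }
}
```

With 250 ids and a batch size of 100 this yields three batches of 100, 100, and 50 ids, each of which fits in one BoolQuery or one bulk request.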
