软糖

Elastic Search Step by Step 1

This is follow up Elastic Search - Getting Start Post

More details and step by step instruction

Mapping and indexing data

Mapping

A mapping is a schema definition.

1
2
3
4
5
6
7
8
9
10
11
12
13
curl -X PUT http://{{host}}:9200/movies -H 'Content-Type: application/json' -d '
{
"mappings": {
"movie": {
"properties": {
"year": { "type": "date" }
}
}
}
}
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/movies/_mapping/movie?pretty
'

Import Document

1
2
3
4
5
6
7
8
curl -X PUT http://{{host}}:9200/movies/movie/109487 -H 'Content-Type: application/json' -d '
{
"genre": ["IMAX", "Sci-Fi"],
"title": "Interstella",
"year": 2014
}'
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/movies/movie/_search?pretty

Bulk import

1
2
3
wget http://media.sundog-soft.com/es/movies.json
curl -X PUT http://{{host}}:9200/_bulk?pretty --data-binary @movies.json -H 'Content-Type: application/json'

Update

Every document has a _version field

Elasticsearch documents are immutable.

  • a new doc is created with an incremented _version
  • the old document is marked for deletion
1
2
3
4
5
6
7
8
9
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/movies/movie/109487?pretty
curl -X POST http://{{host}}:9200/movies/movie/109487/_update -H 'Content-Type: application/json' -d '
{
"doc": {
"title": "instellar"
}
}
'

create doc with same id, it also increase _version field. such as:

1
2
3
4
5
6
curl -X PUT http://{{host}}:9200/movies/movie/109487 -H 'Content-Type: application/json' -d '
{
"genre": ["IMAX", "Sci-Fi"],
"title": "Interstella",
"year": 2014
}'

Delete

1
2
3
4
5
# first of all, search id
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/movies/_search?q=Dark_
# delete by id
curl -H 'Content-Type: application/json' -X DELETE http://{{host}}:9200/movies/movie/109487?pretty

Concurrency

Optimistic concurrency control

In short, ELK use _version field to control version of data

For conflict _version: Use retry_on_conflicts+N to automatically retry

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
curl -H 'Content-Type: application/json' -X PUT http://{{host}}:9200/movies/movie/109487?version=3 -d '
{
"genre": ["Imax", "Sci-Fi"],
"title": "Interstellar Foo",
"year": 2014,
}
'
# keey to retry if conflict on data
curl -H 'Content-Type: application/json' -X POST http://{{host}}:9200/movies/movie/109487/_update?retry_on_conflict=5 -d '
{
"doc": {
"title": "Interstellar"
}
}
'

Data modeling

Normalized data VS Denormalized data

think about Denormalized data in ELK

Strategies for relational data

In short answer, join field (ELK 6)

parent and child data should be store in same shards

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
curl -H 'Content-Type: application/json' -X PUT http://{{host}}:9200/series -d '
{
"mappings": {
"movie": {
"properties": {`
"film_to_franchise": {
"type": "join", "relations": { "franchise": "film" }
}
}
}
}
}
wget http://media.sundog-soft.com/es6/series.json
curl -H 'Content-Type: application/json' -X PUT http://{{host}}:9200/_bulk --data-binary @series.json
'
# Query by parent
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/series/movie/_search?pretty -d '
{
"query": {
"has_parent": {
"parent_type": "franchise",
"query": {
"match": { "title": "Star War" }
}
}
}
}
'
# Query by child
curl -H 'Content-Type: application/json' -X GET http://{{host}}:9200/series/movie/_search?pretty -d '
{
"query": {
"has_child": {
"type": "film",
"query": {
"match": { "title": "The Force Awakens" }
}
}
}
}