1. Overview
Upside
- Easy to set up
- Abstracts away low-level details
- Scales beautifully
- Feature-rich out of the box
Downside
- Poorly managed indices
- Inefficient queries
- Web-facing clusters
- Used as the primary data store
Elasticsearch is a robust, highly available, distributed search and analytics engine.
2. Goals
- Lightning-fast search
- Scalable
- Highly available
- Distributed: no single point of failure
- Analytics engine
  - Aggregations
  - Log analysis
  - Geo-location data
  - Machine learning
- Near real-time (NRT)
  - Add doc -> inverted index -> available for search
- Powerful REST API
  - Search API: localhost:9200/<index>/_search?q=*&pretty
  - Query DSL (JSON-based)
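The NRT pipeline (add doc -> inverted index -> available for search) can be sketched with a toy inverted index. This is a minimal illustration, not how Lucene actually stores postings: the class name `InvertedIndex` and its crude tokenizer are hypothetical. In Elasticsearch itself, a newly indexed document becomes searchable after the next refresh (by default about once per second), which is what "near real-time" refers to.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index (hypothetical): maps each term to the IDs of the docs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.docs = {}                    # doc_id -> original text

    @staticmethod
    def tokenize(text):
        # Crude analyzer: lowercase, then split on anything that is not a letter or digit.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def add(self, doc_id, text):
        # "Add doc -> inverted index": record every term of the new document.
        self.docs[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # "-> available for search": IDs of docs containing ALL query terms.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add(1, "The Matrix")
idx.add(2, "The Matrix Reloaded")
idx.add(3, "Finding Nemo")
print(sorted(idx.search("matrix")))           # [1, 2]
print(sorted(idx.search("matrix reloaded")))  # [2]
```

Looking up a term is a single dictionary access, which is why search stays fast as the number of documents grows.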
3. Install
4. Terms
4.1 Compare to RDBMS
| Relational DB | Elasticsearch |
|---|---|
| database | index |
| table | type |
| row | document |
| column | field |
4.2 Index
An index is a logical namespace that points to one or more shards in an Elasticsearch cluster.
Think of shards as disk partitions, or containers of data.
An index is where data are stored, in the form of documents.
- Index is broken into shards
- Shards are containers for data
4.3 Type
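A representation of a class of similar documents.
For example, a document is addressed as index/type/id (e.g. movies/movie/1).
One type per index (enforced since Elasticsearch 6.0; types were deprecated in 7.0 and removed in 8.0).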
4.4 Document
A document is an individual entry (a JSON object) and the primary unit for adding data to Elasticsearch.
4.5 Field
A field is an individual key-value entry within an Elasticsearch document.
4.6 Example
| Object | Elasticsearch |
|---|---|
| Movies | index |
| Movie | type |
| Row | document |
| TITLE :: “text” | field |
| RATING :: “keyword” | field |
| ACTOR_COUNT :: “integer” | field |
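The movie example can be written out as the JSON a client would send. This sketch assumes Elasticsearch 7 or later, where mappings are typeless (no `movie` type level) and a document is indexed with `PUT /movies/_doc/<id>`; the lowercase field names and the sample values are invented for illustration.

```python
import json

# Hypothetical document for the "movies" index (values are invented).
movie = {
    "title": "The Matrix",  # text: analyzed for full-text search
    "rating": "R",          # keyword: exact match only
    "actor_count": 4,       # integer
}

# Mapping body that would be sent with PUT /movies (Elasticsearch 7+ typeless style).
mappings = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "rating": {"type": "keyword"},
            "actor_count": {"type": "integer"},
        }
    }
}

print(json.dumps(mappings, indent=2))
print(json.dumps(movie))
```

Declaring the mapping up front, rather than relying on dynamic mapping, keeps `rating` as an exact-match keyword instead of letting Elasticsearch guess the type.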
4.7 Mapping
A mapping is a schema definition for the documents in an index. It defines:
- field types
  - text, keyword, byte, short, integer, long, float, double, boolean, date
- field index
  - should this field be queryable? true/false
- field analyzer
  - defines the character filters, tokenizer, and token filters; built-in choices include standard / whitespace / simple / english
    - character filters: e.g. strip HTML encoding, convert & to and
    - tokenizer: split strings on whitespace / punctuation / non-letters
    - token filters: lowercasing, stemming, synonyms, stopwords
  - standard: splits on word boundaries and lowercases
  - simple: splits on anything that isn't a letter, and lowercases
  - whitespace: splits on whitespace but doesn't lowercase
  - language (e.g. english): accounts for language-specific stopwords and stemming
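The three analyzer stages (character filters -> tokenizer -> token filters) can be imitated in a few lines. This is a simplified sketch, not Elasticsearch's implementation: the function names and the tiny stopword list are hypothetical. In a real mapping the equivalent pieces would be built-in components such as the `html_strip` and `mapping` character filters and the `lowercase` and `stop` token filters; note that the built-in standard analyzer does not remove stopwords by default.

```python
import html
import re

STOPWORDS = {"a", "an", "and", "the", "of"}  # tiny illustrative stopword list


def char_filter(text):
    # Character filter: decode HTML entities, strip tags, and convert "&" to "and".
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def word_tokenizer(text):
    # Tokenizer: split on anything that is not a letter or digit
    # (a rough approximation of splitting on word boundaries).
    return re.findall(r"[A-Za-z0-9]+", text)


def whitespace_tokenizer(text):
    # Splits on whitespace only; punctuation and case are left untouched.
    return text.split()


def token_filters(tokens):
    # Token filters: lowercase, then drop stopwords.
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filters(word_tokenizer(char_filter(text)))


print(analyze("<b>Tom &amp; Jerry</b>: The Movie"))
# ['tom', 'jerry', 'movie']
print(whitespace_tokenizer("Tom & Jerry: The Movie"))
# ['Tom', '&', 'Jerry:', 'The', 'Movie']
```

The same analysis runs on the query text at search time, so a search for "JERRY" produces the token `jerry` and matches the indexed title.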
Analyzer
Sometimes text fields should be exact-match:
- use the keyword mapping type to suppress analysis (exact match only)
- use the text type to allow analysis
A search on an analyzed field will return anything even remotely relevant.
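The difference between keyword and text fields can be shown with a toy matcher. The sketch assumes a simplified text analysis (lowercase, split on non-alphanumerics) and OR semantics between query terms, which is the default for a `match` query; `text_match` and `keyword_match` are hypothetical helpers, and real Elasticsearch additionally ranks text matches by relevance score.

```python
import re


def analyze(text):
    # Simplified "text" analysis: lowercase and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def text_match(field_value, query):
    # text field: matches if ANY analyzed query term appears in the analyzed value.
    return bool(set(analyze(query)) & set(analyze(field_value)))


def keyword_match(field_value, query):
    # keyword field: no analysis, so the whole value must match exactly (case-sensitive).
    return field_value == query


titles = ["New York", "New Jersey", "York"]
print([t for t in titles if text_match(t, "new york")])     # ['New York', 'New Jersey', 'York']
print([t for t in titles if keyword_match(t, "New York")])  # ['New York']
```

This is why a `rating` or status field usually belongs in a keyword field: filtering on it should never pull in partial matches.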
5. Shards and Replicas
Shards
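- Default: five primary shards per index (before Elasticsearch 7.0; the default has been one since 7.0)
- Make Elasticsearch distributed: an index's shards are spread across the nodes
- Automatically rebalanced across nodes, e.g. after a node fails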
Replicas
Replicas are copies of primary shards.
- A replica takes over (is promoted to primary) if its primary fails
- A node rejoins the cluster after recovering from a failure
- A new node synchronizes its data with the others
Example
2 nodes, 5 shards and 1 replica
N = Primary shard
R = Replica shard
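The 2-node, 5-shard, 1-replica layout can be reproduced with a toy allocator. It relies on one rule Elasticsearch really enforces: a replica is never placed on the same node as its primary, so on a single node every replica stays unassigned. Everything else is an assumption: the function `allocate`, the node names, and the simple rotation are hypothetical (the real allocator also weighs disk usage, awareness attributes, and more), and the labels use P for primaries where these notes' legend uses N.

```python
def allocate(num_nodes, num_primaries, num_replicas):
    """Toy shard allocator (hypothetical): places shard copies on nodes so that
    no node holds two copies of the same shard."""
    names = [f"node{n}" for n in range(1, num_nodes + 1)]
    layout = {name: [] for name in names}
    unassigned = []
    for shard in range(num_primaries):
        # Primary first, then its replicas.
        copies = [f"P{shard}"] + [f"R{shard}"] * num_replicas
        for i, copy in enumerate(copies):
            if i < num_nodes:
                # Offset by the shard number so primaries alternate between nodes;
                # distinct offsets i guarantee distinct nodes for one shard's copies.
                layout[names[(shard + i) % num_nodes]].append(copy)
            else:
                # Not enough nodes to keep the copies apart: leave it unassigned.
                unassigned.append(copy)
    return layout, unassigned


layout, unassigned = allocate(num_nodes=2, num_primaries=5, num_replicas=1)
for node, copies in layout.items():
    print(node, copies)
# node1 ['P0', 'R1', 'P2', 'R3', 'P4']
# node2 ['R0', 'P1', 'R2', 'P3', 'R4']
print("unassigned:", unassigned)  # unassigned: []
```

If either node fails, the surviving node still holds a copy of every shard, so its replicas can be promoted and the index stays fully available.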
- The number of primary shards cannot be changed after the index is created
- The number of replicas can be changed at any time
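The reason the primary shard count is fixed is document routing: the `_routing` documentation describes the target shard as roughly `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document ID. Changing the divisor would send lookups for existing documents to the wrong shard, while replicas are plain copies and do not enter the formula. The sketch below is illustrative only: Elasticsearch hashes with murmur3, and `zlib.crc32` is a stand-in, so the actual shard numbers differ; `route` is a hypothetical helper.

```python
import zlib


def route(routing_value, num_primary_shards):
    """Pick a shard: hash(routing) % number_of_primary_shards.
    zlib.crc32 stands in for Elasticsearch's murmur3 hash (illustration only)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


doc_ids = [f"movie-{i}" for i in range(20)]
with_5 = {d: route(d, 5) for d in doc_ids}
with_6 = {d: route(d, 6) for d in doc_ids}

# If the shard count could simply change, these documents would be looked up
# on a shard that does not hold them.
moved = [d for d in doc_ids if with_5[d] != with_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would route to a different shard")
```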
6. Bulk API
The Bulk API lets you index multiple documents in a single request.
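A bulk request body is newline-delimited JSON (NDJSON): an action line such as `{"index": {...}}` followed by the document source, with the whole body ending in a newline. Below is a minimal sketch that builds such a body. The `movies` index, the IDs, and the helper name `bulk_body` are illustrative assumptions. The result would be sent with `POST /_bulk` and the header `Content-Type: application/x-ndjson`.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk (hypothetical helper): one action
    line followed by one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


docs = [
    ("1", {"title": "The Matrix", "rating": "R", "actor_count": 4}),
    ("2", {"title": "Finding Nemo", "rating": "G", "actor_count": 3}),
]
print(bulk_body("movies", docs), end="")
```

Each JSON object must stay on a single line, which is why `json.dumps` is called without `indent`.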
7. REST Queries
| Operation | HTTP verb |
|---|---|
| Create | POST (or PUT with an explicit ID) |
| Read | GET |
| Update | POST (partial) / PUT (whole) |
| Delete | DELETE |
curl -X <VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -H 'Content-Type: application/json' -d '<BODY>'
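The curl template maps directly onto an HTTP request. The sketch below builds (without sending) the equivalent request with Python's standard `urllib.request`; the helper `es_request`, the `movies` index, and the defaults `http://localhost:9200` are assumptions for illustration. Passing the result to `urllib.request.urlopen(req)` would send it to a running cluster.

```python
import json
import urllib.request


def es_request(verb, path, body=None, query="", host="localhost", port=9200, protocol="http"):
    """Build (but do not send) the request described by the curl template:
    curl -X <VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'"""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + query
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # A JSON body must be labeled with a Content-Type header.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Read: full-text search on the "title" field using the query DSL.
search_req = es_request("GET", "movies/_search", {"query": {"match": {"title": "matrix"}}}, query="pretty")
print(search_req.get_method(), search_req.full_url)
# GET http://localhost:9200/movies/_search?pretty

# Delete: remove document 1 (no body needed).
delete_req = es_request("DELETE", "movies/_doc/1")
print(delete_req.get_method(), delete_req.full_url)
# DELETE http://localhost:9200/movies/_doc/1
```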