
Elasticsearch Notes

wars · 15 October 2020

This article is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Visit https://creativecommons.org/licenses/by-sa/4.0/ to view the license.

ElasticSearch(ES)


1) ES basic concepts and terms

1.1) Basic terms

Mapped onto relational database concepts:

ES -> database
Index -> table
Type -> table type (deprecated since ES 7.0)
Document -> row
Field -> column

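1.2) Cluster terms

Shard: primary shard, one piece of an index's data
Replica: replica shard, a copy of a primary shard that ES places on a different node from its primary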

2) Installing ES

2.1) Installing ES itself


1. Download and extract the binary package (7.9.2 was the latest release at the time of writing):

```bash
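# The systemd unit below expects ES under /wars/es
mkdir -p /wars/es && cd /wars/es
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz
tar -zxvpf elasticsearch-7.9.2-linux-x86_64.tar.gz
cd elasticsearch-7.9.2
# Create the data and logs directories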
mkdir data
mkdir logs
```
2. Edit the main configuration file, elasticsearch.yml:
```bash
vim config/elasticsearch.yml
```
```yaml
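cluster.name: es-cluster # cluster name; worth replacing the default even on a single node
node.name: node-1 # node name
path.data: /wars/es/elasticsearch-7.9.2/data # data directory
path.logs: /wars/es/elasticsearch-7.9.2/logs # log directory
network.host: 0.0.0.0 # bind address; a non-loopback address makes ES enforce its bootstrap checks
#http.port: 9200 # HTTP port (default 9200)
cluster.initial_master_nodes: ["node-1"] # master-eligible node(s) that bootstrap the cluster on first start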
```
3. Adjust the JVM heap size as needed. Keep -Xms and -Xmx equal. 128m is only enough for a small test machine:
```bash
vim config/jvm.options
```
```text
-Xms128m
-Xmx128m
```
4. Create a systemd unit. It is adapted from the one in the official .deb package, with its paths changed to match this installation:
```bash
vim /etc/systemd/system/elasticsearch.service
```
```text
[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
RuntimeDirectory=elasticsearch
PrivateTmp=true
WorkingDirectory=/wars/es/elasticsearch-7.9.2

User=elasticsearch
Group=elasticsearch

ExecStart=/wars/es/elasticsearch-7.9.2/bin/elasticsearch -p /wars/es/elasticsearch-7.9.2/elasticsearch.pid

# StandardOutput is redirected to journalctl because some error
# messages may be printed to standard output before Elasticsearch's
# logging system is initialized. Elasticsearch itself writes its logs
# to path.logs (see elasticsearch.yml). The official unit passes
# --quiet to silence console output; it is omitted here, so console
# output also ends up in journalctl.
StandardOutput=journal
StandardError=inherit

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535

# Specifies the maximum number of processes
LimitNPROC=4096

# Specifies the maximum size of virtual memory
LimitAS=infinity

# Specifies the maximum file size
LimitFSIZE=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=0

# SIGTERM signal is used to stop the Java process
KillSignal=SIGTERM

# Send the signal only to the JVM rather than its control group
KillMode=process

# Java process is never killed
SendSIGKILL=no

# When a JVM receives a SIGTERM signal it exits with code 143
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target

# Built for packages-7.9.2 (packages)
```
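5. Create the elasticsearch user and make it the owner of the ES directory. The systemd unit runs ES as this user, and ES refuses to run as root: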
```bash
useradd elasticsearch
chown -R elasticsearch:elasticsearch /wars/es/elasticsearch-7.9.2
```
6. Raise the kernel parameter vm.max_map_count. Otherwise ES fails its bootstrap checks and will not start:
```bash
vim /etc/sysctl.conf
```
```text
vm.max_map_count=262144
```
```bash
sysctl -p
```
7. Reload systemd so it picks up the new unit. Then enable ES at boot and start it:
```bash
systemctl daemon-reload
systemctl enable elasticsearch
systemctl start elasticsearch
```
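ES refuses to start when its bootstrap checks fail, so it can help to verify the kernel setting from step 6 before starting the service. The sketch below is a hypothetical helper; `check_max_map_count` is not part of ES. It reads the live value from `/proc/sys/vm/max_map_count` unless you pass a value in. It then compares that value with the 262144 minimum used above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of Elasticsearch.
# Minimum vm.max_map_count expected by the ES bootstrap check.
REQUIRED_MAX_MAP_COUNT=262144

# check_max_map_count [value]
# With no argument, reads the live kernel value from /proc.
# Prints a one-line verdict; returns 0 if the value is high enough, 1 otherwise.
check_max_map_count() {
  local current="${1:-$(cat /proc/sys/vm/max_map_count)}"
  if (( current >= REQUIRED_MAX_MAP_COUNT )); then
    echo "OK: vm.max_map_count=$current"
  else
    echo "TOO LOW: vm.max_map_count=$current (need >= $REQUIRED_MAX_MAP_COUNT)"
    return 1
  fi
}
```

For example, `check_max_map_count && systemctl start elasticsearch` only starts the service once the setting is in place.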

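2.2) Installing elasticsearch-head

Per its own description, a web front end for an ES cluster. GitHub project:
https://github.com/mobz/elasticsearch-head
The recommended way to install it is as a Chrome extension, which also works in the new (Chromium-based) Edge. Extension link:
https://chrome.google.com/webstore/detail/elasticsearch-head/ffmkiejjmecolpfloofpjologoblkegm/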

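3) Basic ES operations

Once ES is installed, its REST API base path is the host's IP plus the configured HTTP port (9200 by default), for example:
10.0.96.156:9200
Requesting this address directly returns basic information about the node, such as its name, cluster name and version.
The requests below are written as an HTTP method and a path, followed by the JSON request body where one is needed.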

3.1) Cluster Health

Check the overall state of the cluster:
GET /_cluster/health
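The response is JSON whose `status` field is green, yellow or red. The sketch below pulls that field out of a compact (non-pretty) response with sed and explains what it means. The function names and the `ES_URL` variable are assumptions made for illustration, and a real JSON tool such as jq would be more robust. On a single node, any index with replicas stays yellow, because ES never places a replica on the same node as its primary.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading GET /_cluster/health output.

# health_status: read a compact _cluster/health response on stdin,
# print the value of its "status" field.
health_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# explain_status STATUS: one-line meaning of a cluster status.
explain_status() {
  case "$1" in
    green)  echo "all primary and replica shards are allocated" ;;
    yellow) echo "all primaries are allocated, some replicas are not" ;;
    red)    echo "at least one primary shard is unallocated" ;;
    *)      echo "unknown status: $1"; return 1 ;;
  esac
}

# Usage against a running node (ES_URL is an assumed variable):
#   status=$(curl -s "${ES_URL:-http://localhost:9200}/_cluster/health" | health_status)
#   explain_status "$status"
```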

3.2) Indices

An index is like a database table: it stores the data we want to search.

3.2.1) Create an index

PUT /{indexName}

3.2.2) List indices

GET /_cat/indices?v

3.2.3) Delete an index

DELETE /{indexName}
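Index names have to follow a few rules. According to the ES create-index documentation, names must be lowercase. They must not start with -, _ or +, and must not be `.` or `..`. They must not contain `\ / * ? " < > | , # :` or a space, and they are limited to 255 bytes. The hypothetical helper below checks all of these rules except the length limit, so you can validate a name before sending `PUT /{indexName}`. It is a convenience sketch and not an ES tool, and it needs bash 4+ for `${name,,}`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate an index name against ES naming rules
# (lowercase; no leading - _ +; not . or ..; no \ / * ? " < > | , # : or space).
# The 255-byte length limit is not checked here. Requires bash 4+.
valid_index_name() {
  local name="$1"
  [[ -n "$name" && "$name" != "." && "$name" != ".." ]] || return 1
  [[ "$name" == "${name,,}" ]] || return 1                 # must be all lowercase
  [[ "$name" != [-_+]* ]] || return 1                      # must not start with - _ +
  [[ "$name" != *[\\/*?\"\<\>\|,#:\ ]* ]] || return 1      # forbidden characters
}

# Usage (ES_URL is an assumed variable):
#   idx=user_index
#   valid_index_name "$idx" && curl -s -X PUT "${ES_URL:-http://localhost:9200}/$idx"
```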

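3.3) Index mappings

Index mappings are the index's schema: they define each field's name, type and so on.

3.3.0) Common mapping data types

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

text, keyword (these two replaced the old string type in ES 5.0)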
byte, short, integer, long
float, double
boolean
date
object

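3.3.1) Create an index together with its mappings

In ES 7.x the mapping body is typeless, so the include_type_name parameter is not used here. Setting it to true would require a type name inside mappings. The index option sets whether the field is indexed, i.e. searchable; it defaults to true.
PUT /{indexName}

{
  "mappings": {
    "properties": {
      "{fieldName}": {
        "type": "{type}",
        "index": {true|false}
      },
      "{fieldName2}": {
        "type": "{type}",
        "index": false
      }
    }
  }
}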

3.3.2) Test how a field's analyzer tokenizes text

GET /{indexName}/_analyze

{
  "field": "{fieldName}",
  "text": "I am from china"
}

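3.3.3) Modify (add to) mappings

An existing field's type cannot be changed, so in practice you only add new fields. Changing a field's type means creating a new index and reindexing the data.
POST /{indexName}/_mapping

{
  "properties": {
    "{fieldName}": {
      "type": "{type}",
      "index": {true|false}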
    },
    "{fieldName2}": {
      "type": "{type}",
      "index": false
    }
  }
}

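3.4) Documents

A document is one row of the index. Documents are the actual data our searches run against.

3.4.1) Create a document

POST /{indexName}/_doc/[id]
If the URL does not specify an ID, ES generates one. This is ES's own document ID (_id); a separate business ID can also be stored as a field in the JSON body.
A document can be written against predefined mappings. If it contains fields that the mappings do not define, ES adds mappings for them automatically (dynamic mapping).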

{
  "id": 1,
  "name": "wars",
  "age": 16,
  "about": "I am from China, I am from Beijing, Beijing is capital of China",
  "create_date": "2020-01-01"
}

3.4.2) Delete a document

DELETE /{indexName}/_doc/{id}

3.4.3) Update a document

Partial update (only the fields under doc change; the rest of the document is kept):
POST /{indexName}/_update/{id}

{
  "doc": {
    "name": "wars"
  }
}

Full replacement (the whole document is overwritten with the new body):
PUT /{indexName}/_doc/{id}

{
  "name": "wars",
  "age": 16
}

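To update a document under ES's optimistic concurrency control, append two version parameters to the request. Fetching the document returns their current values as _seq_no and _primary_term. If the document has changed since then, ES rejects the write with a 409 version conflict:
?if_seq_no={seq_no}&if_primary_term={primary_term}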
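Concretely, `GET /{indexName}/_doc/{id}` returns `_seq_no` and `_primary_term` alongside `_source`, and you send those two values back on the write. The sketch below extracts them with sed from a compact GET response and builds the conditional URL. The helper names, the `ES_URL` variable and the sample values in the usage comment are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for optimistic concurrency control.

# json_number FIELD: print the numeric value of a top-level field
# from compact JSON on stdin (e.g. _seq_no, _primary_term).
json_number() {
  sed -n "s/.*\"$1\":\([0-9]*\).*/\1/p"
}

# occ_update_path INDEX ID GET_RESPONSE: build the path for a conditional
# full replacement (PUT) that only succeeds if the document is unchanged.
occ_update_path() {
  local index="$1" id="$2" doc="$3" seq term
  seq=$(json_number _seq_no <<<"$doc")
  term=$(json_number _primary_term <<<"$doc")
  echo "/$index/_doc/$id?if_seq_no=$seq&if_primary_term=$term"
}

# Usage (ES_URL is an assumed variable). A 409 response means someone
# else changed the document first, so re-read it and retry:
#   doc=$(curl -s "$ES_URL/users/_doc/1")
#   curl -s -X PUT "$ES_URL$(occ_update_path users 1 "$doc")" \
#        -H 'Content-Type: application/json' -d '{"name":"wars","age":17}'
```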

3.4.4) Query documents

GET /{indexName}/_search[?_source=id,name]
GET /{indexName}/_doc/{id}[?_source=id,name]

3.4.5) Check whether a document exists

HEAD /{indexName}/_doc/{id}
Returns 200 if the document exists and 404 if it does not.

3.5) Analysis (tokenization)

`POST /_analyze`

{
  "analyzer": "standard",
  "text": "{text to analyze}"
}

`POST /{indexName}/_analyze`

{
  "analyzer": "standard",
  "field": "",
  "text": "{text to analyze}"
}

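analyzer selects the analyzer to use. ES's built-in analyzers include:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

standard: the default; splits text into words and lowercases them.
simple: splits text at any non-letter character and lowercases it.
whitespace: splits text on whitespace only; case is left unchanged.
stop: like simple, but also removes stop words such as the / a / an / is …
keyword: no splitting; the whole text is emitted as a single term.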
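To see what an analyzer does, you only need to print the token list from the `_analyze` response. The hypothetical helper below extracts the token values from compact JSON with grep and sed. For example, the standard analyzer turns "I am from China" into i, am, from, china, which matches its lowercasing behavior listed above. For real use, jq would be the more robust choice, and the `ES_URL` variable is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one token per line from a compact
# _analyze response read on stdin. Assumes tokens contain no escaped quotes.
analyze_tokens() {
  grep -o '"token":"[^"]*"' | sed 's/^"token":"//; s/"$//'
}

# Usage against a running node (ES_URL is an assumed variable):
#   curl -s -X POST "${ES_URL:-http://localhost:9200}/_analyze" \
#        -H 'Content-Type: application/json' \
#        -d '{"analyzer":"standard","text":"I am from China"}' | analyze_tokens
```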

4) Integrating the IK analyzer with ES

Project page:
https://github.com/medcl/elasticsearch-analysis-ik

Once installed, it provides two analyzers: ik_max_word and ik_smart.

4.1) Installation

The plugin version must match the ES version exactly, and ES must be restarted after the plugin is installed.

Option 1

Download the prebuilt package from: https://github.com/medcl/elasticsearch-analysis-ik/releases
Create the plugin folder: cd your-es-root/plugins/ && mkdir ik
Unzip the plugin into your-es-root/plugins/ik

Option 2

Install it with elasticsearch-plugin (supported from v5.5.1). Replace 6.3.0 with your ES version:
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
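The plugin must match the ES version, so the release URL in Option 2 can be derived from the installed ES version rather than hard-coded as 6.3.0. The sketch below makes two assumptions. First, the release naming pattern (`v{version}/elasticsearch-analysis-ik-{version}.zip`) is the same as in that URL. Second, `bin/elasticsearch --version` prints a line that starts with `Version: X.Y.Z,`. Check the releases page to confirm that a build for your version exists. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of ES or IK.

# es_version: read `bin/elasticsearch --version` output on stdin and print
# the version number; assumes a line starting with "Version: X.Y.Z,".
es_version() {
  sed -n 's/^Version: \([0-9.]*\),.*/\1/p'
}

# ik_plugin_url VERSION: release URL following the naming pattern of the
# Option 2 example URL (an assumption; check the releases page).
ik_plugin_url() {
  local version="$1"
  echo "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${version}/elasticsearch-analysis-ik-${version}.zip"
}

# Usage from the ES root directory, then restart ES so it loads the plugin:
#   v=$(sudo -u elasticsearch ./bin/elasticsearch --version | es_version)
#   ./bin/elasticsearch-plugin install "$(ik_plugin_url "$v")"
#   systemctl restart elasticsearch
```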

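4.2) What is the difference between ik_max_word and ik_smart?

ik_max_word: splits text at the finest granularity and exhausts every possible combination; suited to term queries. For example, it splits "中华人民共和国国歌" (national anthem of the People's Republic of China) into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌".
ik_smart: splits text at the coarsest granularity; suited to phrase queries. For example, it splits "中华人民共和国国歌" into "中华人民共和国, 国歌".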

4.3) IK custom dictionaries

https://github.com/medcl/elasticsearch-analysis-ik#dictionary-configuration

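5) Searching in ES

5.1) Query-string search

Suitable for simple searches. Several conditions are combined in a single q parameter using Lucene query syntax (URL-encoded):
GET /{indexName}/_search?q={field}:{keyword} AND {field2}:{keyword2}...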
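Spaces and other reserved characters in the q value have to be percent-encoded. A minimal encoder in pure bash is sketched below. It keeps the RFC 3986 unreserved characters and encodes every other byte as `%XX`. The `urlencode` name and the `ES_URL` variable are hypothetical, and `curl --data-urlencode` with `-G` is an alternative if you would rather not encode by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use in a URL query.
# Keeps RFC 3986 unreserved characters (A-Z a-z 0-9 - _ . ~); encodes each
# other byte as %XX. LC_ALL=C makes bash iterate over bytes, not characters.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage (ES_URL is an assumed variable):
#   q=$(urlencode 'name:wars AND about:Beijing')
#   curl -s "${ES_URL:-http://localhost:9200}/users/_search?q=$q"
```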

5.2) DSL(Domain Specific Language)

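For more complex queries we use the DSL and build the request as a JSON body.

5.2.1) Queries

A query searches the data in ES and scores each match. Range-style restrictions can instead be implemented with filters, which are not part of this scored search.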

5.2.1.1) Match all

POST /{indexName}/_search

{
  "query": {
    "match_all": {}
  }
}

5.2.1.2) Full-text (analyzed) query

POST /{indexName}/_search

{
  "query": {
    "match": {
      "{fieldName}": "{keyword}"
    }
  }
}

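Besides match itself, three other query types are commonly used in its place:

match: full-text query; the input is analyzed into terms
term: a single term, matched exactly without analysis
terms: several terms, each matched exactly without analysis
match_phrase: phrase query. The input is analyzed into terms, and only documents that contain those terms in the same order match.
- slop: by default the terms must appear in order and directly next to each other. slop allows a gap; for example, slop 2 allows up to two other words between them.

Because term and terms skip analysis, they work best on keyword, numeric or date fields. Against a text field they only match the exact indexed tokens, which are usually lowercased.
```json
{
  "query": {
    "term": {
      "{fieldName}": "{keyword}"
    }
  }
}
```
```json
{
  "query": {
    "terms": {
      "{fieldName}": ["{keyword1}", "{keyword2}"]
    }
  }
}
```
```json
{
  "query": {
    "match_phrase": {
      "{fieldName}": {
        "query": "China Sichuan Chengdu",
        "slop": 2
      }
    }
  }
}
```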

5.2.1.3) Filtering the returned fields

Use _source to list the fields to return:

{
  "query": {
    "match": {
      "{fieldName}": "{keyword}"
    }
  },
  "_source": [ "name", "age" ]
}

5.2.1.4) Pagination

from sets the offset of the first hit to return, and size sets the number of hits per page:

{
  "query": {
    "match": {
      "{fieldName}": "{keyword}"
    }
  },
  "_source": [ "name", "age" ],
  "from": 0,
  "size": 10
}
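from is an offset, not a page number. For 1-based page n with page size s, it is (n - 1) * s. ES also caps from + size at index.max_result_window, which is 10000 by default; paging deeper than that needs search_after or the scroll API. The hypothetical helper below does the arithmetic and refuses pages beyond the default window. Its name, its arguments and the `ES_URL` variable are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a paginated match_all body for 1-based PAGE
# and page SIZE, where from = (PAGE - 1) * SIZE.
MAX_RESULT_WINDOW=10000   # ES default for index.max_result_window

page_body() {
  local page="$1" size="$2" from
  from=$(( (page - 1) * size ))
  if (( from + size > MAX_RESULT_WINDOW )); then
    echo "page $page exceeds max_result_window ($MAX_RESULT_WINDOW); use search_after" >&2
    return 1
  fi
  printf '{"query":{"match_all":{}},"from":%d,"size":%d}\n' "$from" "$size"
}

# Usage (ES_URL is an assumed variable):
#   page_body 3 10 | curl -s -X POST "${ES_URL:-http://localhost:9200}/users/_search" \
#     -H 'Content-Type: application/json' --data-binary @-
```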

5.2.1.5) Match options

To use match options, expand the shorthand into the full form, with the search text under an explicit query key:

{
  "query": {
    "match": {
      "{fieldName}": {
        "query": "{keyword}"
      }
    }
  }
}

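operator: either or (the default) or and

or: after analysis, a document only needs to match one of the terms

and: after analysis, a document must match all of the terms

{
  "query": {
    "match": {
      "{fieldName}": {
        "query": "{keyword}",
        "operator": "and"
      }
    }
  }
}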

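minimum_should_match: the and operator above amounts to requiring 100% of the terms. minimum_should_match instead sets the share of terms (here a percentage) that a document must match:

{
  "query": {
    "match": {
      "{fieldName}": {
        "query": "{keyword}",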
        "minimum_should_match": "50%"
      }
    }
  }
}
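A percentage is applied to the number of terms the query text is analyzed into. According to the ES minimum_should_match documentation, the resulting count is rounded down. ES does this internally; the hypothetical bash function below only illustrates the arithmetic. With 3 terms, "50%" requires 1 matching term, while "67%" requires 2.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of how a positive percentage minimum_should_match
# becomes a required term count: floor(TERMS * PERCENT / 100).
required_matches() {
  local terms="$1" percent="${2%\%}"   # strip a trailing % sign
  echo $(( terms * percent / 100 ))    # integer division rounds down
}
```

So "100%" requires every term, which is the same as operator and.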

5.2.1.6) IDs query

Fetch documents by one or more IDs. The type parameter is not needed in 7.x, because types are deprecated:

GET /{indexName}/_search

{
  "query": {
    "ids": {
      "values": [ "1", "2", "3" ]
    }
  }
}

5.2.1.7) Multi-field match

multi_match: analyzes the input and matches it against several fields:

GET /{indexName}/_search

{
  "query": {
    "multi_match": {
      "query": "{keyword}",
      "fields": [ "name", "about" ]
    }
  }
}

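Fields can also be weighted to improve result relevance: append ^{boost factor} to a field name. Outside multi_match, raise a field's weight by adding a boost parameter to its query clause; the boost multiplies that clause's score.

{
  "query": {
    "multi_match": {
      "query": "{keyword}",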
      "fields": [ "name^10", "about^20" ]
    }
  }
}

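5.2.1.8) Compound (multi-condition) queries

Multi-condition queries in ES are called bool queries. A bool query has three main clause types, and they can be combined:

must: every condition must match
must_not: no condition may match
should: matching one condition is enough. This holds when the bool query has no must or filter clause. If it does have one, should clauses become optional and only raise the score, unless minimum_should_match is set.
should with nested must_not clauses: matches if any one of the conditions does not match (to be verified)

filter: works like must, but does not contribute to the relevance score.

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": "{keyword}"
          }
        },
        {
          "match": {
            "about": "{keyword}"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  }
}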
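Compound queries are easier to assemble when each clause is built separately and then joined. The hypothetical bash helpers below join clause JSON strings with commas and wrap them in must and filter arrays. In the usage, moving the term on name from must to filter keeps the restriction but leaves it out of scoring, as described above. Clauses are inserted as-is, so they must already be valid JSON, and `ES_URL` is an assumed variable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for assembling a bool query body.

# join_clauses CLAUSE...: join JSON clause strings with commas.
join_clauses() {
  local IFS=','
  echo "$*"
}

# bool_body MUST FILTER: each argument is a comma-joined clause list
# (possibly empty) placed in the must / filter array respectively.
bool_body() {
  printf '{"query":{"bool":{"must":[%s],"filter":[%s]}}}\n' "$1" "$2"
}

# Usage (ES_URL is an assumed variable):
#   must=$(join_clauses '{"match":{"about":"Beijing"}}')
#   filter=$(join_clauses '{"term":{"name":"wars"}}' '{"range":{"age":{"gte":16}}}')
#   bool_body "$must" "$filter" | curl -s -X POST "$ES_URL/users/_search" \
#     -H 'Content-Type: application/json' --data-binary @-
```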

5.2.2) Field existence

Matches documents that have an indexed, non-null value for the field:

POST /{indexName}/_search

{
  "query": {
    "exists": {
      "field": "{fieldName}"
    }
  }
}

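5.2.3) Filters

The query handles analysis and retrieval. Its results can then be filtered once more, typically to implement range conditions; the filter used here is post_filter. range supports the usual gte, lte, gt and lt, and other clauses such as match can also be used in a filter.
post_filter is applied after aggregations are computed, so it does not affect them. When no aggregations are involved, a filter clause inside a bool query is the more common choice.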

{
  "query": {
    "match": {
      "name": "wars"
    }
  },
  "post_filter": {
    "range": {
      "age": {
        "gte": 16,
        "lte": 24
      }
    }
  }
}

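5.2.4) Sorting

After filtering, use sort to order the results:

5.2.4.1) Sorting by field

Fields of type text cannot be sorted on. To sort by a text field, map it as a multi-field that also has a keyword sub-field, and sort on that sub-field. Dynamic mapping already does this for strings, e.g. name.keyword.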

{
  "query": {
    "match": {
      "name": "wars"
    }
  },
  "sort": [
    {
      "age": "desc"
    },
    {
      "_id": "desc"
    }
  ]
}

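5.2.5) Highlighting

The final results can be highlighted, similar to product search on Taobao. Each field under highlight.fields must be a field that the query actually matches on:

{
  "query": {
    "match": {
      "about": "Beijing"
    }
  },
  "highlight": {
    "pre_tags": ["<H>"],
    "post_tags": ["</H>"],
    "fields": {
      "about": {}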
    }
  }
}

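6) Setting up an ES cluster

Two machines are enough to form an ES cluster that spreads shards across nodes and keeps replica copies. For the cluster to survive losing any one master-eligible node, however, the ES documentation recommends at least three master-eligible nodes.

On each node, edit the main configuration file elasticsearch.yml:

vim config/elasticsearch.yml

cluster.name: es-cluster # cluster name; must be the same on every node
node.name: node-1 # node name; must be different on every node
node.master: true # whether this node is master-eligible
node.data: true # whether this node holds data and serves data requests
discovery.seed_hosts: ["192.168.1.156", "192.168.1.157"] # addresses of the nodes to discover
cluster.initial_master_nodes: ["node-1"] # master-eligible node(s) that bootstrap the first election

path.data: /wars/es/elasticsearch-7.9.2/data # data directory
path.logs: /wars/es/elasticsearch-7.9.2/logs # log directory
network.host: 0.0.0.0 # bind address
#http.port: 9200 # HTTP port (default 9200)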
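Three master-eligible nodes are recommended because of majority voting: electing a master needs more than half of the voting nodes, i.e. floor(n / 2) + 1 of n. The hypothetical helper below computes how many master-eligible nodes a cluster can lose while it can still elect a master. It assumes that all master-eligible nodes vote. ES 7 may adjust its voting configuration itself, so treat the result as an approximation.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: majority quorum for N voting master-eligible nodes,
# and how many of them can fail while a master can still be elected.
# Assumes every master-eligible node is in the voting configuration.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 5; do
  echo "master-eligible=$n quorum=$(quorum "$n") tolerated=$(tolerated_failures "$n")"
done
```

With two master-eligible nodes the tolerance is 0. Two machines therefore give shard distribution and replicas, but they do not let the cluster survive losing one of its master-eligible nodes.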

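7) Integrating ES with Spring Boot

7.1) Dependencies

Add the official Spring Boot starter. Ideally, the client version it brings in should match the ES server version:
spring-boot-starter-data-elasticsearch
Then, since everything in Spring Boot comes with a Template, ElasticsearchRestTemplate does the job.

7.2) Config

application.yml

ElasticsearchRestTemplate talks to the REST port (9200). In Spring Boot 2.2 to 2.5 this is configured with spring.elasticsearch.rest.uris; 2.6+ renames it to spring.elasticsearch.uris. The spring.data.elasticsearch.cluster-name and cluster-nodes properties belong to the deprecated transport client, which connects to the transport port 9300 instead.

spring:
  elasticsearch:
    rest:
      uris: http://192.168.1.156:9200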
