6.1 Elasticsearch Search in Detail: Search Mechanics


Reference documentation for this chapter: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
Note which Elasticsearch version the documentation applies to.

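Indexing process:

The second quadrant holds the original documents.

Elasticsearch stores the original content of each document together with the corresponding inverted index files.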

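Search process:

Elasticsearch looks up the keyword in the postings lists maintained by the inverted index files to find the set of matching documents, then scores, sorts, and highlights them before returning the results to the user.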
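The lookup, score, and sort steps described above can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the token lists are hypothetical stand-ins for what an analyzer would produce, and the score is a simple length-normalized term frequency, not the BM25 similarity that Elasticsearch uses by default.

```python
from collections import defaultdict

# Hypothetical pre-tokenized titles; in Elasticsearch the analyzer
# (ik_max_word in this chapter) would produce the tokens.
DOCS = {
    1: ["java", "编程", "思想"],
    2: ["java", "程序", "性能", "优化"],
    3: ["python", "科学", "计算"],
}


def build_inverted_index(docs):
    """Map each term to a postings list of {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index, docs, term):
    """Look up the postings list for one term, then score and sort."""
    postings = index.get(term, {})
    hits = []
    for doc_id, tf in postings.items():
        # Toy score: term frequency normalized by document length.
        score = tf / len(docs[doc_id])
        hits.append((doc_id, score))
    # Highest score first; ties are broken by doc id for determinism.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


INDEX = build_inverted_index(DOCS)
print(search(INDEX, DOCS, "java"))
```

Only the postings list for the searched term is touched, which is why the lookup does not need to scan every document.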

Filter mechanism:

Documents are filtered by a condition, and no relevance score is computed.
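The difference from scoring can be sketched as a plain yes/no test per document. The snippet below is a minimal illustration, assuming a small in-memory list of hypothetical records modeled on this chapter's books; in the query DSL, such a condition is typically placed in the `filter` clause of a `bool` query.

```python
# Hypothetical in-memory copies of three documents from this chapter.
BOOKS = [
    {"id": 1, "language": "Java", "price": 70.20},
    {"id": 3, "language": "python", "price": 81.40},
    {"id": 4, "language": "python", "price": 54.50},
]


def filter_docs(docs, predicate):
    """Keep documents for which predicate is true; no score is computed."""
    return [doc for doc in docs if predicate(doc)]


cheap_python = filter_docs(
    BOOKS,
    lambda doc: doc["language"] == "python" and doc["price"] < 60,
)
print([doc["id"] for doc in cheap_python])  # prints [4]
```

Because the outcome is only match or no match, there is nothing to rank, and the matching documents keep their original order.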

# Recreate the books index. DELETE returns a 404 error if the index does not exist yet.
DELETE books

# The ik_max_word analyzer requires the IK analysis plugin to be installed.
PUT books
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "language": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "publish_time": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "desc": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

# Index five sample documents.
PUT books/_doc/1
{
  "id": 1,
  "title": "Java编程思想",
  "language": "Java",
  "author": "Bruce Eckel",
  "price": 70.20,
  "publish_time": "2017-01-02",
  "desc": "Java学习必读经典,殿堂级著作!赢得全球程序员的广泛赞誉。"
}

PUT books/_doc/2
{
  "id": 2,
  "title": "Java程序性能优化",
  "language": "Java",
  "author": "葛一鸣",
  "price": 46.50,
  "publish_time": "2012-08-02",
  "desc": "Java程序更快,更稳定。深入剖析软件设计层面、代码层面、JVM层面的优化方法。"
}

PUT books/_doc/3
{
  "id": 3,
  "title": "Python科学计算",
  "language": "python",
  "author": "张若愚",
  "price": 81.40,
  "publish_time": "2014-01-02",
  "desc": "零基础学python, 光盘中坐着度假开发winPython运行环境,涵盖了Python各个扩展库。"
}

PUT books/_doc/4
{
  "id": 4,
  "title": "Python基础教程",
  "language": "python",
  "author": "Helant",
  "price": 54.50,
  "publish_time": "2014-03-02",
  "desc": "经典的python入门教程,层次鲜明,结构严谨,内容详实。"
}

PUT books/_doc/5
{
  "id": 5,
  "title": "JavaScript高级程序设计",
  "language": "javascript",
  "author": "Nicholas C. Zakas",
  "price": 66.40,
  "publish_time": "2012-03-02",
  "desc": "JavaScript 技术经典名著。"
}

# Verify the mapping.
GET books/_mapping

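match_all search, shorthand syntax (a search request with no body matches all documents):
GET books/_search

term query search (the search term is not analyzed, so it must exactly match a token stored in the index):
GET books/_search
{
  "query": {
    "term": {
      "title": "思想"
    }
  }
}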

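When the result set is large, paginate it:

from: the offset of the first hit to return (0-based, defaults to 0)

size: the maximum number of documents to return (defaults to 10)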

GET books/_search
{
  "from": 1,
  "size": 2,
  "query": {
    "term": {
      "title": "java"
    }
  }
}
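The effect of from and size on an already ranked hit list can be pictured as a slice. The sketch below is a simplified model, assuming a hypothetical list of ranked document IDs; the function name `paginate` and the sample IDs are illustrative and not part of any Elasticsearch client.

```python
def paginate(hits, from_=0, size=10):
    """Return the hits selected by from/size semantics (0-based offset)."""
    if from_ < 0 or size < 0:
        raise ValueError("from_ and size must be non-negative")
    return hits[from_:from_ + size]


# Hypothetical ranked result list, best match first.
ranked_ids = ["doc-a", "doc-b", "doc-c", "doc-d"]

# from=1 skips the first hit; size=2 returns at most two hits.
print(paginate(ranked_ids, from_=1, size=2))  # prints ['doc-b', 'doc-c']
```

If the offset lies beyond the end of the list, the result is simply empty, which matches what a search returns when paging past the last hit.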

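Minimum score filtering:

When many documents match, those with low relevance can be dropped by setting min_score; documents whose score is below that value are excluded from the results.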

GET books/_search
{
  "min_score": 0.6,
  "query": {
    "term": {
      "title": "java"
    }
  }
}
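The threshold behaves like a cutoff applied to the scored hits. The following sketch assumes a hypothetical list of (document ID, score) pairs; the scores are made up for illustration and are not real output from the request above.

```python
def apply_min_score(hits, min_score):
    """Drop hits whose score is below min_score; keep the original order."""
    return [(doc_id, score) for doc_id, score in hits if score >= min_score]


# Hypothetical scored hits, sorted by descending score.
scored_hits = [(2, 0.72), (1, 0.65), (7, 0.41)]

print(apply_min_score(scored_hits, 0.6))  # prints [(2, 0.72), (1, 0.65)]
```

Because scores depend on the index contents and on the similarity in use, a suitable threshold generally has to be chosen by inspecting real results.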

Highlighting the query keywords:

GET books/_search
{
  "query": {
    "term": {
      "title": "java"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
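In the response, each hit carries a highlight section in which the matched keyword is wrapped in tags, `<em>` and `</em>` by default. The sketch below only imitates that wrapping with a case-insensitive substring match; this is an assumption made for brevity, since Elasticsearch highlights whole analyzed tokens rather than arbitrary substrings.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every case-insensitive occurrence of term in the given tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("Java编程思想", "java"))  # prints <em>Java</em>编程思想
```

The `pre_tag` and `post_tag` parameters of this toy function mirror the idea that the wrapping tags can be customized in a highlight request.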