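Aggregation Modes in ES
Aggregation modes
ES supports flexible aggregation: an aggregation can be combined with a query, a filter can be applied to the aggregation without affecting the search conditions, and the hits can be filtered after the aggregation has been computed.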
1.1 Direct aggregation
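Direct aggregation means that the DSL contains no query clause, so the aggregation runs over every document in the index.
For example, a search request on hotel_poly whose body contains only an aggs clause aggregates over all documents in that index.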
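To make the effect concrete, here is a minimal plain-Java sketch that averages the price values of the five sample documents listed in the data source section. It is an in-memory simulation, not the Elasticsearch client API; the class and method names are hypothetical. It relies on the fact that an avg aggregation ignores documents that lack the field unless missing is set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// In-memory stand-in for the hotel_poly sample data; not the Elasticsearch client API.
class DirectAggSketch {
    // Prices of documents 001..005; null marks document 002, which has no price field.
    static final List<Double> PRICES = Arrays.asList(556.0, null, 200.0, 500.0, 800.0);

    // Averages all non-null prices, mirroring how avg skips documents without the field.
    static double avgPrice(List<Double> prices) {
        return prices.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // No query step: every document in the index takes part.
        System.out.println(avgPrice(PRICES)); // prints 514.0
    }
}
```

Four of the five documents carry a price, so the average is (556 + 200 + 500 + 800) / 4 = 514.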
1.2 Query, then aggregate
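In contrast to direct aggregation, this mode adds a query clause. The clause is no different from the query in an ordinary search, and only documents that match it take part in the aggregation. For example: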
# Query, then aggregate
GET /hotel_poly/_search
{
  "size": 0,
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  },
  "aggs": {
    "my_agg": {
      "avg": {
        "field": "price"
      }
    }
  }
}
The query-then-aggregate logic in Java looks like this:
// Assumes `client` is a RestHighLevelClient and `log` is an SLF4J-style logger, both fields of the enclosing class.
public void getQueryAggSearch() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String avgAggName = "my_avg"; // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");
    // Add the aggregation
    searchSourceBuilder.aggregation(avgAgg);
    // Build the term query that restricts which documents are aggregated
    searchSourceBuilder.query(QueryBuilders.termQuery("city", "北京"));
    searchRequest.source(searchSourceBuilder); // attach the source to the request
    // Execute the search
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits searchHits = searchResponse.getHits(); // the matching documents
    log.info("--------hit--------");
    for (SearchHit searchHit : searchHits) {
        String index = searchHit.getIndex();
        String id = searchHit.getId();
        float score = searchHit.getScore();
        String source = searchHit.getSourceAsString();
        log.info("index={},id={},source={}", index, id, source);
    }
    log.info("--------agg--------");
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    ParsedAvg avg = aggregations.get(avgAggName); // the object returned for the avg aggregation
    String avgName = avg.getName(); // aggregation name
    double avgVal = avg.getValue(); // aggregation value
    log.info("avgName={},avgVal={}", avgName, avgVal);
}
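As a sanity check on what this request should yield for the sample data, the following plain-Java sketch replays the two steps in memory: first the term filter on city, then the average over the hits. It is a simulation rather than client code, and the class, method and constant names are hypothetical. The Unicode escapes spell the city values 北京 and 天津.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java simulation of "query, then aggregate"; not the Elasticsearch client API.
class QueryThenAggSketch {
    static final String BEIJING = "\u5317\u4eac"; // city value for Beijing
    static final String TIANJIN = "\u5929\u6d25"; // city value for Tianjin

    static final class Hotel {
        final String id;
        final String city;
        final Double price; // null when the document has no price field

        Hotel(String id, String city, Double price) {
            this.id = id;
            this.city = city;
            this.price = price;
        }
    }

    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", BEIJING, 556.0),
            new Hotel("002", BEIJING, null),
            new Hotel("003", BEIJING, 200.0),
            new Hotel("004", TIANJIN, 500.0),
            new Hotel("005", TIANJIN, 800.0));

    // Query phase: keep only documents whose city equals the given term.
    static List<Hotel> termCity(List<Hotel> docs, String city) {
        return docs.stream().filter(h -> h.city.equals(city)).collect(Collectors.toList());
    }

    // Aggregation phase: average the prices of the hits, skipping documents without a price.
    static double avgPrice(List<Hotel> docs) {
        return docs.stream()
                .filter(h -> h.price != null)
                .mapToDouble(h -> h.price)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Hotel> hits = termCity(HOTELS, BEIJING);
        System.out.println(hits.size());    // prints 3 (documents 001, 002, 003)
        System.out.println(avgPrice(hits)); // prints 378.0
    }
}
```

Document 002 matches the query but has no price, so only 001 and 003 contribute: (556 + 200) / 2 = 378.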
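1.3 Pre-filter
Sometimes the documents being aggregated need to be filtered further without affecting the current query. For example, a user searches for hotels in Tianjin ("天津"), but the page should show the average price of only those hotels that are not fully booked. The user's query must stay unchanged, so the filter condition is added inside the aggregation clause.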
# Pre-filter: add the filter condition inside the aggregation clause
GET /hotel_poly/_search
{
  "query": {
    "term": {
      "city": {
        "value": "天津"
      }
    }
  },
  "aggs": {
    "my_agg": {
      "filter": {
        "term": {
          "full_room": false
        }
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
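Run against the sample data, the query matches 2 documents, 004 and 005. The aggregation, however, only accepts hotels that are not fully booked, and only document 004 qualifies, so the average equals the price field of document 004.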
The pre-filter logic in Java looks like this:
public void getFilterAggSearch() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String filterAggName = "my_terms"; // name of the filter aggregation
    // The filter condition: hotels that are not fully booked
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("full_room", false);
    FilterAggregationBuilder filterAggregationBuilder = AggregationBuilders.filter(filterAggName, termQueryBuilder);
    String avgAggName = "my_avg"; // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");
    // Attach the avg aggregation as a sub-aggregation of the filter aggregation
    filterAggregationBuilder.subAggregation(avgAgg);
    searchSourceBuilder.aggregation(filterAggregationBuilder); // add the aggregation
    // Build the term query
    searchSourceBuilder.query(QueryBuilders.termQuery("city", "天津"));
    searchRequest.source(searchSourceBuilder); // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // execute the search
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    // The object returned for the filter aggregation
    ParsedFilter filter = aggregations.get(filterAggName);
    Avg avg = filter.getAggregations().get(avgAggName);
    String key = avg.getName(); // aggregation name
    double avgVal = avg.getValue(); // aggregation value
    log.info("key={},avgVal={}", key, avgVal);
}
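The outcome described for the pre-filter request (documents 004 and 005 hit, only 004 aggregated) can be reproduced with a small in-memory sketch. It is plain Java over the sample documents, not the Elasticsearch client, and its class, method and constant names are hypothetical. The point it illustrates is that the filter narrows only the aggregated set while the hit list stays as the query produced it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java simulation of a query combined with a filter aggregation; not the Elasticsearch client API.
class PreFilterSketch {
    static final String BEIJING = "\u5317\u4eac"; // city value for Beijing
    static final String TIANJIN = "\u5929\u6d25"; // city value for Tianjin

    static final class Hotel {
        final String id;
        final String city;
        final Double price;     // null when the document has no price field
        final boolean fullRoom;

        Hotel(String id, String city, Double price, boolean fullRoom) {
            this.id = id;
            this.city = city;
            this.price = price;
            this.fullRoom = fullRoom;
        }
    }

    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", BEIJING, 556.0, true),
            new Hotel("002", BEIJING, null, false),
            new Hotel("003", BEIJING, 200.0, true),
            new Hotel("004", TIANJIN, 500.0, false),
            new Hotel("005", TIANJIN, 800.0, true));

    // Query phase: the hits returned to the user.
    static List<Hotel> queryByCity(String city) {
        return HOTELS.stream().filter(h -> h.city.equals(city)).collect(Collectors.toList());
    }

    // Filter aggregation: only hits that are not fully booked enter the average.
    static double avgPriceOfAvailable(List<Hotel> hits) {
        return hits.stream()
                .filter(h -> !h.fullRoom)
                .filter(h -> h.price != null)
                .mapToDouble(h -> h.price)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Hotel> hits = queryByCity(TIANJIN);
        System.out.println(hits.size());               // prints 2 (documents 004 and 005)
        System.out.println(avgPriceOfAvailable(hits)); // prints 500.0 (document 004 only)
    }
}
```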
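1.4 Post-filter
In some scenarios the hits need to be narrowed by an extra condition while the aggregation result stays unaffected. For example, in hotel search the user's query term is "假日", and the average price shown on the page should cover every hotel whose title matches "假日", while the hotel list itself should be limited to Beijing ("北京"). ES provides a post-filter (post_filter) for this. It is applied after the query and the aggregation have run, so its condition has no effect on the aggregation.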
# Post-filter
GET /hotel_poly/_search
{
  "query": {
    "match": {
      "title": "假日"
    }
  },
  "post_filter": {
    "term": {
      "city": "北京"
    }
  },
  "aggs": {
    "my_agg": {
      "avg": {
        "field": "price",
        "missing": 200
      }
    }
  }
}
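In the query above, match selects the hotels whose title contains "假日", and the avg aggregation computes the average price of those hotels; post_filter then restricts the hits to the city "北京". Against the sample data, the match query hits 4 documents, and the average of their price fields is 364 (document 002 has no price, so the missing value 200 is used for it). Finally, post_filter removes document 004, so total in the hits section is 3.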
The post-filter logic in Java looks like this:
public void getPostFilterAggSearch() throws IOException {
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String avgAggName = "my_avg"; // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");
    avgAgg.missing(200); // use 200 for documents that have no price
    searchSourceBuilder.aggregation(avgAgg);
    // Build the match query
    searchSourceBuilder.query(QueryBuilders.matchQuery("title", "假日"));
    // The post-filter narrows the hits only, not the aggregation
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("city", "北京");
    searchSourceBuilder.postFilter(termQueryBuilder);
    searchRequest.source(searchSourceBuilder); // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    Avg avg = aggregations.get(avgAggName);
    String key = avg.getName(); // aggregation name
    double avgVal = avg.getValue(); // aggregation value
    log.info("key={},avgVal={}", key, avgVal);
}
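The figures quoted for the post-filter request (4 matches, average 364, 3 hits) can be worked through with a plain-Java sketch over the sample documents. It is a simulation, not client code, and all names in it are hypothetical. As a simplifying assumption, the match on title is modeled by a precomputed boolean per document instead of real text analysis: documents 001 to 004 contain "假日" in their title, document 005 does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java simulation of query + aggregation + post_filter; not the Elasticsearch client API.
class PostFilterSketch {
    static final String BEIJING = "\u5317\u4eac"; // city value for Beijing
    static final String TIANJIN = "\u5929\u6d25"; // city value for Tianjin

    static final class Hotel {
        final String id;
        final String city;
        final Double price;         // null when the document has no price field
        final boolean titleMatches; // assumption: stands in for the match query on title

        Hotel(String id, String city, Double price, boolean titleMatches) {
            this.id = id;
            this.city = city;
            this.price = price;
            this.titleMatches = titleMatches;
        }
    }

    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", BEIJING, 556.0, true),
            new Hotel("002", BEIJING, null, true),
            new Hotel("003", BEIJING, 200.0, true),
            new Hotel("004", TIANJIN, 500.0, true),
            new Hotel("005", TIANJIN, 800.0, false));

    // Query phase: documents matched by the query.
    static List<Hotel> matched() {
        return HOTELS.stream().filter(h -> h.titleMatches).collect(Collectors.toList());
    }

    // Aggregation phase: runs on the query result; a missing price counts as the given default.
    static double avgPrice(List<Hotel> docs, double missing) {
        return docs.stream()
                .mapToDouble(h -> h.price == null ? missing : h.price)
                .average()
                .orElse(Double.NaN);
    }

    // Post-filter phase: trims the hits after the aggregation has been computed.
    static List<Hotel> postFilterCity(List<Hotel> docs, String city) {
        return docs.stream().filter(h -> h.city.equals(city)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hotel> queryResult = matched();
        System.out.println(queryResult.size());                          // prints 4
        System.out.println(avgPrice(queryResult, 200.0));                // prints 364.0
        System.out.println(postFilterCity(queryResult, BEIJING).size()); // prints 3
    }
}
```

The average is (556 + 200 + 200 + 500) / 4 = 364, computed before document 004 is dropped from the hits.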
Data source
Index mapping
PUT /hotel_poly
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "city": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "create_time": {
        "type": "date"
      },
      "full_room": {
        "type": "boolean"
      },
      "location": {
        "type": "geo_point"
      },
      "tags": {
        "type": "keyword"
      },
      "comment_info": {
        "properties": {
          "favourable_comment": {
            "type": "integer"
          },
          "negative_comment": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Hotel data
POST /_bulk
{"index":{"_index":"hotel_poly","_id":"001"}}
{"title":"文雅假日酒店","city":"北京","price":556.00,"create_time":"20200418120000","full_room":true,"location":{"lat":39.938838,"lon":106.449112},"tags":["wifi","小型电影院"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"002"}}
{"title":"金都嘉怡假日酒店","city":"北京","create_time":"20210315200000","full_room":false,"location":{"lat":39.915153,"lon":116.4030},"tags":["wifi","免费早餐"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"003"}}
{"title":"金都假日酒店","city":"北京","price":200.00,"create_time":"20210509160000","full_room":true,"location":{"lat":40.002096,"lon":116.386673},"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"004"}}
{"title":"金都假日酒店","city":"天津","price":500.00,"create_time":"20210218080000","full_room":false,"location":{"lat":39.155004,"lon":117.203976},"tags":["wifi","免费车位"]}
{"index":{"_index":"hotel_poly","_id":"005"}}
{"title":"文雅精选酒店","city":"天津","price":800.00,"create_time":"20210101080000","full_room":true,"location":{"lat":39.178447,"lon":117.219999},"tags":["wifi","充电车位"],"comment_info":{"favourable_comment":20,"negative_comment":10}}