全栈开发那些事

Aggregation Modes in ES

2024-06-25

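Aggregation modes

ES supports flexible aggregation modes. An aggregation can be combined with a query, a filter can be applied to the aggregation without affecting the search conditions, and the results can additionally be filtered after the aggregation has run.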

1.1 Direct aggregation

Direct aggregation means the DSL contains no query clause, so the aggregation runs over every document in the index.

For example, the following DSL:

[Image: image-20240423102328958]
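As a rough illustration of what "no query clause" means for the result, here is a minimal plain-Java sketch that does not talk to ES at all. It simply averages the price values of the five sample documents listed under "Hotel data" at the end of this post. The class name `DirectAggDemo` and the method `avgPrice` are hypothetical names made up for this sketch. It assumes an avg aggregation on price with no `missing` setting, in which case documents without the field (document 002) are skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// In-memory stand-in for a direct aggregation; no ES client is involved.
class DirectAggDemo {
    // Prices of documents 001..005 from the sample data.
    // null means the document has no price field (document 002).
    static final List<Double> PRICES = Arrays.asList(556.0, null, 200.0, 500.0, 800.0);

    // Mimics an avg aggregation without a "missing" setting:
    // documents lacking the field are skipped, not counted as zero.
    static double avgPrice(List<Double> prices) {
        return prices.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // No query step: every document in the "index" takes part.
        System.out.println(avgPrice(PRICES)); // prints 514.0
    }
}
```

All four priced documents contribute, so the average is (556 + 200 + 500 + 800) / 4 = 514.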

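1.2 Query first, then aggregate

In contrast to direct aggregation, this mode adds a query clause. The clause is no different from an ordinary query, and only documents that match it take part in the aggregation. For example:

# Query first, then aggregate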
GET /hotel_poly/_search
{
  "size": 0,
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  },
  "aggs": {
    "my_agg": {
      "avg": {
        "field": "price"
      }
    }
  }
}

In Java, the query-then-aggregate logic looks like this:

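// Imports follow the package layout of the 7.x high-level REST client.
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.ParsedAvg;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes the enclosing class has a RestHighLevelClient field `client` and an SLF4J-style logger `log`.
public void getQueryAggSearch() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    String avgAggName = "my_avg";     // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");
    // Add the aggregation
    searchSourceBuilder.aggregation(avgAgg);
    // Build the term query
    searchSourceBuilder.query(QueryBuilders.termQuery("city", "北京"));
    searchRequest.source(searchSourceBuilder);  // attach the source to the request
    // Execute the search
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits searchHits = searchResponse.getHits();   // the matching documents
    log.info("--------hit--------");
    for (SearchHit searchHit : searchHits) {
        String index = searchHit.getIndex();
        String id = searchHit.getId();
        float score = searchHit.getScore();
        String source = searchHit.getSourceAsString();
        log.info("index={},id={},score={},source={}", index, id, score, source);
    }
    log.info("--------agg--------");
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    ParsedAvg avg = aggregations.get(avgAggName);   // the avg aggregation object
    String avgName = avg.getName(); // aggregation name
    double avgVal = avg.getValue(); // aggregation value
    log.info("avgName={},avgVal={}", avgName, avgVal);
}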
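To make the effect of the query clause concrete, the following plain-Java sketch replays the same logic over the five sample documents in memory, without an ES client. The class `QueryThenAggDemo`, its `Hotel` type and the method `avgPriceForCity` are hypothetical names for this sketch only. City names are written in English ("Beijing" for 北京, "Tianjin" for 天津) purely to keep the source ASCII-only.

```java
import java.util.Arrays;
import java.util.List;

// In-memory stand-in for "query first, then aggregate"; no ES client is involved.
class QueryThenAggDemo {
    static final class Hotel {
        final String id;
        final String city;
        final Double price; // null when the document has no price field

        Hotel(String id, String city, Double price) {
            this.id = id;
            this.city = city;
            this.price = price;
        }
    }

    // The five sample documents, reduced to the fields this example needs.
    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", "Beijing", 556.0),
            new Hotel("002", "Beijing", null),
            new Hotel("003", "Beijing", 200.0),
            new Hotel("004", "Tianjin", 500.0),
            new Hotel("005", "Tianjin", 800.0));

    static double avgPriceForCity(List<Hotel> hotels, String city) {
        return hotels.stream()
                .filter(h -> h.city.equals(city)) // the query clause: term on city
                .filter(h -> h.price != null)     // avg skips documents without the field
                .mapToDouble(h -> h.price)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println(avgPriceForCity(HOTELS, "Beijing")); // prints 378.0
    }
}
```

Only the Beijing documents reach the aggregation, and document 002 has no price, so the average is (556 + 200) / 2 = 378.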

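1.3 Pre-filter

Sometimes the aggregation needs an extra filter that must not affect the current query. For example, a user searches for hotels in Tianjin, but the average price shown to the user should cover only hotels that are not fully booked. The user's query must stay unchanged, so the filter condition goes inside the aggregation clause.

# Pre-filter: add the filter condition inside the aggregation clause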
GET /hotel_poly/_search
{
  "query": {
    "term": {
      "city": {
        "value": "天津"
      }
    }
  },
  "aggs": {
    "my_agg": {
      "filter": {
        "term": {
          "full_room": false
        }
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

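The response shows that two documents match the query, 004 and 005. The aggregation, however, only considers hotels that are not fully booked, and only document 004 qualifies, so the average price equals the price field of document 004.

In Java, the pre-filter logic looks like this: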

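// Imports follow the package layout of the 7.x high-level REST client.
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.filter.ParsedFilter;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes the enclosing class has a RestHighLevelClient field `client` and an SLF4J-style logger `log`.
public void getFilterAggSearch() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String filterAggName = "my_terms";    // name of the filter aggregation
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("full_room", false);
    FilterAggregationBuilder filterAggregationBuilder = AggregationBuilders.filter(filterAggName, termQueryBuilder);

    String avgAggName = "my_avg"; // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");

    // Attach the avg aggregation as a sub-aggregation of the filter aggregation
    filterAggregationBuilder.subAggregation(avgAgg);
    searchSourceBuilder.aggregation(filterAggregationBuilder);  // add the aggregation
    // Build the term query
    searchSourceBuilder.query(QueryBuilders.termQuery("city", "天津"));
    searchRequest.source(searchSourceBuilder);  // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // execute the search
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    // The filter aggregation object, which wraps the avg sub-aggregation
    ParsedFilter filter = aggregations.get(filterAggName);
    Avg avg = filter.getAggregations().get(avgAggName);
    String key = avg.getName();                 // aggregation name
    double avgVal = avg.getValue();             // aggregation value
    log.info("key={},avgVal={}", key, avgVal);
}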
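The outcome described for the pre-filter (two hits, but only document 004 in the average) can be checked with a small in-memory sketch over the sample documents. This is plain Java with no ES client. `PreFilterDemo`, `Hotel`, `hitIds` and `avgPriceNotFull` are hypothetical names for this sketch, and city names are written in English ("Tianjin" for 天津) only to keep the source ASCII-only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for a query plus a filter aggregation; no ES client is involved.
class PreFilterDemo {
    static final class Hotel {
        final String id;
        final String city;
        final boolean fullRoom;
        final Double price; // null when the document has no price field

        Hotel(String id, String city, boolean fullRoom, Double price) {
            this.id = id;
            this.city = city;
            this.fullRoom = fullRoom;
            this.price = price;
        }
    }

    // The five sample documents, reduced to the fields this example needs.
    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", "Beijing", true, 556.0),
            new Hotel("002", "Beijing", false, null),
            new Hotel("003", "Beijing", true, 200.0),
            new Hotel("004", "Tianjin", false, 500.0),
            new Hotel("005", "Tianjin", true, 800.0));

    // The query clause alone decides which documents are returned as hits.
    static List<String> hitIds(String city) {
        return HOTELS.stream()
                .filter(h -> h.city.equals(city))
                .map(h -> h.id)
                .collect(Collectors.toList());
    }

    // The aggregation sees the query matches narrowed further by its own filter.
    static double avgPriceNotFull(String city) {
        return HOTELS.stream()
                .filter(h -> h.city.equals(city)) // query: term on city
                .filter(h -> !h.fullRoom)         // filter aggregation: full_room = false
                .filter(h -> h.price != null)     // avg skips documents without the field
                .mapToDouble(h -> h.price)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println(hitIds("Tianjin"));          // prints [004, 005]
        System.out.println(avgPriceNotFull("Tianjin")); // prints 500.0
    }
}
```

The hit list is untouched by the aggregation's filter, while the average only reflects document 004.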

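1.4 Post-filter

In some scenarios the returned documents need an extra filter, but the aggregation result must not be affected by it. For example, in hotel search the user's query term is "假日", so the aggregation should cover every hotel whose title contains "假日". If the page should list only the Beijing hotels among them while still showing the average price across all matching hotels, the post-filter provided by ES can be used. It is applied after the query and the aggregation have run, so its condition has no effect on the aggregation.

# Post-filter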
GET /hotel_poly/_search
{
  "query": {
    "match": {
      "title": "假日"
    }
  },
  "post_filter": {
    "term": {
      "city": "北京"
    }
  },
  "aggs": {
    "my_agg": {
      "avg": {
        "field": "price",
        "missing": 200
      }
    }
  }
}

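The query above uses match to find hotels whose title contains "假日" and computes the average price of those hotels. The post_filter then restricts the returned hotels to the city "北京". Executing this DSL returns both the filtered hits and the aggregation result.

According to the response, the match query hits 4 documents, and the average of the price field over these 4 documents is 364 (document 002 has no price, so the missing value 200 is used for it). Finally, post_filter removes document 004 from the hits, so the total in the hits section is 3.

In Java, the post-filter logic looks like this: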

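// Imports follow the package layout of the 7.x high-level REST client.
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes the enclosing class has a RestHighLevelClient field `client` and an SLF4J-style logger `log`.
public void getPostFilterAggSearch() throws IOException {
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    String avgAggName = "my_avg"; // name of the avg aggregation
    // Define the avg aggregation on the price field
    AvgAggregationBuilder avgAgg = AggregationBuilders.avg(avgAggName).field("price");
    avgAgg.missing(200);    // use 200 for documents without a price
    searchSourceBuilder.aggregation(avgAgg);
    // Build the match query
    searchSourceBuilder.query(QueryBuilders.matchQuery("title", "假日"));
    // The post-filter narrows the hits only, not the aggregation
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("city", "北京");
    searchSourceBuilder.postFilter(termQueryBuilder);
    searchRequest.source(searchSourceBuilder);  // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the aggregation result
    Aggregations aggregations = searchResponse.getAggregations();
    Avg avg = aggregations.get(avgAggName);
    String key = avg.getName();     // aggregation name
    double avgVal = avg.getValue(); // aggregation value
    log.info("key={},avgVal={}", key, avgVal);
}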
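The figures quoted for the post-filter (an average of 364 over 4 matches, but 3 hits) can be reproduced with an in-memory sketch over the sample documents. This is plain Java with no ES client, and `PostFilterDemo`, `Hotel`, `avgPrice` and `hitIds` are hypothetical names for this sketch. As a simplifying assumption, the match query is modelled by a boolean flag that records whether the sample title contains "假日" (true for 001 to 004, false for 005), and city names are written in English to keep the source ASCII-only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for query + aggregation + post_filter; no ES client is involved.
class PostFilterDemo {
    static final class Hotel {
        final String id;
        final boolean titleMatches; // simplification of the match query on title
        final String city;
        final Double price;         // null when the document has no price field

        Hotel(String id, boolean titleMatches, String city, Double price) {
            this.id = id;
            this.titleMatches = titleMatches;
            this.city = city;
            this.price = price;
        }
    }

    // The five sample documents, reduced to the fields this example needs.
    static final List<Hotel> HOTELS = Arrays.asList(
            new Hotel("001", true, "Beijing", 556.0),
            new Hotel("002", true, "Beijing", null),
            new Hotel("003", true, "Beijing", 200.0),
            new Hotel("004", true, "Tianjin", 500.0),
            new Hotel("005", false, "Tianjin", 800.0));

    // The aggregation runs over all query matches; the post-filter is not applied here.
    static double avgPrice(double missing) {
        return HOTELS.stream()
                .filter(h -> h.titleMatches)
                .mapToDouble(h -> h.price == null ? missing : h.price) // "missing" default
                .average()
                .orElse(Double.NaN);
    }

    // The hits are the query matches narrowed by the post-filter.
    static List<String> hitIds(String postFilterCity) {
        return HOTELS.stream()
                .filter(h -> h.titleMatches)
                .filter(h -> h.city.equals(postFilterCity))
                .map(h -> h.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(avgPrice(200));     // prints 364.0
        System.out.println(hitIds("Beijing")); // prints [001, 002, 003]
    }
}
```

The average is (556 + 200 + 200 + 500) / 4 = 364, where the second 200 is the `missing` default applied to document 002, and document 004 still counts toward it even though the post-filter drops it from the hits.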

Data source

Index structure

PUT /hotel_poly
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "title":{
        "type": "text"
      },
      "city":{
        "type": "keyword"
      },
      "price":{
        "type": "double"
      },
      "create_time":{
        "type": "date"
      },
      "full_room":{
        "type": "boolean"
      },
      "location":{
        "type": "geo_point"
      },
      "tags":{
        "type": "keyword"
      },
      "comment_info":{
        "properties": {
          "favourable_comment":{
            "type":"integer"
          },
          "negative_comment":{
            "type":"integer"
          }
        }
      }
    }
  }
}

Hotel data

POST /_bulk
{"index":{"_index":"hotel_poly","_id":"001"}}
{"title":"文雅假日酒店","city":"北京","price":556.00,"create_time":"20200418120000","full_room":true,"location":{"lat":39.938838,"lon":106.449112},"tags":["wifi","小型电影院"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"002"}}
{"title":"金都嘉怡假日酒店","city":"北京","create_time":"20210315200000","full_room":false,"location":{"lat":39.915153,"lon":116.4030},"tags":["wifi","免费早餐"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"003"}}
{"title":"金都假日酒店","city":"北京","price":200.00,"create_time":"20210509160000","full_room":true,"location":{"lat":40.002096,"lon":116.386673},"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"004"}}
{"title":"金都假日酒店","city":"天津","price":500.00,"create_time":"20210218080000","full_room":false,"location":{"lat":39.155004,"lon":117.203976},"tags":["wifi","免费车位"]}
{"index":{"_index":"hotel_poly","_id":"005"}}
{"title":"文雅精选酒店","city":"天津","price":800.00,"create_time":"20210101080000","full_room":true,"location":{"lat":39.178447,"lon":117.219999},"tags":["wifi","充电车位"],"comment_info":{"favourable_comment":20,"negative_comment":10}}