Friday, March 5, 2021

Elasticsearch Collapsing Based on Text Similarity

I'm working with Elasticsearch and trying to come up with a way to filter a text field based on a phrase. I have basic searching working, but I also want to collapse "similar" results rather than duplicating them.

For example, given objects with the following text content:

  • Buy 1 car get one car free until March
  • Buy 1 car get one car free until April
  • 50% off your car insurance when you buy through us
  • Get 50% off your oven

When searching for "car", I'd expect 2 results:

  • 50% off your car insurance [...]
  • either the 1st or the 2nd one (with both shown in inner_hits)
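To make "similar" concrete, here is a minimal client-side sketch that reproduces the desired grouping on the example texts. It is not an Elasticsearch feature: it swaps in a plain word-overlap (Jaccard) similarity with a greedy grouping pass, and the helper names (`tokenSet`, `jaccard`, `groupSimilar`) and the 0.6 threshold are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: lower-cases the text and returns its distinct word tokens
// (letters, digits and '%' stay together; everything else is a separator).
function tokenSet(string $text): array
{
    $words = preg_split('/[^a-z0-9%]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets.
function jaccard(array $a, array $b): float
{
    $union = array_unique(array_merge($a, $b));
    if (count($union) === 0) {
        return 0.0;
    }
    return count(array_intersect($a, $b)) / count($union);
}

// Greedy grouping (hypothetical threshold): each text joins the first group whose
// first member is at least $threshold similar, otherwise it starts a new group.
function groupSimilar(array $texts, float $threshold = 0.6): array
{
    $groups = [];
    foreach ($texts as $text) {
        $tokens = tokenSet($text);
        $placed = false;
        foreach ($groups as $i => $group) {
            if (jaccard($group['tokens'], $tokens) >= $threshold) {
                $groups[$i]['members'][] = $text;
                $placed = true;
                break;
            }
        }
        if (!$placed) {
            $groups[] = ['tokens' => $tokens, 'members' => [$text]];
        }
    }
    return array_map(fn (array $g): array => $g['members'], $groups);
}

$contents = [
    'Buy 1 car get one car free until March',
    'Buy 1 car get one car free until April',
    '50% off your car insurance when you buy through us',
    'Get 50% off your oven',
];

// Stand-in for the match query: keep only texts containing the word "car".
$hits = array_values(array_filter($contents, fn (string $t): bool => in_array('car', tokenSet($t), true)));
$groups = groupSimilar($hits);

foreach ($groups as $i => $members) {
    printf("group %d: %s\n", $i + 1, implode(' | ', $members));
}
```

The two "Buy 1 car" offers share 7 of their 9 distinct words (similarity about 0.78), while the insurance text shares only "car" and "buy" with them, so the car search ends up with two groups, matching the two results described above.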

I've tried doing this with collapse on the content field, but that only collapses exact matches.

    'query' => [
        'match' => [
            'content' => 'car',
        ],
    ],
    'collapse' => [
        'field' => 'content',
        'inner_hits' => [
            'name' => 'recently_seen_on',
            'size' => 3,
            'sort' => [['seen_on' => 'desc']],
        ],
    ],
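Why exact matches only: collapse buckets hits by the exact value of the collapse field, so two strings that differ by one word ("March" vs "April") land in different buckets. The sketch below is a simplified local model of that behaviour, not the Elasticsearch implementation; `collapseExact` is a hypothetical name and the `seen_on` dates are illustrative values.

```php
<?php
// Simplified model of `collapse`: hits are bucketed by the exact value of the
// collapse field, and each bucket keeps its hits sorted by seen_on (newest first),
// capped at $innerSize, loosely mirroring the inner_hits settings above.
function collapseExact(array $hits, string $field, int $innerSize): array
{
    $groups = [];
    foreach ($hits as $hit) {
        // Exact string equality is the only grouping rule.
        $groups[$hit[$field]][] = $hit;
    }
    foreach ($groups as $key => $members) {
        usort($members, fn (array $a, array $b): int => strcmp($b['seen_on'], $a['seen_on']));
        $groups[$key] = array_slice($members, 0, $innerSize);
    }
    return $groups;
}

// Illustrative hits; the last one is a byte-for-byte copy of the first.
$hits = [
    ['content' => 'Buy 1 car get one car free until March', 'seen_on' => '2021-03-01'],
    ['content' => 'Buy 1 car get one car free until April', 'seen_on' => '2021-03-04'],
    ['content' => '50% off your car insurance when you buy through us', 'seen_on' => '2021-03-02'],
    ['content' => 'Buy 1 car get one car free until March', 'seen_on' => '2021-03-05'],
];

$collapsed = collapseExact($hits, 'content', 3);
foreach ($collapsed as $content => $members) {
    printf("%s (%d hit(s))\n", $content, count($members));
}
```

Only the exact duplicate is folded in; the March and April offers remain separate groups, which is the behaviour described in the question.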

I've also tried adding a similarity property to the content field, but I couldn't figure out whether it's possible to collapse on that.

I also came across the significant_text aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-significanttext-aggregation.html), but when I tried something similar I got 0 results. I set the content type to keyword in the mappings:

    [
        'content' => ['type' => 'keyword'],
    ]

And then queried with:

    'query' => [
        'match' => [
            'content' => 'car',
        ],
    ],
    'aggs' => [
        'keywords' => [
            'significant_text' => [
                'field' => 'content',
                'filter_duplicate_text' => true,
            ],
        ],
    ],
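One likely reason for the 0 results: a `keyword` field is indexed as a single untokenized term, so a `match` for `car` only hits documents whose entire content is exactly `car`. The significant_text aggregation is also intended for analyzed `text` fields. The sketch below assumes `content` is remapped as `text`, with an exact copy kept in a keyword sub-field whose name `raw` is a hypothetical choice. Note that significant_text returns buckets of terms, not grouped documents, so even when it does return buckets it would not collapse the hits by itself.

```php
<?php
// Sketch: analyze `content` as full text and keep an exact copy in a keyword
// sub-field (the name `raw` is hypothetical).
$mappingProperties = [
    'content' => [
        'type' => 'text',
        'fields' => [
            'raw' => ['type' => 'keyword'],
        ],
    ],
];

// The request from the question, unchanged: against a `text` field the match
// query can now hit the individual word "car".
$body = [
    'query' => [
        'match' => [
            'content' => 'car',
        ],
    ],
    'aggs' => [
        'keywords' => [
            'significant_text' => [
                'field' => 'content',
                'filter_duplicate_text' => true,
            ],
        ],
    ],
];

echo json_encode(['mappings' => ['properties' => $mappingProperties], 'search' => $body], JSON_PRETTY_PRINT), PHP_EOL;
```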

Is something like this achievable without manually adding a field that groups documents based on their content?
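For comparison, the manual route would look roughly like this. Collapse requires a single-valued `keyword` or numeric field with doc values, so the group has to exist as such a field at index time. Everything below is an assumption for illustration: the field name `content_group`, how the application fills it (for example with a cluster id from a word-overlap similarity check run before indexing), and the `date` type for `seen_on`.

```php
<?php
// Hypothetical mapping: `content_group` would be filled in by the application
// at index time, e.g. with the id of a near-duplicate cluster.
$mappingProperties = [
    'content' => ['type' => 'text'],
    'content_group' => ['type' => 'keyword'],
    'seen_on' => ['type' => 'date'],
];

// The collapse request from the question, pointed at the group field instead
// of the raw content, so near-duplicates sharing a group id fold together.
$body = [
    'query' => [
        'match' => [
            'content' => 'car',
        ],
    ],
    'collapse' => [
        'field' => 'content_group',
        'inner_hits' => [
            'name' => 'recently_seen_on',
            'size' => 3,
            'sort' => [['seen_on' => 'desc']],
        ],
    ],
];

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The trade-off is that the similarity decision moves out of the query and into indexing, so changing the notion of "similar" means re-computing the group ids.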

https://stackoverflow.com/questions/66501391/elasticsearch-collapsing-based-on-text-similarity March 06, 2021 at 09:06AM
