Java: exact matches with ElasticSearch (at query time)

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18402480/


java elasticsearch

Asked by Michael Junk

I have a locations index, which has lots of location names and their respective countries.

I then want to know whether we have locations with title "Berlin" in the country with country code "DE".

Here's my Java code attempt:

SearchResponse response = client.prepareSearch("locations")
                .setQuery(QueryBuilders.matchQuery("title", "Berlin"))
                .setFilter(FilterBuilders.termFilter("country", "DE"))
                .execute()
                .actionGet();

But this gives me too many replies, e.g. results for "Zoo Berlin" and so on. I need exact matches.

(But please note that I have other scenarios where this substring/text search matching is desired.)

Is there a way to decide at querying time, rather than at indexing time which behaviour (exact vs. analyzed text) one wants?

Accepted answer by Scott Rice

Index the field you perform a term filter on as not_analyzed. For example, you can index the "country" field as a multi_field, with one of the sub-fields not_analyzed:

        "country": {
            "type": "multi_field",
            "fields": {
                "country": {"type": "string", "index": "analyzed"},
                "exact": {"type": "string","index": "not_analyzed"}
            }
        }

Additionally, you could do the same with the "title" field in order to perform a term query:

        "title": {
            "type": "multi_field",
            "fields": {
                "title": {"type": "string", "index": "analyzed"},
                "exact": {"type": "string","index": "not_analyzed"}
            }
        }

Then at query time, if you want a title with the exact term "Berlin" filtered by the exact term "DE", use a term query and term filter with the not_analyzed fields:

SearchResponse response = client.prepareSearch("locations")
                .setQuery(QueryBuilders.termQuery("title.exact", "Berlin"))
                .setFilter(FilterBuilders.termFilter("country.exact", "DE"))
                .execute()
                .actionGet();

Note that term filters and term queries do not analyze the search term, so they require not_analyzed fields to work (i.e., to return exact matches).
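To see why the analyzed field also returns "Zoo Berlin" while the not_analyzed sub-field does not, here is a minimal sketch in plain Java. It is a toy model, not Elasticsearch code: the class ExactMatchSketch, its methods, and the lowercase-and-split-on-whitespace analyzer are all assumptions made for illustration, standing in for the real analysis chain.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class ExactMatchSketch {

    // Toy analyzer (assumption): lowercase, then split on whitespace.
    // This only approximates what an analyzed string field stores.
    static Set<String> analyze(String value) {
        String[] tokens = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return new LinkedHashSet<>(Arrays.asList(tokens));
    }

    // not_analyzed: the whole value is indexed as one single term.
    static Set<String> notAnalyzed(String value) {
        return Collections.singleton(value);
    }

    // Term query/filter: the search term is compared verbatim, without analysis.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Match query: the query text is analyzed first; any shared token is a hit.
    static boolean matchMatches(Set<String> indexedTerms, String queryText) {
        for (String token : analyze(queryText)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String title : new String[] {"Berlin", "Zoo Berlin"}) {
            boolean analyzedHit = matchMatches(analyze(title), "Berlin");
            boolean exactHit = termMatches(notAnalyzed(title), "Berlin");
            System.out.println(title + " -> match on title: " + analyzedHit
                    + ", term on title.exact: " + exactHit);
        }
    }
}
```

In this model the match query hits both titles, while the term query on the exact sub-field hits only the document whose whole title is "Berlin". It also shows why a term query for "Berlin" against the analyzed field would miss: the indexed token there is the lowercased "berlin".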

Answer by Dean Jain

As of ElasticSearch version 5, there is no longer a concept of analyzed and not_analyzed for the index setting; whether a field is analyzed is driven by its type.

The string data type is deprecated and replaced by text and keyword. If your field type is text, it behaves like an analyzed string: its values are analyzed and tokenized.

If the field type is keyword, the value is not analyzed, so queries against it match only the full, exact value.

So remember to map the field as keyword when you want exact matches.

You can then use the same term query and term filter as explained by @Scott Rice.

The example below creates an index with such a mapping. Note that the APPLICATION and type fields each have two sub-fields: a tokenized one of type text and an exact one of type keyword. It is sometimes useful to keep both for certain fields:

PUT testindex
{
    "mappings": {
      "original": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "APPLICATION": {
            "type": "text",
            "fields": {
                "token": {"type": "text"},
                "exact": {"type": "keyword"}
            }
          },
          "type": {
            "type": "text",
            "fields": {
                "token": {"type": "text"},
                "exact": {"type": "keyword"}
            }
          }
        }
      }
    }
}
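
At query time, the exact sub-fields from this mapping can be targeted with term clauses. The sketch below builds such a search request body as a plain JSON string, so it does not depend on any particular client version. The class TermQueryBodySketch, its helper methods, and the values "billing" and "error" are hypothetical, chosen only for illustration; the field names APPLICATION.exact and type.exact come from the mapping above.

```java
public class TermQueryBodySketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One term clause: the value is compared verbatim against the keyword sub-field.
    static String term(String field, String value) {
        return "{\"term\":{" + quote(field) + ":" + quote(value) + "}}";
    }

    // Wraps the clauses in a bool query's filter section (non-scoring, all must match).
    static String boolFilterBody(String... termClauses) {
        return "{\"query\":{\"bool\":{\"filter\":["
                + String.join(",", termClauses)
                + "]}}}";
    }

    public static void main(String[] args) {
        String body = boolFilterBody(
                term("APPLICATION.exact", "billing"), // hypothetical value
                term("type.exact", "error"));         // hypothetical value
        System.out.println(body);
    }
}
```

The resulting string could be sent as the body of a search request against testindex. Placing the term clauses under filter keeps them out of relevance scoring, which suits a yes/no exact-match check.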