Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2487736/

Date: 2020-10-29 21:26:32  Source: igfitidea

Lucene case sensitive & insensitive search

Tags: java, lucene

Asked by zvikico

I have a Lucene index which is currently case sensitive. I want to add the option of a case-insensitive search as a fall-back. This means that results that match the case get more weight and appear first. For example, if the number of results is limited to 10 and there are 10 results that match my case, that is enough. If I only find 7 results, I can add 3 more from the case-insensitive search.

My case is actually more complex, since I have items with different weights. Ideally, having a match with "wrong" case will add some weight. Needless to say, I do not want duplicate results.

One possible approach is to have 2 indexes, one that preserves case and one that does not, and to search both. Naturally, there's some redundancy here, since I need to index twice.

Is there a better solution? Ideas?

Accepted answer by Karussell

Have you already tried copyField? See http://wiki.apache.org/solr/SchemaXml#Copy_Fields

If not, define a new field B with a different configuration and copy field A into B via copyField.
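
The two-field idea, combined with the "exact case first, then fill up" requirement from the question, can be sketched roughly as follows. This is a plain-Java illustration only, not Lucene or Solr API: the class CaseFallbackSearch, the method search, and the use of substring matching are all assumptions made for the example. Documents are identified by their position in the list, so the same document is never returned twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plain-Java sketch; none of this is Lucene or Solr API.
final class CaseFallbackSearch {

    // Returns the ids (list positions) of up to `limit` documents containing `term`:
    // exact-case matches first, then matches that differ only in case.
    // The LinkedHashSet keeps insertion order and drops duplicate ids.
    static List<Integer> search(List<String> docs, String term, int limit) {
        Set<Integer> hits = new LinkedHashSet<>();

        // Pass 1: "field A", compared as-is (case sensitive).
        for (int id = 0; id < docs.size() && hits.size() < limit; id++) {
            if (docs.get(id).contains(term)) {
                hits.add(id);
            }
        }

        // Pass 2: "field B", the lower-cased copy; it only fills the remaining slots.
        String lowerTerm = term.toLowerCase(Locale.ROOT);
        for (int id = 0; id < docs.size() && hits.size() < limit; id++) {
            if (docs.get(id).toLowerCase(Locale.ROOT).contains(lowerTerm)) {
                hits.add(id);
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "apache Lucene", "Lucene in Action", "lucene basics", "Solr guide");
        System.out.println(search(docs, "Lucene", 10)); // [0, 1, 2]
        System.out.println(search(docs, "Lucene", 2));  // [0, 1]
    }
}
```

In a real index the two passes would presumably be two clauses over fields A and B, with a higher boost on the case-sensitive one, rather than two loops.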

Answer by Narayan

Lucene search is case sensitive; it's just that all input is usually lower-cased by the analyzer when it passes through QueryParser, so it feels case insensitive. In other words, to keep matching case sensitive, don't lower-case your input before indexing and don't lower-case your queries, i.e. pick an Analyzer that doesn't lower-case, such as KeywordAnalyzer.
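
To see why the same exact-match lookup can feel case insensitive, here is a minimal plain-Java sketch. It is not Lucene API: AnalyzerCaseDemo, LOWER_CASING, CASE_PRESERVING, and matches are hypothetical names, and a UnaryOperator stands in for an analyzer. The only assumption is that the same "analyzer" is applied at index time and at query time.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical plain-Java stand-in for an analyzer; not Lucene API.
final class AnalyzerCaseDemo {
    static final UnaryOperator<String> LOWER_CASING = s -> s.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> CASE_PRESERVING = UnaryOperator.identity();

    // "Indexes" the words with the given analyzer, then looks up the query term
    // after passing it through the same analyzer. The lookup itself is always exact.
    static boolean matches(UnaryOperator<String> analyzer, String query, String... words) {
        Set<String> indexedTerms = new HashSet<>();
        for (String word : words) {
            indexedTerms.add(analyzer.apply(word));
        }
        return indexedTerms.contains(analyzer.apply(query));
    }

    public static void main(String[] args) {
        System.out.println(matches(LOWER_CASING, "lucene", "Lucene", "Index"));    // true
        System.out.println(matches(CASE_PRESERVING, "lucene", "Lucene", "Index")); // false
    }
}
```

With the lower-casing stand-in, both sides are normalized to the same term, so the query matches; with the case-preserving one, the stored term keeps its capital letter and the lower-case query misses it.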

See QueryParser.setLowercaseExpandedTerms(boolean lowercaseExpandedTerms).

You can index the terms using a case-sensitive analyzer, and when you want a case-insensitive query, use a class that does not convert your terms to lowercase.

Look at Wildcard, Prefix, and Fuzzy queries, whose terms are the ones setLowercaseExpandedTerms controls.
