Original question: http://stackoverflow.com/questions/14074613/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
How to search an int field in Lucene 4?
Asked by Konrad Garus
I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:
Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"), Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"), Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"), Field.Store.YES));
w.addDocument(doc);
It seems I can't query the ticket_id field at all, while id_s works just fine.
One of the documents is (I added whitespace for readability):
Document<
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W>
stored<ticket_id:152>
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>
So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.
What am I doing wrong? How can I add such a field to the index and make it searchable?
Answered by mindas
The following works for me:
RAMDirectory idx = new RAMDirectory();
IndexWriter writer = new IndexWriter(
    idx,
    new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
);
Document document = new Document();
document.add(new StringField("ticket_number", "t123", Field.Store.YES));
document.add(new IntField("ticket_id", 234, Field.Store.YES));
document.add(new StringField("id_s", "234", Field.Store.YES));
writer.addDocument(document);
writer.commit();
IndexReader reader = DirectoryReader.open(idx);
IndexSearcher searcher = new IndexSearcher(reader);
Query q1 = new TermQuery(new Term("id_s", "234"));
TopDocs td1 = searcher.search(q1, 1);
System.out.println(td1.totalHits); // prints "1"
Query q2 = NumericRangeQuery.newIntRange("ticket_id", 1, 234, 234, true, true);
TopDocs td2 = searcher.search(q2, 1);
System.out.println(td2.totalHits); // prints "1"
As femtoRgon pointed out, for numeric values (longs, dates, floats, etc.) you need to use a NumericRangeQuery and specify the precision. Otherwise Lucene has no idea how you want to define similarity.
Answered by femtoRgon
Numeric Fields can be queried with a NumericRangeQuery. For an exact match, simply set the max and min to equal values.
Your output indicating that the field is not indexed could be due to the difference in how a numeric value is indexed compared to a text value. Because the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed.
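To make the point about the numeric representation concrete, here is a minimal JDK-only sketch of a sortable, 7-bits-per-byte int encoding. It is loosely modeled on the prefix coding that Lucene 4 applies to numeric fields, but the class name SortableIntSketch, the SHIFT_START marker value, and the exact byte layout are assumptions made for illustration, not Lucene's actual code. Real numeric fields also index additional lower-precision terms, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: NOT Lucene's implementation, just a sketch of the idea.
final class SortableIntSketch {
    // Hypothetical marker byte that records the shift (here: full precision).
    static final int SHIFT_START = 0x60;

    // Encodes an int as 1 marker byte followed by 5 payload bytes of 7 bits each.
    static byte[] encode(int value) {
        byte[] out = new byte[6];
        out[0] = (byte) SHIFT_START;
        // Flip the sign bit so that unsigned byte order matches signed int order.
        int bits = value ^ 0x80000000;
        // Fill the payload from the last byte to the first, 7 bits at a time.
        for (int i = 5; i >= 1; i--) {
            out[i] = (byte) (bits & 0x7f);
            bits >>>= 7;
        }
        return out;
    }

    public static void main(String[] args) {
        // The text term a StringField would hold for "152".
        System.out.println(Arrays.toString("152".getBytes(StandardCharsets.UTF_8)));
        // The encoded form of the int 152, which shares no bytes with the text.
        System.out.println(Arrays.toString(encode(152)));
    }
}
```

Under these assumptions, a term query for the text 152 compares against the three ASCII bytes of the string, while the numeric field holds a differently shaped byte sequence, so the two can never match.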
At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as just simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing as a StringField certainly makes more sense.
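As a rough illustration of when numeric handling does matter, the following JDK-only sketch sorts the same digit IDs once as plain strings and once by numeric value. The class and method names (IdOrderingSketch, sortAsStrings, sortAsInts) are hypothetical and no Lucene API is involved. It only shows that lexicographic order and numeric order disagree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: compares string ordering with numeric ordering of IDs.
final class IdOrderingSketch {
    // Sorts the IDs the way plain string terms compare: character by character.
    static List<String> sortAsStrings(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    // Sorts the IDs by their integer value.
    static List<String> sortAsInts(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("9", "10", "152");
        System.out.println(sortAsStrings(ids)); // [10, 152, 9]
        System.out.println(sortAsInts(ids));    // [9, 10, 152]
    }
}
```

If exact lookup by ID is all that is needed, the string ordering is harmless. It only becomes a problem once sorting or range queries have to follow numeric order.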
Answered by D.Ogranos
Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term
Basically, you create a Term with your int value like this:
String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);
Then you can use this term for searching, or for deleting or updating documents in your index. In a first test, this worked fine for me. However, I can't tell whether this is the "right" way to do things. I've used the NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQueries instead.