Difference between StandardTokenizerFactory and KeywordTokenizerFactory in Solr? (java)

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7645465/

Date: 2020-10-30 20:50:06  Source: igfitidea

Difference between StandardTokenizerFactory and KeywordTokenizerFactory in Solr?

Tags: java, solr, solrnet, tokenize

Asked by ravidev

I am new to Solr. I want to know when to use StandardTokenizerFactory and when to use KeywordTokenizerFactory.

I read the docs on the Apache wiki, but I am still not getting it.

Can anybody explain the difference between StandardTokenizerFactory and KeywordTokenizerFactory?

Answered by Jayendra

StandardTokenizerFactory :-
It tokenizes on whitespace and punctuation, stripping the punctuation characters.

Documentation :-

Splits words at punctuation characters, removing the punctuation. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token; in that case, the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

You would use this for fields where you want to search on the field data.

e.g. -

http://example.com/I-am+example?Text=-Hello

would generate 7 tokens (comma-separated) -

http,example.com,I,am,example,Text,Hello
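In Solr you choose between these tokenizers in a field type definition in schema.xml. A minimal sketch of a searchable text type using StandardTokenizerFactory (the field and type names here are illustrative, not from the original post):

```xml
<!-- schema.xml excerpt (hypothetical names): a general-purpose searchable text type -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizerFactory splits the input as described above -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lower-casing is a common companion filter so searches are case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- a field indexed with that type -->
<field name="title" type="text_general" indexed="true" stored="true"/>
```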

KeywordTokenizerFactory :-

Keyword Tokenizer does not split the input at all.
No processing is performed on the string, and the whole string is treated as a single entity.
It doesn't actually do any tokenization: it returns the original text as one term.

Mainly used for sorting or faceting requirements, where you want to match the exact value when filtering on multi-word terms; it also matters for sorting, since sorting does not work on tokenized fields.

e.g.

http://example.com/I-am+example?Text=-Hello

would generate a single token -

http://example.com/I-am+example?Text=-Hello
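For the faceting/sorting use case, a corresponding schema.xml sketch with KeywordTokenizerFactory (again, the names are illustrative). Compared to plain solr.StrField, which is also not tokenized, wrapping KeywordTokenizerFactory in a TextField analyzer lets you still apply filters to the single token:

```xml
<!-- schema.xml excerpt (hypothetical names): exact-match type for faceting/sorting -->
<fieldType name="string_exact" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the whole input as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- filters still run on that single token; lower-casing gives case-insensitive exact match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="category" type="string_exact" indexed="true" stored="true"/>
```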