Original URL: http://stackoverflow.com/questions/2118634/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to determine field-type for SOLR indexing?
Asked by memnoch_proxy
I have two fields in a MySQL table. One is a VARCHAR column that holds the "headline" of a classified ad (this is a classifieds website). The other is a TEXT column that holds the ad's "text".
Two questions:
How should I decide how to index these two fields? (Which field type, which classes, and so on?)
Currently I have an "ad_id" field as a unique identifier for each ad, for example "bmw_m3_82398292".
How can I make Solr return this identifier whenever it finds a match for a query? (The first part of the identifier is actually the content of the headline field; the second part is a randomly chosen number.)
Thanks
Answer by memnoch_proxy
1. Schema
Your Solr schema is very much determined by your intended search behavior. In your schema.xml file, you'll see a bunch of choices like "text" and "string". They behave differently.
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
The string field type does a literal, exact string match: it is not tokenized or lower-cased, so a query term must equal the stored value. It behaves like = in a SQL WHERE clause.
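The question's ad_id (for example bmw_m3_82398292) is an identifier rather than prose, so a string field fits it. Below is a minimal schema.xml sketch. It assumes the Solr 1.4-era layout, where <field> elements sit inside <fields> and <uniqueKey> is a direct child of <schema>. Setting stored="true" is what lets Solr hand the value back with each match.

```xml
<fields>
  <!-- identifier such as "bmw_m3_82398292": indexed as one exact token, stored so it can be returned -->
  <field name="ad_id" type="string" indexed="true" stored="true" required="true"/>
</fields>
<!-- declares ad_id as the field that uniquely identifies each document -->
<uniqueKey>ad_id</uniqueKey>
```

Once ad_id is stored, listing it in the fl parameter makes each matching document come back with its identifier.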
<fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldtype>
The text_ws field type tokenizes the input on whitespace and does nothing else. The text field type goes further: it adds filters for stop words, word delimiters, and lower-casing. Notice that in schema.xml these filters are specified for both the index-time analyzer (which builds the Lucene index) and the query-time analyzer (which processes the Solr query). So when you search a text field, your query terms pass through the same filters before Solr looks for a match.
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter ..... />
<filter ..... />
<filter ..... />
</analyzer>
</fieldtype>
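The snippet above elides its filters. Here is a hedged, self-contained variant that spells out matching index-time and query-time analyzers. The type name text_simple is hypothetical. It keeps only stop-word removal and lower-casing, whereas the text type above applies more filters.

```xml
<!-- hypothetical type name; the same chain runs at index time and at query time -->
<fieldtype name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop common words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- so "BMW" and "bmw" become the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Keeping the two chains identical means a query term is normalized exactly the way the indexed words were.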
When indexing things like news stories, for example, you probably want to search for company names and headlines differently.
<field name="headline" type="text" />
<field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />
The above example would allow a search such as q=coname:Intel AND headline:(processor specifications). That query retrieves only stories whose coname is exactly Intel, while the headline words are matched loosely through the text analyzer.
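For the two MySQL columns in the question, both hold free text that users will search by words, so both map naturally to the text type. A sketch follows, in which ad_text is a hypothetical name for the TEXT column.

```xml
<fields>
  <!-- VARCHAR headline column: tokenized and filtered, stored so it can be displayed -->
  <field name="headline" type="text" indexed="true" stored="true"/>
  <!-- TEXT body column; "ad_text" is a hypothetical field name -->
  <field name="ad_text" type="text" indexed="true" stored="true"/>
</fields>
```

If you never display the ad body from Solr results, stored="false" on ad_text keeps it searchable while saving index space.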
If you want to search a range of values, see the range-query syntax at the end of this answer.
2. Result Fields
You can define a default set of return fields in your request handler (in solrconfig.xml):
<requestHandler name="mumble" class="solr.DisMaxRequestHandler" >
<lst name="defaults">
<str name="fl">
category,coname,headline
</str>
</lst>
</requestHandler>
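The same defaults mechanism answers the question's second part directly, because it can make every response include ad_id. This sketch assumes ad_id and headline are stored fields and that ad_text is a hypothetical name for the TEXT column. The handler name ads is also hypothetical. It uses solr.SearchHandler with defType=dismax, the form later Solr releases recommend instead of DisMaxRequestHandler.

```xml
<!-- hypothetical handler name "ads" -->
<requestHandler name="ads" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with the DisMax query parser -->
    <str name="defType">dismax</str>
    <!-- fields to search; the ^2 boost favors headline matches -->
    <str name="qf">headline^2 ad_text</str>
    <!-- fields returned for every match -->
    <str name="fl">ad_id,headline</str>
  </lst>
</requestHandler>
```

A request such as /select?qt=ads&q=bmw+m3 would then return ad_id and headline for each matching ad.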
You can also specify the fields to return in the query string with the fl parameter:
/select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
You can also select ranges in your query terms using the field:[x TO *] syntax. If you wanted to select certain ads by their date, you might build a query with
ad_date:[20100101 TO 20100201]
in your query terms. (There are many ways to search ranges; I'm presenting one that stores dates as integers instead of using Solr's date field type.)
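The integer approach assumes ad_date is indexed as a numeric field holding values such as 20100115 (yyyyMMdd). The type and field names below are hypothetical, and the sketch assumes solr.TrieIntField, which is available from Solr 1.4 and is designed for efficient numeric range queries.

```xml
<types>
  <!-- hypothetical type name; trie-encoded int speeds up range queries -->
  <fieldtype name="date_int" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
  <!-- hypothetical field: ad date written as yyyyMMdd, e.g. 20100115 -->
  <field name="ad_date" type="date_int" indexed="true" stored="true"/>
</fields>
```

With this field, the range query ad_date:[20100101 TO 20100201] above matches ads dated from January 1 through February 1, 2010, inclusive.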

