Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17806821/

Date: 2020-11-01 15:03:51  Source: igfitidea

Solr Composite Unique key from existing fields in schema

java, solr, solrj, unique-key, solr-schema

Asked by N D Thokare

I have an index named LocationIndex in Solr with the following fields:

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>solr_id</uniqueKey>

But now I want to change the schema so that the unique key is a composite of two existing fields, solr_id and solr_ver, something like this:

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <field name="composite-id" type="string" stored="true" required="true" indexed="true"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>solr_ver-solr_id</uniqueKey>

After searching, I found that this is possible by adding the following to the schema (ref: Solr Composite Unique key from existing fields in schema):

<updateRequestProcessorChain name="composite-id">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">docid_s</str>
    <str name="source">userid_s</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">--</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

So I changed the schema, and it now looks like this:

<updateRequestProcessorChain name="composite-id">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">solr_ver</str>
    <str name="source">solr_id</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">-</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <field name="id" type="string" stored="true" required="true" indexed="true"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>id</uniqueKey>

But when I add a document, I get this error:

org.apache.solr.client.solrj.SolrServerException: Server at http://localhost:8983/solr/LocationIndex returned non ok status:400, message:Document [null] missing required field: id

I don't understand what schema changes are required to make this work as desired.

Each document I add contains the fields solr_ver and solr_id. How and where will Solr create the id field by combining these two fields into something like solr_ver-solr_id?

EDIT:

This link shows how to refer to this chain, but I don't understand how it would be used in the schema. Where should I make the changes?

Answered by Paige Cook

So it looks like you have your updateRequestProcessorChain defined appropriately, and it should work. However, you need to add it to the solrconfig.xml file, not schema.xml. The additional link you provided shows how to modify your solrconfig.xml file and add your defined updateRequestProcessorChain to the current /update request handler for your Solr instance.

So do the following:

  1. Move your <updateRequestProcessorChain> to your solrconfig.xml file.
  2. Update the <requestHandler name="/update" class="solr.UpdateRequestHandler"> entry in your solrconfig.xml file and modify it so it looks like the following:

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
       <lst name="defaults">
          <str name="update.chain">composite-id</str>
       </lst>
    </requestHandler>
    
This should then execute your defined update chain and populate the id field when new documents are added to the index.
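
As a rough illustration of what the two processors in the chain do to a document, here is a plain-Java simulation. It is only a sketch and not Solr's code: the class and method names are hypothetical, the document is modelled as a simple map of field names to value lists, and the sample id "LOC42" is made up. The clone step copies the values of solr_ver and solr_id into id, and the concat step joins those values with the delimiter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: simulates the effect of the clone + concat chain on one document.
final class CompositeIdSketch {

    // Clone step: append the values of each source field, in order, to the dest field.
    static void cloneFields(Map<String, List<String>> doc, List<String> sources, String dest) {
        List<String> collected = new ArrayList<>();
        for (String source : sources) {
            List<String> values = doc.get(source);
            if (values != null) {
                collected.addAll(values);
            }
        }
        if (!collected.isEmpty()) {
            doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(collected);
        }
    }

    // Concat step: collapse a multi-valued field into one delimiter-joined value.
    static void concatField(Map<String, List<String>> doc, String fieldName, String delimiter) {
        List<String> values = doc.get(fieldName);
        if (values != null && values.size() > 1) {
            List<String> joined = new ArrayList<>();
            joined.add(String.join(delimiter, values));
            doc.put(fieldName, joined);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("solr_ver", new ArrayList<>(Arrays.asList("0000")));
        doc.put("solr_id", new ArrayList<>(Arrays.asList("LOC42"))); // made-up sample id

        cloneFields(doc, Arrays.asList("solr_ver", "solr_id"), "id");
        concatField(doc, "id", "-");

        System.out.println(doc.get("id")); // prints [0000-LOC42]
    }
}
```

The order of the source entries determines the order of the parts in the resulting id, which is why solr_ver is listed before solr_id to get solr_ver-solr_id.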

Answered by Maksim

The solution described above may have a limitation: what if "dest" exceeds the maximum length because the concatenated fields are too long? Another option is MD5Signature, a class capable of generating a signature string from the concatenation of a group of specified document fields (a 128-bit hash used for exact duplicate detection). Note that the sample configuration below uses Lookup3Signature as its signatureClass.

<!-- An example dedup update processor that creates the "id" field on the fly 
     based on the hash code of some other fields.  This example has 
     overwriteDupes set to false since we are using the id field as the 
     signatureField and Solr will maintain uniqueness based on that anyway. --> 
<updateRequestProcessorChain name="dedupe"> 
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> 
    <bool name="enabled">true</bool> 
    <bool name="overwriteDupes">false</bool> 
    <str name="signatureField">id</str> 
    <str name="fields">name,features,cat</str> 
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str> 
  </processor> 
  <processor class="solr.LogUpdateProcessorFactory" /> 
  <processor class="solr.RunUpdateProcessorFactory" /> 
</updateRequestProcessorChain> 

From here: http://lucene.472066.n3.nabble.com/Solr-duplicates-detection-td506230.html
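
To show the general idea behind a hash-based id, here is a minimal Java sketch that hashes the concatenation of some field values with MD5 and returns a fixed-length hex string. It is an assumption-laden illustration, not Solr's MD5Signature implementation: the class and method names are hypothetical, and Solr's own signature classes may normalise or combine field values differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a fixed-length id from several field values.
final class SignatureIdSketch {

    // Feeds each non-null value into MD5 in order and returns 32 lowercase hex characters.
    // Values are concatenated without a separator, so ("a", "bc") and ("ab", "c")
    // produce the same signature.
    static String md5Signature(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : fieldValues) {
                if (value != null) {
                    md5.update(value.getBytes(StandardCharsets.UTF_8));
                }
            }
            byte[] digest = md5.digest();
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // "LOC42" is a made-up sample value.
        String id = md5Signature("0000", "LOC42");
        System.out.println(id.length()); // prints 32
    }
}
```

Whatever the length of the input fields, the resulting id is always 32 characters long, which is what avoids the maximum-length concern. The trade-off is that the id is no longer human-readable.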

Answered by Dan

I'd like to add this as a comment, but it's impossible to get the creds these days... anyway, here is a better link: https://wiki.apache.org/solr/Deduplication
