Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2348094/

Date: 2020-08-13 06:25:53  Source: igfitidea

How to manage "paging" with Solr?

Tags: java, php, sql, mysql, solr

Question

I have a classifieds website. Solr does the searching of the classifieds and returns ID numbers, which I put into an array. I then use this array to find any classifieds in a MySQL database whose IDs match the IDs in the array returned by Solr.

Now, because this array can be very big (100 thousand records or more), I need to "page" the results so that maybe 100 are returned at a time, and then use those 100 IDs in MySQL to find the classifieds.

So, is it possible to page with SOLR?

And if so, how? I need example code, and what the results would look like, please.

Mostly I need a thorough example!

Thanks

Accepted answer by jasonbar

Take a look at IBM. Maybe that will get you on the right course.

Number of results: Specifies the maximum number of results to return.

Start: The offset to start at in the result set. This is useful for pagination.

So you probably want some variation on:

<str name="rows">10</str>
<str name="start">0</str>

Your Solr client should provide some way to get the total number of results without much trouble.

Answer by Mauricio Scheffer

Paging is managed with the start and rows parameters, e.g.:

?q=something&rows=10&start=20

will give you 10 documents, starting at offset 20 (the first 20 matches are skipped).
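As a rough illustration of how a page number maps onto these two parameters, here is a minimal Java sketch. The class and method names (SolrPageQuery, startFor, queryString) are made up for this example and are not part of Solr or SolrJ; it only builds the query string and assumes 1-based page numbers.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrPageQuery {

    /** Zero-based offset of the first document on a 1-based page. */
    static int startFor(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be >= 1");
        }
        return (page - 1) * rows;
    }

    /** Builds the query string for one page; the search term is URL-encoded. */
    static String queryString(String q, int page, int rows) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return "?q=" + encoded + "&rows=" + rows + "&start=" + startFor(page, rows);
    }

    public static void main(String[] args) {
        // Page 3 with 10 rows per page reproduces the example query.
        System.out.println(queryString("something", 3, 10));
        // prints ?q=something&rows=10&start=20
    }
}
```

For the 100-at-a-time case from the question, the same helper would be called with rows set to 100.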

As for getting other information from MySQL, you're on your own. Other people and I have already suggested that you store everything in Solr to avoid the additional queries to MySQL.

Answer by Marco Altieri

I think it is worth mentioning that, together with the current page of results, Solr returns a count of the total number of records found.

For example, calling:

http://192.168.0.1:8983/solr/select?qt=edismax&fl=*,score&qf=content^2%20metatag.description^3%20title^5%20metatag.keywords^10&q=something&start=20&rows=10&wt=xml&version=2.2

The response is:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="fl">*,score</str>
            <str name="q">something</str>
            <str name="qf">content^2 metatag.description^3 title^5 metatag.keywords^10</str>
            <str name="qt">edismax</str>
            <str name="wt">xml</str>
            <str name="rows">10</str>
            <str name="version">2.2</str>
        </lst>
    </lst>
    <result name="response" numFound="1801" start="0" maxScore="0.15953878">
        <doc>...</doc>
        <doc>...</doc>
        <doc>...</doc>
...

Using SolrJ, the query method returns a QueryResponse; its getResults() method returns a SolrDocumentList, which has the method getNumFound().
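Once the total is known, the number of pages follows from simple arithmetic. The sketch below is plain Java with no Solr dependency; the class and method names are illustrative only, and the numFound value is passed in by hand rather than read from a real response.

```java
public class SolrPageCount {

    /** Number of pages needed to show numFound results at rows per page. */
    static long totalPages(long numFound, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        return (numFound + rows - 1) / rows; // ceiling division
    }

    /** True if more results exist after the page that begins at start. */
    static boolean hasNextPage(long numFound, long start, int rows) {
        return start + rows < numFound;
    }

    public static void main(String[] args) {
        // numFound="1801" with rows=10, as in the response above.
        System.out.println(totalPages(1801, 10)); // prints 181
    }
}
```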

Answer by Yonik

The "start" parameter controls the offset into the search results, and the "rows" parameter controls how many documents to return from there.

If you are doing "deep paging" (iterating over many pages), then you can achieve much better performance using a cursor to iterate over the result set.

Answer by Paul T. Rawkeen

This is probably a somewhat old question with a lot of helpful answers and recommendations, but I'll try to summarize the results and describe a solution for paginating large data sets using a cursor, because I faced this issue recently.

As mentioned by Yonik, the problem with the usual start/rows approach is that when we have a large dataset and start is far from zero, we incur considerable overhead in terms of efficiency and memory. This is because fetching 20 documents from the "middle" of 500K records with sorting requires, at the very least, sorting the whole dataset (sorting of internal unique IDs). Moreover, if the search is distributed it will be even more resource-consuming, because a dataset (of 500,020 rows) from each shard has to be returned to the aggregator node to be merged in order to find the applicable 20 rows.

Solr can't compute which matching document is the 999001st result in sorted order, without first determining what the first 999000 matching sorted results are.
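
To put rough numbers on that cost, the following Java sketch computes how many sorted entries are involved for a given start and rows. It is only back-of-the-envelope arithmetic: the shard count of 4 is an assumption made for illustration, and the figures are upper bounds, since a shard with fewer matches returns fewer entries.

```java
public class DeepPagingCost {

    /** Upper bound on entries one shard must collect and sort for start/rows. */
    static long perShard(long start, int rows) {
        return start + rows;
    }

    /** Upper bound on entries the aggregator node has to merge. */
    static long merged(long start, int rows, int shards) {
        return perShard(start, rows) * shards;
    }

    public static void main(String[] args) {
        long start = 500_000; // offset into the result set, as in the text
        int rows = 20;        // page size, as in the text
        int shards = 4;       // assumed shard count, purely illustrative

        System.out.println(perShard(start, rows));       // prints 500020
        System.out.println(merged(start, rows, shards)); // prints 2000080
    }
}
```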

The solution here is to use Solr cursorMark.

In the first query you pass &cursorMark=*. This means the following:

You can think of this as being analogous to start=0, as a way to tell Solr "start at the beginning of my sorted results", except that it also informs Solr that you want to use a cursor.

One caveat here is that your sort clauses must include the uniqueKey field. It can be the id field if it is unique.

Part of the first query will look like this:

?sort=price desc,id asc&start=0&cursorMark=* ...

As a result, you will receive the following structure:

{
    "response":{"numFound":20,"start":0,"docs":[ /* docs here */ ]},
    "nextCursorMark":"AoIIRPoAAFBX" // Here is cursor mark for next "page"
}

To retrieve the next page, the next query will look like this:

?sort=price desc,id asc&start=0&cursorMark=AoIIRPoAAFBX ...

Notice the cursorMark taken from the previous response. As a result, you will get the next page of results (same structure as the first response, but with another nextCursorMark value). And so on ...
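The loop described above can be sketched in Java as follows. This is not SolrJ code: Page, PageFetcher and fakeFetcher are hypothetical stand-ins so that the example runs on its own, and the fake encodes the cursor as a plain index, whereas real Solr cursor marks are opaque strings. The stop condition relies on Solr returning a nextCursorMark equal to the cursorMark that was sent once there are no more results.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorPaging {

    /** One page of results plus the mark to send with the next request. */
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Hypothetical stand-in for whatever actually sends the request to Solr. */
    interface PageFetcher {
        Page fetch(String cursorMark, int rows);
    }

    /** Walks the whole result set, starting from cursorMark=*. */
    static List<String> fetchAll(PageFetcher fetcher, int rows) {
        List<String> all = new ArrayList<>();
        String cursorMark = "*";
        while (true) {
            Page page = fetcher.fetch(cursorMark, rows);
            all.addAll(page.ids);
            // An unchanged mark means the end of the results was reached.
            if (page.nextCursorMark.equals(cursorMark)) {
                break;
            }
            cursorMark = page.nextCursorMark;
        }
        return all;
    }

    /** In-memory fake: here the mark is simply the next index as text. */
    static PageFetcher fakeFetcher(List<String> sortedIds) {
        return (mark, rows) -> {
            int from = mark.equals("*") ? 0 : Integer.parseInt(mark);
            int to = Math.min(from + rows, sortedIds.size());
            String next = (from == to) ? mark : String.valueOf(to);
            return new Page(new ArrayList<>(sortedIds.subList(from, to)), next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c", "d", "e");
        System.out.println(fetchAll(fakeFetcher(ids), 2)); // prints [a, b, c, d, e]
    }
}
```

In a real client the PageFetcher would issue the HTTP request with the sort, rows and cursorMark parameters and read nextCursorMark from the response.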

This approach is an ideal fit for infinite-scroll pagination, but to use it within classic pagination there are some things to think about :).

Here are some reference materials I found while solving this problem; I hope they help someone get it done.