Java: How to get facet ranges in Solr results?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/33956/

Time: 2020-08-11 07:25:27  Source: igfitidea

How to get facet ranges in solr results?

java lucene solr

Asked by cnu

Assume that I have a field called price for the documents in Solr and I have that field faceted. I want to get the facets as ranges of values (e.g. 0-100, 100-500, 500-1000, etc.). How do I do that?

I can specify the ranges beforehand, but I also want to know whether it is possible to calculate the ranges (say for 5 values) automatically based on the values in the documents?

Answer by erickson

There may well be a better Solr-specific answer, but I work with straight Lucene, and since you're not getting much traction I'll take a stab. There, I'd create and populate a Filter with a FilteredQuery wrapping the original Query. Then I'd get a FieldCache for the field of interest. Enumerate the hits in the filter's bitset, and for each hit, get the value of the field from the field cache and add it to a SortedSet. When you've got all of the hits, divide the size of the set by the number of ranges you want (five to seven is a good number according to the user interface guys), and rather than a single-valued constraint, each facet will be a range query with the lower and upper bounds of one of those subsets.

I'd recommend using some special-case logic for a small number of values; obviously, if you only have four distinct values, it doesn't make sense to try and make 5 range refinements out of them. Below a certain threshold (say 3*your ideal number of ranges), you just show the facets normally rather than ranges.
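
The splitting step described above can be sketched in plain Java, independent of Lucene. This is only a minimal illustration: the class name RangeBuckets, the method buildRanges, and the use of Double values are assumptions made for the example, and collecting the values from the field cache is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene or Solr.
final class RangeBuckets {
    /**
     * Splits the distinct values into nRanges contiguous groups of roughly
     * equal size and returns {lower, upper} bounds for each group.
     * Returns an empty list when there are fewer than 3 * nRanges distinct
     * values, which signals "show plain facets instead of ranges".
     */
    static List<double[]> buildRanges(SortedSet<Double> values, int nRanges) {
        List<double[]> ranges = new ArrayList<>();
        if (nRanges < 1 || values.size() < 3 * nRanges) {
            return ranges;
        }
        List<Double> sorted = new ArrayList<>(values); // iteration order is ascending
        int n = sorted.size();
        for (int i = 0; i < nRanges; i++) {
            int from = (int) ((long) i * n / nRanges);         // first index of group i
            int to = (int) ((long) (i + 1) * n / nRanges) - 1; // last index of group i
            ranges.add(new double[] { sorted.get(from), sorted.get(to) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        SortedSet<Double> prices = new TreeSet<>();
        for (int p = 1; p <= 30; p++) {
            prices.add((double) p);
        }
        for (double[] r : buildRanges(prices, 5)) {
            System.out.println(r[0] + " TO " + r[1]);
        }
    }
}
```

Each returned pair could then back one range query. Note that this splits by distinct values, as the answer describes, not by document counts.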

Answer by Mauricio Scheffer

To answer your first question, you can get facet ranges by using the generic facet query support. Here's an example:

http://localhost:8983/solr/select?q=video&rows=0&facet=true&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
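
If the boundaries are known in advance, a URL like the one above can be assembled programmatically. The sketch below only builds the string, using the standard library; the class and method names (FacetQueryUrl, facetQueries, buildUrl) are made up for the example, and it assumes Java 10 or later for the URLEncoder overload that takes a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds the request URL; it does not call Solr.
final class FacetQueryUrl {
    /**
     * Builds one facet.query parameter per pair of consecutive bounds.
     * Use "*" for an open-ended bound. Square brackets are inclusive, so a
     * document priced exactly at a shared bound is counted in both ranges.
     */
    static List<String> facetQueries(String field, String... bounds) {
        List<String> params = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            String range = field + ":[" + bounds[i] + " TO " + bounds[i + 1] + "]";
            params.add("facet.query=" + URLEncoder.encode(range, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Builds the full select URL with rows=0 so only facet counts come back. */
    static String buildUrl(String base, String q, String field, String... bounds) {
        return base + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&rows=0&facet=true&" + String.join("&", facetQueries(field, bounds));
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/select",
                "video", "price", "*", "500", "*"));
    }
}
```

The encoder escapes the colon and brackets (for example `[` becomes `%5B`), which is equivalent to the hand-written URL above.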

As for your second question (automatically suggesting facet ranges), that's not yet implemented. Some argue that this kind of querying would be best implemented in your application rather than letting Solr "guess" the best facet ranges.

Here are some discussions on the topic:

Answer by Graham

I have worked out how to calculate sensible dynamic facets for product price ranges. The solution involves some pre-processing of documents and some post-processing of the query results, but it requires only one query to Solr, and should even work on old versions of Solr like 1.4.

Round up prices before submission

First, before submitting the document, round up the price to the nearest "nice round facet boundary" and store it in a "rounded_price" field. Users like their facets to look like "250-500" not "247-483", and rounding also means you get back hundreds of price facets, not millions of them. With some effort the following code can be generalised to round nicely at any price scale:

    public static decimal RoundPrice(decimal price)
    {
        if (price < 25)
            return Math.Ceiling(price);
        else if (price < 100)
            return Math.Ceiling(price / 5) * 5;
        else if (price < 250)
            return Math.Ceiling(price / 10) * 10;
        else if (price < 1000)
            return Math.Ceiling(price / 25) * 25;
        else if (price < 2500)
            return Math.Ceiling(price / 100) * 100;
        else if (price < 10000)
            return Math.Ceiling(price / 250) * 250;
        else if (price < 25000)
            return Math.Ceiling(price / 1000) * 1000;
        else if (price < 100000)
            return Math.Ceiling(price / 2500) * 2500;
        else
            return Math.Ceiling(price / 5000) * 5000;
    }

Permissible prices go 1,2,3,...,24,25,30,35,...,95,100,110,...,240,250,275,300,325,...,975,1000 and so forth.
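
For readers indexing from Java, the same rounding could be written as a table lookup. This is a rough port rather than the author's code: the class name PriceRounding and the use of BigDecimal are assumptions, while the limits and steps are copied from the thresholds above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical Java port of the RoundPrice thresholds shown above.
final class PriceRounding {
    // Exclusive upper limits and the rounding step used below each limit.
    private static final long[] LIMITS = {25, 100, 250, 1000, 2500, 10000, 25000, 100000};
    private static final long[] STEPS = {1, 5, 10, 25, 100, 250, 1000, 2500};
    private static final long LAST_STEP = 5000; // used at or above the last limit

    /** Rounds the price up to the next multiple of the step for its price band. */
    static BigDecimal roundPrice(BigDecimal price) {
        long step = LAST_STEP;
        for (int i = 0; i < LIMITS.length; i++) {
            if (price.compareTo(BigDecimal.valueOf(LIMITS[i])) < 0) {
                step = STEPS[i];
                break;
            }
        }
        BigDecimal s = BigDecimal.valueOf(step);
        // Ceiling division to a whole number of steps, then scale back up.
        return price.divide(s, 0, RoundingMode.CEILING).multiply(s);
    }

    public static void main(String[] args) {
        System.out.println(roundPrice(new BigDecimal("247"))); // 250
        System.out.println(roundPrice(new BigDecimal("483"))); // 500
    }
}
```

Keeping the bands in two arrays makes it easier to extend the table to other price scales without adding more branches.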

Get all facets on rounded prices

Second, when submitting the query, request all facets on rounded prices sorted by price: facet.field=rounded_price. Thanks to the rounding, you'll get at most a few hundred facets back.
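
To actually receive every rounded price, you would probably also set facet.limit=-1 (no cap on the number of facet values) and perhaps facet.sort=index. Whether index order matches numeric order depends on the field type, so the sketch below assumes nothing about that and simply sorts the returned value/count pairs numerically on the client. The class FacetSort and the map-based input are assumptions for illustration; parsing the Solr response is not shown.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders facet values by their numeric price.
final class FacetSort {
    /** Returns the facet counts in ascending numeric order of the price key. */
    static Map<String, Integer> sortByPrice(Map<String, Integer> facetCounts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(facetCounts.entrySet());
        entries.sort(Comparator.comparing(
                (Map.Entry<String, Integer> e) -> new BigDecimal(e.getKey())));
        Map<String, Integer> sorted = new LinkedHashMap<>(); // keeps insertion order
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```

Sorting as numbers matters because a plain string sort would put "1000" before "25".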

Combine adjacent facets into larger facets

Third, after you have the results, the user wants to see only 3 to 7 facets, not hundreds of facets. So, combine adjacent facets into a few large facets (called "segments"), trying to get a roughly equal number of documents in each segment. The following rather more complicated code does this, returning tuples of (start, end, count) suitable for performing range queries. The counts returned will be correct provided prices have been rounded up to the nearest boundary:

    public static List<Tuple<string, string, int>> CombinePriceFacets(int nSegments, ICollection<KeyValuePair<string, int>> prices)
    {
        var ranges = new List<Tuple<string, string, int>>();
        int productCount = prices.Sum(p => p.Value);
        int productsRemaining = productCount;
        if (nSegments < 2)
            return ranges;
        int segmentSize = productCount / nSegments;
        string start = "*";
        string end = "0";
        int count = 0;
        int totalCount = 0;
        int segmentIdx = 1;
        foreach (KeyValuePair<string, int> price in prices)
        {
            end = price.Key;
            count += price.Value;
            totalCount += price.Value;
            productsRemaining -= price.Value;
            if (totalCount >= segmentSize * segmentIdx)
            {
                ranges.Add(new Tuple<string, string, int>(start, end, count));
                start = end;
                count = 0;
                segmentIdx += 1;
            }
            if (segmentIdx == nSegments)
            {
                ranges.Add(new Tuple<string, string, int>(start, "*", count + productsRemaining));
                break;
            }
        }
        return ranges;
    }
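
To see what the combining step produces, here is a line-by-line Java port of the function above together with a small worked input. The class names SegmentCombiner and Segment are invented for the port; the logic is meant to mirror the C# version, including its behaviour of returning nothing for fewer than two segments.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical Java port of CombinePriceFacets; input must be sorted by price.
final class SegmentCombiner {
    static final class Segment {
        final String start;
        final String end;
        final int count;
        Segment(String start, String end, int count) {
            this.start = start;
            this.end = end;
            this.count = count;
        }
        @Override
        public String toString() {
            return "(" + start + ", " + end + ", " + count + ")";
        }
    }

    static List<Segment> combine(int nSegments, List<Map.Entry<String, Integer>> prices) {
        List<Segment> ranges = new ArrayList<>();
        if (nSegments < 2) {
            return ranges;
        }
        int productCount = 0;
        for (Map.Entry<String, Integer> p : prices) {
            productCount += p.getValue();
        }
        int productsRemaining = productCount;
        int segmentSize = productCount / nSegments; // target documents per segment
        String start = "*";
        int count = 0;
        int totalCount = 0;
        int segmentIdx = 1;
        for (Map.Entry<String, Integer> p : prices) {
            String end = p.getKey();
            count += p.getValue();
            totalCount += p.getValue();
            productsRemaining -= p.getValue();
            if (totalCount >= segmentSize * segmentIdx) {
                // Enough documents accumulated: close the current segment here.
                ranges.add(new Segment(start, end, count));
                start = end;
                count = 0;
                segmentIdx += 1;
            }
            if (segmentIdx == nSegments) {
                // The last segment is open-ended and takes everything left.
                ranges.add(new Segment(start, "*", count + productsRemaining));
                break;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> prices = new ArrayList<>();
        for (String key : new String[] {"10", "20", "30", "40", "50", "60"}) {
            prices.add(new AbstractMap.SimpleEntry<>(key, 4));
        }
        System.out.println(combine(3, prices)); // [(*, 20, 8), (20, 40, 8), (40, *, 8)]
    }
}
```

With six rounded prices of four products each and three requested segments, the target size is 8, so the segments close at 20 and 40 and the remainder goes into the open-ended last segment.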

Filter results by selected facet

Fourth, suppose ("250","500",38) was one of the resulting segments. If the user selects "$250 to $500" as a filter, simply do a filter query fq=price:[250 TO 500]
