Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/30210547/

Grouping by object value, counting and then setting group key by maximum object attribute

Tags: java, java-8, grouping, java-stream

Asked by Jernej Jerin

I have managed to write a solution using Java 8 Streams API that first groups a list of object Route by its value and then counts the number of objects in each group. It returns a mapping Route -> Long. Here is the code:

Map<Route, Long> routesCounted = routes.stream()
                .collect(Collectors.groupingBy(gr -> gr, Collectors.counting()));

And the Route class:

public class Route implements Comparable<Route> {
    private long lastUpdated;
    private Cell startCell;
    private Cell endCell;
    private int dropOffSize;

    public Route(Cell startCell, Cell endCell, long lastUpdated) {
        this.startCell = startCell;
        this.endCell = endCell;
        this.lastUpdated = lastUpdated;
    }

    public long getLastUpdated() {
        return this.lastUpdated;
    }

    public void setLastUpdated(long lastUpdated) {
        this.lastUpdated = lastUpdated;
    }

    public Cell getStartCell() {
        return startCell;
    }

    public void setStartCell(Cell startCell) {
        this.startCell = startCell;
    }

    public Cell getEndCell() {
        return endCell;
    }

    public void setEndCell(Cell endCell) {
        this.endCell = endCell;
    }

    public int getDropOffSize() {
        return this.dropOffSize;
    }

    public void setDropOffSize(int dropOffSize) {
        this.dropOffSize = dropOffSize;
    }

    /**
     * Compute hash code by using Apache Commons Lang HashCodeBuilder.
     */
    @Override
    public int hashCode() {
        return new HashCodeBuilder(43, 59)
                .append(this.startCell)
                .append(this.endCell)
                .toHashCode();
    }

    /**
     * Compute equals by using Apache Commons Lang EqualsBuilder.
     */
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Route))
            return false;
        if (obj == this)
            return true;

        Route route = (Route) obj;
        return new EqualsBuilder()
                .append(this.startCell, route.startCell)
                .append(this.endCell, route.endCell)
                .isEquals();
    }

    @Override
    public int compareTo(Route route) {
        if (this.dropOffSize < route.dropOffSize)
            return -1;
        else if (this.dropOffSize > route.dropOffSize)
            return 1;
        else {
            // if contains drop off timestamps, order by last timestamp in drop off
            // the highest timestamp has precedence
            if (this.lastUpdated < route.lastUpdated)
                return -1;
            else if (this.lastUpdated > route.lastUpdated)
                return 1;
            else
                return 0;
        }
    }
}

What I would additionally like to achieve is for the key of each group to be the Route with the largest lastUpdated value. I have already looked at this solution, but I do not know how to combine counting and grouping by value with picking the Route that has the maximum lastUpdated value. Here is example data showing what I want to achieve:

EXAMPLE:

List<Route> routes = new ArrayList<>();
routes.add(new Route(new Cell(1, 2), new Cell(2, 1), 1200L));
routes.add(new Route(new Cell(3, 2), new Cell(2, 5), 1800L));
routes.add(new Route(new Cell(1, 2), new Cell(2, 1), 1700L));

SHOULD BE CONVERTED TO:

Map<Route, Long> routesCounted = new HashMap<>();
routesCounted.put(new Route(new Cell(1, 2), new Cell(2, 1), 1700L), 2L); // long literal: an int does not box to Long
routesCounted.put(new Route(new Cell(3, 2), new Cell(2, 5), 1800L), 1L);

Notice that the key of the mapping that counted 2 Routes is the one with the largest lastUpdated value.

Accepted answer by Misha

Here's one approach. First group into lists and then process the lists into the values you actually want:

import static java.util.Comparator.comparingLong;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

Map<Route,Integer> routeCounts = routes.stream()
        .collect(groupingBy(x -> x))
        .values().stream()
        .collect(toMap(
            lst -> lst.stream().max(comparingLong(Route::getLastUpdated)).get(),
            List::size
        ));
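
For anyone who wants to try this approach without the Cell class or Apache Commons on the classpath, here is a minimal self-contained sketch. The nested Route class is a hypothetical, trimmed-down stand-in for the one in the question (four ints instead of two Cell objects), and the names TwoPassGrouping and countByNewest are made up for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class TwoPassGrouping {

    // Hypothetical stand-in for the question's Route: four ints replace the two
    // Cell objects, and equality ignores lastUpdated, as in the original class.
    static final class Route {
        final int startX, startY, endX, endY;
        final long lastUpdated;

        Route(int startX, int startY, int endX, int endY, long lastUpdated) {
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
            this.lastUpdated = lastUpdated;
        }

        long getLastUpdated() {
            return lastUpdated;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Route)) return false;
            Route r = (Route) o;
            return startX == r.startX && startY == r.startY
                    && endX == r.endX && endY == r.endY;
        }

        @Override
        public int hashCode() {
            return Objects.hash(startX, startY, endX, endY);
        }
    }

    // Pass 1 groups equal routes into lists; pass 2 re-keys every list by its
    // most recently updated member and stores the list size as the count.
    static Map<Route, Integer> countByNewest(List<Route> routes) {
        return routes.stream()
                .collect(Collectors.groupingBy(r -> r))
                .values().stream()
                .collect(Collectors.toMap(
                        group -> group.stream()
                                .max(Comparator.comparingLong(Route::getLastUpdated))
                                .get(), // safe: groupingBy never produces an empty list
                        List::size));
    }

    public static void main(String[] args) {
        List<Route> routes = Arrays.asList(
                new Route(1, 2, 2, 1, 1200L),
                new Route(3, 2, 2, 5, 1800L),
                new Route(1, 2, 2, 1, 1700L));
        // Prints one "lastUpdated -> count" line per group; HashMap order is unspecified.
        countByNewest(routes).forEach((route, count) ->
                System.out.println(route.lastUpdated + " -> " + count));
    }
}
```

With the sample data from the question, the two groups come out keyed by the routes stamped 1700 and 1800, with counts 2 and 1.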

Answer by Tagir Valeev

You can define an abstract "library" method which combines two collectors into one:

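import java.util.EnumSet;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

@SuppressWarnings("unchecked") // the Object[] accumulator forces unchecked casts to A1/A2
static <T, A1, A2, R1, R2, R> Collector<T, ?, R> pairing(Collector<T, A1, R1> c1,
        Collector<T, A2, R2> c2, BiFunction<R1, R2, R> finisher) {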
    EnumSet<Characteristics> c = EnumSet.noneOf(Characteristics.class);
    c.addAll(c1.characteristics());
    c.retainAll(c2.characteristics());
    c.remove(Characteristics.IDENTITY_FINISH);
    return Collector.of(() -> new Object[] {c1.supplier().get(), c2.supplier().get()},
            (acc, v) -> {
                c1.accumulator().accept((A1)acc[0], v);
                c2.accumulator().accept((A2)acc[1], v);
            },
            (acc1, acc2) -> {
                acc1[0] = c1.combiner().apply((A1)acc1[0], (A1)acc2[0]);
                acc1[1] = c2.combiner().apply((A2)acc1[1], (A2)acc2[1]);
                return acc1;
            },
            acc -> {
                R1 r1 = c1.finisher().apply((A1)acc[0]);
                R2 r2 = c2.finisher().apply((A2)acc[1]);
                return finisher.apply(r1, r2);
            }, c.toArray(new Characteristics[c.size()]));
}

After that the actual operation may look like this:

Map<Route, Long> result = routes.stream()
        .collect(Collectors.groupingBy(Function.identity(),
            pairing(Collectors.maxBy(Comparator.comparingLong(Route::getLastUpdated)), 
                    Collectors.counting(), 
                    (route, count) -> new AbstractMap.SimpleEntry<>(route.get(), count))
            ))
        .values().stream().collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));

Update: such a collector is available in my StreamEx library: MoreCollectors.pairing(). A similar collector is also implemented in the jOOL library, so you can use Tuple.collectors instead of pairing.
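
As a side note that is not part of the original answer: on Java 12 or later, the JDK's own Collectors.teeing combines two collectors in much the same way as the hand-written pairing method. The sketch below assumes such a JDK; the nested Route class is a hypothetical simplified stand-in for the question's class, and TeeingGrouping and countByNewest are made-up names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class TeeingGrouping {

    // Hypothetical stand-in for the question's Route (ints instead of Cells).
    static final class Route {
        final int startX, startY, endX, endY;
        final long lastUpdated;

        Route(int startX, int startY, int endX, int endY, long lastUpdated) {
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
            this.lastUpdated = lastUpdated;
        }

        long getLastUpdated() {
            return lastUpdated;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Route)) return false;
            Route r = (Route) o;
            return startX == r.startX && startY == r.startY
                    && endX == r.endX && endY == r.endY;
        }

        @Override
        public int hashCode() {
            return Objects.hash(startX, startY, endX, endY);
        }
    }

    static Map<Route, Long> countByNewest(List<Route> routes) {
        // Requires Java 12+: teeing feeds every element to both collectors and
        // merges their two results into a single (newest route, count) entry.
        Collector<Route, ?, Map.Entry<Route, Long>> newestAndCount = Collectors.teeing(
                Collectors.maxBy(Comparator.comparingLong(Route::getLastUpdated)),
                Collectors.counting(),
                (newest, count) -> new SimpleEntry<>(newest.get(), count));

        return routes.stream()
                .collect(Collectors.groupingBy(r -> r, newestAndCount))
                .values().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

Declaring the downstream collector as a typed local variable is a choice made for this sketch; it keeps the generic type inference simple and gives the combined collector a readable name.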

Answer by Stuart Marks

In principle it seems like this ought to be doable in one pass. The usual wrinkle is that this requires an ad-hoc tuple or pair, in this case with a Route and a count. Since Java lacks these, we end up using an Object array of length 2 (as shown in Tagir Valeev's answer), or AbstractMap.SimpleImmutableEntry, or a hypothetical Pair<A,B> class.

The alternative is to write a little value class that holds a Route and a count. Of course there's some pain in doing this, but in this case I think it pays off because it provides a place to put the combining logic. That in turn simplifies the stream operation.

Here's the value class containing a Route and a count:

class RouteCount {
    final Route route;
    final long count;

    private RouteCount(Route r, long c) {
        this.route = r;
        count = c;
    }

    public static RouteCount fromRoute(Route r) {
        return new RouteCount(r, 1L);
    }

    public static RouteCount combine(RouteCount rc1, RouteCount rc2) {
        Route recent;
        if (rc1.route.getLastUpdated() > rc2.route.getLastUpdated()) {
            recent = rc1.route;
        } else {
            recent = rc2.route;
        }
        return new RouteCount(recent, rc1.count + rc2.count);
    }
}

Pretty straightforward, but notice the combine method. It combines two RouteCount values by choosing the Route that's been updated more recently and using the sum of the counts. Now that we have this value class, we can write a one-pass stream to get the result we want:

    Map<Route, RouteCount> counted = routes.stream()
        .collect(groupingBy(route -> route,
                    collectingAndThen(
                        mapping(RouteCount::fromRoute, reducing(RouteCount::combine)),
                        Optional::get)));

Like other answers, this groups the routes into equivalence classes based on the starting and ending cell. The actual Route instance used as the key isn't significant; it's just a representative of its class. The value will be a single RouteCount that contains the Route instance that has been updated most recently, along with the count of equivalent Route instances.

The way this works is that each Route instance that has the same start and end cells is fed into the downstream collector of groupingBy. This mapping collector maps the Route instance into a RouteCount instance, then passes it to a reducing collector that reduces the instances using the combining logic described above. The and-then portion of collectingAndThen extracts the value from the Optional<RouteCount> that the reducing collector produces.

(Normally a bare get is dangerous, but we don't get to this collector at all unless there's at least one value available. So get is safe in this case.)
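
The same per-key reduction can also be sketched with the three-argument Collectors.toMap, whose merge function plays the role of RouteCount::combine and makes the Optional unnecessary. This is a swapped-in formulation, not the code from the answer. The nested Route class is a hypothetical simplified stand-in, and OnePassGrouping, countInOnePass and keyedByNewest are made-up names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class OnePassGrouping {

    // Hypothetical stand-in for the question's Route (ints instead of Cells).
    static final class Route {
        final int startX, startY, endX, endY;
        final long lastUpdated;

        Route(int startX, int startY, int endX, int endY, long lastUpdated) {
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
            this.lastUpdated = lastUpdated;
        }

        long getLastUpdated() {
            return lastUpdated;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Route)) return false;
            Route r = (Route) o;
            return startX == r.startX && startY == r.startY
                    && endX == r.endX && endY == r.endY;
        }

        @Override
        public int hashCode() {
            return Objects.hash(startX, startY, endX, endY);
        }
    }

    // Same idea as the RouteCount value class above: a route plus a running count.
    static final class RouteCount {
        final Route route;
        final long count;

        private RouteCount(Route route, long count) {
            this.route = route;
            this.count = count;
        }

        static RouteCount fromRoute(Route r) {
            return new RouteCount(r, 1L);
        }

        // Keeps the more recently updated route and adds the two counts.
        static RouteCount combine(RouteCount a, RouteCount b) {
            Route recent = a.route.getLastUpdated() > b.route.getLastUpdated()
                    ? a.route : b.route;
            return new RouteCount(recent, a.count + b.count);
        }
    }

    // One pass: toMap merges the RouteCount values of equal routes as they arrive.
    static Map<Route, RouteCount> countInOnePass(List<Route> routes) {
        return routes.stream()
                .collect(Collectors.toMap(r -> r, RouteCount::fromRoute, RouteCount::combine));
    }

    // Optional extra step: re-key each group by its most recently updated route.
    static Map<Route, Long> keyedByNewest(Map<Route, RouteCount> counted) {
        return counted.values().stream()
                .collect(Collectors.toMap(rc -> rc.route, rc -> rc.count));
    }
}
```

The keyedByNewest step is only needed if the map key itself must be the newest Route, as the question asks; otherwise the RouteCount value already carries it.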

Answer by Mati

I changed equals and hashCode to depend only on the start cell and end cell. The Cell class (with fields a and b) implements them like this:

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Cell cell = (Cell) o;

        if (a != cell.a) return false;
        if (b != cell.b) return false;

        return true;
    }

    @Override
    public int hashCode() {
        int result = a;
        result = 31 * result + b;
        return result;
    }

My solution looks like this:

Map<Route, Long> routesCounted = routes.stream()
            .sorted((r1,r2)-> (int)(r2.lastUpdated - r1.lastUpdated))
            .collect(Collectors.groupingBy(gr -> gr, Collectors.counting()));

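Of course, the cast to int should be replaced with something more appropriate, such as Long.compare(r2.getLastUpdated(), r1.getLastUpdated()), because the subtraction can overflow the int range.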
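
One way to avoid the cast is a comparator built with Comparator.comparingLong, sketched below. A difference of 4294967296L (2^32) between two timestamps, for example, casts to 0 and would make the routes look equally recent. The sketch relies on the map keeping the first key it sees for each group, which is how HashMap behaves for a sequential stream but is not something groupingBy spells out, so treat it as an assumption. The nested Route class is a hypothetical simplified stand-in, and SortedGrouping and countByNewest are made-up names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class SortedGrouping {

    // Hypothetical stand-in for the question's Route (ints instead of Cells).
    static final class Route {
        final int startX, startY, endX, endY;
        final long lastUpdated;

        Route(int startX, int startY, int endX, int endY, long lastUpdated) {
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
            this.lastUpdated = lastUpdated;
        }

        long getLastUpdated() {
            return lastUpdated;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Route)) return false;
            Route r = (Route) o;
            return startX == r.startX && startY == r.startY
                    && endX == r.endX && endY == r.endY;
        }

        @Override
        public int hashCode() {
            return Objects.hash(startX, startY, endX, endY);
        }
    }

    static Map<Route, Long> countByNewest(List<Route> routes) {
        return routes.stream()
                // Newest first, compared as longs, so no narrowing cast is involved.
                .sorted(Comparator.comparingLong(Route::getLastUpdated).reversed())
                // Assumption: the first route seen for a group stays as that group's key.
                .collect(Collectors.groupingBy(r -> r, Collectors.counting()));
    }
}
```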
