Java 8 Streams: multiple filters vs. complex condition

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/24054773/

Date: 2020-08-14 09:56:00. Source: igfitidea

Tags: java, lambda, filter, java-8, java-stream

Asked by deamon

Sometimes you want to filter a Stream with more than one condition:

myList.stream().filter(x -> x.size() > 10).filter(x -> x.isCool()) ...

or you could do the same with a complex condition and a single filter:

myList.stream().filter(x -> x.size() > 10 && x.isCool()) ...

My guess is that the second approach has better performance characteristics, but I don't know it.

The first approach wins in readability, but what is better for the performance?

Accepted answer by Holger

The code that has to be executed for both alternatives is so similar that you can't predict a result reliably. The underlying object structure might differ, but that's no challenge to the HotSpot optimizer. So which alternative executes faster, if there is any difference at all, depends on other surrounding conditions.

Combining two filter instances creates more objects and hence more delegating code, but this can change if you use method references rather than lambda expressions, e.g. replace filter(x -> x.isCool()) by filter(ItemType::isCool). That way you have eliminated the synthetic delegating method created for your lambda expression. So combining two filters using two method references might create the same or less delegation code than a single filter invocation using a lambda expression with &&.
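To make the three variants discussed here concrete, the following is a minimal sketch. The Item class, its isLarge() helper, and the sample data are hypothetical stand-ins (the question never names the element type); the point is only that all three pipelines select the same elements, so the choice between them is about style and, at most, small overheads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterStyles {
    // Hypothetical element type; the question does not define one.
    static class Item {
        private final int size;
        private final boolean cool;

        Item(int size, boolean cool) {
            this.size = size;
            this.cool = cool;
        }

        int size() { return size; }
        boolean isCool() { return cool; }
        // Assumed helper so that the size check can be written as a method reference.
        boolean isLarge() { return size > 10; }
    }

    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item(5, true), new Item(20, true), new Item(30, false), new Item(11, true));
    }

    // Two filter stages, each with a lambda expression.
    static List<Item> twoLambdaFilters(List<Item> items) {
        return items.stream()
                .filter(x -> x.size() > 10)
                .filter(x -> x.isCool())
                .collect(Collectors.toList());
    }

    // Two filter stages, each with a method reference instead of a lambda.
    static List<Item> twoMethodRefFilters(List<Item> items) {
        return items.stream()
                .filter(Item::isLarge)
                .filter(Item::isCool)
                .collect(Collectors.toList());
    }

    // One filter stage with a combined && condition.
    static List<Item> oneCombinedFilter(List<Item> items) {
        return items.stream()
                .filter(x -> x.size() > 10 && x.isCool())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = sampleItems();
        System.out.println(twoLambdaFilters(items).size());
        System.out.println(twoMethodRefFilters(items).size());
        System.out.println(oneCombinedFilter(items).size());
    }
}
```

All three methods return the same elements in the same order, because filter stages preserve encounter order and && short-circuits exactly like a second filter stage would.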

But, as said, this kind of overhead will be eliminated by the HotSpot optimizer and is negligible.

In theory, two filters could be parallelized more easily than a single filter, but that's only relevant for rather computationally intensive tasks [1].

So there is no simple answer.

The bottom line is, don't think about such performance differences below the odor detection threshold. Use what is more readable.



[1] ...and would require an implementation doing parallel processing of subsequent stages, a road currently not taken by the standard Stream implementation.

Answer by Hank D

This test shows that your second option can perform significantly better. Findings first, then the code:

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=4142, min=29, average=41.420000, max=82}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=13315, min=117, average=133.150000, max=153}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10320, min=82, average=103.200000, max=127}

Now the code:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Enclosing class; the method references below (Temp::test1 etc.) rely on this name.
public class Temp {

    enum Gender {
        FEMALE,
        MALE
    }

    static class User {
        Gender gender;
        int age;

        public User(Gender gender, int age){
            this.gender = gender;
            this.age = age;
        }

        public Gender getGender() {
            return gender;
        }

        public void setGender(Gender gender) {
            this.gender = gender;
        }

        public int getAge() {
            return age;
        }

        public void setAge(int age) {
            this.age = age;
        }
    }

    // One filter with a combined && condition; returns elapsed milliseconds.
    static long test1(List<User> users){
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter((u) -> u.getGender() == Gender.FEMALE && u.getAge() % 2 == 0)
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    // Two chained filters, one condition each; returns elapsed milliseconds.
    static long test2(List<User> users){
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter(u -> u.getGender() == Gender.FEMALE)
                .filter(u -> u.getAge() % 2 == 0)
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    // One filter with two predicates combined via Predicate.and; returns elapsed milliseconds.
    static long test3(List<User> users){
        long time1 = System.currentTimeMillis();
        users.stream()
                .filter(((Predicate<User>) u -> u.getGender() == Gender.FEMALE).and(u -> u.getAge() % 2 == 0))
                .allMatch(u -> true);                   // least overhead terminal function I can think of
        long time2 = System.currentTimeMillis();
        return time2 - time1;
    }

    public static void main(String... args) {
        int size = 10000000;
        // Alternate MALE/FEMALE users with ages cycling through 0..99.
        List<User> users =
        IntStream.range(0,size)
                .mapToObj(i -> i % 2 == 0 ? new User(Gender.MALE, i % 100) : new User(Gender.FEMALE, i % 100))
                .collect(Collectors.toCollection(()->new ArrayList<>(size)));
        repeat("one filter with predicate of form u -> exp1 && exp2", users, Temp::test1, 100);
        repeat("two filters with predicates of form u -> exp1", users, Temp::test2, 100);
        repeat("one filter with predicate of form predOne.and(pred2)", users, Temp::test3, 100);
    }

    // Runs the given test the requested number of times and prints summary statistics of the timings.
    private static void repeat(String name, List<User> users, ToLongFunction<List<User>> test, int iterations) {
        System.out.println(name + ", list size " + users.size() + ", averaged over " + iterations + " runs: " + IntStream.range(0, iterations)
                .mapToLong(i -> test.applyAsLong(users))
                .summaryStatistics());
    }
}

Answer by Venkat Madhav

These are the results for the 6 different orderings of the sample tests shared by @Hank D. It's evident that the predicate of form u -> exp1 && exp2 performs best in all cases.

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=3372, min=31, average=33.720000, max=47}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9150, min=85, average=91.500000, max=118}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9046, min=81, average=90.460000, max=150}

one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8336, min=77, average=83.360000, max=189}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9094, min=84, average=90.940000, max=176}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10501, min=99, average=105.010000, max=136}

two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=11117, min=98, average=111.170000, max=238}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8346, min=77, average=83.460000, max=113}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9089, min=81, average=90.890000, max=137}

two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10434, min=98, average=104.340000, max=132}
one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9113, min=81, average=91.130000, max=179}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8258, min=77, average=82.580000, max=100}

one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=9131, min=81, average=91.310000, max=139}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10265, min=97, average=102.650000, max=131}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8442, min=77, average=84.420000, max=156}

one filter with predicate of form predOne.and(pred2), list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8553, min=81, average=85.530000, max=125}
one filter with predicate of form u -> exp1 && exp2, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=8219, min=77, average=82.190000, max=142}
two filters with predicates of form u -> exp1, list size 10000000, averaged over 100 runs: LongSummaryStatistics{count=100, sum=10305, min=97, average=103.050000, max=132}
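Hand-rolled timings like these tend to vary from run to run. One common way to make them a little steadier is to execute some untimed warm-up iterations first and to measure with System.nanoTime(). The following is only a sketch under those assumptions, not the harness used for the numbers above; the class name, the Integer data, and the two predicates are made up for illustration.

```java
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WarmupTiming {
    // One filter with a combined condition (illustrative predicates).
    static long oneFilter(List<Integer> xs) {
        return xs.stream().filter(x -> x % 2 == 0 && x % 3 == 0).count();
    }

    // Two chained filters with the same conditions.
    static long twoFilters(List<Integer> xs) {
        return xs.stream().filter(x -> x % 2 == 0).filter(x -> x % 3 == 0).count();
    }

    // Runs the task untimed `warmups` times, then returns the average nanoseconds over `runs` timed runs.
    static double averageNanos(ToLongFunction<List<Integer>> task, List<Integer> data, int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += task.applyAsLong(data);
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.applyAsLong(data);
            total += System.nanoTime() - start;
        }
        // Use the accumulated result so the work cannot be treated as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        System.out.println("one filter:  " + averageNanos(WarmupTiming::oneFilter, data, 20, 50) + " ns");
        System.out.println("two filters: " + averageNanos(WarmupTiming::twoFilters, data, 20, 50) + " ns");
    }
}
```

For numbers you intend to rely on, a dedicated benchmark harness is generally a safer choice than a loop like this.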

Answer by Serge

A complex filter condition is better from a performance perspective, but the best performance comes from an old-fashioned for loop with a standard if clause. For a small array of 10 elements the difference can be about 2 times; for a large array the difference is not that big.
You can take a look at my GitHub project, where I ran performance tests for multiple array iteration options.
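For reference, the two styles being compared look roughly like this. This is a minimal sketch with made-up conditions and data, not code from the benchmark project; it only shows that a plain for loop with an if clause computes the same result as a single filter with a complex condition.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LoopVsStream {
    // Stream version: one filter with a complex condition.
    static long countWithStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v > 10 && v % 2 == 0)
                .count();
    }

    // Old-fashioned version: for loop with a standard if clause.
    static long countWithLoop(List<Integer> values) {
        long count = 0;
        for (Integer v : values) {
            if (v > 10 && v % 2 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Values 0..99; the even numbers greater than 10 are 12, 14, ..., 98.
        List<Integer> values = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        System.out.println(countWithStream(values)); // prints 44
        System.out.println(countWithLoop(values));   // prints 44
    }
}
```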

For a small array of 10 elements, throughput in ops/s: [chart: 10 element array]
For a medium array of 10,000 elements, throughput in ops/s: [chart not preserved]
For a large array of 1,000,000 elements, throughput in ops/s: [chart: 1M elements]

NOTE: the tests ran on:

  • 8 CPU
  • 1 GB RAM
  • OS version: 16.04.1 LTS (Xenial Xerus)
  • java version: 1.8.0_121
  • jvm: -XX:+UseG1GC -server -Xmx1024m -Xms1024m

UPDATE: Java 11 shows some progress in performance, but the dynamics stay the same.

Benchmark mode: throughput, ops/time [chart: Java 8 vs 11]
