java:List.contains() 与手动搜索的性能差异

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/20448540/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 02:01:05  来源:igfitidea点击:

java: List.contains() performance difference with manual searching

javalistcontains

提问by Ismail Sahin

I tried to demonstrate the difference between List.contains()and manually searching execution times the result is awesome. Here is the code,

我试图演示List.contains()和手动搜索执行时间之间的区别,结果很棒。这是代码,

public static void main(String argv[]) {
    List<String> list = new ArrayList<String>();
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("b");

    long startTime = System.nanoTime();

    list.contains("b");

    long endTime = System.nanoTime();
    long duration = endTime - startTime;

    System.out.println("First run: "+duration);

    startTime = System.nanoTime();
    for(String s: list){
        if(s.equals("b"))
            break;
    }
    endTime = System.nanoTime();

    duration = endTime - startTime;
    System.out.println("Second run: "+duration);

}

Output:

输出:

  • First run: 7500
  • Second run: 158685

    1. how contains() function makes such a big difference?

    2. which search algorithm does it use?

    3. if list conatins the searched element, does it terminate searching at first element?

  • 首次运行:7500
  • 第二轮:158685

    1. contains() 函数如何产生如此大的差异?

    2. 它使用哪种搜索算法?

    3. 如果列表包含搜索到的元素,它是否会在第一个元素处终止搜索?

采纳答案by torquestomp

First off, it is not wise to trust results coming from a singular test like that. There are too many variable factors, caching implications to consider, and other such things - you should rather consider writing a test that uses randomization over trials to some extent, and performs the different checks millions of times, not just once.

首先,相信来自这样的单一测试的结果是不明智的。有太多可变因素、缓存影响需要考虑,以及其他类似的事情——你应该考虑编写一个测试,在某种程度上使用随机化试验,并执行数百万次不同的检查,而不仅仅是一次。

That said, I expect your results will remain the same; ArrayListimplements contains()using its own indexOf()method, which loops directly over the underlying array it stores. You can see this for yourself here

也就是说,我希望您的结果将保持不变; ArrayList器具contains()使用其自己的indexOf()方法,它直接循环底层数组其存储结束。你可以在这里亲眼看看

The foreach loop, on the other hand, requires instantiating an Iterator, accessing the array through all of its methods, and just in general doing a lot more work than ArrayList's own direct implementation does. Again, though, you should benchmark it more thoroughly!

另一方面,foreach 循环需要实例化 an Iterator,通过它的所有方法访问数组,并且通常比它ArrayList自己的直接实现做更多的工作。不过,您应该再次对其进行更彻底的基准测试!

回答by user987339

As you can see from the codecontainsneeds O(n)iterations. If you re-implement your forloop to:

代码可以看出contains需要O(n)次迭代。如果您将for循环重新实现为:

for(int i=0; i < list.size(); i++){
    if(list.get(i).equals("b"))
        break;
}

you'll see dramatic improvement in your search time. So you can put blame on List iterator for the time overhead. Iteratorinstantiation and calls of nextand hasNextmethods are adding some milliseconds.

您会看到搜索时间的显着改善。因此,您可以将时间开销归咎于 List 迭代器。和方法的Iterator实例化和调用增加了一些毫秒。nexthasNext

回答by meriton

Writing a correct microbenchmarkis hard. If you use a better benchmark, you'll likely see that the difference between the approaches is minor - at least, the following benchmark is far more robust, and shows a mere 10% difference in execution time between the two approaches:

编写正确的微基准测试很难。如果您使用更好的基准,您可能会发现两种方法之间的差异很小——至少,以下基准要健壮得多,并且显示两种方法之间的执行时间仅相差 10%:

public abstract class Benchmark {

    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    public static void main(String[] args) throws Exception {
        final List<String> list = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            list.add("a");
        }

        Benchmark[] marks = {
            new Benchmark("contains") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        if (list.contains("b")) {
                            return 1;
                        }
                    }
                    return 0;
                }
            },
            new Benchmark("loop") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        for (String s : list) {
                            if (s.equals("b")) {
                                return 1;
                            }
                        }
                    }
                    return 0;
                }
            }
        };

        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}

prints (on my dated notebook, on a Java 7 Oracle JVM in server mode):

打印(在我过时的笔记本上,在服务器模式下的 Java 7 Oracle JVM 上):

contains    10150.420 ns
loop        11363.640 ns

The slightly larger overhead of the loop is likely caused by the Iterator checking for concurrent modification and twice for the end of the list on every access, see the source code of java.util.ArrayList.Itr.next()for details.

循环稍大的开销可能是由迭代器检查并发修改和每次访问时对列表末尾进行两次检查造成的java.util.ArrayList.Itr.next(),详细信息请参见 的源代码。

Edit: With very short lists, the difference is more pronounced. For instance for a list of length 1:

编辑:对于非常短的列表,差异更加明显。例如对于长度为 1 的列表:

contains    15.316 ns
loop        69.401 ns

Still, nowhere near the 20:1 ratio your measurements indicate ...

尽管如此,您的测量结果表明的比例远不及 20:1...