为什么循环数组比 JavaScript 的原生 `indexOf` 快得多?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6682951/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-23 22:47:59  来源:igfitidea点击:

Why is looping through an Array so much faster than JavaScript's native `indexOf`?

javascriptperformance

提问by Lime

Why is looping through an Array so much faster than JavaScript's native indexOf? Is there an error or something that I'm not accounting for? I expected native implementations would be faster.

为什么遍历 Array 比 JavaScript 的 native 快得多indexOf?是否有错误或我没有考虑的事情?我预计本机实现会更快。

                For Loop        While Loop      indexOf
Chrome 10.0     50,948,997      111,272,979     12,807,549
Firefox 3.6     9,308,421       62,184,430      2,089,243
Opera 11.10     11,756,258      49,118,462      2,335,347   

http://jsben.ch/#/xm2BV

http://jsben.ch/#/xm2BV

回答by Pierre Maoui

5 years from then, lot of changes happened in browsers. Now, indexOf performance has increased and is definitely better than any other custom alternative.

5 年后,浏览器发生了很多变化。现在, indexOf 性能有所提高,绝对优于任何其他自定义替代方案。

Version 49.0.2623.87 (64-bit)

版本 49.0.2623.87(64 位)

Chrome Version 49.0.2623.87 (64-bit)

Chrome 版本 49.0.2623.87(64 位)

回答by Job

Ok, looking at the other benchmarks here I am scratching my head at the way that most developers seem to do their benchmarking.

好吧,看看这里的其他基准测试,我对大多数开发人员似乎进行基准测试的方式感到头疼。

Apologies, but the way it is done leads to horribly wrong conclusions, so I have to go a bit meta and give a comment on the answers provided.

道歉,但它的完成方式导致了可怕的错误结论,所以我必须有点元并对提供的答案发表评论。

What is wrong with the other benchmarks here

这里的其他基准有什么问题

Measuring where to find element 777 in an array that never changes, always leading to index 117 seems so inappropriate for obvious reasons, that I have trouble explaining why. You can't reasonably extrapolate anything from such an overly specific benchmark! The only analogy I can come up with is performing anthropological research on oneperson, and then calling the findings a generalized overview of the entire culture of the country that this person lives in. The other benchmarks aren't much better.

测量在一个永远不会改变的数组中找到元素 777 的位置,总是导致索引 117 似乎很不合适,原因很明显,我很难解释为什么。你不能从这样一个过于具体的基准中合理地推断出任何东西!我能想出的唯一类比是对一个人进行人类学研究,然后将调查结果称为对该人所居住国家的整个文化的概括概述。其他基准也好不到哪里去。

Even worse: the accepted answer is an image without a link to the benchmark that was used, so we have no way to control if the code for that benchmark is correct (I hopeit is a screenshot to a jsperf link that was originally in the question and later edited out in favour of the new jsben.ch link). It's not even an explanation of the original question: whyone performs better than the other (a highly debatable statement to begin with).

更糟糕的是:接受的答案是一张没有链接到所用基准的图像,因此我们无法控制该基准的代码是否正确(我希望它是最初在 jsperf 链接中的屏幕截图)问题,后来编辑掉以支持新的 jsben.ch 链接)。它甚至不是对原始问题的解释:为什么一个比另一个表现更好(一开始是一个非常有争议的陈述)。

First, you should know that not all benchmarking sites are created equal- some can add significant errors to certain types of measurements due to their own framework interfering with the timing.

首先,您应该知道并非所有的基准测试站点都是平等的——由于它们自己的框架干扰了时间,有些站点可能会给某些类型的测量增加重大错误。

Now, we are supposedto be comparing the performance of different ways to do linear search on an array. Think about the algorithm itself for a second:

现在,我们应该比较在数组上进行线性搜索的不同方法的性能。考虑一下算法本身:

  • look at a value for a given index into an array.
  • compare the value to another value.
    • if equal, return the index
    • if it is not equal, move to the next index and compare the next value.
  • 查看数组中给定索引的值。
  • 将该值与另一个值进行比较。
    • 如果相等,返回索引
    • 如果不相等,则移动到下一个索引并比较下一个值。

That's the whole linear search algorithm, right?

这就是整个线性搜索算法,对吧?

So some of the linked benchmarks compare sorted and unsorted arrays (sometimes incorrectly labeled "random", despite being in the same order each iteration - relevant XKCD). It should be obvious that this does not affect the above algorithm in any way- the comparison operator does not see that all values increase monotonically.

所以一些链接的基准比较排序和未排序的数组(有时错误地标记为“随机”,尽管每次迭代的顺序相同 -相关 XKCD)。很明显,这不会以任何方式影响上述算法- 比较运算符不会看到所有值单调增加。

Yes, ordered vs unsorted arrays can matter, when comparing the performance of linear searchto binaryor interpolation searchalgorithms, but nobody here is doing that!

是的,在比较线性搜索二分插值搜索算法的性能时,有序数组与未排序数组很重要,但这里没有人这样做!

Furthermore, all benchmarks shown use a fixed length array, with a fixed index into it. All that tells you is how quickly indexOffinds that exact index for that exact length - as stated above, you cannot generalise anything from this.

此外,显示的所有基准测试都使用一个固定长度的数组,其中有一个固定的索引。所有告诉您的是indexOf找到该确切长度的确切索引的速度- 如上所述,您无法从中概括出任何内容。

Here is the result of more-or-less copying the benchmark linked in the question to perf.zone (which is more reliable than jsben.ch), but with the following modifications:

这是将问题中链接的基准或多或少复制到 perf.zone(比 jsben.ch 更可靠)的结果,但进行了以下修改:

  • we pick a random value of the array each run, meaning we assume each element is as likely to be picked as any other
  • we benchmark for 100 and for 1000 elements
  • we compare integers and short strings.
  • 我们每次运行都选择数组的一个随机值,这意味着我们假设每个元素与其他元素一样有可能被选中
  • 我们针对 100 和 1000 个元素进行基准测试
  • 我们比较整数和短字符串。

https://run.perf.zone/view/for-vs-while-vs-indexof-100-integers-1516292563568

https://run.perf.zone/view/for-vs-while-vs-indexof-100-integers-1516292563568

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-integers-1516292665740

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-integers-1516292665740

https://run.perf.zone/view/for-vs-while-vs-indexof-100-strings-1516297821385

https://run.perf.zone/view/for-vs-while-vs-indexof-100-strings-1516297821385

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-strings-1516293164213

https://run.perf.zone/view/for-vs-while-vs-indexof-1000-strings-1516293164213

Here are the results on my machine:

这是我机器上的结果:

https://imgur.com/a/fBWD9

https://imgur.com/a/fBWD9

As you can see, the result changes drasticallydepending on the benchmark and the browser being used, and each of the options wins in at least one of the scenarios: cached length vs uncached length, while loop vs for-loop vs indexOf.

如您所见,结果会因基准测试和所使用的浏览器而发生巨大变化,并且每个选项至少在以下一种情况下获胜:缓存长度 vs 未缓存长度,while 循环 vs for 循环 vs indexOf.

So there is no universal answer here, and this will surely change in the future as browsers and hardware changes as well.

所以这里没有统一的答案,而且随着浏览器和硬件的变化,这肯定会在未来发生变化。

Should you even be benchmarking this?

你甚至应该对此进行基准测试吗?

It should be noted that before you proceed to build benchmarks, you should determine whether or not the linear search part is a bottleneck to begin with! It probably isn't, and if it is, the better strategy is probably to use a different data structure for storing and retrieving your data anyway, and/or a different algorithm.

应该注意的是,在开始构建基准之前,您应该确定线性搜索部分是否是一个瓶颈!它可能不是,如果是,更好的策略可能是使用不同的数据结构来存储和检索您的数据,和/或不同的算法。

That is not to say that this question is irrelevant - it is rare, but it canhappen that linear search performance matters; I happen to have an example of that: establishing the speed of constructing/searching through a prefix trieconstructed through nested objects (using dictionary look-up) or nested arrays (requiring linear search).

那是不是说,这个问题是不相关的-它是罕见的,但它可以发生线性搜索性能事项; 我碰巧有一个例子:通过通过嵌套对象(使用字典查找)或嵌套数组(需要线性搜索)构造的前缀树来建立构造/搜索的速度。

As can be seen this github comment, the benchmarks involve various realistic and best/worst-case payloads on various browsers and platforms. Only after going through all that do I draw conclusions about expected performance. In my case, for most realistic situations the linear search through an array is faster than dictionary look-up, but worst-case performance is worse to the point of freezing the script (and easy to construct), so the implementation was marked as as an "unsafe" method to signal to others that they should think about the context the code would be used.

从这个 github 评论可以看出,基准测试涉及各种浏览器和平台上的各种现实和最佳/最坏情况的有效负载。只有在经历了所有这些之后,我才能得出关于预期性能的结论。就我而言,对于大多数实际情况,通过数组的线性搜索比字典查找要快,但最坏情况下的性能更差到冻结脚本(并且易于构建)的程度,因此实现被标记为一种“不安全”的方法,向其他人发出信号,告诉他们应该考虑使用代码的上下文。

Jon J's answeris also a good example of taking a step back to think about the real problem.

Jon J 的回答也是退一步思考真正问题的一个很好的例子。

What to do when you dohave to micro-benchmark

当您必须进行微基准测试时该怎么

So let's assume we know that we did our homework and established that we need to optimize our linear search.

所以让我们假设我们知道我们做了功课并确定我们需要优化我们的线性搜索。

What matters then is the eventual index at which we expect to find our element (if at all), the type of data being searched, and of course which browsers to support.

那么重要的是我们期望找到元素的最终索引(如果有的话),正在搜索的数据类型,当然还有要支持的浏览器。

In other words: is any index equally likely to be found (uniform distribution), or is it more likely to be centered around the middle (normal distribution)? Will be find our data at the start or near the end? Is our value guaranteed to be in the array, or only a certain percentage of the time? What percentage?

换句话说:找到任何指数的可能性是否相等(均匀分布),还是更有可能以中间为中心(正态分布)?会在开始还是接近结束时找到我们的数据?我们的值是保证在数组中,还是只在一定百分比的时间内?什么百分比?

Am I searching an array of strings? Objects Numbers? If they're numbers, are they floating point values or integers? Are we trying to optimize for older smartphones, up-to-date laptops, or school desktops stuck with IE10?

我在搜索字符串数组吗?对象编号?如果它们是数字,它们是浮点值还是整数?我们是否正在尝试针对旧智能手机、最新笔记本电脑或使用 IE10 的学校台式机进行优化?

This is another important thing: do not optimize for the best-case performance, optimize for realistic worst-case performance. If you are building a web-app where 10% of your customers use very old smart phones, optimize for that; their experience will be one that is unbearable with bad performance, while the micro-optimization is wasted on the newest generation flagship phones.

这是另一件重要的事情:不要针对最佳情况进行优化,而是针对实际的最坏情况进行优化。如果您正在构建一个 Web 应用程序,其中 10% 的客户使用非常旧的智能手机,请为此进行优化;他们的体验将是性能不佳的体验,而微优化则浪费在最新一代的旗舰手机上。

Ask yourself these questions about the data you are applying linear search to, and the context within which you do it. Then make test-cases fitting for these criteria, and test them on the browsers/hardware that represents the targets you are supporting.

问问自己这些关于您应用线性搜索的数据的问题,以及您在其中进行搜索的上下文。然后制作适合这些标准的测试用例,并在代表您支持的目标的浏览器/硬件上测试它们。

回答by Mark Kahn

Probably because the actual indexOf implementation is doing a lot more than just looping through the array. You can see the Firefox internal implementation of it here:

可能是因为实际的 indexOf 实现所做的不仅仅是遍历数组。你可以在这里看到它的 Firefox 内部实现:

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

There are several things that can slow down the loop that are there for sanity purposes:

出于理智的目的,有几件事可以减慢循环的速度:

  • The array is being re-cast to an Object
  • The fromIndexis being cast to a Number
  • They're using Math.maxinstead of a ternary
  • They're using Math.abs
  • 数组正在被重新转换为对象
  • fromIndex被转换为数值
  • 他们正在使用Math.max而不是三元
  • 他们正在使用 Math.abs

回答by shelman

indexOf does a bunch of type-checking and validation that the for loop and while loop ignore.

indexOf 做了一堆 for 循环和 while 循环忽略的类型检查和验证。

Here's the indexOf algorithm:

这是 indexOf 算法:

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/indexOf

Edit: My guess is indexOf is faster for big arrays because it caches the length of the array before looping through it.

编辑:我的猜测是 indexOf 对于大数组更快,因为它在循环之前缓存数组的长度。

回答by Tyler Biscoe

Run the test one more time with the edits I've made.

使用我所做的编辑再运行一次测试。

I've increased the size of the array, and made the index you're searching for larger as well. It seems in large arrays indexOf may be a faster choice.

我增加了数组的大小,并使您正在搜索的索引也更大。似乎在大型数组中 indexOf 可能是一个更快的选择。

http://jsben.ch/#/xm2BV

http://jsben.ch/#/xm2BV

EDIT: Based on more tests, indexOf seems to run faster than a for loop in the version of Safari I'm using (5.0.3) and slower in just about everything else.

编辑:基于更多的测试,在我使用的 Safari 版本 (5.0.3) 中,indexOf 似乎比 for 循环运行得更快,而在其他所有方面都比 for 循环慢。

回答by Jon J

It might be worth noting that if all you are trying to do is keep a list of items and check for existence (e.g. avoid adding duplicate IDs to an array), it would be far faster to keep an OBJECT with keys that reflect each ID. If you think I'm wrong, compare the following with an array + indexOf. We are talking 181ms for the object method vs. 1 MINUTE for the array indexOf method.

可能值得注意的是,如果您尝试做的只是保留项目列表并检查是否存在(例如,避免将重复的 ID 添加到数组中),那么保留带有反映每个 ID 的键的 OBJECT 会快得多。如果您认为我错了,请将以下内容与数组 + indexOf 进行比较。我们说对象方法为 181 毫秒,而数组 indexOf 方法为 1 分钟。

var objs = []
var i_uid = {} // method 1
var a_uid = [] // method 2
var total_count = 100000, idLen = 5
var ts, te, cObj = 0

// method 1
ts = new Date()
while (cObj < total_count) {
    var u = uid(idLen),
        o = {
            uid: u,
            text: 'something',
            created: new Date()
        }
    if (!i_uid[u]) { // ensure unique uids only
        objs.push(o)
        i_uid[u] = cObj // current array position as placeholder
        cObj++
    }
    else {
        console.log('unique violation [duplicate uid', u, ']')
    }
}
te = new Date()
console.log('loaded ' + total_count + ' with object method in', (te - ts), 'ms')

i_uid = {} // free-up
cObj = 0 // reset
objs = [] // reset

// method 2
ts = new Date()
while (cObj < total_count) {
    var u = uid(idLen),
        o = {
            uid: u,
            text: 'something',
            created: new Date()
        }
    if (a_uid.indexOf(u) == -1) { // ensure unique uids only
        objs.push(o)
        a_uid.push(u)
        cObj++
    }
    else {
        console.log('unique violation [duplicate uid', u, ']')
    }
}
te = new Date()
console.log('loaded ' + total_count + ' with array + indexOf method in', (te - ts), 'ms')

function uid(l) {
    var t = '',
        p = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
        pl = p.length
    for (var i = 0; i < l; i++)
        t += p.charAt(Math.floor(Math.random() * pl))
    return t
}

回答by efkan

We can trust to a plain for loop at every single time.

我们每次都可以信任一个普通的 for 循环。

I wrote the code below to discover the answer of this question. It is also runnable. It shows that plain for loopis the best solution when considering performance.

我写了下面的代码来发现这个问题的答案。它也是可运行的。这表明for loop在考虑性能时,plain是最好的解决方案。

(The code also can be found on jsfiddle)

(代码也可以在jsfiddle找到

console.clear()

let a = []
// populating array data
for (let i = 0; i < 100000; i++) {
 a.push(i)
}

let testNum = 90000
let found
let totalMS4ForLoop = 0
let totalMS4IndexOf = 0
let start
let end
// simulating 10000 requests which are come consecutively
for (o = 0; o < 10000; o++) {

  start = Date.now()
  for (let i = 0; i < a.length; i++) {
    if (a[i] == testNum) { found = a[i]; break }
  }
  end = Date.now()
  totalMS4ForLoop += end - start

  start = Date.now()
  found = a[a.indexOf(testNum)]
  end = Date.now()
  totalMS4IndexOf += end - start

}

console.log("10000 for-loop executions took total " + totalMS4ForLoop + " ms.")
console.log("10000 indexOf executions took total " + totalMS4IndexOf + " ms.")