JavaScript: Why is <= slower than < using this code snippet in V8?
Declaration: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow the CC BY-SA license, link the original, and attribute it to the original authors (not me): StackOverFlow.
原文地址 (original URL): http://stackoverflow.com/questions/53643962/
Why is <= slower than < using this code snippet in V8?
Asked by Leonardo Physh
I am reading the slides Breaking the Javascript Speed Limit with V8, and there is an example like the code below. I cannot figure out why <= is slower than < in this case; can anybody explain that? Any comments are appreciated.
Slow:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i <= this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
(Hint: primes is an array of length prime_count)
Faster:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i < this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
[More Info] The speed improvement is significant; in my local environment the results are as follows:
V8 version 7.3.0 (candidate)
Slow:
time d8 prime.js
287107
12.71 user
0.05 system
0:12.84 elapsed
Faster:
time d8 prime.js
287107
1.82 user
0.01 system
0:01.84 elapsed
Accepted answer by Mathias Bynens
I work on V8 at Google, and wanted to provide some additional insight on top of the existing answers and comments.
For reference, here's the full code example from the slides:
var iterations = 25000;

function Primes() {
  this.prime_count = 0;
  this.primes = new Array(iterations);
  this.getPrimeCount = function() { return this.prime_count; }
  this.getPrime = function(i) { return this.primes[i]; }
  this.addPrime = function(i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) return true;
    }
    return false;
  }
};

function main() {
  var p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < iterations) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount() - 1));
}

main();
First and foremost, the performance difference has nothing to do with the < and <= operators directly. So please don't jump through hoops just to avoid <= in your code because you read on Stack Overflow that it's slow --- it isn't!
Second, folks pointed out that the array is "holey". This was not clear from the code snippet in the OP's post, but it is clear when you look at the code that initializes this.primes:
this.primes = new Array(iterations);
This results in an array with a HOLEY elements kind in V8, even if the array ends up completely filled/packed/contiguous. In general, operations on holey arrays are slower than operations on packed arrays, but in this case the difference is negligible: it amounts to 1 additional Smi (small integer) check (to guard against holes) each time we hit this.primes[i] in the loop within isPrimeDivisible. No big deal!
TL;DR The array being HOLEY is not the problem here.
Others pointed out that the code reads out of bounds. It's generally recommended to avoid reading beyond the length of arrays, and in this case it would indeed have avoided the massive drop in performance. But why? V8 can handle some of these out-of-bounds scenarios with only a minor performance impact. What's so special about this particular case, then?
The out-of-bounds read results in this.primes[i] being undefined on this line:
if ((candidate % this.primes[i]) == 0) return true;
And that brings us to the real issue: the % operator is now being used with non-integer operands!
integer % someOtherInteger can be computed very efficiently; JavaScript engines can produce highly-optimized machine code for this case. integer % undefined, on the other hand, amounts to a way less efficient Float64Mod, since undefined is represented as a double.
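As a quick sanity check (runnable in Node.js or d8; the operands here are illustrative, not from the original benchmark), you can observe the NaN-producing path directly:

```javascript
// integer % integer stays on the efficient integer path;
// integer % undefined goes through the Float64 path and yields NaN.
const intMod = 7 % 3;          // plain small-integer modulus
const oobMod = 7 % undefined;  // what the out-of-bounds read produces

console.log(intMod);               // 1
console.log(Number.isNaN(oobMod)); // true
console.log(oobMod == 0);          // false -- so the loop's `== 0` test never fires
```

This is why the extra iteration in the slow loop silently does nothing to the result while still being expensive to compute.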
The code snippet can indeed be improved by changing the <= into < on this line:
for (var i = 1; i <= this.prime_count; ++i) {
...not because < is somehow a superior operator to <=, but just because this avoids the out-of-bounds read in this particular case.
Answered by Michael Geary
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a minuscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be a simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let a = [];
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]? It has no value. (It isn't even correct to say it has the value undefined - an array element containing the undefined value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1] contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <= attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
- this.primes[i] evaluates to undefined because i is past the array end.
- candidate % undefined (for any value of candidate) evaluates to NaN.
- NaN == 0 evaluates to false.
- Therefore, the return true is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
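This equivalence can be checked directly. A minimal standalone version of both loops (adapted from the snippets; the primes array here is hand-filled for illustration) returns identical answers for every candidate:

```javascript
const primes = [1, 2, 3, 5, 7];    // primes[0] == 1 is skipped, as in the talk's code
const prime_count = primes.length; // 5

function slowVariant(candidate) {
  for (let i = 1; i <= prime_count; ++i) {  // last pass reads primes[5]: out of bounds
    if (candidate % primes[i] == 0) return true;
  }
  return false;
}

function fastVariant(candidate) {
  for (let i = 1; i < prime_count; ++i) {   // stays in bounds
    if (candidate % primes[i] == 0) return true;
  }
  return false;
}

// The extra out-of-bounds iteration never changes the answer:
for (let c = 1; c <= 100; ++c) {
  if (slowVariant(c) !== fastVariant(c)) throw new Error('mismatch at ' + c);
}
console.log('identical results for 1..100');
```

The out-of-bounds pass only ever computes `candidate % undefined`, which is NaN and therefore never equal to 0.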
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with < reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described on pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very presentation at Google I/O and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
Answered by GitaarLAB
TL;DRThe slower loop is due to accessing the Array 'out-of-bounds', which either forces the engine to recompile the function with less or even no optimizations OR to not compile the function with any of these optimizations to begin with (if the (JIT-)Compiler detected/suspected this condition before the first compilation 'version'), read on below why;
TL;DR较慢的循环是由于访问数组“越界”,这要么强制引擎以较少甚至没有优化的方式重新编译函数,要么不使用任何这些优化开始编译函数(如果(JIT-)编译器在第一次编译“版本”之前检测到/怀疑这种情况,请阅读下面的原因;
Someone just has to say this (utterly surprised no one already has):
There used to be a time when the OP's snippet would be a de-facto example in a beginner programming book intended to outline/emphasize that 'arrays' in javascript are indexed starting at 0, not 1, and as such serve as an example of a common 'beginner mistake' (don't you love how I avoided the phrase 'programming error' ;)): out-of-bounds Array access.

Example 1:
a Dense Array (contiguous, meaning no gaps between indexes, and with an actual element at each index) of 5 elements, using 0-based indexing (always, in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
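And reading index 5 does not throw; it silently yields undefined (a runnable sketch of the point):

```javascript
var arr_five_char = ['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5

console.log(arr_five_char.length);  // 5
console.log(arr_five_char[4]);      // 'e' -- last valid index
console.log(arr_five_char[5]);      // undefined -- out of bounds, no exception
console.log(5 in arr_five_char);    // false -- index 5 simply doesn't exist
```

That silent undefined is exactly what makes the off-by-one in the OP's loop invisible at the language level.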
Thus we are not really talking about a performance difference between < vs <= (or 'one extra iteration'); what we are asking is:
'why does the correct snippet (b) run faster than the erroneous snippet (a)?'
The answer is 2-fold (although from an ES262 language implementer's perspective, both are forms of optimization):
- Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
- Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly, IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and, even more importantly, JIT-RE-Compilation!
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve defined end-result'). Which, as it turns out is a very beautiful way to describe a language. And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results. A spec conforming engine should give spec conforming results for any defined input.
Now, with javascript code/libraries/usage increasing, and remembering how much resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
  var r = 0, i = 0;
  for (; i < arr.length;) r += arr[i++];
  return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, right? The return-type is Number, right?
Well.. no, no & no... It depends on what argument you pass to the named function parameter arr...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
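Each of those commented results can be verified mechanically (this re-declares the same sum function so the sketch runs standalone):

```javascript
function sum(arr){
  var r = 0, i = 0;
  for (; i < arr.length;) r += arr[i++];
  return r;
}

console.log(sum('abcde'));    // '0abcde'  (0 + 'a' coerces everything to String)
console.log(sum([1, 2, 3]));  // 6
console.log(sum([1, , 3]));   // NaN      (0 + 1 + undefined poisons the sum)
console.log(sum(['1', , 3])); // '01undefined3'
console.log(sum([1, , '3'])); // 'NaN3'
```

One source-level function, at least three distinct runtime behaviors (Number, String, NaN) depending only on the argument.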
See the problem? Then consider that this is just barely scraping the massive number of possible permutations... We don't even know what TYPE the function will RETURN until we are done...
Now imagine this same function-code actually being used on different types or even variations of input, both completely literally (in source code) described and dynamically in-program generated 'arrays'..
Thus, if you were to compile the function sum JUST ONCE, then the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub steps (like an unnamed pre-y2k browser). No optimizations (because no assumptions), and a dead-slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
You come up with checks as simple as possible to detect if the function might start returning non-spec-conformant results (like because it receives unexpected input).
Then you toss away the previously compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted, or must it be computed again to be sure), tie the function back into the program and try again. Ultimately you fall back to stepwise script-interpretation as in the spec.
All of this takes time!
All browsers work on their engines; for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation); now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results, and that is what matters!
Long story short: just because javascript's language semantics often have our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end by pointing to (and quoting) some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
- unboxed representations of integers and floating-point numbers;
- absence of runtime type checks;
- absence of garbage collection; and
- efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
where there is a small subsection about the engine's internal performance improvements when removing the bounds-check (whilst just hoisting the bounds-check outside the loop already yielded an improvement of 40%).
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation, down to interpretation.
Theoretical example based on the above information, regarding the OP's snippet:
- Call to isPrimeDivisible
- Compile isPrimeDivisible using general assumptions (like no out of bounds access)
- Do work
- BAM, suddenly an array access is out of bounds (right at the end).
- Crap, says the engine, let's recompile that isPrimeDivisible using different (fewer) assumptions, and this example engine doesn't try to figure out if it can reuse the current partial result, so
- Recompute all work using the slower function (hopefully it finishes; otherwise repeat, and this time just interpret the code).
- Return result
Hence the time then was:
First run (which failed at the end) + doing all the work all over again using slower machine-code for each iteration + the recompilation etc. clearly takes >2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based on the facts below)
The more I think of it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or the performance-bonus on snippet b, depending on how you think of it), precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes is a purely numerical 'dense array' which was either
- Hard-coded literal in source-code (a known excellent candidate to become a 'real' array, as everything is already known to the compiler before compile-time), OR
- most likely generated using a numerical function filling a pre-sized array (new Array(/*size value*/)) in ascending sequential order (another long-known candidate to become a 'real' array).
We also know that the primes array's length is cached as prime_count! (indicating its intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed), which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that the Array primes is most likely already an optimized array internally which doesn't get changed after creation (simple for the compiler to know if there is no code modifying the array after creation), and is therefore already (if applicable to the engine) stored in an optimized way, pretty much as if it were a Typed Array.
As I have tried to make clear with my sum function example, the argument(s) that get passed highly influence what actually needs to happen, and as such how that particular code is being compiled to machine-code. Passing a String to the sum function shouldn't change the string but change how the function is JIT-compiled! Passing an Array to sum should compile a different (perhaps even additional, for this type, or 'shape' as they call it, of passed object) version of machine-code.
It seems slightly bonkers to convert the Typed_Array-like primes Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
- Compile as a number-cruncher assuming no out-of-bounds access, run into the out-of-bounds problem at the end, recompile and redo the work (as outlined in the theoretical example in edit 1 above)
- The compiler has already detected (or suspected?) the out-of-bounds access up-front and the function was JIT-compiled as if the argument passed was a sparse object, resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimizations; it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
Answered by Nathan Adams
To add some scientific rigor to it, here's a jsperf:
https://jsperf.com/ints-values-in-out-of-array-bounds
It tests the control case of an array filled with ints, looping while doing modular arithmetic and staying within bounds. It has 5 test cases:
- 1. Looping out of bounds
- 2. Holey arrays
- 3. Modular arithmetic against NaNs
- 4. Completely undefined values
- 5. Using a
new Array()
It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.
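For readers without access to jsperf, a rough standalone sketch of the in-bounds vs out-of-bounds comparison in Node.js follows (the function names and data are mine; absolute timings vary wildly by engine and version, so the only load-bearing claim is that both variants compute the same value):

```javascript
function modInBounds(arr) {
  let acc = 0;
  for (let i = 0; i < arr.length; ++i) acc += 7 % arr[i];
  return acc;
}

function modOutOfBounds(arr) {
  let acc = 0;
  for (let i = 0; i <= arr.length; ++i) {      // last pass: 7 % undefined === NaN
    const m = 7 % arr[i];
    if (!Number.isNaN(m)) acc += m;            // discard the NaN, as the prime loop effectively does
  }
  return acc;
}

const data = Array.from({ length: 1000 }, (_, i) => (i % 5) + 1);

console.time('in bounds');
for (let r = 0; r < 1000; ++r) modInBounds(data);
console.timeEnd('in bounds');

console.time('out of bounds');
for (let r = 0; r < 1000; ++r) modOutOfBounds(data);
console.timeEnd('out of bounds');

// Same arithmetic result either way:
console.log(modInBounds(data) === modOutOfBounds(data)); // true
```

The out-of-bounds variant does strictly more work per call yet returns the same number, mirroring the OP's slow/fast pair.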