Warning: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17036059/
Why does JavaScript appear to be 4 times faster than C++?
Asked by streaver91
For a long time, I had thought of C++ as being faster than JavaScript. However, today I wrote a benchmark script to compare the speed of floating-point calculations in the two languages, and the result is amazing!
JavaScript appears to be almost 4 times faster than C++!
I had both languages do the same job on my i5-430M laptop, performing a = a + b 100000000 times. C++ takes about 410 ms, while JavaScript takes only about 120 ms.
I really have no idea why JavaScript runs so fast in this case. Can anyone explain that?
The code I used for JavaScript is (run with Node.js):
(function() {
    var a = 3.1415926, b = 2.718;
    var i, j, d1, d2;
    for(j=0; j<10; j++) {
        d1 = new Date();
        for(i=0; i<100000000; i++) {
            a = a + b;
        }
        d2 = new Date();
        console.log("Time Cost:" + (d2.getTime() - d1.getTime()) + "ms");
    }
    console.log("a = " + a);
})();
And the code for C++ (compiled with g++) is:
#include <stdio.h>
#include <ctime>
int main() {
    double a = 3.1415926, b = 2.718;
    int i, j;
    clock_t start, end;
    for(j=0; j<10; j++) {
        start = clock();
        for(i=0; i<100000000; i++) {
            a = a + b;
        }
        end = clock();
        // clock_t is not int on many platforms, so cast before printing
        printf("Time Cost: %ldms\n", (long)((end - start) * 1000 / CLOCKS_PER_SEC));
    }
    printf("a = %lf\n", a);
    return 0;
}
Answer by paxdiablo
I may have some bad news for you if you're on a Linux system (which complies with POSIX, at least in this situation). The clock() call returns the number of clock ticks consumed by the program, scaled by CLOCKS_PER_SEC, which is 1,000,000.
That means, if you're on such a system, you're talking in microseconds for C and milliseconds for JavaScript (as per the JS online docs). So, rather than JS being four times faster, C++ is actually 250 times faster.
Now, it may be that you're on a system where CLOCKS_PER_SEC is something other than a million. You can run the following program on your system to see if it's scaled by the same value:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
// Lets us write "30 MILLION" to mean 30 * 1000000
#define MILLION * 1000000
// Print n with thousands separators, followed by the character c
static void commaOut (int n, char c) {
    if (n < 1000) {
        printf ("%d%c", n, c);
        return;
    }
    commaOut (n / 1000, ',');
    printf ("%03d%c", n % 1000, c);
}
int main (int argc, char *argv[]) {
    int i;
    system("date");
    clock_t start = clock();
    clock_t end = start;
    // Burn CPU until clock() has advanced by 30 million ticks
    while (end - start < 30 MILLION) {
        for (i = 10 MILLION; i > 0; i--) {};
        end = clock();
    }
    system("date");
    commaOut ((int)(end - start), '\n');
    return 0;
}
The output on my box is:
Tuesday 17 November 11:53:01 AWST 2015
Tuesday 17 November 11:53:31 AWST 2015
30,001,946
showing that the scaling factor is a million. If you run that program, or investigate CLOCKS_PER_SEC, and it's not a scaling factor of one million, you need to look at some other things.
The first step is to ensure your code is actually being optimised by the compiler. That means, for example, setting -O2 or -O3 for gcc.
On my system with unoptimised code, I see:
Time Cost: 320ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
Time Cost: 300ms
a = 2717999973.760710
and it's three times faster with -O2, albeit with a slightly different answer, though only by about one millionth of a percent:
Time Cost: 140ms
Time Cost: 110ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
Time Cost: 100ms
a = 2718000003.159864
That would bring the two situations back on par with each other, something I'd expect since JavaScript is not some interpreted beast like in the old days, where each token is interpreted whenever it's seen.
Modern JavaScript engines (V8, Rhino, etc) can compile the code to an intermediate form (or even to machine language) which may allow performance roughly equal with compiled languages like C.
But, to be honest, you don't tend to choose JavaScript or C++ for their speed, you choose them for their areas of strength. There aren't many C compilers floating around inside browsers and I've not noticed many operating systems or embedded apps written in JavaScript.
Answer by Jerry Coffin
Doing a quick test with optimization turned on, I got results of about 150 ms for an ancient AMD 64 X2 processor, and about 90 ms for a reasonably recent Intel i7 processor.
Then I did a little more to give some idea of one reason you might want to use C++. I unrolled four iterations of the loop, to get this:
#include <stdio.h>
#include <ctime>
int main() {
    double a = 3.1415926, b = 2.718;
    double c = 0.0, d=0.0, e=0.0;
    int i, j;
    clock_t start, end;
    for(j=0; j<10; j++) {
        start = clock();
        // four independent accumulators, so the additions can overlap
        for(i=0; i<100000000; i+=4) {
            a += b;
            c += b;
            d += b;
            e += b;
        }
        a += c + d + e;
        // reset the partial sums so they are not counted again on the next pass
        c = d = e = 0.0;
        end = clock();
        printf("Time Cost: %fms\n", (1000.0 * (end - start))/CLOCKS_PER_SEC);
    }
    printf("a = %lf\n", a);
    return 0;
}
This let the C++ code run in about 44 ms on the AMD (I forgot to run this version on the Intel). Then I turned on the compiler's auto-parallelizer (-Qpar with VC++). This reduced the time a little further still, to about 40 ms on the AMD, and 30 ms on the Intel.
Bottom line: if you want to use C++, you really need to learn how to use the compiler. If you want to get really good results, you probably also want to learn how to write better code.
I should add: I didn't attempt to test a version under JavaScript with the loop unrolled. Doing so might provide a similar (or at least some) speed improvement in JS as well. Personally, I think making the code fast is a lot more interesting than comparing JavaScript to C++.
If you want code like this to run fast, unroll the loop (at least in C++).
Since the subject of parallel computing arose, I thought I'd add another version using OpenMP. While I was at it, I cleaned up the code a little bit, so I could keep track of what was going on. I also changed the timing code a bit, to display the overall time instead of the time for each execution of the inner loop. The resulting code looked like this:
#include <stdio.h>
#include <ctime>
int main() {
    double total = 0.0;
    double inc = 2.718;
    int j;
    clock_t start, end;
    start = clock();
    #pragma omp parallel for reduction(+:total) firstprivate(inc)
    for(j=0; j<10; j++) {
        double a=0.0, b=0.0, c=0.0, d=0.0;
        // i is declared inside the parallel loop so each thread has its own copy
        for(int i=0; i<100000000; i+=4) {
            a += inc;
            b += inc;
            c += inc;
            d += inc;
        }
        total += a + b + c + d;
    }
    end = clock();
    printf("Time Cost: %fms\n", (1000.0 * (end - start))/CLOCKS_PER_SEC);
    printf("a = %lf\n", total);
    return 0;
}
The primary addition here is the following (admittedly somewhat arcane) line:
#pragma omp parallel for reduction(+:total) firstprivate(inc)
This tells the compiler to execute the outer loop in multiple threads, with a separate copy of inc for each thread, and to add together the individual values of total after the parallel section.
The result is about what you'd probably expect. If we don't enable OpenMP with the compiler's -openmp flag, the reported time is about 10 times what we saw for individual executions previously (409 ms for the AMD, 323 ms for the Intel). With OpenMP turned on, the times drop to 217 ms for the AMD, and 100 ms for the Intel.
So, on the Intel the original version took 90 ms for one iteration of the outer loop. With this version we're getting just slightly longer (100 ms) for all 10 iterations of the outer loop, an improvement in speed of about 9:1. On a machine with more cores, we could expect even more improvement (OpenMP will normally take advantage of all available cores automatically, though you can manually tune the number of threads if you want).
Answer by Raymund Hofmann
This is a polarizing topic, so one may have a look at:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/
which benchmarks all kinds of languages.
JavaScript engines such as V8 surely do a good job on simple loops like the one in the example, probably generating very similar machine code. For most "close to the user" applications JavaScript surely is the better choice, but keep in mind the memory waste and the often unavoidable performance hit (and lack of control) for more complicated algorithms/applications.
Answer by Felix Bertoni
Even if the post is old, I think it may be interesting to add some information. In summary, your test is too vague and may be biased.
A bit about speed testing methodology
When comparing the speed of two languages, you first have to define precisely in which context you want to compare how they perform.
"naive" vs "optimized" code : whether the code tested is written by a beginner or an expert programmer. This parameter matters depending on who will participate in your project. For example, when working with scientists (non geeky ones), you will look more for "naive" code performance, because scientists aren't necessarily good programmers.
authorized compile time : whether or not you allow the code to take a long time to build. This parameter can matter depending on your project management methodology. If you need to do automated tests, maybe trading a bit of runtime speed for a shorter compile time can be interesting. On the other hand, you can consider that the distribution version is allowed a high amount of building time.
Platform portability : whether your speed shall be compared on one platform or more (Windows, Linux, PS4...)
Compiler/interpreter portability : whether your code's speed shall be compiler/interpreter independent or not. Can be useful for multiplatform and/or open source projects.
Other specialized parameters, for example whether you allow dynamic allocations in your code, whether you want to enable plugins (libraries loaded dynamically at runtime), etc.
Then, you have to make sure that your code is representative of what you want to test.
Here (I assume you didn't compile the C++ with optimization flags), you are testing the fast-compile speed of "naive" (not so naive, actually) code. Because your loop has a fixed size and fixed data, you don't test dynamic allocations, and you (supposedly) allow code transformations (more on that in the next section). And effectively, JavaScript usually performs better than C++ in this case, because JavaScript engines optimize by default when they compile the code, while C++ compilers need to be told to optimize.
A quick overview of C++ speed increase with parameters
Because I am not knowledgeable enough about JavaScript, I'll only show how code optimization and compilation type can change C++ speed on a fixed for loop, hoping it will answer the question "how can JS appear to be faster than C++?"
For that, let's use Matt Godbolt's C++ Compiler Explorer to see the assembly code generated by gcc 9.2.
Non optimized code
float func(){
    float a(0.0);
    float b(2.71);
    for (int i = 0; i < 100000; ++i){
        a = a + b;
    }
    return a;
}
Compiled with gcc 9.2, flag -O0. Produces the following assembly code:
func():
        pushq   %rbp
        movq    %rsp, %rbp
        pxor    %xmm0, %xmm0
        movss   %xmm0, -4(%rbp)
        movss   .LC1(%rip), %xmm0
        movss   %xmm0, -12(%rbp)
        movl    $0, -8(%rbp)
.L3:
        cmpl    $99999, -8(%rbp)
        jg      .L2
        movss   -4(%rbp), %xmm0
        addss   -12(%rbp), %xmm0
        movss   %xmm0, -4(%rbp)
        addl    $1, -8(%rbp)
        jmp     .L3
.L2:
        movss   -4(%rbp), %xmm0
        popq    %rbp
        ret
.LC1:
        .long   1076719780
The code for the loop is what is between ".L3" and ".L2". In short, we can see that the code created here is not optimized at all: a lot of memory accesses are made (no proper use of registers), and because of this there are a lot of wasted operations storing and reloading the result.
This introduces an extra 5 or 6 cycles of store-forwarding latency into the critical-path dependency chain of the FP addition into a, on modern x86 CPUs. This is on top of the 4 or 5 cycle latency of addss, making the function more than twice as slow.
compiler optimization
The same C++ compiled with gcc 9.2, flag -O3. Produces the following assembly code:
func():
        movss   .LC1(%rip), %xmm1
        movl    $100000, %eax
        pxor    %xmm0, %xmm0
.L2:
        addss   %xmm1, %xmm0
        subl    $1, %eax
        jne     .L2
        ret
.LC1:
        .long   1076719780
The code is much more concise, and uses registers as much as possible.
code optimization
A compiler usually optimizes code very well, especially C++, provided that the code clearly expresses what the programmer wants to achieve. Here we want a fixed mathematical expression to be as fast as possible, so let's change the code a bit.
constexpr float func(){
    float a(0.0);
    float b(2.71);
    for (int i = 0; i < 100000; ++i){
        a = a + b;
    }
    return a;
}
float call() {
    return func();
}
We added a constexpr to the function to tell the compiler to try to compute its result at compile time, and added a calling function to be sure that it will generate some code.
Compiled with gcc 9.2, -O3, this leads to the following assembly code:
call():
        movss   .LC0(%rip), %xmm0
        ret
.LC0:
        .long   1216623031
The asm code is short, since the value returned by func has been computed at compile time, and call simply returns it.
Of course, a = b * 100000 would always compile to efficient asm, so only write the repeated-add loop if you need to explore FP rounding error over all those temporaries.
Answer by ai4humanity
The JS engine of any popular runtime is itself written in C++, so you probably can't get JS to run faster than equivalent native code ... you can prove it by induction by counting from 1 by 1 to a googol if you want