C++: What is more efficient? Using pow to square, or just multiplying it with itself?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2940367/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
What is more efficient? Using pow to square or just multiply it with itself?
Asked by
Which of these two methods is more efficient in C? And how about:
pow(x,3)
vs.
x*x*x // etc?
Accepted answer by Emile Cormier
I tested the performance difference between x*x*... vs pow(x,i) for small i using this code:
#include <cstdlib>
#include <cmath>
#include <iostream>
#include <boost/date_time/posix_time/posix_time.hpp>
inline boost::posix_time::ptime now()
{
return boost::posix_time::microsec_clock::local_time();
}
#define TEST(num, expression) \
double test##num(double b, long loops) \
{ \
double x = 0.0; \
\
boost::posix_time::ptime startTime = now(); \
for (long i=0; i<loops; ++i) \
{ \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
x += expression; \
} \
boost::posix_time::time_duration elapsed = now() - startTime; \
\
std::cout << elapsed << " "; \
\
return x; \
}
TEST(1, b)
TEST(2, b*b)
TEST(3, b*b*b)
TEST(4, b*b*b*b)
TEST(5, b*b*b*b*b)
template <int exponent>
double testpow(double base, long loops)
{
double x = 0.0;
boost::posix_time::ptime startTime = now();
for (long i=0; i<loops; ++i)
{
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
x += std::pow(base, exponent);
}
boost::posix_time::time_duration elapsed = now() - startTime;
std::cout << elapsed << " ";
return x;
}
int main()
{
using std::cout;
long loops = 100000000l;
double x = 0.0;
cout << "1 ";
x += testpow<1>(rand(), loops);
x += test1(rand(), loops);
cout << "\n2 ";
x += testpow<2>(rand(), loops);
x += test2(rand(), loops);
cout << "\n3 ";
x += testpow<3>(rand(), loops);
x += test3(rand(), loops);
cout << "\n4 ";
x += testpow<4>(rand(), loops);
x += test4(rand(), loops);
cout << "\n5 ";
x += testpow<5>(rand(), loops);
x += test5(rand(), loops);
cout << "\n" << x << "\n";
}
Results are:
1 00:00:01.126008 00:00:01.128338
2 00:00:01.125832 00:00:01.127227
3 00:00:01.125563 00:00:01.126590
4 00:00:01.126289 00:00:01.126086
5 00:00:01.126570 00:00:01.125930
2.45829e+54
Note that I accumulate the result of every pow calculation to make sure the compiler doesn't optimize it away.
If I use the std::pow(double, double) version and loops = 1000000l, I get:
1 00:00:00.011339 00:00:00.011262
2 00:00:00.011259 00:00:00.011254
3 00:00:00.975658 00:00:00.011254
4 00:00:00.976427 00:00:00.011254
5 00:00:00.973029 00:00:00.011254
2.45829e+52
This is on an Intel Core Duo running Ubuntu 9.10 64-bit. Compiled using gcc 4.4.1 with -O2 optimization.
So in C, yes, x*x*x will be faster than pow(x, 3), because there is no pow(double, int) overload. In C++, they will be roughly the same. (Assuming the methodology in my testing is correct.)
This is in response to the comment made by An Markm:
Even if a using namespace std directive was issued, if the second parameter to pow is an int, then the std::pow(double, int) overload from <cmath> will be called instead of ::pow(double, double) from <math.h>.
This test code confirms that behavior:
#include <iostream>
namespace foo
{
double bar(double x, int i)
{
std::cout << "foo::bar\n";
return x*i;
}
}
double bar(double x, double y)
{
std::cout << "::bar\n";
return x*y;
}
using namespace foo;
int main()
{
double a = bar(1.2, 3); // Prints "foo::bar"
std::cout << a << "\n";
return 0;
}
Answered by sbi
That's the wrong kind of question. The right question would be: "Which one is easier to understand for human readers of my code?"
If speed matters (later), don't ask, but measure. (And before that, measure whether optimizing this actually will make any noticeable difference.) Until then, write the code so that it is easiest to read.
Edit
Just to make this clear (although it already should have been): breakthrough speedups usually come from things like using better algorithms, improving locality of data, reducing the use of dynamic memory, pre-computing results, etc. They rarely ever come from micro-optimizing single function calls, and where they do, they do so in very few places, which can only be found by careful (and time-consuming) profiling. More often than not such places can only be sped up by doing very non-intuitive things (like inserting noop statements), and what is an optimization for one platform is sometimes a pessimization for another (which is why you need to measure instead of asking, because we don't fully know/have your environment).
Let me underline this again: even in the few applications where such things matter, they don't matter in most places they're used, and it is very unlikely that you will find the places where they matter by looking at the code. You really do need to identify the hot spots first, because otherwise optimizing code is just a waste of time.
Even if a single operation (like computing the square of some value) takes up 10% of the application's execution time (which IME is quite rare), and even if optimizing it saves 50% of the time necessary for that operation (which IME is even much, much rarer), you still made the application take only 5% less time.
Your users will need a stopwatch to even notice that. (I guess in most cases anything under 20% speedup goes unnoticed by most users. And that is four such spots you need to find.)
Answered by Puppy
x*x or x*x*x will be faster than pow, since pow must deal with the general case, whereas x*x is specific. Also, you can elide the function call and suchlike.
However, if you find yourself micro-optimizing like this, you need to get a profiler and do some serious profiling. The overwhelming probability is that you would never notice any difference between the two.
Answered by jdtournier
I was also wondering about the performance issue, and was hoping this would be optimised out by the compiler, based on the answer from @EmileCormier. However, I was worried that the test code he showed would still allow the compiler to optimise away the std::pow() call, since the same values were used in the call every time, which would allow the compiler to store the result and re-use it in the loop - this would explain the almost identical run-times for all cases. So I had a look into it too.
Here's the code I used (test_pow.cpp):
#include <iostream>
#include <cmath>
#include <chrono>
class Timer {
public:
Timer () : from (std::chrono::high_resolution_clock::now()) { }
void start () {
from = std::chrono::high_resolution_clock::now();
}
double elapsed () const {
// elapsed time in seconds (microsecond count scaled by 1e-6)
return std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - from).count() * 1.0e-6;
}
private:
std::chrono::high_resolution_clock::time_point from;
};
int main (int argc, char* argv[])
{
double total;
Timer timer;
total = 0.0;
timer.start();
for (double i = 0.0; i < 1.0; i += 1e-8)
total += std::pow (i,2);
std::cout << "std::pow(i,2): " << timer.elapsed() << "s (result = " << total << ")\n";
total = 0.0;
timer.start();
for (double i = 0.0; i < 1.0; i += 1e-8)
total += i*i;
std::cout << "i*i: " << timer.elapsed() << "s (result = " << total << ")\n";
std::cout << "\n";
total = 0.0;
timer.start();
for (double i = 0.0; i < 1.0; i += 1e-8)
total += std::pow (i,3);
std::cout << "std::pow(i,3): " << timer.elapsed() << "s (result = " << total << ")\n";
total = 0.0;
timer.start();
for (double i = 0.0; i < 1.0; i += 1e-8)
total += i*i*i;
std::cout << "i*i*i: " << timer.elapsed() << "s (result = " << total << ")\n";
return 0;
}
This was compiled using:
g++ -std=c++11 [-O2] test_pow.cpp -o test_pow
Basically, the difference is that the argument to std::pow() is the loop counter. As I feared, the difference in performance is pronounced. Without the -O2 flag, the results on my system (Arch Linux 64-bit, g++ 4.9.1, Intel i7-4930) were:
std::pow(i,2): 0.001105s (result = 3.33333e+07)
i*i: 0.000352s (result = 3.33333e+07)
std::pow(i,3): 0.006034s (result = 2.5e+07)
i*i*i: 0.000328s (result = 2.5e+07)
With optimisation, the results were equally striking:
std::pow(i,2): 0.000155s (result = 3.33333e+07)
i*i: 0.000106s (result = 3.33333e+07)
std::pow(i,3): 0.006066s (result = 2.5e+07)
i*i*i: 9.7e-05s (result = 2.5e+07)
So it looks like the compiler does at least try to optimise the std::pow(x,2) case, but not the std::pow(x,3) case (it takes ~40 times longer than the std::pow(x,2) case). In all cases, manual expansion performed better - but particularly for the power 3 case (60 times quicker). This is definitely worth bearing in mind if running std::pow() with integer powers greater than 2 in a tight loop...
Answered by mhaghighat
The most efficient way is to exploit the exponential growth of repeated squaring (exponentiation by squaring). Check this code for p^q:
template <typename T>
T expt(T p, unsigned q){
T r = 1;
while (q != 0) {
if (q % 2 == 1) { // if q is odd
r *= p;
q--;
}
p *= p;
q /= 2;
}
return r;
}
Answered by mhaghighat
If the exponent is constant and small, expand it out, minimizing the number of multiplications. (For example, x^4 is not optimally x*x*x*x, but y*y where y=x*x. And x^5 is y*y*x where y=x*x. And so on.) For constant integer exponents, just write out the optimized form already; with small exponents, this is a standard optimization that should be performed whether the code has been profiled or not. The optimized form will be quicker in so large a percentage of cases that it's basically always worth doing.
(If you use Visual C++, std::pow(float,int) performs the optimization I allude to, whereby the sequence of operations is related to the bit pattern of the exponent. I make no guarantee that the compiler will unroll the loop for you, though, so it's still worth doing it by hand.)
[edit] BTW, pow has an (un)surprising tendency to crop up in profiler results. If you don't absolutely need it (i.e., the exponent is large or not a constant), and you're at all concerned about performance, then it's best to write out the optimal code and wait for the profiler to tell you it's (surprisingly) wasting time before thinking further. (The alternative is to call pow and have the profiler tell you it's (unsurprisingly) wasting time - you're cutting out this step by doing it intelligently.)
Answered by Camion
I have been busy with a similar problem, and I'm quite puzzled by the results. I was calculating x^(-3/2) for Newtonian gravitation in an n-bodies situation (the acceleration undergone from another body of mass M situated at a distance vector d): a = M*G*d*(d2)^(-3/2) (where d2 is the dot (scalar) product of d with itself), and I thought calculating M*G*pow(d2, -1.5) would be simpler than M*G/d2/sqrt(d2).
The trick is that it is true for small systems, but as systems grow in size, M*G/d2/sqrt(d2) becomes more efficient, and I don't understand why the size of the system impacts this result, because repeating the operation on different data does not. It is as if there were possible optimizations as the system grows, but which are not possible with pow.