In what cases should I use memcpy over standard operators in C++?
Source: http://stackoverflow.com/questions/4544804/
Warning: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Asked by Patryk Czachurski
When can I get better performance using memcpy, or how do I benefit from using it? For example:
float a[3]; float b[3];
Is this code:
memcpy(a, b, 3*sizeof(float));
faster than this one?
a[0] = b[0];
a[1] = b[1];
a[2] = b[2];
Answer by Martin York
Efficiency should not be your concern.
Write clean maintainable code.
It bothers me that so many answers indicate that memcpy() is inefficient. It is designed to be the most efficient way to copy blocks of memory (for C programs).
So I wrote the following as a test:
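#include <algorithm>
#include <cstring>   // std::memcpy

extern float a[3];
extern float b[3];
extern void base();

int main()
{
    base();
#if defined(M1)
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
#elif defined(M2)
    std::memcpy(a, b, 3*sizeof(float));
#elif defined(M3)
    // Note: as posted, this copies a into b (the reverse of M1 and M2);
    // the assembly listed below was generated from this form.
    std::copy(&a[0], &a[3], &b[0]);
#endif
    base();
}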
Then, to compare the code each variant produces:
g++ -O3 -S xr.cpp -o s0.s
g++ -O3 -S xr.cpp -o s1.s -DM1
g++ -O3 -S xr.cpp -o s2.s -DM2
g++ -O3 -S xr.cpp -o s3.s -DM3
echo "=======" > D
diff s0.s s1.s >> D
echo "=======" >> D
diff s0.s s2.s >> D
echo "=======" >> D
diff s0.s s3.s >> D
This resulted in the following (comments added by hand):
======= // Copy by hand
10a11,18
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movl (%rdx), %eax
> movl %eax, (%rcx)
> movl 4(%rdx), %eax
> movl %eax, 4(%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // memcpy()
10a11,16
> movq _a@GOTPCREL(%rip), %rcx
> movq _b@GOTPCREL(%rip), %rdx
> movq (%rdx), %rax
> movq %rax, (%rcx)
> movl 8(%rdx), %eax
> movl %eax, 8(%rcx)
======= // std::copy()
10a11,14
> movq _a@GOTPCREL(%rip), %rsi
> movl $12, %edx
> movq _b@GOTPCREL(%rip), %rdi
> call _memmove
Added timing results for running the above inside a loop of 1000000000 iterations.
g++ -c -O3 -DM1 X.cpp
g++ -O3 X.o base.o -o m1
g++ -c -O3 -DM2 X.cpp
g++ -O3 X.o base.o -o m2
g++ -c -O3 -DM3 X.cpp
g++ -O3 X.o base.o -o m3
time ./m1
real 0m2.486s
user 0m2.478s
sys 0m0.005s
time ./m2
real 0m1.859s
user 0m1.853s
sys 0m0.004s
time ./m3
real 0m1.858s
user 0m1.851s
sys 0m0.006s
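The loop used for these timings is not shown in the answer. As a rough sketch only, one way to time the memcpy variant in-process could look like the following. The names `src_data`, `dst_data`, and `time_memcpy_loop` are hypothetical, and because everything lives in one translation unit an optimizer may collapse the loop, which is presumably why the original test keeps base() in a separate object file.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the extern arrays used in the test above.
float src_data[3] = {1.0f, 2.0f, 3.0f};
float dst_data[3];

// Hypothetical helper: runs the memcpy variant `iterations` times and
// returns the elapsed wall-clock time in nanoseconds.
long long time_memcpy_loop(std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::memcpy(dst_data, src_data, sizeof dst_data);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Sanity check: if the loop ran at all, the copy must have happened.
    assert(iterations == 0 || dst_data[2] == src_data[2]);

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Numbers from a harness like this are only comparable between variants built with the same compiler and flags, and should be read alongside the generated assembly.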
Answer by crazylammer
You can use memcpy only if the objects you're copying have no explicit constructors, and neither do their members (so-called POD, "Plain Old Data"). So it is OK to call memcpy for float, but it is wrong for, e.g., std::string.
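Since C++11 the standard expresses this requirement as the type being "trivially copyable", which can be checked at compile time. A minimal sketch, assuming a hypothetical wrapper named `checked_memcpy` (it is not part of any library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: copies `count` objects with memcpy, but refuses to
// compile for types that are not trivially copyable.
template <typename T>
void checked_memcpy(T* dest, const T* src, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::memcpy(dest, src, count * sizeof(T));
}

// float qualifies; std::string does not.
static_assert(std::is_trivially_copyable<float>::value, "float is safe to memcpy");
static_assert(!std::is_trivially_copyable<std::string>::value, "std::string is not");

// Example use: copy three floats and confirm the result.
inline void copy_three_floats()
{
    float b[3] = {1.0f, 2.0f, 3.0f};
    float a[3];
    checked_memcpy(a, b, 3);
    assert(a[0] == 1.0f && a[1] == 2.0f && a[2] == 3.0f);
}
```

Calling `checked_memcpy` on an array of std::string would fail to compile instead of silently corrupting the strings.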
But part of the work has already been done for you: std::copy from <algorithm> is specialized for built-in types (and possibly for every other POD type, depending on the STL implementation). So writing std::copy(a, a + 3, b) is as fast (after compiler optimization) as memcpy, but is less error-prone.
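To illustrate the "less error-prone" part, here is a small sketch in which the same std::copy call handles both float and std::string. The helper names `copy_three` and `demo_copy_three` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Copies three elements from `src` to `dest` with std::copy. The same
// template works for float and for std::string.
template <typename T>
void copy_three(const T (&src)[3], T (&dest)[3])
{
    std::copy(src, src + 3, dest);
}

inline void demo_copy_three()
{
    float fb[3] = {1.0f, 2.0f, 3.0f};
    float fa[3] = {};
    copy_three(fb, fa);
    assert(fa[0] == 1.0f && fa[2] == 3.0f);

    // Each std::string is copy-assigned, so ownership of its buffer stays correct.
    std::string sb[3] = {"x", "y", "z"};
    std::string sa[3];
    copy_three(sb, sa);
    assert(sa[0] == "x" && sa[2] == "z");
}
```

Whether the float case is lowered to memmove or to inline moves depends on the standard library and optimizer; the source code stays the same either way.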
Answer by ismail
Compilers specifically optimize memcpy calls; at least clang and gcc do. So you should prefer it wherever you can.
Answer by Paul R
Don't go for premature micro-optimisations such as using memcpy like this. Using assignment is clearer and less error-prone and any decent compiler will generate suitably efficient code. If, and only if, you have profiled the code and found the assignments to be a significant bottleneck then you can consider some kind of micro-optimisation, but in general you should always write clear, robust code in the first instance.
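Raw arrays cannot be assigned as a whole, but std::array can, which gives a one-line copy without memcpy. A minimal sketch, where the alias `Vec3` and the function names are hypothetical:

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;  // hypothetical alias for illustration

// Copies `b` with plain assignment; the compiler chooses the instructions.
inline Vec3 copy_by_assignment(const Vec3& b)
{
    Vec3 a{};
    a = b;  // element-wise copy of all three floats
    return a;
}

inline void demo_copy_by_assignment()
{
    const Vec3 b = {{1.0f, 2.0f, 3.0f}};
    const Vec3 a = copy_by_assignment(b);
    assert(a == b);
}
```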
Answer by Thanatos
Use std::copy(). As the header file for g++ notes:
This inline function will boil down to a call to @c memmove whenever possible.
Visual Studio's is probably not much different. Go with the normal way, and optimize once you're aware of a bottleneck. In the case of a simple copy, the compiler is probably already optimizing for you.
Answer by Jamie
The benefits of memcpy? Probably readability. Otherwise, you would have to either do a number of assignments or have a for loop for copying, neither of which is as simple and clear as just doing memcpy (of course, as long as your types are simple and don't require construction/destruction).
Also, memcpy is generally relatively optimized for specific platforms, to the point that it won't be all that much slower than simple assignment, and may even be faster.
Answer by Simone
Supposedly, as Nawaz said, the assignment version should be faster on most platforms. That's because memcpy() will copy byte by byte, while the second version could copy 4 bytes at a time.
As is always the case, you should profile your application to be sure that what you expect to be the bottleneck matches reality.
Edit
The same applies to dynamic arrays. Since you mention C++, you should use the std::copy() algorithm in that case.
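For dynamic arrays, a std::copy version might look like the sketch below. The function names `cpy_std` and `demo_cpy_std` are hypothetical, and the function assumes the two ranges do not overlap in a way that std::copy forbids (the destination must not start inside the source range).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical std::copy counterpart to a raw-pointer copy routine:
// copies `n` floats from `s` to `d`.
void cpy_std(float* d, const float* s, std::size_t n)
{
    std::copy(s, s + n, d);
}

inline void demo_cpy_std()
{
    const std::vector<float> src = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> dst(src.size());
    cpy_std(dst.data(), src.data(), src.size());
    assert(dst == src);
}
```

With std::vector, plain assignment (`dst = src;`) is simpler still when the destination may be resized.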
Edit
This is the code output for Windows XP with GCC 4.5.0, compiled with the -O3 flag:
#include <stddef.h>  // size_t
#include <string.h>  // memcpy

extern "C" void cpy(float* d, float* s, size_t n)
{
    memcpy(d, s, sizeof(float)*n);
}
I wrote this function because the OP asked about dynamic arrays too.
The output assembly is the following:
_cpy:
LFB393:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
pushl %edi
LCFI2:
pushl %esi
LCFI3:
movl 8(%ebp), %eax
movl 12(%ebp), %esi
movl 16(%ebp), %ecx
sall $2, %ecx
movl %eax, %edi
rep movsb
popl %esi
LCFI4:
popl %edi
LCFI5:
leave
LCFI6:
ret
Of course, I assume all of the experts here know what rep movsb means.
This is the assignment version:
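extern "C" void cpy2(float* d, float* s, size_t n)
{
    // Note: as posted, this copies indices n down to 1, so it writes d[n]
    // and never copies element 0. The assembly below was generated from it.
    while (n > 0) {
        d[n] = s[n];
        n--;
    }
}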
which yields the following code:
_cpy2:
LFB394:
pushl %ebp
LCFI7:
movl %esp, %ebp
LCFI8:
pushl %ebx
LCFI9:
movl 8(%ebp), %ebx
movl 12(%ebp), %ecx
movl 16(%ebp), %eax
testl %eax, %eax
je L2
.p2align 2,,3
L5:
movl (%ecx,%eax,4), %edx
movl %edx, (%ebx,%eax,4)
decl %eax
jne L5
L2:
popl %ebx
LCFI10:
leave
LCFI11:
ret
This moves 4 bytes at a time.
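Note that the cpy2 loop as posted indexes from n down to 1, so it touches d[n] and s[n] (one past the end of an n-element array) and skips element 0. A possible corrected variant is sketched below; the name `cpy2_fixed` is hypothetical, and no claim is made here about the assembly a given compiler would produce for it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical corrected variant: copies elements 0 .. n-1.
void cpy2_fixed(float* d, const float* s, std::size_t n)
{
    while (n > 0) {
        --n;          // decrement first so the index stays in range
        d[n] = s[n];
    }
}

inline void demo_cpy2_fixed()
{
    const float s[3] = {1.0f, 2.0f, 3.0f};
    float d[3] = {0.0f, 0.0f, 0.0f};
    cpy2_fixed(d, s, 3);
    assert(d[0] == 1.0f && d[1] == 2.0f && d[2] == 3.0f);
}
```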