Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17498743/
How does the internal implementation of memcpy work?
Asked by hddh
How does the standard C function 'memcpy' work? It has to copy a (large) chunk of RAM to another area in RAM. Since I know you cannot move straight from RAM to RAM in assembly (with the mov instruction), I am guessing it uses a CPU register as intermediate storage when copying?
But how does it copy? By blocks (and how would it copy by blocks?), by individual bytes (char), or by the largest data type available (copying in long doubles, which are 12 bytes on my system)?
EDIT: OK, apparently you can move data from RAM to RAM directly. I am not an assembly expert, and all I have learnt about assembly is from this document (X86 assembly guide), which mentions in the section about the mov instruction that you cannot move from RAM to RAM. Apparently this isn't true.
Accepted answer by Gian
Depends. In general, you couldn't physically copy anything larger than the largest usable register in a single cycle, but that's not really how machines work these days. In practice, you really care less about what the CPU is doing and more about the characteristics of DRAM. The memory hierarchy of the machine is going to play a crucial determining role in performing this copy in the fastest possible manner (e.g., are you loading whole cache-lines? What's the size of a DRAM row with respect to the copy operation?). An implementation might instead choose to use some kind of vector instructions to implement memcpy. Without reference to a specific implementation, it's effectively a byte-for-byte copy with a one-place buffer.
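As a rough illustration of the register-sized, one-place-buffer idea above, here is a minimal C sketch. The name copy_wordwise is made up for this example, and the 8-byte chunk size is an assumption; real library versions also deal with alignment, cache lines and vector registers. The fixed-size memcpy calls inside the loop are only the portable C way to spell an 8-byte load and store, which optimizing compilers typically reduce to single move instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from any real libc. */
void *copy_wordwise(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Wide path: move 8 bytes at a time through one 64-bit temporary,
       which plays the role of the CPU register. */
    while (n >= sizeof(uint64_t)) {
        uint64_t tmp;
        memcpy(&tmp, s, sizeof tmp);   /* load 8 bytes into the temporary */
        memcpy(d, &tmp, sizeof tmp);   /* store them at the destination */
        s += sizeof tmp;
        d += sizeof tmp;
        n -= sizeof tmp;
    }

    /* Tail: the remaining 0..7 bytes go one at a time,
       again through a one-place buffer. */
    while (n--) {
        unsigned char tmp = *s++;
        *d++ = tmp;
    }
    return dest;
}
```

Like memcpy itself, this sketch assumes the source and destination regions do not overlap.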
Here's a fun article that describes one person's adventure into optimizing memcpy. The main take-home point is that an optimized memcpy is always going to be targeted to a specific architecture and environment, based on the instructions you can execute inexpensively.
Answer by dasblinkenlight
The implementation of memcpy is highly specific to the system in which it is implemented. Implementations are often hardware-assisted.
Memory-to-memory mov instructions are not that uncommon: they have been around since at least PDP-11 times, when you could write something like this:
    MOV FROM, R2
    MOV TO, R3
    MOV R2, R4
    ADD LEN, R4
CP: MOV (R2)+, (R3)+ ; "(Rx)+" means "*Rx++" in C
    CMP R2, R4
    BNE CP
The commented line is roughly equivalent to C's
*to++ = *from++;
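For readers who do not know PDP-11 assembly, the whole loop can be approximated in C as below. This is only a sketch: the function name copy_like_pdp11 is invented here, and it copies bytes, while the MOV in the listing moves 16-bit words.

```c
#include <stddef.h>

/* Hypothetical function; the comments map each step to the registers
   used in the assembly listing. */
void copy_like_pdp11(char *to, const char *from, size_t len)
{
    const char *end = from + len;   /* R4 = R2 + LEN: one past the last source byte */

    while (from != end)             /* the CMP / BNE pair at the bottom of the loop */
        *to++ = *from++;            /* the copy line labeled CP */
}
```

One difference is that the assembly tests the end condition after the first copy, so it always copies at least once. The C sketch tests before copying, so a zero length is safe.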
Contemporary CPUs have instructions that implement memcpy directly: you load special registers with the source and destination addresses, invoke a memory copy command, and let the CPU do the rest.
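On x86, one such instruction is rep movsb, which takes the destination in (R/E)DI, the source in (R/E)SI and the byte count in (R/E)CX. The sketch below wraps it in GCC/Clang extended inline assembly. The function name copy_rep_movsb is made up, the inline-assembly syntax is compiler specific, and the code assumes the direction flag is clear, as the common x86 calling conventions require. On other targets it falls back to a plain byte loop.

```c
#include <stddef.h>

/* Hypothetical helper; a sketch, not how any particular libc does it. */
void *copy_rep_movsb(void *dest, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    void *d = dest;
    const void *s = src;
    /* "+D", "+S" and "+c" pin the operands to the DI, SI and CX
       registers and mark them as read and modified; the "memory"
       clobber tells the compiler that memory contents changed. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback for other architectures and compilers. */
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return dest;
}
```

Whether this beats a hand-tuned loop depends heavily on the CPU generation, so it should be read as an example of the mechanism rather than a performance recommendation.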
Answer by ouah
A trivial implementation of memcpy is:
while (n--) *s1++ = *s2++;  /* s1 = destination, s2 = source, as in the C standard's memcpy(s1, s2, n) */
But glibc usually uses some clever implementations in assembly code. Calls to memcpy are usually inlined.
On x86, the code checks whether the size parameter is a literal (compile-time constant) multiple of 2 or 4, using gcc builtin functions. If it is, it uses a loop of movl instructions (each copying 4 bytes); otherwise it calls the general case.
The general case uses a fast block copy written in assembly with the rep and movsl instructions.
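The dispatch described above can be sketched with the gcc builtin __builtin_constant_p, which reports whether an expression is known at compile time. Everything else here is an assumption made for illustration: the names my_memcpy, copy_words and copy_general are invented, only the multiple-of-4 case is handled, and plain C loops stand in for the movl and rep/movsl assembly.

```c
#include <stddef.h>

/* Stand-in for the general rep/movsl path: a plain byte loop. */
static void *copy_general(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Stand-in for the movl loop: n must be a multiple of 4. Each
   iteration moves 4 bytes, where real code would use one 4-byte move. */
static void *copy_words(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i += 4) {
        d[i]     = s[i];
        d[i + 1] = s[i + 1];
        d[i + 2] = s[i + 2];
        d[i + 3] = s[i + 3];
    }
    return dest;
}

#if defined(__GNUC__)
/* Pick the 4-byte loop only when n is a compile-time constant
   multiple of 4. Note that the macro evaluates n more than once. */
#define my_memcpy(d, s, n)                               \
    ((__builtin_constant_p(n) && (n) % 4 == 0)           \
         ? copy_words((d), (s), (n))                     \
         : copy_general((d), (s), (n)))
#else
#define my_memcpy(d, s, n) copy_general((d), (s), (n))
#endif
```

With this sketch, my_memcpy(dst, src, 8) can be resolved to the 4-byte loop at compile time, while a call whose size is only known at run time goes through the general path.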

