Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7367677/

Time: 2020-09-02 09:36:41  Source: igfitidea

Is memset() more efficient than a for loop in C?

Tags: c, performance, memset

Asked by David

Is memset more efficient than a for loop? So if I have

char x[500];
memset(x,0,sizeof(x));

or

char x[500];
for (int i = 0; i < 500; i++) x[i] = 0;

Which one is more efficient, and why? Is there any special instruction in hardware to do block-level initialization?

Accepted answer by Diego Sevilla

Most certainly, memset will be much faster than that loop. Note that your loop handles one character at a time, whereas those library functions are optimized to set several bytes at a time, even using MMX and SSE instructions when available.

I think the paradigmatic example of these usually unnoticed optimizations is the GNU C library strlen function. One would think that it has at least O(n) performance, but in practice it does O(n/4) or O(n/8) steps depending on the architecture (yes, I know, in big-O terms these are the same, but you actually get an eighth of the time). How? It is tricky, but neat: see glibc's strlen source.
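
To make the word-at-a-time idea concrete, here is a minimal sketch (not glibc's actual code) of the classic bit trick it relies on: test eight bytes at once for a zero byte, and only fall back to byte-by-byte scanning inside the word that contains the terminator. `word_strnlen` and `has_zero_byte` are hypothetical names. To stay within well-defined C, the sketch takes an explicit capacity and assumes `buf` has at least `cap` readable bytes, whereas the real strlen relies on aligned loads that never cross a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of v is 0x00 (the classic "haszero" trick). */
static int has_zero_byte(uint64_t v) {
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper (not glibc's code): length of the string in buf,
   testing 8 bytes per step. Assumes buf has at least cap readable bytes;
   returns cap if no NUL is found. */
size_t word_strnlen(const char *buf, size_t cap) {
    size_t i = 0;
    while (i + 8 <= cap) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w); /* alignment-safe 8-byte load */
        if (has_zero_byte(w))
            break;                     /* the NUL is somewhere in these 8 bytes */
        i += 8;
    }
    while (i < cap && buf[i] != '\0')  /* locate it byte by byte */
        i++;
    return i;
}
```

For example, `word_strnlen("hello, world", 13)` checks the first 8 bytes with a single test, then finishes the remaining bytes one at a time and returns 12.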

Answer by Ed S.

Well, why don't we look at the generated assembly code, with full optimization under VS 2010?

char x[500];
char y[500];
int i;      

memset(x, 0, sizeof(x) );   
  003A1014  push        1F4h  
  003A1019  lea         eax,[ebp-1F8h]  
  003A101F  push        0  
  003A1021  push        eax  
  003A1022  call        memset (3A1844h)  

And your loop...

char x[500];
char y[500];
int i;    

for( i = 0; i < 500; ++i )
{
    x[i] = 0;

      00E81014  push        1F4h  
      00E81019  lea         eax,[ebp-1F8h]  
      00E8101F  push        0  
      00E81021  push        eax  
      00E81022  call        memset (0E81844h)  

      /* note that this is *replacing* the loop, 
         not being called once for each iteration. */
}

So, under this compiler, the generated code is exactly the same. memset is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset once anyway, so it does that for you.

If the compiler actually left the loop as-is, it would likely be slower, since you can set blocks larger than one byte at a time (i.e., at a minimum you could unroll your loop a bit). You can assume that memset will be at least as fast as a naive implementation such as the loop. Try it under a debug build and you will notice that the loop is not replaced.
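
As a rough illustration of "unroll your loop a bit", here is a hypothetical `zero_unrolled` that still stores one byte at a time but does four stores per iteration, so the compare-and-branch overhead is paid about n/4 times instead of n times. It is only a sketch: an optimizing compiler may unroll the original loop on its own, or replace it with a memset call, as the disassembly in this answer shows.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes of x, unrolled by a factor of four. */
void zero_unrolled(char *x, size_t n) {
    size_t i = 0;
    /* Main body: four single-byte stores per loop iteration. */
    for (; i + 4 <= n; i += 4) {
        x[i] = 0;
        x[i + 1] = 0;
        x[i + 2] = 0;
        x[i + 3] = 0;
    }
    /* Tail: the remaining n % 4 bytes, one at a time. */
    for (; i < n; i++)
        x[i] = 0;
}
```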

That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.

Answer by Michael

It really depends on the compiler and library. For older compilers or simple compilers, memset may be implemented in a library and would not perform better than a custom loop.

For nearly all compilers that are worth using, memset is an intrinsic function and the compiler will generate optimized, inline code for it.

Others have suggested profiling and comparing, but I wouldn't bother. Just use memset: the code is simple and easy to understand. Don't worry about it until your benchmarks tell you this part of the code is a performance hotspot.

Answer by Bobby Powers

The answer is "it depends". memset MAY be more efficient, or it may internally use a for loop. I can't think of a case where memset would be less efficient. In this case, it may turn into a more efficient for loop: your loop iterates 500 times, setting one byte of the array to 0 each time. On a 64-bit machine, you could instead loop setting 8 bytes (a long long) at a time, which could be almost 8 times quicker (62 eight-byte stores instead of 500 one-byte stores), and then deal with the remaining 4 bytes (500 % 8) at the end.
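
A minimal sketch of that word-at-a-time approach, assuming a hypothetical helper `zero_by_words` and using `memcpy` of a `uint64_t` so that unaligned buffers stay well defined. For 500 bytes it performs 62 eight-byte stores plus 4 single-byte stores for the tail. Real memset implementations such as glibc's go further (aligning the destination, using vector registers), so treat this as an illustration of the idea rather than a replacement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n bytes at p, eight bytes per step where possible. */
void zero_by_words(void *p, size_t n) {
    unsigned char *b = p;
    const uint64_t zero = 0;
    size_t i = 0;
    /* Whole 8-byte words; optimizing compilers usually turn this memcpy
       into a single 64-bit store. */
    for (; i + sizeof zero <= n; i += sizeof zero)
        memcpy(b + i, &zero, sizeof zero);
    /* Tail: the last n % 8 bytes (4 bytes when n == 500). */
    for (; i < n; i++)
        b[i] = 0;
}
```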

EDIT:

In fact, this is what memset does in glibc:

http://repo.or.cz/w/glibc.git/blob/HEAD:/string/memset.c

As Michael pointed out, in certain cases (where the array length is known at compile time), the C compiler can inline memset, getting rid of the function-call overhead. Glibc also has assembly-optimized versions of memset for most major platforms, such as amd64:

http://repo.or.cz/w/glibc.git/blob/HEAD:/sysdeps/x86_64/memset.S

Answer by Stephen Canon

Good compilers will recognize the for loop and replace it with either an optimal inline sequence or a call to memset. They will also replace memset with an optimal inline sequence when the buffer size is small.

In practice, with an optimizing compiler the generated code (and therefore performance) will be identical.

Answer by beetree

I agree with the above: it depends. But memset is certainly at least as fast as the for loop. If you are uncertain of your environment or too lazy to test, take the safe route and use memset.

Answer by puchu

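#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Fill `length` items of `size_of_item` bytes each with the pattern at `value`,
// using memset whenever the pattern is a single repeated byte.
void fill_array(void* array, size_t size_of_item, size_t length, void* value) {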
  uint8_t* bytes      = value;
  uint8_t  first_byte = bytes[0];

  if (size_of_item == 1) {
    memset(array, first_byte, length);
    return;
  }

  // size_of_item > 1 here.
  bool all_bytes_are_identical = true;

  for (size_t byte_index = 1; byte_index < size_of_item; byte_index++) {
    if (bytes[byte_index] != first_byte) {
      all_bytes_are_identical = false;
      break;
    }
  }

  if (all_bytes_are_identical) {
    memset(array, first_byte, size_of_item * length);
    return;
  }

  for (size_t index = 0; index < length; index++) {
    memcpy((uint8_t*)array + size_of_item * index, value, size_of_item);
  }
}

memset is more efficient, but it can only set every byte to the same value, so it cannot handle non-symmetric values (where all_bytes_are_identical is false). So you need a way to wrap it.

This is my variant. It works on both little-endian and big-endian systems.