Disclaimer: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/9146395/

Time: 2020-08-27 12:22:57  Source: igfitidea

Reset C int array to zero : the fastest way?

Tags: c++, c, arrays, memset

Asked by Vincent

Assuming that we have a T myarray[100] with T = int, unsigned int, long long int or unsigned long long int, what is the fastest way to reset all its content to zero (not only for initialization, but to reset the content several times in my program)? Maybe with memset?

Same question for a dynamic array like T *myarray = new T[100].

Answered by Matteo Italia

memset (from <string.h>) is probably the fastest standard way, since it's usually a routine written directly in assembly and optimized by hand.

memset(myarray, 0, sizeof(myarray)); // for automatically-allocated arrays
memset(myarray, 0, N*sizeof(*myarray)); // for heap-allocated arrays, where N is the number of elements


By the way, in C++ the idiomatic way would be to use std::fill (from <algorithm>):

std::fill(myarray, myarray+N, 0);

which may be optimized automatically into a memset; I'm quite sure that it will work as fast as memset for ints, while it may perform slightly worse for smaller types if the optimizer isn't smart enough. Still, when in doubt, profile.
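As a minimal sketch, the two calls can be placed side by side as small helpers. The names `zero_with_memset` and `zero_with_fill` are made up for illustration and are not part of any library; the element type is assumed to be plain int.

```cpp
#include <algorithm>  // std::fill
#include <cstddef>    // std::size_t
#include <cstring>    // std::memset

// Hypothetical helper: zero n ints with memset (byte-wise fill with 0).
inline void zero_with_memset(int* data, std::size_t n) {
    std::memset(data, 0, n * sizeof(*data));
}

// Hypothetical helper: zero n ints with std::fill (element-wise assignment).
inline void zero_with_fill(int* data, std::size_t n) {
    std::fill(data, data + n, 0);
}
```

Both helpers take a pointer and a length, so they work the same way for automatic arrays (pass `myarray, 100`) and for arrays obtained from `new`.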

Answered by Benjamin

This question, although rather old, needs some benchmarks, as it asks not for the most idiomatic way, or the way that can be written in the fewest lines, but for the fastest way. And it is silly to answer that question without some actual testing. So I compared four solutions: memset vs. std::fill vs. ZERO from AnT's answer vs. a solution I made using AVX intrinsics.

Note that this solution is not generic: it only works on data elements of 32 or 64 bits. Please comment if this code is doing something incorrect.

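#include <immintrin.h>
#include <stddef.h>
/* Zeroes n elements of a; elements must be 4 or 8 bytes wide. Requires AVX. */
#define intrin_ZERO(a,n) do{\
size_t x = 0;\
const size_t inc = 32 / sizeof(*(a));/*size of 256 bit register over size of variable*/\
for (;x + inc <= (n);x+=inc) /* no size_t underflow when n < inc */\
    _mm256_storeu_ps((float *)((a)+x),_mm256_setzero_ps());\
/* 0 to inc-1 elements remain at this point */\
if(4 == sizeof(*(a))){\
    switch((n)-x){\
    case 7:\
        (a)[x] = 0;x++; /* fall through */\
    case 6:\
        (a)[x] = 0;x++; /* fall through */\
    case 5:\
        (a)[x] = 0;x++; /* fall through */\
    case 4: /* four 4-byte elements fill one 128-bit store */\
        _mm_storeu_ps((float *)((a)+x),_mm_setzero_ps());break;\
    case 3:\
        (a)[x] = 0;x++; /* fall through */\
    case 2:\
        (a)[x] = 0;(a)[x+1] = 0;break;\
    case 1:\
        (a)[x] = 0;\
        break;\
    case 0:\
        break;\
    }\
}\
else if(8 == sizeof(*(a))){\
    switch((n)-x){\
    case 3:\
        (a)[x] = 0;x++; /* fall through */\
    case 2: /* two 8-byte elements fill one 128-bit store */\
        _mm_storeu_ps((float *)((a)+x),_mm_setzero_ps());break;\
    case 1:\
        (a)[x] = 0;\
        break;\
    case 0:\
        break;\
    }\
}\
}while(0)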

I will not claim that this is the fastest method, since I am not a low-level optimization expert. Rather, it is an example of a correct architecture-dependent implementation that is faster than memset.

Now, onto the results. I measured performance for size-100 int and long long arrays, both statically and dynamically allocated. With the exception of msvc, which performed dead code elimination on the static arrays, the results were extremely comparable, so I will show only dynamic array performance. Timings are in ms for 1 million iterations, measured with time.h's low-precision clock function.
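The answer does not include its benchmark harness, so the following is only a rough sketch of how such a measurement could be set up with `clock`. The names `time_reset_ms` and `reset_with_memset`, and the trick of dirtying one element per pass, are assumptions made for illustration, not the author's code.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memset
#include <ctime>    // std::clock, CLOCKS_PER_SEC

// Hypothetical reset routine under test.
inline void reset_with_memset(int* data, std::size_t n) {
    std::memset(data, 0, n * sizeof(*data));
}

// Hypothetical timing helper: calls reset(data, n) `iterations` times and
// returns the elapsed CPU time in milliseconds. Assumes n > 0.
template <typename Reset>
double time_reset_ms(Reset reset, int* data, std::size_t n, std::size_t iterations) {
    volatile int sink = 0;  // reading results back discourages the compiler from dropping the stores
    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < iterations; ++i) {
        data[i % n] = 1;    // dirty one element so each reset has something to clear
        reset(data, n);
        sink = sink + data[i % n];
    }
    const std::clock_t stop = std::clock();
    return 1000.0 * static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

A call such as `time_reset_ms(reset_with_memset, d, 100, 1000000)` on a heap array `d` would correspond to one cell of the tables below, although absolute numbers will differ by machine, compiler, and flags.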

clang 3.8 (Using the clang-cl frontend, optimization flags= /OX /arch:AVX /Oi /Ot)

int:
memset:      99
fill:        97
ZERO:        98
intrin_ZERO: 90

long long:
memset:      285
fill:        286
ZERO:        285
intrin_ZERO: 188

gcc 5.1.0 (optimization flags: -O3 -march=native -mtune=native -mavx):

int:
memset:      268
fill:        268
ZERO:        268
intrin_ZERO: 91

long long:
memset:      402
fill:        399
ZERO:        400
intrin_ZERO: 185

msvc 2015 (optimization flags: /OX /arch:AVX /Oi /Ot):

int:
memset:      196
fill:        613
ZERO:        221
intrin_ZERO: 95

long long:
memset:      273
fill:        559
ZERO:        376
intrin_ZERO: 188

There is a lot of interesting stuff going on here: llvm killing gcc, and MSVC's typical spotty optimizations (it does an impressive dead code elimination on static arrays and then has awful performance for fill). Although my implementation is significantly faster, this may only be because it recognizes that bit clearing has much less overhead than any other setting operation.

Clang's implementation merits a closer look, as it is significantly faster. Some additional testing shows that its memset is in fact specialized for zero: nonzero memsets for a 400-byte array are much slower (~220ms) and are comparable to gcc's. However, with an 800-byte array, nonzero memsetting makes no speed difference compared with zeroing, which is probably why, in that case, their memset performs worse than my implementation: the specialization is only for small arrays, and the cutoff is right around 800 bytes. Also note that gcc's 'fill' and 'ZERO' are not optimized into memset (judging by the generated code); gcc is simply generating code with identical performance characteristics.

Conclusion: memset is not really as well optimized for this task as people pretend it is (otherwise gcc's, msvc's and llvm's memset would have the same performance). If performance matters, then memset should not be the final solution, especially for these awkward medium-sized arrays, because it is not specialized for bit clearing, and it is not hand-optimized any better than the compiler can do on its own.

Answered by Alex Reynolds

From memset():

memset(myarray, 0, sizeof(myarray));

You can use sizeof(myarray) if the size of myarray is known at compile-time. Otherwise, if you are using a dynamically-sized array, such as one obtained via malloc or new, you will need to keep track of the length.
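One way to keep track of the length is to carry it next to the pointer, because `sizeof` applied to a pointer reports the size of the pointer itself rather than of the allocation. The struct and function names below (`IntBuffer`, `make_buffer`, `reset`, `destroy`) are hypothetical and serve only as a sketch.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memset

// Hypothetical wrapper pairing a heap array with its element count.
struct IntBuffer {
    int* data;
    std::size_t length;
};

// Allocates `length` ints; the contents are left uninitialized.
inline IntBuffer make_buffer(std::size_t length) {
    return IntBuffer{new int[length], length};
}

// Zeroes the whole buffer using the stored length, not sizeof(buf.data).
inline void reset(IntBuffer& buf) {
    std::memset(buf.data, 0, buf.length * sizeof(*buf.data));
}

// Releases the allocation and clears the bookkeeping.
inline void destroy(IntBuffer& buf) {
    delete[] buf.data;
    buf.data = nullptr;
    buf.length = 0;
}
```

In real C++ code a `std::vector<int>` already does this bookkeeping; the explicit struct only makes the point about the length visible.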

Answered by AnT

You can use memset, but only because our selection of types is restricted to integral types.

In the general case in C, it makes sense to implement a macro

#define ZERO_ANY(T, a, n) do{\
   T *a_ = (a);\
   size_t n_ = (n);\
   for (; n_ > 0; --n_, ++a_)\
     *a_ = (T) { 0 };\
} while (0)

This will give you C++-like functionality that lets you "reset to zeros" an array of objects of any type without having to resort to hacks like memset. Basically, this is a C analog of a C++ function template, except that you have to specify the type argument explicitly.

On top of that you can build a "template" for non-decayed arrays

#define ARRAY_SIZE(a) (sizeof (a) / sizeof *(a))
#define ZERO_ANY_A(T, a) ZERO_ANY(T, (a), ARRAY_SIZE(a))

In your example it would be applied as

int a[100];

ZERO_ANY(int, a, 100);
// or
ZERO_ANY_A(int, a);
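For comparison, the C++ function template that these macros imitate might look roughly like the sketch below. The name `zero_any` is made up here; the point is only that the type argument is deduced instead of being spelled out, and that the array overload plays the role of ARRAY_SIZE.

```cpp
#include <algorithm>  // std::fill
#include <cstddef>    // std::size_t

// Hypothetical C++ counterpart of ZERO_ANY: T is deduced, and every element
// is assigned a value-initialized T (0, 0.0, a null pointer, ...).
template <typename T>
void zero_any(T* a, std::size_t n) {
    std::fill(a, a + n, T());
}

// Overload for non-decayed arrays: the element count N is deduced as well.
template <typename T, std::size_t N>
void zero_any(T (&a)[N]) {
    zero_any(a, N);
}
```

With `int a[100];` this is called as `zero_any(a, 100);` or simply `zero_any(a);`.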

It is also worth noting that specifically for objects of scalar types one can implement a type-independent macro

#define ZERO(a, n) do{\
   size_t i_ = 0, n_ = (n);\
   for (; i_ < n_; ++i_)\
     (a)[i_] = 0;\
} while (0)

and

#define ZERO_A(a) ZERO((a), ARRAY_SIZE(a))

turning the above example into

 int a[100];

 ZERO(a, 100);
 // or
 ZERO_A(a);

Answered by Bruno Soares

For static declaration I think you could use:

T myarray[100] = {0};

For dynamic declaration I suggest the same way: memset
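Note that `= {0}` only applies at the point of declaration. As a small sketch, C++ offers a matching form for `new`, where empty parentheses value-initialize every element; later resets still need memset or std::fill. The function names below are invented for the example.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical helper: `new int[n]()` value-initializes, so every element starts at 0.
// The caller owns the result and must release it with delete[].
inline int* new_zeroed_ints(std::size_t n) {
    return new int[n]();
}

// Sums a freshly declared array to show that `= {0}` zeroes all 100 elements:
// the first is set explicitly and the rest are zero-initialized.
inline int sum_of_fresh_array() {
    int myarray[100] = {0};
    int sum = 0;
    for (int value : myarray) sum += value;
    return sum;
}
```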

Answered by Navin

zero(myarray); is all you need in C++.

Just add this to a header:

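#include <cstddef>  // std::size_t
#include <cstring>  // std::memset

// Zeroes a true (non-decayed) array; SIZE is deduced from the argument.
template<typename T, std::size_t SIZE> inline void zero(T(&arr)[SIZE]){
    std::memset(arr, 0, SIZE*sizeof(T));
}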
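Since memset writes raw bytes, a template like this will happily accept element types for which that is not a sensible way to assign zero. A possible guard, sketched here under the assumption that only integer element types are wanted (as in the question), is a `static_assert`; the name `zero_checked` is hypothetical.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memset
#include <type_traits>  // std::is_integral

// Hypothetical guarded variant: compiles only for arrays of integer types.
template <typename T, std::size_t SIZE>
inline void zero_checked(T (&arr)[SIZE]) {
    static_assert(std::is_integral<T>::value,
                  "zero_checked is restricted to integer element types");
    std::memset(arr, 0, SIZE * sizeof(T));
}
```

Passing an array of class objects then fails at compile time instead of silently overwriting them.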

Answered by Shital Shah

Here's the function I use:

#include <algorithm>  // std::fill
#include <stddef.h>   // size_t

template<typename T>
static void setValue(T arr[], size_t length, const T& val)
{
    std::fill(arr, arr + length, val);
}

template<typename T, size_t N>
static void setValue(T (&arr)[N], const T& val)
{
    std::fill(arr, arr + N, val);
}

You can call it like this:

//fixed arrays
int a[10];
setValue(a, 0);

//dynamic arrays
int *d = new int[length];
setValue(d, length, 0);

The above is a more C++11-style approach than using memset. You also get a compile-time error if you pass a dynamic array without specifying the size.
