Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1373369/

Date: 2020-08-27 19:45:03  Source: igfitidea

Which is faster/preferred: memset or for loop to zero out an array of doubles?

Tags: c++, c, performance, optimization

Asked by vehomzzz

double d[10];
int length = 10;

memset(d, length * sizeof(double), 0);

//or

for (int i = length; i--;)
  d[i] = 0.0;

Answered by sharptooth

If you really care, you should try it and measure. However, the most portable way is to use std::fill():

std::fill( array, array + numberOfElements, 0.0 );
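As a minimal sketch of that call on a plain array of doubles (the helper name zero_with_fill is hypothetical and not part of the original answer):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: set `count` doubles starting at `data` to 0.0.
// std::fill assigns the value 0.0 element by element, so it does not
// depend on how the platform represents a double in memory.
void zero_with_fill(double* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    std::fill(data, data + count, 0.0);
}
```

For the array in the question this would be called as zero_with_fill(d, length).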

Answered by codymanix

Note that for memset you have to pass the number of bytes, not the number of elements, because this is an old C function:

memset(d, 0, sizeof(double)*length);

memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.

But for type safety and more readable code I would recommend std::fill(): it is the C++ way of doing things. Consider memset only if a performance optimization is needed at this place in the code.
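One way to avoid mixing up bytes and elements is to let the compiler compute the byte count from the array type. The sketch below assumes a fixed-size array and a platform where all-zero bits mean 0.0; zero_array is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero a fixed-size array of doubles with memset.
// Taking the array by reference keeps its size N in the type, so
// sizeof arr is the size of the whole array in bytes.
template <std::size_t N>
void zero_array(double (&arr)[N])
{
    static_assert(sizeof arr == N * sizeof(double),
                  "sizeof on an array reference yields the full byte count");
    // Argument order: destination, fill byte, number of bytes.
    std::memset(arr, 0, sizeof arr);
}
```

This only works for true arrays. Once the array has decayed to a pointer, the element count has to be passed along and multiplied by sizeof(double), as in the call above.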

Answered by fortran

Try this, if only to be cool xD

{
    double *to = d;
    int n=(length+7)/8;
    switch(length%8){
        case 0: do{ *to++ = 0.0;
        case 7:     *to++ = 0.0;
        case 6:     *to++ = 0.0;
        case 5:     *to++ = 0.0;
        case 4:     *to++ = 0.0;
        case 3:     *to++ = 0.0;
        case 2:     *to++ = 0.0;
        case 1:     *to++ = 0.0;
        }while(--n>0);
    }
}
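The snippet above is a variant of Duff's device: the switch jumps into the middle of an unrolled loop to handle the remainder first. A self-contained sketch of the same idea follows; the function name duff_zero and the guard for an empty range are additions for illustration, since as written the fragment would still perform eight stores when length is 0.

```cpp
#include <cassert>

// Hypothetical wrapper around the unrolled loop: zero `length` doubles
// starting at `to`, eight stores per full pass.
void duff_zero(double* to, int length)
{
    if (length <= 0)
        return;  // guard: case 0 below always runs one full pass of 8 stores
    assert(to != nullptr);

    int n = (length + 7) / 8;  // number of passes, counting the partial one
    switch (length % 8) {      // jump into the loop body to handle the remainder
        case 0: do { *to++ = 0.0;
        case 7:      *to++ = 0.0;
        case 6:      *to++ = 0.0;
        case 5:      *to++ = 0.0;
        case 4:      *to++ = 0.0;
        case 3:      *to++ = 0.0;
        case 2:      *to++ = 0.0;
        case 1:      *to++ = 0.0;
                } while (--n > 0);
    }
}
```

Modern compilers usually unroll or vectorize a plain loop on their own, so this is best read as a curiosity rather than a recommendation.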

Answered by fortran

In addition to the several bugs and omissions in your code, using memset is not portable: the C and C++ standards do not guarantee that a double with all zero bits is equal to 0.0. First make your code correct, then worry about optimizing.
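Whether the assumption holds on a given platform can be checked directly. The following sketch inspects the object representation of a double; all three names are hypothetical helpers, not anything from the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

// True if every byte of the object representation of `value` is zero.
bool has_all_zero_bytes(double value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (bytes[i] != 0)
            return false;
    }
    return true;
}

// True if a double whose bytes were all set to zero compares equal to 0.0.
bool zero_bytes_mean_zero()
{
    double d;
    std::memset(&d, 0, sizeof d);
    return d == 0.0;
}

// Compile-time indicator: true when double follows IEEE-754 (IEC 60559).
constexpr bool kDoubleIsIeee754 = std::numeric_limits<double>::is_iec559;
```

On an IEEE-754 platform, has_all_zero_bytes(0.0) is true while has_all_zero_bytes(-0.0) is false, even though -0.0 == 0.0 compares equal: memset produces positive zero specifically.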

Answered by MSalters

Assuming the loop length is an integral constant expression, the most probable outcome is that a good optimizer will recognize both the for loop and the memset call. The generated assembly would then be essentially the same. Perhaps the choice of registers or the setup could differ, but the marginal cost per double should really be the same.

Answered by Michael Krelin - hacker

memset(d,0,10*sizeof(*d));

is likely to be faster. As others have said, you can also use

std::fill_n(d,10,0.);

but that is most likely just a prettier way to write the loop.

Answered by user57368

calloc(length, sizeof(double))

According to IEEE-754, the bit representation of positive zero is all zero bits, and there's nothing wrong with requiring IEEE-754 compliance. (If you need to zero out the array in order to reuse it, pick one of the solutions above.)
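Requiring IEEE-754 can be made explicit at compile time. The sketch below pairs such a check with calloc; alloc_zeroed_doubles is a hypothetical helper name, and the precondition on length is an added assumption because the result of a zero-size calloc is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>

// Refuse to compile where double is not IEEE-754 (IEC 60559), because the
// code below relies on all-zero bits meaning +0.0.
static_assert(std::numeric_limits<double>::is_iec559,
              "zero-filled memory is assumed to hold 0.0");

// Hypothetical helper: returns `length` zeroed doubles, or nullptr if the
// allocation fails. The caller releases the memory with std::free().
double* alloc_zeroed_doubles(std::size_t length)
{
    assert(length > 0);
    return static_cast<double*>(std::calloc(length, sizeof(double)));
}
```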

Answered by Omnifarious

According to this Wikipedia article on IEEE 754-1985 64-bit floating point, a bit pattern of all 0s will indeed properly initialize a double to 0.0. Unfortunately, your memset code doesn't do that.

Here is the code you ought to be using:

memset(d, 0, length * sizeof(double));

As part of a more complete package...

{
    double *d;
    int length = 10;
    d = malloc(sizeof(d[0]) * length);
    memset(d, 0, length * sizeof(d[0]));
}

Of course, that drops the error checking you should be doing on the return value of malloc. sizeof(d[0]) is slightly better than sizeof(double) because it's robust against changes in the type of d.

Also, if you use calloc(length, sizeof(d[0])), it will clear the memory for you and the subsequent memset will no longer be necessary. I didn't use it in the example because then it would seem like your question wasn't answered.
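A sketch of the malloc-plus-memset package with the missing error check filled in might look like this. It is written as C++ (hence the static_cast, which plain C does not need), and malloc_zeroed is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocate `length` doubles and zero them.
// Returns nullptr if the allocation fails; release with std::free().
double* malloc_zeroed(std::size_t length)
{
    assert(length > 0);
    // sizeof(d[0]) follows the element type if the type of d ever changes.
    double* d = static_cast<double*>(std::malloc(sizeof(d[0]) * length));
    if (d == nullptr)
        return nullptr;  // report the failure instead of writing through a null pointer
    std::memset(d, 0, length * sizeof(d[0]));
    return d;
}
```

The caller still has to test the returned pointer before using it.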

Answered by frast

The example will not work because you have to allocate memory for your array. You can do this on the stack or on the heap.

Here is an example that does it on the stack:

double d[50] = {0.0};

No memset is needed after that, because the elements not listed in the initializer are set to zero as well.
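The initializer rule can be confirmed with a short check. In this sketch the function name and the extra arrays b, c and e are illustrative additions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns true if brace initialization zeroed every element that was not
// given an explicit value.
bool aggregate_init_zeroes_everything()
{
    double a[50] = {0.0};        // first element explicit, the other 49 become 0.0
    double b[50] = {};           // C++: empty braces zero every element
    std::array<double, 50> c{};  // the same rule applies to std::array
    double e[4] = {1.0};         // only e[0] is 1.0; e[1] to e[3] are 0.0

    for (std::size_t i = 0; i < 50; ++i) {
        if (a[i] != 0.0 || b[i] != 0.0 || c[i] != 0.0)
            return false;
    }
    return e[0] == 1.0 && e[1] == 0.0 && e[2] == 0.0 && e[3] == 0.0;
}
```

Note the last case: the value in the braces applies only to the first element, so {1.0} does not fill the array with ones.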

Answered by metamorphosis

memset will always be faster in debug mode or at a low optimization level. At higher optimization levels it is still at least equivalent to std::fill or std::fill_n. For example, take the following code under Google Benchmark (test setup: Xubuntu 18, GCC 7.3, Clang 6.0):

#include <algorithm>
#include <cstdio>    // printf, getchar
#include <cstring>   // std::memset
#include <iterator>  // std::begin, std::end
#include <benchmark/benchmark.h>

double total = 0;


static void memory_memset(benchmark::State& state)
{
    int ints[50000];

    for (auto _ : state)
    {
        std::memset(ints, 0, sizeof(int) * 50000);
    }

    for (int counter = 0; counter != 50000; ++counter)
    {
        total += ints[counter];
    }
}


static void memory_filln(benchmark::State& state)
{
    int ints[50000];

    for (auto _ : state)
    {
        std::fill_n(ints, 50000, 0);
    }

    for (int counter = 0; counter != 50000; ++counter)
    {
        total += ints[counter];
    }
}


static void memory_fill(benchmark::State& state)
{
    int ints[50000];

    for (auto _ : state)
    {
        std::fill(std::begin(ints), std::end(ints), 0);
    }

    for (int counter = 0; counter != 50000; ++counter)
    {
        total += ints[counter];
    }
}


// Register the function as a benchmark
BENCHMARK(memory_filln);
BENCHMARK(memory_fill);
BENCHMARK(memory_memset);



int main (int argc, char ** argv)
{
    benchmark::Initialize (&argc, argv);
    benchmark::RunSpecifiedBenchmarks ();
    printf("Total = %f\n", total);
    getchar();
    return 0;
}

This gives the following results in release mode with GCC (-O2 -march=native):

-----------------------------------------------------
Benchmark              Time           CPU Iterations
-----------------------------------------------------
memory_filln       16488 ns      16477 ns      42460
memory_fill        16493 ns      16493 ns      42440
memory_memset       8414 ns       8408 ns      83022

And the following results in debug mode (-O0):

-----------------------------------------------------
Benchmark              Time           CPU Iterations
-----------------------------------------------------
memory_filln       87209 ns      87139 ns       8029
memory_fill        94593 ns      94533 ns       7411
memory_memset       8441 ns       8434 ns      82833

At -O3, or with Clang at -O2, the results are:

-----------------------------------------------------
Benchmark              Time           CPU Iterations
-----------------------------------------------------
memory_filln        8437 ns       8437 ns      82799
memory_fill         8437 ns       8437 ns      82756
memory_memset       8436 ns       8436 ns      82754

TLDR: use memset unless you are told you absolutely have to use std::fill or a for loop, at least for POD types other than non-IEEE-754 floating-point types. There are no strong reasons not to.

(Note: the for loops that sum the array contents are needed so that Clang does not optimize away the Google Benchmark loops entirely; otherwise it detects that the arrays are never used.)