Linux: Why does malloc initialize the values to 0 in gcc?

Question

Asked by SHH

Maybe it differs from platform to platform, but when I compile with gcc and run the code below, I get 0 every time on my Ubuntu 11.10.

#include <stdio.h>
#include <stdlib.h>

int main()
{
    double *a = (double*) malloc(sizeof(double)*100);
    printf("%f", *a);
}

Why does malloc behave like this even though there is calloc?

Doesn't it mean there is an unwanted performance overhead just to initialize the values to 0, even in cases where you don't want that?

EDIT: Oh, my previous example was not initializing anything; it just happened to use a "fresh" block.

What I was really asking is why the memory is initialized when a large block is allocated:

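#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *a = (int*) malloc(sizeof(int)*200000);
    a[10] = 3;
    printf("%d\n", *(a+10));

    free(a);

    /* Second large block (same byte count as the original test); it is
       deliberately left uninitialized and read only to observe its contents. */
    a = (int*) malloc(sizeof(double)*200000);
    printf("%d\n", *(a+10));

    free(a);
    return 0;
}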

OUTPUT: 3
        0 (initialized)

But thanks for pointing out that there is a SECURITY reason behind this! (I never thought about it.) Of course the memory has to be zeroed when malloc hands out a fresh block or a large block.

Answer 1

Accepted answer by Mysticial

Short Answer:

It doesn't. It just happens to be zero in your case.
(Also, your test case doesn't show that the data is zero. It only shows whether one element is zero.)

Long Answer:

When you call malloc(), one of two things will happen:

1. It recycles memory that was previously allocated and freed by the same process.
2. It requests new page(s) from the operating system.

In the first case, the memory will contain data left over from previous allocations, so it generally won't be zero. This is the usual case when performing small allocations.

In the second case, the memory will come from the OS. This happens when the allocator has no free memory left to recycle, or when you request a very large allocation (as is the case in your example).

Here's the catch: memory coming from the OS will be zeroed for security reasons.*

When the OS gives you memory, it could have been freed by a different process, so it could contain sensitive information such as a password. To prevent you from reading such data, the OS zeroes the memory before handing it to you.

*Note that the C standard says nothing about this. It is strictly OS behavior, so this zeroing may or may not be present on systems where security is not a concern.

To give more of a performance background to this:

As @R. mentions in the comments, this zeroing is why you should always use calloc() instead of malloc() + memset(). calloc() can take advantage of the fact that fresh memory from the OS is already zero and skip a separate memset().

On the other hand, this zeroing is sometimes a performance bottleneck. In some numerical applications (such as an out-of-place FFT), you need to allocate a huge chunk of scratch memory, use it to run the algorithm, and then free it.

In these cases, the zeroing is unnecessary and amounts to pure overhead.

The most extreme example I've seen is a 20-second zeroing overhead for a 70-second operation with a 48 GB scratch buffer (roughly 30% overhead). Granted, the machine was short on memory bandwidth.

The obvious solution is to simply reuse the memory manually. But that often requires breaking through established interfaces (especially if the allocation happens inside a library routine).

Answer 2

Answered by SHH

Do you know that it is definitely being initialised? Is it possible that the area returned by malloc() just frequently has 0 at the beginning?

Answer 3

Answered by SHH

The standard does not dictate that malloc() should initialize the values to zero. It just so happens that on your platform the memory might have been set to zero, or it might simply have been zero at the moment you read that value.

Answer 4

Answered by TonyK

Your code doesn't demonstrate that malloc initialises its memory to 0. That could be done by the operating system, before the program starts. To see which is the case, write a different value to the memory, free it, and call malloc again. You will probably get the same address, but you will have to check this. If so, you can look to see what it contains. Let us know!

Answer 5

Answered by hugomg

The OS will usually clear fresh memory pages before it hands them to your process, so that your process can't look at an older process' data. This means that the first time you use a variable (or malloc something) it will often be zero, but if you ever reuse that memory (by freeing it and malloc-ing again, for instance) then all bets are off.

This inconsistency is precisely why uninitialized variables are such a hard-to-find bug.

As for the unwanted performance overhead, avoiding unspecified behaviour is probably more important. Whatever small performance boost you could gain in this case won't compensate for the hard-to-find bugs you will have to deal with if someone slightly modifies the code (breaking previous assumptions) or ports it to another system (where the assumptions might have been invalid in the first place).

Answer 6

Answered by Dan Aloni

Why do you assume that malloc() initializes to zero? It just so happens that the first call to malloc() results in a call to the sbrk or mmap system calls, which allocate a page of memory from the OS. The OS is obliged to provide zero-initialized memory for security reasons (otherwise, data from other processes would become visible!). So you might think the OS wastes time zeroing the page. But no! In Linux, there is a special system-wide singleton page called the 'zero page', and that page gets mapped as Copy-On-Write, which means that only when you actually write to that page will the OS allocate another page and initialize it. So I hope this answers your question regarding performance. The memory paging model allows memory usage to be sort-of lazy by supporting multiple mappings of the same page, plus the ability to handle the case when the first write occurs.

If you call free(), the glibc allocator will return the region to its free lists, and when malloc() is called again, you might get that same region, but dirty with the previous data. Eventually, free() might return the memory to the OS by making system calls again.

Notice that the glibc man page on malloc() explicitly says that the memory is not cleared, so by the "contract" of the API, you cannot assume that it does get cleared. Here's the original excerpt:

malloc() allocates size bytes and returns a pointer to the allocated memory.
The memory is not cleared. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().

If you are worried about performance or other side effects, you can read more of that documentation.

Answer 7

Answered by FlyingGuy

Never ever count on any compiler to generate code that will initialize memory to anything. malloc simply returns a pointer to n bytes of memory someplace; hell, it might even be in swap.

If the contents of the memory are critical, initialize them yourself.

Answer 8

Answered by TomaszK

From gnu.org:

Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation.

Answer 9

Answered by Praetorian

I modified your example to contain 2 identical allocations. Now it is easy to see that malloc doesn't zero-initialize memory.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    {
      double *a = malloc(sizeof(double)*100);
      *a = 100;
      printf("%f\n", *a);
      free(a);
    }
    {
      double *a = malloc(sizeof(double)*100);
      printf("%f\n", *a);
      free(a);
    }

    return 0;
}

Output with gcc 4.3.4

100.000000
100.000000
