Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2987207/

Time: 2020-09-02 05:36:45  Source: igfitidea

Why do I get a C malloc assertion failure?

Tags: c, gcc, malloc, assertion

Asked by Chris

I am implementing a divide-and-conquer polynomial algorithm so I can benchmark it against an OpenCL implementation, but I can't get malloc to work. When I run the program, it allocates a bunch of stuff, checks some things, then sends size/2 to the algorithm. Then, when I hit the malloc line again, it spits out this:

malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed. Aborted

The line in question is:

int *mult(int size, int *a, int *b) {
    int *out,i, j, *tmp1, *tmp2, *tmp3, *tmpa1, *tmpa2, *tmpb1, *tmpb2,d, *res1, *res2;
    fprintf(stdout, "size: %d\n", size);

    out = (int *)malloc(sizeof(int) * size * 2);
}

I checked size with an fprintf, and it is a positive integer (usually 50 at that point). I tried calling malloc with a plain number as well and I still get the error. I'm stumped as to what's going on, and nothing I have found on Google so far has been helpful.

Any ideas what's going on? I'm trying to figure out how to compile a newer GCC in case it's a compiler error, but I really doubt it.

Answered by R Samuel Klatchko

99.9% likely that you have corrupted memory (over- or under-flowed a buffer, wrote to a pointer after it was freed, called free twice on the same pointer, etc.)

Run your code under Valgrind to see where your program did something incorrect.

Answered by Jon Gjengset

To give you a better understanding of why this happens, I'd like to expand upon @r-samuel-klatchko's answer a bit.

When you call malloc, what is really happening is a bit more complicated than just giving you a chunk of memory to play with. Under the hood, malloc also keeps some housekeeping information about the memory it has given you (most importantly, its size), so that when you call free, it knows things like how much memory to free. This information is commonly kept right before the memory location returned to you by malloc. More exhaustive information can be found on the internet, but the (very) basic idea is something like this:

+------+-------------------------------------------------+
+ size |                  malloc'd memory                +
+------+-------------------------------------------------+
       ^-- location in pointer returned by malloc

Building on this (and simplifying things greatly), when you call malloc, it needs to get a pointer to the next part of memory that is available. One very simple way of doing this is to look at the previous bit of memory it gave away, and move size bytes further down (or up) in memory. With this implementation, you end up with your memory looking something like this after allocating p1, p2 and p3:

+------+----------------+------+--------------------+------+----------+
+ size |                | size |                    | size |          +
+------+----------------+------+--------------------+------+----------+
       ^- p1                   ^- p2                       ^- p3
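
The layout in the diagrams can be sketched as a toy "bump" allocator. This is only an illustration under loud assumptions: the names toy_malloc and toy_size_of, the fixed 1024-byte arena, and the absence of free() are all hypothetical, and glibc's real malloc works very differently. Each block is preceded by a size header, and the next block starts right after the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy allocator, NOT glibc's design: one static arena,
   a size_t header in front of every block, no free(), no reuse.
   Payloads are only meant for byte data in this sketch. */
#define TOY_ARENA_SIZE 1024

static unsigned char toy_arena[TOY_ARENA_SIZE];
static size_t toy_used = 0;   /* bytes handed out so far, headers included */

/* Reserve a header plus `size` payload bytes; return the payload address. */
void *toy_malloc(size_t size)
{
    assert(toy_used <= TOY_ARENA_SIZE);
    size_t room = TOY_ARENA_SIZE - toy_used;
    if (room < sizeof(size_t) || size > room - sizeof(size_t))
        return NULL;                       /* arena exhausted */

    /* The header sits immediately before the pointer we return. */
    memcpy(toy_arena + toy_used, &size, sizeof size);
    void *p = toy_arena + toy_used + sizeof(size_t);
    toy_used += sizeof(size_t) + size;     /* next block starts here */
    return p;
}

/* Read back the size header stored just before a toy_malloc'd pointer. */
size_t toy_size_of(const void *p)
{
    size_t size;
    memcpy(&size, (const unsigned char *)p - sizeof size, sizeof size);
    return size;
}
```

With this sketch, three consecutive calls return p1, p2 and p3 laid out as in the second diagram: p2 is exactly p1 plus p1's size plus one header.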

So, what is causing your error?

Well, imagine that your code erroneously writes past the amount of memory you've allocated (either because you allocated less than you needed, as was your problem, or because you're using the wrong boundary conditions somewhere in your code). Say your code writes so much data to p2 that it starts overwriting what is in p3's size field. When you next call malloc, it will look at the last memory location it returned, look at its size field, move to p3 + size and then start allocating memory from there. Since your code has overwritten size, however, this memory location is no longer right after the previously allocated memory.

Needless to say, this can wreak havoc! The implementors of malloc have therefore put in a number of "assertions", or checks, that do a bunch of sanity checking to catch this (and other issues) if they are about to happen. In your particular case, these assertions are violated, and thus malloc aborts, telling you that your code was about to do something it really shouldn't be doing.
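
The kind of sanity check described above can be sketched on a toy heap. Everything here is an assumption made for illustration (the chk_ names, the 256-byte arena, the header format); glibc's actual checks are the ones quoted in the assertion message, not these. The idea is that walking the size headers from the start of the arena must land exactly on the current top, and a clobbered header breaks that walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy heap used only to illustrate a consistency check. */
#define CHK_ARENA_SIZE 256

static unsigned char chk_arena[CHK_ARENA_SIZE];
static size_t chk_top = 0;    /* offset of the first unused byte */

static size_t chk_get_size(size_t off)
{
    size_t s;
    memcpy(&s, chk_arena + off, sizeof s);
    return s;
}

/* Allocate `size` bytes preceded by a size header. */
void *chk_malloc(size_t size)
{
    assert(chk_top <= CHK_ARENA_SIZE);
    size_t room = CHK_ARENA_SIZE - chk_top;
    if (size == 0 || room < sizeof(size_t) || size > room - sizeof(size_t))
        return NULL;
    memcpy(chk_arena + chk_top, &size, sizeof size);
    void *p = chk_arena + chk_top + sizeof(size_t);
    chk_top += sizeof(size_t) + size;
    return p;
}

/* Walk the headers from the start of the arena. Returns 1 if every
   header is plausible and the walk ends exactly at chk_top, else 0. */
int chk_heap_is_consistent(void)
{
    size_t off = 0;
    while (off < chk_top) {
        if (chk_top - off < sizeof(size_t))
            return 0;                      /* no room for a header */
        size_t size = chk_get_size(off);
        if (size == 0 || size > chk_top - off - sizeof(size_t))
            return 0;                      /* header was clobbered */
        off += sizeof(size_t) + size;
    }
    return off == chk_top;
}
```

A real allocator would typically abort at this point, much like the failed assertion in the question. The sketch returns 0 instead so the effect can be observed: after writing 16 + sizeof(size_t) bytes into a 16-byte block that has a neighbour after it, chk_heap_is_consistent() reports the damage. The overrun stays inside chk_arena, so the demonstration itself is well-defined C.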

As previously stated, this is a gross oversimplification, but it is sufficient to illustrate the point. The glibc implementation of malloc is more than 5k lines, and there has been a substantial amount of research into how to build good dynamic memory allocation mechanisms, so covering it all in an SO answer is not possible. Hopefully this has given you a bit of a view of what is really causing the problem though!

Answered by iBug

My alternative solution to using Valgrind:

I'm very happy because I just helped my friend debug a program. His program had this exact problem (malloc() causing an abort), with the same error message in GDB.

I compiled his program using Address Sanitizer with

gcc -Wall -g3 -fsanitize=address -o new new.c
              ^^^^^^^^^^^^^^^^^^

And then ran gdb new. When the program gets terminated by the SIGABRT raised in a subsequent malloc(), a whole lot of useful information is printed:

=================================================================
==407==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060000000b4 at pc 0x7ffffe49ed1a bp 0x7ffffffedc20 sp 0x7ffffffed3c8
WRITE of size 104 at 0x6060000000b4 thread T0
    #0 0x7ffffe49ed19  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5ed19)
    #1 0x8001dab in CreatHT2 /home/wsl/Desktop/hash/new.c:59
    #2 0x80031cf in main /home/wsl/Desktop/hash/new.c:209
    #3 0x7ffffe061b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #4 0x8001679 in _start (/mnt/d/Desktop/hash/new+0x1679)

0x6060000000b4 is located 0 bytes to the right of 52-byte region [0x606000000080,0x6060000000b4)
allocated by thread T0 here:
    #0 0x7ffffe51eb50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
    #1 0x8001d56 in CreatHT2 /home/wsl/Desktop/hash/new.c:55
    #2 0x80031cf in main /home/wsl/Desktop/hash/new.c:209
    #3 0x7ffffe061b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

Let's take a look at the output, especially the stack trace:

The first part says there's an invalid write operation at new.c:59. That line reads

memset(len,0,sizeof(int*)*p);
             ^^^^^^^^^^^^

The second part says the memory that the bad write happened on was allocated at new.c:55. That line reads

if(!(len=(int*)malloc(sizeof(int)*p))){
                      ^^^^^^^^^^^

That's it. It took me less than half a minute to locate the bug that had confused my friend for a few hours. He managed to locate the failure, but it was a subsequent malloc() call that failed, so he was unable to spot this error in the earlier code.
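
The mismatch is between sizeof(int) in the malloc() call and sizeof(int*) in the memset() call. The report is consistent with p = 13 on a platform with 4-byte int and 8-byte pointers: 52 bytes allocated, 104 bytes written. One way to avoid this class of bug is to derive both sizes from the pointer itself. The sketch below assumes a helper name, make_zeroed_ints, that is not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original program): allocate `count`
   ints and zero them, using the same size expression for both calls. */
int *make_zeroed_ints(size_t count)
{
    assert(count > 0);

    /* sizeof *len is the size of one element, whatever len points to,
       so malloc() and memset() cannot disagree about the element size. */
    int *len = malloc(count * sizeof *len);
    if (len == NULL)
        return NULL;

    memset(len, 0, count * sizeof *len);
    return len;
}
```

The caller owns the returned buffer and must free() it. For this particular pattern, calloc(count, sizeof *len) allocates and zeroes in a single call.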

To sum up: try the -fsanitize=address option of GCC or Clang. It can be very helpful when debugging memory issues.

Answered by Michael Grieswald

I got the following message, similar to yours:

    program: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

I had made a mistake in an earlier method call that used malloc. While adding a field to an unsigned char array, I updated the factor after the sizeof() operator and erroneously overwrote the multiplication sign '*' with a '+'.

Here is the code responsible for the error in my case:

    UCHAR* b=(UCHAR*)malloc(sizeof(UCHAR)+5);
    b[INTBITS]=(some calculation);
    b[BUFSPC]=(some calculation);
    b[BUFOVR]=(some calculation);
    b[BUFMEM]=(some calculation);
    b[MATCHBITS]=(some calculation);

In another method later, I used malloc again and it produced the error message shown above. The call was (simple enough):

    UCHAR* b=(UCHAR*)malloc(sizeof(UCHAR)*50);

I think that using the '+' sign in the first call, which led to a miscalculated size, combined with initializing the array immediately afterwards (overwriting memory that was not allocated to the array), brought some confusion to malloc's memory map. Therefore the second call went wrong.
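
Since sizeof(unsigned char) is 1 by definition, sizeof(UCHAR)+5 requests 6 bytes, so the writes shown would only overrun the buffer if at least one index constant is 6 or larger; the answer does not show those values. A sketch of one way to keep the allocation size and the indices in step follows. The enum values, FIELD_COUNT, make_fields and set_field are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char UCHAR;

/* Assumed index values: the answer does not show how these constants
   are defined. FIELD_COUNT always equals the number of fields, so
   adding a field before it updates the allocation size automatically. */
enum { INTBITS, BUFSPC, BUFOVR, BUFMEM, MATCHBITS, FIELD_COUNT };

/* Allocate one byte per field and zero them. */
UCHAR *make_fields(void)
{
    UCHAR *b = malloc(sizeof *b * FIELD_COUNT);   /* '*', not '+' */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < (size_t)FIELD_COUNT; i++)
        b[i] = 0;
    return b;
}

/* Store a value, refusing indices outside the allocated range. */
void set_field(UCHAR *b, size_t index, UCHAR value)
{
    assert(b != NULL);
    assert(index < (size_t)FIELD_COUNT);
    b[index] = value;
}
```

The bounds assertion in set_field turns a silent heap overrun into an immediate failure at the offending write, instead of a puzzling abort in a later malloc() call.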

Answered by pbernatchez

You are probably overrunning the allocated memory somewhere. The underlying software then doesn't pick up on it until you call malloc.

There may be a guard value clobbered that is being caught by malloc.

edit...added this for bounds checking help

http://www.lrde.epita.fr/~akim/ccmp/doc/bounds-checking.html

Answered by Phob

We got this error because we forgot to multiply by sizeof(int). Note that the argument to malloc() is a number of bytes, not a number of machine words or elements.
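
A minimal sketch of the point about bytes versus elements, assuming two hypothetical helpers, alloc_ints and fill_ints. The helper converts an element count into a byte count in one place and also rejects counts whose byte size would overflow size_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for `count` ints, or NULL if count is zero,
   the byte count would overflow size_t, or malloc() fails. */
int *alloc_ints(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return NULL;
    /* malloc() takes bytes, so scale the element count by sizeof(int). */
    return malloc(count * sizeof(int));
}

/* Hypothetical helper: set all `count` elements to `value`. */
void fill_ints(int *a, size_t count, int value)
{
    assert(a != NULL);
    for (size_t i = 0; i < count; i++)
        a[i] = value;
}
```

Calling malloc(50) instead of alloc_ints(50) would reserve 50 bytes, which holds only 12 ints when int is 4 bytes wide, so writing 50 elements would run past the end of the block.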

Answered by namila007

I got the same problem. I called malloc over and over again in a loop to add new char *string data. After releasing the allocated memory with free(), the problem was sorted.
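
The answer includes no code, so the following is only a guess at the general shape of such a loop, with hypothetical names copy_string and total_length. It shows each malloc() in the loop sized for the string plus its terminator and paired with exactly one free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of s; the caller must free() it. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;          /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Copy each string, use the copy, and release it before the next
   iteration. Returns the summed length of the copies made. */
size_t total_length(const char **items, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        char *copy = copy_string(items[i]);
        if (copy == NULL)
            break;                     /* out of memory: stop early */
        total += strlen(copy);
        free(copy);                    /* one free() per malloc() */
    }
    return total;
}
```

Freeing inside the loop fixes leaks, but it does not by itself repair an overrun. If the original loop allocated strlen(s) bytes without the extra byte for the terminator, that would be the more likely trigger for the assertion.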

Answered by JMH

I was porting an application from Visual C to gcc on Linux and I had the same problem, malloc.c:3096: sYSMALLOc: Assertion, using gcc on Ubuntu 11.

I moved the same code to a SUSE distribution (on another computer) and I don't have any problem there.

I suspect that the problem is not in our programs but in libc itself.