C++: Understanding "corrupted size vs. prev_size" glibc error

Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/49628615/

Date: 2020-08-28 15:44:27  Source: igfitidea

Tags: c++, malloc, free, jna, glibc

Question by Sheinbergon

I have implemented a JNA bridge to FDK-AAC. The source code can be found here.

When benchmarking my code, I get hundreds of successful runs on the same input, and then, occasionally, a C-level crash that kills the entire process and generates a core dump.

Looking at the core dump, the backtrace looks like this:

#1  0x00007f3e92e00f5d in __GI_abort () at abort.c:90
#2  0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
#4  0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
#5  0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
#6  0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
#7  0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
#8  0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395

This backtrace is reproducible if I run the benchmark enough times, but I'm having a hard time understanding what might cause such an error. The memory behind pointer 0x7f3de009df60 is also allocated inside the C/C++ code, and I can guarantee that the same instance that was allocated is the one being freed. The benchmark is, of course, single-threaded.

After reading these:

security checks and internal functions

I'm still having a hard time understanding: what might be a real scenario (not an exploit, but an ordinary bug) that causes the above error? And why does it happen so rarely?

Current suspicion:

Running a detailed backtrace, I get this output:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {4, 6378670679680, 645636045657660056, 90523359816, 139904561311072, 292199584, 139903730612120, 139903730611784, 139904561311088, 1460617926600, 47573685816, 4119199860131166208, 
            139904593745464, 139904553224483, 139904561311136, 288245657}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007f3e92e00f5d in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f3de026db10, sa_sigaction = 0x7f3de026db10}, sa_mask = {__val = {139903730540556, 19, 30064771092, 812522497172832284, 139903728706672, 1887866374039011357, 
              139900298780168, 3775732748407067896, 763430436865, 35180077121538, 4119199860131166208, 139904561311552, 139904553065676, 1, 139904561311584, 139904561312192}}, sa_flags = 4096, 
          sa_restorer = 0x14}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
        ap = {{gp_offset = 40, fp_offset = 32574, overflow_arg_area = 0x7f3e11adf1d0, reg_save_area = 0x7f3e11adf160}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
        buf = "00007f3de009e9f0"
        cp = <optimized out>
        ar_ptr = <optimized out>
        ptr = <optimized out>
        str = 0x7f3e92f6cdee "corrupted size vs. prev_size"
        action = <optimized out>
#4  0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
        size = 2720
        fb = <optimized out>
        nextchunk = 0x7f3de009e9f0
        nextsize = 736
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        errstr = 0x0
        locked = <optimized out>
#5  0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = <optimized out>
#6  0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
No locals.
#7  0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
No locals.
#8  0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395
        hAacEncoder = 0x7f3de009df60
        err = AACENC_OK
  • In frame #6, you can see that the pointer in question is 0x7f3de009df60.
  • In frame #4, you can see that the size is 2720, which is indeed the expected size of the structure being released.
  • However, the address of nextchunk is 0x7f3de009e9f0, which is only 2704 bytes after the pointer currently being released.
  • I can confirm this is always the case when the error reproduces.
  • Could this be a strong indication of the error I'm facing?

Answer by Sheinbergon

OK, so I've managed to overcome this issue.

First of all, a practical cause of "corrupted size vs. prev_size" is quite simple: out-of-bounds accesses in the code overwrite the control-structure fields of the adjacent, following memory chunk. If you allocate x bytes for pointer p but end up writing beyond x bytes through that same pointer, you may get this error. It indicates that the size of the current allocation (chunk) does not match what is recorded in the next chunk's control structure, because that structure was overwritten.

As for the cause of this memory corruption: the structure mapping done in the Java/JNA layer implied different #pragma-related padding/alignment from what the dll/so was compiled with. This in turn caused data to be written beyond the allocated structure's boundary. Disabling that alignment made the issue go away (thousands of executions without a single crash!).