Weird SIGSEGV segmentation fault in std::string::assign() method from libstdc++.so.6
Original question: http://stackoverflow.com/questions/7038124/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by yaobin
My program recently hit a weird segfault at run time. I'd like to know whether anybody has seen this error before and how it can be fixed. Here is more info:
Basic info:
- CentOS 5.2, kernel version is 2.6.18
- g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
- CPU: Intel x86 family
- libstdc++.so.6.0.8
- My program will start multiple threads to process data. The segfault occurred in one of the threads.
- Though it's a multi-thread program, the segfault seemed to occur on a local std::string object. I'll show this in the code snippet later.
- The program is compiled with -g, -Wall and -fPIC, and without -O2 or other optimization options.
The core dump info:
Core was generated by `./myprog'.
Program terminated with signal 11, Segmentation fault.
#0 0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
(gdb) bt
#0 0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
#1 0x06f507c3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#2 0x06f50834 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#3 0x081402fc in Q_gdw::ProcessData (this=0xb2f79f60) at ../../../myprog/src/Q_gdw/Q_gdw.cpp:798
#4 0x08117d3a in DataParser::Parse (this=0x8222720) at ../../../myprog/src/DataParser.cpp:367
#5 0x08119160 in DataParser::run (this=0x8222720) at ../../../myprog/src/DataParser.cpp:338
#6 0x080852ed in Utility::__dispatch (arg=0x8222720) at ../../../common/thread/Thread.cpp:603
#7 0x0052c832 in start_thread () from /lib/libpthread.so.0
#8 0x00ca845e in clone () from /lib/libc.so.6
Please note that the segfault begins within the basic_string::operator=().
The related code (I've shown more code than might be needed; please ignore the coding style for now):
int Q_gdw::ProcessData()
{
char tmpTime[10+1] = {0};
char A01Time[12+1] = {0};
std::string tmpTimeStamp;
// Get the timestamp from TP
if((m_BackFrameBuff[11] & 0x80) >> 7)
{
for (i = 0; i < 12; i++)
{
A01Time[i] = (char)A15Result[i];
}
tmpTimeStamp = FormatTimeStamp(A01Time, 12); // Segfault occurs on this line
And here is the prototype of this FormatTimeStamp method:
std::string FormatTimeStamp(const char *time, int len);
I think string assignments like this are very common, so I just don't understand why a segfault could occur here.
What I have investigated:
I've searched the web for answers. I looked here. The reply suggests recompiling the program with the _GLIBCXX_FULLY_DYNAMIC_STRING macro defined. I tried that, but the crash still happens.
I also looked here. It also says to recompile the program with _GLIBCXX_FULLY_DYNAMIC_STRING, but the author seems to be dealing with a different problem from mine, so I don't think his solution applies to me.
Updated on 08/15/2011
Hi guys, here is the original code of FormatTimeStamp. I understand the code doesn't look very nice (too many magic numbers, for instance), but let's focus on the crash first.
string Q_gdw::FormatTimeStamp(const char *time, int len)
{
string timeStamp;
string tmpstring;
if (time) // It is guaranteed that "time" is correctly zero-terminated, so don't worry about any overflow here.
tmpstring = time;
// Get the current time point.
int year, month, day, hour, minute, second;
#ifndef _WIN32
struct timeval timeVal;
struct tm *p;
gettimeofday(&timeVal, NULL);
p = localtime(&(timeVal.tv_sec));
year = p->tm_year + 1900;
month = p->tm_mon + 1;
day = p->tm_mday;
hour = p->tm_hour;
minute = p->tm_min;
second = p->tm_sec;
#else
SYSTEMTIME sys;
GetLocalTime(&sys);
year = sys.wYear;
month = sys.wMonth;
day = sys.wDay;
hour = sys.wHour;
minute = sys.wMinute;
second = sys.wSecond;
#endif
if (0 == len)
{
// The "time" doesn't specify any time so we just use the current time
char tmpTime[30];
memset(tmpTime, 0, 30);
sprintf(tmpTime, "%d-%d-%d %d:%d:%d.000", year, month, day, hour, minute, second);
timeStamp = tmpTime;
}
else if (6 == len)
{
// The "time" specifies "day-month-year" with each being 2-digit.
// For example: "150811" means "August 15th, 2011".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(4, 2) + "-" + tmpstring.substr(2, 2) + "-" +
tmpstring.substr(0, 2);
}
else if (8 == len)
{
// The "time" specifies "minute-hour-day-month" with each being 2-digit.
// For example: "51151508" means "August 15th, 15:51".
// As the year is not specified, the current year will be used.
string strYear;
stringstream sstream;
sstream << year;
sstream >> strYear;
sstream.clear();
timeStamp = strYear + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
}
else if (10 == len)
{
// The "time" specifies "minute-hour-day-month-year" with each being 2-digit.
// For example: "5115150811" means "August 15th, 2011, 15:51".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
}
else if (12 == len)
{
// The "time" specifies "second-minute-hour-day-month-year" with each being 2-digit.
// For example: "305115150811" means "August 15th, 2011, 15:51:30".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(10, 2) + "-" + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + " " +
tmpstring.substr(4, 2) + ":" + tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ".000";
}
return timeStamp;
}
Updated on 08/19/2011
This problem has finally been tracked down and fixed. The FormatTimeStamp() function, in fact, has nothing to do with the root cause. The segfault is caused by a write overflow of a local char buffer.
This problem can be reproduced with the following simpler program (please ignore the poor variable names for now), compiled with "g++ -Wall -g main.cpp":
#include <cstdio>   // sprintf
#include <cstring>  // memset
#include <iostream>
#include <string>
void overflow_it(char * A15, char * A15Result)
{
int m;
int t = 0,i = 0;
char temp[3];
for (m = 0; m < 6; m++)
{
t = ((*A15 & 0xf0) >> 4) *10 ;
t += *A15 & 0x0f;
A15 ++;
std::cout << "m = " << m << "; t = " << t << "; i = " << i << std::endl;
memset(temp, 0, sizeof(temp));
sprintf((char *)temp, "%02d", t); // The buggy code: temp is not big enough when t is a 3-digit integer.
A15Result[i++] = temp[0];
A15Result[i++] = temp[1];
}
}
int main(int argc, char * argv[])
{
std::string str;
{
char tpTime[6] = {0};
char A15Result[12] = {0};
// Initialize tpTime
for(int i = 0; i < 6; i++)
tpTime[i] = char(154); // 154 would result in a 3-digit t in overflow_it().
overflow_it(tpTime, A15Result);
str.assign(A15Result);
}
std::cout << "str says: " << str << std::endl;
return 0;
}
Here are two facts to remember before going on: 1). My machine is an Intel x86 machine, so it is little-endian. Therefore, for a variable "m" of type int whose value is, say, 10, its memory layout might look like this:
Starting addr:0xbf89bebc: m(byte#1): 10
0xbf89bebd: m(byte#2): 0
0xbf89bebe: m(byte#3): 0
0xbf89bebf: m(byte#4): 0
2). The program above runs in the main thread. Inside the overflow_it() function, the layout of variables on the thread's stack looks like this (only the important variables are shown):
0xbfc609e9 : temp[0]
0xbfc609ea : temp[1]
0xbfc609eb : temp[2]
0xbfc609ec : m(byte#1) <-- Note that m follows temp immediately. m(byte#1) happens to be the byte temp[3].
0xbfc609ed : m(byte#2)
0xbfc609ee : m(byte#3)
0xbfc609ef : m(byte#4)
0xbfc609f0 : t
...(3 bytes)
0xbfc609f4 : i
...(3 bytes)
...(etc. etc. etc...)
0xbfc60a26 : A15Result <-- Data would be written to this buffer in overflow_it()
...(11 bytes)
0xbfc60a32 : tpTime
...(5 bytes)
0xbfc60a38 : str <-- Note that str takes up 4 bytes. Its starting address is **18 bytes** (0xbfc60a38 - 0xbfc60a26) after A15Result.
My analysis:
1). m is a counter in overflow_it() that is incremented by 1 on each iteration of the for loop and is never supposed to exceed 6. Its value therefore fits entirely in m(byte#1) (remember, the machine is little-endian), which happens to be temp[3].
2). In the buggy line: when t is a 3-digit integer, such as 109, the sprintf() call overflows the buffer, because serializing the number 109 to the string "109" requires 4 bytes: '1', '0', '9' and a terminating '\0'. Because temp[] has only 3 bytes, the final '\0' is written to temp[3], which is exactly m(byte#1), the byte that stores m's value. As a result, m is reset to 0 every time.
3). The programmer expects the for loop in overflow_it() to execute only 6 times, with m incremented by 1 each time. Because m keeps being reset to 0, the loop actually runs far more than 6 times.
4). Now look at the variable i in overflow_it(): on every iteration of the for loop, i is incremented by 2 and two bytes of A15Result are written. If you compile and run this program, you'll see that i finally reaches 24, which means overflow_it() writes to the bytes from A15Result[0] to A15Result[23]. Note that the object str starts only 18 bytes after A15Result[0] (0xbfc60a38 - 0xbfc60a26 in the layout above), so overflow_it() has "swept through" str and destroyed its memory layout.
5). I think that correct use of std::string, since it is a non-POD data structure, depends on the instantiated std::string object having a valid internal state. In this program, however, str's internal layout has been forcibly changed from outside. This should be why the assign() call finally causes a segfault.
Updated on 08/26/2011
In my previous update on 08/19/2011, I said that the segfault was caused by a method call on a local std::string object whose memory layout had been corrupted, making it a "destroyed" object. That is not always the whole story. Consider the C++ program below:
#include <iostream>
#include <string>
class A {
public:
void Hello(const std::string& name) {
std::cout << "hello " << name;
}
};
int main(int argc, char** argv)
{
A* pa = NULL; //!!
pa->Hello("world");
return 0;
}
The Hello() call would succeed. It would succeed even if you assigned an obviously bad pointer to pa. (Formally, calling a member function through a null or invalid pointer is undefined behavior; this describes what typically happens with GCC.) The reason is that, according to the C++ object model, the non-virtual methods of a class don't reside within the memory layout of the object. The C++ compiler turns the A::Hello() method into something like A_Hello_xxx(A * const this, ...), which could be a global function. Thus, as long as you don't operate on the "this" pointer, things could go pretty well.
This shows that a "bad" object is NOT in itself the root cause of the SIGSEGV segfault. The assign() method is not virtual in std::string, so merely calling it on the "bad" std::string object wouldn't cause the segfault. Something else must have finally caused it.
I noticed that the segfault comes from the __gnu_cxx::__exchange_and_add() function, so I looked into its source code on this web page:
static inline _Atomic_word
__exchange_and_add(volatile _Atomic_word* __mem, int __val)
{ return __sync_fetch_and_add(__mem, __val); }
__exchange_and_add() simply calls __sync_fetch_and_add(). According to this web page, __sync_fetch_and_add() is a GCC builtin function whose behavior is like this:
type __sync_fetch_and_add (type *ptr, type value, ...)
{
tmp = *ptr;
*ptr op= value; // Here the "op=" means "+=" as this function is "_and_add".
return tmp;
}
There it is! The passed-in ptr pointer is dereferenced here. In the 08/19/2011 program, ptr comes from the "bad" std::string object inside the assign() method: with this libstdc++ it is most likely the address of the string's reference count, located through the internal pointer that overflow_it() overwrote. It is the dereference at this point that actually caused the SIGSEGV segmentation fault.
We could test this with the following program:
#include <bits/atomicity.h>
int main(int argc, char * argv[])
{
__sync_fetch_and_add((_Atomic_word *)0, 10); // Would result in a segfault.
return 0;
}
Accepted answer by Employed Russian
There are two likely possibilities:
- some code before line 798 has corrupted the local tmpTimeStamp object
- the return value from FormatTimeStamp() was somehow bad.
The _GLIBCXX_FULLY_DYNAMIC_STRING is most likely a red herring and has nothing to do with the problem.
If you install the debuginfo package for libstdc++ (I don't know what it's called on CentOS), you'll be able to "see into" that code, and might be able to tell whether the left-hand side (LHS) or the right-hand side (RHS) of the assignment operator caused the problem.
If that's not possible, you'll have to debug this at the assembly level. Going into frame #2 and doing x/4x $ebp should give you the previous ebp, the caller address (0x081402fc), the LHS (which should match &tmpTimeStamp in frame #3), and the RHS. Go from there, and good luck!
Answer by ks1322
I guess there could be some problem inside the FormatTimeStamp function, but without the source code it's hard to say anything. Try running your program under Valgrind. That usually helps to find this sort of bug.