What are all the common undefined behaviours that a C++ programmer should know about?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/367633/
Asked by yesraaj
What are all the common undefined behaviours that a C++ programmer should know about?
Say, like:
a[i] = i++;
Answer by Diomidis Spinellis
Pointer
- Dereferencing a NULL pointer
- Dereferencing a pointer returned by a "new" allocation of size zero
- Using pointers to objects whose lifetime has ended (for instance, stack allocated objects or deleted objects)
- Dereferencing a pointer that has not yet been definitely initialized
- Performing pointer arithmetic that yields a result outside the boundaries (either above or below) of an array; the one-past-the-end position is the only exception.
- Dereferencing the pointer at a location beyond the end of an array.
- Converting pointers to objects of incompatible types
- Using memcpy to copy overlapping buffers.
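To make a couple of the pointer items concrete, here is a minimal sketch of code that stays on the defined side. The helper names sum_range and element_or_null are made up for illustration and are not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums an array using only pointers the standard allows.
// 'first + count' is the one-past-the-end pointer: forming and comparing it
// is fine, dereferencing it is not.
int sum_range(const int* first, std::size_t count) {
    assert(first != nullptr || count == 0);  // never dereference a null pointer
    int total = 0;
    const int* const last = first + count;   // one past the end: valid to form
    for (const int* p = first; p != last; ++p) {
        total += *p;                          // p always lies inside [first, last)
    }
    return total;
}

// Hypothetical helper: returns a pointer to element 'index', or nullptr when
// the index is out of range, so no out-of-bounds pointer is ever formed.
const int* element_or_null(const int* first, std::size_t count, std::size_t index) {
    return index < count ? first + index : nullptr;
}
```

The point of the second helper is that the range check happens before the pointer arithmetic, not after it.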
Buffer overflows
- Reading or writing to an object or array at an offset that is negative, or beyond the size of that object (stack/heap overflow)
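One way to turn an out-of-range access into something detectable is a checked accessor. The sketch below assumes a fixed-size std::array and a hypothetical helper named try_read; it is only an illustration of the idea.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reads element 'index' and reports whether the access
// was in range. std::array::at throws std::out_of_range for a bad index,
// whereas operator[] with the same index would be undefined behaviour.
bool try_read(const std::array<int, 4>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;  // 'out' is left untouched
    }
}
```

Because the index is unsigned, a negative offset converted to std::size_t becomes a very large value and is rejected by the same check.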
Integer Overflows
- Signed integer overflow
- Evaluating an expression that is not mathematically defined (for example, division by zero)
- Shifting values by a negative amount (right-shifting a negative value, by contrast, is implementation-defined before C++20)
- Shifting values by an amount greater than or equal to the number of bits in the number (e.g. int64_t i = 1; i <<= 72 is undefined)
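The usual defence for the items above is to test the operands before doing the arithmetic. A minimal sketch, with hypothetical helper names checked_add and checked_shl:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds two ints only when the mathematical result fits.
// The check runs before the addition, so the overflow never happens.
bool checked_add(int a, int b, int& result) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) {
        return false;
    }
    result = a + b;
    return true;
}

// Hypothetical helper: shifts left only when the count lies in [0, 63].
// An unsigned operand keeps the discarded high bits well-defined.
bool checked_shl(std::uint64_t value, int count, std::uint64_t& result) {
    if (count < 0 || count >= 64) {
        return false;
    }
    result = value << count;
    return true;
}
```

With these, the i <<= 72 case from the list is reported as a failure instead of being evaluated.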
Types, Cast and Const
- Casting a numeric value into a value that can't be represented by the target type (either directly or via static_cast)
- Using an automatic variable before it has been definitely assigned (e.g., int i; i++; cout << i;)
- Using the value of any object of type other than volatile sig_atomic_t at the receipt of a signal
- Attempting to modify a string literal or any other const object during its lifetime
- Concatenating a narrow with a wide string literal during preprocessing
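For the first item, the problematic case I am most sure about is a floating-point value whose truncated result does not fit the integer type. Here is a rough sketch of a guard; checked_to_int is a hypothetical name, and the bounds are deliberately conservative.

```cpp
#include <limits>

// Hypothetical helper: converts a double to int only when the truncated value
// is representable in int. Converting an out-of-range floating-point value
// is the undefined case.
bool checked_to_int(double value, int& result) {
    const double lo = static_cast<double>(std::numeric_limits<int>::min());
    const double hi = static_cast<double>(std::numeric_limits<int>::max());
    // NaN fails both comparisons, so it is rejected as well.
    if (!(value > lo - 1.0 && value < hi + 1.0)) {
        return false;
    }
    result = static_cast<int>(value);  // truncates toward zero, known to fit
    return true;
}
```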
Function and Template
- Not returning a value from a value-returning function (directly or by flowing off from a try-block)
- Multiple different definitions for the same entity (class, template, enumeration, inline function, static member function, etc.)
- Infinite recursion in the instantiation of templates
- Calling a function with parameters or linkage that differ from those the function was defined with.
OOP
- Cascading destructions of objects with static storage duration
- The result of assigning to partially overlapping objects
- Recursively re-entering a function during the initialization of its static objects
- Making virtual function calls to pure virtual functions of an object from its constructor or destructor
- Referring to nonstatic members of objects that have not been constructed or have already been destructed
Source file and Preprocessing
- A non-empty source file that doesn't end with a newline, or ends with a backslash (prior to C++11)
- A backslash followed by a character that is not part of the specified escape codes in a character or string constant (this is implementation-defined in C++11).
- Exceeding implementation limits (number of nested blocks, number of functions in a program, available stack space ...)
- Preprocessor numeric values that can't be represented by a long int
- Preprocessing directive on the left side of a function-like macro definition
- Dynamically generating the defined token in a #if expression
To be classified
- Calling exit during the destruction of an object with static storage duration
Answer by Martin York
The order in which function parameters are evaluated is unspecified behavior. (This won't make your program crash, explode, or order pizza... unlike undefined behavior.)
The only requirement is that all parameters must be fully evaluated before the function is called.
This:
// The simple obvious one.
callFunc(getA(),getB());
Can be equivalent to this:
int a = getA();
int b = getB();
callFunc(a,b);
Or this:
int b = getB();
int a = getA();
callFunc(a,b);
It can be either; it's up to the compiler. The result can matter, depending on the side effects.
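If the side effects do matter, the order can be pinned down by evaluating each argument in its own statement. A small sketch follows; the bodies of getA, getB and callFunc and the call_log vector are invented here purely to make the order observable.

```cpp
#include <string>
#include <vector>

// Hypothetical call log: records which helper ran, in order.
std::vector<std::string> call_log;

int getA() { call_log.push_back("getA"); return 1; }
int getB() { call_log.push_back("getB"); return 2; }
int callFunc(int a, int b) { return a * 10 + b; }

// callFunc(getA(), getB()) would leave the order of the two helper calls to
// the compiler. Separate statements fix it.
int call_in_fixed_order() {
    const int a = getA();  // always runs first
    const int b = getB();  // always runs second
    return callFunc(a, b);
}
```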
Answer by Martin York
The compiler is free to re-order the evaluation of the parts of an expression (assuming the meaning is unchanged).
From the original question:
a[i] = i++;
// This expression has three parts:
// (a) a[i]
// (b) i++
// (c) Assign (b) to (a)
// (c) is guaranteed to happen after (a) and (b),
// but (a) and (b) can be done in either order.
// See n2521 Section 5.17
// (b) increments i but returns the original value.
// See n2521 Section 5.2.6
// Thus this expression can be written as:
int rhs = i++;
int& lhs = a[i];
lhs = rhs;
// or
int& lhs = a[i];
int rhs = i++;
lhs = rhs;
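Whichever of the two readings is intended, spelling it out as separate statements removes the ambiguity on every compiler. A sketch with hypothetical helper names is below. As far as I know, C++17 sequences the right-hand side of an assignment before the left-hand side, which corresponds to the second helper, but older standards give no such guarantee.

```cpp
// Hypothetical helper: the subscript uses the old value of i.
void store_then_increment(int* a, int& i) {
    a[i] = i;  // old index, old value
    ++i;
}

// Hypothetical helper: the subscript uses the new value of i.
void increment_then_store(int* a, int& i) {
    const int old_value = i;
    ++i;
    a[i] = old_value;  // new index, old value
}
```

Both helpers assume the caller passes an array large enough for the index that ends up being written.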
Double-checked locking, and one easy mistake to make.
A* a = new A("plop");
// Looks simple enough.
// But this can be split into three parts:
// (a) Allocate memory
// (b) Call constructor
// (c) Assign value to 'a'
// No problem here.
// But the compiler is allowed to do this:
// (a) Allocate memory
// (c) Assign value to 'a'
// (b) Call constructor
// This is because the whole thing is between two sequence points.
// So what is the big deal?
// Simple double-checked lock. (I know there are many other problems with this.)
if (a == NULL) // (Point B)
{
    Lock lock(mutex);
    if (a == NULL)
    {
        a = new A("Plop"); // (Point A).
    }
}
a->doStuff();
// Think of this situation.
// Thread 1: Reaches point A. Executes (a)(c)
// Thread 1: Is about to do (b) and gets unscheduled.
// Thread 2: Reaches point B. It can now skip the if block.
// Remember (c) has been done, thus 'a' is not NULL.
// But the memory has not been initialized.
// Thread 2 now executes doStuff() on an uninitialized variable.
// The solution to this problem is to move the assignment of 'a'
// to the other side of the sequence point.
if (a == NULL) // (Point B)
{
    Lock lock(mutex);
    if (a == NULL)
    {
        A* tmp = new A("Plop"); // (Point A).
        a = tmp;
    }
}
a->doStuff();
// Of course there are still other problems because of C++ support for
// threads. But hopefully these are addressed in the next standard.
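For comparison, here is how the same pattern is commonly written with the C++11 memory model, which did arrive after this answer was posted. This is a sketch under assumptions: the struct A below is a stand-in I made up for the class in the answer, and get_instance is a hypothetical name.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the class A used above.
struct A {
    explicit A(std::string n) : name(std::move(n)) {}
    std::string name;
};

std::atomic<A*> instance{nullptr};
std::mutex instance_mutex;

// Double-checked locking with C++11 atomics. The release store publishes the
// pointer only after the constructor has finished; the acquire load makes the
// constructed object visible to any thread that sees a non-null pointer.
A* get_instance() {
    A* p = instance.load(std::memory_order_acquire);       // (Point B)
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(instance_mutex);
        p = instance.load(std::memory_order_relaxed);      // re-check under the lock
        if (p == nullptr) {
            p = new A("Plop");                             // (Point A)
            instance.store(p, std::memory_order_release);
        }
    }
    return p;  // intentionally never deleted in this sketch
}
```

When a single shared object is all that is needed, a function-local static (static A a("Plop");) is simpler, since C++11 guarantees its initialization is thread-safe.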
Answer by yesraaj
Assigning to a constant after stripping constness using const_cast<>:
const int i = 10;
int *p = const_cast<int*>( &i );
*p = 1234; //Undefined
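The cast itself is not the problem; writing through it to an object that was declared const is. A short sketch of the cases that remain defined, using hypothetical helper names:

```cpp
// Hypothetical legacy-style function: takes a non-const pointer but only reads.
int read_value(int* p) { return *p; }

// Casting away const is fine as long as nothing writes through the pointer.
int read_through_const(const int& value) {
    return read_value(const_cast<int*>(&value));
}

// Writing is defined only when 'target' refers to an object that was not
// declared const. The caller is responsible for that.
void overwrite(const int& target, int new_value) {
    *const_cast<int*>(&target) = new_value;
}
```

Calling overwrite on a variable declared as const int would be the same undefined assignment as in the answer.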
Answer by Daniel Earwicker
My favourite is "Infinite recursion in the instantiation of templates" because I believe it's the only one where the undefined behaviour occurs at compile time.
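As an illustration of what keeps template recursion finite, here is the classic compile-time factorial. The Factorial template is my own example, not something from the answer.

```cpp
// Hypothetical example: compile-time factorial.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// The terminating specialization. Without it, Factorial<0> would ask for
// Factorial<0 - 1>, which wraps around to a huge N, and instantiation would
// keep recursing.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

static_assert(Factorial<5>::value == 120, "5! should be 120");
```

In practice compilers stop at an implementation-chosen instantiation depth and report an error, but the standard does not promise that.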
Answer by Constantin
Besides undefined behaviour, there is also the equally nasty implementation-defined behaviour.
Undefined behaviour occurs when a program does something the result of which is not specified by the standard.
Implementation-defined behaviour is an action by a program the result of which is not defined by the standard, but which the implementation is required to document. An example is "Multibyte character literals", from the Stack Overflow question "Is there a C compiler that fails to compile this?".
Implementation-defined behaviour only bites you when you start porting (but upgrading to a new version of the compiler is also porting!)
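Two everyday implementation-defined properties can be queried directly instead of assumed. This sketch uses hypothetical function names and only standard library facilities.

```cpp
#include <climits>
#include <limits>

// Whether plain char is signed is implementation-defined; ask, don't assume.
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// The width of long is implementation-defined too (the standard only
// guarantees at least 32 bits).
int bits_in_long() {
    return static_cast<int>(sizeof(long) * CHAR_BIT);
}
```

Code that depends on either answer is the kind that breaks during a port.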
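Answer by Martin York
Variables may only be updated once in an expression (technically once between sequence points).
int i = 1;
i = ++i;
// Undefined in C++03: 'i' is assigned twice between sequence points.
// (C++11 replaced sequence points with "sequenced before" rules, under which
// this particular form is defined, while i = i++ stayed undefined until C++17.)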
Answer by RandomNickName42
A basic understanding of the various environmental limits. The full list is in section 5.2.4.1 of the C specification. Here are a few:
- 127 parameters in one function definition
- 127 arguments in one function call
- 127 parameters in one macro definition
- 127 arguments in one macro invocation
- 4095 characters in a logical source line
- 4095 characters in a character string literal or wide string literal (after concatenation)
- 65535 bytes in an object (in a hosted environment only)
- 15 nesting levels for #included files
- 1023 case labels for a switch statement (excluding those for any nested switch statements)
I was actually a bit surprised at the limit of 1023 case labels for a switch statement; I can foresee that being exceeded fairly easily by generated code, lex output, or parsers.
If these limits are exceeded, you have undefined behavior (crashes, security flaws, etc.).
Right, I know this is from the C specification, but C++ shares these basic limits.
Answer by John Dibling
Using memcpy to copy between overlapping memory regions. For example:
char a[256] = {};
memcpy(a, a, sizeof(a));
The behavior is undefined according to the C Standard, which is subsumed by the C++03 Standard.
7.21.2.1 The memcpy function
Synopsis
1 #include <string.h>
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
Description
2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
Returns
3 The memcpy function returns the value of s1.
7.21.2.2 The memmove function
Synopsis
1 #include <string.h>
void *memmove(void *s1, const void *s2, size_t n);
Description
2 The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.
Returns
3 The memmove function returns the value of s1.
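So when source and destination may overlap, memmove is the call to reach for. A minimal sketch, with a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: shifts the first 'count' bytes of 'buffer' one
// position to the right. Source [0, count) and destination [1, count + 1)
// overlap, so memmove is required; memcpy would be undefined here.
// The caller must guarantee that 'buffer' holds at least count + 1 bytes.
void shift_right_by_one(char* buffer, std::size_t count) {
    std::memmove(buffer + 1, buffer, count);
}
```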
Answer by JaredPar
The only type for which C++ guarantees a size is char (along with signed char and unsigned char), and that size is 1. The size of all other types is platform dependent.
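When code really does depend on a particular width, it can say so and let the build fail on a platform where the assumption does not hold. A sketch, where bit_width_of is a hypothetical helper and std::int32_t is an optional alias that exists only where an exact 32-bit type is available:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");

// Hypothetical helper: width of a type in bits on the current platform.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}

// Exact-width aliases carry the size in their name instead of leaving it
// to the platform.
static_assert(bit_width_of<std::int32_t>() == 32, "int32_t is exactly 32 bits");
```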