Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2310483/

Time: 2020-08-27 22:58:46  Source: igfitidea

Purpose of Unions in C and C++

Tags: c++, c, unions, type-punning

Asked by legends2k

I have used unions comfortably before; today I was alarmed when I read this post and came to know that this code

union ARGB
{
    uint32_t colour;

    struct componentsTag
    {
        uint8_t b;
        uint8_t g;
        uint8_t r;
        uint8_t a;
    } components;

} pixel;

pixel.colour = 0xff040201;  // ARGB::colour is the active member from now on

// somewhere down the line, without any edit to pixel

if(pixel.components.a)      // accessing the non-active member ARGB::components

is actually undefined behaviour, i.e. reading from a member of the union other than the one most recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can someone please explain it in detail?

Update:

I wanted to clarify a few things in hindsight.

  • The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
  • After scouring through the C++11 standard I couldn't conclusively say that it calls out accessing/inspecting a non-active union member as undefined/unspecified/implementation-defined. All I could find was §9.5/1:

    If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members. §9.2/19: Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

  • In C, however (C99 TC3 - DR 283 onwards), it's legal to do so (thanks to Pascal Cuoq for bringing this up). Attempting to do it can still lead to undefined behavior if the value read happens to be invalid (a so-called "trap representation") for the type it is read through. Otherwise, the value read is implementation-defined.
  • C89/90 called this out under unspecified behavior (Annex J) and K&R's book says it's implementation defined. Quote from K&R:

    This is the purpose of a union - a single variable that can legitimately hold any of one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.

  • Extract from Stroustrup's TC++PL (emphasis mine)

    Use of unions can be essential for compactness of data [...] sometimes misused for "type conversion".

Above all, this question (whose title remains unchanged since I asked it) was posed with the intention of understanding the purpose of unions, AND not what the standard allows. E.g. using inheritance for code reuse is, of course, allowed by the C++ standard, but it wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is the reason Andrey's answer continues to remain the accepted one.

Answered by AnT

The purpose of unions is rather obvious, but for some reason people miss it quite often.

The purpose of a union is to save memory by using the same memory region for storing different objects at different times. That's it.

It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.

That's exactly what a union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into another member you switch the "active" status to that other member.
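
The hotel-room idea can be sketched roughly as follows. This is only an illustration: the names `Separate`, `Shared` and `reuse_storage` are made up here, and the size comparison assumes an ordinary platform where a `double` is at least as large as an `int`.

```cpp
// Two values stored side by side: both members exist at the same time.
struct Separate { int count; double ratio; };

// The same two values sharing one region: only one is "active" at a time.
union Shared { int count; double ratio; };

// Assumption: holds on ordinary platforms, where the struct needs room for
// both members while the union only needs room for the larger one.
static_assert(sizeof(Shared) < sizeof(Separate), "the union reuses one region");

// Hypothetical helper showing two non-overlapping "tenants" of one union.
double reuse_storage()
{
    Shared s;
    s.count = 3;          // count becomes the active member
    int c = s.count;      // reading the active member is fine
    s.ratio = c / 2.0;    // writing ratio makes it the active member instead
    return s.ratio;       // count must not be read any more at this point
}
```

Each value is read only while its own member is the active one, which is the usage the answer describes.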

For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It generally leads to undefined behavior (in C89/90 it is described as producing implementation-defined behavior).

EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
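
For C++ code that needs to look at the bytes of a value without reading an inactive union member, one commonly used alternative is to copy the object representation with `std::memcpy`. The sketch below is not part of the original answer; the helper names are hypothetical, and `float_bits` assumes a 32-bit `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the byte of `colour` stored at the lowest
// address. Which colour component that is depends on the platform's
// byte order.
std::uint8_t lowest_address_byte(std::uint32_t colour)
{
    std::uint8_t bytes[sizeof colour];
    std::memcpy(bytes, &colour, sizeof colour);  // copy the object representation
    return bytes[0];
}

// Hypothetical helper: returns the bit pattern of a float as an integer.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform using IEEE 754 single precision, `float_bits(1.0f)` would be `0x3f800000`; that value is a property of the floating-point format, not of the technique.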

Answered by Erich Kitzmueller

You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:

struct VAROBJECT
{
    enum o_t { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        char *strValue;
    } value;
} object;
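
A rough sketch of how such a tagged struct might be used: the tag is set together with the value and checked before any member is read. The type is repeated so the example stands alone; `VarObject`, `make_int`, `make_string` and `describe` are hypothetical names, not part of the answer.

```cpp
#include <string>

// Same idea as the struct above: a tag plus a union of possible values.
struct VarObject
{
    enum Kind { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        const char *strValue;
    } value;
};

// Hypothetical constructors: set the tag and the matching member together.
VarObject make_int(int v)
{
    VarObject o{};
    o.objectType = VarObject::Int;
    o.value.intValue = v;
    return o;
}

VarObject make_string(const char *s)
{
    VarObject o{};
    o.objectType = VarObject::String;
    o.value.strValue = s;
    return o;
}

// Reads only the member that the tag says is currently stored.
std::string describe(const VarObject &o)
{
    switch (o.objectType)
    {
        case VarObject::Int:    return "int: " + std::to_string(o.value.intValue);
        case VarObject::Double: return "double: " + std::to_string(o.value.dblValue);
        case VarObject::String: return std::string("string: ") + o.value.strValue;
    }
    return "unknown";
}
```

Keeping the tag and the member assignment in one place makes it harder for the two to drift apart.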

Answered by David Rodríguez - dribeas

The behavior is undefined from the language point of view. Consider that different platforms can have different constraints in memory alignment and endianness. The code in a big endian versus a little endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...) limiting use.

If you are using C++ (you are using two tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t and sets the fields appropriately through bitmask operations. The same can be done in C with a function.
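
One way the setter described here might look is sketched below. It assumes the `0xAARRGGBB` layout implied by the question's example value; the member function names `set` and `get` are made up for illustration.

```cpp
#include <cstdint>

struct ARGB
{
    std::uint8_t b;
    std::uint8_t g;
    std::uint8_t r;
    std::uint8_t a;

    // Splits a 0xAARRGGBB value into components with shifts and masks,
    // so the result does not depend on the platform's byte order.
    void set(std::uint32_t colour)
    {
        a = static_cast<std::uint8_t>((colour >> 24) & 0xffu);
        r = static_cast<std::uint8_t>((colour >> 16) & 0xffu);
        g = static_cast<std::uint8_t>((colour >> 8) & 0xffu);
        b = static_cast<std::uint8_t>(colour & 0xffu);
    }

    // Reassembles the 0xAARRGGBB value from the components.
    std::uint32_t get() const
    {
        return (static_cast<std::uint32_t>(a) << 24) |
               (static_cast<std::uint32_t>(r) << 16) |
               (static_cast<std::uint32_t>(g) << 8) |
               static_cast<std::uint32_t>(b);
    }
};
```

With the question's value `0xff040201`, `set` gives `a == 0xff`, `r == 0x04`, `g == 0x02` and `b == 0x01` on any platform.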

Edit: I was expecting AProgrammer to write down an answer to vote for, and to close this one. As some comments have pointed out, endianness is dealt with in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are an important point here. The compiler is allowed to make assumptions about the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each colour component ahead of the write to the colour variable.

Answered by bobobobo

The most common use of union I regularly come across is aliasing.

Consider the following:

union Vector3f
{
  struct{ float x,y,z ; } ;
  float elts[3];
};

What does this do? It allows clean, neat access to the members of a Vector3f vec; either by name:

vec.x=vec.y=vec.z=1.f ;

or by integer access into the array

for( int i = 0 ; i < 3 ; i++ )
  vec.elts[i]=1.f;

In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
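
As other answers here point out, in C++ reading `elts` after writing `x` reads a member that is not the active one, and anonymous structs are a compiler extension in C++ rather than standard. A sketch of one alternative that keeps both access styles without a union is shown below; it is not the answer's technique, and the bounds check is an added assumption.

```cpp
#include <cstddef>
#include <stdexcept>

struct Vector3f
{
    float x, y, z;

    // Index access mapped onto the named members: 0 is x, 1 is y, 2 is z.
    float &operator[](std::size_t i)
    {
        switch (i)
        {
            case 0: return x;
            case 1: return y;
            case 2: return z;
        }
        throw std::out_of_range("Vector3f index must be 0, 1 or 2");
    }
};
```

With this, `vec.y` and `vec[1]` refer to the same object.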

Answered by bobobobo

As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.

union A {
   int i;
   double d;
};

A a[10];    // records in "a" can be either ints or doubles 
a[0].i = 42;
a[1].d = 1.23;

Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++ unions are not much use because they can only contain POD types - effectively those without constructors and destructors.

Answered by Totonga

In C it was a nice way to implement something like a variant.

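enum possibleTypes{
  eInt,
  eDouble,
  eChar
};


struct Value{

    union {                             /* only one member is valid at a time */
      int iVal_;
      double dval;
      char cVal;
    } value_;
    enum possibleTypes discriminator_;  /* says which member is valid */
};

/* val is a struct Value */
switch(val.discriminator_)
{
  case eInt:    /* use val.value_.iVal_ */ break;
  case eDouble: /* use val.value_.dval */  break;
  case eChar:   /* use val.value_.cVal */  break;
}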

In times of little memory, this structure uses less memory than a struct that has all the members.

By the way C provides

    typedef struct {
      unsigned int mantissa_low:32;      //mantissa
      unsigned int mantissa_high:20;
      unsigned int exponent:11;         //exponent
      unsigned int sign:1;
    } realVal;

to access bit values.

Answered by Matthieu M.

In C++, Boost Variant implements a safe version of the union, designed to prevent undefined behavior as much as possible.

Its performance is identical to that of the enum + union construct (it is stack allocated too, etc.) but it uses a template list of types instead of the enum :)
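
For readers on C++17 or later, the standard library's `std::variant` follows the same idea of a template list of types in place of a hand-written enum. The sketch below uses `std::variant` rather than Boost Variant, so it illustrates the concept and not the Boost API; `Value` and `describe` are hypothetical names.

```cpp
#include <string>
#include <variant>

// A value that holds exactly one of the listed types at any time.
using Value = std::variant<int, double, std::string>;

// Checks which alternative is stored before reading it.
std::string describe(const Value &v)
{
    if (std::holds_alternative<int>(v))
        return "int: " + std::to_string(std::get<int>(v));
    if (std::holds_alternative<double>(v))
        return "double: " + std::to_string(std::get<double>(v));
    return "string: " + std::get<std::string>(v);
}
```

Asking for an alternative that is not currently stored, for example `std::get<double>` on a variant holding an `int`, throws `std::bad_variant_access` instead of silently reinterpreting the bytes.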

Answered by Paul R

Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.

Answered by Nick

The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but may have different defaults. The defaults will also change depending on the optimisation settings used.

Also, unions are not just for saving space. They can help modern compilers with type punning. If you reinterpret_cast<> everything, the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient these days compared to CPU clock speed).

Answered by supercat

In the C language as it was documented in 1974, all structure members shared a common namespace, and the meaning of "ptr->member" was defined as adding the member's displacement to "ptr" and accessing the resulting address using the member's type. This design made it possible to use the same ptr with member names taken from different structure definitions but with the same offset; programmers used that ability for a variety of purposes.

When structure members were assigned their own namespaces, it became impossible to declare two structure members with the same displacement. Adding unions to the language made it possible to achieve the same semantics that had been available in earlier versions of the language (though the inability to have names exported to an enclosing context may have still necessitated using a find/replace to replace foo->member with foo->type1.member). What was important was not so much that the people who added unions had any particular target usage in mind, but rather that they provided a means by which programmers who had relied upon the earlier semantics, for whatever purpose, could still achieve the same semantics even if they had to use a different syntax to do it.