C: How can mixed data types (int, float, char, etc.) be stored in an array?

Disclaimer: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/18577404/

Date: 2020-09-02 07:20:36  Source: igfitidea

How can mixed data types (int, float, char, etc) be stored in an array?

Tags: c, arrays, variant, mixed-type

Asked by chanzerre

I want to store mixed data types in an array. How could one do that?

Answered by Barmar

You can make the array elements a discriminated union, aka tagged union.

struct {
    enum { is_int, is_float, is_char } type;
    union {
        int ival;
        float fval;
        char cval;
    } val;
} my_array[10];

The type member holds the choice of which member of the union should be used for each array element. So if you want to store an int in the first element, you would do:

my_array[0].type = is_int;
my_array[0].val.ival = 3;

When you want to access an element of the array, you must first check the type, then use the corresponding member of the union. A switch statement is useful:

switch (my_array[n].type) {
case is_int:
    // Do stuff for integer, using my_array[n].val.ival
    break;
case is_float:
    // Do stuff for float, using my_array[n].val.fval
    break;
case is_char:
    // Do stuff for char, using my_array[n].val.cval
    break;
default:
    // Report an error, this shouldn't happen
    break;
}

It's left up to the programmer to ensure that the type member always corresponds to the last value stored in the union.

Answered by chanzerre

Use a union:

union {
    int ival;
    float fval;
    void *pval;
} array[10];

You will have to keep track of the type of each element, though.
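
One way to do that bookkeeping is a parallel array of tags, one per element. The following is only a minimal sketch: the names `elem_type`, `types`, `set_int`, `set_float` and `print_elem` are made up for illustration and are not part of the answer above.

```c
#include <stdio.h>

#define N 10

/* Hypothetical tag values; the answer above does not define them. */
enum elem_type { T_NONE, T_INT, T_FLOAT, T_PTR };

union elem {
    int ival;
    float fval;
    void *pval;
};

union elem array[N];
enum elem_type types[N];   /* types[i] describes array[i]; starts out as T_NONE */

/* Store a value and record its type in the parallel array. */
void set_int(int idx, int v)     { array[idx].ival = v; types[idx] = T_INT; }
void set_float(int idx, float v) { array[idx].fval = v; types[idx] = T_FLOAT; }

/* Print element idx according to its recorded type.
   Returns 0 on success, -1 if the slot is unset or the tag is unknown. */
int print_elem(int idx) {
    switch (types[idx]) {
    case T_INT:   printf("%d\n", array[idx].ival); return 0;
    case T_FLOAT: printf("%f\n", array[idx].fval); return 0;
    case T_PTR:   printf("%p\n", array[idx].pval); return 0;
    default:      return -1;
    }
}
```

Going through the setters keeps the tag and the value in sync, which is the part a bare union cannot do for you.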

Answered by chanzerre

Array elements need to have the same size, which is why this is not possible directly. You can work around it by creating a variant type:

#include <stdio.h>
#define SIZE 3

typedef enum VarType {  /* tag renamed: identifiers starting with __ are reserved */
  V_INT,
  V_CHAR,
  V_FLOAT,
} VarType;

typedef struct Var {  /* tag renamed: identifiers starting with __ are reserved */
  VarType type;
  union {
    int i;
    char c;
    float f;
  };
} Var;

void var_init_int(Var *v, int i) {
  v->type = V_INT;
  v->i = i;
}

void var_init_char(Var *v, char c) {
  v->type = V_CHAR;
  v->c = c;
}

void var_init_float(Var *v, float f) {
  v->type = V_FLOAT;
  v->f = f;
}

int main(int argc, char **argv) {

  Var v[SIZE];
  int i;

  var_init_int(&v[0], 10);
  var_init_char(&v[1], 'C');
  var_init_float(&v[2], 3.14);

  for( i = 0 ; i < SIZE ; i++ ) {
    switch( v[i].type ) {
      case V_INT  : printf("INT   %d\n", v[i].i); break;
      case V_CHAR : printf("CHAR  %c\n", v[i].c); break;
      case V_FLOAT: printf("FLOAT %f\n", v[i].f); break;
    }
  }

  return 0;
}

The size of the union is the size of its largest member, which is 4 bytes here (assuming a 4-byte int and float).
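
The actual numbers depend on the platform, so it is safer to ask the compiler than to assume 4. A small sketch of how to check; the names `Payload`, `Tagged` and `print_sizes` are hypothetical and only mirror the union and the Var struct above:

```c
#include <stdio.h>

/* Same members as the union inside Var above. */
typedef union {
    int i;
    char c;
    float f;
} Payload;

/* Tag plus payload, like Var, but with a named union member
   so that it also compiles as C89/C99. */
typedef struct {
    int type;
    Payload u;
} Tagged;

/* Prints the sizes chosen by the current compiler. */
void print_sizes(void) {
    printf("sizeof(Payload) = %lu\n", (unsigned long)sizeof(Payload));
    printf("sizeof(Tagged)  = %lu\n", (unsigned long)sizeof(Tagged));
}
```

Note that the whole element is larger than the union alone, because the tag (and any padding) is stored next to it.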

Answered by luser droog

There's a different style of defining the tag-union (by whatever name) that IMO makes it much nicer to use, by removing the internal union. This is the style used in the X Window System for things like Events.

The example in Barmar's answer gives the name val to the internal union. The example in Sp.'s answer uses an anonymous union to avoid having to specify the .val. every time you access the variant record. Unfortunately "anonymous" internal structs and unions are not available in C89 or C99 (they were only standardized in C11). There they are a compiler extension, and therefore inherently non-portable.

A better way IMO is to invert the whole definition. Make each data type its own struct, and put the tag (type specifier) into each struct.

typedef struct {
    int tag;
    int val;
} integer;

typedef struct {
    int tag;
    float val;
} real;

Then you wrap these in a top-level union.

typedef union {
    int tag;
    integer int_;
    real real_;
} record;

enum types { INVALID, INT, REAL };

Now it may appear that we're repeating ourselves, and we are. But consider that this definition is likely to be isolated to a single file. And we've eliminated the noise of specifying the intermediate .val. before you get to the data.

record i;
i.tag = INT;
i.int_.val = 12;

record r;
r.tag = REAL;
r.real_.val = 57.0;

Instead, it goes at the end, where it's less obnoxious. :D

Another thing this allows is a form of inheritance. Edit: this part is not standard C, but uses a GNU extension.

if (r.tag == INT) {
    integer x = r;
    x.val = 36;
} else if (r.tag == REAL) {
    real x = r;
    x.val = 25.0;
}

integer g = { INT, 100 };
record rg = g;

Up-casting and down-casting.
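
For comparison, here is a portable sketch of the same up-cast/down-cast idea in standard C, going through the union members explicitly instead of relying on an extension. The helper names `record_from_integer`, `record_from_real` and `record_to_integer` are hypothetical; the types are repeated from above so the snippet stands alone.

```c
typedef struct { int tag; int val; } integer;
typedef struct { int tag; float val; } real;
typedef union { int tag; integer int_; real real_; } record;
enum types { INVALID, INT, REAL };

/* "Up-cast": wrap a specific struct in the general record. */
record record_from_integer(integer g) {
    record r;
    r.int_ = g;
    return r;
}

record record_from_real(real g) {
    record r;
    r.real_ = g;
    return r;
}

/* "Down-cast": check the tag, then copy out the matching member.
   Returns 1 on success, 0 if the record does not hold an integer. */
int record_to_integer(record r, integer *out) {
    if (r.tag != INT)
        return 0;
    *out = r.int_;
    return 1;
}
```

This is more typing than the one-line assignments, but the tag check is in one place and it does not depend on any particular compiler.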

Edit: One gotcha to be aware of arises if you're constructing one of these with C99 designated initializers. All member initializers should go through the same union member.

record problem = { .tag = INT, .int_.val = 3 };

problem.tag; // may not be initialized

The .tag initializer can be ignored by an optimizing compiler, because the .int_ initializer that follows aliases the same data area. Even though we know the layout (!), and it should be ok. No, it ain't. Use the "internal" tag instead (it overlays the outer tag, just like we want, but doesn't confuse the compiler).

record not_a_problem = { .int_.tag = INT, .int_.val = 3 };

not_a_problem.tag; // == INT

Answered by dzada

You can use a void * array, with a separate array of size_t (presumably holding each element's size). But you lose the type information.
If you need to keep the type information in some way, keep a third array of int (where the int is an enumerated value). Then write the function that casts depending on the enum value.
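
A rough sketch of that layout, assuming the size_t array holds each element's size in bytes. All names here (`kind`, `items`, `sizes`, `kinds`, `put`, `print_item`) are invented for the example:

```c
#include <stdio.h>

#define N 3

enum kind { K_INT, K_DOUBLE, K_STRING };   /* hypothetical type codes */

/* Three parallel arrays: where the data lives, how big it is, what it is. */
void  *items[N];
size_t sizes[N];
int    kinds[N];

/* Record a pointer to existing data together with its size and kind. */
void put(int idx, void *data, size_t size, int kind) {
    items[idx] = data;
    sizes[idx] = size;
    kinds[idx] = kind;
}

/* Cast back based on the recorded enum value.
   Returns 0 on success, -1 for an unknown kind. */
int print_item(int idx) {
    switch (kinds[idx]) {
    case K_INT:    printf("%d\n", *(int *)items[idx]);    return 0;
    case K_DOUBLE: printf("%f\n", *(double *)items[idx]); return 0;
    case K_STRING: printf("%s\n", (char *)items[idx]);    return 0;
    default:       return -1;
    }
}
```

The array only stores pointers, so the objects they point to must stay alive for as long as the array is used.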

Answered by phuclv

A union is the standard way to go. But you have other solutions as well. One of those is a tagged pointer, which involves storing more information in the "free" bits of a pointer.

Depending on the architecture you can use the low or high bits, but the safest and most portable way is to use the unused low bits by taking advantage of aligned memory. For example, on 32-bit and 64-bit systems, pointers to int must be multiples of 4 (assuming int is a 32-bit type), so the 2 least significant bits must be 0, and you can use them to store the type of your values. Of course you need to clear the tag bits before dereferencing the pointer. For example, if your data is limited to 4 different types, then you can use it like below:

void* tp; // tagged pointer
enum { is_int, is_double, is_char_p, is_char } type;
// ...
uintptr_t addr = (uintptr_t)tp & ~0x03; // clear the 2 low bits in the pointer
switch ((uintptr_t)tp & 0x03)           // check the tag (2 low bits) for the type
{
case is_int:    // data is int
    printf("%d\n", *((int*)addr));
    break;
case is_double: // data is double
    printf("%f\n", *((double*)addr));
    break;
case is_char_p: // data is char*
    printf("%s\n", (char*)addr);
    break;
case is_char:   // data is char
    printf("%c\n", *((char*)addr));
    break;
}

If you can make sure that the data is 8-byte aligned (like for pointers in 64-bit systems, or long long and uint64_t...), you'll have one more bit for the tag.
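
The snippet above only shows the decoding side. A possible sketch of the helpers that create and take apart such a pointer follows; `ptr_tag`, `TAG_MASK`, `tag_ptr`, `get_tag` and `untag_ptr` are hypothetical names, and the code assumes that converting a pointer to uintptr_t and back round-trips, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

enum ptr_tag { TAG_INT, TAG_DOUBLE, TAG_CHAR_P, TAG_CHAR };  /* fits in 2 bits */

#define TAG_MASK ((uintptr_t)0x03)

/* Pack a tag into the 2 low bits; the pointer must be at least 4-byte aligned. */
void *tag_ptr(void *p, enum ptr_tag tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* the low bits must be free */
    return (void *)(bits | (uintptr_t)tag);
}

/* Read the tag back out of the 2 low bits. */
enum ptr_tag get_tag(void *tp) {
    return (enum ptr_tag)((uintptr_t)tp & TAG_MASK);
}

/* Clear the tag bits to recover a pointer that is safe to dereference. */
void *untag_ptr(void *tp) {
    return (void *)((uintptr_t)tp & ~TAG_MASK);
}
```

The assert in `tag_ptr` catches pointers that are not aligned enough, for example a pointer into the middle of a char buffer.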

This has one disadvantage: you'll need more memory if the data have not been stored in a variable elsewhere. Therefore, in case the type and range of your data are limited, you can store the values directly in the pointer. This technique has been used in the 32-bit version of Chrome's V8 engine, where it checks the least significant bit of the address to see if that's a pointer to another object (like a double, big integer, string or some other object) or a 31-bit signed value (called smi, for small integer). If it's an int, Chrome simply does an arithmetic right shift by 1 bit to get the value; otherwise the pointer is dereferenced.
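
A toy version of that idea, as a sketch only. The convention chosen here (low bit 1 means inline integer, low bit 0 means pointer) is an arbitrary choice for the example and not necessarily the one V8 uses; `value`, `box_int`, `unbox_int`, `box_ptr` and `unbox_ptr` are hypothetical names. It also relies on right-shifting a negative value being an arithmetic shift, which mainstream compilers provide but C leaves implementation-defined.

```c
#include <stdint.h>

typedef uintptr_t value;   /* one pointer-sized slot holding either kind */

/* Low bit set: the integer is stored inline. Low bit clear: it is a pointer. */
int is_small_int(value v) { return (int)(v & 1u); }

/* Encode: shift left by one and set the low bit.
   The usable range is one bit narrower than intptr_t. */
value box_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }

/* Decode: shift right by one, keeping the sign. */
intptr_t unbox_int(value v) { return (intptr_t)v >> 1; }

/* Pointers are stored unchanged; they must be at least 2-byte aligned. */
value box_ptr(void *p) { return (uintptr_t)p; }
void *unbox_ptr(value v) { return (void *)v; }
```

With this layout small integers cost no extra allocation at all, at the price of one bit of range.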

On most current 64-bit systems the virtual address space is still much narrower than 64 bits, hence the most significant bits can also be used as tags. Depending on the architecture, you have different ways to use those as tags. ARM, 68k and many others can be configured to ignore the top bits, allowing you to use them freely without worrying about segfaults or anything. From the Wikipedia article on tagged pointers (linked below):

A significant example of the use of tagged pointers is the Objective-C runtime on iOS 7 on ARM64, notably used on the iPhone 5S. In iOS 7, virtual addresses are 33 bits (byte-aligned), so word-aligned addresses only use 30 bits (3 least significant bits are 0), leaving 34 bits for tags. Objective-C class pointers are word-aligned, and the tag fields are used for many purposes, such as storing a reference count and whether the object has a destructor.

Early versions of MacOS used tagged addresses called Handles to store references to data objects. The high bits of the address indicated whether the data object was locked, purgeable, and/or originated from a resource file, respectively. This caused compatibility problems when MacOS addressing advanced from 24 bits to 32 bits in System 7.

https://en.wikipedia.org/wiki/Tagged_pointer#Examples

On x86_64 you can still use the high bits as tags with care. Of course you don't need to use all of those 16 bits and can leave some of them unused for future-proofing.

Prior versions of Mozilla Firefox also used small integer optimizations like V8, with the 3 low bits used to store the type (int, string, object, etc.). But since JägerMonkey they took another path (see "Mozilla's New JavaScript Value Representation"). The value is now always stored in a 64-bit double-precision variable. When the double is a normalized one, it can be used directly in calculations. However, if its high 16 bits are all 1s, which denotes a NaN, the low 32 bits store either the address of the value (on a 32-bit computer) or the value itself, and the remaining 16 bits store the type. This technique is called NaN-boxing or nun-boxing. It's also used in 64-bit WebKit's JavaScriptCore and Mozilla's SpiderMonkey, with the pointer being stored in the low 48 bits. If your main data type is floating-point, this is often the best solution and delivers very good performance.
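
To make the bit layout concrete, here is a much simplified sketch of the 32-bit variant described above: high 16 bits all ones mark a boxed value, the next 16 bits hold a type code, and the low 32 bits hold the payload. It is not how any real engine is written. The names `nanbox`, `BOX_MARK`, `box_type` and the helper functions are hypothetical, and the code assumes IEEE 754 doubles of 8 bytes and that ordinary NaNs produced by arithmetic never have all 16 high bits set (a real engine would canonicalize NaNs before boxing them).

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t nanbox;

#define BOX_MARK UINT64_C(0xFFFF000000000000)   /* high 16 bits all ones */

enum box_type { BOX_INT32 = 1, BOX_BOOL = 2 };  /* hypothetical type codes */

/* Doubles are stored as their own bit pattern. */
nanbox box_double(double d)   { nanbox b; memcpy(&b, &d, sizeof b); return b; }
double unbox_double(nanbox b) { double d; memcpy(&d, &b, sizeof d); return d; }

/* Anything without the all-ones marker is treated as a plain double. */
int is_double(nanbox b) { return (b & BOX_MARK) != BOX_MARK; }

/* Non-double: marker | type code in bits 32..47 | payload in the low 32 bits. */
nanbox box_int32(int32_t n) {
    return BOX_MARK | ((uint64_t)BOX_INT32 << 32) | (uint32_t)n;
}

int box_type_of(nanbox b)     { return (int)((b >> 32) & 0xFFFF); }
int32_t unbox_int32(nanbox b) { return (int32_t)(uint32_t)b; }
```

The appeal of the scheme is visible in `box_double`: floating-point values need no encoding or decoding work at all.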

Read more about the above techniques: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations
