C++: Alignment along 4-byte boundaries

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1237963/

Date: 2020-08-27 19:16:39  Source: igfitidea

Alignment along 4-byte boundaries

Tags: c++, cpu, alignment, internals

Asked by Tony the Pony

I recently got thinking about alignment... It's something that we don't ordinarily have to consider, but I've realized that some processors require objects to be aligned along 4-byte boundaries. What exactly does this mean, and which specific systems have alignment requirements?

Suppose I have an arbitrary pointer:

unsigned char* ptr

Now, I'm trying to retrieve a double value from a memory location:

double d = *((double*)ptr);

Is this going to cause problems?

Answer by laalto

It can definitely cause problems on some systems.

For example, on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though performance suffers a little since two words have to be fetched from memory instead of just one.

Answer by Tamas Czinege

Here's what the Intel x86/x64 Reference Manual says about alignment:

4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords

Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, double words, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason for this is that the processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.
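The manual's notion of an operand "crossing" a boundary can be sketched in a few lines. This is only an illustration of the arithmetic, not anything from the Intel manual itself; the function name `crosses_boundary` and its parameters are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if an operand of `size` bytes starting at
// address `addr` spans more than one block of `boundary` bytes, i.e. its first
// and last byte fall into different blocks.
inline bool crosses_boundary(std::uintptr_t addr, std::size_t size,
                             std::size_t boundary) {
    assert(size != 0 && boundary != 0);
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For instance, a 2-byte word at address 3 occupies bytes 3 and 4, so it crosses the 4-byte boundary at address 4, while the same word at address 2 does not.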

Don't forget, reference manuals are the ultimate source of information for the responsible developer and engineer, so if you're dealing with something well documented such as Intel CPUs, just look up what the reference manual says about the issue.

Answer by jalf

Yes, that can cause a number of problems. The C++ standard doesn't actually guarantee that it'll work. You can't just arbitrarily cast between pointer types.

When you cast a char pointer to a double pointer, it uses a reinterpret_cast, which applies an implementation-defined mapping. You're not guaranteed that the resulting pointer will contain the same bit pattern, or that it will point to the same address or, well, anything else. In more practical terms, you're also not guaranteed that the value you're reading is aligned properly. If the data was written as a series of chars, then they will use char's alignment requirements.

As for what alignment means, it is essentially just that the starting address of the value should be divisible by the alignment size. Address 16 is aligned on 1-, 2-, 4-, 8- and 16-byte boundaries, for example, so on typical CPUs, values of these sizes can be stored there.

Address 6 isn't aligned on a 4-byte boundary, so we should not store 4-byte values there.
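The divisibility rule can be written down directly. The following is a minimal sketch; the helper names `is_aligned_address` and `is_aligned_pointer` are invented for illustration, and the pointer version assumes that converting a pointer to `std::uintptr_t` yields its numeric address, which is implementation-defined but holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `address` is a multiple of `alignment` (which must be non-zero).
inline bool is_aligned_address(std::uintptr_t address, std::size_t alignment) {
    assert(alignment != 0);
    return address % alignment == 0;
}

// Same check for a pointer. Assumption: the integer value of the pointer is
// its address (implementation-defined, but true on common platforms).
inline bool is_aligned_pointer(const void *p, std::size_t alignment) {
    return is_aligned_address(reinterpret_cast<std::uintptr_t>(p), alignment);
}
```

With this, address 16 passes the check for 1, 2, 4, 8 and 16, while address 6 passes for 2 but fails for 4.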

It's worth noting that even on CPUs that don't enforce or require alignment, you typically still get a significant slowdown from accessing unaligned values.

Answer by Martin Liversage

Alignment affects the layout of structs. Consider this struct:

struct S {
  char a;
  long b;
};

On a 32-bit CPU the layout of this struct will often be:

a _ _ _ b b b b

The requirement is that a 32-bit value has to be aligned on a 32-bit boundary. If the struct is changed like this:

struct S {
  char a;
  short b;
  long c;
};

the layout will be this:

a _ b b c c c c

The 16-bit value is aligned on a 16-bit boundary.
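If you want to see what your own compiler does, `offsetof`, `sizeof` and `alignof` report the layout. The sketch below uses fixed-width types instead of `short` and `long` so the member sizes don't depend on the platform; the struct name `Example` and the helper `offset_is_aligned` are hypothetical. The exact offsets and total size are up to the implementation, so the helper only checks the property that must hold: each member's offset is a multiple of its alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same shape as the second struct above, with fixed-width members.
struct Example {
    char a;
    std::int16_t b;
    std::int32_t c;
};

// True if a member placed at `offset` satisfies `alignment` (non-zero).
inline bool offset_is_aligned(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

On a typical compiler you would expect `offsetof(Example, b)` to be 2 and `offsetof(Example, c)` to be 4, matching the picture above, but that is an expectation rather than a guarantee.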

Sometimes you want to pack the structs, perhaps to match the struct with a data format. By using a compiler option or perhaps a #pragma you are able to remove the excess space:

a b b b b
a b b c c c c

However, accessing an unaligned member of a packed struct will often be much slower on modern CPUs, or may even result in an exception.
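As a rough sketch of what packing looks like in practice: `#pragma pack` is a compiler extension rather than standard C++, though GCC, Clang and MSVC all accept the `push`/`pop` form used here. The names `PackedRecord`, `check_packed_layout` and `read_b` are made up for this example. Copying the member out with `std::memcpy` avoids ever forming a misaligned `std::int32_t` pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compiler extension (assumed available): remove padding between members.
#pragma pack(push, 1)
struct PackedRecord {
    char a;
    std::int32_t b;  // now sits at offset 1, which is not 4-byte aligned
};
#pragma pack(pop)

// Sanity check of the layout the pragma is expected to produce.
inline void check_packed_layout() {
    assert(sizeof(PackedRecord) == 5);
    assert(offsetof(PackedRecord, b) == 1);
}

// Copies the unaligned member out byte by byte instead of dereferencing
// a misaligned std::int32_t pointer.
inline std::int32_t read_b(const PackedRecord &r) {
    std::int32_t value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char *>(&r) +
                    offsetof(PackedRecord, b),
                sizeof value);
    return value;
}
```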

Answer by Steve Jessop

Yes, that could cause problems.

4-alignment simply means that the pointer, when considered as a numeric address, is a multiple of 4. If the pointer is not a multiple of the required alignment, then it is unaligned. There are two reasons why compilers place alignment restrictions on certain types:

  1. Because the hardware cannot load that datatype from an unaligned pointer (at least, not using the instructions which the compiler wants to emit for loads and stores).
  2. Because the hardware loads that datatype more quickly from aligned pointers.

If you're in case (1), and double is 4-aligned, and you try your code with a char * pointer which is not 4-aligned, then you'll most likely get a hardware trap. Some hardware does not trap. It just loads a nonsense value and continues. However, the C++ standard doesn't define what can happen (undefined behavior), so this code could set your computer on fire.

On x86, you're never in case (1), because the standard load instructions can handle unaligned pointers. On ARM, there are no unaligned loads, and if you attempt one then your program crashes (if you're lucky; some ARMs silently fail).

Coming back to your example, the question is why you're trying this with a char * that isn't 4-aligned. If you successfully wrote a double there via a double *, then you'll be able to read it back. So if you originally had a "proper" pointer to double, which you cast to char * and you're now casting back, you don't have to worry about alignment.

But you said arbitrary char *, so I guess that's not what you have. If you read a chunk of data out of a file, which contains a serialized double, then you must ensure that the alignment requirements for your platform are met in order to do this cast. If you have 8 bytes representing a double in some file format, then you cannot just read it willy-nilly into a char* buffer at any offset and then cast to double *.

The easiest way to do this is to make sure that you read the file data into a suitable struct. You're also helped by the fact that memory allocations are always aligned to the maximum alignment requirement of any type they're big enough to contain. So if you allocate a buffer big enough to contain a double, then the start of that buffer has whatever alignment is required by double. So then you can read the 8 bytes representing the double into the start of the buffer, cast (or use a union) and read the double out.
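The allocation guarantee is easy to check for yourself. This is a small sketch with a made-up function name, `malloc_is_aligned_for_double`; it assumes that converting the pointer to `std::uintptr_t` gives its numeric address, which is implementation-defined but true on common platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Allocates room for one double and reports whether the start of the
// buffer satisfies double's alignment requirement.
inline bool malloc_is_aligned_for_double() {
    void *buffer = std::malloc(sizeof(double));
    assert(buffer != nullptr);  // sketch only: no real error handling
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(buffer) % alignof(double) == 0;
    std::free(buffer);
    return aligned;
}
```

Note that the guarantee only covers the start of the allocation. An arbitrary offset into the same buffer may still be misaligned for double.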

Alternatively, you could do something like this:

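#include <algorithm>  // std::copy
#include <cstring>    // std::memcpy

double readUnalignedDouble(const char *un_ptr) {
    double d;
    // Either of these works; only one is needed.
    std::memcpy(&d, un_ptr, sizeof(d));
    // std::copy(un_ptr, un_ptr + sizeof(d), reinterpret_cast<char *>(&d));
    return d;
}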

This is guaranteed to be valid (assuming un_ptr really points to the bytes of a valid double representation for your platform), because double is POD and hence can be copied byte-by-byte. It may not be the fastest solution, if you have a lot of doubles to load.
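To make the byte-by-byte idea concrete, here is a small round-trip sketch. The helpers `store_double` and `load_double` are hypothetical names for this example, and both sides assume the same platform, so the stored bytes are a valid double representation when read back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Writes the bytes of `value` at `buffer + offset`, which may be unaligned.
inline void store_double(unsigned char *buffer, std::size_t offset,
                         double value) {
    assert(buffer != nullptr);
    std::memcpy(buffer + offset, &value, sizeof value);
}

// Reads a double back from `buffer + offset` without requiring alignment.
inline double load_double(const unsigned char *buffer, std::size_t offset) {
    assert(buffer != nullptr);
    double value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

Storing at offset 1 of a byte buffer and loading from the same offset gives back the original value, even though that address is almost certainly not suitably aligned for a double.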

If you are reading from a file, there's actually a bit more to it than that, if you're worried about platforms with non-IEEE double representations, or with 9-bit bytes, or some other unusual properties, where there might be non-value bits in the stored representation of a double. But you didn't actually ask about files, I just made it up as an example, and in any case those platforms are much rarer than the issue you're asking about, which is for double to have an alignment requirement.

Finally, nothing at all to do with alignment, you also have strict aliasing to worry about if you got that char * via a cast from a pointer which is not alias-compatible with double *. Aliasing is valid between char * itself and anything else, though.

Answer by pngaz

On the x86 it's always going to run, of course more efficiently when aligned.

But if you're multithreading, then watch for read-write tearing. With a 64-bit value you need an x64 machine to give you atomic reads and writes between threads.
Say another thread reads the value while it is being incremented from 0x00000000.FFFFFFFF to 0x00000001.00000000: that thread might in theory read either 0 or 0x00000001.FFFFFFFF, especially if the value straddles a cache-line boundary.
I recommend Duffy's "Concurrent Programming on Windows" for its nice discussion of memory models; it even mentions alignment gotchas on multiprocessors when .NET does a GC. You want to stay away from the Itanium!
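In portable C++ (C++11 and later), one way to avoid torn 64-bit reads is `std::atomic`, which is not something the answer itself mentions. A minimal sketch follows, with a made-up `Counter` type. Whether the operations are lock-free depends on the target; on 32-bit platforms the library may fall back to a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit counter whose reads and writes cannot tear.
struct Counter {
    std::atomic<std::uint64_t> value{0};

    void increment() {
        const std::uint64_t previous = value.fetch_add(1);
        assert(previous != UINT64_MAX);  // sketch assumes no wrap-around
        (void)previous;                  // silence unused warning with NDEBUG
    }

    std::uint64_t read() const { return value.load(); }
};
```

A reader calling `read()` while another thread calls `increment()` sees either the old or the new value in full, never a mix of the two halves.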

Answer by JDonner

SPARC (Solaris machines) is another architecture (at least some in times past) that will choke (give a SIGBUS error) if you try to use an unaligned value.

As an addendum to Martin York: malloc is also aligned for the largest possible type, i.e. it's safe for everything, like 'new'. In fact, 'new' frequently just uses malloc.

Answer by zebrabox

Enforced memory alignment is much more common in RISC-based architectures such as MIPS.
The main thinking for these types of processors, AFAIK, is really a speed issue.
RISC methodology was all about having a set of simple and fast instructions (usually one memory cycle per instruction). This does not necessarily mean that it has fewer instructions than a CISC processor, more that it has simpler, faster instructions.
Many MIPS processors, although 8 byte addressable, would be word aligned (32 bits typically, but not always) and then mask off the appropriate bits.
The idea is that it is faster to do an aligned load plus a bit mask than to try an unaligned load. Typically (and of course this really depends on the chipset), doing an unaligned load would generate a bus error, so RISC processors would offer an 'unaligned load/store' instruction, but this would often be much slower than the corresponding aligned load/store.
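The "aligned load plus mask" idea looks roughly like this in software. This is only a sketch of the arithmetic, not of what any particular MIPS core does in hardware; `extract_byte` is a hypothetical name, and byte 0 is taken to be the least significant byte of the word's value, independent of the machine's byte order in memory.

```cpp
#include <cassert>
#include <cstdint>

// Given a 32-bit word that was loaded from an aligned address, pick out
// byte `index` (0 = least significant) by shifting and masking.
inline std::uint8_t extract_byte(std::uint32_t word, unsigned index) {
    assert(index < 4);
    return static_cast<std::uint8_t>((word >> (index * 8)) & 0xFFu);
}
```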

Of course this still doesn't answer the question as to why they do this, i.e. what advantage does having memory word aligned give you? I'm no hardware expert and I'm sure someone on here can give a better answer, but my two best guesses are:
1. It can be much faster to fetch from the cache when word aligned, because many caches are organised into cache lines (anything from 8 to 512 bytes), and as cache memory is typically much more expensive than RAM, you want to make the most of it.
2. It may be much faster to access each memory address, as it allows you to read through 'Burst Mode' (i.e. fetching the next sequential address before it's needed).

Note that none of the above is strictly impossible with non-aligned stores; I'm guessing (though I don't know) that a lot of it comes down to hardware design choices and cost.

Answer by Artur Soler

An example of an alignment requirement is when using vectorization (SIMD) instructions. (They can be used without alignment, but are much faster if you use a kind of instruction which requires alignment.)
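In standard C++11 and later you can request the stricter alignment that such instructions want with `alignas`. The sketch below uses no actual SIMD intrinsics, to stay portable; the type `Float4` and the helpers are hypothetical, and the 16-byte figure is an assumption that matches the double-quadword case in the Intel quote above.

```cpp
#include <cassert>
#include <cstdint>

// Four floats forced onto a 16-byte boundary, the natural boundary for a
// 128-bit vector register.
struct alignas(16) Float4 {
    float lanes[4];
};

// Assumption: the integer value of the pointer is its address.
inline bool is_16_byte_aligned(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Plain scalar sum; a vectorizing compiler may use aligned loads here.
inline float sum(const Float4 &v) {
    assert(is_16_byte_aligned(&v));
    return v.lanes[0] + v.lanes[1] + v.lanes[2] + v.lanes[3];
}
```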
