How is a vector's data aligned?
Notice: this page is taken from a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8456236/
Asked by fredoverflow
If I want to process data in a std::vector with SSE, I need 16-byte alignment. How can I achieve that? Do I need to write my own allocator? Or does the default allocator already align to 16-byte boundaries?
Accepted answer by Maxim Egorushkin
The C++ standard requires allocation functions (malloc() and operator new()) to allocate memory suitably aligned for any standard type. As these functions don't receive the alignment requirement as an argument, in practice this means that the alignment of all allocations is the same: it is the alignment of the standard type with the largest alignment requirement, which is often long double and/or long long (see the boost max_align union).
Vector instructions such as SSE and AVX have stronger alignment requirements (16-byte alignment for 128-bit access and 32-byte alignment for 256-bit access) than the standard C++ allocation functions provide. posix_memalign() or memalign() can be used to satisfy allocations with such stronger alignment requirements.
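As a rough illustration of the posix_memalign() route, here is a minimal sketch for POSIX systems only (the function is not available on Windows). The helper names alloc_aligned_floats and is_aligned are made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX only)

// Hypothetical helper: returns `count` floats whose address is a multiple
// of `alignment`, or nullptr if the allocation fails.
float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    // posix_memalign requires a power of two that is a multiple of sizeof(void*).
    assert(alignment % sizeof(void*) == 0 && (alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0) {
        return nullptr;
    }
    return static_cast<float*>(p);
}

// Hypothetical helper: true when p is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Memory obtained this way is released with plain free(). Passing 16 as the alignment would suit 128-bit SSE access, and 32 would suit 256-bit AVX access.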
Answer by user1071136
You should use a custom allocator with std:: containers, such as vector. I can't remember who wrote the following one, but I have used it for some time and it seems to work (you might have to change _aligned_malloc to _mm_malloc, depending on compiler/platform):
#ifndef ALIGNMENT_ALLOCATOR_H
#define ALIGNMENT_ALLOCATOR_H

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <new>       // placement new, std::bad_alloc
#include <stdlib.h>
#include <malloc.h>  // _aligned_malloc, _aligned_free

template <typename T, std::size_t N = 16>
class AlignmentAllocator {
public:
    typedef T value_type;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    typedef T * pointer;
    typedef const T * const_pointer;

    typedef T & reference;
    typedef const T & const_reference;

public:
    inline AlignmentAllocator () throw () { }

    template <typename T2>
    inline AlignmentAllocator (const AlignmentAllocator<T2, N> &) throw () { }

    inline ~AlignmentAllocator () throw () { }

    inline pointer address (reference r) {
        return &r;
    }

    inline const_pointer address (const_reference r) const {
        return &r;
    }

    // Allocates storage for n objects, aligned to N bytes.
    inline pointer allocate (size_type n) {
        void * p = _aligned_malloc(n * sizeof(value_type), N);
        if (!p)
            throw std::bad_alloc();  // allocators report failure by throwing
        return static_cast<pointer>(p);
    }

    inline void deallocate (pointer p, size_type) {
        _aligned_free (p);
    }

    inline void construct (pointer p, const value_type & wert) {
        new (p) value_type (wert);
    }

    inline void destroy (pointer p) {
        p->~value_type ();
    }

    inline size_type max_size () const throw () {
        return size_type (-1) / sizeof (value_type);
    }

    template <typename T2>
    struct rebind {
        typedef AlignmentAllocator<T2, N> other;
    };

    bool operator!=(const AlignmentAllocator<T,N>& other) const {
        return !(*this == other);
    }

    // Returns true if and only if storage allocated from *this
    // can be deallocated from other, and vice versa.
    // Always returns true for stateless allocators.
    bool operator==(const AlignmentAllocator<T,N>& other) const {
        return true;
    }
};

#endif
Use it like this (change the 16 to another alignment, if needed):
std::vector<T, AlignmentAllocator<T, 16> > bla;
This, however, only makes sure that the memory block std::vector uses is 16-byte aligned. If sizeof(T) is not a multiple of 16, some of your elements will not be aligned. Depending on your data type, this might be a non-issue. If T is int (4 bytes), only issue aligned loads at elements whose index is a multiple of 4. If it's double (8 bytes), only at multiples of 2, etc.
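The index rule can be checked directly by inspecting element addresses. The following is only a sketch; aligned_indices is a hypothetical helper written for this illustration, and it assumes the block itself starts on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the indices i in [0, count) for which
// &base[i] falls on a 16-byte boundary.
template <typename T>
std::vector<std::size_t> aligned_indices(const T* base, std::size_t count) {
    // Precondition: the block itself is 16-byte aligned.
    assert(reinterpret_cast<std::uintptr_t>(base) % 16 == 0);
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < count; ++i) {
        if (reinterpret_cast<std::uintptr_t>(base + i) % 16 == 0) {
            result.push_back(i);
        }
    }
    return result;
}
```

For eight 4-byte integers in a 16-byte aligned block this yields indices 0 and 4. For four 8-byte doubles it yields 0 and 2.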
The real issue arises if you use classes as T, in which case you will have to specify your alignment requirements in the class itself (again, this might differ depending on the compiler; the example is for GCC):
class __attribute__ ((aligned (16))) Foo {
__attribute__ ((aligned (16))) double u[2];
};
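On compilers that support C++11, the standard alignas specifier expresses the same requirement portably. This is a sketch of that alternative rather than what the answer itself used; all_elements_aligned is a hypothetical helper added only to demonstrate the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable C++11 spelling of the GCC attribute shown above.
struct alignas(16) Foo {
    double u[2];
};

static_assert(alignof(Foo) == 16, "Foo must be 16-byte aligned");
static_assert(sizeof(Foo) % 16 == 0, "consecutive Foo objects stay aligned");

// Hypothetical helper: checks that every element of an array of Foo
// sits on a boundary of alignof(Foo) bytes.
bool all_elements_aligned(const Foo* arr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (reinterpret_cast<std::uintptr_t>(arr + i) % alignof(Foo) != 0) {
            return false;
        }
    }
    return true;
}
```

Because sizeof(Foo) is padded to a multiple of its alignment, every element of an array of Foo is aligned once the first one is.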
We're almost done! If you use Visual C++ (at least, version 2010), you won't be able to use a std::vector with classes whose alignment you specified, because of std::vector::resize.
When compiling, if you get the following error:
C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector(870):
error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned
You will have to hack your vector header file:
- Locate the vector header file [C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector]
- Locate the void resize( _Ty _Val ) method [line 870 on VC2010]
- Change it to void resize( const _Ty& _Val ).
Answer by Dev Null
Instead of writing your own allocator, as suggested before, you can use boost::alignment::aligned_allocator for std::vector like this:
#include <vector>
#include <boost/align/aligned_allocator.hpp>
template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;
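Where Boost is not available, a similar allocator can be sketched with the aligned operator new that C++17 added to the standard library. The name AlignedAllocator17 and the is_aligned helper are made up for this example; treat it as a minimal sketch assuming a C++17 compiler, not as a replacement for the Boost implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal allocator: storage is aligned to at least N bytes.
template <typename T, std::size_t N = 16>
struct AlignedAllocator17 {
    static_assert((N & (N - 1)) == 0, "alignment must be a power of two");
    using value_type = T;
    // Never request less than T itself needs.
    static constexpr std::size_t kAlign = N > alignof(T) ? N : alignof(T);

    AlignedAllocator17() noexcept = default;
    template <typename U>
    AlignedAllocator17(const AlignedAllocator17<U, N>&) noexcept {}

    // Needed explicitly because N is a non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator17<U, N>; };

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t(kAlign)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(kAlign));
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U, std::size_t N>
bool operator==(const AlignedAllocator17<T, N>&, const AlignedAllocator17<U, N>&) noexcept { return true; }
template <typename T, typename U, std::size_t N>
bool operator!=(const AlignedAllocator17<T, N>&, const AlignedAllocator17<U, N>&) noexcept { return false; }

// Hypothetical helper: true when p is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A vector declared as std::vector<float, AlignedAllocator17<float, 32>> should then keep its data() pointer 32-byte aligned across reallocations, since every reallocation goes through allocate().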
Answer by moose
Write your own allocator. allocate and deallocate are the important ones. Here is one example:
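// Members of an allocator class template over T.
// Requires <cstdlib>, <cstddef>, <cstdint> and <new>.
pointer allocate( size_type size, const void * pBuff = 0 )
{
    // size is a number of elements; 16 extra bytes leave room for the padding
    if( size > ( static_cast<size_type>( -1 ) - 16 ) / sizeof( T ) )
        throw std::bad_alloc();

    char * p = (char*)malloc( size * sizeof( T ) + 16 );
    if( !p )
        throw std::bad_alloc();

    // distance to the next 16-byte boundary, always in the range 1..16
    // (uintptr_t instead of int, so the cast also works with 64-bit pointers)
    std::size_t difference = 16 - ( (std::uintptr_t)p & 15 );
    p += difference;
    p[ -1 ] = (char)difference;  // remember the offset just before the block
    return (T*)p;
}

void deallocate( pointer p, size_type num )
{
    char * pBuffer = (char*)p;
    // step back by the stored offset to recover the pointer malloc returned
    free( (void*)( pBuffer - pBuffer[ -1 ] ) );
}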
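The same over-allocate-and-offset idea can be tried outside an allocator class. The sketch below is a variation, not the answer's code: it takes the alignment as a parameter and stores the original malloc pointer (rather than a one-byte offset) just before the aligned block. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns `bytes` of storage aligned to `alignment`
// (a power of two), or nullptr on failure.
void* aligned_malloc_sketch(std::size_t bytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Room for the worst-case padding plus the saved original pointer.
    const std::size_t extra = alignment + sizeof(void*);
    if (bytes > static_cast<std::size_t>(-1) - extra) return nullptr;

    void* raw = std::malloc(bytes + extra);
    if (!raw) return nullptr;

    // First address past the saved pointer, rounded up to the alignment.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    void* result = reinterpret_cast<void*>(aligned);

    // Save the malloc pointer immediately before the aligned block.
    std::memcpy(static_cast<char*>(result) - sizeof(void*), &raw, sizeof raw);
    return result;
}

// Hypothetical helper: releases a block from aligned_malloc_sketch.
void aligned_free_sketch(void* p) {
    if (!p) return;
    void* raw = nullptr;
    std::memcpy(&raw, static_cast<char*>(p) - sizeof(void*), sizeof raw);
    std::free(raw);
}
```

Storing the full pointer costs a few more bytes than the one-byte offset, but it removes the limit on how large the alignment can be.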
Answer by Martin York
Short Answer:
If sizeof(T)*vector.size() > 16 then yes, assuming your vector uses the normal allocators.
Caveat: this holds only as long as alignof(std::max_align_t) >= 16, as this is the maximum alignment.
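Whether the caveat holds on a given platform can be checked rather than assumed. The sketch below uses the C++17 macro __STDCPP_DEFAULT_NEW_ALIGNMENT__ when the compiler defines it and falls back to alignof(std::max_align_t) otherwise. The names kDefaultNewAlignment, default_new_covers, is_aligned and vector_data_is_aligned are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Alignment that plain operator new is expected to honor on this platform.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t kDefaultNewAlignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t kDefaultNewAlignment = alignof(std::max_align_t);
#endif

// Hypothetical helper: true when the default allocation functions
// are expected to satisfy `alignment` without any custom allocator.
constexpr bool default_new_covers(std::size_t alignment) {
    return alignment <= kDefaultNewAlignment;
}

// Hypothetical helper: true when p is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Runtime spot check on a vector that uses the default allocator.
bool vector_data_is_aligned(std::size_t alignment) {
    std::vector<float> v(64);
    return is_aligned(v.data(), alignment);
}
```

If default_new_covers(16) is false on the target, or if 32-byte alignment is needed for AVX, a custom aligned allocator is the safer choice.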
Long Answer:
Updated 25/Aug/2017 new standard n4659
If it is aligned for anything that is greater than 16 it is also aligned correctly for 16.
6.11 Alignment (Paragraph 4/5)
Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.
Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.
new and new[] return values that are aligned so that objects are correctly aligned for their size:
8.3.4 New (paragraph 17)
[ Note: when the allocation function returns a value other than null, it must be a pointer to a block of storage in which space for the object has been reserved. The block of storage is assumed to be appropriately aligned and of the requested size. The address of the created object will not necessarily be the same as that of the block if the object is an array. — end note ]
Note most systems have a maximum alignment. Dynamically allocated memory does not need to be aligned to a value greater than this.
6.11 Alignment (paragraph 2)
A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (21.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject.
Thus, as long as the memory your vector allocates is larger than 16 bytes, it will be correctly aligned on 16-byte boundaries.
Answer by kiriloff
Use __declspec(align(x,y)) as explained in Intel's vectorization tutorial: http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf
Answer by Mario
Don't assume anything about STL containers. Their interface/behaviour is defined, but not what's behind them. If you need raw access, you'll have to write your own implementation that follows the rules you'd like to have.
Answer by Puppy
The Standard mandates that new and new[] return data aligned for any data type, which should include SSE. Whether or not MSVC actually follows that rule is another question.