What is the recommended way to align memory in C++11

Source: Stack Overflow, http://stackoverflow.com/questions/20791428/ (licensed under CC BY-SA 4.0; if you reuse this content you must keep the same license and attribute it to the original authors).

Tags: c++, c++11, memory-alignment

Asked by Rajiv

I am working on a single-producer, single-consumer ring buffer implementation. I have two requirements:

1) Align a single heap allocated instance of a ring buffer to a cache line.

2) Align a field within a ring buffer to a cache line (to prevent false sharing).

My class looks something like:

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

Let me first tackle point 1, i.e. aligning a single heap-allocated instance of the class. There are a few ways:

1) Use the C++11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class alignas(CACHE_LINE_SIZE) RingBuffer {
public:
  ....

private:
  // All the private fields.

};

2) Use posix_memalign(..) + placement new(..) without altering the class definition. This has the drawback of not being platform independent:

 void* buffer;
 if (posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>)) != 0) {
   perror("posix_memalign did not work!");
   abort();
 }
 // Use placement new on a cache aligned buffer.
 auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();

3) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {
public:
  ....

private:
  // All the private fields.

} __attribute__ ((aligned(CACHE_LINE_SIZE)));

4) I tried to use the standardized aligned_alloc(..) function (standardized in C11, not C++11) instead of posix_memalign(..), but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h.
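
For option 2 above, the snippet in the question stops after the placement new. A minimal sketch of the full lifecycle might look like the following, assuming a POSIX system; `Payload`, `make_aligned_payload` and `destroy_aligned_payload` are hypothetical names standing in for the real ring buffer type. Note that posix_memalign(..) reports failure through its return value rather than errno.

```cpp
#include <cstdint>
#include <new>       // placement new
#include <stdlib.h>  // posix_memalign, free (POSIX only)

// Hypothetical type standing in for RingBuffer<int, N>.
struct Payload {
    std::int64_t value = 42;
};

// Allocates a 64-byte-aligned Payload; returns nullptr on failure.
inline Payload* make_aligned_payload() {
    void* raw = nullptr;
    if (posix_memalign(&raw, 64, sizeof(Payload)) != 0) {
        return nullptr;
    }
    return new (raw) Payload();
}

// There is no delete expression that matches placement new into
// posix_memalign storage: run the destructor by hand, then free().
inline void destroy_aligned_payload(Payload* p) {
    if (p != nullptr) {
        p->~Payload();
        free(p);
    }
}
```

Calling `delete` on such a pointer would be wrong, since the storage did not come from operator new.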

Are all of these guaranteed to do the same thing? My goal is cache-line alignment, so any method that has some limit on alignment (say, double word) will not do. Platform independence, which would point to using the standardized alignas(..), is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit that could be below the cache line size on the machine. I can't reproduce this any more, but while printing addresses I think I did not always get 64-byte-aligned addresses with alignas(..). By contrast, posix_memalign(..) seemed to always work. Again, I cannot reproduce this any more, so maybe I was making a mistake.
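
One way to check what alignas(..) actually delivers on a given compiler is to assert on the type and then inspect real addresses. The sketch below does that for objects with static and automatic storage only (it says nothing about objects created with a plain new expression); `AlignedCounter`, `kCacheLineSize` and the helper functions are hypothetical, and 64 is an assumed cache-line size.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumed cache-line size

// Hypothetical over-aligned type.
struct alignas(kCacheLineSize) AlignedCounter {
    std::int64_t value;
};

// alignas raises the alignment requirement of the type, and sizeof is
// padded to a multiple of it so that array elements stay aligned.
static_assert(alignof(AlignedCounter) == kCacheLineSize, "unexpected alignment");
static_assert(sizeof(AlignedCounter) % kCacheLineSize == 0, "unexpected size");

// Returns true if p sits on a cache-line boundary.
inline bool is_cache_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLineSize == 0;
}

// Checks one object with static storage and one with automatic storage.
inline bool static_and_stack_are_aligned() {
    static AlignedCounter on_static{0};
    AlignedCounter on_stack{0};
    return is_cache_aligned(&on_static) && is_cache_aligned(&on_stack);
}
```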

The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways:

1) Use the C++11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
};

2) Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
};

Both these methods seem to align consumer_sequence_ to an address 64 bytes after the beginning of the object, so whether consumer_sequence_ is cache aligned depends on whether the object itself is cache aligned. Here my question is: are there any better ways to do the same?
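
The relationship between the member's offset and the alignment of the enclosing object can be made explicit. In this sketch (with a hypothetical `Sequences` struct that mirrors the fields in the question), the over-aligned member also raises the alignment requirement of the whole type, so an object that honours `alignof(Sequences)` has its member on a cache-line boundary.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical struct mirroring the fields in the question.
struct Sequences {
    std::atomic<std::int64_t> publisher_sequence{0};
    std::int64_t cached_consumer_sequence = 0;
    alignas(64) std::atomic<std::int64_t> consumer_sequence{0};
};

// An over-aligned member makes the enclosing type over-aligned as well.
static_assert(alignof(Sequences) == 64, "member alignment should propagate");
static_assert(sizeof(Sequences) % 64 == 0, "size should be padded to 64");

// Byte offset of consumer_sequence from the start of the object.
inline std::size_t consumer_offset(const Sequences& s) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&s.consumer_sequence) -
        reinterpret_cast<const char*>(&s));
}
```

If the object is placed in storage that is only 8- or 16-byte aligned, the offset is still a multiple of 64 but the absolute address of the member is not.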

EDIT: The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me since I cannot require such a recent version of eglibc/glibc.
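
Where a new enough C library is available, a call to aligned_alloc(..) could be sketched as below. C11 requires the size to be a multiple of the alignment, so the hypothetical helper `round_up` pads the request first; `cache_aligned_alloc` is likewise a made-up name, and 64 is an assumed cache-line size.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // aligned_alloc (C11; glibc 2.16 or later), free

// Rounds size up to the next multiple of alignment (a power of two).
inline std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Returns 64-byte-aligned storage, or a null pointer on failure.
// Release it with free().
inline void* cache_aligned_alloc(std::size_t size) {
    const std::size_t alignment = 64;  // assumed cache-line size
    return aligned_alloc(alignment, round_up(size, alignment));
}
```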

Accepted answer by Glenn Teitelbaum

Unfortunately the best I have found is allocating extra space and then using the "aligned" part. So the RingBuffer operator new can request an extra 64 bytes and then return the first 64-byte-aligned part of that. It wastes space but will give the alignment you need. You will likely need to store the actual allocation address in the memory just before what is returned, so that it can be used to deallocate the block.

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritance from RingBuffer) something like:

void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     static const size_t request_size = sizeof(RingBuffer)+align_size;
     static const size_t needed = ptr_alloc+request_size;

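     void * alloc = ::operator new(needed);
     // std::align (from <memory>) takes the pointer and the space by
     // reference and updates both, so they must be modifiable lvalues;
     // arithmetic on void * is also not allowed, hence the cast.
     void * ptr = static_cast<char *>(alloc) + ptr_alloc;
     size_t space = request_size;
     ptr = std::align(align_size, sizeof(RingBuffer), ptr, space);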

     ((void **)ptr)[-1] = alloc; // save for delete calls to use
     return ptr;  
}

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // 0 is valid, but a noop, so prevent passing negative memory
    {
           void * alloc = ((void **)ptr)[-1];
           ::operator delete (alloc);
    }
}
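
A self-contained variant of the same over-allocation idea is sketched below, so it can be compiled and checked in isolation. `Aligned64` is a hypothetical class, not the asker's RingBuffer, and the 64-byte figure is the assumed cache-line size.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>  // std::align
#include <new>     // ::operator new, ::operator delete

// Hypothetical class whose heap instances start on a 64-byte boundary.
struct Aligned64 {
    std::int64_t payload[4];

    static void* operator new(std::size_t request) {
        const std::size_t align_size = 64;
        // Room for the object plus worst-case alignment slack; one extra
        // pointer-sized slot is added in the allocation below.
        std::size_t space = request + align_size;
        void* alloc = ::operator new(sizeof(void*) + space);

        // Skip the slot reserved for the back-pointer, then align upwards.
        void* ptr = static_cast<char*>(alloc) + sizeof(void*);
        ptr = std::align(align_size, request, ptr, space);

        // Remember the real allocation address just below the object.
        static_cast<void**>(ptr)[-1] = alloc;
        return ptr;
    }

    static void operator delete(void* ptr) {
        if (ptr != nullptr) {
            ::operator delete(static_cast<void**>(ptr)[-1]);
        }
    }
};
```

With this in place, an ordinary `new Aligned64()` and `delete` pair goes through the two member functions, at the cost of up to 64 + sizeof(void*) wasted bytes per object.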

For the second requirement of having a data member of RingBuffer also 64-byte aligned: if you know that the start of this is aligned, you can pad to force the alignment for data members.
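
The padding approach could be sketched as follows, assuming a 64-byte cache line. `PaddedSequences`, `kLine` and `padded_consumer_offset` are hypothetical names, and the layout only helps if the object itself starts on a cache-line boundary.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLine = 64;  // assumed cache-line size

// Hypothetical layout: explicit padding pushes the second counter onto
// the next cache line relative to the start of the object.
struct PaddedSequences {
    std::atomic<std::int64_t> publisher_sequence{0};
    char pad[kLine - sizeof(std::atomic<std::int64_t>)];
    std::atomic<std::int64_t> consumer_sequence{0};
};

// Byte offset of consumer_sequence from the start of the object.
inline std::size_t padded_consumer_offset(const PaddedSequences& s) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&s.consumer_sequence) -
        reinterpret_cast<const char*>(&s));
}
```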

Answer by rubenvb

The answer to your problem is std::aligned_storage. It can be used top level and for individual members of a class.
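
A minimal sketch of what this could look like for a single object is shown below. `Event`, `EventSlot` and `construct_in` are hypothetical names; support for a 64-byte extended alignment in std::aligned_storage is implementation-defined, the type is deprecated as of C++23, and it does not by itself change how a plain new expression aligns heap storage.

```cpp
#include <cstdint>
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage

// Hypothetical payload type.
struct Event {
    std::int64_t id;
};

// Raw, uninitialized storage large enough for one Event and aligned
// to 64 bytes.
using EventSlot = std::aligned_storage<sizeof(Event), 64>::type;

// Constructs an Event inside the slot; the caller is responsible for
// calling the destructor explicitly when done.
inline Event* construct_in(EventSlot& slot, std::int64_t id) {
    return new (&slot) Event{id};
}
```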

Answer by Rajiv

After some more research my thoughts are:

1) Like @TemplateRex pointed out, there does not seem to be a standard way to align to more than 16 bytes. So even if we use the standardized alignas(..), there is no guarantee unless the alignment boundary is less than or equal to 16 bytes. I'll have to verify that it works as expected on a target platform.

2) __attribute__ ((aligned(#))) or alignas(..) cannot be used to align a heap-allocated object, as I suspected, i.e. new() doesn't do anything with these annotations. They seem to work for static objects or stack allocations, with the caveats from (1).

Either posix_memalign(..) (non-standard) or aligned_alloc(..) (standardized, but I couldn't get it to work on GCC 4.8.1) + placement new(..) seems to be the solution. My solution for when I need platform-independent code is compiler-specific macros :)

3) Alignment for struct/class fields seems to work with both __attribute__ ((aligned(#))) and alignas(), as noted in the answer. Again, I think the caveats from (1) about guarantees on alignment stand.
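
The 16-byte figure in point (1) corresponds to what the standard calls the largest fundamental alignment, which can be queried rather than assumed. The sketch below uses hypothetical helper names; anything larger than `alignof(std::max_align_t)` is an extended alignment, and whether it is supported is implementation-defined, so a runtime check on the target platform is still worthwhile.

```cpp
#include <cstddef>
#include <cstdint>

// Alignments up to this value are "fundamental" and always supported.
constexpr std::size_t kFundamentalAlignment = alignof(std::max_align_t);

// True if the requested alignment goes beyond what the standard guarantees.
inline bool is_extended_alignment(std::size_t alignment) {
    return alignment > kFundamentalAlignment;
}

// Checks that a stack buffer declared with alignas(64) really is aligned.
inline bool stack_buffer_is_aligned() {
    alignas(64) char buffer[64] = {0};
    return reinterpret_cast<std::uintptr_t>(buffer) % 64 == 0;
}
```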

So my current solution is to use posix_memalign(..) + placement new(..) for aligning a heap-allocated instance of my class, since my target platform right now is Linux only. I am also using alignas(..) for aligning fields since it's standardized and at least works on Clang and GCC. I'll be happy to change it if a better answer comes along.
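
The "compiler-specific macros" idea mentioned above might be sketched like this. The wrapper names `portable_aligned_alloc` and `portable_aligned_free` are hypothetical; the sketch assumes MSVC's `_aligned_malloc`/`_aligned_free` on Windows and posix_memalign(..) everywhere else.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>  // posix_memalign, free
#endif

// Returns storage aligned to `alignment` (a power of two that is also a
// multiple of sizeof(void*)), or nullptr on failure.
inline void* portable_aligned_alloc(std::size_t alignment, std::size_t size) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Releases storage obtained from portable_aligned_alloc.
inline void portable_aligned_free(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Objects would still be created in the returned storage with placement new and destroyed with an explicit destructor call before the storage is released.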

Answer by Hugo

I don't know if it is the best way to align memory allocated with a new operator, but it is certainly very simple!

This is the way it is done in the thread sanitizer pass in GCC 6.1.0:

#define ALIGNED(x) __attribute__((aligned(x)))

static char myarray[sizeof(myClass)] ALIGNED(64) ;
var = new(myarray) myClass;

Well, in sanitizer_common/sanitizer_internal_defs.h, it is also written

// Please only use the ALIGNED macro before the type.
// Using ALIGNED after the variable declaration is not portable!        

So I do not know why ALIGNED is used after the variable declaration here. But that is another story.
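
The same static-buffer trick can be written with the standard alignas(..) specifier instead of the GCC attribute. This is only a sketch: `MyClass`, `storage` and `instance` are hypothetical names, and the object lives for the whole program because its destructor is never run.

```cpp
#include <cstdint>
#include <new>  // placement new

// Hypothetical class to be placed in the aligned buffer.
struct MyClass {
    int value = 1;
};

// Static storage aligned to an assumed 64-byte cache line.
alignas(64) static unsigned char storage[sizeof(MyClass)];

// Constructs the object in the buffer on first use and returns it.
inline MyClass* instance() {
    static MyClass* obj = new (storage) MyClass;
    return obj;
}
```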