Original question: http://stackoverflow.com/questions/32612881/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Why use _mm_malloc? (as opposed to _aligned_malloc, aligned_alloc, or posix_memalign)
Asked by Praxeolitic
There are a few options for acquiring an aligned block of memory but they're very similar and the issue mostly boils down to what language standard and platforms you're targeting.
C11
void * aligned_alloc (size_t alignment, size_t size)
POSIX
int posix_memalign (void **memptr, size_t alignment, size_t size)
Windows
void * _aligned_malloc(size_t size, size_t alignment);
And of course it's also always an option to align by hand.
Intel offers another option.
Intel
void* _mm_malloc (int size, int align)
void _mm_free (void *p)
Based on source code released by Intel, this seems to be the method of allocating aligned memory their engineers prefer but I can't find any documentation comparing it to other methods. The closest I found simply acknowledges that other aligned memory allocation routines exist.
To dynamically allocate a piece of aligned memory, use posix_memalign, which is supported by GCC as well as the Intel Compiler. The benefit of using it is that you don't have to change the memory disposal API. You can use free() as you always do. But pay attention to the parameter profile:
    int posix_memalign (void **memptr, size_t align, size_t size);
The Intel Compiler also provides another set of memory allocation APIs. C/C++ programmers can use _mm_malloc and _mm_free to allocate and free aligned blocks of memory. For example, the following statement requests a 64-byte aligned memory block for 8 floating point elements.
    farray = (float *)_mm_malloc(8*sizeof(float), 64);
Memory that is allocated using _mm_malloc must be freed using _mm_free. Calling free on memory allocated with _mm_malloc or calling _mm_free on memory allocated with malloc will result in unpredictable behavior.
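To make the parameter-order warning concrete, here is a minimal sketch of the posix_memalign path on a POSIX system. The helper name alloc_floats_64 is hypothetical and chosen only for illustration. Note that posix_memalign and aligned_alloc take the alignment before the size, while _mm_malloc and _aligned_malloc take the size first; posix_memalign also reports failure through its return value rather than through errno, and requires the alignment to be a power of two that is a multiple of sizeof(void *).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask for the posix_memalign declaration */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n floats on a 64-byte boundary.
 * Returns NULL on failure. The result is released with plain free(). */
float *alloc_floats_64(size_t n)
{
    void *p = NULL;

    /* Argument order is (out-pointer, alignment, size).
     * A nonzero return value is an error number; errno is not set. */
    if (posix_memalign(&p, 64, n * sizeof(float)) != 0)
        return NULL;

    /* Sanity check: the address is a multiple of 64. */
    assert(((uintptr_t)p % 64) == 0);
    return p;
}
```

A caller would write something like `float *farray = alloc_floats_64(8);`, use the array, and then call `free(farray);`, which is the "no change to the disposal API" benefit the quoted passage describes.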
The clear differences from a user perspective are that _mm_malloc requires direct CPU and compiler support and that memory allocated with _mm_malloc must be freed with _mm_free. Given these drawbacks, what is the reason for ever using _mm_malloc? Can it have a slight performance advantage? Is it a historical accident?
Accepted answer by Jeff
Intel compilers support POSIX (Linux) and non-POSIX (Windows) operating systems, hence cannot rely upon either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.
C11 is a great solution but Microsoft doesn't even support C99 yet, so who knows if they will ever support C11.
Update: Unlike the C11/POSIX/Windows allocation functions, the ICC intrinsics include a deallocation function. This allows this API to use a separate heap manager from the default one. I don't know if/when it actually does that, but it can be useful to support this model.
Disclaimer: I work for Intel but have no special knowledge of these decisions, which happened long before I joined the company.
Answer by supercat
It's possible to take an existing C compiler which does not presently happen to use the identifiers _mm_malloc and _mm_free and define functions with those names which will behave as required. This could be done either by having _mm_malloc function as a wrapper on malloc() which asks for a slightly oversized allocation, constructs a pointer to the first suitably aligned address within it that's at least one byte from the beginning, and stores the number of bytes skipped immediately before that address, or by having _mm_malloc request large chunks of memory from malloc() and then dispense them piecemeal. In any case, the pointers returned by _mm_malloc() would not be pointers that free() would generally know how to do anything with; calling _mm_free would use the byte immediately preceding the allocation as an aid to finding the real start of the allocation received from malloc, and then pass that to free.
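The first scheme described above (over-allocate, then record the skip in the byte just before the returned address) can be sketched roughly as follows. This is an illustration under stated assumptions, not Intel's implementation: the names my_aligned_malloc and my_aligned_free are hypothetical, and because the skip count lives in a single byte, this version only accepts power-of-two alignments up to 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper over malloc(). Takes (size, align) like _mm_malloc.
 * Assumption: align is a power of two no larger than 256, so that
 * (skip - 1) fits in the one byte stored before the returned address. */
void *my_aligned_malloc(size_t size, size_t align)
{
    if (align == 0 || align > 256 || (align & (align - 1)) != 0)
        return NULL;
    if (size > SIZE_MAX - align)
        return NULL;                      /* size + align would overflow */

    unsigned char *raw = malloc(size + align);
    if (raw == NULL)
        return NULL;

    /* First aligned address at least one byte past raw: skip is 1..align. */
    size_t skip = align - ((uintptr_t)raw & (align - 1));
    unsigned char *user = raw + skip;
    assert(((uintptr_t)user & (align - 1)) == 0);

    /* Record how far we moved, in the byte just before the user pointer. */
    user[-1] = (unsigned char)(skip - 1);
    return user;
}

/* Must be paired with my_aligned_malloc; plain free() would receive an
 * address that malloc() never returned. */
void my_aligned_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *user = p;
    size_t skip = (size_t)user[-1] + 1;
    free(user - skip);                    /* recover the original pointer */
}
```

The sketch also shows why the two deallocation functions cannot be mixed: the pointer handed to the caller is offset from the one malloc() produced, so only the matching free routine knows how to undo the offset.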
If an aligned-allocate function is allowed to use the internals of the malloc and free functions, however, that may eliminate the need for the extra layer of wrapping. It's possible to write _mm_malloc()/_mm_free() functions which wrap malloc/free without knowing anything about their internals, but that requires _mm_malloc() to keep bookkeeping information separate from that used by malloc/free.
If the author of an aligned-allocate function knows how malloc and free are implemented, it will often be possible to coordinate the design of all the allocation/free functions so that free can distinguish all kinds of allocations and handle them appropriately. No single aligned-allocate implementation would be usable on all malloc/free implementations, however.
I would suggest that the most portable way to write code would probably be to select a couple of symbols that are not used anywhere else for your own allocate and free functions, so that you could then say, e.g.
#define a_alloc(align,sz) _mm_malloc((sz),(align))
#define a_free(ptr) _mm_free((ptr))
on compilers that support that, or
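#include <stdlib.h>
static inline void *aa_alloc(size_t align, size_t size)
{
  void *ret = NULL;
  /* posix_memalign returns 0 on success; align must be a power of two
     and a multiple of sizeof(void *) */
  if (posix_memalign(&ret, align, size) != 0)
    return NULL;
  return ret;
}
#define a_alloc(align,sz) aa_alloc((align),(sz))
#define a_free(ptr) free((ptr))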
on POSIX systems, etc. For every system it should be possible to define macros or functions that will yield the necessary behavior [I think it's probably better to use macros consistently than to sometimes use macros and sometimes functions, so as to allow #if defined macroname to test whether things are defined yet].
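As one way to picture the per-platform macro layer, here is a sketch that selects a backend at compile time and then uses #if defined to confirm that the names exist. The _WIN32 branch and the helper zeroed_doubles are assumptions added for illustration; they are not part of the answer above. The Windows branch pairs _aligned_malloc with _aligned_free, since memory from _aligned_malloc must not be passed to free().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask for the posix_memalign declaration */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
#include <malloc.h>
/* _aligned_malloc takes (size, alignment), the reverse of a_alloc. */
#define a_alloc(align, sz) _aligned_malloc((sz), (align))
#define a_free(ptr)        _aligned_free((ptr))
#else
static inline void *aa_alloc(size_t align, size_t size)
{
    void *ret = NULL;
    if (posix_memalign(&ret, align, size) != 0)
        return NULL;
    return ret;
}
#define a_alloc(align, sz) aa_alloc((align), (sz))
#define a_free(ptr)        free((ptr))
#endif

/* Because both names are always macros, one test covers every backend. */
#if !defined(a_alloc) || !defined(a_free)
#error "no aligned allocator selected for this platform"
#endif

/* Hypothetical caller that depends only on the a_alloc/a_free names.
 * Returns n zeroed doubles on a 32-byte boundary, or NULL on failure. */
double *zeroed_doubles(size_t n)
{
    double *v = a_alloc(32, n * sizeof(double));
    if (v == NULL)
        return NULL;
    assert(((uintptr_t)v % 32) == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = 0.0;
    return v;
}
```

Code written against a_alloc and a_free then needs no changes when another backend, such as _mm_malloc/_mm_free, is swapped in behind the macros.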
Answer by Thief
_mm_malloc seems to have been created before there was a standard aligned_alloc function, and the need to use _mm_free is a quirk of the implementation.
My guess is that unlike when using posix_memalign, it doesn't need to over-allocate in order to guarantee alignment, instead it uses a separate alignment-aware allocator. This will save memory when allocating types with alignment different to the default alignment (typically 8 or 16 bytes).

