C: print a __m128i variable

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/13257166/

Date: 2020-09-02 04:19:55  Source: igfitidea

Tags: c, assembly, sse, simd, intrinsics

Asked by arunmoezhi

I'm trying to learn to code with intrinsics. Below is a program that adds two vectors.

Compiler used: icc

#include<stdio.h>
#include<emmintrin.h>
int main()
{
        __m128i a = _mm_set_epi32(1,2,3,4);
        __m128i b = _mm_set_epi32(1,2,3,4);
        __m128i c;
        c = _mm_add_epi32(a,b);
        printf("%d\n",c[2]);
        return 0;
}

I get the following error:

test.c(9): error: expression must have pointer-to-object type
        printf("%d\n",c[2]);

How do I print the values in the variable c, which is of type __m128i?

Accepted answer by askmish

Use this function to print them:

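#include <emmintrin.h>  // SSE2: __m128i
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Print the vector as eight 16-bit elements, lowest element first.
void print128_num(__m128i var)
{
    uint16_t val[8];
    memcpy(val, &var, sizeof(val));
    printf("Numerical: %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5],
           val[6], val[7]);
}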

This splits the 128 bits into 16-bit (or 32-bit) elements before printing them.

If 64-bit integers are available, you can instead split the vector into two 64-bit halves and print those:

#include <inttypes.h>

// Print the vector as two 64-bit halves in hex, high half first.
void print128_num(__m128i var)
{
    uint64_t v64val[2];
    memcpy(v64val, &var, sizeof(v64val));
    printf("%.16" PRIx64 " %.16" PRIx64 "\n", v64val[1], v64val[0]);
}

Note: casting &var directly to an int* or uint16_t* would also work with MSVC, but it violates strict aliasing and is undefined behaviour. Using memcpy is the standard-compliant way to do the same thing, and with minimal optimization the compiler will generate the exact same binary code.
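
As a minimal sketch of the memcpy approach applied to the 32-bit elements from the question: the helper names m128i_to_i32 and print128_i32 below are made up for illustration and are not part of the answer.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i */
#include <inttypes.h>   /* PRId32 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the four 32-bit elements of v into out,
   lowest element first (memory order). */
void m128i_to_i32(__m128i v, int32_t out[4])
{
    memcpy(out, &v, sizeof(v));   /* copies 16 bytes; no pointer cast involved */
}

/* Hypothetical helper: print the four elements as signed decimal integers. */
void print128_i32(__m128i v)
{
    int32_t lanes[4];
    assert(sizeof(lanes) == sizeof(v));
    m128i_to_i32(v, lanes);
    printf("%" PRId32 " %" PRId32 " %" PRId32 " %" PRId32 "\n",
           lanes[0], lanes[1], lanes[2], lanes[3]);
}
```

With the vectors from the question, print128_i32(_mm_add_epi32(a, b)) prints the elements lowest first, so the value the question tried to read as c[2] ends up in lanes[2].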

Answer by Peter Cordes

  • Portable across gcc/clang/ICC/MSVC, C and C++.
  • Fully safe at all optimization levels: no pointer aliasing (unlike most of the other answers).
  • Prints in hex as u8, u16, u32, or u64 elements (based on @AG1's answer).
  • Prints in memory order (least-significant element first, like _mm_setr_epiX). Reverse the array indices if you prefer printing in the same order Intel's manuals use, where the most significant element is on the left (like _mm_set_epiX). Related: Convention for displaying vector registers
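
The difference between memory order and the order used in Intel's manuals can be sketched as follows. The function name format_u32_lanes and its msb_first flag are assumptions made for this illustration. It writes into a caller-supplied buffer rather than printing, so the two orders are easy to compare.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the four u32 elements of v as hex into buf.
   msb_first == 0: memory order (element 0 first, like _mm_setr_epi32 arguments).
   msb_first != 0: highest element first (like _mm_set_epi32 arguments).
   Returns what snprintf returns. */
int format_u32_lanes(__m128i v, int msb_first, char *buf, size_t len)
{
    uint32_t lane[4];
    assert(buf != NULL && len > 0);
    _mm_storeu_si128((__m128i *)lane, v);   /* unaligned store into a local array */
    if (msb_first)
        return snprintf(buf, len, "%x %x %x %x",
                        (unsigned)lane[3], (unsigned)lane[2],
                        (unsigned)lane[1], (unsigned)lane[0]);
    return snprintf(buf, len, "%x %x %x %x",
                    (unsigned)lane[0], (unsigned)lane[1],
                    (unsigned)lane[2], (unsigned)lane[3]);
}
```

For _mm_set_epi32(1, 2, 3, 4), memory order gives "4 3 2 1" and the reversed order gives "1 2 3 4", matching the argument order of the intrinsic.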

Using a __m128i* to load from an array of int is safe because the __m128 types are defined to allow aliasing. (E.g. in gcc's headers, the definition includes __attribute__((may_alias)).)

The reverse isn't safe (a __m128i object and an int pointer). It might happen to work in most cases, but why risk it?

(uint32_t*) &my_vector violates the C and C++ aliasing rules, and is not guaranteed to work the way you'd expect. Storing to a local array and then accessing it is guaranteed to be safe. It even optimizes away with most compilers, so you get movq / pextrq directly from xmm to integer registers instead of an actual store/reload, for example.
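
A minimal sketch of the "store to a local array, then read it" pattern for fetching a single element; the name lane_u32 is hypothetical. Whether the store/reload is optimized away depends on the compiler and its options.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: return 32-bit element idx (0..3) of v, where
   element 0 is the least-significant one. */
uint32_t lane_u32(__m128i v, unsigned idx)
{
    uint32_t tmp[4];
    assert(idx < 4);
    /* Storing through a __m128i* is fine: the vector types may alias anything. */
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[idx];   /* ordinary array access, no aliasing violation */
}
```

For example, lane_u32(c, 2) reads the element that the question tried to access as c[2].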

Source + asm output on the Godbolt compiler explorer: proof it compiles with MSVC and so on.

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

#ifndef __cplusplus
#include <stdalign.h>   // C11 defines _Alignas().  This header defines alignas()
#endif

void p128_hex_u8(__m128i in) {
    alignas(16) uint8_t v[16];
    _mm_store_si128((__m128i*)v, in);
    printf("v16_u8: %x %x %x %x | %x %x %x %x | %x %x %x %x | %x %x %x %x\n",
           v[0], v[1],  v[2],  v[3],  v[4],  v[5],  v[6],  v[7],
           v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
}

void p128_hex_u16(__m128i in) {
    alignas(16) uint16_t v[8];
    _mm_store_si128((__m128i*)v, in);
    printf("v8_u16: %x %x %x %x,  %x %x %x %x\n", v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
}

void p128_hex_u32(__m128i in) {
    alignas(16) uint32_t v[4];
    _mm_store_si128((__m128i*)v, in);
    printf("v4_u32: %x %x %x %x\n", v[0], v[1], v[2], v[3]);
}

void p128_hex_u64(__m128i in) {
    alignas(16) unsigned long long v[2];  // uint64_t might give format-string warnings with %llx; it's just long in some ABIs
    _mm_store_si128((__m128i*)v, in);
    printf("v2_u64: %llx %llx\n", v[0], v[1]);
}

If you need portability to C99 or C++03 or earlier (i.e. without C11 / C++11), remove the alignas() and use storeu instead of store. Or use __attribute__((aligned(16))) or __declspec( align(16) ) instead.
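
Both pre-C11 options might look like the sketch below. The macro name ALIGN16 and the function names are made up for illustration, and the functions write into a caller-supplied buffer with snprintf instead of printing directly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical macro: compiler-specific 16-byte alignment without alignas(). */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Option 1: no alignment requirement at all, so use the unaligned store. */
int fmt128_hex_u32_storeu(__m128i in, char *buf, size_t len)
{
    uint32_t v[4];
    assert(buf != NULL);
    _mm_storeu_si128((__m128i *)v, in);
    return snprintf(buf, len, "v4_u32: %x %x %x %x",
                    (unsigned)v[0], (unsigned)v[1], (unsigned)v[2], (unsigned)v[3]);
}

/* Option 2: keep the aligned store, aligning the array with an attribute. */
int fmt128_hex_u32_aligned(__m128i in, char *buf, size_t len)
{
    ALIGN16 uint32_t v[4];
    assert(buf != NULL);
    _mm_store_si128((__m128i *)v, in);
    return snprintf(buf, len, "v4_u32: %x %x %x %x",
                    (unsigned)v[0], (unsigned)v[1], (unsigned)v[2], (unsigned)v[3]);
}
```

The two functions should produce identical text; the storeu version is the simpler choice when the array's alignment cannot be guaranteed.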

(If you're writing code with intrinsics, you should be using a recent compiler version. Newer compilers usually make better asm than older compilers, including for SSE/AVX intrinsics. But maybe you want to use gcc-6.3 with -std=gnu++03 (C++03 mode) for a codebase that isn't ready for C++11 or something.)



Sample output from calling all 4 functions on the following vector:

// source used:
__m128i vec = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7,
                            8, 9, 10, 11, 12, 13, 14, 15, 16);

// output:

v2_u64: 0x807060504030201 0x100f0e0d0c0b0a09
v4_u32: 0x4030201 0x8070605 0xc0b0a09 0x100f0e0d
v8_u16: 0x201 0x403 0x605 0x807  | 0xa09 0xc0b 0xe0d 0x100f
v16_u8: 0x1 0x2 0x3 0x4 | 0x5 0x6 0x7 0x8 | 0x9 0xa 0xb 0xc | 0xd 0xe 0xf 0x10

Adjust the format strings if you want to pad with leading zeros for consistent output width. Note that the sample output shows 0x prefixes, which a plain %x conversion does not print; use %#x to get them. See printf(3).
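
As a sketch of one such adjustment, the variant below pads every 32-bit element to 8 hex digits and writes the 0x prefix literally. The function name fmt128_hex_u32_padded is hypothetical, and it formats into a caller-supplied buffer rather than printing.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <inttypes.h>   /* PRIx32 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the four u32 elements in memory order,
   each as 0x followed by exactly 8 zero-padded hex digits. */
int fmt128_hex_u32_padded(__m128i in, char *buf, size_t len)
{
    uint32_t v[4];
    assert(buf != NULL);
    _mm_storeu_si128((__m128i *)v, in);
    return snprintf(buf, len,
                    "v4_u32: 0x%08" PRIx32 " 0x%08" PRIx32
                    " 0x%08" PRIx32 " 0x%08" PRIx32,
                    v[0], v[1], v[2], v[3]);
}
```

For the sample vector _mm_setr_epi8(1, ..., 16) this produces "v4_u32: 0x04030201 0x08070605 0x0c0b0a09 0x100f0e0d". The same idea applies to the other element widths with %02x, %04x, or %016llx.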

Answer by Antonio

I know this question is tagged C, but it was also the best search result when I was looking for a C++ solution to the same problem.

So, this could be a C++ implementation:

#include <cstring>
#include <sstream>
#include <string>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: __m128i
template <typename T>
std::string __m128i_toString(const __m128i var) {
    std::stringstream sstr;
    T values[16/sizeof(T)];
    std::memcpy(values, &var, sizeof(values)); // memcpy avoids the strict-aliasing problems of a pointer cast
    if (sizeof(T) == 1) {
        for (unsigned int i = 0; i < sizeof(__m128i); i++) { //C++11: Range for also possible
            sstr << (int) values[i] << " ";
        }
    } else {
        for (unsigned int i = 0; i < sizeof(__m128i) / sizeof(T); i++) { //C++11: Range for also possible
            sstr << values[i] << " ";
        }
    }
    return sstr.str();
}
#endif

Usage:

#include <iostream>
[..]
__m128i x
[..]
std::cout << __m128i_toString<uint8_t>(x) << std::endl;
std::cout << __m128i_toString<uint16_t>(x) << std::endl;
std::cout << __m128i_toString<uint32_t>(x) << std::endl;
std::cout << __m128i_toString<uint64_t>(x) << std::endl;

Result:

141 114 0 0 0 0 0 0 151 104 0 0 0 0 0 0
29325 0 0 0 26775 0 0 0
29325 0 26775 0
29325 26775

Note: there is a simple way to avoid the if (sizeof(T) == 1) check; see https://stackoverflow.com/a/28414758/2436175

Answer by Lucien

#include<stdio.h>
#include<stdint.h>
#include<emmintrin.h>
int main()
{
    __m128i a = _mm_set_epi32(1,2,3,4);
    __m128i b = _mm_set_epi32(1,2,3,4);
    __m128i c;

    const int32_t* q; 
    //add a pointer 
    c = _mm_add_epi32(a,b);

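    // Caution: reading c through an int32_t* violates strict aliasing; memcpy is the safe alternative.
    q = (const int32_t*) &c;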
    printf("%d\n",q[2]);
    //printf("%d\n",c[2]);
    return 0;
}

Try this code.