Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/24101875/

Time: 2020-09-15 05:02:00  Source: igfitidea

Proper way to enable SSE4 on a per-function / per-block of code basis?

Tags: xcode, clang, llvm, sse

Asked by iccir

For one of my OS X programs, I have a few optimized cases which use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:

// SupportsSSE4_1 returns true on CPUs that support SSE4.1, false otherwise
if (SupportsSSE4_1()) {

    // Code that uses _mm_dp_ps, an SSE4 instruction

    ...

    __m128 hDelta   = _mm_sub_ps(here128, right128);
    __m128 vDelta   = _mm_sub_ps(here128, down128);

    hDelta = _mm_sqrt_ss(_mm_dp_ps(hDelta, hDelta, 0x71));
    vDelta = _mm_sqrt_ss(_mm_dp_ps(vDelta, vDelta, 0x71));

    ...

} else {
    // Equivalent code that uses SSE3 instructions
    ...
}

In order to get the above to compile, I had to set CLANG_X86_VECTOR_INSTRUCTIONS to sse4.1.

However, this seems to tell clang that it is OK to use the ROUNDSD instruction anywhere in my program. As a result, the program crashes on SSE3-only machines with SIGILL: ILL_ILLOPC.

What is the best practice for enabling SSE4.1 only for the code inside the true branch of the SupportsSSE4_1() if block?
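The question does not show how SupportsSSE4_1() is implemented. One possible sketch, assuming a GCC- or clang-compatible compiler that ships the <cpuid.h> helper header, queries CPUID leaf 1 and tests ECX bit 19, which is the SSE4.1 feature flag. The function body here is an assumption for illustration, not the asker's code:

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/clang helper header wrapping the CPUID instruction
#endif

// Hypothetical stand-in for the question's SupportsSSE4_1().
// Returns true when the CPU reports SSE4.1 support, false otherwise.
bool SupportsSSE4_1() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not available.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    // CPUID leaf 1, ECX bit 19 is the SSE4.1 feature flag.
    return ((ecx >> 19) & 1u) != 0;
#else
    return false;    // not an x86 target, so there is no SSE4.1
#endif
}
```

Because this check uses no SSE4.1 instructions itself, it can live in a file compiled with the baseline instruction set.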

Answered by Stephen Canon

There is currently no way to target different ISA extensions at block / function granularity in clang. You can only do it at file granularity (put your SSE4.1 code into a separate file and compile that file with -msse4.1). If this is an important feature for you, please file a bug report to request it!

However, I should note that the actual benefit of DPPS is pretty small in most real scenarios (and using DPPS even slows down some code sequences!). Unless this particular code sequence is critical, and you have carefully measured the effect of using DPPS, it may not be worth the hassle to special-case SSE4.1 even if that compiler feature is available.
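To make the trade-off concrete, here is a minimal sketch of what a DPPS-free fallback might look like for the question's _mm_dp_ps(x, x, 0x71) calls. In that mask, the high nibble 0111 selects lanes 0 to 2 for multiplication and the low nibble 0001 writes the sum to lane 0, so the operation is a three-component dot product. The helper name dot3 is hypothetical, and the code uses only baseline SSE intrinsics:

```cpp
#include <xmmintrin.h>   // baseline SSE intrinsics, no SSE4.1 required

// Hypothetical helper: dot product of the first three lanes of a and b.
// Both pointers must reference at least four readable floats, because a
// full 128-bit vector is loaded; the fourth lane is ignored.
float dot3(const float* a, const float* b) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 m  = _mm_mul_ps(va, vb);                          // [m0, m1, m2, m3]
    __m128 m1 = _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1)); // m1 in lane 0
    __m128 m2 = _mm_movehl_ps(m, m);                         // m2 in lane 0
    // Lane 0 becomes m0 + m1 + m2. Unlike DPPS with mask 0x71, the upper
    // lanes are not zeroed, which does not matter when only lane 0 is read.
    __m128 sum = _mm_add_ss(_mm_add_ss(m, m1), m2);
    return _mm_cvtss_f32(sum);
}
```

Whether this is slower or faster than DPPS in a given loop is exactly the kind of thing that needs to be measured, as the answer says.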

Answered by Z boson

You can write a CPU dispatcher. You can do this in one file, but you have to compile it twice: first with SSE4.1 enabled and then without, linking in the SSE4.1 object file in the second step. The first time you call myfunc, it calls myfunc_dispatch, which determines the instruction set and sets the pointer to either myfunc_SSE41 or myfunc_SSE3. On later calls, myfunc jumps straight to the function for your instruction set.

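// Compile this file twice, then link:
//clang -c -O3 -msse4.1 foo.cpp -o foo_sse41.o
//clang -O3 -msse3 foo.cpp foo_sse41.o

typedef float MyFuncType(float*);

MyFuncType myfunc, myfunc_SSE41, myfunc_SSE3, myfunc_dispatch;
bool SupportsSSE4_1();  // runtime CPU check from the question, defined elsewhere

#ifdef __SSE4_1__
// Built only in the -msse4.1 pass
float myfunc_SSE41(float* a) {
    //SSE41 code
}
#else
// Built only in the -msse3 pass, so the pointer is defined in just one object file
MyFuncType * myfunc_pointer = &myfunc_dispatch;

float  myfunc_SSE3(float *a) {
    //SSE3 code
}

// Runs on the first call: pick an implementation, then forward the call
float myfunc_dispatch(float *a) {
    if(SupportsSSE4_1()) {
        myfunc_pointer = myfunc_SSE41;
    }
    else {
        myfunc_pointer = myfunc_SSE3;
    }
    return myfunc_pointer(a);
}

float myfunc(float *a) {
    return (*myfunc_pointer)(a);
}
int main() {
    //myfunc(a);
}
#endif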

Answered by echristo

Depending on the OS you might be able to use something like Function Multiversioning in the future. I'm working on the feature right now, but it'll be a while before it's ready for use in a production compiler.

See http://gcc.gnu.org/wiki/FunctionMultiVersioning for more details.
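For readers who want a feel for what the linked wiki page describes, here is a minimal sketch of function multiversioning in C++. It assumes a compiler and platform that implement the feature (GCC's C++ front end on Linux is one such combination; support elsewhere is not guaranteed). The function name which_version is hypothetical and only reports which variant was selected:

```cpp
// Two definitions of the same function, distinguished by the target
// attribute. The toolchain emits a resolver that picks one of them at
// load time based on what the running CPU supports.

__attribute__((target("default")))
int which_version() {
    return 0;   // baseline variant, safe on any CPU the program targets
}

__attribute__((target("sse4.1")))
int which_version() {
    return 1;   // variant that the compiler may build with SSE4.1 code
}
```

Callers simply call which_version(); no hand-written dispatcher or function pointer is needed, which is the main attraction over the manual approach.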