Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original on Stack Overflow: http://stackoverflow.com/questions/5777110/

Time: 2020-08-28 18:51:58  Source: igfitidea

Fast implementation of trigonometric functions for c++

Tags: c++, math, optimization

Asked by janitor048

Short version: I'd like to know whether there are implementations of the standard trigonometric functions that are faster than the ones included in math.h.

Long version: I have a program that's quite heavy on numerics (it's a physics simulation) and that needs to call trigonometric functions, mostly sin and cos, a lot. Currently I'm simply using the implementations included in math.h. Profiling shows that the calls to these functions cost more than I was expecting (hoping).

While there is most certainly plenty of room for optimization in other parts of the code, having faster sin and cos might give me some additional percent. So, do you guys have any suggestions?
In another post the usage of self-made lookup tables is suggested. But maybe there are alternatives? Or ready-made and well-tested lookup solutions in some libraries?

Accepted answer by celion

Here are some good slides on how to do power series approximations (NOT Taylor series though) of trig functions: Faster Math Functions.

It's geared towards game programmers, which means accuracy gets sacrificed for performance, but you should be able to add another term or two to the approximations to get some of the accuracy back.
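
As an illustration of trading accuracy for speed with a polynomial that is not a Taylor series, here is a minimal sketch of a widely circulated parabolic sine approximation. It is not taken from the slides; the name fast_sin and the tuning constant 0.225 are assumptions chosen for this example, and the input is assumed to be already reduced to [-pi, pi].

```cpp
#include <cmath>

// Parabolic sine approximation, valid for x in [-pi, pi].
// Illustrative sketch only: the name and the constant P are assumptions.
inline double fast_sin(double x)
{
    const double pi = 3.14159265358979323846;
    const double B = 4.0 / pi;
    const double C = -4.0 / (pi * pi);

    // First pass: a parabola through (0, 0), (pi/2, 1) and (pi, 0),
    // mirrored for negative x by using x * |x| instead of x * x.
    const double y = B * x + C * x * std::fabs(x);

    // Second pass: blend in y * |y| to reduce the error of the first pass.
    const double P = 0.225;
    return P * (y * std::fabs(y) - y) + y;
}
```

With these constants the absolute error over [-pi, pi] stays at roughly 0.001, which may be acceptable for games but probably not for a physics simulation without extra terms.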

The nice thing about this is that you should also be able to extend it to SIMD easily, so that you could compute the sin or cos of 4 values at once (2 if you're using double precision).

Hope that helps...

Answer by Jeremy Trifilo

This should be pretty damn fast. If you can optimize it further, please do, and post the code on pastie.org or something similar.

Computer specifications -> 512 MB RAM, Visual Studio 2010, Windows XP Professional SP3 Version 2002, Intel(R) Pentium(R) 4 CPU 2.8 GHz.

This is insanely accurate and will actually provide slightly better results in some situations. For example, at 90, 180 and 270 degrees the C++ functions return a small nonzero value where the exact result is 0.

FULL TABLE OF 0 through 359 Degrees: https://pastee.org/dhwbj

FORMAT -> DEGREE # -> MINE_X(#) , CosX(#) , MINE_Z(#) , SinZ(#).

Below is the code used to construct the table shown above. You can probably make it even more accurate with a larger data type. I used an unsigned short scaled by 64000: each entry is cos(degree) * 64000, rounded to the nearest integer. I also tried to use as little extra data as possible, so this isn't a cluttered table with 720 float values for cos and sin. That would probably give better results, but it would be a complete waste of memory. The table below is as small as I could make it. I'd like to see whether an equation could reproduce all these short values and be used instead. I'm not sure it would be any faster, but it would eliminate the table completely and probably cost little or no speed.

So the accuracy in comparison to the C++ cos/sin operations is 99.99998% through 100%.

Below is the table used to calculate the cos/sin values.

#include <cstdint>

// Entry d holds cos(d degrees) * 64000, rounded, for d = 0 through 90.
static const std::uint16_t DEGREE_LOOKUP_TABLE[91] =
{
    64000, 63990, 63961, 63912, 63844, 63756,
    63649, 63523, 63377, 63212, 63028, 62824,
    62601, 62360, 62099, 61819, 61521, 61204,
    60868, 60513, 60140, 59749, 59340, 58912,
    58467, 58004, 57523, 57024, 56509, 55976,
    55426, 54859, 54275, 53675, 53058, 52426,
    51777, 51113, 50433, 49737, 49027, 48301,
    47561, 46807, 46038, 45255, 44458, 43648,
    42824, 41988, 41138, 40277, 39402, 38516,
    37618, 36709, 35788, 34857, 33915, 32962,
    32000, 31028, 30046, 29055, 28056, 27048,
    26031, 25007, 23975, 22936, 21889, 20836,
    19777, 18712, 17641, 16564, 15483, 14397,
    13306, 12212, 11113, 10012,  8907,  7800,
     6690,  5578,  4464,  3350,  2234,  1117,
        0,
};

Below is the actual code that does the cos/sin calculations.

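// Computes cos (vectorX) and sin (vectorZ) for an angle given as
// degrees in [0, 90) plus a quadrant in 0..3.
// The function wrapper and its name are an assumption; the original
// answer posted only the body.
void fastCosSin(float degrees, int quadrant, float& vectorX, float& vectorZ)
{
    int deg1 = (int)degrees;          // whole degrees: table index for cos
    int deg2 = 90 - deg1;             // complementary index: sin(d) = cos(90 - d)
    float module = degrees - deg1;    // fractional degrees, used to interpolate
    // 0.000015625 == 1 / 64000, the scale of the table entries.
    double vX = DEGREE_LOOKUP_TABLE[deg1] * 0.000015625;
    double vZ = DEGREE_LOOKUP_TABLE[deg2] * 0.000015625;
    double mX = DEGREE_LOOKUP_TABLE[deg1 + 1] * 0.000015625;
    double mZ = DEGREE_LOOKUP_TABLE[deg2 - 1] * 0.000015625;
    // Linear interpolation between neighbouring whole degrees.
    vectorX = vX + (mX - vX) * module;
    vectorZ = vZ + (mZ - vZ) * module;
    // Rotate the first-quadrant result into the requested quadrant.
    if (quadrant & 1)
    {
        float tmp = vectorX;
        if (quadrant == 1)
        {
            vectorX = -vectorZ;
            vectorZ = tmp;
        } else {
            vectorX = vectorZ;
            vectorZ = -tmp;
        }
    } else if (quadrant == 2) {
        vectorX = -vectorX;
        vectorZ = -vectorZ;
    }
}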

SPEEDS BELOW were measured on the computer described above. Earlier I was running it under the debugger; these runs still use the debug build, but launched directly from the executable, which I believe means debug without the debugger attached.

MY METHOD

1,000 Iterations -> 0.004641 MS or 4641 NanoSeconds.
100,000 Iterations -> 4.4328 MS.
100,000,000 Iterations -> 454.079 MS.
1,000,000,000 Iterations -> 4065.19 MS.

COS/SIN METHOD

1,000 Iterations -> 0.581016 MS or 581016 NanoSeconds.
100,000 Iterations -> 25.0049 MS.
100,000,000 Iterations -> 24,731.6 MS.
1,000,000,000 Iterations -> 246,096 MS.

To summarize: computing both cos(###) and sin(###) with my approach allows roughly 220,000,000 executions per second on the computer described above. This is fairly quick and uses very little memory, so it's a great substitute for the cos/sin math functions normally used in C++. If you'd like to see the accuracy, open the link above for a printout of degrees 0 through 359. Note that the code supports only degrees 0 through 89 plus a quadrant 0 through 3, so you need to either supply those directly or derive them yourself, e.g. with (DEGREES % 90).
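
The reduction to a quadrant and an angle within it can be sketched as follows. This is an illustrative, self-contained variant rather than the answer's exact code: the names fast_cos_sin_deg, CosSin and DegreeTable are hypothetical, and the table is generated at startup with std::cos instead of being typed in, using the same cos(degree) * 64000 scaling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct CosSin { float c; float s; };   // hypothetical result type

// 91 entries of round(cos(d degrees) * 64000), built once at startup.
struct DegreeTable {
    std::uint16_t v[91];
    DegreeTable() {
        const double rad = 3.14159265358979323846 / 180.0;
        for (int d = 0; d <= 90; ++d)
            v[d] = static_cast<std::uint16_t>(std::lround(std::cos(d * rad) * 64000.0));
    }
};

// Accepts any finite angle in degrees and returns its cosine and sine.
inline CosSin fast_cos_sin_deg(double degrees)
{
    static const DegreeTable table;
    const double scale = 1.0 / 64000.0;

    // Wrap into [0, 360), then split into quadrant (0..3) and local angle.
    double wrapped = std::fmod(degrees, 360.0);
    if (wrapped < 0.0) wrapped += 360.0;
    const int quadrant = static_cast<int>(wrapped / 90.0) & 3;
    const double local = wrapped - 90.0 * static_cast<int>(wrapped / 90.0);

    // Clamp the index so rounding at quadrant edges cannot leave the table.
    const int deg1 = std::min(89, std::max(0, static_cast<int>(local)));
    const double frac = local - deg1;

    // Linear interpolation; sin(d) is read as cos(90 - d).
    const double c0 = table.v[deg1], c1 = table.v[deg1 + 1];
    const double s0 = table.v[90 - deg1], s1 = table.v[89 - deg1];
    const double c = (c0 + (c1 - c0) * frac) * scale;
    const double s = (s0 + (s1 - s0) * frac) * scale;

    // Rotate the first-quadrant result into the right quadrant.
    CosSin r;
    switch (quadrant) {
        case 0:  r.c = static_cast<float>(c);  r.s = static_cast<float>(s);  break;
        case 1:  r.c = static_cast<float>(-s); r.s = static_cast<float>(c);  break;
        case 2:  r.c = static_cast<float>(-c); r.s = static_cast<float>(-s); break;
        default: r.c = static_cast<float>(s);  r.s = static_cast<float>(-c); break;
    }
    return r;
}
```

Note that the std::fmod call is itself not free, so whether this wrapper still beats the library functions would need to be measured on the target machine.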

Answer by Lior Kogan

If you want to use a custom implementation, look here, here and here.

Also here (scroll to Universal SIMD-Mathlibrary) if you need to calculate sin/cos for large arrays.

You can also try to use the C++ SSE intrinsics. Look here

Note that most modern compilers support SSE and SSE2 optimizations. For Visual Studio 2010, for example, you'll need to manually enable it. Once you do this, a different implementation will be used for most standard math functions.

One more option is to use DirectX HLSL. Look here. Note that there is a nice sincos function which returns both sin and cos.

Usually, I use IPP (which is not free). For details, look here

Answer by Necrolis

Quake 3's source has some code for precomputed sine/cos aimed at speed over precision. It's not SSE-based, and thus quite portable (across both architectures and intrinsic APIs). You might also find this summary of SSE- and SSE2-based functions very interesting: http://gruntthepeon.free.fr/ssemath/

Answer by Mike Dunlavey

A) Trying to save small percents will not be very satisfying. Finishing in 97 instead of 100 hours is still a long time.

B) You say you profiled, and that the trig functions take more time than you would like. How much? And what about all the remaining time? It's quite possible you have bigger fish to fry. Most profilers based on the gprof concepts do not tell you about mid-stack calls that you could focus on to save larger amounts of time. Here's an example.

Answer by hevi

I've implemented a fast sine function on the CPU side that is at least two times faster than math.h's sine function, using only a very small lookup table (20 floats). Its accuracy is not bad at all either: the average relative error is 0.095%. You can check it out at http://www.hevi.info/tag/fast-sine-function/

The method is quite simple to explain and relies on the fact that for small angles a (in degrees), sin(a) ≈ a * pi / 180 (see the link above for the proof).

Some Trigonometry

Although the formula shown above gives relatively accurate results for angles between 0 and 10 degrees, it loses accuracy as the angle gets wider. Therefore we should use the formula only for angles less than 10 degrees, but how?

The answer comes from the trigonometric sine addition formula:

sin(a+b) = sin(a) cos(b) + sin(b) cos(a)

If we can keep b less than 10, then we can use our formula to find the sine with a couple of arithmetic operations.

Let's say we are asked for the sine of 71.654. Then:

a = 70

b = 1.654

and,

sin(71.654) = sin(70 + 1.654) = sin(70) cos(1.654) + sin(1.654) cos (70)

In this formula we can use the fast calculation for the sin(1.654) part; for the rest, unfortunately, we need sine and cosine tables. The good thing is that we only need sine values at multiples of ten and cosine values at whole-number angles between 0 and 10.
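
The decomposition can be sketched like this for angles in [0, 90] degrees. This is a reconstruction under assumptions, not the code from the linked page: the names table_sin_deg and SmallTables are hypothetical, the tables are filled at startup with the standard library, cos(a) is read from the sine table as sin(90 - a), and cos(b) is taken at the nearest whole degree.

```cpp
#include <cmath>

// Two small tables: sine at multiples of ten, cosine at whole degrees 0..10.
struct SmallTables {
    double sin10[10];   // sin(0), sin(10), ..., sin(90)
    double cos1[11];    // cos(0), cos(1), ..., cos(10)
    SmallTables() {
        const double rad = 3.14159265358979323846 / 180.0;
        for (int i = 0; i < 10; ++i) sin10[i] = std::sin(10.0 * i * rad);
        for (int i = 0; i <= 10; ++i) cos1[i] = std::cos(i * rad);
    }
};

// Approximates sin(degrees) for degrees in [0, 90] using
// sin(a + b) = sin(a) cos(b) + sin(b) cos(a), with a a multiple of ten.
inline double table_sin_deg(double degrees)
{
    static const SmallTables t;
    const double rad = 3.14159265358979323846 / 180.0;

    int i = static_cast<int>(degrees / 10.0);   // a = 10 * i
    if (i < 0) i = 0;
    if (i > 9) i = 9;
    const double b = degrees - 10.0 * i;        // remainder, 0 <= b < 10

    const double sin_a = t.sin10[i];
    const double cos_a = t.sin10[9 - i];        // cos(a) = sin(90 - a)
    const double sin_b = b * rad;               // small-angle approximation
    const double cos_b = t.cos1[std::lround(b)]; // nearest whole degree

    return sin_a * cos_b + sin_b * cos_a;
}
```

For 71.654 this picks a = 70 and b = 1.654, as in the worked example above; the absolute error of this sketch stays within a few thousandths over the whole range.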

Answer by mAc

You can look at this. It talks about optimizing sin, cos.

Answer by Yuriy Vikulov

A long time ago, on slow machines, people used arrays of precomputed values. Another option is to calculate the functions yourself to whatever precision you need, like this (look for "Series definitions").

Answer by Rex Kerr

For a 2-3% gain, this is almost certainly not worth the risk of inaccuracy, error, assumptions no longer being true (e.g. never falling outside of [-1, 1]), etc., unless you are planning on running this on a huge number of machines (where 2-3% represents thousands or millions of dollars in electricity and amortized cost of the machine).

That said, if you have domain-specific knowledge about what you are trying to accomplish, you may be able to speed up your computations by a factor of two or more. For example, if you always need sin and cos of the same value, calculate them close to each other in the code and make sure that your compiler translates them into a FSINCOS assembly instruction (see this question). If you need only a small portion of the full range of the function, you can potentially use a set of low-order polynomials followed by an iteration of Newton's method to get full machine precision (or as much as you need). Again, this is much more powerful if you know that you only need some values: for example, if you can use the fact that sin(x) is close to x near zero, and you will only need values near zero, then you can dramatically decrease the number of terms you need.
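
Two of these ideas can be sketched in a few lines. The names sin_cos, SinCos and sin_near_zero are hypothetical, and whether the adjacent calls are actually merged into a single instruction or library call depends on the compiler and its flags, so the generated assembly should be checked rather than assumed.

```cpp
#include <cmath>

struct SinCos { double s; double c; };   // hypothetical helper type

// Computes sin and cos of the same argument side by side, which gives the
// compiler a chance to fuse them into one combined evaluation.
inline SinCos sin_cos(double x)
{
    SinCos r;
    r.s = std::sin(x);
    r.c = std::cos(x);
    return r;
}

// Two-term approximation for arguments known to be near zero.
// The first omitted term is x^5 / 120, so for |x| <= 0.1 the error
// stays below about 1e-7.
inline double sin_near_zero(double x)
{
    return x - x * x * x / 6.0;
}
```

If the simulation only ever passes small angles, sin_near_zero replaces a library call with three multiplications and a subtraction, but that restriction on the input has to be guaranteed by the caller.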

But, again, my primary advice is: 2-3% is not worth it. Think harder about the algorithms used and other potential bottlenecks (e.g. is malloc eating too much time?) before you optimize this.
