C++ 快速弧余弦算法？

Question

提问by jmasterx

I have my own, very fast cos function:

我有自己的，非常快的 cos 函数：

float sine(float x)
{
    const float B = 4/pi;
    const float C = -4/(pi*pi);

    float y = B * x + C * x * abs(x);

    //  const float Q = 0.775;
    const float P = 0.225;

    y = P * (y * abs(y) - y) + y;   // Q * y + P * y * abs(y)


    return y;
}

float cosine(float x)
{
    return sine(x + (pi / 2));
}

But now when I profile, I see that acos() is killing the processor. I don't need intense precision. What is a fast way to calculate acos(x) Thanks.

但是现在当我分析时，我看到 acos() 正在杀死处理器。我不需要很高的精度。什么是计算 acos(x) 的快速方法谢谢。

Answer 1

回答by dan04

A simple cubic approximation, the Lagrange polynomial for x ∈ {-1, -?, 0, ?, 1}, is:

一个简单的三次近似，x ∈ {-1, -?, 0, ?, 1} 的拉格朗日多项式是：

double acos(x) {
   return (-0.69813170079773212 * x * x - 0.87266462599716477) * x + 1.5707963267948966;
}

It has a maximum error of about 0.18 rad.

它的最大误差约为 0.18 rad。

Answer 2

回答by spender

Got spare memory? A lookup table (with interpolation, if required) is gonna be fastest.

有备用内存吗？查找表（如果需要，带有插值）将是最快的。

Answer 3

回答by Fnord

nVidia has some great resourcesthat show how to approximate otherwise very expensive math functions, such as: acos asin atan2etc etc...

nVidia 有一些很好的资源，展示了如何近似否则非常昂贵的数学函数，例如：acos asin atan2等...

These algorithms produce good results when speed of execution is more important (within reason) than precision. Here's their acos function:

当执行速度（在合理范围内）比精度更重要时，这些算法会产生良好的结果。这是他们的 acos 函数：

// Absolute error <= 6.7e-5
float acos(float x) {
  float negate = float(x < 0);
  x = abs(x);
  float ret = -0.0187293;
  ret = ret * x;
  ret = ret + 0.0742610;
  ret = ret * x;
  ret = ret - 0.2121144;
  ret = ret * x;
  ret = ret + 1.5707288;
  ret = ret * sqrt(1.0-x);
  ret = ret - 2 * negate * ret;
  return negate * 3.14159265358979 + ret;
}

And here are the results for when calculating acos(0.5):

以下是计算 acos(0.5) 时的结果：

nVidia:   result: 1.0471513828611643
math.h:   result: 1.0471975511965976

That's pretty close! Depending on your required degree of precision, this might be a good option for you.

那非常接近！根据您所需的精度，这对您来说可能是一个不错的选择。

Answer 4

回答by Trey Reynolds

I have my own. It's pretty accurate and sort of fast. It works off of a theorem I built around quartic convergence. It's really interesting, and you can see the equation and how fast it can make my natural log approximation converge here: https://www.desmos.com/calculator/yb04qt8jx4

我有我自己的。它非常准确，而且有点快。它基于我围绕四次收敛建立的定理。这真的很有趣，你可以在这里看到方程以及它可以使我的自然对数近似收敛的速度：https: //www.desmos.com/calculator/yb04qt8jx4

Here's my arccos code:

这是我的 arccos 代码：

function acos(x)
    local a=1.43+0.59*x a=(a+(2+2*x)/a)/2
    local b=1.65-1.41*x b=(b+(2-2*x)/b)/2
    local c=0.88-0.77*x c=(c+(2-a)/c)/2
    return (8*(c+(2-a)/c)-(b+(2-2*x)/b))/6
end

A lot of that is just square root approximation. It works really well, too, unless you get too close to taking a square root of 0. It has an average error (excluding x=0.99 to 1) of 0.0003. The problem, though, is that at 0.99 it starts going to shit, and at x=1, the difference in accuracy becomes 0.05. Of course, this could be solved by doing more iterations on the square roots (lol nope) or, just a little thing like, if x>0.99 then use a different set of square root linearizations, but that makes the code all long and ugly.

其中很多只是平方根近似。它也非常有效，除非您太接近取 0 的平方根。它的平均误差（不包括 x=0.99 到 1）为 0.0003。但问题是，在 0.99 时它开始变得糟糕，而在 x=1 时，准确度差异变为 0.05。当然，这可以通过对平方根进行更多迭代来解决（lol nope），或者，如果 x>0.99 则使用一组不同的平方根线性化，但这会使代码变得又长又丑.

If you don't care about accuracy so much, you could just do one iteration per square root, which should still keep you somewhere in the range of 0.0162 or something as far as accuracy goes:

如果您不太关心准确度，您可以只对每个平方根进行一次迭代，这仍然应该使您保持在 0.0162 或就准确度而言的范围内的某处：

function acos(x)
    local a=1.43+0.59*x a=(a+(2+2*x)/a)/2
    local b=1.65-1.41*x b=(b+(2-2*x)/b)/2
    local c=0.88-0.77*x c=(c+(2-a)/c)/2
    return 8/3*c-b/3
end

If you're okay with it, you can use pre-existing square root code. It will get rid of the the equation going a bit crazy at x=1:

如果您可以接受，则可以使用预先存在的平方根代码。它将摆脱在 x=1 时有点疯狂的等式：

function acos(x)
    local a = math.sqrt(2+2*x)
    local b = math.sqrt(2-2*x)
    local c = math.sqrt(2-a)
    return 8/3*d-b/3
end

Frankly, though, if you're really pressed for time, remember that you could linearize arccos into 3.14159-1.57079x and just do:

不过，坦率地说，如果您真的时间紧迫，请记住，您可以将 arccos 线性化为 3.14159-1.57079x，然后执行以下操作：

function acos(x)
    return 1.57079-1.57079*x
end

Anyway, if you want to see a list of my arccos approximation equations, you can go to https://www.desmos.com/calculator/tcaty2sv8lI know that my approximations aren't the best for certain things, but if you're doing something where my approximations would be useful, please use them, but try to give me credit.

无论如何，如果你想查看我的 arccos 近似方程列表，你可以去https://www.desmos.com/calculator/tcaty2sv8l我知道我的近似值对于某些事情来说不是最好的，但是如果你正在做一些我的近似值有用的事情，请使用它们，但请尝试给我信任。

Answer 5

回答by Ruud

You can approximate the inverse cosine with a polynomial as suggested by dan04, but a polynomial is a pretty bad approximation near -1 and 1 where the derivative of the inverse cosine goes to infinity. When you increase the degree of the polynomial you hit diminishing returns quickly, and it is still hard to get a good approximation around the endpoints. A rational function(the quotient of two polynomials) can give a much better approximation in this case.

您可以按照 dan04 的建议使用多项式近似反余弦，但多项式在 -1 和 1 附近是一个非常糟糕的近似值，其中反余弦的导数趋于无穷大。当您增加多项式的次数时，您很快就会遇到收益递减，并且仍然很难在端点周围获得良好的近似值。一个有理函数（两个多项式的商）可以在这个情况下，一个更好的近似。

acos(x) ≈ π/2 + (ax + bx3) / (1 + cx2 + dx?)

where

在哪里

a = -0.939115566365855
b =  0.9217841528914573
c = -1.2845906244690837
d =  0.295624144969963174

has a maximum absolute error of 0.017 radians (0.96 degrees) on the interval (-1, 1). Here is a plot(the inverse cosine in black, cubic polynomial approximation in red, the above function in blue) for comparison:

在区间 (-1, 1) 上的最大绝对误差为 0.017 弧度（0.96 度）。这是一个比较图（黑色为反余弦，红色为三次多项式近似，蓝色为上述函数）：

The coefficients above have been chosen to minimise the maximum absolute error over the entire domain. If you are willing to allow a larger error at the endpoints, the error on the interval (-0.98, 0.98) can be made much smaller. A numerator of degree 5 and a denominator of degree 2 is about as fast as the above function, but slightly less accurate. At the expense of performance you can increase accuracy by using higher degree polynomials.

已选择上述系数以最小化整个域的最大绝对误差。如果您愿意在端点处允许更大的误差，则可以将区间 (-0.98, 0.98) 上的误差设置得更小。5 阶分子和 2 阶分母与上述函数的速度差不多，但精度稍差。以牺牲性能为代价，您可以通过使用更高次的多项式来提高准确性。

A note about performance: computing the two polynomials is still very cheap, and you can use fused multiply-add instructions. The division is not so bad, because you can use the hardware reciprocal approximation and a multiply. The error in the reciprocal approximation is negligible in comparison with the error in the acos approximation. On a 2.6 GHz Skylake i7, this approximation can do about 8 inverse cosines every 6 cycles using AVX. (That is throughput, the latency is longer than 6 cycles.)

关于性能的注意事项：计算两个多项式仍然非常便宜，您可以使用融合乘加指令。除法还不错，因为您可以使用硬件倒数近似和乘法。与 acos 近似中的误差相比，倒数近似中的误差可以忽略不计。在 2.6 GHz Skylake i7 上，这种近似可以使用 AVX 每 6 个周期执行大约 8 个反余弦。（即吞吐量，延迟超过 6 个周期。）

Answer 6

回答by dan04

Another approach you could take is to use complex numbers. From de Moivre's formula,

您可以采用的另一种方法是使用复数。根据de Moivre 公式，

?^x= cos(π/2*x) + ?*sin(π/2*x)

? ^x= cos(π/2*x) + ?*sin(π/2*x)

Let θ = π/2*x. Then x = 2θ/π, so

设 θ = π/2*x。那么 x = 2θ/π，所以

sin(θ) = ?(?^^2θ/π)
cos(θ) = ?(?^^2θ/π)

sin(θ) = ?(?^ ^2θ/π)
cos(θ) = ?(?^ ^2θ/π)

How can you calculate powers of ? without sin and cos? Start with a precomputed table for powers of 2:

你如何计算的幂？没有罪和罪？从预先计算的 2 次幂表开始：

?⁴= 1
?²= -1
?¹= ?
?^1/2= 0.7071067811865476 + 0.7071067811865475*?
?^1/4= 0.9238795325112867 + 0.3826834323650898*?
?^1/8= 0.9807852804032304 + 0.19509032201612825*?
?^1/16= 0.9951847266721969 + 0.0980171403295606*?
?^1/32= 0.9987954562051724 + 0.049067674327418015*?
?^1/64= 0.9996988186962042 + 0.024541228522912288*?
?^1/128= 0.9999247018391445 + 0.012271538285719925*?
?^1/256= 0.9999811752826011 + 0.006135884649154475*?

? ⁴= 1
? ²= -1
? ¹= ?
? ^1/2= 0.7071067811865476 + 0.7071067811865475*？
? ^1/4= 0.9238795325112867 + 0.3826834323650898*？
? ^1/8= 0.9807852804032304 + 0.19509032201612825*？
? ^1/16= 0.9951847266721969 + 0.0980171403295606*？
? ^1/32= 0.9987954562051724 + 0.049067674327418015*？
? ^1/64= 0.9996988186962042 + 0.024541228522912288*？
? ^1/128= 0.9999247018391445 + 0.012271538285719925*？
? ^1/256= 0.9999811752826011 + 0.006135884649154475*？

To calculate arbitrary values of ?^x, approximate the exponent as a binary fraction, and then multiply together the corresponding values from the table.

计算任意值 ? ^x，将指数近似为二进制分数，然后将表中的相应值相乘。

For example, to find sin and cos of 72° = 0.8π/2:

例如，要找到 72° = 0.8π/2 的 sin 和 cos：

?^0.8≈ ?^205/256= ?^0b11001101= ?^1/2* ?^1/4* ?^1/32* ?^1/64* ?^1/256
= 0.3078496400415349 + 0.9514350209690084*?

? ^0.8≈ ? ^205/256= ? ^0b11001101= ? ^1/2* ? ^1/4* ? ^1/32* ? ^1/64* ? ^1/256
= 0.3078496400415349 + 0.9514350209690084*？

sin(72°) ≈ 0.9514350209690084 ("exact" value is 0.9510565162951535)
cos(72°) ≈ 0.3078496400415349 ("exact" value is 0.30901699437494745).

sin(72°) ≈ 0.9514350209690084（“精确”值为 0.9510565162951535）
cos(72°) ≈ 0.3078496400415349（“精确”值为 0.30901699437494745）。

To find asin and acos, you can use this table with the Bisection Method:

要查找 asin 和 acos，您可以将此表与二分法一起使用：

For example, to find asin(0.6) (the smallest angle in a 3-4-5 triangle):

例如，要找到 asin(0.6)（3-4-5 三角形中的最小角）：

?⁰= 1 + 0*?. The sin is too small, so increase x by 1/2.
?^1/2= 0.7071067811865476 + 0.7071067811865475*? . The sin is too big, so decrease x by 1/4.
?^1/4= 0.9238795325112867 + 0.3826834323650898*?. The sin is too small, so increase x by 1/8.
?^3/8= 0.8314696123025452 + 0.5555702330196022*?. The sin is still too small, so increase x by 1/16.
?^7/16= 0.773010453362737 + 0.6343932841636455*?. The sin is too big, so decrease x by 1/32.
?^13/32= 0.8032075314806449 + 0.5956993044924334*?.

? ⁰= 1 + 0*?。sin 太小，所以将 x 增加 1/2。
? ^1/2= 0.7071067811865476 + 0.7071067811865475*? . 罪过太大，所以将 x 减少 1/4。
? ^1/4= 0.9238795325112867 + 0.3826834323650898*?。sin 太小，所以将 x 增加 1/8。
? ^3/8= 0.8314696123025452 + 0.5555702330196022*?。sin 仍然太小，所以将 x 增加 1/16。
? ^7/16= 0.773010453362737 + 0.6343932841636455*?。罪过太大，所以将 x 减少 1/32。
? ^13/32= 0.8032075314806449 + 0.5956993044924334* ?.

Each time you increase x, multiply by the corresponding power of ?. Each time you decrease x, divide by the corresponding power of ?.

每次增加 x，乘以 ? 的相应幂。每次减少 x，除以 ? 的相应幂。

If we stop here, we obtain acos(0.6) ≈ 13/32*π/2 = 0.6381360077604268 (The "exact" value is 0.6435011087932844.)

如果我们到此为止，我们会得到 acos(0.6) ≈ 13/32*π/2 = 0.6381360077604268（“精确”值为 0.6435011087932844。）

The accuracy, of course, depends on the number of iterations. For a quick-and-dirty approximation, use 10 iterations. For "intense precision", use 50-60 iterations.

当然，准确性取决于迭代次数。对于快速而肮脏的近似值，请使用 10 次迭代。对于“高精确度”，使用 50-60 次迭代。

Answer 7

回答by julian kizanis

Here is a great website with many options: https://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/arcsin/onlyelem.html

这是一个很棒的网站，有很多选择：https: //www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/arcsin/onlyelem.html

Personally I went the Chebyshev-Pade quotient approximation with with the following code:

我个人使用以下代码进行 Chebyshev-Pade 商近似：

double arccos(double x) {
const double pi = 3.141592653;
    return pi / 2 - (.5689111419 - .2644381021*x - .4212611542*(2*x - 1)*(2*x - 1)
         + .1475622352*(2*x - 1)*(2*x - 1)*(2*x - 1))
         / (2.006022274 - 2.343685222*x + .3316406750*(2*x - 1)*(2*x - 1) +
             .02607135626*(2*x - 1)*(2*x - 1)*(2*x - 1));
}

Answer 8

回答by njuffa

A fast arccosine implementation, accurate to about 0.5 degrees, can be based on the observationthat for x in [0,1], acos(x) ≈ √(2*(1-x)). An additional scale factor improves accuracy near zero. The optimal factor can be found by a simple binary search. Negative arguments are handled according to acos (-x) = π - acos (x).

一个快速反余弦实现，精确到大约 0.5 度，可以基于以下观察：对于 [0,1] 中的 x，acos(x) ≈ √(2*(1-x))。额外的比例因子提高了接近零的精度。可以通过简单的二分搜索找到最佳因子。根据 acos (-x) = π - acos (x) 处理负参数。

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

// Approximate acos(a) with relative error < 5.15e-3
// This uses an idea from Robert Harley's posting in comp.arch.arithmetic on 1996/07/12
// https://groups.google.com/forum/#!original/comp.arch.arithmetic/wqCPkCCXqWs/T9qCkHtGE2YJ
float fast_acos (float a)
{
    const float PI = 3.14159265f;
    const float C  = 0.10501094f;
    float r, s, t, u;
    t = (a < 0) ? (-a) : a;  // handle negative arguments
    u = 1.0f - t;
    s = sqrtf (u + u);
    r = C * u * s + s;  // or fmaf (C * u, s, s) if FMA support in hardware
    if (a < 0) r = PI - r;  // handle negative arguments
    return r;
}

float uint_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof(r));
    return r;
}

int main (void)
{
    double maxrelerr = 0.0;
    uint32_t a = 0;
    do {
        float x = uint_as_float (a);
        float r = fast_acos (x);
        double xx = (double)x;
        double res = (double)r;
        double ref = acos (xx);
        double relerr = (res - ref) / ref;
        if (fabs (relerr) > maxrelerr) {
            maxrelerr = fabs (relerr);
            printf ("xx=% 15.8e  res=% 15.8e  ref=% 15.8e  rel.err=% 15.8e\n",
                    xx, res, ref, relerr);
        }
        a++;
    } while (a);
    printf ("maximum relative error = %15.8e\n", maxrelerr);
    return EXIT_SUCCESS;
}

The output of the above test scaffold should look similar to this:

上述测试脚手架的输出应该类似于：

xx= 0.00000000e+000  res= 1.56272149e+000  ref= 1.57079633e+000  rel.err=-5.14060021e-003
xx= 2.98023259e-008  res= 1.56272137e+000  ref= 1.57079630e+000  rel.err=-5.14065723e-003
xx= 8.94069672e-008  res= 1.56272125e+000  ref= 1.57079624e+000  rel.err=-5.14069537e-003
xx=-2.98023259e-008  res= 1.57887137e+000  ref= 1.57079636e+000  rel.err= 5.14071269e-003
xx=-8.94069672e-008  res= 1.57887149e+000  ref= 1.57079642e+000  rel.err= 5.14075044e-003
maximum relative error = 5.14075044e-003

Answer 9

回答by John Doe

If you're using Microsoft VC++, here's an inline __asm x87 FPU code version without all the CRT filler, error checks, etc. and unlike the earliest classic ASM code you can find, it uses a FMUL instead of the slower FDIV. It compiles/works with Microsoft VC++ 2005 Express/Pro what I always stick with for various reasons.

如果您使用 Microsoft VC++，这里有一个内联 __asm x87 FPU 代码版本，没有所有 CRT 填充、错误检查等。与您能找到的最早的经典 ASM 代码不同，它使用 FMUL 而不是较慢的 FDIV。它与 Microsoft VC++ 2005 Express/Pro 一起编译/工作，我出于各种原因一直坚持使用。

It's a little tricky to setup a function with "__declspec(naked)/__fastcall", pull parameters correctly, handle stack, so not for the faint of heart. If it fails to compile with errors on your version, don't bother unless you're experienced. Or ask me, I can rewrite it in a slightly friendlier __asm{} block. I would manually inline this if it's a critical part of a function in a loop for further performance gains if need be.

使用“__declspec(naked)/__fastcall”设置函数、正确拉参数、处理堆栈有点棘手，所以不适合胆小的人。如果它无法编译并在您的版本上出现错误，除非您有经验，否则不要打扰。或者问我，我可以用稍微友好的 __asm{} 块重写它。如果它是循环中函数的关键部分，我会手动内联它，以便在需要时进一步提高性能。

extern float __fastcall fs_acos(float x);
extern double __fastcall fs_Acos(double x);

// ACOS(x)- Computes the arccosine of ST(0)
// Allowable range: -1<=x<=+1
// Derivative Formulas: acos(x) = atan(sqrt((1 - x * x)/(x * x))) OR
// acos(x) = atan2(sqrt(1 - x * x), x)
// e.g. acos(-1.0) = 3.1415927

__declspec(naked) float __fastcall fs_acos(float x) { __asm {
    FLD   DWORD PTR [ESP+4] ;// Load/Push parameter 'x' to FPU stack
    FLD1            ;// Load 1.0
    FADD  ST, ST(1) ;// Compute 1.0 + 'x'
    FLD1            ;// Load 1.0
    FSUB  ST, ST(2) ;// Compute 1.0 - 'x'
    FMULP ST(1), ST ;// Compute (1-x) * (1+x)
    FSQRT           ;// Compute sqrt(result)
    FXCH  ST(1)
    FPATAN          ;// Compute arctangent of result / 'x' (ST1/ST0)
    RET 4
}}

__declspec(naked) double __fastcall fs_Acos(double x) { __asm { //
    FLD   QWORD PTR [ESP+4] ;// Load/Push parameter 'x' to FPU stack
    FLD1            ;// Load 1.0
    FADD  ST, ST(1) ;// Compute (1.0 + 'x')
    FLD1            ;// Load 1.0
    FSUB  ST, ST(2) ;// Compute (1.0 - 'x')
    FMULP ST(1), ST ;// Compute (1-x) * (1+x)
    FSQRT           ;// Compute sqrt((1-x) * (1+x))
    FXCH  ST(1) 
    FPATAN          ;// Compute arctangent of result / 'x' (ST1/ST0)
    RET 8
}}

Answer 10

回答by SergeD

Unfortunately I do not have enough reputation to comment. Here is a small modification of Nvidia's function, that deals with the fact that numbers that should be <= 1 while preserving performance as much as possible.

不幸的是，我没有足够的声誉来发表评论。这是对 Nvidia 函数的一个小修改，它处理的事实是数字应该 <= 1，同时尽可能地保持性能。

It may be important since rounding errors can lead number that should be 1.0 to be (oh so slightly) larger than 1.0.

这可能很重要，因为四舍五入错误可能导致应该是 1.0 的数字（哦，稍微）大于 1.0。


double safer_acos(double x) {
  double negate = double(x < 0);
  x = abs(x);
  x -= double(x>1.0)*(x-1.0); // <- equivalent to min(1.0,x), but faster
  double ret = -0.0187293;
  ret = ret * x;
  ret = ret + 0.0742610;
  ret = ret * x;
  ret = ret - 0.2121144;
  ret = ret * x;
  ret = ret + 1.5707288;
  ret = ret * sqrt(1.0-x);
  ret = ret - 2 * negate * ret;
  return negate * 3.14159265358979 + ret;

  // In a single line (no gain using gcc)
  //return negate * 3.14159265358979 + (((((-0.0187293*x)+ 0.0742610)*x - 0.2121144)*x + 1.5707288)* sqrt(1.0-x))*(1.0-2.0*negate);

}

C++ 快速弧余弦算法？

提问by jmasterx

回答by dan04

回答by spender

回答by Fnord

回答by Trey Reynolds

回答by Ruud

回答by dan04

回答by julian kizanis

回答by njuffa

回答by John Doe

回答by SergeD

相关推荐

最近更新

标签

C++ 快速弧余弦算法？

提问by jmasterx

回答by dan04

回答by spender

回答by Fnord

回答by Trey Reynolds

回答by Ruud

回答by dan04

回答by julian kizanis

回答by njuffa

回答by John Doe

回答by SergeD

相关推荐

C++ 我们应该通过引用还是通过值传递 shared_ptr ？

C++ 具有依赖范围的嵌套模板

c ++朋友函数-运算符重载istream >>

C++中的静态抽象方法

相关推荐

最近更新

标签