C++ 快速定点 pow、log、exp 和 sqrt
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4657468/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Fast fixed point pow, log, exp and sqrt
提问by Goz
I've got a fixed point class (10.22) and I have a need of a pow, a sqrt, an exp and a log function.
我有一个定点类 (10.22),我需要一个 pow、一个 sqrt、一个 exp 和一个日志函数。
Alas I have no idea where to even start on this. Can anyone provide me with some links to useful articles or, better yet, provide me with some code?
唉,我什至不知道从哪里开始。谁能为我提供一些有用文章的链接,或者更好的是,为我提供一些代码?
I'm assuming that once I have an exp function then it becomes relatively easy to implement pow and sqrt as they just become.
我假设一旦我有了 exp 函数,那么实现 pow 和 sqrt 就变得相对容易了。
pow( x, y ) => exp( y * log( x ) )
sqrt( x ) => pow( x, 0.5 )
Its just those exp and log functions that I'm finding difficult (as though I remember a few of my log rules, I can't remember much else about them).
它只是那些我发现困难的 exp 和 log 函数(好像我记得我的一些日志规则,但我不记得关于它们的更多内容)。
Presumably, there would also be a faster method for sqrt and pow so any pointers on that front would be appreciated even if its just to say use the methods i outline above.
据推测,sqrt 和 pow 也会有一种更快的方法,因此即使只是说使用我上面概述的方法,也将不胜感激任何在这方面的指针。
Please note: This HAS to be cross platform and in pure C/C++ code so I cannot use any assembler optimisations.
请注意:这必须是跨平台和纯 C/C++ 代码,所以我不能使用任何汇编器优化。
回答by MSalters
A very simple solution is to use a decent table-driven approximation. You don't actually need a lot of data if you reduce your inputs correctly. exp(a)==exp(a/2)*exp(a/2)
, which means you really only need to calculate exp(x)
for 1 < x < 2
. Over that range, a runga-kutta approximation would give reasonable results with ~16 entries IIRC.
一个非常简单的解决方案是使用一个不错的表驱动近似。如果您正确减少输入,您实际上并不需要大量数据。exp(a)==exp(a/2)*exp(a/2)
,这意味着你真的只需要计算exp(x)
的1 < x < 2
。在这个范围内,runga-kutta 近似将给出大约 16 个条目 IIRC 的合理结果。
Similarly, sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2
which means you need only table entries for 1 < a < 4
. Log(a) is a bit harder: log(a) == 1 + log(a/e)
. This is a rather slow iteration, but log(1024) is only 6.9 so you won't have many iterations.
同样,sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2
这意味着您只需要1 < a < 4
. 日志(a)是有点困难:log(a) == 1 + log(a/e)
。这是一个相当慢的迭代,但 log(1024) 只有 6.9,所以你不会有很多迭代。
You'd use a similar "integer-first" algorithm for pow: pow(x,y)==pow(x, floor(y)) * pow(x, frac(y))
. This works because pow(double, int)
is trivial (divide and conquer).
您将对 pow: 使用类似的“整数优先”算法pow(x,y)==pow(x, floor(y)) * pow(x, frac(y))
。这pow(double, int)
是可行的,因为它是微不足道的(分而治之)。
[edit] For the integral component of log(a)
, it may be useful to store a table 1, e, e^2, e^3, e^4, e^5, e^6, e^7
so you can reduce log(a) == n + log(a/e^n)
by a simple hardcoded binary search of a in that table. The improvement from 7 to 3 steps isn't so big, but it means you only have to divide once by e^n
instead of n
times by e
.
[编辑] 对于 的整数部分log(a)
,存储表可能很有用,1, e, e^2, e^3, e^4, e^5, e^6, e^7
因此您可以log(a) == n + log(a/e^n)
通过对该表中的 a 进行简单的硬编码二进制搜索来减少。从 7 步到 3 步的改进并不是很大,但这意味着您只需除以一次e^n
而不是除以n
次e
。
[edit 2]
And for that last log(a/e^n)
term, you can use log(a/e^n) = log((a/e^n)^8)/8
- each iteration produces 3 more bits by table lookup. That keeps your code and table size small. This is typically code for embedded systems, and they don't have large caches.
[编辑 2] 对于最后一个log(a/e^n)
术语,您可以使用log(a/e^n) = log((a/e^n)^8)/8
- 每次迭代通过 table lookup产生 3 个多位。这使您的代码和表的大小保持较小。这通常是嵌入式系统的代码,它们没有大的缓存。
[edit 3]
That's stil not to smart on my side. log(a) = log(2) + log(a/2)
. You can just store the fixed-point value log2=0.30102999566
, count the number of leading zeroes, shift a
into the range used for your lookup table, and multiply that shift (integer) by the fixed-point constant log2
. Can be as low as 3 instructions.
[编辑 3] 这对我来说仍然不聪明。log(a) = log(2) + log(a/2)
. 您可以只存储定点值log2=0.30102999566
,计算前导零的数量,移入a
用于查找表的范围,然后将该移位(整数)乘以定点常量log2
。可以低至 3 个指令。
Using e
for the reduction step just gives you a "nice" log(e)=1.0
constant but that's false optimization. 0.30102999566 is just as good a constant as 1.0; both are 32 bits constants in 10.22 fixed point. Using 2 as the constant for range reduction allows you to use a bit shift for a division.
使用e
还原步骤只是给你一个“好”log(e)=1.0
不变,但是这是错误的优化。0.30102999566 和 1.0 一样好;两者都是 10.22 定点中的 32 位常量。使用 2 作为范围缩减的常数允许您使用位移位进行除法。
You still get the trick from edit 2, log(a/2^n) = log((a/2^n)^8)/8
. Basically, this gets you a result (a + b/8 + c/64 + d/512) * 0.30102999566
- with b,c,d in the range [0,7]. a.bcd
really is an octal number. Not a surprise since we used 8 as the power. (The trick works equally well with power 2, 4 or 16.)
您仍然可以从编辑 2, 中获得诀窍log(a/2^n) = log((a/2^n)^8)/8
。基本上,这会给你一个结果(a + b/8 + c/64 + d/512) * 0.30102999566
- b,c,d 在 [0,7] 范围内。a.bcd
真的是一个八进制数。不足为奇,因为我们使用 8 作为幂。(这个技巧同样适用于 2、4 或 16 次方。)
[edit 4]
Still had an open end. pow(x, frac(y)
is just pow(sqrt(x), 2 * frac(y))
and we have a decent 1/sqrt(x)
. That gives us the far more efficient approach. Say frac(y)=0.101
binary, i.e. 1/2 plus 1/8. Then that means x^0.101
is (x^1/2 * x^1/8)
. But x^1/2
is just sqrt(x)
and x^1/8
is (sqrt(sqrt(sqrt(x)))
. Saving one more operation, Newton-Raphson NR(x)
gives us 1/sqrt(x)
so we calculate 1.0/(NR(x)*NR((NR(NR(x)))
. We only invert the end result, don't use the sqrt function directly.
[编辑 4] 仍然有一个开放的结局。pow(x, frac(y)
只是pow(sqrt(x), 2 * frac(y))
和我们有一个像样的1/sqrt(x)
。这为我们提供了更有效的方法。说frac(y)=0.101
二进制,即 1/2 加 1/8。那么这意味着x^0.101
是(x^1/2 * x^1/8)
。但x^1/2
只是sqrt(x)
并且x^1/8
是(sqrt(sqrt(sqrt(x)))
。再保存一次操作,Newton-RaphsonNR(x)
给了我们,1/sqrt(x)
所以我们计算1.0/(NR(x)*NR((NR(NR(x)))
。我们只反转最终结果,不直接使用 sqrt 函数。
回答by Dan Moulding
Below is an example C implementation of Clay S. Turner's fixed-point log base 2 algorithm[1]. The algorithm doesn't require any kind of look-up table. This can be useful on systems where memory constraints are tight and the processor lacks an FPU, such as is the case with many microcontrollers. Log base eand log base 10 are then also supported by using the property of logarithms that, for any base n:
下面是 Clay S. Turner 的定点对数基 2 算法 [ 1]的示例 C 实现。该算法不需要任何类型的查找表。这对于内存限制严格且处理器缺少 FPU 的系统非常有用,例如许多微控制器的情况。数底Ë和日志基座10然后也通过使用对数的特性的支持,对于任何碱Ñ:
log?(x)
log?(x) = ───────
log?(n)
where, for this algorithm, mequals 2.
其中,对于该算法,m等于 2。
A nice feature of this implementation is that it supports variable precision: the precision can be determined at runtime, at the expense of range. The way I've implemented it, the processor (or compiler) must be capable of doing 64-bit math for holding some intermediate results. It can be easily adapted to not require 64-bit support, but the range will be reduced.
这个实现的一个很好的特性是它支持可变精度:精度可以在运行时确定,以范围为代价。按照我的实现方式,处理器(或编译器)必须能够进行 64 位数学运算以保存一些中间结果。它可以很容易地适应不需要 64 位支持,但范围会缩小。
When using these functions, x
is expected to be a fixed-point value scaled according to the
specified precision
. For instance, if precision
is 16, then x
should be scaled by 2^16 (65536). The result is a fixed-point value with the same scale factor as the input. A return value of INT32_MIN
represents negative infinity. A return value of INT32_MAX
indicates an error and errno
will be set to EINVAL
, indicating that the input precision was invalid.
使用这些函数时,x
预计是根据指定的定点值precision
。例如,如果precision
是 16,则x
应按 2^16 (65536) 进行缩放。结果是与输入具有相同比例因子的定点值。的返回值INT32_MIN
表示负无穷大。返回值INT32_MAX
表示错误,errno
将设置为EINVAL
,表示输入精度无效。
#include <errno.h>
#include <stddef.h>
#include "log2fix.h"
#define INV_LOG2_E_Q1DOT31 UINT64_C(0x58b90bfc) // Inverse log base 2 of e
#define INV_LOG2_10_Q1DOT31 UINT64_C(0x268826a1) // Inverse log base 2 of 10
int32_t log2fix (uint32_t x, size_t precision)
{
int32_t b = 1U << (precision - 1);
int32_t y = 0;
if (precision < 1 || precision > 31) {
errno = EINVAL;
return INT32_MAX; // indicates an error
}
if (x == 0) {
return INT32_MIN; // represents negative infinity
}
while (x < 1U << precision) {
x <<= 1;
y -= 1U << precision;
}
while (x >= 2U << precision) {
x >>= 1;
y += 1U << precision;
}
uint64_t z = x;
for (size_t i = 0; i < precision; i++) {
z = z * z >> precision;
if (z >= 2U << (uint64_t)precision) {
z >>= 1;
y += b;
}
b >>= 1;
}
return y;
}
int32_t logfix (uint32_t x, size_t precision)
{
uint64_t t;
t = log2fix(x, precision) * INV_LOG2_E_Q1DOT31;
return t >> 31;
}
int32_t log10fix (uint32_t x, size_t precision)
{
uint64_t t;
t = log2fix(x, precision) * INV_LOG2_10_Q1DOT31;
return t >> 31;
}
The code for this implementation also lives at Github, along with a sample/test program that illustrates how to use this function to compute and display logarithms from numbers read from standard input.
这个实现的代码也存在于Github 上,以及一个示例/测试程序,它说明了如何使用这个函数来计算和显示从标准输入读取的数字的对数。
[1] C. S. Turner, "A Fast Binary Logarithm Algorithm", IEEE Signal Processing Mag., pp. 124,140, Sep. 2010.
[ 1] CS Turner,“一种快速二进制对数算法”,IEEE 信号处理杂志。,第 124,140 页,2010 年 9 月。
回答by Paul R
A good starting point is Hyman Crenshaw's book, "Math Toolkit for Real-Time Programming". It has a good discussion of algorithms and implementations for various transcendental functions.
一个很好的起点是Hyman Crenshaw 的书“Math Toolkit for Real-Time Programming”。它对各种超越函数的算法和实现进行了很好的讨论。
回答by chmike
Check my fixed point sqrt implementation using only integer operations. It was fun to invent. Quite old now.
仅使用整数运算检查我的定点 sqrt 实现。发明起来很有趣。现在很老了。
Otherwise check the CORDICset of algorithms. That's the way to implement all the functions you listed and the trigonometric functions.
否则检查CORDIC算法集。这是实现您列出的所有函数和三角函数的方法。
EDIT :I published the reviewed source on GitHub here
编辑:我在这里在 GitHub 上发布了经过的源代码