C++ real numbers - how to determine whether float or double is required?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/13620481/
Asked by Soham Chakraborty
Given a real value, can we check whether a float data type is enough to store the number, or whether a double is required?
I know precision varies from architecture to architecture. Is there any C/C++ function to determine the right data type?
Answer by Patricia Shanahan
For background, see What Every Computer Scientist Should Know About Floating-Point Arithmetic
Unfortunately, I don't think there is any way to automate the decision.
Generally, when people represent numbers in floating point, rather than as strings, the intent is to do arithmetic using the numbers. Even if all the inputs fit in a given floating point type with acceptable precision, you still have to consider rounding error and intermediate results.
In practice, most calculations will work with enough precision for usable results, using a 64 bit type. Many calculations will not get usable results using only 32 bits.
In modern processors, buses and arithmetic units are wide enough to give 32 bit and 64 bit floating point similar performance. The main motivation for using 32 bit is to save space when storing a very large array.
That leads to the following strategy:
If arrays are large enough to justify spending significant effort to halve their size, do analysis and experiments to decide whether a 32 bit type gives good enough results, and if so use it. Otherwise, use a 64 bit type.
Answer by Potatoswatter
Precision is not very platform-dependent. Although platforms are allowed to differ, float is almost universally IEEE standard single precision and double is double precision.
Single precision assigns 23 bits of "mantissa," the binary digits after the radix point (the binary counterpart of the decimal point). Since the bit before the point is always one for normalized numbers, this equates to a 24-bit significand. Dividing by log2(10) ≈ 3.32, a float gets you about 7.2 decimal digits of precision.
Following the same process for double yields 15.9 digits and long double yields 19.2 (for systems using the Intel 80-bit format).
The bits besides the mantissa are used for the sign and the exponent. The number of exponent bits determines the range of numbers allowed. Single goes to about 10^±38, double to about 10^±308.
As for whether you need 7, 16, or 19 digits or if limited-precision representation is appropriate at all, that's really outside the scope of the question. It depends on the algorithm and the application.
Answer by sampson-chen
I think your question presupposes a way to specify any "real number" to C / C++ (or any other program) without precision loss.
Suppose that you get this real number by specifying it in code or through user input; a way to check if a float or a double would be enough to store it without precision loss is to just count the number of significant bits and check that against the data range for float and double.
If the number is given as an expression (e.g. 1/7 or sqrt(2)), you will also want ways of detecting:
- If the number is rational, whether it has a repeating (cyclic) decimal expansion.
- Or, what happens when you have an irrational number?
Moreover, there are numbers, such as 0.9, that float / double cannot represent exactly, even in theory (at least not in our binary computation paradigm) - see Jon Skeet's excellent answer on this.
Lastly, see additional discussion on float vs. double.
Answer by jonathanasdf
You cannot represent arbitrary real numbers with float or double variables, only a subset of the rational numbers.
When you do floating point computation, your CPU floating point unit will decide the best approximation for you.
I might be wrong, but I thought that the float (4 bytes) and double (8 bytes) floating point representations were actually specified independently of computer architectures.
Answer by Jakob S.
Couldn't you simply store it to a float and a double variable and then compare the two? This should implicitly convert the float back to a double - if there is no difference, the float is sufficient?
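// value: the number under test (assumed here to be held as a double)
float f = value;
double d = value;
if ((double)f == d)
{
    // float is sufficient: the round trip through float lost nothing
}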