How many decimal places do the C++ primitive float and double support?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28045787/

Time: 2020-08-28 20:54:17  Source: igfitidea

How many decimal places does the primitive float and double support?

c++

Asked by code511788465541441

I have read that double stores 15 digits and float stores 7 digits.

My question is, are these numbers the number of decimal places supported or total number of digits in a number?

Accepted answer by John Zwinck

Those are the total number of "significant figures" if you will, counting from left to right, regardless of where the decimal point is. Beyond those numbers of digits, accuracy is not preserved.

The counts you listed are for the base 10 representation.

Answer by Samuel Navarro Lou

If you are on an architecture using IEEE-754 floating point arithmetic (as in most architectures), then the type float corresponds to single precision, and the type double corresponds to double precision, as described in the standard.

Let's work through some numbers:

Single precision:

32 bits to represent the number, out of which 24 bits are for the mantissa (23 stored explicitly plus the "hidden 1", the MSB, which is not represented). This means that the least significant bit (LSB) has a relative value of 2^(-23) with respect to the MSB, so for a fixed exponent the rounding error is at most half of that, 2^(-24), or about 10^(-7.22) times the power of two given by the exponent. What this means is that for a representation in base exponent notation (3.141592653589 E25), only about 7.22 decimal digits are significant, which in practice means that roughly 7 significant digits can be trusted (the guaranteed round-trip count, FLT_DIG, is 6).

Double precision:

64 bits to represent the number, out of which 53 bits are for the mantissa. Following the same reasoning, expressing 2^(-53) as a power of 10 results in 10^(-15.95), which in turn means that at least 15 decimal digits will always be correct.

Answer by Barry

There are macros for the number of decimal places each type supports. The gcc docs explain what they are and also what they mean:

FLT_DIG

This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.

The value of this macro is supposed to be at least 6, to satisfy ISO C.

DBL_DIG
LDBL_DIG

These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.

On both gcc 4.9.2 and clang 3.5.0, these macros yield 6 and 15, respectively.

Answer by Damon

are these numbers the number of decimal places supported or total number of digits in a number?

They are the significant digits contained in every number (although you may not need all of them, but they're still there). The mantissa of the same type always contains the same number of bits, so every number consequently contains the same number of valid "digits" if you think in terms of decimal digits. You cannot store more digits than will fit into the mantissa.

The number of "supported" digits is, however, much larger: for example, float will usually support values up to 38 decimal digits long and double values up to 308 decimal digits long, but most of these digits are not significant (that is, "unknown").

Although technically, this is wrong, since float and double do not have universally well-defined sizes as I presumed above (they're implementation-defined). Also, storage sizes are not necessarily the same as the sizes of intermediate results.

The C++ standard is very reluctant to define any fundamental type precisely, leaving almost everything to the implementation. Floating point types are no exception:

3.9.1 / 8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Now of course all of this is not particularly helpful in practice.

In practice, floating point is (usually) IEEE 754 compliant, with float having a width of 32 bits and double having a width of 64 bits (as stored in memory; registers have higher precision on some notable mainstream architectures).

This is equivalent to 24 bits and 53 bits of mantissa, respectively, or roughly 7 and 15 full decimal digits.
