C language: 'float' vs. 'double' precision

Question

Asked by foo

The code

float x  = 3.141592653589793238;
double z = 3.141592653589793238;
printf("x=%f\n", x);
printf("z=%f\n", z);
printf("x=%20.18f\n", x);
printf("z=%20.18f\n", z);

will give you the output

x=3.141593
z=3.141593
x=3.141592741012573242
z=3.141592653589793116

where on the third line of output 741012573242 is garbage and on the fourth line 116 is garbage. Do doubles always have 16 significant figures while floats always have 7 significant figures? Why don't doubles have 14 significant figures?

Answer 1

Accepted answer by Alan Geleynse

Floating point numbers in C use IEEE 754 encoding.

This type of encoding uses a sign, a significand, and an exponent.

Because of this encoding, many numbers will have small changes to allow them to be stored.
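
As a rough illustration of those "small changes", the sketch below checks whether a value survives being stored in a float. The helper name `fits_in_float` is made up for this example, and the check assumes the usual IEEE 754 float and double formats.

```c
#include <stdbool.h>

/* Returns true when `value` comes back unchanged after being stored in a
   float, i.e. when a float can hold it without rounding.
   (Hypothetical helper, not part of any library.) */
bool fits_in_float(double value)
{
    /* volatile forces a real float store, so no extra precision is kept */
    volatile float narrowed = (float)value;
    return (double)narrowed == value;
}
```

Values such as 0.5 or 3.0 are sums of a few powers of two and pass the check. Values such as 0.1 or the pi literal from the question do not, which is why the stored float differs from what was typed.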

Also, the number of significant digits can change slightly since it is a binary representation, not a decimal one.

Single precision (float) gives you 23 bits of significand, 8 bits of exponent, and 1 sign bit.

Double precision (double) gives you 52 bits of significand, 11 bits of exponent, and 1 sign bit.
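
The single-precision layout described above can be inspected directly. This is a minimal sketch that assumes float is a 32-bit IEEE 754 binary32 value on the target platform; the struct and function names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision value. */
struct float_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 explicitly stored significand bits */
};

/* Splits a float into its raw fields (assumes sizeof(float) == 4). */
struct float_fields split_float(float value)
{
    uint32_t bits;
    struct float_fields f;

    memcpy(&bits, &value, sizeof bits);  /* copy the bytes, no aliasing tricks */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 1.0f this yields sign 0, exponent 127 (the bias) and fraction 0. The same approach works for double with a 64-bit integer, an 11-bit exponent mask and a 52-bit fraction mask.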

Answer 2

Answer by Stephen Canon

Do doubles always have 16 significant figures while floats always have 7 significant figures?

No. Doubles always have 53 significant bits and floats always have 24 significant bits (except for denormals, infinities, and NaN values, but those are subjects for a different question). These are binary formats, and you can only speak clearly about the precision of their representations in terms of binary digits (bits).
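
One way to see the 24-bit and 53-bit limits is to look for the first integer that no longer fits. The following is a sketch under the assumption of IEEE 754 float and double; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the integer n can be stored in a float without rounding.
   Intended for magnitudes well below 2^63. */
bool float_holds_integer(int64_t n)
{
    volatile float f = (float)n;   /* volatile: force rounding to float */
    return (int64_t)f == n;
}

/* Same check for double. */
bool double_holds_integer(int64_t n)
{
    volatile double d = (double)n;
    return (int64_t)d == n;
}
```

With 24 significant bits, every integer up to 2^24 = 16777216 is exact in a float, but 16777217 is not. With 53 bits, the corresponding limit for double is 2^53 = 9007199254740992.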

This is analogous to the question of how many digits can be stored in a binary integer: an unsigned 32 bit integer can store integers with up to 32 bits, which doesn't precisely map to any number of decimal digits: all integers of up to 9 decimal digits can be stored, but a lot of 10-digit numbers can be stored as well.

Why don't doubles have 14 significant figures?

The encoding of a double uses 64 bits (1 bit for the sign, 11 bits for the exponent, 52 explicit significant bits and one implicit bit), which is double the number of bits used to represent a float (32 bits).

Answer 3

Answer by abe312

float : 23 bits of significand, 8 bits of exponent, and 1 sign bit.

double : 52 bits of significand, 11 bits of exponent, and 1 sign bit.

Answer 4

Answer by user470379

It's usually based on significant figures of both the exponent and significand in base 2, not base 10. From what I can tell in the C99 standard, however, there is no specified precision for floats and doubles (other than the fact that 1 and 1 + 1E-5 / 1 + 1E-9 are distinguishable [float and double respectively]). Beyond that, the number of significant figures is left to the implementer (as well as which base they use internally, so in other words, an implementation could decide to make it based on 18 digits of precision in base 3). [1]

If you need to know these values, the constants FLT_RADIX and FLT_MANT_DIG (and DBL_MANT_DIG / LDBL_MANT_DIG) are defined in float.h.
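
A small sketch of how these constants might be used follows. `print_float_limits` and `guaranteed_decimal_digits` are names invented here. The digit formula is the one C99 gives for FLT_DIG and DBL_DIG when the radix is 2, with log10(2) approximated as 30103/100000 so that no math library is needed.

```c
#include <float.h>
#include <stdio.h>

/* Prints the radix and significand sizes the implementation reports. */
void print_float_limits(void)
{
    printf("FLT_RADIX    = %d\n", FLT_RADIX);
    printf("FLT_MANT_DIG = %d, FLT_DIG = %d\n", FLT_MANT_DIG, FLT_DIG);
    printf("DBL_MANT_DIG = %d, DBL_DIG = %d\n", DBL_MANT_DIG, DBL_DIG);
}

/* Decimal digits that always survive conversion to a radix-2 format with
   `mant_bits` significant bits and back: floor((p - 1) * log10(2)).
   The integer approximation of log10(2) is an assumption of this sketch
   and is only meant for realistic significand sizes. */
int guaranteed_decimal_digits(int mant_bits)
{
    return (mant_bits - 1) * 30103 / 100000;
}
```

On an IEEE 754 platform, 24 significant bits give 6 guaranteed decimal digits and 53 bits give 15. That is why a float is usually quoted as "6 to 7" digits and a double as "15 to 16" rather than an exact decimal count.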

The reason it's called a double is because the number of bytes used to store it is double the number of a float (but this includes both the exponent and significand). The IEEE 754 standard (used by most compilers) allocates relatively more bits for the significand than the exponent (23 to 9 for float vs. 52 to 12 for double, counting the sign bit with the exponent), which is why the precision is more than doubled.

1: Section 5.2.4.2.2 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)

Answer 5

Answer by Chris Nash

A float has 23 explicitly stored bits of precision, and a double has 52 (24 and 53 once the implicit leading bit is counted).

Answer 6

Answer by user541686

It's not exactly double precision because of how IEEE 754 works, and because binary doesn't really translate well to decimal. Take a look at the standard if you're interested.

Answer 7

Answer by Vineeth Krishna K

float stands for floating point number. In C, the float data type is used in cases where about 7 significant digits of precision are enough. For example, the decimal number 12.3546987 cannot be stored exactly in a float because it has a total of 9 digits. The output will be shown as 12.354699, i.e. the first 7 digits will be shown as entered in the input and the 8th digit will be rounded off. The float type can represent values ranging from approximately 1.5 x 10^(-45) to 3.4 x 10^(38). In terms of memory allocation, float is a single-precision, 32-bit floating point data type.
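
The 12.3546987 example can be reproduced with a short sketch like the one below. The function name `format_via_float` is hypothetical, and the results quoted afterwards assume IEEE 754 single precision.

```c
#include <stdio.h>

/* Stores `value` in a float, then formats it with `decimals` digits after
   the decimal point. Returns what snprintf returns. */
int format_via_float(double value, int decimals, char *buf, size_t size)
{
    volatile float narrowed = (float)value;  /* precision is lost here */
    return snprintf(buf, size, "%.*f", decimals, (double)narrowed);
}
```

Formatting 12.3546987 this way with 6 decimals gives 12.354699, which looks like plain rounding. Asking for 7 decimals gives 12.3546991, whereas the same value kept in a double would print as 12.3546987, so the difference comes from the float storage and not from printf.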

Unlike float, double has a precision of 15 to 16 digits. The range of double is 5.0 × 10^(-324) to 1.7 × 10^(308). In terms of byte allocation, double is a 64-bit floating point data type.

The difference shows up in use. Whether a value is a float or a double does not affect printf, but with scanf the appropriate data type must be used, depending on the total number of digits in the floating point number that is to be read from the input.
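
To make the scanf point concrete, here is a minimal sketch that reads the same text into both types. `read_both` is an illustrative name, and sscanf stands in for scanf so the input can be a string. The conversion specifier has to match the pointer type: %f for float and %lf for double.

```c
#include <stdio.h>

/* Parses `text` once as a float (%f) and once as a double (%lf).
   Returns the number of successful conversions (2 when both succeed). */
int read_both(const char *text, float *as_float, double *as_double)
{
    int converted = 0;
    converted += sscanf(text, "%f", as_float) == 1;
    converted += sscanf(text, "%lf", as_double) == 1;
    return converted;
}
```

Reading "3.141592653589793238" this way leaves the float holding a less precise value than the double, matching the x and z outputs in the question.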

Hence double is preferred over float for higher accuracy of data.

Hope this helps.
