C: How to get the sign, mantissa and exponent of a floating point number
Original question: http://stackoverflow.com/questions/15685181/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to get the sign, mantissa and exponent of a floating point number
Asked by MetallicPriest
I have a program running on two processors, one of which does not have floating point support. On that processor I therefore need to perform floating point calculations using fixed point, and for that purpose I will be using a floating point emulation library.
I first need to extract the signs, mantissas and exponents of floating point numbers on the processor that does support floating point. So my question is: how can I get the sign, mantissa and exponent of a single precision floating point number?
Following the format from this figure,
This is what I've done so far, but apart from the sign, neither the mantissa nor the exponent is correct. I think I'm missing something.
void getSME( int& s, int& m, int& e, float number )
{
    unsigned int* ptr = (unsigned int*)&number;
    s = *ptr >> 31;
    e = *ptr & 0x7f800000;
    e >>= 23;
    m = *ptr & 0x007fffff;
}
Accepted answer by eran
I think it is better to use a union to do the cast; it is clearer.
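#include <stdio.h>

/* Bit-field order is implementation-defined; this layout assumes a
   little-endian target with a 32-bit IEEE 754 float. */
typedef union {
    float f;
    struct {
        unsigned int mantisa : 23;  /* stored fraction bits */
        unsigned int exponent : 8;  /* biased exponent */
        unsigned int sign : 1;      /* 1 for negative values */
    } parts;
} float_cast;

int main(void) {
    float_cast d1 = { .f = 0.15625f };
    printf("sign = %x\n", (unsigned)d1.parts.sign);
    printf("exponent = %x\n", (unsigned)d1.parts.exponent);
    printf("mantisa = %x\n", (unsigned)d1.parts.mantisa);
    return 0;
}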
Example based on http://en.wikipedia.org/wiki/Single_precision
Answer by Pietro Braione
My advice is to stick to rule 0 and not redo what the standard library already does, if that is enough. Look at math.h (cmath in standard C++) and the functions frexp, frexpf and frexpl, which break a floating point value (double, float or long double) into its significand and exponent parts. To extract the sign you can use signbit, also in math.h / cmath, or copysign (C99 / C++11). Some alternatives with slightly different semantics are modf and ilogb/scalbn, available in C++11; http://en.cppreference.com/w/cpp/numeric/math/logb compares them, but I didn't find in the documentation how all these functions behave with +/-inf and NaNs. Finally, if you really want to use bitmasks (e.g., you desperately need to know the exact bits, your program may have different NaNs with different representations, and you don't trust the above functions), at least make everything platform-independent by using the macros in float.h / cfloat.
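As a rough illustration of the library route, here is a minimal sketch that splits a finite float with signbit and frexpf. The struct and function names are hypothetical and not part of the answer. Note that frexpf normalizes the significand to [0.5, 1), so its exponent is one greater than the IEEE 754 exponent, which assumes a significand in [1, 2).

```c
#include <math.h>

/* Hypothetical container for the three parts (not from the original answer). */
typedef struct {
    int   negative;     /* 1 if the sign bit is set, including for -0.0f */
    float significand;  /* magnitude in [0.5, 1), or 0 for zero */
    int   exponent;     /* power of two: |x| == significand * 2^exponent */
} float_parts;

/* Split a finite float using only <math.h>; no assumption about the bit layout. */
float_parts split_float(float x)
{
    float_parts p;
    p.negative = signbit(x) ? 1 : 0;
    p.significand = frexpf(fabsf(x), &p.exponent);
    return p;
}
```

For example, split_float(-0.15625f) yields negative = 1, significand = 0.625 and exponent = -2, since 0.15625 = 0.625 * 2^-2.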
Answer by Alexey Frunze
Find out the format of the floating point numbers used on the CPU that directly supports floating point and break it down into those parts. The most common format is IEEE-754.
Alternatively, you could obtain those parts using a few special functions (double frexp(double value, int *exp); and double ldexp(double x, int exp);) as shown in this answer.
Another option is to use %a with printf().
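The %a option can be sketched as follows. This assumes a C99 library; the helper names are hypothetical. %a prints the value as a hexadecimal significand and a power-of-two exponent, so sign, mantissa and exponent are all readable in the text, and strtof can parse the same form back. The exact digits printed are implementation-defined, so the sketch relies only on the round trip.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format x as a C99 hexadecimal floating point string. For 0.15625 a
   typical result is "0x1.4p-3", but the exact form may vary by library. */
int float_to_hex(float x, char *buf, size_t size)
{
    return snprintf(buf, size, "%a", (double)x);
}

/* Parse the string back; strtof accepts the hexadecimal form since C99. */
float hex_to_float(const char *s)
{
    return strtof(s, NULL);
}
```

Because the hexadecimal form is exact, transferring the string between the two processors loses no precision, although the receiving side then has to parse it.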
Answer by Xymostech
You're &ing the wrong bits. I think you want:
s = *ptr >> 31;
e = *ptr & 0x7f800000;
e >>= 23;
m = *ptr & 0x007fffff;
Remember, when you &, you zero out every bit that is not set in the mask. So in this case, you want to zero out the sign bit when you get the exponent, and you want to zero out the sign bit and the exponent when you get the mantissa.
Note that the masks come directly from your picture. So, the exponent mask will look like:
0 11111111 00000000000000000000000
and the mantissa mask will look like:
0 00000000 11111111111111111111111
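The two masks above, plus the sign mask, can be written as named constants. The following is a minimal sketch assuming a 32-bit IEEE 754 float; the macro and function names are hypothetical, and memcpy is swapped in for the pointer cast used in the question because it avoids aliasing problems.

```c
#include <stdint.h>
#include <string.h>

#define F32_SIGN_MASK     0x80000000u  /* 1 00000000 00000000000000000000000 */
#define F32_EXPONENT_MASK 0x7f800000u  /* 0 11111111 00000000000000000000000 */
#define F32_MANTISSA_MASK 0x007fffffu  /* 0 00000000 11111111111111111111111 */

/* Copy the float's bytes into an integer of the same size. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* 1 for negative values, 0 otherwise. */
unsigned f32_sign(float f)     { return (f32_bits(f) & F32_SIGN_MASK) >> 31; }

/* Exponent field as stored, still biased by 127. */
unsigned f32_exponent(float f) { return (f32_bits(f) & F32_EXPONENT_MASK) >> 23; }

/* 23 stored mantissa bits, without the implicit leading 1. */
unsigned f32_mantissa(float f) { return f32_bits(f) & F32_MANTISSA_MASK; }
```

For 0.15625f, whose bit pattern is 0x3E200000, these return sign 0, exponent 124 (that is, 127 - 3) and mantissa 0x200000.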
Answer by Maxim Egorushkin
On Linux, the glibc-headers package provides the header <ieee754.h> with floating point type definitions, e.g.:
union ieee754_double
  {
    double d;

    /* This is the IEEE 754 double-precision format. */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
        unsigned int negative:1;
        unsigned int exponent:11;
        /* Together these comprise the mantissa. */
        unsigned int mantissa0:20;
        unsigned int mantissa1:32;
#endif /* Big endian. */
#if __BYTE_ORDER == __LITTLE_ENDIAN
# if __FLOAT_WORD_ORDER == __BIG_ENDIAN
        unsigned int mantissa0:20;
        unsigned int exponent:11;
        unsigned int negative:1;
        unsigned int mantissa1:32;
# else
        /* Together these comprise the mantissa. */
        unsigned int mantissa1:32;
        unsigned int mantissa0:20;
        unsigned int exponent:11;
        unsigned int negative:1;
# endif
#endif /* Little endian. */
      } ieee;

    /* This format makes it easier to see if a NaN is a signalling NaN. */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
        unsigned int negative:1;
        unsigned int exponent:11;
        unsigned int quiet_nan:1;
        /* Together these comprise the mantissa. */
        unsigned int mantissa0:19;
        unsigned int mantissa1:32;
#else
# if __FLOAT_WORD_ORDER == __BIG_ENDIAN
        unsigned int mantissa0:19;
        unsigned int quiet_nan:1;
        unsigned int exponent:11;
        unsigned int negative:1;
        unsigned int mantissa1:32;
# else
        /* Together these comprise the mantissa. */
        unsigned int mantissa1:32;
        unsigned int mantissa0:19;
        unsigned int quiet_nan:1;
        unsigned int exponent:11;
        unsigned int negative:1;
# endif
#endif
      } ieee_nan;
  };

#define IEEE754_DOUBLE_BIAS 0x3ff /* Added to exponent. */
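A short sketch of how the union above might be used. It only compiles where glibc's <ieee754.h> is available, and the helper function names are hypothetical; the union members and IEEE754_DOUBLE_BIAS are the ones shown in the header extract.

```c
#include <ieee754.h>   /* glibc-specific header, not standard C */

/* 1 if the sign bit of x is set, 0 otherwise. */
int double_is_negative(double x)
{
    union ieee754_double u;
    u.d = x;
    return (int)u.ieee.negative;
}

/* Exponent with the bias removed; meaningful for normal values only. */
int double_unbiased_exponent(double x)
{
    union ieee754_double u;
    u.d = x;
    return (int)u.ieee.exponent - IEEE754_DOUBLE_BIAS;
}

/* Upper 20 bits of the 52-bit stored mantissa. */
unsigned int double_mantissa_high(double x)
{
    union ieee754_double u;
    u.d = x;
    return u.ieee.mantissa0;
}

/* Lower 32 bits of the 52-bit stored mantissa. */
unsigned int double_mantissa_low(double x)
{
    union ieee754_double u;
    u.d = x;
    return u.ieee.mantissa1;
}
```

For -0.15625 (that is, -1.25 * 2^-3) this gives negative = 1, unbiased exponent = -3, mantissa0 = 0x40000 and mantissa1 = 0.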
Answer by Maxim Egorushkin
- Don't make functions that do multiple things.
- Don't mask then shift; shift then mask.
- Don't mutate values unnecessarily because it's slow, cache-destroying and error-prone.
- Don't use magic numbers.
/* NaNs, infinities, denormals unhandled */
/* assumes sizeof(float) == 4 and uses ieee754 binary32 format */
/* assumes two's-complement machine */
/* C99 */
#include <stdint.h>

/* reinterprets the bits of an lvalue float; this cast breaks strict aliasing, memcpy is the portable alternative */
#define AS_U32(f) (*(const uint32_t*)&(f))
/* top bit of the representation: 1 for negative values and -0.0, 0 otherwise */
#define SIGN(f) ((int)(AS_U32(f) >> 31))
#define FLOAT_EXPONENT_WIDTH 8
#define FLOAT_MANTISSA_WIDTH 23
#define FLOAT_BIAS ((1<<(FLOAT_EXPONENT_WIDTH-1))-1) /* 2^(e-1)-1 */
#define MASK(width) ((1<<(width))-1) /* 2^w - 1 */
#define FLOAT_IMPLICIT_MANTISSA_BIT (1<<FLOAT_MANTISSA_WIDTH)

/* correct exponent with bias removed */
int float_exponent(float f) {
    return (int)((AS_U32(f) >> FLOAT_MANTISSA_WIDTH) & MASK(FLOAT_EXPONENT_WIDTH)) - FLOAT_BIAS;
}

/* of non-zero, normal floats only */
int float_mantissa(float f) {
    return (int)(AS_U32(f) & MASK(FLOAT_MANTISSA_WIDTH)) | FLOAT_IMPLICIT_MANTISSA_BIT;
}

/* Hacker's Delight book is your friend. */
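One way to sanity-check a decoder like this is to rebuild the float from the decoded fields and compare it with the input. The sketch below is not from the answer: it uses hypothetical helper names, memcpy instead of the pointer cast, and ldexpf from math.h, and like the code above it assumes binary32 and handles only normal, non-zero values. The value is (-1)^sign * significand * 2^(exponent - 23), where the significand is the 24-bit integer that includes the implicit bit.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Raw bits of a float, copied without aliasing issues. */
static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Shift first, then mask, as the answer recommends. */
int decode_sign(float f)        { return (int)(bits_of(f) >> 31); }
int decode_exponent(float f)    { return (int)((bits_of(f) >> 23) & 0xFFu) - 127; }
int decode_significand(float f) { return (int)(bits_of(f) & 0x7FFFFFu) | (1 << 23); }

/* Rebuild a normal float from its parts. The significand is below 2^24,
   so converting it to float is exact. */
float float_from_parts(int sign, int exponent, int significand)
{
    float magnitude = ldexpf((float)significand, exponent - 23);
    return sign ? -magnitude : magnitude;
}
```

For any normal float x, float_from_parts(decode_sign(x), decode_exponent(x), decode_significand(x)) should compare equal to x.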
Answer by AymenTM
See this IEEE_754_types.h header for the union types used to extract the parts of float, double and long double (endianness handled). Here is an extract:
/*
** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
** Single Precision (float) -- Standard IEEE 754 Floating-point Specification
*/
# define IEEE_754_FLOAT_MANTISSA_BITS (23)
# define IEEE_754_FLOAT_EXPONENT_BITS (8)
# define IEEE_754_FLOAT_SIGN_BITS (1)
.
.
.
# if (IS_BIG_ENDIAN == 1)
    typedef union {
        float value;
        struct {
            __int8_t sign : IEEE_754_FLOAT_SIGN_BITS;
            __int8_t exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            __uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
        };
    } IEEE_754_float;
# else
    typedef union {
        float value;
        struct {
            __uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
            __int8_t exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            __int8_t sign : IEEE_754_FLOAT_SIGN_BITS;
        };
    } IEEE_754_float;
# endif
And see dtoa_base.c for a demonstration of how to convert a double value to string form.
Furthermore, check out section 1.2.1.1.4.2 - Floating-Point Type Memory Layout of the C/CPP Reference Book. It explains very well, and in simple terms, the memory representation/layout of all the floating-point types and how to decode them (with illustrations), following the actual IEEE 754 Floating-Point specification.
It also has links to really good resources that go even deeper.
Answer by Gavin H
Cast a pointer to the floating point variable to something like a pointer to unsigned int. Then you can shift and mask the bits to get each component.
float foo;
unsigned int ival, mantissa, exponent, sign;
foo = -21.4f;
ival = *((unsigned int *)&foo);
mantissa = ( ival & 0x7FFFFF);
ival = ival >> 23;
exponent = ( ival & 0xFF );
ival = ival >> 8;
sign = ( ival & 0x01 );
Obviously you probably wouldn't use unsigned ints for the exponent and sign bits, but this should at least give you the idea.
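The snippet above yields the raw fields; how to interpret them depends on the exponent field, since in IEEE 754 binary32 the values 0 and 255 are reserved for zero/subnormals and infinity/NaN. A minimal sketch of that distinction follows, with hypothetical names and memcpy swapped in for the pointer cast:

```c
#include <stdint.h>
#include <string.h>

typedef enum { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INFINITE, F32_NAN } f32_class;

/* Build a float from a raw 32-bit pattern (handy for testing special values). */
float f32_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classify a binary32 value from its exponent and mantissa fields. */
f32_class f32_classify(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t exponent = (bits >> 23) & 0xFFu;  /* biased exponent field */
    uint32_t mantissa = bits & 0x7FFFFFu;      /* 23 stored fraction bits */

    if (exponent == 0)
        return mantissa == 0 ? F32_ZERO : F32_SUBNORMAL;
    if (exponent == 0xFF)
        return mantissa == 0 ? F32_INFINITE : F32_NAN;
    return F32_NORMAL;
}
```

Only values classified as F32_NORMAL have the implicit leading 1 in front of the stored mantissa; a conversion for the emulation library would need to treat the other classes separately.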

