Notice: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3845834/

Time: 2020-10-30 03:39:19  Source: igfitidea

Understanding Java bytes

Tags: java, binary, byte

Asked by Brian Warshaw

So at work yesterday, I had to write an application to count the pages in an AFP file. So I dusted off my MO:DCA spec PDF and found the structured field BPG (Begin Page) and its 3-byte identifier. The app needs to run on an AIX box, so I decided to write it in Java.

For maximum efficiency, I decided that I would read the first 6 bytes of each structured field and then skip the remaining bytes in the field. This would get me:

0: Start of field byte
1-2: 2-byte length of field
3-5: 3-byte sequence identifying the type of field

So I check the field type and increment a page counter if it's BPG, and I don't if it's not. Then I skip the remaining bytes in the field rather than read through them. And here, in the skipping (and really in the field length) is where I discovered that Java uses signed bytes.
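The read-six-bytes-then-skip loop described above could look roughly like the sketch below. Several things here are assumptions, not taken from the question: the class and method names are hypothetical, the BPG identifier is taken to be the bytes D3 A8 AF, the start-of-field byte is taken to be 0x5A, and the 2-byte length is assumed to count everything after the start byte (so 5 of those bytes have already been consumed once the identifier is read). Check all of these against the MO:DCA spec before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class AfpPageCounter {
    // Assumed 3-byte identifier for BPG (Begin Page); verify against the spec.
    static final int BPG = 0xD3A8AF;

    static int countPages(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] rest = new byte[5];
        int pages = 0;
        while (data.read() >= 0) {          // byte 0: start of field (value not checked here)
            data.readFully(rest);           // bytes 1-2: length, bytes 3-5: field type
            // Mask each byte with 0xFF so values of 128 and above are not sign-extended.
            int length = ((rest[0] & 0xFF) << 8) | (rest[1] & 0xFF);
            int type = ((rest[2] & 0xFF) << 16) | ((rest[3] & 0xFF) << 8) | (rest[4] & 0xFF);
            if (type == BPG) {
                pages++;
            }
            // Assumption: length covers the length bytes onward, 5 of which are already read.
            int remaining = length - 5;
            if (remaining < 0) {
                throw new IOException("field length too small: " + length);
            }
            while (remaining > 0) {
                int skipped = data.skipBytes(remaining);
                if (skipped == 0) {         // skipBytes may skip nothing; probe for end of stream
                    if (data.read() < 0) {
                        throw new EOFException("truncated field");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
        }
        return pages;
    }

    // Convenience wrapper for in-memory data.
    static int countPages(byte[] bytes) {
        try {
            return countPages(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a synthetic field for testing: start byte, length, type, then zero padding.
    static byte[] field(int type, int restLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int length = 5 + restLength;
        out.write(0x5A);                    // assumed start-of-field byte
        out.write(length >> 8);
        out.write(length);
        out.write(type >> 16);
        out.write(type >> 8);
        out.write(type);
        out.write(new byte[restLength], 0, restLength);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        // 0xD3A9AF stands in for "some other field type" in this synthetic sample.
        for (byte[] f : new byte[][] { field(BPG, 3), field(0xD3A9AF, 200), field(BPG, 131) }) {
            sample.write(f, 0, f.length);
        }
        System.out.println(countPages(sample.toByteArray())); // prints 2
    }
}
```

The second and third sample fields have lengths above 127, so their low length byte is negative as a Java byte; without the `& 0xFF` masks the skip count would come out wrong.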

I did some googling and found quite a bit of useful information. Most useful, of course, was the instruction to do a bitwise & with 0xff to get the unsigned int value. This was necessary for me to get a length that could be used in the calculation for the number of bytes to skip.

I now know that at 128, we start counting backwards from -128. What I want to know is how the bitwise operation works here--more specifically, how I arrive at the binary representation for a negative number.

If I understand the bitwise & properly, your result is equal to a number where only the common bits of your two numbers are set. So assuming byte b = -128, we would have:

b & 0xff // 128

1000 0000   -128
1111 1111    255
---------
1000 0000    128

So how would I arrive at 1000 0000 for -128? How would I get the binary representation of something less obvious like -72 or -64?

Answered by Grodriguez

In order to obtain the binary representation of a negative number you calculate two's complement:

  • Get the binary representation of the positive number
  • Invert all the bits
  • Add one

Let's do -72 as an example:

0100 1000    72
1011 0111    All bits inverted
1011 1000    Add one

So the binary (8-bit) representation of -72 is 10111000.
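The three steps can be tried out in Java with a small sketch like the one below. The class and method names are made up for illustration; the only library calls are `Integer.toBinaryString` and `String.format`.

```java
class TwosComplement {
    // Formats the low 8 bits of v as a zero-padded binary string.
    static String toBits(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Applies the recipe to a positive value in 1..128:
    // take its bits, invert them, add one, keep 8 bits.
    static String negativeBits(int positive) {
        int inverted = ~positive & 0xFF;      // invert all the bits (low 8 only)
        int result = (inverted + 1) & 0xFF;   // add one
        return toBits(result);
    }

    public static void main(String[] args) {
        System.out.println(negativeBits(72));   // prints 10111000
        System.out.println(negativeBits(64));   // prints 11000000
        System.out.println(negativeBits(128));  // prints 10000000
        // Cross-check against the bits Java actually stores for the byte -72.
        System.out.println(toBits((byte) -72)); // prints 10111000
    }
}
```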

What is actually happening to you is the following: your file has a byte with value 10111000. When interpreted as an unsigned byte (which is probably what you want), this is 184.

In Java, when this byte is used as an int (for example because it is assigned to an int variable, or because of implicit promotion in an expression), it will be interpreted as a signed byte, and sign-extended to 11111111 11111111 11111111 10111000. This is an integer with value -72.

By ANDing with 0xff you retain only the lowest 8 bits, so your integer is now 00000000 00000000 00000000 10111000, which is 184.
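A short sketch that prints both 32-bit patterns for the byte 10111000 (0xB8) may make the sign extension and the masking easier to see. The class name and the `bits32` helper are illustrative only.

```java
class SignExtensionDemo {
    // Formats v as a zero-padded 32-character binary string.
    static String bits32(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB8;   // bit pattern 10111000
        int promoted = b;       // implicit promotion: sign-extended
        int masked = b & 0xFF;  // upper 24 bits cleared

        System.out.println(promoted + " " + bits32(promoted));
        // prints -72 11111111111111111111111110111000
        System.out.println(masked + " " + bits32(masked));
        // prints 184 00000000000000000000000010111000
    }
}
```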

Answered by sepp2k

What I want to know is how the bitwise operation works here--more specifically, how I arrive at the binary representation for a negative number.

The binary representation of a negative number is that of the corresponding positive number bit-flipped with 1 added to it. This representation is called two's complement.

Answered by DarkDust

I guess the magic here is that the byte is stored in a bigger container, likely a 32-bit int. If the byte is interpreted as a signed byte, it gets expanded to represent the same number in the 32-bit int: if the most significant bit (the first one) of the byte is a 1, then all the bits to the left of that 1 in the 32-bit int are also set to 1 (that's due to the way negative numbers are represented, two's complement).

Now, if you & 0xFF that int, you cut off those 1's and end up with a "positive" int representing the byte value you've read.

Answered by Durandal

Not sure what you really want :) I assume you are asking how to extract a signed multi-byte value? First, look at what happens when you sign extend a single byte:

byte[] b = new byte[] { -128 };
int i = b[0];
System.out.println(i); // prints -128!

So, the sign is correctly extended to 32 bits without doing anything special. The byte 1000 0000 extends correctly to 1111 1111 1111 1111 1111 1111 1000 0000. You already know how to suppress sign extension by ANDing with 0xFF. For multi-byte values, you want only the sign of the most significant byte to be extended, and you want to treat the less significant bytes as unsigned (the example assumes network byte order and a 16-bit int value):

byte[] b = new byte[] { -128, 1 }; // 0x80, 0x01
int i = (b[0] << 8) | (b[1] & 0xFF);
System.out.println(i); // prints -32767!
System.out.println(Integer.toHexString(i)); // prints ffff8001

You need to suppress the sign extension of every byte except the most significant one, so to extract a signed 32-bit int to a 64-bit long:

byte[] b = new byte[] { -54, -2, -70, -66 }; // 0xca, 0xfe, 0xba, 0xbe
long l = ( b[0]         << 24) |
         ((b[1] & 0xFF) << 16) |
         ((b[2] & 0xFF) <<  8) |
         ((b[3] & 0xFF)      );
System.out.println(l); // prints -889275714
System.out.println(Long.toHexString(l)); // prints ffffffffcafebabe

Note: on Intel-based systems, bytes are often stored in reverse order (least significant byte first) because the x86 architecture stores larger entities in memory in this order. A lot of software that originated on x86 uses this order in file formats, too.
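If the byte order of a file format is known, `java.nio.ByteBuffer` can do the assembly for either order instead of hand-written shifts. This is an alternative to the manual approach above, not part of the original answer; the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Reads the first four bytes as a signed 32-bit int in the given byte order.
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] b = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };

        // Most significant byte first (network byte order).
        int big = readInt(b, ByteOrder.BIG_ENDIAN);
        System.out.println(Integer.toHexString(big));    // prints cafebabe

        // Least significant byte first (as x86 stores it in memory).
        int little = readInt(b, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Integer.toHexString(little)); // prints bebafeca
    }
}
```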

Answered by starblue

For bytes with bit 7 set:

unsigned_value = signed_value + 256

Mathematically when you compute with bytes you compute modulo 256. The difference between signed and unsigned is that you choose different representatives for the equivalence classes, while the underlying representation as a bit pattern stays the same for each equivalence class. This also explains why addition, subtraction and multiplication have the same result as a bit pattern, regardless of whether you compute with signed or unsigned integers.
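The modulo-256 claim can be checked by brute force over all pairs of byte values. The sketch below is only an illustration with made-up names: it compares the bit pattern of the signed result (cast back to byte) with the unsigned result reduced modulo 256.

```java
class Mod256Demo {
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // True if +, - and * give the same 8-bit pattern for signed and unsigned views.
    static boolean samePattern(byte x, byte y) {
        int ux = unsigned(x);
        int uy = unsigned(y);
        boolean add = unsigned((byte) (x + y)) == (ux + uy) % 256;
        // floorMod keeps the result in 0..255 even when ux - uy is negative.
        boolean sub = unsigned((byte) (x - y)) == Math.floorMod(ux - uy, 256);
        boolean mul = unsigned((byte) (x * y)) == (ux * uy) % 256;
        return add && sub && mul;
    }

    static boolean allPairsAgree() {
        for (int x = Byte.MIN_VALUE; x <= Byte.MAX_VALUE; x++) {
            for (int y = Byte.MIN_VALUE; y <= Byte.MAX_VALUE; y++) {
                if (!samePattern((byte) x, (byte) y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte x = -72;                                 // unsigned view: 184
        byte y = 100;
        System.out.println((byte) (x + y));           // prints 28
        System.out.println((unsigned(x) + 100) % 256); // prints 28
        System.out.println(allPairsAgree());          // prints true
    }
}
```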

Answered by Peter Lawrey

To get the unsigned byte value you can use either

int u = b & 0xFF;

or

int u = b < 0 ? b + 256 : b;
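On Java 8 and later, `Byte.toUnsignedInt(b)` does the same conversion. It is not mentioned in the original answer. The sketch below (hypothetical class name) checks that all three forms agree for every possible byte value.

```java
class UnsignedByteCheck {
    // Compares the two expressions above with Byte.toUnsignedInt for all 256 byte values.
    static boolean allAgree() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            int masked = b & 0xFF;
            int adjusted = b < 0 ? b + 256 : b;
            int library = Byte.toUnsignedInt(b);   // available since Java 8
            if (masked != adjusted || masked != library) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree());                      // prints true
        System.out.println(Byte.toUnsignedInt((byte) -128)); // prints 128
    }
}
```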