Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1546381/

Time: 2020-10-29 17:02:07  Source: igfitidea

How are bits stored in memory? (In chunks? Can there be bits of multiple sizes stored together?)

Tags: java, memory, binary, bit, computer-architecture

Asked by Cobalt

I used to think that each memory location contains 8, 16, 32 or 64 bits. So 0101 would be stored in an 8 bit machine as 00000101 (sign extended if it was negative). This was all fine and dandy until I wrote a program in java out of curiosity to find out some more inner workings of this system.

The method in question looks like this:

// Requires java.io.File and java.io.FileInputStream.
public void printBinaryRep(File f) {
    try {
        FileInputStream inputStream = new FileInputStream(f);
        int next = 0;
        // read() returns the next byte as an int (0-255), or -1 at end of file
        while ((next = inputStream.read()) != -1) {
            System.out.println((char) next + " : " + Integer.toBinaryString(next));
        }
        inputStream.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}

I got this output from a file that says Hello World

H : 1001000
e : 1100101
l : 1101100
l : 1101100
o : 1101111
  : 100000
W : 1010111
o : 1101111
r : 1110010
l : 1101100
d : 1100100

All of it looks fine except for the space. It has 6 bits instead of 8. I'm now wondering how all of that information is stored in memory. If all of it was stored in 8 bit chunks, like

Hello: 10010001100101110110011011001101111

Then you can simply look at each 8 bit chunk and figure out what number it's representing (and then what ASCII code it's referring to). How does it work when a different sized character (like the 6 bit space and the 4 bit \n) is stored along with them? And wouldn't storing a small number in a large bit space waste a lot of bits?

I think I have some of the fundamental understanding wrong (or maybe the program's wrong somewhere...). Sorry if the question sounds strange or unnecessarily in-depth. I just want to know. I've done some googling, but it didn't come up with anything relevant. If you can let me know where I've gone wrong or point me in the right direction, I'd greatly appreciate it. Thanks!

Answered by John Millikin

You'll be better off experimenting in C and/or assembly, rather than Java. Those languages are lower-level and expose the address space directly.

I used to think that each memory location contains 8, 16, 32 or 64 bits. So 0101 would be stored in an 8 bit machine as 00000101 (sign extended if it was negative). This was all fine and dandy until I wrote a program in java out of curiosity to find out some more inner workings of this system.

All memory locations in x86 systems contain 8 bits (1 byte). If a value contains more data than can fit into a single byte, it is stored using multiple bytes. For example, in C, the "float" type is stored using 4 bytes (32 bits).

All of it looks fine except for the space. It has 6 bits instead of 8. I'm now wondering how all of that information is stored in memory. If all of it was stored in 8 bit chunks, like

The space is also stored in a single byte. Your print code is forgetting to pad the output to 8 digits: 100000 == 00100000 == 0x20.

Answered by Stephen Canon

The space has 8 bits too. It's just that Integer.toBinaryString doesn't print leading 0 bits the way you used it.

With all the leading 0 bits, it actually looks like this in memory:

H : 01001000
e : 01100101
l : 01101100
l : 01101100
o : 01101111
  : 00100000
W : 01010111
o : 01101111
r : 01110010
l : 01101100
d : 01100100
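
One way to get that padded output from Java is sketched below. This is only an illustration: the class name PaddedBinary and the helper toBinary8 are made up for this example, and the hard-coded "Hello World" string stands in for the file contents.

```java
class PaddedBinary {
    // Returns the low 8 bits of value as exactly eight binary digits.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        // Right-align in a field of width 8, then turn the padding into zeros.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (char c : "Hello World".toCharArray()) {
            System.out.println(c + " : " + toBinary8(c));
        }
    }
}
```

The mask with 0xFF keeps the result at eight digits even if the value was read into a negative byte.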

Answered by JSB????

Your original intuition was (mostly) correct: all memory locations consist of the same number of bits. On all modern machines, there are eight bits in a "byte", where a byte is the smallest chunk of memory that the machine can access individually.

Look closely at your output. Every line has seven digits except the space, which has six. The space just happens to begin with two zeroes in its 8-bit binary representation, while the other letters begin with one zero.

Answered by JCasso

Actually your approach is wrong. Encoding is very important here.

If you use ASCII then you can easily say that each character is stored in a byte (eight bits) but when encoding changes you cannot say that.

E.g., UTF-8 uses one to four bytes (8 to 32 bits) for each character in a string. That is why Java readers such as InputStreamReader have an overload that lets you specify the encoding.

Choosing the wrong encoding will produce the wrong string output, so you have to know the file's encoding to understand which bits mean what. Note that FileInputStream itself only hands you raw bytes; turning those bytes into characters is the job of a reader such as InputStreamReader.
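
To make the variable width concrete, here is a small sketch that counts how many bytes UTF-8 needs for a few sample characters. The class name Utf8Lengths and the helper utf8Length are invented for this illustration, and the sample characters are arbitrary.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source file's own encoding does not matter.
        String[] samples = {
            "A",            // U+0041, plain ASCII letter
            "\u00e9",       // U+00E9, e with acute accent
            "\u20ac",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, an emoji (a surrogate pair in Java)
        };
        for (String s : samples) {
            System.out.println(s + " : " + utf8Length(s) + " byte(s)");
        }
    }
}
```

The same text can therefore occupy a different number of bytes depending on which characters it contains, which is why the encoding has to be known before the bits can be interpreted.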

If you store a digit as a string, it takes up one character's worth of space on the hard drive, just like any other character.

However if you store 123456789 as string with ASCII encoding it will take 9*8 bits = 72 bits.

If you store it as an integer (note that an integer's width differs between environments), it will typically take only 32 bits. The value 123456789 needs 27 bits, so it does not fit in a 16-bit integer.
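
The size difference between the text form and the integer form can be checked with a short sketch. The class and method names here (TextVersusInt, asciiBits, intBits, minimumBits) are hypothetical, and the sketch assumes Java's fixed 32-bit int.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class TextVersusInt {
    // Bits used when the digits are stored as ASCII text, one byte per digit.
    static int asciiBits(String digits) {
        return digits.getBytes(StandardCharsets.US_ASCII).length * 8;
    }

    // Bits used when the value is stored as a Java int.
    static int intBits(int value) {
        byte[] raw = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        return raw.length * 8;
    }

    // Smallest number of bits that can hold a positive value.
    static int minimumBits(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        System.out.println("as text: " + asciiBits("123456789") + " bits");
        System.out.println("as int:  " + intBits(123456789) + " bits");
        System.out.println("minimum: " + minimumBits(123456789) + " bits");
    }
}
```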

Also you cannot be sure that

H : 01001000
e : 01100101
l : 01101100
l : 01101100
o : 01101111
  : 00100000
W : 01010111
o : 01101111
r : 01110010
l : 01101100
d : 01100100
\n: 00001010

is stored on the hard drive as one contiguous run of bytes in that order.

You cannot be sure of that. A file system is not that simple. Maybe Hello is stored contiguously but the World string sits at the end of the drive. That's why there is a defrag command.

But if we talk about main memory (RAM), when you define a string I expect its bytes to be contiguous. At least in C they are. You define a string like this:

char value[100]; // value is a char array (there is no string type in C)

Here value[0] is the first character of our string, and value itself only refers to the char array's location in memory.

If value[0]'s address is 10, then value[1]'s address is 10 + 1 = 11, because a char occupies one byte and addresses count bytes, not bits.

Answered by Artelius

The way computers store numbers can be compared to an odometer in a car. If the odometer has 4 digits, it stores the number 33 as "0033".

If someone asks you what your mileage is, you aren't going to say "zero thousand zero hundred and thirty three". By default, Java doesn't either. (Although you can tell it to.)

Then wouldn't storing a small number in a large bit space waste a lot of bits?

Well, not really. Suppose you had 11000100 in memory somewhere. How is the computer supposed to know whether this means 11000100, or 11000 followed by 100, or 1 followed by 1000 followed by 100, and so on?

Well, actually the computer is just following the program it is given (remember that a Java program is created partly by you and partly by the people who design Java). If you can create a viable system for saving bits, you can make the computer do it.

However, keep in mind that there's a tradeoff in terms of processor usage and programming difficulty. Since a typical computer can work with bytes much more quickly than it can with, say, 7-bit or variable-bit numbers, storing ASCII codes in bytes is a very common choice for storing text.
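
With a fixed width of eight bits per character, the ambiguity described above disappears, because the reader always knows where one value ends and the next begins. The sketch below illustrates this; the class name FixedWidthDecode and the method decode are made up, and the bit string is written as text only for readability.

```java
class FixedWidthDecode {
    // Splits a string of '0'/'1' characters into 8-bit chunks and
    // interprets each chunk as one ASCII character.
    static String decode(String bits) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i + 8 <= bits.length(); i += 8) {
            int code = Integer.parseInt(bits.substring(i, i + 8), 2);
            text.append((char) code);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String bits = "01001000"   // H
                    + "01100101"   // e
                    + "01101100"   // l
                    + "01101100"   // l
                    + "01101111";  // o
        System.out.println(decode(bits));
    }
}
```

If the leading zeros were dropped, as in the output from the question, the same chunking would no longer line up with the character boundaries.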

But let me return to your question.

Then wouldn't storing a small number in a large bit space waste a lot of bits?

Mathematically speaking, no. A branch of mathematics called Information Theory tells us that the number of bits which are absolutely necessary depends on the possibilities you want to encode and how likely each of them is.

Let's suppose you have only a four letter alphabet (A, B, C, D), and use two-bit numbers (00, 01, 10, 11 respectively) to represent it. If each of these letters is equally likely, then the minimum number of bits required per letter (on average) is 2. In other words, there are no wasted bits even though A is 00 and B is 01.

On the other hand, if you use ASCII and encode A, B, C, D as the following 7-bit numbers:

A: 1000001
B: 1000010
C: 1000011
D: 1000100

then you are "wasting" 5 bits per letter (even though you're not "storing small numbers in a large bit space").
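
The "minimum number of bits" in this argument is the Shannon entropy of the letter distribution. A rough sketch of the calculation follows; the class name Entropy and the method entropyBits are invented for this example, and the skewed distribution is an arbitrary illustration that is not taken from the answer.

```java
class Entropy {
    // Shannon entropy in bits per symbol: the sum of -p * log2(p) over all symbols.
    static double entropyBits(double[] probabilities) {
        double bits = 0.0;
        for (double p : probabilities) {
            if (p > 0) {
                bits -= p * (Math.log(p) / Math.log(2));
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Four equally likely letters: about 2 bits per letter are needed.
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        // Hypothetical skewed distribution: fewer bits are needed on average.
        double[] skewed = {0.5, 0.25, 0.125, 0.125};

        System.out.println("uniform: " + entropyBits(uniform));
        System.out.println("skewed:  " + entropyBits(skewed));
        System.out.println("7-bit ASCII overhead vs uniform: " + (7 - entropyBits(uniform)));
    }
}
```

When some letters are more likely than others, the average drops below 2 bits, which is the slack that compression algorithms exploit.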

These sorts of considerations are important when designing compression algorithms, and not so important for everyday applications. It's certainly important to understand bits and bytes if you wish to learn C.

Answered by dcrosta

According to the Java 4 API,

The unsigned integer value is the argument plus 2^32 if the argument is negative; otherwise it is equal to the argument. This value is converted to a string of ASCII digits in binary (base 2) with no extra leading 0s.
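
The rule quoted from the API documentation can be tried out directly. In the sketch below, the class name UnsignedView and the helper unsignedValue are hypothetical; only Integer.toBinaryString and Long.toBinaryString are real library calls.

```java
class UnsignedView {
    // The "unsigned integer value" from the documentation:
    // the argument plus 2^32 if it is negative, otherwise the argument itself.
    static long unsignedValue(int argument) {
        return argument < 0 ? argument + (1L << 32) : argument;
    }

    public static void main(String[] args) {
        // A small positive value: leading zeros are simply not printed.
        System.out.println(Integer.toBinaryString(32));
        // A negative value: all 32 bits show up because the top bit is 1.
        System.out.println(Integer.toBinaryString(-1));
        System.out.println(unsignedValue(-1));
        // Both views describe the same bit pattern.
        System.out.println(Long.toBinaryString(unsignedValue(-1)));
    }
}
```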

In reality, data storage is much more complicated. For efficient processing, most data types are stored at word boundaries, which means 4 bytes on 32-bit machines, or 8 bytes on 64-bit machines. Arrays may be packed more closely, so that char[4] may end up using the same amount of "actual space" as a single char.

Java runs on a virtual machine, and I'm not certain what memory architecture, if any, it uses.

Answered by Cobalt

That clears it up. My main problem was that I was overlooking the zeroes at the beginning. I was experimenting with this as I was reading more about compression algorithms (namely, gzip), and I was assuming ASCII for all of this. Seeing the representation was not the goal of the program, but the different number of bits per word threw me off from the original goal of implementing a basic, index-based compression for a file type I'm working on. I'll try to rewrite it in C once I have a proof of concept in Java.

Thanks!

Answered by EthOmus

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Integer.html#toBinaryString%28int%29
The specification of Integer.toBinaryString reads:

"This value is converted to a string of ASCII digits in binary (base 2) with no extra leading 0s"

That you overlooked this fact is what led to your confusion.
