Is a Java char 8 bits or 16 bits?

Notice: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/24095187/

Date: 2020-08-14 10:07:21  Source: igfitidea

Char size 8 bit or 16 bit?

Tags: java, char, byte

Asked by user3198603

According to http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html, the size of a char is 16 bits, i.e. 2 bytes. Somehow I recalled it being 8 bits, i.e. 1 byte. To clear up my doubt, I created a text file containing the single character "a" and saved it. Then I inspected the size of the file: it is 1 byte, i.e. 8 bits. I am confused: what is the size of a character? If it is 2 bytes, why is the file size 1 byte, and if it is 1 byte, why does the link say 2 bytes?


Accepted answer by Jon Skeet

A char in Java is a UTF-16 code unit. It's not necessarily a complete Unicode character, but it's effectively an unsigned 16-bit integer.
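The distinction between a code unit and a complete character can be checked directly. The following is a minimal sketch, not part of the original answer; the class name CharCodeUnitDemo and its helper methods are made up for illustration, and U+1D11E (the musical G clef) is simply one convenient character outside the 16-bit range.

```java
class CharCodeUnitDemo {
    // Number of UTF-16 code units (Java char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (complete characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // char is unsigned and 16 bits wide, so its largest value is 65535.
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(Character.SIZE);            // 16

        String a = "a";               // U+0061 fits in a single char
        String clef = "\uD834\uDD1E"; // U+1D11E needs a surrogate pair

        System.out.println(codeUnits(a) + " " + codePoints(a));       // 1 1
        System.out.println(codeUnits(clef) + " " + codePoints(clef)); // 2 1
    }
}
```

The second string is one Unicode character but two char values, which is why String.length() alone does not count characters.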


When you write text to a file (or in some other way convert it into a sequence of bytes), then the data will depend on which encoding you use. For example, if you use ASCII or ISO-8859-1 then you're very limited as to which characters you can write, but each character will only be a byte. If you use UTF-16, then each Java char will be converted into exactly two bytes - but some Unicode characters may take four bytes (those represented by two Java char values).
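As a rough illustration of how the byte count follows the encoding, here is a small sketch. The class EncodedSizeDemo and the method encodedLength are hypothetical names. It uses StandardCharsets.UTF_16BE rather than UTF_16 because encoding with the latter prepends a two-byte byte order mark, which would obscure the two-bytes-per-char count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSizeDemo {
    // Number of bytes the string occupies once encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String a = "a";
        String clef = "\uD834\uDD1E"; // U+1D11E, stored as two Java chars

        System.out.println(encodedLength(a, StandardCharsets.US_ASCII));   // 1
        System.out.println(encodedLength(a, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_16BE));   // 2
        System.out.println(encodedLength(clef, StandardCharsets.UTF_16BE)); // 4

        // The euro sign does not exist in ISO-8859-1; getBytes substitutes
        // the charset's replacement byte, so the character is lost.
        byte[] euro = "\u20AC".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(euro, StandardCharsets.ISO_8859_1)); // ?
    }
}
```

This matches the experiment in the question: a file holding only "a" is 1 byte in any ASCII-compatible single-byte encoding, even though the same character occupies a 16-bit char in memory.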


If you use UTF-8, then the length of even a single Java char in the encoded form will depend on the value.
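To make the variable length concrete, the sketch below encodes a few sample characters as UTF-8. The class Utf8LengthDemo and the method utf8Length are illustrative names only, and the sample characters are arbitrary picks from each length range.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));      // 1 (U+0061, ASCII range)
        System.out.println(utf8Length("\u00E9")); // 2 (U+00E9, e with acute accent)
        System.out.println(utf8Length("\u20AC")); // 3 (U+20AC, euro sign)

        // A character outside the 16-bit range is two Java chars
        // but a single four-byte sequence in UTF-8.
        System.out.println(utf8Length("\uD834\uDD1E")); // 4 (U+1D11E)
    }
}
```

So a single Java char encodes to one, two, or three UTF-8 bytes, and a surrogate pair encodes to four.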


Answered by vogomatix

Note that text files always have a format/character set associated with them. Text files are commonly saved as UTF-8, which uses 8 bits per character for ASCII characters; any other ("special") character takes two to four bytes.


Answered by Ali Gajani

A char in Java is 2 bytes large (as its valid value range suggests). But that doesn't necessarily mean that every representation of a character is 2 bytes long. For instance, many encodings reserve only 1 byte for every character (or use 1 byte for the most frequent characters). If the platform default encoding is a 1-byte encoding such as ISO-8859-1, or a variable-length encoding such as UTF-8, Java can easily convert that 1 byte in the file into a single char.
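The decoding direction can be sketched as follows. This is only an illustration under stated assumptions: DecodeDemo and decode are made-up names, the byte array stands in for the contents of the one-byte file from the question, and the charsets are passed explicitly instead of relying on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes into a String using an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // The single byte that a file containing only "a" would hold.
        byte[] fileContents = { 0x61 };

        String latin1 = decode(fileContents, StandardCharsets.ISO_8859_1);
        String utf8 = decode(fileContents, StandardCharsets.UTF_8);

        // One byte on disk becomes one 16-bit char in memory either way.
        System.out.println(latin1 + " " + latin1.length()); // a 1
        System.out.println(utf8 + " " + utf8.length());     // a 1
    }
}
```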


Answered by snr

There is a more modern way to find its size: just print Character.BYTES (available since Java 8).


System.out.println(Character.BYTES);

It prints 2.
