Why does the Java char primitive take up 2 bytes of memory?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/3956734/
Asked by realnumber
Is there any reason why the Java char primitive data type is 2 bytes, unlike C where it is 1 byte?
Thanks
Answered by Vijay Mathew
char in Java is UTF-16 encoded, which requires a minimum of 16 bits of storage for each character.
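A quick way to see this from code (a minimal sketch; the class name is just for illustration) is to print the constants Character.SIZE and Character.BYTES:

```java
public class CharSize {
    public static void main(String[] args) {
        // A Java char is one UTF-16 code unit: 16 bits, i.e. 2 bytes.
        System.out.println(Character.SIZE);  // 16 (bits)
        System.out.println(Character.BYTES); // 2  (bytes, available since Java 8)
    }
}
```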
Answered by Matthew Flaschen
When Java was originally designed, it was anticipated that any Unicode character would fit in 2 bytes (16 bits), so char and Character were designed accordingly. In fact, a Unicode character can now require up to 4 bytes. Thus, UTF-16, the internal Java encoding, requires that supplementary characters use 2 code units. Characters in the Basic Multilingual Plane (the most common ones) still use 1. A Java char is used for each code unit. This Sun article explains it well.
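To illustrate the code-unit point, here is a small sketch (the class name and the choice of U+1D11E, a character outside the Basic Multilingual Plane, are just for illustration): a supplementary character occupies two char code units but is still a single code point.

```java
public class SupplementaryDemo {
    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane,
        // so UTF-16 stores it as a surrogate pair: two char code units.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                         // 2 chars (code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 code point
        System.out.println(Character.charCount(0x1D11E));          // 2
    }
}
```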
Answered by DarkDust
In Java, a character is encoded in UTF-16, which uses 2 bytes, while a normal C string is more or less just a bunch of bytes. When C was designed, using ASCII (which only covers the English language character set) was deemed sufficient, while the Java designers already accounted for internationalization. If you want to use Unicode with C strings, the UTF-8 encoding is the preferred way, as it has ASCII as a subset and does not use the 0 byte (unlike UTF-16), which is used as an end-of-string marker in C. Such an end-of-string marker is not necessary in Java, as a string is a complex type there, with an explicit length.
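The zero-byte difference is easy to demonstrate (a minimal sketch; the class name is illustrative). Encoding the ASCII letter "A" as UTF-8 yields a single byte, while UTF-16 (big-endian, without a byte order mark) yields a leading 0x00 byte that a C string would interpret as the end-of-string marker:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingBytes {
    public static void main(String[] args) {
        String s = "A";
        // UTF-8 keeps ASCII as a one-byte subset, with no embedded zero bytes.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));    // [65]
        // UTF-16BE encodes the same character as two bytes, the first of which is 0.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [0, 65]
    }
}
```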
Answered by Master Amit
Java was designed for internationalization, so it works with different languages and needs more than one byte of space; that's why char takes 2 bytes. For example, the Chinese language can't be handled with one byte per char.
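For example (a minimal sketch; the class name is illustrative), a common Chinese character such as U+4E2D ('中') has a code point well above 255, so it cannot fit in a single byte, yet it still fits in one 16-bit char because it lies in the Basic Multilingual Plane:

```java
import java.nio.charset.StandardCharsets;

public class ChineseCharDemo {
    public static void main(String[] args) {
        char c = '\u4E2D'; // '中', a Basic Multilingual Plane character: one 16-bit char
        System.out.println((int) c); // 20013, far too large for one byte
        System.out.println(String.valueOf(c)
                .getBytes(StandardCharsets.UTF_8).length); // 3 bytes in UTF-8
    }
}
```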
Answered by tilak
In earlier languages like C, ASCII notation is used, and the range is 0 to 127, for 128 unique symbols and language characters.
Java, on the other hand, comes with a feature called internationalization: all the human-readable characters (including regional symbols) are included as well, so the range is much larger and more memory is required. The system that unifies all these symbols is the Unicode standard, and that unification requires the additional byte in Java.
The first byte remains as it is, and ASCII characters keep the values 0 to 127 as in C and C++, but the unified characters are appended after them.
So char is 16 bits in Java and 8 bits in C.
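That ASCII compatibility can be checked directly (a minimal sketch; the class name is illustrative): the first 128 Unicode code points have the same numeric values as ASCII, so casting between char and int gives the familiar ASCII codes.

```java
public class AsciiSubset {
    public static void main(String[] args) {
        System.out.println((int) 'A');      // 65, the same value as in ASCII / a C char
        System.out.println((char) 65);      // 'A'
        System.out.println((int) '\u00E9'); // 233 ('é'): beyond 7-bit ASCII, but still one char
    }
}
```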
Answered by Tikayat mohanta
As we know, C supports ASCII, whereas Java supports Unicode, which contains three things: ASCII, extended ASCII, and local-language characters. ASCII is a subset of Unicode. ASCII supports only the English language, whereas Unicode supports multinational languages. A Java character is encoded in UTF-16, which uses 2 bytes. For all of these reasons, and because Unicode is an extended version of ASCII, it uses 16 bits instead of 8 bits.
Answered by Zeyu
The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
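Those bounds can be confirmed from code (a minimal sketch; the class name is illustrative): char behaves as an unsigned 16-bit integer type, so it ranges from 0 to 65,535 and wraps around on overflow.

```java
public class CharRange {
    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // 0      ('\u0000')
        System.out.println((int) Character.MAX_VALUE); // 65535  ('\uffff')

        char c = Character.MAX_VALUE;
        c++; // unsigned 16-bit arithmetic: wraps around to '\u0000'
        System.out.println((int) c); // 0
    }
}
```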