Java: Char vs String byte size
Original question: http://stackoverflow.com/questions/9825283/
Note: this page reproduces a Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by jayunit100
Hi guys: I was surprised to find that the following code
System.out.println("Character size:"+Character.SIZE/8);
System.out.println("String size:"+"a".getBytes().length);
outputs this:
Character size:2
String size:1
I would assume that a single-character string takes up at least as many bytes as a single char.
In particular, I'm wondering:
If I have a Java bean with several fields in it, how will its size increase depending on the nature of the fields (Character, String, Boolean, Vector, etc.)? I'm assuming that all Java objects have some (probably minimal) footprint, and that one of the smallest of these footprints would be a single character. So, to test that basic assumption, I started with the above code, and the results of the print statements seem counterintuitive.
Any insights into the way Java stores/serializes characters vs. strings by default would be very helpful. Thanks.
Accepted answer by Thorsten S.
getBytes() outputs the String in the default encoding (most likely ISO-8859-1), while a char always takes 2 bytes. Internally, Java works with arrays of 2-byte chars (at least in the JDK versions current when this was written; JDK 9 and later can store Latin-1-only strings more compactly). If you want to know more about encoding, read the link by Oded in the question comments.
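To make the difference concrete, here is a minimal sketch that compares the fixed size of a char with the number of bytes getBytes() produces. The class name EncodingSizes and its helper methods are made up for this illustration; the result of the no-argument getBytes() call depends on the platform default charset, so no output is claimed for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class EncodingSizes {
    // Size of a Java char in bytes: Character.SIZE is 16 (bits).
    static int charSizeInBytes() {
        return Character.SIZE / 8;
    }

    // Number of bytes produced when s is encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("char size in bytes: " + charSizeInBytes());
        // Platform dependent: uses whatever the default charset is.
        System.out.println("getBytes() with default charset: " + "a".getBytes().length);
        // Explicit charset: the same on every platform.
        System.out.println("getBytes(ISO-8859-1): "
                + encodedLength("a", StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset explicitly removes the dependence on the machine's configuration, which is usually what you want when the byte count matters.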
Answer by scravy
The SIZE of a Character is the storage needed for a char, which is 16 bits. The length of a string (or of the underlying char array or byte array) is the number of characters (or bytes), not a size in bits.
That's why you had to divide by 8 for the size, but not for the length. To get the number of bytes the characters occupy in memory, the length needs to be multiplied by two.
Also note that you will get other lengths for the byte array if you specify a different encoding. In this case, getBytes() converted the string to a single-byte or variable-width encoding.
See: http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#getBytes(java.nio.charset.Charset)
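As a rough illustration of how the byte-array length changes with the encoding while length() stays the same, the sketch below encodes a single euro sign (U+20AC) in three charsets. The class LengthVsBytes and the method describe are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class LengthVsBytes {
    // Summarizes the char count of s and its encoded size in three charsets.
    static String describe(String s) {
        return "chars=" + s.length()
                + " utf8=" + s.getBytes(StandardCharsets.UTF_8).length
                + " utf16be=" + s.getBytes(StandardCharsets.UTF_16BE).length
                + " latin1=" + s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // The euro sign is one char, written as an escape to avoid
        // depending on the encoding of the source file.
        String euro = "\u20ac";
        // ISO-8859-1 cannot represent the euro sign, so getBytes
        // substitutes a single replacement byte ('?').
        System.out.println(describe(euro)); // chars=1 utf8=3 utf16be=2 latin1=1
    }
}
```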
Answer by Nav
Here is what I think; correct me if I am wrong. You are finding the length of the string, which is correctly shown as 1 because the string contains only 1 character. length shows the length, not the size; length and size are two different things.
Check this link. You are finding the number of bytes occupied in the wrong way.
Answer by Alex Stybaev
Well, a single char in a char array takes 2 bytes, and your String is 1 character long; that does not mean it is 1 byte in size.
The String object in Java consists of:
private final char value[];
private final int offset;
private final int count;
private int hash;
This alone should assure you that the String object is always bigger than its char array.
If you want to learn more about object sizes, you can also read about object headers and the multiplicity factor for char arrays. For example, here or here.
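The field list above can be turned into a back-of-the-envelope estimate. The sketch below is only arithmetic under loudly stated assumptions: a 12-byte object header, a 16-byte array header, 4-byte references, and 8-byte alignment, which are typical of a 64-bit HotSpot JVM with compressed references but are not guaranteed by the language. The class name and all constants are hypothetical, and the result is not a measurement.

```java
// Hypothetical estimate, not a measurement: every constant is an assumption.
public class StringFootprintEstimate {
    static final int OBJECT_HEADER = 12; // assumed object header size
    static final int ARRAY_HEADER = 16;  // assumed: object header + 4-byte length
    static final int REFERENCE = 4;      // assumed compressed reference size
    static final int ALIGNMENT = 8;      // assumed object alignment

    // Rounds size up to the next multiple of ALIGNMENT.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // char[] with the given number of 2-byte chars.
    static long charArrayBytes(int chars) {
        return align(ARRAY_HEADER + 2L * chars);
    }

    // String object with the four fields listed above:
    // one reference (value) and three ints (offset, count, hash).
    static long stringObjectBytes() {
        return align(OBJECT_HEADER + REFERENCE + 3 * Integer.BYTES);
    }

    // String object plus its backing char array.
    static long estimate(int chars) {
        return stringObjectBytes() + charArrayBytes(chars);
    }

    public static void main(String[] args) {
        System.out.println("Estimated bytes for a 1-char String: " + estimate(1));
    }
}
```

Under these assumptions the String object alone rounds up to 32 bytes and a one-element char array to 24, which is far more than the 2 bytes of the char itself. A real JVM may differ, and measuring requires a tool rather than arithmetic.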
Answer by Koray Tugay
I want to add some code first and then a bit of explanation:
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) {
        System.out.println("Character size: " + Character.SIZE / 8);
        // Encode the one-character string explicitly as UTF-16.
        final byte[] bytes = "a".getBytes(Charset.forName("UTF-16"));
        System.out.println("String size: " + bytes.length);
        sprintByteAsHex(bytes[0]);
        sprintByteAsHex(bytes[1]);
        sprintByteAsHex(bytes[2]);
        sprintByteAsHex(bytes[3]);
    }

    // Prints one byte as hex; note that toHexString does not pad, so 0x00 prints as "0".
    static void sprintByteAsHex(byte b) {
        System.out.print((Integer.toHexString((b & 0xFF))));
    }
}
And the output will be:
Character size: 2
String size: 4
feff061
So what you are actually missing is that you are not passing any parameter to the getBytes method. You are probably getting the bytes of the UTF-8 representation of the character 'a'.
But why did we get 4 bytes when we asked for UTF-16? Java uses UTF-16 internally, so we should have gotten 2 bytes, right?
If you examine the output:
feff061
Java actually returned a BOM: https://en.wikipedia.org/wiki/Byte_order_mark.
So the first 2 bytes, feff, signal that the following bytes are UTF-16 Big Endian. Please see the Wikipedia page for further information.
The remaining 2 bytes, 0061 (printed as 061 because Integer.toHexString drops the leading zero of the 00 byte), are the 2-byte representation of the character "a". This can be verified at: http://www.fileformat.info/info/unicode/char/0061/index.htm
So yes, a character in Java is 2 bytes, but when you ask for bytes without specifying an encoding, you may not always get 2 bytes per character, since different encodings require different numbers of bytes for various characters.
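If the BOM is not wanted, one option is to name the byte order explicitly. The sketch below, with a hypothetical class Utf16WithoutBom and helper toHex, compares UTF-16 with UTF-16BE and UTF-16LE, and pads each byte to two hex digits so the 00 byte stays visible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class Utf16WithoutBom {
    // Formats every byte as exactly two hex digits, so 0x00 becomes "00".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Plain UTF-16 writes a big-endian BOM before the character.
        System.out.println(toHex("a".getBytes(StandardCharsets.UTF_16)));   // feff0061
        // With an explicit byte order no BOM is written: 2 bytes only.
        System.out.println(toHex("a".getBytes(StandardCharsets.UTF_16BE))); // 0061
        System.out.println(toHex("a".getBytes(StandardCharsets.UTF_16LE))); // 6100
    }
}
```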