Bytes of a string in Java
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4385623/
Question by Green
In Java, if I have a String x, how can I calculate the number of bytes in that string?
Answer by Boris Pavlović
According to How to convert Strings to and from UTF8 byte arrays in Java:
String s = "some text here";
// getBytes(String charsetName) declares the checked UnsupportedEncodingException
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);
Answer by Andrei Ciobanu
There's a method called getBytes(). Use it wisely.
Answer by Andrzej Doyle
A string is a list of characters (strictly speaking, in Java, a sequence of UTF-16 code units that encode code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.
That said, you can turn the string into a byte array and then look at its size as follows:
// The input string for this test
final String string = "Hello World";
// Check length, in characters
System.out.println(string.length()); // prints "11"
// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"
final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"
final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"
final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"
final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"
So you see, even a simple "ASCII" string can have a different number of bytes in its representation, depending on which encoding is used. Use whichever character set you're interested in for your case as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:
final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms
// Check length, in characters
System.out.println(interesting.length()); // prints "4"
// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"
final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"
final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"
final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")
final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")
(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)
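As a minimal sketch of the point above, the helper below takes the character set as an explicit java.nio.charset.Charset argument. The class name EncodedLength and the method encodedLength are hypothetical names chosen for this illustration; they are not part of the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    // Hypothetical helper: number of bytes s occupies when encoded with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e: 4 chars, one outside ASCII
        String s = "caf\u00e9";
        System.out.println(s.length());                                    // prints "4"
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // prints "5"
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // prints "4"
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // prints "8"
        // The default charset is platform dependent, so no expected output here.
        System.out.println(encodedLength(s, Charset.defaultCharset()));
    }
}
```

Passing a Charset object rather than a charset name also means no checked exception has to be handled.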
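Answer by ant
Try this:
Bytes.toBytes(x).length
This assumes you declared and initialized x beforehand. (Bytes is not a JDK class; the answer does not say where it comes from, but it presumably refers to a third-party utility such as Apache HBase's org.apache.hadoop.hbase.util.Bytes.)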
Answer by Andreas Dolk
A String instance allocates a certain amount of bytes in memory. Maybe you're looking for something like sizeof("Hello World"), which would return the number of bytes allocated by the data structure itself?
In Java, there's usually no need for a sizeof function, because we never allocate memory ourselves to store a data structure. We can have a look at the String.java file for a rough estimation, and we see some 'int', some references and a char[]. The Java language specification defines that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes; it only has to guarantee that the implementation of char can hold values of the defined range.
So sizeof really does not make any sense in Java. But, assuming that we have a large String and one char allocates two bytes, then the memory footprint of a String object is at least 2 * str.length() in bytes.
Answer by finnw
The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:
string.length() * 2
Java strings are physically stored in UTF-16BE encoding, which uses 2 bytes per code unit, and String.length() measures the length in UTF-16 code units, so this is equivalent to the following. (This describes the traditional char[]-backed implementation; since JDK 9, compact strings may store strings that contain only Latin-1 characters using one byte per character.)
final byte[] utf16Bytes= string.getBytes("UTF-16BE");
System.out.println(utf16Bytes.length);
And this will tell you the size of the internal char array, in bytes.
Note: "UTF-16" will give a different result from "UTF-16BE" as the former encoding will insert a BOM, adding 2 bytes to the length of the array.
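The relationship described above can be checked with a short sketch. The class name Utf16Sizes and its two helper methods are hypothetical names introduced only for this illustration; the charsets used are the standard StandardCharsets constants.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Sizes {
    // Bytes when encoded as UTF-16BE (no byte order mark).
    static int utf16beLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes when encoded as "UTF-16", which prepends a 2-byte BOM for non-empty input.
    static int utf16WithBomLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String hello = "Hello World";
        System.out.println(hello.length() * 2);        // prints "22"
        System.out.println(utf16beLength(hello));      // prints "22"
        System.out.println(utf16WithBomLength(hello)); // prints "24"

        // One code point outside the BMP (U+1F600) is two UTF-16 code units.
        String smiley = "\uD83D\uDE00";
        System.out.println(smiley.length());           // prints "2"
        System.out.println(utf16beLength(smiley));     // prints "4"
    }
}
```

Note that length() counts code units, not code points, which is why the single supplementary character reports a length of 2.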
Answer by roozbeh
If you're running with 64-bit references:
sizeof(string) =
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 // cached hash code
In other words:
sizeof(string) = 36 + string.length() * 2
On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:
sizeof(string) = 32 + string.length() * 2
This does not take into account the references to the string object.
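The arithmetic above can be written as a small function. This is only a sketch that encodes the formula from this answer, with the reference size as a parameter; the class and method names are hypothetical, and it does not measure anything. Actual footprints depend on the JVM version, header sizes, alignment padding and the String implementation in use.

```java
final class StringSizeEstimate {
    // Hypothetical helper reproducing the formula above.
    // referenceBytes is 8 for plain 64-bit references, 4 for 32-bit or compressed OOPs.
    static long estimateBytes(int length, int referenceBytes) {
        long objectHeader = 8;             // object header used by the VM
        long charArray = 8 + 2L * length;  // array header + 16-bit chars
        long intFields = 4 + 4 + 4;        // offset, count, cached hash code
        return objectHeader + referenceBytes + charArray + intFields;
    }

    public static void main(String[] args) {
        int n = "Hello World".length();           // 11
        System.out.println(estimateBytes(n, 8));  // prints "58" (36 + 2 * 11)
        System.out.println(estimateBytes(n, 4));  // prints "54" (32 + 2 * 11)
    }
}
```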
Answer by radu_paun
To avoid the try/catch, use:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
System.out.println(b.length);
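If only the count is needed, the byte array does not have to be allocated at all. The sketch below is not from the original answers: it is a hypothetical helper (Utf8Length.utf8Length) that walks the code points and adds up the UTF-8 width of each one. It assumes an unpaired surrogate should count as 1 byte, since String.getBytes replaces it with '?'.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Length {
    // Hypothetical helper: UTF-8 encoded size of s, computed without building a byte[].
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                bytes += 1;                      // ASCII
            } else if (cp < 0x800) {
                bytes += 2;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                bytes += 1;                      // unpaired surrogate, encoded as '?'
            } else if (cp < 0x10000) {
                bytes += 3;                      // rest of the Basic Multilingual Plane
            } else {
                bytes += 4;                      // supplementary code points
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "some text here";
        System.out.println(utf8Length(s));                               // prints "14"
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);   // prints "14"
        System.out.println(utf8Length("\uF93D\uF936\uF949\uF942"));      // prints "12"
    }
}
```

For short strings the getBytes version is simpler and perfectly adequate; a counting loop like this only pays off when strings are large or the count is computed very often.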