Number of bytes in a string in Java

Question

Asked by Green

In Java, if I have a String x, how can I calculate the number of bytes in that string?

Answer 1

Answered by Boris Pavlović

According to How to convert Strings to and from UTF8 byte arrays in Java:

String s = "some text here";
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);

Answer 2

Answered by Andrei Ciobanu

There's a method called getBytes(). Use it wisely.

Answer 3

Answered by Andrzej Doyle

A string is a sequence of characters (in Java, UTF-16 code units that encode Unicode code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.

That said, you can turn the string into a byte array and then look at its size as follows:

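// The input string for this test
// (getBytes(String) declares UnsupportedEncodingException, so the
// enclosing method must catch or declare it)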
final String string = "Hello World";

// Check length, in characters
System.out.println(string.length()); // prints "11"

// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"

final byte[] utf16Bytes = string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24" (22 bytes of text plus a 2-byte BOM)

final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"

final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"

final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"

So you see, even a simple "ASCII" string can have a different number of bytes in its representation, depending on which encoding is used. Use whichever character set you're interested in as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:

final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms

// Check length, in characters
System.out.println(interesting.length()); // prints "4"

// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"

final byte[] utf16Bytes = interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10" (8 bytes of text plus a 2-byte BOM)

final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"

final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")

(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)
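
As a minimal sketch of that advice, the helper below takes the charset explicitly and only uses the platform default for comparison. The class name EncodedLengths and the method byteLength are hypothetical names chosen for this illustration; they do not come from the answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodedLengths {
    // Number of bytes needed to encode s in the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "na\u00EFve"; // "naive" with a diaeresis on the i: 5 chars, 1 non-ASCII

        // Explicit charsets give the same answer on every platform.
        System.out.println(byteLength(s, StandardCharsets.UTF_8));      // prints "6"
        System.out.println(byteLength(s, StandardCharsets.ISO_8859_1)); // prints "5"

        // The no-argument overload uses the platform default, so this
        // line can print different values on different machines.
        System.out.println(Charset.defaultCharset() + ": " + s.getBytes().length);
    }
}
```

Passing a Charset object rather than a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.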

Answer 4

Answered by ant

Try this:

Bytes.toBytes(x).length

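This assumes you declared and initialized x beforehand. Note that Bytes is not a JDK class; it presumably comes from a third-party library (for example, Apache HBase ships org.apache.hadoop.hbase.util.Bytes).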

Answer 5

Answered by Andreas Dolk

A String instance allocates a certain number of bytes in memory. Maybe you're looking for something like sizeof("Hello World"), which would return the number of bytes allocated by the data structure itself?

In Java, there's usually no need for a sizeof function, because we never explicitly allocate memory to store a data structure. We can look at the String.java file for a rough estimate, and we see some int fields, some references and a char[]. The Java Language Specification defines that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes; it only has to guarantee that its implementation of char can hold values of the defined range.

So sizeof really does not make any sense in Java. But, assuming that we have a large String and that one char occupies two bytes, the memory footprint of a String object is at least 2 * str.length() bytes.

Answer 6

Answered by finnw

The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:

string.length() * 2

Java strings are traditionally stored internally as UTF-16, which uses 2 bytes per code unit (since Java 9, compact strings may store Latin-1-only text in 1 byte per character), and String.length() measures the length in UTF-16 code units, so this is equivalent to:

final byte[] utf16Bytes = string.getBytes("UTF-16BE");
System.out.println(utf16Bytes.length);

And this will tell you the size of the internal char array, in bytes.

Note: "UTF-16" will give a different result from "UTF-16BE" because the former encoding inserts a BOM, adding 2 bytes to the length of the array.
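
The relationship between length(), "UTF-16BE" and "UTF-16" can be checked with a short sketch like the one below. The class name Utf16Sizes and its method names are hypothetical, and the test string deliberately includes a character outside the Basic Multilingual Plane to show that length() counts code units rather than code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf16Sizes {
    // Encoded size without a byte order mark.
    static int utf16beBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encoded size with the 2-byte BOM that the "UTF-16" encoder prepends.
    static int utf16WithBomBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16
        String s = "a\uD83D\uDE00";

        System.out.println(s.length());                      // prints "3" (code units)
        System.out.println(s.codePointCount(0, s.length())); // prints "2" (code points)
        System.out.println(s.length() * 2);                  // prints "6"
        System.out.println(utf16beBytes(s));                 // prints "6"
        System.out.println(utf16WithBomBytes(s));            // prints "8"
    }
}
```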

Answer 7

Answered by roozbeh

If you're running with 64-bit references:

sizeof(string) = 
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 // cached hash code

In other words:

sizeof(string) = 36 + string.length() * 2

On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:

sizeof(string) = 32 + string.length() * 2

This does not take into account the references to the string object.
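
The arithmetic above can be written as a small function. This is only a sketch of the formula as stated in this answer, not a measurement: the class name StringSizeEstimate and the method estimate are hypothetical, and the field layout (offset, count, hash) and header sizes are taken from the answer, so real JVMs may differ.

```java
// Hypothetical helper class, for illustration only.
class StringSizeEstimate {
    // Estimated bytes for a String of the given length, following the
    // breakdown in this answer. referenceBytes is 8 for plain 64-bit
    // references, or 4 on a 32-bit VM or with compressed OOPs.
    static long estimate(int length, int referenceBytes) {
        long objectHeader = 8;      // object header used by the VM
        long arrayHeader = 8;       // header of the char array
        long chars = 2L * length;   // 16-bit chars
        long intFields = 4 + 4 + 4; // offset, count, cached hash code
        return objectHeader + referenceBytes + arrayHeader + chars + intFields;
    }

    public static void main(String[] args) {
        System.out.println(estimate(11, 8)); // prints "58" (36 + 11 * 2)
        System.out.println(estimate(11, 4)); // prints "54" (32 + 11 * 2)
    }
}
```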

Answer 8

Answered by radu_paun

To avoid the try/catch, use:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
System.out.println(b.length);
