Java: the difference between String.length() and String.getBytes().length

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16270994/

Time: 2020-08-16 06:06:42  Source: igfitidea

Difference between String.length() and String.getBytes().length

java string

Asked by Key

I am a beginner teaching myself Java programming, and I want to know the difference between String.length() and String.getBytes().length in Java.

Which is more suitable for checking the length of a string?

Accepted answer by BeeOnRope

String.length()

String.length() is the number of 16-bit UTF-16 code units needed to represent the string. That is, it is the number of char values used to represent the string, and thus also equal to toCharArray().length. For most characters used in western languages this is the same as the number of Unicode characters (code points) in the string, but the number of code points will be less than the number of code units if any UTF-16 surrogate pairs are used. Such pairs are needed only to encode characters outside the BMP and are rarely used in most writing (emoji are a common exception).
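
To make the code-unit versus code-point distinction concrete, here is a minimal sketch. The class and helper names (LengthVsCodePoints, codeUnits, codePoints) are made up for illustration; the only library calls are String.length() and String.codePointCount(). Unicode escapes are used so the result does not depend on the source file's encoding.

```java
public class LengthVsCodePoints {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts only once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String accented = "h\u00e9llo";     // "héllo": é lies inside the BMP
        String emoji = "hi\uD83D\uDE00";    // "hi" plus U+1F600, which needs a surrogate pair

        for (String s : new String[] {ascii, accented, emoji}) {
            System.out.println(codeUnits(s) + " code units, "
                    + codePoints(s) + " code points");
        }
        // The last line printed is "4 code units, 3 code points".
    }
}
```

Only the emoji string shows a difference, because it is the only one containing a character outside the BMP.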

String.getBytes().length

String.getBytes().length, on the other hand, is the number of bytes needed to represent your string in the platform's default encoding. For example, if the default encoding were UTF-16 (rare), it would be exactly 2x the value returned by String.length(), apart from any byte-order mark the encoder prepends (since each 16-bit code unit takes 2 bytes to represent). More commonly, your platform encoding will be a multi-byte encoding like UTF-8.

This means the relationship between the two lengths is more complex. For ASCII strings, the two calls will almost always produce the same result (outside of unusual default encodings that don't encode the ASCII subset in 1 byte). Outside of ASCII strings, String.getBytes().length is likely to be larger, as it counts the bytes needed to represent the string, while length() counts 2-byte code units.
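
The relationship can be checked by passing an explicit charset to getBytes instead of relying on the platform default. The sketch below is illustrative only: ByteLengthByCharset and byteLength are hypothetical names. It uses StandardCharsets.UTF_16BE rather than UTF_16 on the assumption that you want the plain 2-bytes-per-code-unit count, since Java's UTF_16 encoder also writes a byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengthByCharset {
    // Bytes needed to encode s in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String mixed = "h\u00e9llo \u20ac";   // "héllo €": 7 chars, two of them non-ASCII

        System.out.println("default charset: " + Charset.defaultCharset());
        for (String s : new String[] {ascii, mixed}) {
            System.out.println("length()  = " + s.length());
            System.out.println("default   = " + s.getBytes().length); // platform dependent
            System.out.println("UTF-8     = " + byteLength(s, StandardCharsets.UTF_8));
            System.out.println("UTF-16BE  = " + byteLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

For the ASCII string, length() and the UTF-8 byte count agree (5 and 5). For the mixed string, length() is 7 while UTF-8 needs 10 bytes and UTF-16BE needs 14.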

Which is more suitable?

Usually you'll use String.length() in concert with other string methods that take offsets into the string. E.g., to get the last character, you'd use str.charAt(str.length()-1). You'd only use getBytes().length if for some reason you were dealing with the array-of-bytes encoding returned by getBytes.
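
A small sketch of the two use cases follows. The names WhichLength, lastChar and fitsInBytes are hypothetical, and the byte budget is an invented scenario (for example a fixed-size field in some binary format), not something from the answer itself.

```java
import java.nio.charset.StandardCharsets;

public class WhichLength {
    // Last char of a non-empty string; length() supplies the char offset.
    static char lastChar(String str) {
        return str.charAt(str.length() - 1);
    }

    // Whether the UTF-8 encoding of str fits into maxBytes bytes
    // (maxBytes is a made-up limit for illustration).
    static boolean fitsInBytes(String str, int maxBytes) {
        return str.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    public static void main(String[] args) {
        String word = "na\u00efve";                    // "naïve": 5 chars, 6 UTF-8 bytes
        System.out.println(lastChar(word));            // e
        System.out.println(fitsInBytes(word, 5));      // false
        System.out.println(fitsInBytes(word, 6));      // true
    }
}
```

Passing the charset explicitly keeps the byte count independent of the platform default, which matters whenever the bytes leave the JVM.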

Answer by Andy Thomas

The length() method returns the length of the string in characters (char values).

Characters may take more than a single byte. The expression String.getBytes().length returns the length of the string in bytes, using the platform's default character set.

Answer by FreeNickname

The string.length() method returns the number of symbols in the string, while getBytes().length returns the number of bytes used to encode those symbols. Internally, chars are usually stored in UTF-16 encoding, so it takes 2 bytes to store one char; the byte count from getBytes(), however, depends on the charset used for the conversion. Check out this SO answer.

I hope that it will help :)

Answer by PixelsTech

In short, String.length() returns the number of characters in the string, while String.getBytes().length returns the number of bytes needed to represent those characters in the encoding used (the platform default, unless a charset is passed to getBytes).

In many cases, String.length() will have the same value as String.getBytes().length. But with an encoding such as UTF-8 and characters whose value is over 127, the two will differ. Here is an example which explains how the characters in a string are converted to bytes when calling String.getBytes(). This should give you a sense of the difference between String.length() and String.getBytes().length.
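
As a minimal sketch of that conversion (not the linked example itself), the code below dumps the UTF-8 bytes of a few single-character strings. Utf8Bytes and utf8Hex are hypothetical names; UTF-8 is passed explicitly so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hex dump of the UTF-8 encoding of s, one two-digit group per byte.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so bytes above 127 are not printed as negative values.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("A"));        // 41        (value 65: one byte)
        System.out.println(utf8Hex("\u00e9"));   // c3 a9     (é, value 233: two bytes)
        System.out.println(utf8Hex("\u4e2d"));   // e4 b8 ad  (中: three bytes)
    }
}
```

Each of these strings has length() 1, yet the UTF-8 byte count is 1, 2 and 3 respectively.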
