String's Maximum length in Java - calling length() method

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/816142/

Date: 2020-08-11 19:48:04  Source: igfitidea

java string

Asked by taichi

In Java, what is the maximum size a String object may have, referring to the length() method call?

I know that length() returns the size of a String as the length of its char[].

Accepted answer by coobird

Considering that the String class's length method returns an int, the maximum length the method could return is Integer.MAX_VALUE, which is 2^31 - 1 (approximately 2 billion).

In terms of lengths and indexing of arrays (such as char[], which is probably how the internal data representation of Strings is implemented), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:

The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.

Furthermore, the indexing must be by int values, as mentioned in Section 10.4:

Arrays must be indexed by int values;

Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value of an int.

However, other limitations are likely to apply in practice, such as the maximum allocatable size for an array.
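To make the arithmetic above concrete, here is a minimal sketch (the class name IntLimit and its helper are made up for illustration) that checks Integer.MAX_VALUE against 2^31 - 1 and its hexadecimal form. It only shows the theoretical bound implied by the int return type, not what a particular JVM can actually allocate.

```java
public class IntLimit {
    // Hypothetical helper: the largest value an int, and therefore length(), can represent.
    static int maxInt() {
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxInt());                    // prints 2147483647
        System.out.println(maxInt() == (1L << 31) - 1);  // prints true
        System.out.println(maxInt() == 0x7FFFFFFF);      // prints true
        // length() returns an int, so it can never report more than maxInt().
        int len = "hello".length();
        System.out.println(len);                         // prints 5
    }
}
```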

Answer by Michael Myers

Since arrays must be indexed with integers, the maximum length of an array is Integer.MAX_VALUE (2^31 - 1, or 2 147 483 647). This assumes you have enough memory to hold an array of that size, of course.

Answer by Francis

Apparently it's bound to an int, whose maximum is 0x7FFFFFFF (2147483647).

Answer by Takahiko Kawasaki

java.io.DataInput.readUTF() and java.io.DataOutput.writeUTF(String) say that a String object is represented by two bytes of length information followed by the modified UTF-8 representation of every character in the string. It follows that, when used with DataInput and DataOutput, the length of a String is limited by the number of bytes in its modified UTF-8 representation.

In addition, the specification of CONSTANT_Utf8_info in the Java Virtual Machine Specification defines the structure as follows.

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

Note that the size of 'length' is two bytes.

The fact that the return type of a certain method (e.g. String.length()) is int does not always mean that its allowed maximum value is Integer.MAX_VALUE. Instead, in most cases, int is chosen just for performance reasons. The Java Language Specification says that integers smaller than int are converted to int before calculation (if my memory serves me correctly), and that is one reason to choose int when there is no special reason to do otherwise.

The maximum length at compilation time is at most 65535, the largest value a two-byte length field can hold. Note again that this length is the number of bytes in the modified UTF-8 representation, not the number of characters in a String object.

String objects may be able to hold many more characters at runtime. However, if you want to use String objects with the DataInput and DataOutput interfaces, it is better to avoid overly long String objects. I found this limitation when I implemented Objective-C equivalents of DataInput.readUTF() and DataOutput.writeUTF(String).
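The DataOutput limit described above can be observed directly. The following is a small sketch (the class WriteUtfLimit and its fits helper are invented for this illustration) that relies on DataOutputStream.writeUTF throwing UTFDataFormatException when the encoded form exceeds 65535 bytes. It shows that the limit applies to encoded bytes rather than chars: a three-byte character exhausts the budget three times faster than an ASCII one.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimit {
    // Hypothetical helper: true if writeUTF accepts a string made of `count` copies of `c`.
    static boolean fits(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(new String(chars));
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the modified UTF-8 form does not fit the 2-byte length prefix.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fits('a', 65535));       // prints true  (65535 bytes)
        System.out.println(fits('a', 65536));       // prints false (65536 bytes)
        System.out.println(fits('\u3042', 21845));  // prints true  (3 bytes each, 65535 bytes)
        System.out.println(fits('\u3042', 21846));  // prints false (65538 bytes)
    }
}
```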

Answer by Shanmugavel

The return type of the length() method of the String class is int.

public int length()

Refer to http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()

The maximum value of an int is 2147483647.

A String is treated as a char array internally, so indexing is done within this range. This means we cannot index a 2147483648th member, so the maximum length of a String in Java is 2147483647.

The primitive data type int is 4 bytes (32 bits) in Java. As 1 bit (the MSB) is used as a sign bit, the range is constrained to -2^31 to 2^31 - 1 (-2147483648 to 2147483647). We cannot use negative values for indexing, so the range we can use is from 0 to 2147483647.
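As a rough illustration of the sign-bit argument above, this sketch (the class IntRange and its helper are hypothetical names) prints the int range, shows that going one past the maximum wraps around to the minimum, and confirms that a negative array size is rejected at runtime.

```java
public class IntRange {
    // Hypothetical helper: true if the JVM refuses to create an array with a negative size.
    static boolean negativeSizeRejected() {
        int size = -1;
        try {
            char[] chars = new char[size];
            return chars.length < 0; // not expected to be reached
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(Integer.SIZE);                  // prints 32
        System.out.println(Integer.MIN_VALUE);             // prints -2147483648
        System.out.println(max);                           // prints 2147483647
        System.out.println(max + 1 == Integer.MIN_VALUE);  // prints true (overflow wraps)
        System.out.println(negativeSizeRejected());        // prints true
    }
}
```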

Answer by dantiston

I have a 2010 iMac with 8GB of RAM, running Eclipse Neon.2 Release (4.6.2) with Java 1.8.0_25. With the VM argument -Xmx6g, I ran the following code:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
    try {
        sb.append('a');
    } catch (Throwable e) {
        System.out.println(i);
        break;
    }
}
System.out.println(sb.toString().length());

This prints:

Requested array size exceeds VM limit
1207959550

So, it seems that the max array size is ~1,207,959,549. Then I realized that we don't actually care whether Java runs out of memory: we're just looking for the maximum array size (which seems to be a constant defined somewhere). So:

for (int i = 0; i < 1_000; i++) {
    try {
        char[] array = new char[Integer.MAX_VALUE - i];
        Arrays.fill(array, 'a');
        String string = new String(array);
        System.out.println(string.length());
    } catch (Throwable e) {
        System.out.println(e.getMessage());
        System.out.println("Last: " + (Integer.MAX_VALUE - i));
        System.out.println("Last: " + i);
    }
}

Which prints:

Requested array size exceeds VM limit
Last: 2147483647
Last: 0
Requested array size exceeds VM limit
Last: 2147483646
Last: 1
Java heap space
Last: 2147483645
Last: 2

So, it seems the max is Integer.MAX_VALUE - 2, or (2^31) - 3.

P.S. I'm not sure why my StringBuilder maxed out at 1207959550 while my char[] maxed out at (2^31) - 3. It seems that AbstractStringBuilder doubles the size of its internal char[] to grow it, so that probably causes the issue.
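One possible explanation for the 1207959550 figure can be checked with simple arithmetic. The sketch below assumes a growth rule of newCapacity = oldCapacity * 2 + 2 and a default initial capacity of 16, which is my understanding of the Java 8 era AbstractStringBuilder and is not stated in the answer itself. The class and method names are made up. It simulates the capacity sequence and reports the last capacity that still fits in an int.

```java
public class BuilderGrowth {
    // Assumed growth rule (not confirmed by the answer): newCapacity = oldCapacity * 2 + 2.
    // Uses long arithmetic so the step past Integer.MAX_VALUE can be detected without overflow.
    static long lastCapacityBeforeOverflow(long initialCapacity) {
        long capacity = initialCapacity;
        while (capacity * 2 + 2 <= Integer.MAX_VALUE) {
            capacity = capacity * 2 + 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        // 16 is the default capacity of new StringBuilder().
        System.out.println(lastCapacityBeforeOverflow(16)); // prints 1207959550
    }
}
```

Under those assumptions the builder fills a 1207959550-element array, and the next growth step would exceed the int range, which would match the point where the append failed.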

Answer by DHS

As mentioned in Takahiko Kawasaki's answer, Java represents Unicode strings in the form of modified UTF-8, and in the JVM spec's CONSTANT_Utf8_info structure, 2 bytes are allocated to the length (which is a byte count, not the number of characters in the String).
To extend that answer, the putUTF8 method of the ASM JVM bytecode library contains this:

public ByteVector putUTF8(final String stringValue) {
    int charLength = stringValue.length();
    if (charLength > 65535) {
      // If the number of characters exceeds 65535, the UTF-8 encoded length cannot fit in 2 bytes either.
      throw new IllegalArgumentException("UTF8 string too large");
    }
    for (int i = 0; i < charLength; ++i) {
      char charValue = stringValue.charAt(i);
      if (charValue >= '\u0001' && charValue <= '\u007F') {
        // The character's encoding in modified UTF-8 fits in 1 byte.
        currentData[currentLength++] = (byte) charValue;
      } else {
        // Doesn't fit in 1 byte.
        length = currentLength;
        return encodeUtf8(stringValue, i, 65535);
      }
    }
    ...
}

But when a character needs more than 1 byte, it calls the encodeUtf8 method:

final ByteVector encodeUtf8(final String stringValue, final int offset, final int maxByteLength /*= 65535 */) {
    int charLength = stringValue.length();
    int byteLength = offset;
    for (int i = offset; i < charLength; ++i) {
      char charValue = stringValue.charAt(i);
      if (charValue >= 0x0001 && charValue <= 0x007F) {
        byteLength++;
      } else if (charValue <= 0x07FF) {
        byteLength += 2;
      } else {
        byteLength += 3;
      }
    }
   ...
}

In this sense, the max string length is 65535 bytes, i.e. the modified UTF-8 encoded length, and not the char count.
You can find the modified UTF-8 code-point ranges used by the JVM via the CONSTANT_Utf8_info structure link above.
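To see how the byte count diverges from the char count, here is a standalone sketch that mirrors the per-character ranges used in the encodeUtf8 excerpt above. It is not ASM code: the class ModifiedUtf8Length and both of its methods are hypothetical names introduced for illustration.

```java
public class ModifiedUtf8Length {
    // Counts the bytes a string would need in modified UTF-8, one char at a time.
    static int encodedLength(String s) {
        int byteLength = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                byteLength += 1;  // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                byteLength += 2;  // includes NUL, which modified UTF-8 encodes in 2 bytes
            } else {
                byteLength += 3;  // everything else, including each half of a surrogate pair
            }
        }
        return byteLength;
    }

    // Hypothetical check: would the string fit behind a 2-byte length field?
    static boolean fitsInTwoByteLength(String s) {
        return encodedLength(s) <= 65535;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("abc"));                    // prints 3
        System.out.println(encodedLength(String.valueOf((char) 0))); // prints 2
        System.out.println(encodedLength("\u00e9"));                 // prints 2
        System.out.println(encodedLength("\u3042"));                 // prints 3
        System.out.println(encodedLength("\uD83D\uDE00"));           // prints 6
    }
}
```

A string of 21846 three-byte characters is therefore already too long for the constant pool, even though its length() is far below 65535.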