Notice: this page reproduces a popular StackOverflow question, originally published as a Chinese-English parallel translation, under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10830004/

Time: 2020-10-31 02:45:47  Source: igfitidea

What does String.substring exactly do in Java?

java string

Asked by Hymanson Tale

I always thought if I do String s = "Hello World".substring(0, 5), then I just get a new string s = "Hello". This is also documented in the Java API doc: "Returns a new string that is a substring of this string".

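For reference, here is a minimal sketch of the behaviour the API documentation promises, independent of how the JDK implements it internally. The class name SubstringBasics is only an illustrative assumption.

```java
// Minimal sketch of the documented, observable contract of String.substring.
// The class name is hypothetical; only java.lang.String methods are used.
class SubstringBasics {
    public static void main(String[] args) {
        String original = "Hello World";

        // beginIndex is inclusive, endIndex is exclusive.
        String s = original.substring(0, 5);

        System.out.println(s);          // Hello
        System.out.println(s.length()); // 5

        // Strings are immutable, so the original is unchanged.
        System.out.println(original);   // Hello World
    }
}
```

Whether s shares storage with the original string is an implementation detail that this code cannot observe, which is what the rest of the question is about.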

But when I saw the following two links, I began to doubt.

What is the purpose of the expression "new String(...)" in Java?

String constructor considered useless turns out to be useful after all

Basically, they say that if I use String s = "Hello World".substring(0, 5), I still get a String which holds the char array of "Hello World".

Why? Does Java really implement substring in this way? Why in this way? Why not just return a brand new shorter substring?

Accepted answer by Brian Agnew

It's supposed to be an efficiency measure: when you take a substring, you don't create a new char array, but merely create a window onto the existing char array.
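The "window" idea can be sketched roughly as follows. This is a hypothetical illustration, not the JDK's String class: the names CharWindow, window and sharesArrayWith are assumptions made up for this example.

```java
// Hypothetical illustration of a "window onto a shared char array".
// This is NOT the JDK's String class; all names here are made up.
final class CharWindow {
    private final char[] value; // backing array, possibly shared between windows
    private final int offset;   // index of the first character of this window
    private final int count;    // number of characters in this window

    CharWindow(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private CharWindow(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Creates a sub-window in O(1): no characters are copied,
    // only a new (offset, count) pair over the same array.
    CharWindow window(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new CharWindow(value, offset + beginIndex, endIndex - beginIndex);
    }

    // True if both windows are backed by the very same array instance.
    boolean sharesArrayWith(CharWindow other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class CharWindowDemo {
    public static void main(String[] args) {
        CharWindow full = new CharWindow("Hello World");
        CharWindow hello = full.window(0, 5);
        System.out.println(hello);                       // Hello
        System.out.println(hello.sharesArrayWith(full)); // true
    }
}
```

The sharing is only safe because the window never exposes or mutates the array, which mirrors the immutability argument made for String.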

Is this worthwhile? Maybe. The downside is that it causes some confusion (e.g. see this SO question), plus each String object needs to carry the offset info into the array, even if it's not used.

EDIT: This behaviour has now changed as of Java 7. See the linked answer for more info.

Answer by Sean Owen

Turning it around, why allocate a new char[] when it is not necessary? This is a valid implementation since String is immutable. It saves allocations and memory in the aggregate.

Answer by assylias

Does Java really implement substring in this way?

Looking at the code (JDK 7) (which I have simplified), yes:

public String substring(int beginIndex, int endIndex) {
    .......
    return new String(offset + beginIndex, endIndex - beginIndex, value);
}

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

Why in this way? Why not just return a brand new shorter substring?

The comment seems to imply that speed was the reason.

Answer by Mark Rotteveel

Although it used to be true that a String created with substring() had the same backing char[] (presumably to save the space and time of copying), that is no longer true since Java 7 Update 6, as this sharing of the char[] had its own memory overhead. This overhead arose especially when (large) Strings were loaded, a small substring was taken and the large string was discarded. If the small string is kept for a long time, this can lead to significant unneeded memory use.
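The retention problem described above can be modelled with a small sketch. It is a hypothetical model of the old sharing behaviour, not JDK code: SharedSlice, retainedChars and compact are names assumed for illustration. The compact() method plays the role that the new String(s.substring(...)) idiom from the linked questions played on older JDKs.

```java
import java.util.Arrays;

// Hypothetical model of a substring that shares its parent's array.
// Not JDK code; all names are assumptions made for this sketch.
final class SharedSlice {
    private final char[] value; // backing array (may be far larger than the slice)
    private final int offset;   // start of the visible range
    private final int count;    // length of the visible range

    SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    // Number of chars this slice keeps reachable, and therefore un-collectable.
    int retainedChars() {
        return value.length;
    }

    // Copies only the visible range into a right-sized array,
    // so the large parent array can be garbage collected.
    SharedSlice compact() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }
}

class SliceRetentionDemo {
    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');

        SharedSlice small = new SharedSlice(big, 0, 5);
        System.out.println(small.retainedChars());           // 1000000
        System.out.println(small.compact().retainedChars()); // 5
    }
}
```

In this model a 5-character slice pins a million-character array until it is compacted, which is the kind of overhead the copying implementation avoids by default.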

In any case, in the current version (Java 7 Update 21), substring() calls the constructor String(char value[], int offset, int count) with the char[] of the original string. The constructor then makes a copy of the specified range from the char array:

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
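The comment about -1>>>1 (which equals Integer.MAX_VALUE) hints at why the bounds check is written as offset > value.length - count rather than offset + count > value.length. The following sketch compares the two forms; the class and method names are hypothetical and only serve the illustration.

```java
// Sketch of why the range check subtracts instead of adding.
// Class and method names are assumptions made for this example.
class RangeCheckDemo {
    // Naive form: offset + count can overflow int and wrap to a negative value,
    // so an out-of-range request may slip through.
    static boolean naiveOutOfBounds(int length, int offset, int count) {
        return offset + count > length;
    }

    // Form used in the constructor above: with length >= 0 and count >= 0
    // already established, length - count cannot overflow.
    static boolean safeOutOfBounds(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 10;
        int offset = Integer.MAX_VALUE; // same value as -1 >>> 1
        int count = 1;

        System.out.println(naiveOutOfBounds(length, offset, count)); // false
        System.out.println(safeOutOfBounds(length, offset, count));  // true
    }
}
```

The naive check reports the request as in range because Integer.MAX_VALUE + 1 wraps around to Integer.MIN_VALUE, while the subtracting form rejects it.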

Answer by mlucas67

Keeping in mind that strings are immutable, and that they take up memory, envision doing several substring operations on a string if each one created a new string! Instead, just create a new String object that points to the same immutable string but has different offset and count properties. Now, no matter how many substrings you do against that original string or substrings of that string there's only one copy of the string itself in memory. Much more efficient.

Also, when doing String s = "Hello, World".substring(0,5); think about the order of operations. First the string "Hello, World" will be created on the heap and a brand new String object will point at it. Then the substring method will be called on that new String object, and another new String object will be created and pointed at by s. Therefore, s points at the string "Hello, World" on the heap and has an offset of 0 and a count of 5.

Answer by Saurabh

Because String is immutable anyway, creating a new object altogether does not make much sense.