Trying to understand Java String implementation
Original question: http://stackoverflow.com/questions/12009483/
Note: this page reproduces a popular StackOverflow question. The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow (not to this page) and keep the same license.
Asked by Frank
I'm looking at the openjdk implementation of String and the private, per instance members look like:
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];
    /** The offset is the first index of the storage that is used. */
    private final int offset;
    /** The count is the number of characters in the String. */
    private final int count;
    /** Cache the hash code for the string */
    private int hash; // Default to 0
    [...]
}
But I know that Java uses reference and pools for Strings, to avoid duplication. I was naively expecting a pimpl idiom, where String would in fact be just a ref to an impl. I'm not seeing that so far. Can someone explain how Java will know to use references if I put a String x; member in one of my classes?
Addendum: this is probably wrong, but if I'm in 32-bit mode, should I count 4 bytes for the reference "value[]", 4 bytes for offset, 4 for count and 4 for hash for every instance of class String? That would mean that writing "String x;" in one of my classes automatically adds at least 32 bytes to the "weight" of my class (I'm probably wrong here).
Answer by yshavit
The offset/count fields are somewhat orthogonal to the pooling/intern() issues. Offset and count come into play when you have something like:
String substring = myString.substring(5);
One way to implement this method would be something like:
- allocate a new char[] with myString.length() - 5 elements
- copy all of the elements from index 5 to myString.length() from myString to the new char[]
- substring is constructed with this new char[]
  - substring.charAt(i) goes directly to chars[i]
  - substring.length() goes directly to chars.length
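The copying approach in the steps above can be sketched roughly as follows. This is only an illustration: `CopyingSubstring` and `substringByCopy` are hypothetical names, not part of the JDK, and the sketch goes through the public String API rather than the private fields.

```java
public class CopyingSubstring {

    // Hypothetical helper (not JDK code): builds a substring by copying characters.
    static String substringByCopy(String myString, int beginIndex) {
        int newLength = myString.length() - beginIndex;

        // Allocate a new char[] sized for the substring.
        char[] chars = new char[newLength];

        // Copy the characters from beginIndex up to the end: O(N) in the new length.
        myString.getChars(beginIndex, myString.length(), chars, 0);

        // Allocate the new String. Note that this public constructor makes its
        // own defensive copy of the array, which the list above does not count.
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(substringByCopy("Hello, world", 7)); // prints "world"
    }
}
```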
As you can see, this approach is O(N), where N is the new string's length, and requires two allocations: the new String, and the new char[]. So instead, substring works by reusing the original char[] but with an offset:
- substring.offset = myString.offset + newOffset
- substring.count = myString.count - newOffset
- use myString.chars as the chars array for substring
  - substring.charAt(i) goes to chars[i + substring.offset]
  - substring.length() goes to substring.count
Note that we didn't need to create a new char[], and more importantly, we didn't need to copy the chars from the old char[] to the new one (since there is no new one). So this operation is just O(1) and requires only one allocation, that of the new String.
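A minimal sketch of the shared-array scheme described in this answer might look like the following. `SharedString` is a hypothetical stand-in written for illustration, not the JDK's String class; its field names follow the answer's `chars`, `offset` and `count`.

```java
public class SharedCharsDemo {

    /** Hypothetical stand-in for the offset/count layout; not the JDK class. */
    static final class SharedString {
        private final char[] chars;  // storage, possibly shared with other instances
        private final int offset;    // first index of the storage that is used
        private final int count;     // number of characters in this string

        private SharedString(char[] chars, int offset, int count) {
            this.chars = chars;
            this.offset = offset;
            this.count = count;
        }

        static SharedString of(String s) {
            return new SharedString(s.toCharArray(), 0, s.length());
        }

        // O(1): one small object is allocated, no characters are copied.
        SharedString substring(int newOffset) {
            if (newOffset < 0 || newOffset > count) {
                throw new StringIndexOutOfBoundsException(newOffset);
            }
            return new SharedString(chars, offset + newOffset, count - newOffset);
        }

        char charAt(int i) {
            if (i < 0 || i >= count) {
                throw new StringIndexOutOfBoundsException(i);
            }
            return chars[i + offset];
        }

        int length() {
            return count;
        }

        // True when both instances point at the very same array.
        boolean sharesStorageWith(SharedString other) {
            return chars == other.chars;
        }

        @Override
        public String toString() {
            return new String(chars, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString myString = SharedString.of("Hello, world");
        SharedString substring = myString.substring(7);
        System.out.println(substring);                              // prints "world"
        System.out.println(substring.sharesStorageWith(myString)); // prints "true"
    }
}
```

One consequence of this design is that a short substring keeps the whole original array reachable for as long as the substring itself is alive.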
Answer by Marko Topolnik
Java always uses references to any object. There's no way to make it not use references. As for string pooling, that is achieved by the compiler for string literals and at runtime by calling String.intern. It is natural that most of the implementation of String is oblivious to whether it is dealing with an instance referred to by the constant pool or not.
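As a small illustration of the two pooling paths mentioned here (literals and String.intern), the sketch below compares references with `==`. The class name `InternDemo` and the helper `isPooledInstance` are made up for this example.

```java
public class InternDemo {

    // Hypothetical helper: true when s is itself the pooled instance for its contents.
    static boolean isPooledInstance(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";           // pooled by the compiler/class loading
        String built = new String("hello"); // a distinct object with equal contents

        System.out.println(literal.equals(built));      // prints "true"
        System.out.println(literal == built);           // prints "false"
        System.out.println(literal == built.intern());  // prints "true"
        System.out.println(isPooledInstance(built));    // prints "false"
    }
}
```

In every case the variables hold references; pooling only decides which object those references point to.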
Answer by Maarten Bodewes
Java Strings are immutable. This means that the implementation can do a whole lot of things to the internal representation, without breaking any application code.
Note that String.intern() is declared native in Oracle's JDK implementation. Native code has access to all fields of an object and may change the reference under the hood. So all that the implementors have to do is change the reference and offset to point at the location where the string is interned, and voilà. Of course this breaks the immutability of the class, so the intern() update had better be thread safe.
You could check what happens to the fields when you call intern() on a newly generated String. If nothing happens, it might be that the reference itself contains the memory location instead. The Java Language Specification does not define how references are implemented.
Answer by olive_tree
The accepted answer and the other answers are outdated. Since Java 7 update 6, strings in Java no longer use offsets and are not tuned for substring optimization. Instead, every substring creates a new copy of the characters.
If you want the behavior of the original string implementation, where a substring shares storage with its parent, you have to go through CharSequence instead.
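One possible way to get a CharSequence view over part of a string is `java.nio.CharBuffer.wrap(CharSequence, int, int)`, sketched below. This is an assumption about what "use CharSequence" could look like in practice, not something the answer spells out; the class name `CharSequenceViewDemo` and the helper `viewOf` are hypothetical, and how the buffer stores its content internally is an implementation detail.

```java
import java.nio.CharBuffer;

public class CharSequenceViewDemo {

    // Hypothetical helper: a read-only CharSequence over s[begin, end).
    static CharSequence viewOf(String s, int begin, int end) {
        // The buffer's position is set to begin and its limit to end.
        return CharBuffer.wrap(s, begin, end);
    }

    public static void main(String[] args) {
        CharSequence view = viewOf("Hello, world", 7, 12);
        System.out.println(view.length());   // prints "5"
        System.out.println(view.charAt(0));  // prints "w"
        System.out.println(view);            // prints "world"
    }
}
```

Calling toString() on the view produces an ordinary String, which does copy the characters.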
For more information: https://jaxenter.com/the-state-of-string-in-java-107508.html