Trying to understand the Java String implementation

Question

Asked by Frank

I'm looking at the OpenJDK implementation of String, and the private, per-instance members look like this:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    [...]
}

But I know that Java uses references and pools for Strings to avoid duplication. I was naively expecting a pimpl idiom, where String would in fact be just a reference to an impl. I'm not seeing that so far. Can someone explain how Java knows to use references if I put a String x; member in one of my classes?

Addendum: this is probably wrong, but if I'm in 32-bit mode, should I count 4 bytes for the reference "value[]", 4 bytes for offset, 4 for count and 4 for hash for every instance of class String? That would mean that writing "String x;" in one of my classes automatically adds at least 32 bytes to the "weight" of my class (I'm probably wrong here).

Answer 1

Answered by yshavit

The offset/count fields are somewhat orthogonal to the pooling/intern() issues. Offset and count come into play when you have something like:

String substring = myString.substring(5);

One way to implement this method would be something like:

allocate a new char[] with myString.length() - 5 elements
copy all of the elements from index 5 to myString.length() from myString to the new char[]
substring is constructed with this new char[]
- substring.charAt(i) goes directly to chars[i]
- substring.length() goes directly to chars.length

As you can see, this approach is O(N) -- where N is the new string's length -- and requires two allocations: the new String, and the new char[]. So instead, substring works by reusing the original char[] but with an offset:

substring.offset = myString.offset + newOffset
substring.count = myString.count - newOffset
use myString.chars as the chars array for substring
- substring.charAt(i) goes to chars[i + substring.offset]
- substring.length() goes to substring.count

Note that we didn't need to create a new char[], and more importantly, we didn't need to copy the chars from the old char[] to the new one (since there is no new one). So this operation is just O(1) and requires only one allocation, that of the new String.
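To make the two strategies concrete, here is a minimal sketch that puts them side by side. SharedString and its method names are hypothetical, invented for illustration; this is not the JDK source, only a stand-in for the value/offset/count layout shown in the question.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the value/offset/count layout; not the JDK source. */
public final class SharedString {
    private final char[] value; // backing storage, possibly shared with other instances
    private final int offset;   // first index of value that belongs to this string
    private final int count;    // number of characters in this string

    public SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    public int length() {
        return count;
    }

    public char charAt(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return value[offset + i];
    }

    /** O(N) variant: allocates a new array and copies the characters into it. */
    public SharedString substringByCopy(int begin) {
        checkBegin(begin);
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + count);
        return new SharedString(copy, 0, copy.length);
    }

    /** O(1) variant: reuses the same array and only adjusts offset and count. */
    public SharedString substringShared(int begin) {
        checkBegin(begin);
        return new SharedString(value, offset + begin, count - begin);
    }

    /** True if both instances are backed by the very same array object. */
    public boolean sharesStorageWith(SharedString other) {
        return value == other.value;
    }

    private void checkBegin(int begin) {
        if (begin < 0 || begin > count) {
            throw new IndexOutOfBoundsException("begin " + begin);
        }
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString s = new SharedString("hello, world");
        SharedString copied = s.substringByCopy(7);
        SharedString shared = s.substringShared(7);
        System.out.println(copied + " " + copied.sharesStorageWith(s)); // world false
        System.out.println(shared + " " + shared.sharesStorageWith(s)); // world true
    }
}
```

Sharing is only safe because the array is never modified after construction. The cost is that a short substring keeps the whole original array reachable.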

Answer 2

Answered by Marko Topolnik

Java always uses references to any object. There's no way to make it not use references. As for string pooling, that is achieved by the compiler for string literals and at runtime by calling String.intern. It is natural that most of the implementation of String is oblivious to whether it is dealing with an instance referred to by the constant pool or not.
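A small sketch of how this looks from application code. The class name InternDemo is made up for illustration. The comparisons use == on purpose, to compare references rather than contents.

```java
public class InternDemo {
    public static void main(String[] args) {
        String literalA = "hello";
        String literalB = "hello";          // same literal, so same pooled instance
        String built = new String("hello"); // explicitly allocates a distinct object

        System.out.println(literalA == literalB);       // true: both refer to the pooled instance
        System.out.println(literalA == built);          // false: different objects, equal contents
        System.out.println(literalA == built.intern()); // true: intern() returns the pooled instance
        System.out.println(literalA.equals(built));     // true: contents are equal
    }
}
```

In every case the variable holds only a reference. A String x; field therefore costs its containing object one reference, and the String object itself lives separately on the heap.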

Answer 3

Answered by Maarten Bodewes

Java Strings are immutable. This means that the implementation can do a whole lot of things to the internal representation, without breaking any application code.

Note that String.intern() is declared native in Oracle's JDK implementation. Native code has access to all fields of an object and may change the reference under the hood. So all that the implementors have to do is to change the reference and offset to a location where the string is interned, and voila. Of course this breaks the immutability of the class, so the intern() update had better be thread safe.

You could check what happens to the fields when you call intern() on a newly generated String. If nothing happens, it might be that the reference itself contains the memory location instead. The Java language specification does not define how references are implemented.

Answer 4

Answered by olive_tree

The accepted answer and the other answers are outdated. Since Java 7 update 6, strings in Java no longer use offsets and are not tuned for substring optimization. Instead, substring copies the selected characters into a new String.

If you wanted to use the original string implementation, you'd have to use CharSequence.
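One way to get a non-copying view through CharSequence is java.nio.CharBuffer.wrap, which wraps an existing character sequence in a read-only buffer. This is only one possible approach, sketched under that assumption; the class SubstringView and the helper tail are hypothetical names.

```java
import java.nio.CharBuffer;

public class SubstringView {
    /** Returns a read-only view of text from begin to its end, without copying the characters. */
    static CharSequence tail(String text, int begin) {
        return CharBuffer.wrap(text, begin, text.length());
    }

    public static void main(String[] args) {
        CharSequence view = tail("hello, world", 7);
        System.out.println(view.length());  // 5
        System.out.println(view.charAt(0)); // w
        System.out.println(view);           // world
    }
}
```

As with the old offset-based String, the view keeps the whole original string reachable for as long as the view is alive.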

For more information: https://jaxenter.com/the-state-of-string-in-java-107508.html
