Notice: this page is a side-by-side Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/5076099/

Time: 2020-10-30 09:18:31  Source: igfitidea

Avoid duplicate Strings in Java

java string duplicates

Asked by yaki_nuka

I want to ask a question about avoiding String duplicates in Java.

The context is: an XML with tags and attributes like this one:

<product id="PROD" name="My Product"...></product>

With JibX, this XML is marshalled/unmarshalled into a class like this:

public class Product {
    private String id;
    private String name;
    // constructor, getters, setters, methods and so on
}

The program is a long-running batch process, so Product objects are created, used, copied, etc.

Well, the question is: when I analysed the execution with software like Eclipse Memory Analyzer (MAT), I found several duplicated Strings. For example, in the id attribute, the PROD value is duplicated across around 2000 instances.

How can I avoid this situation? Other attributes in the Product class may change their value during execution, but attributes like id, name... don't change so frequently.

I have read something about the String.intern() method, but I haven't used it yet and I'm not sure it's a solution for this. Could I define the most frequent values of those attributes as static final constants in the class?

I hope I have expressed my question in the right way. Any help or advice is much appreciated. Thanks in advance.

Answered by Andreas Dolk

Interning would be the right solution, if you really have a problem. Java stores String literals and a lot of other Strings in an internal pool. Whenever a String literal is loaded or intern() is called, the JVM first checks whether an equal String is already in the pool. If yes, it will not create a new instance but pass back the reference to the interned String object.

There are two ways to control this behaviour:

String interned = aString.intern();       // returns a reference to an interned String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
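
The difference between the two calls can be observed with a reference comparison. The following is only a minimal sketch; the class and method names (InternDemo, isSameInstance) are made up for illustration.

```java
class InternDemo {
    // True only when both references point to the very same object.
    static boolean isSameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "PROD";            // literals are interned automatically
        String copy = new String(literal);  // guaranteed new instance
        String interned = copy.intern();    // reference to the pooled instance

        System.out.println(isSameInstance(literal, copy));     // false
        System.out.println(isSameInstance(literal, interned)); // true
        System.out.println(literal.equals(copy));              // true
    }
}
```

The content is equal in all three cases; only the number of String objects on the heap differs.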

So maybe the libraries really create new instances for all XML attribute values. This is possible, and you won't be able to change it.



intern has a global effect. An interned String is immediately available "for any object" (this view doesn't really make sense, but it may help to understand it).

So, let's say we have a line in class Foo, method foolish:

String s = "ABCD";

String literals are interned immediately. The JVM checks whether "ABCD" is already in the pool; if not, "ABCD" is stored in the pool. The JVM assigns a reference to the interned String to s.

Now, maybe in another class Bar, in method barbar:

String t = "AB"+"CD";

Because "AB" + "CD" is a compile-time constant expression, the compiler folds it into the single literal "ABCD". The JVM then looks whether "ABCD" is interned already (hey, yes it is) and assigns the reference to the interned String "ABCD" to t.
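
To check this yourself, a small sketch along the following lines may help. The class and method names are hypothetical. Note that the identity only holds when both operands are compile-time constants; concatenating variables at run time produces a new String unless intern() is called on the result.

```java
class ConcatDemo {
    // Concatenates at run time: the operands are parameters, not constants.
    static String join(String left, String right) {
        return left + right;
    }

    static boolean constantConcatIsPooled() {
        String s = "ABCD";
        String t = "AB" + "CD"; // constant expression, resolved to the pooled "ABCD"
        return s == t;
    }

    static boolean runtimeConcatIsPooled() {
        String s = "ABCD";
        String u = join("AB", "CD"); // new String instance built at run time
        return s == u;
    }

    static boolean internedRuntimeConcatIsPooled() {
        String s = "ABCD";
        String v = join("AB", "CD").intern(); // looked up in the pool explicitly
        return s == v;
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsPooled());        // true
        System.out.println(runtimeConcatIsPooled());         // false
        System.out.println(internedRuntimeConcatIsPooled()); // true
    }
}
```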



Calling "PROD".intern() may work or fail. Yes, it will intern the String "PROD". But there's a chance that jibx really creates new Strings for attribute values with

String value = new String(getAttributeValue(attribute));

In that case, value will not hold a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
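
If the library does hand over fresh String instances, one possible workaround is to intern the value at the point where it enters your own object. The sketch below assumes the binding populates the object through its setters; whether JibX does that depends on the binding configuration, which the question does not show. The class name InterningProduct is made up so that it does not clash with the original Product.

```java
class InterningProduct {
    private String id;
    private String name;

    // Assumption: the unmarshaller calls this setter instead of writing the field directly.
    public void setId(String id) {
        this.id = (id == null) ? null : id.intern();
    }

    public void setName(String name) {
        this.name = (name == null) ? null : name.intern();
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}
```

With this in place, two objects that receive equal but distinct id Strings end up sharing one instance, so getId() returns the same reference for both.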

And, to the other question in your comment: this happens at runtime only. Compiling simply creates class files; the String pool is a data structure on the object heap that is used by the JVM that executes the application.

Answered by Joachim Sauer

While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not part of a limited set, but can be any value), then this can have massive negative effects in the long run.

Edit: I used to claim that intern()-ed Strings can't ever be GCed, but @nanda proved me wrong with this JavaWorld article. While this somewhat reduces the problem introduced by intern(), it's still not entirely removed: the pool provided by intern() can't be controlled and can have unexpected results with regard to garbage collection.

Luckily Guava provides a solution in the form of the Interner interface and its helper class Interners: using Interners.newStrongInterner() you can create an object that acts as a "pool" of unique String objects much in the same way as String.intern() does, except that the pool is bound to that instance, and if you discard the pool, its content can become eligible for garbage collection as well.
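
The idea of a pool that is bound to an instance can be illustrated without any dependency. The following sketch is not Guava's implementation; it is a plain ConcurrentHashMap stand-in with a hypothetical class name (LocalStringPool), shown only to make the behaviour concrete.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LocalStringPool {
    // Maps each distinct value to its canonical instance.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for the given value, registering it on first use.
    public String intern(String sample) {
        String existing = pool.putIfAbsent(sample, sample);
        return (existing != null) ? existing : sample;
    }

    public int size() {
        return pool.size();
    }
}
```

Once the LocalStringPool object itself is no longer referenced, the map and the Strings it holds can be collected like any other objects, which is the property that distinguishes this approach from String.intern().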

Answered by nanda

Yes, interning is the correct solution, and you've done your homework (that is, checking with a profiler that this is the problem).

Interning can cause problems if you store too much; in that case the permgen memory needs to be increased. Despite what some people say, interned Strings are also garbage collected, so Strings that are no longer used become eligible for garbage collection.

Some supporting articles:

  1. My blog: http://blog.firdau.si/2009/01/06/java-tips-memory-optimization-for-string/
  2. Does intern garbage collected?: http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
  3. Busting the 'Busting String.intern() Myths': http://kohlerm.blogspot.com/2009/01/is-javalangstringintern-really-evil.html

Answered by Ani

As everyone knows, String objects can be created in two ways: by using literals and through the new operator.

If you use a literal like String test = "Sample"; then it will be cached in the String object pool. So interning is not required here, as the string object is cached by default.

But if you create a string object like String test = new String("Sample"); then this string object will not be added to the string pool. So here we need to use String test = new String("Sample").intern(); to forcefully push the string object into the string cache.

So it is always advisable to use string literals rather than the new operator.

So in your case private static final String id = "PROD"; is the right solution.

Answered by Lukas Eder

An alternative solution:

You could try to define an <xs:enumeration/> restriction on your @id attribute (if your domain model allows such a thing). If JibX is as intelligent as JAXB or other XML-Java mapping standards, then this could be mapped to a Java enum with constant literals, which can be reused heavily.

I would try that for the ID value, since it kinda looks like an enumeration to me...
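
On the Java side, the result of such a mapping could look roughly like the sketch below. It is hand-written, not the output of JibX or JAXB, and the second constant (OTHER) as well as the helper fromXml are invented for illustration; only PROD comes from the question.

```java
enum ProductId {
    PROD,
    OTHER; // hypothetical second value, not taken from the question

    // Maps the raw attribute text to the shared enum constant.
    // Enum.valueOf throws IllegalArgumentException for unknown values.
    static ProductId fromXml(String value) {
        return ProductId.valueOf(value);
    }
}
```

Every Product that carries the id PROD would then reference the single ProductId.PROD constant, so no per-object String is kept for the id at all.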