Question
I want to ask a question about avoiding duplicate Strings in Java.
The context is an XML document with tags and attributes like this one:
<product id="PROD" name="My Product"...></product>
With JiBX, this XML is marshalled/unmarshalled into a class like this:
public class Product {
    private String id;
    private String name;
    // constructor, getters, setters, methods and so on
}
The program is a long-running batch process, so Product objects are created, used, copied, and so on.
The problem is this: when I analysed a run with a tool like Eclipse Memory Analyzer (MAT), I found many duplicated Strings. For example, in the id attribute, the value PROD is duplicated across roughly 2000 instances.
How can I avoid this situation? Other attributes in the Product class may change their values during execution, but attributes like id and name do not change very often.
I have read a little about the String.intern() method, but I have not used it yet and I am not sure it is a solution here. Could I instead define the most frequent values of those attributes as static final constants in the class?
I hope I have expressed my question clearly. Any help or advice is much appreciated. Thanks in advance.
Recommended answer
Interning would be the right solution, if you really have a problem. Java stores String literals (and other compile-time constant Strings) in an internal pool. Whenever such a String is loaded, or intern() is called on any String, the JVM first checks whether an equal String is already in the pool. If so, it does not add a new instance but returns the reference to the interned String object.
There are two ways to control this behaviour:
String interned = aString.intern(); // returns a reference to the interned (pooled) String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
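As a rough illustration of the difference between the two calls, here is a minimal sketch. The class name InternDemo and the helper runtimeCopy() are made up for this example; only String.intern() and the String(char[]) constructor come from the JDK.

```java
// Hypothetical demo class: shows that equal Strings built at run time are
// distinct objects until intern() maps them to one pooled instance.
class InternDemo {
    // Builds "PROD" from a char array so the result is a fresh heap object,
    // not the pooled literal.
    static String runtimeCopy() {
        return new String(new char[] {'P', 'R', 'O', 'D'});
    }

    public static void main(String[] args) {
        String a = runtimeCopy();
        String b = runtimeCopy();
        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.intern() == b.intern()); // true: one pooled instance
    }
}
```

The two copies stand in for the roughly 2000 "PROD" instances seen in MAT: equal content, separate objects, until something interns them.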
So maybe the libraries really do create new instances for all XML attribute values. This is possible, and you will not be able to change it.
intern has a global effect. An interned String is immediately available "for any object" (this view doesn't really make sense, but it may help to understand it).
So, let's say we have this line in class Foo, method foolish:
String s = "ABCD";
String literals are interned immediately. The JVM checks whether "ABCD" is already in the pool; if not, "ABCD" is stored in the pool. The JVM then assigns a reference to the interned String to s.
Now, maybe in another class Bar, in method barbar:
String t = "AB"+"CD";
Because "AB" + "CD" is a compile-time constant expression, the compiler folds it into the single constant "ABCD". When the JVM loads it, it checks whether that String is already interned, finds that it is, and assigns the reference to the interned String "ABCD" to t.
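The literal and concatenation cases above can be checked with a small sketch. The class ConstantFoldingDemo and its methods are hypothetical names for this example. It relies on the Java language rule that a concatenation of literals is a constant expression (and therefore pooled), while a concatenation involving a non-constant operand creates a new String at run time.

```java
// Hypothetical demo class: compile-time constants share the pooled instance,
// run-time concatenation does not unless intern() is called.
class ConstantFoldingDemo {
    static String literal() { return "ABCD"; }

    // Constant expression: javac folds this into the constant "ABCD".
    static String folded() { return "AB" + "CD"; }

    // 'left' is not a constant, so the concatenation happens at run time
    // and yields a newly created String.
    static String runtime(String left) { return left + "CD"; }

    public static void main(String[] args) {
        String r = runtime("AB");
        System.out.println(literal() == folded());   // true: same pooled instance
        System.out.println(literal() == r);          // false: r is a new object
        System.out.println(literal() == r.intern()); // true: intern() returns the pooled one
    }
}
```

The third case is the relevant one for parsed XML: values read at run time are not pooled automatically.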
Calling "PROD".intern() may or may not help. Yes, it will intern the String "PROD". But there is a chance that JiBX really creates new Strings for attribute values, with something like:
String value = new String(getAttributeValue(attribute));
In that case, value will not hold a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
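One possible way to work around that, sketched under assumptions: intern the values inside the bean's setters, so that whatever instance the library hands over is replaced by the pooled one. InternedProduct is a hypothetical variant of the Product class from the question, and the sketch only helps if the binding is configured to go through the setters rather than writing the fields directly, which is not confirmed here.

```java
// Hypothetical variant of Product: setters canonicalize incoming values so
// that equal ids and names share a single String instance.
class InternedProduct {
    private String id;
    private String name;

    public String getId() { return id; }
    public String getName() { return name; }

    public void setId(String id) { this.id = canonical(id); }
    public void setName(String name) { this.name = canonical(name); }

    // Null-safe wrapper around String.intern().
    private static String canonical(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        InternedProduct p1 = new InternedProduct();
        InternedProduct p2 = new InternedProduct();
        // Simulate a parser that hands over a fresh String each time.
        p1.setId(new String("PROD".toCharArray()));
        p2.setId(new String("PROD".toCharArray()));
        System.out.println(p1.getId() == p2.getId()); // true: both hold the pooled "PROD"
    }
}
```

This suits attributes with few distinct values, such as id. Interning values that are mostly unique would grow the pool without saving memory.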
And, to the other question in your comment: this happens at runtime only. Compiling simply creates class files; the String pool is a data structure on the heap that is used by the JVM executing the application.