Java memory leaks
Don't we all remember the days when we programmed in C or C++? You had to use new and delete to explicitly create and remove objects. Sometimes you even had to malloc() a block of memory. With all these constructs you had to take special care to clean up afterwards, otherwise you leaked memory.
Now however, in the days of Java, most people aren't that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will take care of cleaning up behind you. This is of course totally true in all normal cases. But sometimes, the Garbage Collector can't clean up, because you still have a reference, even though you didn't know that.
I stumbled across this small program while reading JavaPedia, which clearly shows that Java is also capable of inadvertent memory leaks.
import java.util.ArrayList;

public class TestGC {
    private String large = new String(new char[100000]); // 100,000 chars per instance

    public String getSubString() {
        return this.large.substring(0, 2); // only the first two characters are wanted
    }

    public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            TestGC testGC = new TestGC();
            subStrings.add(testGC.getSubString());
        }
    }
}
Now, if you run this, you'll see that it crashes with something like the following stack trace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.String.
at TestGC.
at TestGC.main(TestGC.java:13)
Why does this happen? We should only be storing 1,000,000 Strings of length 2, right? That would amount to about 40 MB, which should fit in the heap easily. So what happened here? Let's have a look at the substring method in the String class.
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
We see that the substring call creates a new String using the package-private constructor shown above it. The one-line comment immediately shows what the problem is: the character array is shared with the large string. So instead of storing very small substrings, we were keeping the large string's entire 100,000-character array alive every time, just with a different offset and length.
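To make the sharing concrete, here is a minimal sketch that models that String layout with a hypothetical Slice class. The names Slice, sharedSubSlice, trimmedCopy and retainedChars are mine, not the JDK's, and the loop is scaled down to 100 iterations so it runs in a small heap. It only illustrates the idea; as far as I know, JDKs from roughly Java 7 update 6 onward copy the characters in substring, so the real String no longer behaves this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedSliceDemo {

    /** Hypothetical stand-in for the String layout above: a window onto a char array. */
    static final class Slice {
        final char[] value; // backing array, possibly much larger than the window
        final int offset;
        final int count;

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Shares the backing array, like the package-private String constructor. */
        Slice sharedSubSlice(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        /** Copies only the visible window, so the big array is no longer referenced. */
        Slice trimmedCopy() {
            return new Slice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
        }

        /** Number of chars this slice keeps reachable. */
        int retainedChars() {
            return value.length;
        }
    }

    /** Sums the backing-array sizes; every slice in this demo has its own array. */
    static long totalRetained(List<Slice> slices) {
        long total = 0;
        for (Slice s : slices) {
            total += s.retainedChars();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Slice> shared = new ArrayList<>();
        List<Slice> trimmed = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Slice large = new Slice(new char[100000], 0, 100000);
            Slice small = large.sharedSubSlice(0, 2); // two visible chars, big array behind them
            shared.add(small);
            trimmed.add(small.trimmedCopy());
        }
        System.out.println("shared retains  " + totalRetained(shared) + " chars");  // 10000000
        System.out.println("trimmed retains " + totalRetained(trimmed) + " chars"); // 200
    }
}
```

Scaled up to the original program, 1,000,000 shared slices of 100,000 two-byte chars would keep roughly 200 GB of arrays reachable, against about 4 MB of character data for the trimmed copies.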
This problem extends to other operations, such as String.split(). It can easily be avoided by adapting the program as follows:
import java.util.ArrayList;

public class TestGC {
    private String large = new String(new char[100000]); // 100,000 chars per instance

    public String getSubString() {
        return new String(this.large.substring(0, 2)); // <-- fixes leak!
    }

    public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            TestGC testGC = new TestGC();
            subStrings.add(testGC.getSubString());
        }
    }
}
I have often heard, and also shared, the opinion that the String copy constructor is useless and causes problems because the copies are not interned. But in this case it seems to have a reason to exist: it effectively trims the character array and keeps us from holding a reference to the very large String.
October 4th, 2007 at 2:04 pm
Hi There,
Thanks for the insightful article! I found this quite useful - especially in understanding why Java OutOfMemory’s work…
Sherif
October 5th, 2007 at 1:30 am
Well, that’s not a memory leak. See:
http://en.wikipedia.org/wiki/Memory_leak
The behavior is intentional - it trades memory for performance. Like most things in the standard library (e.g. collections), it’s optimized for general usage and, well, generally it’s alright. But you certainly shouldn’t tokenize a really big string this way.
The classic type of memory leak doesn’t exist in managed languages. The only thing we can produce are so-called reference leaks. That is… referencing stuff (and thus preventing it from being GCed) for longer than necessary (or for all eternity).
Fortunately it’s easy to avoid - for the most part.
The important things to know:
Locally defined objects can be GCed as soon as there are no more references to them. Typically that’s at the end of the block they are defined in (if you don’t store the reference anywhere). If you do store references, be sure to remove em if you don’t need em anymore.
If you overwrite a reference with a new object, the object is first created and /then/ the reference is overwritten, which means the object can be only GCed /after/ the new object has been created.
Usually this doesn’t matter. However, if you want to overwrite an object which is so big that it only fits once into the memory, you’ll need to null the reference before creating/assigning the new instance.
Eg:
//FatObject fits only once into memory
FatObject fatty;
fatty=new FatObject();
fatty=new FatObject();
Will bomb with OOME. Whereas…
FatObject fatty;
fatty=new FatObject();
fatty=null;
fatty=new FatObject();
Will be fine, because the second creation of the FatObject will trigger a full GC and the GC will be able to clear enough memory (since the old reference has been nulled).
Well, that rarely matters, but it’s good to know.
October 5th, 2007 at 10:33 pm
[…] Jos Hirth wrote this in response to this post by Jeroen van Erp. […]
October 6th, 2007 at 2:45 am
[…] Xebia Blog Leaking Memory in Java (tags: java memoryleak programming jvm) […]
October 9th, 2007 at 11:07 pm
I don’t know which version of the JVM you are running, but when it constructs a new string using this constructor:
String(char value[], int offset, int count)
It sets the value using this:
this.value = Arrays.copyOfRange(value, offset, offset+count);
October 10th, 2007 at 1:48 am
To put it more plainly: with the underlying big char array still being referenced, none of the large char arrays created by the TestGC objects in the big for-loop can be GCed. That’s the problem.
Thanks
October 10th, 2007 at 8:36 am
James,
True for String(char[] value, int offset, int count), but not for String(int offset, int count, char[] value). The constructor you mention is a public constructor. The constructor that is called from the substring method is a package private constructor.
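The copying behaviour of the public constructor can be checked directly. This is a small sketch (the class and method names are made up for the illustration): it builds a String from the first two chars of a large array and then overwrites the array. Because the public String(char[], int, int) constructor is documented to copy the subarray, the String should be unaffected.

```java
public class PublicConstructorCopies {

    /** Builds a String from part of an array, then overwrites the array. */
    static String buildThenMutate() {
        char[] big = new char[100000];
        big[0] = 'a';
        big[1] = 'b';
        String s = new String(big, 0, 2); // public constructor: copies the two chars
        big[0] = 'X';                     // would show through if the array were shared
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "ab"
    }
}
```

The package-private constructor cannot be called from user code, so its sharing can only be observed indirectly, through the memory it keeps reachable.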