Tuesday, September 30, 2008

Beware of the subs

No, I don't mean those cute yellow things beneath the waves. I am going to talk about the List#subList and String#substring methods in Java.

Apparently many people are unaware of what exactly these methods do, and that ignorance can have unpleasant consequences. So, getting straight to the point: neither method copies a portion of the original data. Instead, each creates a view, or, in other words, a proxy to it. The important thing is that the new wrapper object holds a strong reference to the original data (the backing list or character array).
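
To make the "view" idea concrete, here is a minimal sketch showing that writes through a sub-list land in the original list. The class and method names are made up for illustration, and it assumes Java 7 or later (for the diamond operator) and an ArrayList as the backing list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SubListViewDemo {
    // Modifies a subList view and returns the ORIGINAL list to show the write-through.
    static List<String> writeThroughView() {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> view = original.subList(1, 3); // covers "b" and "c"; nothing is copied
        view.set(0, "B"); // replaces the element at index 1 of the original
        view.remove(1);   // removes "c" from the original as well
        return original;
    }

    public static void main(String[] args) {
        System.out.println(writeThroughView()); // prints [a, B, d]
    }
}
```

The view never owned any elements of its own; it only forwards to the list it was created from, which is exactly why it has to keep that list reachable.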

The Javadoc of subList at least admits that it returns a view. As for substring, the only way to find out is to look at the JDK source, where (at the time of writing) the method delegates to this constructor:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

So what's the problem? Let's look at the following snippet from a real code-base:

    List leaky = ...; // long list of big & hairy objects
    leaky = leaky.subList(from, to);

Assuming that leaky wasn't referenced anywhere else in the code, the programmer has no way to access the elements that lie outside the [from, to) range. But these bytes ain't going to rehab, no, no, no: as far as the JVM is concerned, they are still strongly referenced. So if you really mean to extract a portion of a list (or string) and throw the rest away, copy it explicitly into a new list (or string). For example:

    List sneaky = ...;
    sneaky = new ArrayList(sneaky.subList(from, to));

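
Here is a self-contained sketch of the same copying idiom, wrapped in a small helper. The helper name copyRange and the sample data are assumptions made for illustration; the only library behavior relied on is that the ArrayList copy constructor copies the elements of the collection it is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SubListCopyDemo {
    // Copies the [from, to) range into a fresh ArrayList that holds no reference to the source list.
    static <T> List<T> copyRange(List<T> source, int from, int to) {
        return new ArrayList<>(source.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> big = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));
        List<String> slice = copyRange(big, 1, 3);
        big.clear(); // the copy is unaffected by changes to the source
        System.out.println(slice); // prints [b, c]
    }
}
```

The temporary view created inside copyRange becomes unreachable as soon as the constructor returns, so only the copied elements stay alive through the result.
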
Is there any better way a "sub" could be implemented? Well, maybe the reference to the original data could be kept weak, and the data could be copied into the "view" only when (if) the original object is enqueued for garbage collection. This would require back-references from the "original" object to its "views", which would also need to be weak, so... overall this doesn't seem worth the effort, and hence avoiding leaky lists and strings shall remain the responsibility of the programmer.

Speaking of subList, another nasty thing about it is that the sub-list is neither Serializable nor Cloneable, even if the original list is. And speaking of leaks caused by undercover strong references: never forget non-static inner classes, which refer to their enclosing instance.
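
The Serializable point is easy to check for yourself. The sketch below uses a made-up helper, isSerializable, and assumes an ArrayList on an OpenJDK-style runtime; the List interface itself promises nothing either way about the sub-list's class, so other implementations may differ.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SubListSerializableDemo {
    // True if the object's class implements java.io.Serializable.
    static boolean isSerializable(Object o) {
        return o instanceof Serializable;
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> view = original.subList(1, 3);
        List<Integer> copy = new ArrayList<>(view);

        System.out.println(isSerializable(original)); // ArrayList is Serializable
        System.out.println(isSerializable(view));     // the view is not, on OpenJDK's ArrayList
        System.out.println(isSerializable(copy));     // the copy is an ArrayList again
    }
}
```

So the same copy that fixes the leak also gives you back a list you can serialize.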

Take care, and keep your head above the water :-)