Tuesday, March 31, 2009

Oh Null, Null

Introduction
Some say it’s one of the worst inventions in the history of programming languages, some say it is the number one cause of errors in Java programs. Any way you look at it, the null pointer (in all its incarnations) is a sensitive subject. I wonder why there is no song about it. There are several funny re-works of Beatles songs with geeky lyrics, like Unix Man (Nowhere Man, “he’s a real Unix man sitting in his Unix LAN, making all his UNIX .plans for nobody…”), Write In C (Let It Be), Yesterday (“yesterday, all those backups seemed a waste of pay…”), Something ("Something in the way it fails, defies the algorithm logic...") and a not so funny Eleanor Rigby. I’d like to write a new song, “Null”, using the “Girl” tune. Only my inspiration ended after a single verse:
If there’s anybody going to listen to my story, 
All about the Null who brought the fall.
It’s a value that you want so much, it makes you sorry,
Still you don’t regret a single call. 
Oh Null, Null…

I also have a middle verse inspired by Smalltalk, where the equivalent of null is nil, a keyword that returns an object of class UndefinedObject. It has no methods, except isNil and notNil that return true and false respectively, so nil responds to any other method invocation with “DoesNotUnderstand: <method name>”, which, in my experience at least, is pretty much equivalent to an NPE in Java. Interestingly enough, though, Objective-C turned nil into a “black hole”: a message sent to nil does nothing and simply returns nil (zero).

Nil’s a kind of null, she is not answering your calls,
you feel a fool (fool fool).
When you think the code looks good, she answers “it’s not understood”, 
she’s cruel (cruel cruel).
Ah null, null…
From Beatles to Elvis
One situation where nulls are very annoying is when we have a “chain” of method invocations, such as account.getOwner().getAddress().getStreet(). If any of the methods returns null, we get a NullPointerException from the next method invocation. I previously suggested a solution to the problem, but now (due to popular demand) there is a proposal for a Java 7 language enhancement, called The Elvis Operators. One of the stronger arguments for this solution is that it is available in other languages like C# and Groovy, even if not everybody is entirely happy with it. Perhaps this is subjective, but after all, programming language design is about human perception and feelings, as much as about computer capabilities. So, I admit: I feel uneasy with this new way of method invocation.
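
To make the annoyance concrete, here is a minimal sketch of the chain, once naked and once wrapped in the usual defensive checks. The Account, Person and Address classes are hypothetical stand-ins that I made up for the example; only the getter names come from the chain above.

```java
// Hypothetical domain classes, invented only to illustrate the chain.
class Address {
    private final String street;
    Address(String street) { this.street = street; }
    String getStreet() { return street; }
}

class Person {
    private final Address address;
    Person(Address address) { this.address = address; }
    Address getAddress() { return address; }
}

class Account {
    private final Person owner;
    Account(Person owner) { this.owner = owner; }
    Person getOwner() { return owner; }
}

class StreetLookup {
    // Throws NullPointerException as soon as any link in the chain is null.
    static String unsafeStreet(Account account) {
        return account.getOwner().getAddress().getStreet();
    }

    // The defensive version: one explicit check per link.
    static String safeStreet(Account account) {
        if (account == null) return null;
        Person owner = account.getOwner();
        if (owner == null) return null;
        Address address = owner.getAddress();
        if (address == null) return null;
        return address.getStreet();
    }
}
```

Three checks for three dots: this is the boilerplate that the null-safe operators are meant to remove.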

The Demeter Law Suggestion
One of the more controversial guidelines of object-oriented programming, the Demeter Law is related to “Tell Don’t Ask” and usually summarized as “only talk to your close friends”. It prohibits series of method invocations like the one described above; in fact, more than one dot in a series of method invocations is disallowed (unless the chain originates in a member of the this object (?)). There’s a good description of “The Law” in the Pragmatic Programmers article; it was originally discovered and named by Karl Lieberherr, who created the Demeter Project and wrote the book named “Adaptive Object-Oriented Software - The Demeter Method”. Martin Fowler demoted the Law to a Suggestion, and in general, Demeter managed to generate a lot of controversy and debate inside both the Java and Ruby communities, the debates ranging from intellectual (almost academic) polemics to flame wars between consulting firms over preferences in programming styles, music and hairdo. Yes, really. Anyway, if you believe in Demeter, our use-case for “null-safe” method invocations is flawed to begin with. But do we believe in Demeter? It took me a while to form an opinion.
1) I do believe in “Tell, Don’t Ask” and the internal iterator (as presented in Pragmatic Programmers article) to be preferable over external iterator (when applicable, of course). 
“Tell Don’t Ask” begs for closures in the language. For those who say “closures are not object-oriented”, Smalltalk not only supports closures, and with minimal syntactic overhead, but basic things like if and while are designed around closures and would not be possible otherwise. 
But "Tell Don't Ask" goes much deeper and wider. For example, let's look through Demeter's eyes at service locator vs. injection: the locator is wrong, because you go to a factory to get a factory to get a service. Injection is right, because you pass yourself to the injector, which sets your dependencies directly. How about ORM? How much grief is in loading strategies, because we do getContainer().getItem().getProperty(), vs. calling database.execute(query)? Food for thought.
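
A minimal sketch of the internal iterator flavor of “Tell, Don’t Ask”, assuming a Java version with lambda expressions and java.util.function.Consumer standing in for closures. Playlist and forEachSong are hypothetical names, made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical class: it keeps its list private and is told what to do with each item.
class Playlist {
    private final List<String> songs = new ArrayList<>();

    void add(String song) { songs.add(song); }

    // Internal iterator: the caller passes behaviour in, the list never leaks out.
    void forEachSong(Consumer<String> action) {
        for (String song : songs) {
            action.accept(song);
        }
    }
}
```

A caller would write something like playlist.forEachSong(song -> System.out.println(song)); and never gets a chance to call a getter on a getter.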

2) Now the paper boy example – this one I do not buy. First of all, we don’t pay for things in cash these days – we give the seller our credit card. Not very secure? Maybe. But much more handy. 
People should generally stop obsessing around security in software, I think. At one point a highly ranked architect in the firm I worked for said “we can’t deploy any software in the browser because it is not secure”. Yes, it is not secure, it’s a problem and people are working on it. And yet, even as it is, is it secure for the jeweler store to put some jewels in the front window? Why not keep them all in the backroom safe?  Because they won’t have buyers, that’s why. What good is all the security, if you go out of business? 

Back to paper boy, the example also does not hold if you are a large organization – the receiver of the service is not the one who authorizes the payment, and not the one who handles the payment. The supplier may wish for “direct communication” as much as he wants, but the rules of corporate procurement are quite different. Then again, it may be silly to attack the example…? Well, the thing is that I hope to show later that you can’t follow Demeter and scale, just like the example doesn’t.

3) Now there’s “hard to mock” argument. Ok, at risk of starting a flame war here, I mean: I love unit-testing just like the next guy, and I know how useful it is, but let’s not get carried away here. 

Unit-testing, just like static types/compilation, or static analysis, is a means to achieve code quality, not the goal in itself. If it helps you – great, if it gets in your way – ditch it. I think the “100% test coverage” is a fallacy. Dijkstra said “tests only prove presence of bugs, never their absence”, which is even truer for unit-tests. (I heard “we have unit-tests, we don’t need QA” argument once or twice. Yeah, right.) 

Anyway, “hard to test” may be a smell, but unless the suspicion can be substantiated – sorry, circumstantial evidence not accepted.

4) So even though I partially agree that “Demeter violation” is a symptom of a problem, the solution proposed by Demeter is, in my opinion, simply absurd. The grand huge fat mega-façade? Dots replaced with underscores? It cannot scale and it is sweeping design problems under the carpet. I mean, give me hierarchies of nested objects that support subsumption, like Newspeak modules – now we are talking. But mega-façades? No thanks.
So there are good things that result from minding Demeter, but not following the law per se. 
a) For example, “don’t return a java.util.List from your API when you don’t want clients adding elements to the list” – that’s the basics of any sound API design: return the minimum interface that the client needs. Josh Bloch has been saying this repeatedly. 
b) We should consider proxying the returned object (perhaps with our own (inner) class), and then we’ll have control over what the returned object’s methods do.
c) Applied to DI – “do not depend on module A just to give you access to another module B, instead depend on B directly”, is also a reasonable rule of thumb. 
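
Points (a) and (b) can be sketched together with what the JDK already offers. This is only an illustration under my own assumptions: Team, join and members are hypothetical names, and Collections.unmodifiableList plays the role of the proxy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class used only for illustration.
class Team {
    private final List<String> members = new ArrayList<>();

    void join(String name) { members.add(name); }

    // (a) Minimum interface: clients that only need to loop get an Iterable.
    // (b) Proxy: the wrapper rejects mutation even if somebody casts it back to List.
    Iterable<String> members() {
        return Collections.unmodifiableList(members);
    }
}
```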

But let’s now go back to our beloved nulls.

Power to the method - let the declaration site decide
Demeter or not, methods are meant to do things. I would argue that typical methods should not be assumed to be wrapping field access, or performing trivial computations. Yeah, I know we all do POJOs, which are really POCS, Plain Old C Structs in disguise. (And what makes them structs are frameworks that assume dumb accessors and mutators, and freak out if they are not.) I believe that most methods should do something, so invoking them on a null target should yield an error, not some ambivalent null… So I am not too happy with changing the programming language to accommodate behavior that should have been atypical.

Now if some methods are “dumb” and have reasonable defaults (like null for property accessor in null object, or 0 for size/length of a null collection/array, or null for a null string upper case transformation), then the method implementer should decide – not the caller. And again, it’s not problem-free, but worth exploring. Somewhat similar behavior has been proposed by Jacek here.

Suppose at the method declaration we add a default clause: either default <value>, or default <expression>, or default <code-block>, with the return type of value/expression/code-block similar to method return type, and parameters and exceptions (in 2 latter cases) similar to the method’s ones. BTW default is an existing Java keyword, used in switch statements. 

Now examples:

public int size() { return this.size; } default 0;
public boolean isEmpty() { return this.size == 0; } default true;
public Person getOwner() { return this.owner; } default null;
public Iterator iterator() { return … } default Collections.emptyList().iterator();
public Something produce(Param1 p1, Param2 p2) throws E1 {
   //normal method code here
} default {
  //default calculation, similar to static method, can use p1 and p2, throw sub-type of E1
  //needs to return Something
}

The location of the “default clause” can be after the method body (and if we stick to the coding convention of keeping default on the same line as the closing curly brace, I think it is preferable), but it can also sit between the end of the method parameter list and the throws clause, or between the throws clause and the start of the method body. I think the tail location is preferable, because it is reminiscent of the switch statement. 
Another variation that increases similarity to switch may be adding a colon (default : ) and requiring return keyword, like default : return null; instead of default null. Also, the support for “defaults” can be added gradually, with only values in the first stage and more later if it proves to be successful. Also, alternative to default may be case null : … .
The method would now be compiled together with an additional synthetic static method, e.g. ___size(), with the same signature and throws clause, whose body is the “default” provided by the user. 

Unfortunately, though, call sites are affected by such method definition and compiled to something like:

SomeType object;
if (object != null) {
  object.m();
} else {
  try {
     SomeType.___m();
  } catch (NoSuchMethodError e) {
    throw new NullPointerException();
  }
}

So the “default” would be calculated based on declaration type of the object, just like any static method invocation. Why catching NoSuchMethod? Well, what if SomeType class is recompiled alone, and the default clause, and with it the ___m() method, are gone? We want NPE to be thrown, not NSME. 
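
Since the default clause does not exist, the closest thing we can run today is a hand-written emulation. A minimal sketch, with all names hypothetical: Bag stands for SomeType, the static defaultSize() stands for the synthetic ___size(), and CallSite.sizeOf() spells out what the compiler would emit for bag.size().

```java
// Hypothetical class; the static method plays the role of the synthetic ___size().
class Bag {
    private final int size;
    Bag(int size) { this.size = size; }

    int size() { return this.size; }

    // Hand-written stand-in for what "default 0" would generate.
    static int defaultSize() { return 0; }
}

class CallSite {
    // What a call "bag.size()" would expand to under the proposal.
    static int sizeOf(Bag bag) {
        if (bag != null) {
            return bag.size();
        }
        try {
            // Chosen by the declared type of the variable, like any static call.
            return Bag.defaultSize();
        } catch (NoSuchMethodError e) {
            // Bag was recompiled without its default: fall back to the usual NPE.
            throw new NullPointerException();
        }
    }
}
```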

If, on the other hand, we add the default behavior without recompiling clients, they’ll still be throwing NPEs. I think we would not want all method invocations to become complex byte-code like the one described above… If I understand Jacek’s proposal correctly, unlike me, he suggests that every method invocation go through a static method. 
It would be interesting to see to what extent JIT could optimize code like that when it learns that object is not null… could it throw away everything but object.m()? This certainly requires more investigation.
Maybe, babe, boom
Another option would be to turn to Functional Programming for an answer. This is one of the things known in the FP world for ages: Maybe in Haskell, or Option in Scala and F#. Here’s a nice description by Debasish Ghosh, and if you want to learn about Monads, James Iry’s blog or a Haskell book would be a good place to start. But we don’t have Monads in Java. Stephan Schmidt tried to simulate Maybe with Iterable, and the result is cute, very cute even, but lacks the monadic awesomeness of flatMap, as others have pointed out. Hm, but wait, we do have a monad in Java. That’s exceptions. The naïve translation of Maybe in Java would be:

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException e) {
  street = null;
}

Ok, it’s not as clean and pretty as a pattern match, but… well, that’s Java. No, wait, smart and experienced programmers like us – we know that we shouldn’t use exceptions that way, because it is expensive! Hm, and why is it? Because of filling in the stack trace, actually; otherwise it’s rather cheap. But we don’t need the stack trace here, only… the exception is the standard NPE, we cannot subclass it and override fillInStackTrace(). Bummer.
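
For an exception type of our own, the trick is easy to sketch. CheapException is a hypothetical name; the point is only that overriding fillInStackTrace() skips the expensive part. It does not help with the NPE that the JVM itself throws, which is exactly the limitation just described.

```java
// Hypothetical exception type, for illustration only.
class CheapException extends RuntimeException {
    @Override
    public synchronized Throwable fillInStackTrace() {
        // Skip the stack walk; this exception carries no trace.
        return this;
    }
}
```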

Using exceptions for control flow is debatable. I am not a proponent of this style, just for the record. I am simply exploring options… what if the JVM “magically” knew not to fill in the stack trace when it’s not used? Exceptions would become cheaper and a whole new programming style in Java would emerge...? 

Some of the Java 7 language enhancement proposals are to do with exceptions, multi-catch for example, I am all for it BTW. Now what if we add another flavor to catch and introduce “no-instance” catch, such as:

try {
  //…
} catch (NullPointerException) {
  //…
}

Note that there is no e, no exception instance, no way to re-throw or print – clearly no need for a stack trace. What if JVM kept a bit-map of exception types in “no-instance” try/catch blocks on the stack, and threw these exceptions without filling in the stack trace? After all, we are guaranteed that they are caught and no “phantom” exception like this escapes to the user… these exceptions could be pooled too. Then we could say

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException) {
  street = null;
}

And if we had closures, we could even make it prettier. We could create a utility method like

public static <T> T nullsafe({ => T} expression) {
  return nullsafe(expression, null);
}
public static <T> T nullsafe({ => T} expression, T default_value) {
  try {
    return expression.invoke();
  } catch (NullPointerException) {
    return default_value;
  }
}

It would be invoked like this:

String street = nullsafe({ => account.getOwner().getAddress().getStreet() });

In all its glorious maybiness and with no, or little, performance overhead?! :-) But hey, I said I don’t like clients deciding on the value that the method returns… true, which is why I am not entirely happy with any of the proposals. But at least here we don’t change the language for the sake of null alone – the “exceptional” invocation is explicitly made with a catch clause or via a special library call. 
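
The same utility can be sketched in runnable form, assuming a Java version with lambda expressions, where java.util.function.Supplier stands in for the closure type { => T }. There is no “no-instance” catch in that version, so the NPE instance and its stack trace are still created; the class name NullSafe is my own.

```java
import java.util.function.Supplier;

class NullSafe {
    static <T> T nullsafe(Supplier<T> expression) {
        return nullsafe(expression, null);
    }

    static <T> T nullsafe(Supplier<T> expression, T defaultValue) {
        try {
            return expression.get();
        } catch (NullPointerException e) {
            // Beware: this also swallows NPEs thrown deep inside the called methods.
            return defaultValue;
        }
    }
}
```

The invocation would then read String street = NullSafe.nullsafe(() -> account.getOwner().getAddress().getStreet());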

Epilog

All this is just for the sake of intellectual exercise, so don't take it too seriously. :-)

Update

So why don't I like any of the solutions, not even the ones I proposed? The "default clause" is essentially another static method. It may even work for interfaces, with a slight twist. The real problem is that a static method cannot be overridden, unlike the method it is "attached to", so the default clause and the actual method body will not correspond, and we basically gain nothing. 
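
A minimal sketch of the mismatch, again emulating the default clause by hand with made-up names (Shape, Circle, defaultName). The instance method is dispatched on the run-time class, while the static "default" is picked from the declared type only.

```java
class Shape {
    String name() { return "shape"; }
    static String defaultName() { return "no shape"; }
}

class Circle extends Shape {
    @Override
    String name() { return "circle"; }

    // Hides, rather than overrides, Shape.defaultName().
    static String defaultName() { return "no circle"; }
}

class Dispatch {
    // Emulates the proposed call site for a variable declared as Shape.
    static String nameOf(Shape shape) {
        return shape != null ? shape.name() : Shape.defaultName();
    }
}
```

Pass a Circle and you get Circle's name(); pass null where a Circle was expected and Circle's default is never consulted.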

As for the instance-less exception, call me old-fashioned, but I think that exceptions are for exceptional things, and this is stretching the hacks around them just one bit too far. 

So I'll stay with plain old null for now, thank you.