Tuesday, March 31, 2009

Oh Null, Null

Introduction
Some say it’s one of the worst inventions in the history of programming languages, some say it is the number one cause of errors in Java programs. Any way you look at it, the null pointer (in all its incarnations) is a sensitive subject. I wonder why there is no song about it. There are several funny re-works of Beatles songs with geeky lyrics, like Unix Man (Nowhere Man, “he’s a real Unix man sitting in his Unix LAN, making all his UNIX .plans for nobody…”), Write In C (Let It Be), Yesterday (“yesterday, all those backups seemed a waste of pay…”), Something ("Something in the way it fails, defies the algorithm logic...") and a not so funny Eleanor Rigby. I’d like to write a new song, “Null”, using the “Girl” tune. Only my inspiration ended after a single verse:
If there’s anybody going to listen to my story, 
All about the Null who brought the fall.
It’s a value that you want so much, it makes you sorry,
Still you don’t regret a single call. 
Oh Null, Null…

I also have a middle verse inspired by Smalltalk, where the equivalent of null is nil, a keyword that returns an object of class UndefinedObject. It answers true to isNil and false to notNil, but understands little else, so nil responds to most other method invocations with “DoesNotUnderstand: <method name>”, which, in my experience at least, is pretty much equivalent to an NPE in Java. Interestingly enough, though, Objective-C turned nil into a “black hole” that quietly returns nil again upon any invocation.

Nil’s a kind of null, she is not answering your calls,
you feel a fool (fool fool).
When you think the code looks good, she answers “it’s not understood”, 
she’s cruel (cruel cruel).
Ah null, null…
From Beatles to Elvis
One situation where nulls are very annoying is when we have a “chain” of method invocations, such as account.getOwner().getAddress().getStreet(). If any of the methods returns null, we get a NullPointerException from the next method invocation. I previously suggested a solution to the problem, but now (due to popular demand) there is a proposal for a Java 7 language enhancement, called The Elvis Operators. One of the stronger arguments for this solution is that it is available in other languages like C# and Groovy, even if not everybody is entirely happy with it. Perhaps this is subjective, but after all, programming language design is about human perception and feelings as much as about computer capabilities. So, I admit: I feel uneasy with this new way of method invocation.
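
To make the problem concrete, here is a small sketch of the chain and of the manual checks it forces on a careful caller. The Account, Person and Address classes are hypothetical stand-ins invented for this illustration, and the comment only hints at what a null-safe operator (Groovy spells it ?.) would save us.

```java
final class ChainSketch {
    // Hypothetical domain classes, made up for this sketch.
    static final class Address {
        private final String street;
        Address(String street) { this.street = street; }
        String getStreet() { return street; }
    }

    static final class Person {
        private final Address address;
        Person(Address address) { this.address = address; }
        Address getAddress() { return address; }
    }

    static final class Account {
        private final Person owner;
        Account(Person owner) { this.owner = owner; }
        Person getOwner() { return owner; }
    }

    // Unsafe: throws NullPointerException as soon as one link returns null.
    static String streetUnsafe(Account account) {
        return account.getOwner().getAddress().getStreet();
    }

    // Safe, but every link in the chain needs its own check.
    // In Groovy this would read: account?.getOwner()?.getAddress()?.getStreet()
    static String streetChecked(Account account) {
        if (account == null) return null;
        Person owner = account.getOwner();
        if (owner == null) return null;
        Address address = owner.getAddress();
        if (address == null) return null;
        return address.getStreet();
    }
}
```

Calling streetChecked with an account that has no owner simply yields null, while streetUnsafe blows up on the second call in the chain.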

The Demeter Law Suggestion
One of the more controversial guidelines of object-oriented programming, the Law of Demeter is related to “Tell Don’t Ask” and usually summarized as “only talk to your close friends”. It prohibits series of method invocations like the one described above; in fact, more than one dot in a series of method invocations is disallowed (unless the chain originates in a this object’s member (?)). There’s a good description of “The Law” in the Pragmatic Programmers article; it was originally discovered and named by Karl Lieberherr, who created the Demeter Project and wrote the book named “Adaptive Object-Oriented Software - The Demeter Method”. Martin Fowler demoted the Law to Suggestion, and in general, Demeter managed to generate a lot of controversy and debate inside both the Java and Ruby communities, the debates ranging from intellectual (almost academic) polemics to flame wars between consulting firms over preferences in programming styles, music and hairdo. Yes, really. Anyway, if you believe in Demeter, our use-case for “null-safe” method invocations is flawed to begin with. But do we believe in Demeter? It took me a while to form an opinion.
1) I do believe in “Tell, Don’t Ask” and the internal iterator (as presented in Pragmatic Programmers article) to be preferable over external iterator (when applicable, of course). 
“Tell Don’t Ask” begs for closures in the language. For those who say “closures are not object-oriented”, Smalltalk not only supports closures, and with minimal syntactic overhead, but basic things like if and while are designed around closures and would not be possible otherwise. 
But "Tell Don't Ask" goes much deeper and wider. For example, let's look through Demeter's eyes at service locator vs. injection: the locator is wrong, because you go to a factory to get a factory to get a service. Injection is right, because you pass yourself to the injector, which sets your dependencies directly. How about ORM? How much grief is there in loading strategies, because we do getContainer().getItem().getProperty(), vs. calling database.execute(query)? Food for thought.

2) Now the paper boy example – this one I do not buy. First of all, we don’t pay for things in cash these days – we give the seller our credit card. Not very secure? Maybe. But much more handy. 
People should generally stop obsessing over security in software, I think. At one point a highly ranked architect in the firm I worked for said “we can’t deploy any software in the browser because it is not secure”. Yes, it is not secure, it’s a problem and people are working on it. And yet, even as it is, is it secure for the jewelry store to put some jewels in the front window? Why not keep them all in the backroom safe? Because they won’t have buyers, that’s why. What good is all the security, if you go out of business? 

Back to paper boy, the example also does not hold if you are a large organization – the receiver of the service is not the one who authorizes the payment, and not the one who handles the payment. The supplier may wish for “direct communication” as much as he wants, but the rules of corporate procurement are quite different. Then again, it may be silly to attack the example…? Well, the thing is that I hope to show later that you can’t follow Demeter and scale, just like the example doesn’t.

3) Now there’s the “hard to mock” argument. Ok, at the risk of starting a flame war here: I love unit-testing just like the next guy, and I know how useful it is, but let’s not get carried away. 

Unit-testing, just like static types/compilation, or static analysis, is a means to achieve code quality, not the goal in itself. If it helps you – great, if it gets in your way – ditch it. I think “100% test coverage” is a fallacy. Dijkstra said “tests only prove presence of bugs, never their absence”, which is even truer for unit-tests. (I heard the “we have unit-tests, we don’t need QA” argument once or twice. Yeah, right.) 

Anyway, “hard to test” may be a smell, but unless the suspicion can be substantiated – sorry, circumstantial evidence not accepted.

4) So even though I partially agree that “Demeter violation” is a symptom of a problem, the solution proposed by Demeter is, in my opinion, simply absurd. The grand huge fat mega-façade? Dots replaced with underscores? It cannot scale and it is sweeping design problems under the carpet. I mean, give me hierarchies of nested objects that support subsumption, like Newspeak modules – now we are talking. But mega-façades? No thanks.
So there are good things that result from minding Demeter, but not following the law per se. 
a) For example, “don’t return a java.util.List from your API when you don’t want clients adding elements to the list” – that’s basics of any sound API design – return the minimum interface that the client needs, Josh Bloch has been saying this repeatedly. 
b) We should consider proxying the returned object (perhaps with our own (inner) class), and then we’ll have control over what the returned object’s methods do.
c) Applied to DI – “do not depend on module A just to give you access to another module B, instead depend on B directly”, is also a reasonable rule of thumb. 
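
Points (a) and (b) can be sketched together. This is only an illustration under my own assumptions: the Playlist class is made up, and Collections.unmodifiableList plays the role of the proxy that decides what the returned object's methods do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, invented for this sketch.
final class Playlist {
    private final List<String> songs = new ArrayList<>();

    void add(String song) {
        songs.add(song);
    }

    // (a) The declared type is the minimum the client needs: iteration only.
    // (b) The returned object is a read-only wrapper around the internal list,
    //     so even a client that casts it back to List cannot add elements.
    Iterable<String> songs() {
        return Collections.unmodifiableList(songs);
    }
}
```

A client that casts the result back to List and calls add gets an UnsupportedOperationException, so the internal list stays under the control of Playlist.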

But let’s now go back to our beloved nulls.

Power to the method - let the declaration site decide
Demeter or not, methods are meant to do things. I would argue that typical methods should not be assumed to be wrapping field access, or performing trivial computations. Yeah, I know we all do POJOs, which are really POCS, Plain Old C Structs in disguise. (And what makes them structs are frameworks that assume a dumb accessor and mutator, and freak out if they are not.) I believe that most methods should do something, so invoking them on a null target should yield an error, not some ambivalent null… So I am not too happy with changing the programming language to accommodate behavior that should have been atypical.

Now if some methods are “dumb” and have reasonable defaults (like null for property accessor in null object, or 0 for size/length of a null collection/array, or null for a null string upper case transformation), then the method implementer should decide – not the caller. And again, it’s not problem-free, but worth exploring. Somewhat similar behavior has been proposed by Jacek here.

Suppose at the method declaration we add a default clause: either default <value>, or default <expression>, or default <code-block>, with the return type of value/expression/code-block similar to method return type, and parameters and exceptions (in 2 latter cases) similar to the method’s ones. BTW default is an existing Java keyword, used in switch statements. 

Now examples:

public int size() { return this.size; } default 0;
public boolean isEmpty() { return this.size == 0; } default true;
public Person getOwner() { return this.owner; } default null;
public Iterator iterator() { return … } default Collections.emptyList().iterator();
public Something produce(Param1 p1, Param2 p2) throws E1 {
   //normal method code here
} default {
  //default calculation, similar to static method, can use p1 and p2, throw sub-type of E1
  //needs to return Something
}

The “default clause” can be located after the method body, but it can also sit between the end of the method parameter list and the throws clause, or between the throws clause and the start of the method body. I think the tail location is preferable, because it is reminiscent of the switch statement, especially if we stick to a coding convention of keeping default on the same line as the closing curly brace. 
Another variation that increases the similarity to switch may be adding a colon (default : ) and requiring the return keyword, like default : return null; instead of default null. Also, the support for “defaults” can be added gradually, with only values in the first stage and more later if it proves to be successful. An alternative to default may be case null : … .
The method would now be compiled together with an additional synthetic static method, e.g. ___size(), with the same signature and throws clause, the synthetic method containing the “default” body provided by the user. 

Unfortunately, though, call sites are affected by such method definition and compiled to something like:

SomeType object;
if (object != null) {
  object.m();
} else {
  try {
     SomeType.___m();
  } catch (NoSuchMethodError e) {
    throw new NullPointerException();
  }
}

So the “default” would be calculated based on the declared type of the object, just like any static method invocation. Why catch NoSuchMethodError? Well, what if the SomeType class is recompiled alone, and the default clause, and with it the ___m() method, are gone? We want an NPE to be thrown, not an NSME. 

If, on the other hand, we add the default behavior without recompiling clients, they will still be throwing NPEs. I think we would not want all method invocations to become complex byte-code like the one described above… If I understand Jacek’s proposal correctly, unlike me, he suggests that every method invocation go through a static method. 
It would be interesting to see to what extent JIT could optimize code like that when it learns that object is not null… could it throw away everything but object.m()? This certainly requires more investigation.
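
Since the default clause is not real Java, the desugaring can only be imitated by hand. The sketch below is my own approximation: the Bag class, the hand-written ___size() method and the sizeOf helper are all hypothetical, standing in for what the compiler would generate at the declaration site and at the call site.

```java
import java.util.ArrayList;
import java.util.List;

final class DefaultClauseSketch {
    // Hypothetical class, invented for this sketch.
    static final class Bag {
        private final List<String> items = new ArrayList<>();

        void add(String item) { items.add(item); }

        // In the proposed syntax: public int size() { ... } default 0;
        int size() { return items.size(); }

        // Stand-in for the synthetic static method holding the default body.
        static int ___size() { return 0; }
    }

    // Stand-in for what a call site "bag.size()" would be compiled to.
    static int sizeOf(Bag bag) {
        if (bag != null) {
            return bag.size();
        }
        try {
            return Bag.___size();
        } catch (NoSuchMethodError e) {
            // Bag was recompiled without its default clause:
            // behave like a plain null dereference.
            throw new NullPointerException();
        }
    }
}
```

Note that the default is chosen by the declared type Bag at the call site, exactly as with any other static method invocation.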
Maybe, babe, boom
Another option would be to turn to Functional Programming for an answer. This is one of the things known in the FP world for ages: Maybe in Haskell, or Option in Scala and F#. Here’s a nice description by Debasish Ghosh, and if you want to learn about Monads, James Iry’s blog or a Haskell book would be a good place to start. But we don’t have Monads in Java. Stephan Schmidt tried to simulate Maybe with Iterable, and the result is cute, very cute even, but lacks the monadic awesomeness of flatMap, as others have pointed out. Hm, but wait, we do have a monad in Java: exceptions. The naïve translation of Maybe in Java would be:

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException e) {
  street = null;
}

Ok, it’s not as clean and pretty as a pattern match, but… well, that’s Java. No, wait, smart and experienced programmers like us – we know that we shouldn’t use exceptions that way, because it is expensive! Hm, and why is it? Because of filling in the stack trace, actually; otherwise it’s rather cheap. But we don’t need the stack trace here, only… the exception is the standard NPE thrown by the JVM, so we cannot subclass it and override fillInStackTrace(). Bummer.
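
For exceptions we throw ourselves, the trick does work. Here is a minimal sketch, assuming a made-up AbsentValue exception (not part of any library) that overrides fillInStackTrace() so that no stack is captured when it is constructed:

```java
final class StacklessSketch {
    // Hypothetical signal exception; the name is invented for this sketch.
    static final class AbsentValue extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            // Skip the expensive stack walk; the exception carries no trace.
            return this;
        }
    }

    // Number of stack frames recorded in the given exception.
    static int traceDepth(RuntimeException e) {
        return e.getStackTrace().length;
    }
}
```

An AbsentValue reports a trace depth of zero, while an ordinary RuntimeException created in the same place records the frames leading up to it. None of this helps with the NPE that the JVM itself raises on a null dereference, which is the case we actually care about here.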

Using exceptions for control flow is debatable. I am not a proponent of this style, just for the record. I am simply exploring options… what if the JVM “magically” knew not to fill in the stack trace when it’s not used? Exceptions would become cheaper and a whole new programming style in Java would emerge...? 

Some of the Java 7 language enhancement proposals have to do with exceptions, multi-catch for example (I am all for it, BTW). Now what if we add another flavor of catch and introduce a “no-instance” catch, such as:

try {
  //…
} catch (NullPointerException) {
  //…
}

Note that there is no e, no exception instance, no way to re-throw or print – clearly no need for a stack trace. What if JVM kept a bit-map of exception types in “no-instance” try/catch blocks on the stack, and threw these exceptions without filling in the stack trace? After all, we are guaranteed that they are caught and no “phantom” exception like this escapes to the user… these exceptions could be pooled too. Then we could say

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException) {
  street = null;
}

And if we had closures, we could even make it prettier. We could create a utility method like

public static <T> T nullsafe({ => T} expression) {
  return nullsafe(expression, null);
}
public static <T> T nullsafe({ => T} expression, T default_value) {
  try {
    return expression.invoke();
  } catch (NullPointerException) {
    return default_value;
  }
}

It would be invoked like this:

String street = nullsafe({ => account.getOwner().getAddress().getStreet() });

In all its glorious maybiness and with no, or little, performance overhead?! :-) But hey, I said I don’t like clients deciding on the value that the method returns… true, which is why I am not entirely happy with any of the proposals. But at least here we don’t change the language for the sake of null alone – the “exceptional” invocation is explicitly made with a catch clause or via a special library call. 
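
The closure syntax above never became part of Java, so as a rough approximation here is the same utility written against java.util.function.Supplier and lambdas, which arrived in later Java versions. Everything in it is an assumption made for illustration: the nested Account, Person and Address classes are hypothetical, and the catch clause is the ordinary one, so the stack trace is still filled in.

```java
import java.util.function.Supplier;

final class NullSafeSketch {
    // Hypothetical domain classes, made up for this sketch.
    static final class Address {
        private final String street;
        Address(String street) { this.street = street; }
        String getStreet() { return street; }
    }

    static final class Person {
        private final Address address;
        Person(Address address) { this.address = address; }
        Address getAddress() { return address; }
    }

    static final class Account {
        private final Person owner;
        Account(Person owner) { this.owner = owner; }
        Person getOwner() { return owner; }
    }

    static <T> T nullsafe(Supplier<T> expression) {
        return nullsafe(expression, null);
    }

    static <T> T nullsafe(Supplier<T> expression, T defaultValue) {
        try {
            return expression.get();
        } catch (NullPointerException e) {
            // Caveat: this also swallows NPEs thrown deeper inside the called methods.
            return defaultValue;
        }
    }
}
```

With an account variable in scope, the call site becomes String street = NullSafeSketch.nullsafe(() -> account.getOwner().getAddress().getStreet());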

Epilog

All this is just for the sake of intellectual exercise, so don't take it too seriously. :-)

Update

So why don't I like any of the solutions, not even the ones I proposed? The "default clause" is essentially another static method. It may even work for interfaces, with a slight twist. The real problem is that a static method cannot be overridden, unlike the method it is "attached to", so the default clause and the actual method body will not correspond once a subclass overrides the method, and we have basically gained nothing. 

As for the instance-less exception, call me old-fashioned, but I think that exceptions are for exceptional things, and this is stretching the hacks around them just one bit too far. 

So I'll stay with plain old null for now, thank you.


8 comments:

kirillkh said...

Being a hasty passerby, I didn't take the time to read any of the friendly links, so maybe what I'm going to say is mentioned somewhere in their midst.

RE ORM: In my opinion, the whole issue is the lack of proper data structures in languages that are used for ORM. These "plain beans", or what-is-this-thing-called (can't believe I forgot it; hurray!) should have really been records, variant types, anything -- but not objects, as they are not supposed to have any behavior by definition. So I think Demeter's principle simply doesn't apply here. Even strictly within the OOP bounds, one has to conclude that some objects are just "data" and nothing else, and it is perfectly fine to query their value. The proof is simple: suppose not; then you can't query an integer.

RE POJOs: ugh. that's what it's called

RE default return: nice idea, but making this annotation on implementing methods would be pointless in half of the cases. These cases are when the client's declared object type is an interface, for example, or an abstract class. So the only way to create a default statement for these cases would be to hook it to the abstract method declarations. The problem with this is, first, that interfaces are currently not allowed to contain any code at all (except in static nested classes) (which is another ridiculous limitation, if you ask me). More importantly, there are always cases, when different implementations of a method behave differently, and, from my experience, there is no way you are going to escape this problem completely, unless you are a full-blooded purist, also ready to rewrite every library, whose API goes against this principle (in which case you shouldn't be programming in Java). In some of these cases, different default return values will be required.

Another problem is that a method can already use null as a special-case return value, while it would return something else by default.

But my main argument against this is that returning a default value on a best-effort basis has an outstanding potential for indiscoverable bugs. So, if given two choices, I would pick static checking (as in Maybe), or fail-fast behavior (as in good old null), but never a silent best-effort solution.

RE catching NPE: unless you are going to completely convert NPEs into triggers for returning a default value, an NPE can also originate inside one of the called methods, which would subsequently create an ambiguity in how to treat it (have we reached a fatal condition, or is one of the objects in the chain null, which we are prepared to handle?).

Yardena said...

Hi Kirill,

About data structures in object oriented languages, I think there should be tuples, maybe named tuples - which are like variant types or records. But generally, objects should not be inspected for their internal structure, unless it is reflection, and that's a different story. I think the right way of going about it is for the object to decompose itself into data that matches a certain pattern, sort of reverse constructors. Then the user can "query" an object without coupling to its internal structure. Scala extractors are an example of such an approach. That's worth a follow-up post one day.

kirillkh said...

Hi,

As far as I see, Scala's match patterns are essentially the same as ML's? Then how come that by writing a match pattern in the client code, you aren't making assumptions regarding their structure? It's not necessarily their inner structure, right, but nevertheless you do make such assumptions. I fail to see how this is different from object accessor methods (other than being more convenient). For example, if we have
type 'a tree = Node of 'a * 'a tree * 'a tree | Nil
then a Node provides as much information about itself as its Java counterpart would provide, and when you match it with Node(5, Nil, Nil) in the client code, you make an assumption regarding its structure. Sorry, can't see any difference. I suspect that you might be thinking of keeping ALL of your patterns, not just constructor/destructor, under the data structure writer's control, but that would make match patterns much less useful.

Yardena said...

Hi Kirill,

Scala's pattern matching originally was just like ML's, and later it was ported to extractors (which are generated automatically), so some examples do look like ML, but others use unique features of extractors.

Extractor is different from usual pattern matching because it does not match the actual structure of the object, but instead provides the unapply method which decomposes the object. As you can see even in the simplistic Twice example, there is no field that holds half a number, the matching is purely computational and not structural. I am not an ML expert, but I don't think it works that way.

k.k. (damn you, Google) said...

I've been hearing all the buzz around Scala for some time now. Considering some two years this has been going on, maybe I'll give it another look, when I get some spare time (next summer?). I recall that I didn't like something about it at the first glance, (think it was a problem with type inference).

Still, I have kinda reached conclusion that I don't want to depend on anything produced by corporations. The real fun is in the niche languages, their tight, brilliant communities, where people can really express themselves in any way possible, including calling someone brain-dead motherfucker and driving them off the list. That's the kind of spirit that's lacking in business today. Entry barriers my ass. (Whoa, it seems I got carried away a little).

Don't care of anything beyond that right now. Don't want to care.

Itay Maman said...

Great post.

I guess I'll get crucified for this, but I'll say it anyway: I often feel that Demeter is sort of an historical accident. A nice guideline, with limited applicability, that somehow got promoted to an almighty principle.

Paul said...

Hi Yardena,

Wow, this one is a tour de force! A lot to think about. I'll mull it over and respond with some deeper thoughts on my blog.

Initial reactions:

1. I like the idea of the Objective-C "black hole". Send a message to "nothing" and you get "nothing" back in return.

2. The Law of Demeter was recently the bane of my life. I agree it's not a law, but rather a smell, but try telling that to good law abiding static language programmers :). With ORMs promoting the use of structs all over the place, it can become a difficult smell to avoid, and is often the consequence of choosing to use structs as objects in the first place.

Good point about higher level abstractions. Newspeak breaks the Law of Demeter all over the place for good reason. Interestingly, so do Java and C# with their package imports and namespaces, but no one seems to notice or care :)

3. I'm programming in C# nowadays as the day job, and I don't care much for the ?? operator. It's like trying to get rid of a smell with cheap air freshener :) Java doesn't need it.

4. I agree with you that the method should be responsible for providing a sensible default as part of its contract, meaning that clients shouldn't have to check for null. I have tried using the Null Object pattern in the past in an attempt to observe this "idea" with mixed results.

If this principle is observed, then a null is a true (programming) error, an exceptional condition, in which case getting a null pointer exception is to be expected.

Paul.

Morgan Creighton said...

Thanks for yet another thoughtful post! I used to use nulls in my programming, but I've since come to feel that nulls are evil. I now have the philosophy that any encountering of a null is a serious programming error.

Scala's Option is nice, but I'd really rather have a language that prevented nulls from existing at all. Then I could concentrate more on my biz logic. Unfortunately, I think it's impossible to eliminate nulls from JVM languages because you can't guarantee that a pointer is written before it is read. So I've taken to dousing my Scala with lots and lots of "require" statements.

I'm not really satisfied with such scaffolding pollution of my biz logic, so I'm wondering about using AspectJ to defend against nulls.