Wednesday, March 3, 2010

Sayeret Lambda

After delivering the keynote at JavaEdge2009, Ted Neward asserted that Israel is ready for the "polyglot" era. Ronen, Ophir and I decided to verify this assertion, so we are happy to announce the creation of "Sayeret Lambda", the Israeli "Lambda Lounge"! The group contains researchers, consultants, students, and industry professionals interested in Scala, Clojure, Erlang, Lisp, Prolog, Smalltalk, Ruby, Fan, Groovy and so on...

Our first meeting is going to take place on March 10th, and is dedicated to Lisp and Clojure. In April we are going to host Gilad Bracha's talk about Newspeak. Both talks will be in Hebrew.

Programming language geeks in Israel are welcome to join us!

Tuesday, May 19, 2009

Lessons we learn, lessons we teach

Lately I've been a little involved in both (learning and teaching that is), and wanted to share some thoughts on the subject.

Do you like science fiction? I used to like it (when I was younger, so much younger than today). Asimov, Bradbury and such. It's especially fun when the story mixes past, future and present; reality and imagination. What fascinates me most in these stories is how the author makes deep and painful observations about reality by moving the characters into a completely unrealistic setting - the future comes in handy, because it is the hardest to imagine. It's just amazing how, by disguising things beyond recognition, the author breaks the thought conventions and emotional associations built into our brains, and in doing so lets us understand reality beyond what we were able to before. John Lennon said "nothing you can see that isn't shown" - right, but you can imagine what you don't see, and that's how you "know the unknown" a little bit better.

There's a recurring theme in sci-fi stories: "smart man builds machine, machine becomes smarter than man, machine rules man". Is it a real threat? If so, how can we prevent it? Should we stop building smart machines? Is high-tech going to destroy mankind? I don't think so. For one, if I did think so, I would quit and become an organic farmer, and I haven't done so... so far. But the question does bother me. And the answer is, I think, that if we want to continue building smarter machines, we absolutely need to keep building smarter people, or we'll end up like in those sci-fi stories.

The main purpose of education is often perceived as passing on what we already know to the next generation, so they don't waste time rediscovering it. But that's only secondary. The real goal should be passing on the ability to explore the unknown and solve what previous generations haven't solved. Because if students can think at least as well as their teachers, rediscovering something will be just a small detour for them on the road to Knowledge. But without a developed mind they have no legs to walk that road. And it's a steep road up, so when they stop, they (inadvertently perhaps, but) inevitably slip to the bottom. So how do we nourish the ability to think?

It's not what we know, it is how we learned it.
When asking ourselves how to teach, we should first turn to introspection - how did we learn? If the way we've been taught made us discover all those great things that we are so anxious to pass on, why don't we teach what we've been taught? Sure, we will throw in a bit or two of what we've discovered, but basically, why don't we build from the same grounds? They say it about parenting - if you like the way you were raised, you'll likely be a good parent, because you'll repeat what you saw. However, if you look at what's going on in the education system, at almost every level, you see the curriculum constantly changing, "updating", "modernizing" etc. "It was true then, it's wrong now. We don't need it." Why? Because we have the technology? Big mistake. The ancient Greeks, Hebrews and Egyptians, not having the technology, weren't even a tiny bit stupider than us. Without the wheel, there would be no Internet - speaking of which, who invented the Net? But no, we don't need old men, old books and such, religion, history... we have TV commercials to teach us how to live, we have the technology. And since technology costs money, suppliers of technology don't want us thinking independently, because, Google forbid, we may decide we don't need it! In computer science, this is the nightmare sci-fi writers were warning us against, and we should at least try to prevent it from happening.

We can't go forward without understanding the past. Students can't possibly understand Java and Object Orientation before they understand procedural programming, functions, math, logic. Then, when (and if) we show them objects, let's show how they came about, and not a popular imitation. If we have a great book and curriculum that generations of computer scientists and engineers grew up on, why are we throwing it away? Let Java, Python and their patrons wait. They will lay their heavy paws on the students in just a very few years anyway and will turn them into Dilberts, Wallies, and Alices, converting large XML files to long stack traces. Let's give freedom of thought and curiosity a chance to grow just a little bit in students' minds, so at least some of it can survive through corporate development.

It's not what a technology does, it is how and why it works
We're obsessively looking for solutions to our problems all the time. We barely stop to analyze them, until the solution itself becomes our biggest problem. I watched this QCon presentation a while back, and it was a deja vu in many senses. I have encountered problems like that, and I even solved them in a somewhat similar way. I may be wrong, but what I get from the presentation on the technical level is that sometimes Object Orientation as we know it (C#, Java), with all the patterns and practices and such, does not solve our problem. The problem in the presentation reminds me of the expression problem - data and operation-set need to evolve, so how do we express the relationships? The proposed solution (although I may be getting it wrong) is to have an interface per operation; the implementation of the interface, using a reified generic type parameter, stores the type of object it applies to; at start-up something wires together data types and implementations; and then it all becomes a big happy family of multi-methods, with the operation implementation chosen via dynamic dispatch on the data type. An old saying goes, "when the problem is hard enough, you will find yourself re-inventing Lisp to solve it". It's interesting how Udi describes arriving at this design - a team of people had been struggling with the problem for years, until an "old programmer" came to the project retrospective meeting and said "make your roles explicit". Udi took the guy aside and made him explain. Then Udi (and his team, I suppose) implemented it, and now reported the success at QCon. Who was that "old geezer"? What was he? Udi didn't say; in the presentation he is portrayed as a little green troll. Anyway, why do I care who that mysterious character was? Because I didn't come to computer science for some wise Merlin to tell me where the Holy Grail is, I want to be that Merlin! Unfortunately, I am probably not smart enough, but someone else is. That's why I think we need more wanna-be Merlins; wanna-be Kings we already have plenty.
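If I read the design right, its shape is roughly the following (a sketch only - PriceCalculator and the registry class are names I made up for illustration, not anything from the presentation):

import java.util.HashMap;
import java.util.Map;

// One interface per operation.
interface PriceCalculator<T> {
   double price(T item);
}

// Wired at start-up: data type -> operation implementation.
// The implementation is chosen by the runtime class of the data object,
// which is what gives the design its multi-method flavor.
final class PriceCalculators {
   private final Map<Class<?>, PriceCalculator<?>> byType =
         new HashMap<Class<?>, PriceCalculator<?>>();

   <T> void register(Class<T> type, PriceCalculator<T> calculator) {
      byType.put(type, calculator);
   }

   @SuppressWarnings("unchecked")
   <T> double priceOf(T item) {
      PriceCalculator<T> calculator = (PriceCalculator<T>) byType.get(item.getClass());
      if (calculator == null) {
         throw new IllegalArgumentException("no operation registered for " + item.getClass());
      }
      return calculator.price(item);
   }
}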

Since I became interested in functional programming, for example, my Java coding style has changed significantly. I thought I knew how to use Generics; only after a taste of OCaml and Benjamin Pierce's writings did I realize how little I knew about types. I thought I knew how to write object-oriented programs; only after playing with Smalltalk did I realize what object-oriented really means. I may not be able to use Haskell for the day job, but the concepts of immutability, closures, function composition, laziness, are helpful no matter what language I use. Let's look at one example - the (in)famous problem of object-oriented design: does square extend rectangle, or in other words does circle extend ellipse? In both cases the latter has 2 distinct properties (edge sizes or focal points) while the former effectively has only 1. Bob Martin discusses the problem in his article "Design Principles and Design Patterns". Does he offer a solution? No - "design by contract" and Eiffel and some hand-waving. On the other hand, understanding types helps, because then I can "design for subsumption": I ask myself - if Ellipse is a type that describes all ellipses in the world, do they all have 2 distinct focal points? No. Then maybe I shouldn't have methods to get and set them. Maybe my API should try to follow the definition of ellipse more precisely. That surely helps to design good APIs. Furthermore, magically, once we turn the shapes immutable, the problem almost goes away. If Rectangle has a method Rectangle transform(x,y) that produces a new Rectangle with the given sizes, then Square can inherit it with no problem - it would simply produce a Rectangle, not a Square, when x != y. The same trick would work for Ellipse. After all, shapes are math definitions, why should they be mutable?! See, a bit of "functional" thinking solved the problem. And the moral - understanding the classic foundations of computer science is necessary for programmers.
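Sketched in Java, the immutable version could look something like this (a minimal sketch, not anyone's canonical code):

class Rectangle {
   final double width;
   final double height;

   Rectangle(double width, double height) {
      this.width = width;
      this.height = height;
   }

   // Always produces a Rectangle, so no subclass invariant can be broken.
   Rectangle transform(double width, double height) {
      return new Rectangle(width, height);
   }
}

class Square extends Rectangle {
   Square(double side) {
      super(side, side);
   }
   // transform() is inherited as-is: transforming a Square with width != height
   // simply yields a plain Rectangle.
}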

Reality distracts from clarity of thought.
If there is one more thing that we can learn from these sci-fi stories, it's that detaching things from reality may actually increase our ability to grasp them. However, education in recent years insists on "examples", or worse - "realistic examples", or even worse, "examples of being used in the industry". I am not saying the above is worthless or unnecessary, but it should not be overrated. At some point the education ministry decided to teach elementary school math with actual objects - sticks and such, rather than teaching kids the abstract idea of numbers, and it was a disaster. Math, in general, cannot be taught by following "real world" intuition. Nor can logic. Physics has evolved way beyond relatively "intuitive" mechanics. So why are so many educational institutions chasing "real life" technologies, at the expense of the classics, and ignoring the "too innovative to be popular"? I know why, of course - money, pressure from the industry, pressure from students who want real jobs after they graduate. But resisting that pressure is absolutely necessary, for the sake of future generations, to save our civilization! Luckily we still have some universities in Northern Europe :-)

I noticed an interesting phenomenon with students - when they are "fresh" and don't carry the baggage of "field experience" (C, Java, curly braces etc.), it is easier for them to take a "different" point of view and to understand more abstract ideas. Generics are one example. Teaching them to experienced Java programmers is extremely hard. But with undergraduates it is, surprisingly, much easier. So why don't we teach Haskell for types, Smalltalk for objects, and maybe C for low-level stuff? Then, when they meet Java, or any language they are likely to encounter in the industry, it will be a piece of cake to learn. Furthermore, some of them will be able to design the next Java!

Java nested classes - tips and tricks

I had to prepare this anyway, so I thought I might as well post. 

First of all, to get the terminology straight: normal classes and interfaces are top-level. But a class or interface can also be nested, if it is defined inside another class or interface. Nested classes originate in the Beta programming language, and have been available in Java since version 1.1. Non-static nested classes are called inner. Inner classes can be members (declared immediately inside the outer class definition), like this:
class Outer { class Inner { ... } }
Inner classes can also be declared within methods and other blocks, then they are called local, like:
class Outer { void foo() { class Local { ... } ... } }
A more popular breed of local inner classes are anonymous classes: 
class Outer { void foo() { ... new Bar(...) { ... } ... } }
For visual impression - check out this diagram.

Tip - construction: Creating a new instance of a nested or inner class from within the outer class is simple. From outside it's a bit trickier. Suppose we have a class like this:
class Outer {
   static class Nested { ... } 
   class Inner { ... } 
}
We can reference and instantiate the classes like this:
Outer.Nested nested = new Outer.Nested(...);
Outer out = ...  
Outer.Inner in = out.new Inner(...); //translated by javac to new Inner(out, ...)
The magic behind the inner class constructor is that the compiler implicitly adds an Outer parameter to all Inner constructors, and passes the enclosing instance when a constructor is invoked. From then on, the inner class instance (for its entire lifetime) holds a strong reference to the enclosing instance.

Tip - instanceof: If we have two distinct instances Outer out1, out2 then out1.Inner and out2.Inner denote the same class, but their Inner instances will refer to different enclosing instances. This is different from Scala and Newspeak, where the inner class is distinct for every enclosing instance.
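For example, with the Outer/Inner classes from the construction tip above (a small sketch):

public class InstanceofDemo {
   public static void main(String[] args) {
      Outer out1 = new Outer();
      Outer out2 = new Outer();
      Outer.Inner a = out1.new Inner();
      Outer.Inner b = out2.new Inner();

      System.out.println(a.getClass() == b.getClass()); // true - one and the same class
      System.out.println(a instanceof Outer.Inner);     // true
      System.out.println(b instanceof Outer.Inner);     // true
      // ...yet a and b each hold a reference to a different enclosing Outer instance
   }
}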

Tip - access enclosing instance: To access the instance of the outer class from within a contained inner class, use Outer.this. Nested/inner class methods/fields hide the outer class ones; to access outer class elements, prefix them with the outer class name, e.g. Outer.staticMethod(...) or Outer.this.anyMethod(...).
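For example:

class Outer {
   private String name = "outer";

   class Inner {
      private String name = "inner";

      void show() {
         System.out.println(name);            // "inner" - the inner field hides the outer one
         System.out.println(this.name);       // "inner"
         System.out.println(Outer.this.name); // "outer" - explicit access to the enclosing instance
      }
   }
}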

Tip - nesting and inheritance: Generally, method lookup rules in Java nested classes follow "comb semantics" - first search the inheritance hierarchy, then the enclosing lexical scopes. This behavior can introduce some weird puzzlers, like #9 here. In Newspeak the enclosing scope is considered before inheritance, which makes it easier to follow from the programmer's perspective.
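For example, as far as I can tell, the inherited method wins over the enclosing one here:

class Base {
   String greeting() { return "from Base (inherited)"; }
}

class Outer {
   String greeting() { return "from Outer (enclosing)"; }

   class Inner extends Base {
      String resolve() {
         // Inheritance is searched before the enclosing scope, so this picks
         // Base.greeting(), not Outer.greeting().
         return greeting();
      }
   }
}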

Tip - generics: Generic type parameters of the enclosing class (or method) can be used within inner classes.
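For example (a tiny sketch):

class Box<T> {
   private T value;

   // The inner class can refer to the enclosing class's type parameter T directly.
   class Snapshot {
      final T captured = value;
   }
}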

Tip - interfaces: Interfaces may have nested classes (necessarily and implicitly static); this may be particularly useful for declaring nested enums.
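For example (Task and Status are made-up names):

public interface Task {
   // Nested types in an interface are implicitly public and static.
   enum Status { PENDING, RUNNING, DONE }

   Status getStatus();
}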

Tip - statics: An inner class cannot have static declarations in it, except compile-time constants. To overcome this, static declarations can be moved to the top-level class.
 
Trick - loading: A nested class is treated just like any other class by the JVM, e.g. it is not loaded/initialized until used. This fact is used to implement thread-safe lazy singletons using the Holder pattern.
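For example (a hypothetical Config singleton):

public class Config {
   private Config() {}

   // Holder is not loaded/initialized until getInstance() first touches it,
   // and class initialization is guaranteed by the JVM to be thread-safe.
   private static class Holder {
      static final Config INSTANCE = new Config();
   }

   public static Config getInstance() {
      return Holder.INSTANCE;
   }
}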

Tip - final: Anonymous inner classes and local classes can access variables in the surrounding scope only if the variables are final:
void invokeAnon(final int number) {
   final String word = "hello";
   someObject.pass(new Runnable() { 
      public void run() { 
         System.out.println(word.substring(number));
      }
   });
}
Trick - double braces: Double brace initialization is a trick of putting an instance initialization block inside an anonymous inner class declaration, like:
   final Map<String, Integer> numbers = new HashMap<String, Integer>(){{
      put("one", 1);
      put("two", 2);
      put("three", 3);
      //...
   }};

Trick - tokens: A cool Generics trick that uses a local class to capture type parameters is the super-type-token, a.k.a. the Gafter Gadget. 
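The core of it looks roughly like this (a sketch; TypeReference is just my name for it here):

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public abstract class TypeReference<T> {
   private final Type type;

   protected TypeReference() {
      // The anonymous subclass created at the call site records T in its generic
      // superclass signature, which survives erasure and is readable via reflection.
      ParameterizedType superClass = (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superClass.getActualTypeArguments()[0];
   }

   public Type getType() {
      return type;
   }
}

// usage: captures List<String> including the type argument
// Type t = new TypeReference<java.util.List<String>>() {}.getType();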

Tip - reflection: Starting from Java 5, a bunch of methods have been added to reflection with regard to nested classes. For example, the Class#getEnclosingClass() method will let you find out the enclosing class of an inner class:
class Enigma {
   static final Class<?> MY_CLASS = new Object(){}.getClass().getEnclosingClass();
}
Epilog: Last, but not least, the JLS is the ultimate resource for finding out more.

Friday, May 8, 2009

Community Choice Award


I nominated Newspeak Programming Language for the "Most Likely to Change the Way You Do Everything" Community Choice Award. I don't know exactly how they choose the winners; apparently:

The first phase will be to nominate finalists for each of the Categories. Nominations will be accepted at ... between May 6, 2009 at 10:00 a.m. PDT and May 29, 2009 at 4:00 PDT. Among the nominees for each of the Categories, the finalists for the Awards will be chosen. Voting for the final winners will commence at ... on June 22, 2009 at 10:00 a.m. PDT and end on July 20, 2009 at 4:00 pm PDT. ... The odds of winning in any category are dependent upon the total number of eligible nominations received.
Anyway, clicking the orange bot on the Newspeak SourceForge page might help the odds - I ask all my readers to contribute a click for a good cause!

Thank you.

UPDATE: Why does Newspeak deserve it? Newspeak is a class-based, dynamically-typed, object-oriented language that revives the ideals of Smalltalk and Self. It incorporates many great ideas, but one of the major innovations is its modularity support. 

No other language or framework today provides a comprehensive solution for creating modular software. Some languages support hierarchical code organization, there are tools that build components, tools that manage dependencies between components, yet another set of tools and formats deals with module deployment, and there are platforms and tools that facilitate versioning and patching, hot and cold updates. Newspeak has it all - the language, development environment, and platform together provide easy and intuitive end-to-end modularity support. 

Newspeak supports both mixin-based inheritance and class nesting - modules are top-level classes, while all other classes are nested in them. Dependencies between modules are specified using constructors; the absence of a global (static) scope enforces complete isolation of modules and prevents the creation of incidental or implicit dependencies. Since all objects communicate via virtual method invocations, there is no hard-wired dependency on a particular module implementation. Everything is virtual, including the parent-child relationship between classes, which allows for great flexibility and extensibility. The platform supports construction, serialization and loading of module instances, and therefore effectively supports building and deploying applications without the need for any external tools (even though some of this is still under development). The dynamic platform underneath Newspeak has rich meta-programming support, allows querying module definitions and extending modules, and supports hot (incremental) updates. Multiple versions of the same module can coexist without interference. Security is maintained by a capability-based model where access to resources is guarded by capability objects (also under development). Modules may access their execution environment (the virtual machine, or platform) and through it interact with external resources. Newspeak is also network-aware, and is designed to support distributed component management using service objects. 

Newspeak is open-source; it was not widely, but successfully, used in an industrial environment, until the financial situation deteriorated and corporations turned their backs on funding innovation. There are several publications, with more on the way, and conference presentations are received with great excitement. Newspeak is modern; it combines "best-of-breed" ideas of computer science with decades of Smalltalk and Java practical experience. It is easy to learn and very pleasant to code in (and not just for Smalltalkers and programming language aficionados, but also for averagely skilled Java programmers like yours truly). The Newspeak philosophy is inspiring. The people who work on it are extremely smart, but also nice and cool guys... Need I say more?

Tuesday, March 31, 2009

Oh Null, Null

Introduction
Some say it’s one of the worst inventions in history of programming languages, some say it is number one cause of errors in Java programs. Anyway you look at it - null pointer (in all its incarnations) is a sensitive subject. I wonder why there is no song about it, there are several funny re-works of Beatles songs with geeky lyrics, like Unix Man (Nowhere Man, “he’s a real Unix man sitting in his Unix LAN, making all his UNIX .plans for nobody…”), Write In C (Let It Be), Yesterday (“yesterday, all those backups seemed a waste of pay…”), Something ("Something in the way it fails, defies the algorithm logic...") and a not so funny Eleanor Rigby. I’d like to write a new song, “Null” using the “Girl” tune. Only my inspiration ended after a single verse:
If there’s anybody going to listen to my story, 
All about the Null who brought the fall.
It’s a value that you want so much, it makes you sorry,
Still you don’t regret a single call. 
Oh Null, Null…

I also have a middle verse inspired by Smalltalk, where the equivalent of null is nil, a keyword that refers to the sole instance of class UndefinedObject. It has essentially no methods, except isNil and notNil that return true and false respectively, so nil responds to any other method invocation with "DoesNotUnderstand: <method name>", which in my experience, at least, is pretty much equivalent to an NPE in Java. Interestingly enough, though, Objective-C turned nil into a "black hole" that returns self upon any invocation.

Nil’s a kind of null, she is not answering your calls,
you feel a fool (fool fool).
When you think the code looks good, she answers “it’s not understood”, 
she’s cruel (cruel cruel).
Ah null, null…
From Beatles to Elvis
One situation where nulls are very annoying is when we have a "chain" of method invocations, such as account.getOwner().getAddress().getStreet(). If any of the methods returns null, we get a NullPointerException from the next method invocation. I previously suggested a solution to the problem, but now (due to popular demand) there is a proposal for a Java 7 language enhancement, called the Elvis Operators. One of the stronger arguments for this solution is that it is available in other languages like C# and Groovy, even if not everybody is entirely happy with it. Perhaps this is subjective, but after all, programming language design is about human perception and feelings as much as about computer capabilities. So, I admit: I feel uneasy with this new way of method invocation.
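For reference, this is the kind of null-checking boilerplate that such an operator is meant to replace (Account, Owner and Address are hypothetical types here):

String street = null;
if (account != null) {
   Owner owner = account.getOwner();
   if (owner != null) {
      Address address = owner.getAddress();
      if (address != null) {
         street = address.getStreet();
      }
   }
}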

The Demeter Law Suggestion
One of the more controversial guidelines of object-oriented programming, the Demeter Law is related to "Tell, Don't Ask" and usually summarized as "only talk to your close friends". It prohibits series of method invocations like the one described above; in fact, more than one dot in a series of method invocations is disallowed (unless the chain originates in one of this object's members (?)). There's a good description of "The Law" in the Pragmatic Programmers article; it was originally discovered and named by Karl Lieberherr, who created the Demeter Project and wrote the book "Adaptive Object-Oriented Software - The Demeter Method". Martin Fowler demoted the Law to a Suggestion, and in general, Demeter has managed to generate a lot of controversy and debate inside both the Java and Ruby communities, ranging from intellectual (almost academic) polemics to flame battles between consulting firms over preferences in programming styles, music and hairdo. Yes, really. Anyway, if you believe in Demeter, our use-case for "null-safe" method invocations is flawed to begin with. But do we believe in Demeter? It took me a while to form an opinion.
1) I do believe in "Tell, Don't Ask", and I consider the internal iterator (as presented in the Pragmatic Programmers article) preferable to the external iterator (when applicable, of course). 
"Tell, Don't Ask" begs for closures in the language. For those who say "closures are not object-oriented": Smalltalk not only supports closures, with minimal syntactic overhead, but basic things like if and while are designed around closures and would not be possible otherwise. 
But "Tell, Don't Ask" goes much deeper and wider. For example, let's look through Demeter's eyes at service locator vs. injection: the locator is wrong, because you go to a factory to get a factory to get a service. Injection is right, because you pass yourself to the injector, which sets your dependencies directly. How about ORM? How much grief is there in loading strategies, because we do getContainer().getItem().getProperty(), vs. calling database.execute(query)? Food for thought.

2) Now the paper boy example – this one I do not buy. First of all, we don’t pay for things in cash these days – we give the seller our credit card. Not very secure? Maybe. But much more handy. 
People should generally stop obsessing about security in software, I think. At one point a highly ranked architect in the firm I worked for said "we can't deploy any software in the browser because it is not secure". Yes, it is not secure; it's a problem and people are working on it. And yet, even as it is, is it secure for the jewelry store to put some jewels in the front window? Why not keep them all in the backroom safe? Because they won't have buyers, that's why. What good is all the security, if you go out of business? 

Back to the paper boy: the example also does not hold if you are a large organization - the receiver of the service is not the one who authorizes the payment, and not the one who handles the payment. The supplier may wish for "direct communication" as much as he wants, but the rules of corporate procurement are quite different. Then again, it may be silly to attack the example...? Well, the thing is that I hope to show later that you can't follow Demeter and scale, just like the example doesn't.

3) Now there’s “hard to mock” argument. Ok, at risk of starting a flame war here, I mean: I love unit-testing just like the next guy, and I know how useful it is, but let’s not get carried away here. 

Unit-testing, just like static types/compilation, or static analysis, is a means to achieve code quality, not the goal in itself. If it helps you – great, if it gets in your way – ditch it. I think the “100% test coverage” is a fallacy. Dijkstra said “tests only prove presence of bugs, never their absence”, which is even truer for unit-tests. (I heard “we have unit-tests, we don’t need QA” argument once or twice. Yeah, right.) 

Anyway, “hard to test” may be a smell, but unless the suspicion can be substantiated – sorry, circumstantial evidence not accepted.

4) So even though I partially agree that “Demeter violation” is a symptom of a problem, the solution proposed by Demeter is, in my opinion, simply absurd. The grand huge fat mega-façade? Dots replaced with underscores? It cannot scale and it is sweeping design problems under the carpet. I mean, give me hierarchies of nested objects that support subsumption, like Newspeak modules – now we are talking. But mega-façades? No thanks.
So there are good things that result from minding Demeter, but not following the law per se. 
a) For example, "don't return a java.util.List from your API when you don't want clients adding elements to the list" - that's the basics of any sound API design: return the minimum interface that the client needs; Josh Bloch has been saying this repeatedly. 
b) We should consider proxying the returned object (perhaps with our own (inner) class), and then we'll have control over what the returned object's methods do.
c) Applied to DI – “do not depend on module A just to give you access to another module B, instead depend on B directly”, is also a reasonable rule of thumb. 

But let’s now go back to our beloved nulls.

Power to the method - let the declaration site decide
Demeter or not, methods are meant to do things. I would argue that typical methods should not be assumed to be wrapping field access or performing trivial computations. Yeah, I know we all do POJOs, which are really POCS, Plain Old C Structs in disguise. (And what makes them structs are frameworks that assume a dumb accessor and mutator, and freak out if they are anything else.) I believe that most methods should do something, so invoking them on a null target should yield an error, not some ambivalent null... So I am not too happy with changing the programming language to accommodate behavior that should have been atypical.

Now if some methods are "dumb" and have reasonable defaults (like null for a property accessor on a null object, or 0 for the size/length of a null collection/array, or null for the upper-case transformation of a null string), then the method implementer should decide - not the caller. And again, it's not problem-free, but worth exploring. Somewhat similar behavior has been proposed by Jacek here.

Suppose at the method declaration we add a default clause: either default <value>, or default <expression>, or default <code-block>, with the type of the value/expression/code-block matching the method's return type, and with parameters and exceptions (in the 2 latter cases) matching the method's ones. BTW, default is an existing Java keyword, used in switch statements. 

Now examples:

public int size() { return this.size; } default 0; 
public boolean isEmpty() { return this.size == 0; } default true;
public Person getOwner() { return this.owner; } default null;
public Iterator iterator() { return … } default Collections.emptyList().iterator(); 
public Something produce(Param1 p1, Param2 p2) throws E1 {
   //normal method code here
} default {
  //default calculation, similar to a static method; can use p1 and p2, throw a sub-type of E1
  //needs to return Something
}

The location of the "default clause" can be after the method body (and if we stick to a coding convention of keeping default on the same line as the closing curly brace, I think it is preferable), but it could also sit between the end of the method parameter list and the throws clause, or between the throws clause and the start of the method body. I think the tail location is preferable, because it is reminiscent of the switch statement. 
Another variation that increases the similarity to switch may be adding a colon (default : ) and requiring the return keyword, like default : return null; instead of default null. Also, the support for "defaults" can be added gradually, with only values in the first stage and more later if it proves to be successful. Also, an alternative to default may be case null : ... .
The method would now be compiled with an additional synthetic static method, e.g. ___size(), with the same signature and throws clause, the synthetic method containing the "default" body provided by the user. 

Unfortunately, though, call sites are affected by such a method definition and compiled to something like:

SomeType object;
if (object != null) {
  object.m();
} else {
  try {
     SomeType.___m();
  } catch (NoSuchMethodError e) {
    throw new NullPointerException();
  }
}

So the "default" would be calculated based on the declared type of the object, just like any static method invocation. Why catch NoSuchMethodError? Well, what if the SomeType class is recompiled alone, and the default clause, and with it the ___m() method, are gone? We want an NPE to be thrown, not an NSME. 

If, on the other hand, we add the default behavior without recompiling clients, they'll still be throwing NPEs; I think we would not want all method invocations to become complex byte-code like the one described above... If I understand Jacek's proposal correctly, unlike me, he suggests that every method invocation go through a static method. 
It would be interesting to see to what extent JIT could optimize code like that when it learns that object is not null… could it throw away everything but object.m()? This certainly requires more investigation.
Maybe, babe, boom
Another option would be to turn to Functional Programming for an answer. This is one of the things known in the FP world for ages: Maybe in Haskell, or Option in Scala and F#. Here's a nice description by Debasish Ghosh, and if you want to learn about Monads, James Iry's blog or a Haskell book would be a good place to start. But we don't have Monads in Java. Stephan Schmidt tried to simulate Maybe with Iterable, and the result is cute, very cute even, but lacks the monadic awesomeness of flatMap, as others have pointed out. Hm, but wait, we do have a monad in Java. That's exceptions. The naive translation of Maybe into Java would be:

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException e) {
  street = null;
}

Ok, it's not as clean and pretty as a pattern match, but... well, that's Java. No, wait, smart and experienced programmers like us - we know that we shouldn't use exceptions that way, because it is expensive! Hm, and why is that? Because of filling in the stack trace, actually; otherwise it's rather cheap. But we don't need a stack trace here, only... the exception is the standard NPE, and we cannot subclass it and override fillInStackTrace(). Bummer.
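(For our own exception types the trick does work, by the way - override fillInStackTrace() and the expensive part goes away; it's only the JVM-thrown NullPointerException that isn't ours to change:)

public class CheapException extends RuntimeException {
   public CheapException(String message) {
      super(message);
   }

   @Override
   public synchronized Throwable fillInStackTrace() {
      return this; // skip capturing the stack trace - this is what makes throwing cheap
   }
}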

Using exceptions for control flow is debatable. I am not a proponent of this style, just for the record. I am simply exploring options... what if the JVM "magically" knew not to fill in the stack trace when it's not used? Exceptions would become cheaper, and a whole new programming style in Java might emerge...? 

Some of the Java 7 language enhancement proposals have to do with exceptions - multi-catch, for example; I am all for it, BTW. Now what if we added another flavor of catch and introduced a "no-instance" catch, such as:

try {
  //…
} catch (NullPointerException) {
  //…
}

Note that there is no e, no exception instance, no way to re-throw or print – clearly no need for a stack trace. What if JVM kept a bit-map of exception types in “no-instance” try/catch blocks on the stack, and threw these exceptions without filling in the stack trace? After all, we are guaranteed that they are caught and no “phantom” exception like this escapes to the user… these exceptions could be pooled too. Then we could say

String street;
try {
  street = account.getOwner().getAddress().getStreet();
} catch (NullPointerException) {
  street = null;
}

And if we had closures, we could even make it prettier. We could create a utility method like

public static <T> T nullsafe({ => T} expression) {
  return nullsafe(expression, null);
}
public static <T> T nullsafe({ => T} expression, T default_value) {
  try {
    return expression.invoke();
  } catch (NullPointerException) {
    return default_value;
  }
}

It would be invoked like this:

String street = nullsafe({ => account.getOwner().getAddress().getStreet() });

In all its glorious maybiness and with no, or little, performance overhead?! :-) But hey, I said I don’t like clients deciding on the value that the method returns… true, which is why I am not entirely happy with any of the proposals. But at least here we don’t change the language for the sake of null alone – the “exceptional” invocation is explicitly made with a catch clause or via a special library call. 

Epilog

All this is just for the sake of intellectual exercise, so don't take it too seriously. :-)

Update

So why don't I like any of the solutions, not even the ones I proposed? The "default clause" is essentially another static method. It may even work for interfaces, with a slight twist. The real problem is that a static method cannot be overridden, unlike the method it is "attached to", so the default clause and the actual method body will not correspond, and we have basically gained nothing. 

As for the instance-less exception, call me old-fashioned, but I think that exceptions are for exceptional things, and this is stretching the hacks around them just one bit too far. 

So I'll stay with plain old null for now, thank you.


Friday, January 9, 2009

The return of forgotten design patterns

Some design patterns are used all the time and their names are known to all - like facade, factories and proxies. Some design patterns are more popular than they should be. But some, although rarely mentioned by name, have recently been "rediscovered". I'm talking about Flyweight and Memento.

Flyweight
This one basically lets us share n instances between m concurrent clients when m > n. It separates "intrinsic state", the normal class members, from "extrinsic state", which is maintained via parameter passing and return values. Make your intrinsic state immutable, and you can share the same instance between multiple clients. Cool, huh? Priceless. Look at message-passing concurrency instead of shared-memory concurrency, REST, transactionless architecture... All these treasures actually follow the same spirit as Flyweight.
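A minimal sketch of the idea, with a made-up Glyph class - the character is intrinsic (immutable, shared), while the drawing position is extrinsic (passed in by each client on every call):

import java.util.HashMap;
import java.util.Map;

public final class Glyph {
   private static final Map<Character, Glyph> CACHE = new HashMap<Character, Glyph>();

   private final char symbol; // intrinsic, immutable state

   private Glyph(char symbol) {
      this.symbol = symbol;
   }

   public static synchronized Glyph of(char symbol) {
      Glyph glyph = CACHE.get(symbol);
      if (glyph == null) {
         glyph = new Glyph(symbol);
         CACHE.put(symbol, glyph);
      }
      return glyph; // the same instance is shared by all clients
   }

   public void draw(int x, int y) { // extrinsic state arrives as parameters
      System.out.println("drawing '" + symbol + "' at " + x + "," + y);
   }
}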

Memento
This pattern suggests that if you want to save your object's state, you had better export it into a new dedicated memento object and store the memento. Then restore your object from the memento.

This is like serialization, only serialization didn't follow the pattern, unfortunately.

Josh Bloch suggests we do it manually with so-called "Serialization Proxies" - see item 78 in chapter 11 of the 2nd edition of "Effective Java". Here's a slide from the JavaOne 2006 presentation:

The book lists more advantages of the pattern, like improved security (see item 76 - the danger of hackers acquiring references to private fields using de-serialization), the ability to de-serialize a different class than the originally serialized instance (the RegularEnumSet and JumboEnumSet example) etc. The name "memento" isn't mentioned, though.
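A sketch of the proxy idea, along the lines of the book (the Period class here is just an illustration, not Bloch's exact code):

import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public final class Period implements Serializable {
   private final long start;
   private final long end;

   public Period(long start, long end) {
      if (start > end) throw new IllegalArgumentException("start after end");
      this.start = start;
      this.end = end;
   }

   // The "memento": a dumb snapshot of the state, serialized instead of Period itself.
   private static class SerializationProxy implements Serializable {
      private final long start;
      private final long end;

      SerializationProxy(Period p) {
         this.start = p.start;
         this.end = p.end;
      }

      private Object readResolve() {
         return new Period(start, end); // goes through the real constructor and its checks
      }
   }

   private Object writeReplace() {
      return new SerializationProxy(this);
   }

   private void readObject(ObjectInputStream in) throws InvalidObjectException {
      throw new InvalidObjectException("Use the serialization proxy");
   }
}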

Now imagine persistence architectures actually using intermediate memento objects... instead of modifying the actual objects' bytecode, breaking encapsulation with access to their private fields, imposing constraints like public empty constructors and so on... Maybe we would have been better off with mementos...?

Static types are from Mars, Dynamic types are from Venus


static - associated with logic and acting by the rules, strong, efficient, usually responsible for safety and order enforcement; but non-compromising (for better or worse), non-adaptive, weak in communication skills.

dynamic - associated with beauty and elegance, light, good in communication skills, can usually be easily made to do what you want them to, change all the time and adapt to change well; but unpredictable, act on intuition rather than logic, often seen as less efficient and weaker.


Wednesday, January 7, 2009

Types and components

Just some thoughts following a recent conversation I had. Don't we always want static types? If we can detect errors in our program, why not do it as early as possible? Sure. So when is "as early as possible"? I think the answer depends on what our program is - is it one monolithic piece or a component?


In the past monolithic software was prevalent, but today I would bet that most software is meant to be a component. Just look at open-source as it used to be before Java - you'd usually download the C source code, change it if you needed to, build it all on your machine, and create one big executable. (And BTW - congratulations, you are a geek.) Cross-platform Java changed the situation. Now open-source projects give you a jar to download, you put it on your classpath and you are ready to go. This lowered the bar for open-source adoption, and contributed (aside from less restrictive licenses and other factors) to the baby-boom of open-source frameworks and tools. Nowadays a Java developer cannot even imagine having no 3rd party jars on the classpath!

A static type check validates our software component against other components in the compilation environment. Does it match the runtime environment? What about different configurations of the runtime environment - there are tests and real deployments, and various types of deployments, and any given installation environment can change over time - new components being added, others updated or removed? How do we guarantee that compile-time checks still hold? The short answer is - we can't. We need dynamic type safety anyway. Now let's examine the added value and the price (yes, there is one!) of deeply static types.

On the code organization level, we try to reduce dependencies to the bare minimum - hide classes behind interfaces that we hope will remain more stable. The problem is that the number of interfaces and factories in our application grows; in pursuing modularity we sacrifice simplicity... So maybe the problem is in the name? Some go as far as adding support for structural types, minimizing a dependency to a single field/method signature (not a problem-free solution, but there are interesting refinements). All this may help, but doesn't really solve the problem.

Another aspect we need to deal with is building and packaging the software. Here we enter the world of dependency management, the world of "make", Ant, Maven, repositories, jar versions; if it's a large and complex enough piece of software we work on, simply speaking - we enter a world of pain. I still find it strange that we haven't found a better way.

As for application deployment and its problems, we'll get back to it later. But the truth is that no matter how hard we try, we can't guarantee there will be no errors when we deploy our software, so ... JVM doesn't trust us and gives us verification.

Simply put, when a class is compiled, some of its requirements from other classes are captured and encoded into the bytecode. The JVM then checks them when the class is loaded, and rejects the class if they can't be met. (This is really an over-simplified description of a complex algorithm, which also takes time to execute, despite optimization efforts on the JVM side.) So this isn't really a dynamic check, it's something in between - names in our class get linked when it is loaded. In the classic Java SE class-loading scheme, where components basically form a chain, this should work. But if we want real components, ones we can add, override, replace or remove while the program is running - sweet turns sour. Our interfaces and factories have names, and the classes that represent them need to reside in some "common vocabulary" usually loaded by the parent classloader, because it's not only the class bytecode that matters, but also who loaded what. Since we are talking actual classes, not their names, once we have loaded two components they cannot change their protocol of communication without reloading their parent; they also can't use a different version of a sub-component that the parent component has referenced.

In JEE that sort of thing is necessary; that's why classloading in JEE is a terrible mess - not only does it not follow any specification, but it is different in almost each and every app server (wasn't there supposed to be portability?!). If you ever used commons-logging in a JEE app, you probably know what I mean. Maybe it got fixed lately, I don't know, but the Tech Guide for commons-logging is an ode to classloader frustration.

Back to deployment: whenever there is some sort of dynamic components - JEE, Spring or OSGi - there is always reflection. And most of the time there's lots of XML too. It's an escape route from static types. I attended Alef Arendsen's session at JavaEdge that presented OSGi and SpringSource. I carefully watched Alef juggle between XML, source and console like a child who watches a circus magician trying to uncover his tricks. But I didn't quite figure out the magic. And that was a whole session just for HelloWorld. I know the Spring folks are doing the best they can, and they're smart and all... but compared to Smalltalk, I wasn't quite impressed. As for other solutions, although I haven't tried this out, there's Guice/OSGi integration without XML, with dynamic proxies and on-the-fly bytecode generation with ASM, but there's some overhead for the user, because it requires intermediate objects for services. So one way or the other, it looks like the JVM platform is holding us back.

Verification is an addition to, not a replacement of, dynamic checks. So what we get is basically a triple check of correctness (javac, verifier, dynamic) but a loss of flexibility - we are interfering with components' runtime life-cycles. If the invocation target is resolved just in time when the call is made, nothing precludes the target component from being reloaded between calls. But with preemptive validation, we get a static dependency tree at runtime, classes wired with each other "too early" and for good, which makes reloading a component very hard (although people keep trying). The reason for "early linking" is also performance, but late binding doesn't mean that the runtime platform can't do any optimization heuristics - they'll just have to be dynamic optimizations in the style of JIT. Will invokedynamic bring salvation?

It seems that when we are talking about multiple components, a "statically typed platform" does not quite do the job. Static type checks may mean a lot inside a component, but for inter-component communication they are not only useless, but harmful. People sometimes dismiss dynamic types, because they think "it's like static types, but without static types". What they may not realize is that with dynamic types you are not just losing something, you are also gaining something. You get late binding and meta-programming, and in a multi-component environment that means a whole lot!

And that's when we are talking about "inside the platform" components developed in the same language. Once you work with a system that runs on a different platform or is developed in a different language, our type system doesn't normally stretch across the communication boundary. The other system may not even have static types, and since we are only as strong as the weakest link, our static types don't really help us. I think every time we try to encode types into communication between systems we end up with a monster like CORBA or Web Services. But there's another (unfortunately popular) extreme of just sending a string over and hoping for the best - with no checking on our side at all. Then we are relying on the other system to stop us from doing damage, and there's no way to correctly blame the component that made an error - was it a wrong string or an unexpected change on the other side? I think that ideally type or contract checking and conversions should be done dynamically on both sides, and not as part of the protocol. This results in light and flexible data-exchange protocols (like HTTP or ATOM) which are easier to work with and, I think, will win in the end. On the more theoretical level I like this model for intercommunication, and of course there are Aliens, which model an external system as a special object in our system.

So as far as I can see, components simply require a dynamic environment; they may be statically checked inside, but they act "dynamic" to the outside world. Sort of a hard skeleton and a soft shell. Indeed, soft parts are much easier to fit together and less breakable, due to flexibility - this is often used in mechanical engineering and in nature, so why not in software?

Thursday, December 11, 2008

How not to implement Comparable

I have already blogged about the danger of numeric overflows. I recently came across this example in Java course materials (!!!):

public class Foo implements Comparable<Foo> {
    private int number;

    public int compareTo(Foo o) {
        if (this == o) return 0;
        return number - o.number;
    }
}
How do you think Integer.MAX_VALUE compares to negative numbers? It will appear smaller. This reminds me of an even worse case we encountered in a real codebase. Look at this:
public class Foo implements Comparable<Foo> {
    private long id;
    //...
    public int compareTo(Foo o) {
        if (this == o) return 0;
        return (int)(id - o.id);
    }

    public boolean equals(Object o) {
        return o != null && (o instanceof Foo) && compareTo((Foo)o) == 0;
    }
}
How do you think new Foo(8325671243L) and new Foo(25505540427L) compare? They come out equal, but I will do it a la Weiqi Gao and leave you to find out why... :-)
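(For the record, one overflow-safe way to write such comparisons is to compare explicitly instead of subtracting - a sketch for the long id version:)

public int compareTo(Foo o) {
    if (this == o) return 0;
    if (id < o.id) return -1;
    if (id > o.id) return 1;
    return 0;
}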

Static initializers - update

After some interesting comments to my previous write-up I decided to make a follow-up post with some additional details.

Here is the code from Effective Java:

public class Person {
    private final Date birthDate;
    //...
    private static final Date BOOM_START;
    private static final Date BOOM_END;

    static {
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_START = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_END = gmtCal.getTime();
    }

    public boolean isBabyBoomer() {
        return birthDate.compareTo(BOOM_START) >= 0 &&
               birthDate.compareTo(BOOM_END) < 0;
    }
}


and here's how I would have changed it:

public class BabyBoom {
    private static final BabyBoom boom = new BabyBoom();
    private Date start = null;
    private Date end = null;

    private BabyBoom() {}

    public static BabyBoom getInstance() {
        if (boom.start == null || boom.end == null) {
            Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
            gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
            boom.start = gmtCal.getTime();
            gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
            boom.end = gmtCal.getTime();
        }
        return boom;
    }

    public boolean contains(Date birthDate) {
        return birthDate.compareTo(start) >= 0 && birthDate.compareTo(end) < 0;
    }
}

public class Person {
    private final Date birthDate;
    //...
    public boolean isBabyBoomer() {
        BabyBoom boom = BabyBoom.getInstance();
        return boom.contains(birthDate);
    }
}

I didn't synchronize getInstance, because initializing the dates twice does no harm, so it's not worth the price of synchronization. However, I did check that both fields are initialized before returning the object from getInstance.

Sunday, December 7, 2008

A case against static initializers

"Effective Java" is an excellent book. I recently bought the 2nd edition, and it is absolutely fabulous, priceless. However after quite some time in the industry, I've learnt not to take any advice blindly. First edition was also excellent, but several of the items were revisited since then. So here's an item that I have mixed feelings about - Avoid creating unnecessary objects. The advice is to use static initializers for the expensive computation. 

Static initializers are a double-edged sword. It's like the stock exchange in times of crisis - for a particular individual it may be a good idea to sell the stock, but the trouble is that everybody's doing it, and in the end everybody's losing big-time. The same applies to static initializers. One or two may seem harmless, but they add up and together create a big problem. The fact that static initializers may be invoked at program start-up affects everybody, and since they potentially interfere with classloading, it's harder to debug them if anything goes wrong.

Here is how it usually gets out of hand: people start by initializing static members in static blocks. A map of values, some sort of configuration details. That alone sounds harmless. But soon comes the time when the values in the map need to be read from a properties file, so now we've got IO inside a static block. Uh oh, better catch those exceptions. Before you notice, the whole thing turns into a puzzler. Don't believe me? Here's a real problem I had.

The server is in Java; the client is either an applet or JNLP. On certain machines, only one of them can run. You run the server first - the client never comes up, no error whatsoever. You reboot and connect as a client to another machine - no problem. But if you try to start up the server locally - silent death. A team in India spends months on it. In vain. The whole release is delayed, escalation to senior management. The thing ends up on my desk after a ruthless blame game between teams. Long story short: it's a DirectX problem. Why the **** does the server need DirectX? Ah. What's next after reading defaults from a properties file? Reading them from the database. Oh, but it's a different process, and the database needs to be up. So we don't just connect to the database from the initialization block, we wait for the database process to come up. Great idea. How? We follow "best coding practices" and reuse: find a poller utility somewhere in the JDK. Apparently there is a java.awt.Timer. Why not? Great idea. Apparently, touching one AWT class causes a bunch of other AWT classes to load, which in turn load the DirectX and OpenGL DLLs. And guess what - Windows on some machines has a nasty bug that only allows them to be loaded once per machine, regardless of the user. And when another user tries to do it, the loading gets stuck. And our server is, of course, a system process, while the client belongs to the logged-in user. 

Since it was a last-minute fix, we solved it with some JVM flags that disabled DirectX and OpenGL. The problem was not the fix, but the diagnosis. If it had been part of the regular code, it would have been easy to connect with a debugger, see which call got stuck, and investigate from there. But as it was part of start-up, people didn't know where to look. Not to mention the man-months cumulatively spent by developers waiting for the server to restart while testing. 

So... what's the lesson here? Life is better without static, avoid it as much as you can. 

Java... tea?