Thursday, November 1, 2007

Beware: Overflows in numeric operations

Well, isn't it amazing how huge can be the problems caused by a simple little code piece? Have a look at Java Puzzlers - there are plenty of examples. What is important to understand is that it's no fiction nor Neal and Josh perverted minds. It's out there in the software we use and the one we produce. Want proof? Nearly All Binary Searches and Mergesorts are Broken.

Here is the story of what happened to me. The code was written by a colleague, and at some point became my responsibility. I didn't notice anything wrong when first reviewing it, but later on I had to fix the problems. Now have a look:


public class Node implements Comparable {
private long _id;

public Node() {
_id = IdGenerator.getNewId();
}
public Node(long id) {
_id = id;
}

public boolean equals(Object obj) {
return compareTo(obj) == 0;
}

public int hashCode() {
return (int)(_id ^ (_id >> 32));
}

public int compareTo(Object obj) {
if (obj == null) {
return -1;
}
if (obj instanceof Node) {
Node tmp = (Node) obj;
return (int)(_id - tmp._id);
}
return -1;
}
}

Looks innocent, right? Why would then new Node(12884902889L).equals(new Node(4294968297L)) be true?! Ah, because 12884902889-4294968297=8589934592=2^33 and when cast to int, 2^33 is of course 0. Bellisimo!

But hey, here's one even worse. IdGenerator generates long Ids from 2 integers - persistent high and in-memory low, low being incremented all the time and high only when MAX_VALUE of ids have been given out or when the process restarts. So far so good. Look at this function then:

public synchronized long getNewAndReserve(int interval)
throws IdGeneratorException
{
long value;
if (low + interval >= MAX_VALUE) {
high = getNewHighValue();
low = 0;
}
value = getNewId(high, low);
low += interval;
return value;
}

MAX_VALUE is BTW Integer.MAX_VALUE. Now think puzzlers... YES! a sum of 2 integers is an integer, it can NEVER exceed a MAX_VALUE, it overflows to a negative number! The fix for this is to change the condition to (low >= MAX_VALUE - interval) and to add an assertion that (interval > 0).

All this makes you think whether high-level programming language like Java should help you avoid this kind of problems...? But that's worth of another post sometime. In the meantime, look after your numbers, folks.

No comments: