The Joys and Sorrows of Exceptions

August 13, 2006

Updated for republication in Mr Bool

In my experience, the most serious bugs in programs in production are in error handling routines. Inventive programmers often try fancy things when dealing with errors, but error situations are often omitted during testing. This article examines the fundamental questions of exceptions: What causes exceptions, and what can be done with them?

Bad User, Bad Server, or Bad Programmer

Practices of an Agile Developer puts exceptional events into three categories:

The user has done something wrong, such as entering a non-numeric value in a numeric field, trying to access data they are not authorized to see, or entering a negative amount where a positive one was required. These types of errors are not exceptional at all but part of the normal course of events. Nevertheless, exceptions are often used to handle them, and sometimes this is even a good idea. In Java, Integer.parseInt throws NumberFormatException if given a non-numeric string, and new URL(String) throws a MalformedURLException if it is given something that cannot be parsed as a URL. These events may be the result of user error.
Sometimes, a program depends upon things outside its control. Network and external servers are examples of this, as is the file system. No matter how good my code is, I can’t stop people from knocking out the network cable. I call these kinds of exceptional situations system or resource exceptions.
Lastly, try as I may, I never write perfectly correct programs. Occasionally, that pesky NullPointerException brings my program down like a house of cards. In this situation, all bets are off. I know that the program contains a bug, but I cannot safely assume anything about the nature of the bug.

Not all exceptions are cut and dried as to which category they fall into. For example, a NumberFormatException is a configuration bug if it occurs while reading a configuration file, but a user error if it occurs while reading input data from a web form.
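To make the point concrete, here is a minimal sketch of the same parse failure landing in two different categories depending on where it happens. UserInputException, QuantityParsing and both method names are hypothetical, invented for this illustration; only Integer.parseInt and NumberFormatException come from the JDK.

```java
// Hypothetical category type (not part of the JDK): a failure caused by bad user input.
class UserInputException extends RuntimeException {
    UserInputException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class QuantityParsing {
    private QuantityParsing() {}

    // Web form input: a non-numeric value is a user error, so it is reported as one,
    // with the original exception kept as the cause.
    static int parseFormQuantity(String rawInput) {
        try {
            return Integer.parseInt(rawInput.trim());
        } catch (NumberFormatException e) {
            throw new UserInputException(
                "Quantity must be a whole number, got [" + rawInput + "]", e);
        }
    }

    // Configuration value: a non-numeric value is a bug in the deployment.
    // There is deliberately no catch block; the NumberFormatException
    // propagates to whatever handles bugs at the top level.
    static int parseConfiguredPoolSize(String rawValue) {
        return Integer.parseInt(rawValue.trim());
    }
}
```

The only code that decides the category is the code that knows the context, which is why the same JDK exception can end up on either side.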

Dealing with Failure

The overall strategies for handling each of the exceptional situations above are very different. We should program to handle user errors and return pleasant and helpful error messages. However, when it comes to bugs, my recommendation is to get out of Dodge with as little fuss as possible. The exact nature of the problem is by its very definition something we didn’t think about. A NullPointerException should be handled at the topmost level, along with similar errors. The user should be informed that “we’re terribly sorry, but despite our best, honest efforts, we have messed up. We’ll try and fix it as soon as we can. For now, the best you can do is to try something slightly different and pray.” Make sure that you don’t corrupt data, however. All persistent data operations must be rolled back.

That leaves only one category in which to get creative: the resource exceptions. Using an alternative means of communication, retrying the operation, or storing data for later manual processing are all … things people try.
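The get-out-of-Dodge strategy for bugs could be sketched roughly as follows. UnitOfWork is a hypothetical stand-in for whatever transaction API the application really uses, and TopLevelHandler, handle and APOLOGY are names made up for this example; only the java.util.logging and java.util.function types are real JDK classes.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the application's real transaction API.
interface UnitOfWork {
    void commit();
    void rollback();
}

final class TopLevelHandler {
    private static final Logger LOG = Logger.getLogger(TopLevelHandler.class.getName());

    static final String APOLOGY =
        "We're terribly sorry, but we have messed up. We'll fix it as soon as we can.";

    private TopLevelHandler() {}

    // Runs one request. Any unexpected RuntimeException is treated as a bug:
    // roll back, log the stack trace once, and answer with a generic apology.
    static String handle(UnitOfWork work, Supplier<String> request) {
        try {
            String response = request.get();
            work.commit();
            return response;
        } catch (RuntimeException e) {
            work.rollback();
            LOG.log(Level.SEVERE, "Unhandled exception, transaction rolled back", e);
            return APOLOGY;
        }
    }
}
```

Note that the handler makes no attempt to work out what went wrong. It assumes nothing about the bug, which is the whole point.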

Whatcha Gonna Do ‘Bout It?

Just like there are three general types of errors, there is a finite number of things that can be done when an error occurs:

Deal: Sometimes, you know what caused the problem and you’re able to deal with it. For example, a method isInteger(String) may catch NumberFormatException and return false. This is the best approach, but sadly, it is seldom a real option. What is the correct way to “deal” with a database connection error? Or a syntax error in your SQL?
Fall over: Stop what you’re doing, “call it a day” and make sure nothing else bad happens. In COBOL, this is a call to ‘ABEND’ (short for “abnormal end”, and German for ’evening’). In C, it’s a core dump. In Java EE, it’s rolling back the current transaction. No matter what you do, make sure that there are logs showing as much as possible about what happened. The beauty of falling over is that it’s easy to do (as a novice martial artist, I speak from experience). I personally think this is an underused strategy.
Ignore and Continue: VB has the rather dubious language statement “On Error Resume Next”. This would be equivalent to a language-supported empty catch block in Java. The beauty of ignoring errors is that you generally don’t have any idea of what’s going to happen next. The most common thing I see in Java code is a NullPointerException following an ignore-and-continue block. This is a very good way of making the life of whoever has to fix the problem a living hell.
Rethrow: In Java, wrapping an exception in another exception is a pretty common approach. It’s sensible enough, but it’s not a sufficient strategy. It still leaves the job of actually dealing to someone else.
Retry: If you’re trying to transmit data over an unreliable connection and the connection drops, trying again can work. However, retry code is fairly difficult to write correctly and to test. Chances are that if you haven’t tested it, your retry code contains one or more bugs.
Fallback: Similar to retrying. If at first you don’t succeed, try something else. Like retry code, this code is prone to poor test coverage and bugs.
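Two of the options above, sketched under some assumptions. isInteger is the “deal” example from the list. retry is a hypothetical helper, and its maxAttempts parameter and its decision to retry only on IOException are choices made for this illustration; a real one would probably also need a delay between attempts.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class ErrorStrategies {
    private ErrorStrategies() {}

    // "Deal": here the exception simply is the answer to the question being asked.
    static boolean isInteger(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // "Retry": a bounded retry that only repeats on IOException (a resource failure).
    // Any other exception is assumed to be a bug and propagates immediately.
    static <T> T retry(int maxAttempts, Callable<T> operation) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                lastFailure = e; // remember it and try again
            }
        }
        throw lastFailure; // every attempt failed; report the last failure
    }
}
```

Even this small helper has several paths (first-try success, eventual success, giving up, a bug in the operation), and each of them needs its own test before the code deserves any trust.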

The L-word

No, not that one. “Logging”. A few quick tips about logging:

Keep it simple: Complex logging code contains bugs. If your logging code ends up throwing a NullPointerException, you lose. It is more common than you’d think. I have seen perfectly recoverable error situations escalate into fatal crashes due to errors in logging code.
Don’t log and throw: If you expect someone else to do the real handling of the problem, leave the talking to them. Overly verbose logs make debugging harder.
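As a rough illustration of the second tip: instead of logging and then throwing, the sketch below adds context and rethrows with the original exception as the cause, leaving the single log statement to whoever finally handles the problem. ReportSource and ReportLoader are hypothetical names; UncheckedIOException is the standard JDK class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical source of reports, e.g. a file system or a remote service.
interface ReportSource {
    String read(String reportId) throws IOException;
}

final class ReportLoader {
    private ReportLoader() {}

    // No logging here. The wrapper carries the context (which report) and the
    // original exception, so the top-level handler can log everything once.
    static String load(String reportId, ReportSource source) {
        try {
            return source.read(reportId);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not load report [" + reportId + "]", e);
        }
    }
}
```

Because the cause is preserved, the one stack trace logged at the top still shows both where the failure started and what the program was trying to do.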

Who Catches the Catchers

If you write a catch block, you are generally mistaken. I have examined the things you can do with an exception, and as the observant reader may have noticed, I have pointed out that there are many things that can go wrong in a catch block, and few good things that can happen. A bug in the exception handling code can easily obscure the real problem and also cause more damage. Exception handling is harder to test, and much exception code goes into production without the code ever having been executed. I know this, because I have debugged systems with error handling code that had to fail every time it was executed.

Instead, focus on safe and simple logging at the top level of your application. Localized falling over, as it were. If you are writing a web application, your application framework will generally let you deal sensibly with uncaught exceptions. If you’re writing your own event-driven applications, make the event loop catch and log exceptions and continue with the next event.
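For the event-driven case, a minimal sketch might look like the following. EventLoop and runAll are hypothetical names, and events are modelled as plain Runnable instances purely to keep the example short.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class EventLoop {
    private static final Logger LOG = Logger.getLogger(EventLoop.class.getName());

    private EventLoop() {}

    // Runs every event in order. An event that throws is logged and abandoned,
    // and the loop carries on with the next one. Returns the number of failed events.
    // Errors such as OutOfMemoryError are deliberately not caught.
    static int runAll(List<Runnable> events) {
        int failures = 0;
        for (Runnable event : events) {
            try {
                event.run();
            } catch (RuntimeException e) {
                failures++;
                LOG.log(Level.SEVERE, "Event failed, continuing with the next one", e);
            }
        }
        return failures;
    }
}
```

This is the only catch block in the sketch, and it does nothing clever: one log call with the full stack trace, then on to the next event.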

Only when you discover a specific case that you are required to deal with should you write specific exception handling code. The simple reason for this strategy: If someone cares enough to ask for specific error handling functionality, they will probably care enough to test it as well. Finally, never, ever try to treat exceptions from bugs as a normal situation. Make sure that you can get as much information as necessary, and get out of there. And then fix the bug.

Post Scriptum

As I am updating this article, Neal Gafter has just proposed making all exceptions unchecked in Java 7. Naturally, the debate over checked exceptions has flared up again. The main argument for checked exceptions is that they force programmers to take action. Considering what actions you can and should take, what is the impact of forcing the programmer to make a local decision on what to do?

Comments:

Sergio Bossa - Aug 15, 2006

Hi Johannes,

Thanks for your good write-up about exception handling, IMHO a very hot topic.

However, what you miss is IMHO how to deal with business exceptions: a very important question that may deserve a whole post ;)

Probably they are those exceptions you say “you are required to deal with” … I’d like to know more from your opinion and experience.

Cheers!

Sergio B.

Johannes Brodwall - Aug 17, 2006

Hi Sergio,

Thanks for the comment. I agree with you that business exceptions deserve a whole post of their own. I would really appreciate feedback on what you’d like to read about in such a post.

There are two types of exception handling strategies that I could explore more fully: Things like retrying, failing over to another node or mechanism and such strategies, mostly for resource exceptions. Then there is “what to do when you get an AccountOverDraftException, NumberFormatException, AuthorizationFailureException etc.” I assume it is the latter you are referring to?

~Johannes

Bjørn Bjerkeli - Aug 20, 2006

Good write-up, Johannes. It could also be interesting to add some practical examples of common mistakes.

I often encounter code that fails to establish a proper context when it raises exceptions.

Establishing such a context is trivial, but it is often omitted, which makes the code much more difficult to debug. Consider the following example of an instance method in an Account class:

    public void validateAccount() {
        if (isOverdrawn()) {
            throw new IllegalStateException("Account is overdrawn, balance [" + getBalance() + "], limit [" + getLimit() + "]");
        }
    }

compared to the following version that has context (given a proper toString() implementation):

    public void validateAccount() {
        if (isOverdrawn()) {
            throw new IllegalStateException("[" + this + "] is overdrawn, balance [" + getBalance() + "], limit [" + getLimit() + "]");
        }
    }

Casey Mullen - Apr 20, 2007

I just read “Practices of an Agile Developer” as well, and the categories do sound useful… although I am not sure how to practically express the categorization of a given exception. If I’m the thrower, I could throw my own “categorized” exception, but what about exceptions emanating from some other library?

Does anyone know of any practical (open source?) project that has made practical, explicit use of these (or similar) categories? Or are these categories more theoretical than practical?

Johannes Brodwall - Apr 22, 2007

Good questions, Casey.

We have created subclasses of RuntimeException for SystemException and ApplicationException (should’ve been InvalidUsageException). Everything that is not classified is a “bug”. Our top-level interceptors, which log, retry etc., use this to decide the log level and retry strategy.

I don’t think the categories I propose have been used as such in any open source project. However, I think one of the best things about the Spring exceptions is that they allow you to decide what sort of a problem it was.

DataAccessResourceFailureException tells you that the database or network is down, InvalidDataAccessUsageException tells you that the client code did something wrong, OptimisticLockFailureException is not actually an error at all (and my categories don’t include it). In JDBC, these, and more, were lumped together as SQLException.

I guess what I am trying to say, is that as far as I know, the categories I propose are more a description of why I think Spring has good exceptions than it is a description of what they’ve done.