Updated for republication in Mr Bool
In my experience, the most serious bugs in production programs are in error handling routines. Inventive programmers often try fancy things when dealing with errors, but error situations are often left out of testing. This article examines the fundamental questions of exceptions: What causes exceptions, and what can be done with them?
Bad User, Bad Server, or Bad Programmer
Practices of an Agile Developer puts exceptional events into three categories:
- The user has done something wrong, like entering a non-numeric value in a numeric field, trying to access data he’s not authorized to see, entering a negative amount where a positive one was required, and so on. These types of errors are not exceptional at all but part of the normal course of events. Nevertheless, exceptions are often used to handle them, and sometimes this is even a good idea. In Java, number parsing methods such as Integer.parseInt(String) throw a NumberFormatException if given a non-numeric string, and new URL(String) throws a MalformedURLException if it is given something that cannot be parsed as a URL. These events may be the result of user error.
- Sometimes, a program depends upon things outside its control. Networks and external servers are examples of this, as is the file system. No matter how good my code is, I can’t stop people from knocking out the network cable. I call these kinds of exceptional situations system or resource exceptions.
- Lastly, try as I may, I never write perfectly correct programs. Occasionally, that pesky NullPointerException brings my program down like a house of cards. In this situation, all bets are off. I know that the program contains a bug, but I cannot safely assume anything about the nature of the bug.
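Because user errors are part of the normal course of events, one way to handle them is to catch the exception close to the input and turn it into a helpful message. The sketch below assumes a hypothetical UserInput class whose methods return an Optional error message (empty when the input is fine); the class name and the message wording are illustrative only.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Optional;

class UserInput {

    // Returns a message for the user, or Optional.empty() if the amount is valid.
    static Optional<String> amountError(String text) {
        try {
            int amount = Integer.parseInt(text.trim());
            if (amount <= 0) {
                return Optional.of("Please enter a positive amount.");
            }
            return Optional.empty();
        } catch (NumberFormatException e) {
            // Non-numeric input is an expected user mistake, not a bug.
            return Optional.of("\"" + text + "\" is not a whole number.");
        }
    }

    // Same idea for web addresses: new URL(String) throws MalformedURLException on bad input.
    static Optional<String> urlError(String text) {
        try {
            new URL(text);
            return Optional.empty();
        } catch (MalformedURLException e) {
            return Optional.of("\"" + text + "\" is not a valid web address.");
        }
    }
}
```

For example, amountError("forty") yields a message the web form can display next to the field, while amountError("42") yields Optional.empty().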
Not every exception falls cleanly into one category. For example, a NumberFormatException is a configuration bug if it occurs while reading a configuration file, but a user error if it occurs while reading input data from a web form.
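As a contrast to handling user input gracefully, here is a sketch of reading a number from a configuration file, where a malformed value is a bug in the deployment and the program should fail fast with enough context to fix it. The class name Config and the property key server.port are hypothetical.

```java
import java.util.Properties;

class Config {

    // A malformed value here is a configuration bug, not a user error:
    // wrap it with context and let it propagate instead of showing a friendly message.
    static int readPort(Properties props) {
        String raw = props.getProperty("server.port");
        try {
            return Integer.parseInt(raw); // also throws NumberFormatException if raw is null
        } catch (NumberFormatException e) {
            throw new IllegalStateException(
                "Invalid value for server.port in configuration: " + raw, e);
        }
    }
}
```

The same NumberFormatException that became a polite message for a web form becomes an IllegalStateException here, keeping the original exception as its cause.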
Dealing with Failure
The overall strategies for handling each of these exceptional situations are very different. We should program to handle user errors and return pleasant and helpful error messages. When it comes to bugs, however, my recommendation is to get out of Dodge with as little fuss as possible. The exact nature of the problem is, by its very definition, something we didn’t think about. A NullPointerException should be handled at the topmost level, along with similar errors. The user should be informed that “we’re terribly sorry, but despite our best, honest efforts, we have messed up. We’ll try and fix it as soon as we can. For now, the best you can do is to try something slightly different and pray.” Make sure that you don’t corrupt data, however: all persistent data operations must be rolled back.
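A minimal sketch of handling bugs at the topmost level, assuming a hypothetical Transaction interface that stands in for whatever persistence API the application really uses: any unexpected RuntimeException is logged with its stack trace, the transaction is rolled back, and the user gets the apology instead of a half-finished result.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class TopLevelHandler {

    private static final Logger LOG = Logger.getLogger(TopLevelHandler.class.getName());

    static final String APOLOGY =
        "We're terribly sorry, but we have messed up. Please try something slightly different.";

    // Hypothetical stand-in for the application's real transaction API.
    interface Transaction {
        void commit();
        void rollback();
    }

    static String handleRequest(Transaction tx, Supplier<String> work) {
        try {
            String result = work.get();
            tx.commit();
            return result;
        } catch (RuntimeException bug) {
            // We don't know what went wrong, so don't try to be clever:
            // record everything, undo persistent changes, and apologize.
            LOG.log(Level.SEVERE, "Unexpected failure while handling request", bug);
            tx.rollback();
            return APOLOGY;
        }
    }
}
```

The handler catches RuntimeException rather than Throwable, so Errors such as OutOfMemoryError still propagate to the container; whether that is the right boundary depends on the environment.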
That leaves only one category in which to get creative, namely the resource exceptions. Using an alternative means of communication, retrying the operation, or storing data for later manual processing are all … things people try.
Whatcha Gonna Do ‘Bout It?
Just as there are three general types of errors, there is a limited number of things that can be done when an error occurs:
- Deal: Sometimes, you know what caused the problem and you’re able to deal with it. For example, a method that checks whether a string is a valid number can catch NumberFormatException and return false. This is the best approach, but sadly, it is seldom a real option. What is the correct way to “deal” with a database connection error? Or a syntax error in your SQL?
- Fall over: Stop what you’re doing, “call it a day” and make sure nothing else bad happens. In COBOL, this is a call to ‘ABEND’ (German for ‘evening’). In C, it’s a core dump. In Java EE, it’s rolling back the current transaction. Whatever you do, make sure that there are logs showing as much as possible about what happened. The beauty of falling over is that it’s easy to do (as a novice martial artist, I speak from experience). I personally think this is an underused strategy.
- Ignore and Continue: VB has the rather dubious language statement “On Error Resume Next”. This is the equivalent of a language-supported empty catch block in Java. The beauty of ignoring errors is that you generally don’t have any idea of what’s going to happen next. The most common outcome I see in Java code is a NullPointerException following an ignore-and-continue block. This is a very good way of making the life of whoever has to fix the problem a living hell.
- Rethrow: In Java, wrapping an exception in another exception is a pretty common approach. It’s sensible enough, but it’s not a sufficient strategy: it still leaves the job of actually dealing with the problem to someone else.
- Retry: If you’re trying to transmit data over an unreliable connection and the connection drops, trying again can work. However, retry code is fairly difficult to write correctly and to test. Chances are that if you haven’t tested it, your retry code contains one or more bugs.
- Fallback: Similar to retrying. If at first you don’t succeed, try something else. Like retry code, this code is prone to poor test coverage and bugs.
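A sketch of two of the strategies above, under stated assumptions. isNumber illustrates “deal”: the cause is fully understood, so the exception simply becomes an answer. retryThenFallback illustrates retry followed by fallback; it assumes a hypothetical IoOperation interface and retries only on IOException, so that bugs such as a NullPointerException still fall over instead of being retried. A production version would probably also wait between attempts.

```java
import java.io.IOException;
import java.util.function.Supplier;

class ErrorStrategies {

    // "Deal": a non-numeric string is a fully understood case, so answer false.
    static boolean isNumber(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Hypothetical operation that talks to a resource and may fail with IOException.
    @FunctionalInterface
    interface IoOperation<T> {
        T run() throws IOException;
    }

    // "Retry" then "Fallback": try up to maxAttempts times, then use the fallback.
    // Only IOException (a resource problem) is retried; RuntimeExceptions propagate.
    static <T> T retryThenFallback(IoOperation<T> primary, int maxAttempts, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.run();
            } catch (IOException e) {
                System.err.println("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
            }
        }
        return fallback.get();
    }
}
```

As the list warns, even this small loop has branches (first-try success, success after failures, exhausted attempts) that each need a test.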
No, not that one. “Logging”. A few quick tips about logging:
- Keep it simple: Complex logging code contains bugs. If your logging code ends up throwing a NullPointerException, you lose. It is more common than you’d think. I have seen perfectly recoverable error situations escalate into fatal crashes due to errors in logging code.
- Don’t log and throw: If you expect someone else to do the real handling of the problem, leave the talking to them. Otherwise the same exception shows up in the log once per layer, and overly verbose logs make debugging harder.
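The two logging tips can be sketched together with java.util.logging. In this hypothetical example, the lower layer adds context by wrapping the exception but does not log it; the top level is the single place that logs, using a fixed message and passing the exception itself rather than building an elaborate message that could fail. loadCustomer and readFromDisk are made-up names, and readFromDisk always fails to simulate an I/O problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class LogOnce {

    private static final Logger LOG = Logger.getLogger(LogOnce.class.getName());

    // Simulated resource failure (hypothetical).
    static String readFromDisk(String id) throws IOException {
        throw new IOException("disk unavailable");
    }

    // Lower layer: wrap with context, but don't log. Leave the talking to the caller.
    static String loadCustomer(String id) {
        try {
            return readFromDisk(id);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not load customer " + id, e);
        }
    }

    // Top level: the only place that logs. Fixed message plus the exception; nothing clever.
    static String handle(String id) {
        try {
            return loadCustomer(id);
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Request failed", e);
            return "error";
        }
    }
}
```

The stack trace attached to the single log entry still shows both the wrapper and the original IOException as its cause, so nothing is lost by not logging in the lower layer.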
Who Catches the Catchers
If you write a catch block, you are generally mistaken. Looking back over the things you can do with an exception, the observant reader may have noticed that many things can go wrong in a catch block and few good things can happen. A bug in the exception handling code can easily obscure the real problem and also cause more damage. Exception handling is harder to test, and much exception code goes into production without ever having been executed. I know this because I have debugged systems with error handling code that was bound to fail every time it was executed.
Instead, focus on safe and simple logging at the top level of your application. Localized falling over, as it were. If you are writing a web application, your application framework will generally let you deal sensibly with uncaught exceptions. If you’re writing your own event-driven applications, make the event loop catch and log exceptions and continue with the next event.
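For a home-grown event loop, the top-level catch might look like the sketch below. EventLoop and its Consumer-based handler are hypothetical; the point is that a failing event is logged and the loop carries on with the next one. The log message is fixed rather than built from the event’s toString(), which could itself throw.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class EventLoop<E> {

    private static final Logger LOG = Logger.getLogger(EventLoop.class.getName());

    private final Consumer<E> handler;

    EventLoop(Consumer<E> handler) {
        this.handler = handler;
    }

    // Handles every event; a failure in one event is logged and does not stop the loop.
    // Returns the number of events that failed.
    int run(Iterable<E> events) {
        int failures = 0;
        for (E event : events) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                // Fixed message: calling event.toString() here could itself throw.
                LOG.log(Level.SEVERE, "Event handler failed", e);
                failures++;
            }
        }
        return failures;
    }
}
```

This is the only catch block in the sketch; the handlers themselves are free to let exceptions escape.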
Only when you discover a specific case that you are required to deal with should you write specific exception handling code. The simple reason for this strategy: If someone cares enough to ask for specific error handling functionality, they will probably care enough to test it as well. Finally, never, ever try to treat exceptions from bugs as a normal situation. Make sure that you can get as much information as necessary, and get out of there. And then fix the bug.
As I am updating this article, Neal Gafter has just proposed making all exceptions unchecked in Java 7. Naturally, the debate over checked exceptions has flared up again. The main argument for checked exceptions is that they force programmers to take action. Considering what actions you can and should take, what is the impact of forcing the programmer to make a local decision on what to do?