The forgotten art of error checking

I was just rambling randomly yesterday, on the usual microblogging platforms, about how result checking seems to be ignored or done badly. The precise wording was:

It’s really amazing how little attention error handling receives in most software development. Even *tutorials* often ignore it.

It does indeed amaze me. It sometimes feels like we write code for theoretically perfect worlds: “If the processor executes exactly in this order, and the weather is calm, this program will work.” There are countless examples of bad assumptions. Someday I will come up with some statistics of the form “Every N seconds someone forgets to check the result of write().”
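To make the write() case concrete, here is a minimal sketch in Python. The helper name write_all is made up for illustration. In Python, os.write() reports outright failures by raising OSError, but it still returns the number of bytes it actually wrote, and that number can be smaller than what you asked for. Ignoring it silently drops data.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd (write_all is a hypothetical helper)."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns how many bytes were actually written,
        # which may be fewer than requested, so check it and loop.
        written = os.write(fd, view[total:])
        total += written
    return total
```

The loop assumes a blocking file descriptor. With a non-blocking one, os.write() may raise BlockingIOError, and the caller has to decide how to wait and retry.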

If you are a teacher, or a developer that enjoys writing snippets of code to teach people, please join me in the quest of building a better future. Do not tell us that you’re “avoiding result checking for terseness”, because that’s exactly what we people will do (terseness is good, right?). On the contrary, take this chance to make us feel bad about avoiding result checking. You might do this by putting a comment like “If you don’t do this, you’re a bad programmer.” right next to the logic which is handling the result, and might take this chance to teach people how proper result handling is done.

Of course, there’s another forgotten art related to result checking. It sits on the other side of the fence. If you are a library author, do think through how you plan to make us check conditions which happen inside your library, and try to imagine how to make our lives easier. If we suck at handling results when there are obvious ways to handle them, you can imagine what happens when you structure your result logic badly.

Here is a clear example of what not to do, coming straight from Python’s standard library, in the imaplib module:

    def login(self, user, password):
        typ, dat = self._simple_command('LOGIN', user, self._quote(password))
        if typ != 'OK':
            raise self.error(dat[-1])
        self.state = 'AUTH'
        return typ, dat

You see the problem there? How do you handle errors from this library? Should we catch the exception, or should we verify the result code? “Both!” is the right answer, unfortunately, because the author decided to do us a little favor and check the error condition himself in some arbitrary cases and raise the error, while letting it go through and end up in the result code in a selection of other arbitrary cases.
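As a rough sketch of what “Both!” means for the caller, the wrapper below funnels the two error paths into one. CommandFailed and checked are names made up for this example and are not part of imaplib. The only library facts assumed are that imaplib methods return a (typ, data) pair and raise imaplib.IMAP4.error (or a subclass) when they decide to raise.

```python
import imaplib


class CommandFailed(Exception):
    """Hypothetical exception for any IMAP command that did not succeed."""


def checked(method, *args):
    """Call an imaplib method and turn both error paths into one exception."""
    try:
        typ, data = method(*args)
    except imaplib.IMAP4.error as exc:
        # Error path 1: the library checked the result and raised for us.
        raise CommandFailed(str(exc)) from exc
    if typ != 'OK':
        # Error path 2: the failure came back through the result code.
        raise CommandFailed('%s: %r' % (typ, data))
    return data
```

With a connected imaplib.IMAP4 instance named conn (an assumption here), calls would look like checked(conn.login, user, password) or checked(conn.fetch, '1', '(RFC822)'). The caller then has exactly one thing to handle, which is what the library could have offered in the first place.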

I may provide some additional advice on result handling in the future, but for now I’ll conclude with the following suggestion: please check the results from your actions, and help others to check theirs. That’s a good life-encompassing recommendation, actually.

10 Responses to The forgotten art of error checking

rbp says:

2010/06/17 at 12:25

But it’s not really “some arbitrary case”, is it? I don’t know what _simple_command might return, but it seems like typ can either be ‘OK’ or, supposedly, something that’s not OK. If it’s not OK, the login method raises an error. So we only need to catch the exception. If no exception is raised, “typ” is necessarily ‘OK’.

Otherwise, +1 :)
Gustavo Niemeyer says:

2010/06/17 at 12:56

Yes, it is.

If all the library did was raise an error when the code was not “OK”, then it’s pretty clear that you don’t need to return “OK” at any time.

If you read the rest of the library, you’ll see that it uses the result form (code, value) pretty much everywhere, and in some cases it will raise errors on certain results. E.g. ‘NO’ in a ‘FETCH’ command is an error according to the RFC, but it doesn’t raise one in the library.

There’s no other choice but to go look at the code to see which results come to me through the result code, and which ones come through an exception.
rbp says:

2010/06/17 at 13:16

I see what you mean. Ugh…
Martin Pool says:

2010/06/17 at 22:55

That’s quite true.

There’s an even more interesting story once you start raising an exception: if you expect the exception to normally propagate all the way to the top level and stop the program, you’re fine. If you expect people to catch it and retry, it’s quite hard to be sure that all the normal invariants of your program still hold if it’s been arbitrarily unwound from some point.

In this example we don’t really know what state the socket is in: can we send another command, or did something interrupt us in the middle of sending the first command?

I think the
Chris Cheney says:

2010/06/18 at 12:45

I also wrote a blog entry related to this issue on June 7.

At a previous place that I worked on proprietary software we not only had to check for all error conditions and handle them in a sane manner, if possible, but we also had to document all functions for what they were intended to do and all of their return codes.

These are both areas that the vast majority of open source code fails to address, and claims of RTSL are really not sufficient in cases where there are bugs.
Gustavo Niemeyer says:

2010/06/18 at 12:50

@Martin

Indeed, error recovery is another very interesting topic. I find it worth learning the ideas for error recovery explored within Erlang practices (supervisors, etc).

@Chris

The only thing I’d add is that I don’t think this is specific to open source software. Of course, with open source we can actually *see* that happening. :-)

Would you like to link to your blog post?
Chris Cheney says:

2010/06/18 at 12:52

With respect to my last comment, what spurred me to write my original blog post on June 7 was the fact that some important server code, which will remain nameless, has some major problems due to not checking for errors and properly handling them. To compound the problem the external program that it calls does not really return useful error codes to begin with and instead just has one catch all error code that is used even for transient locking issues.
Chris Cheney says:

2010/06/18 at 12:55

Gustavo Niemeyer,

http://chrischeney.wordpress.com/2010/06/07/error-handling/

It was made intentionally vague to avoid insulting the project; at the time of writing I was highly annoyed. :-)

And I agree with you. I am sure there are plenty of proprietary projects that are equally guilty of these issues, and the transparency of FLOSS does make it much more obvious which projects have major problems in these areas.
Marc Tardif says:

2010/06/18 at 17:46

@Martin

Writing exception safe code is actually more complicated than it seems and I found that Herb Sutter gives pretty good insight into the problem in his book: Exceptional C++. If I recall correctly, he gives a simple example of a class implementing a stack which actually contains more pitfalls than would be expected. Fortunately, some programming languages alleviate the complexity of managing memory when throwing or raising exceptions but, as you pointed out, other factors such as internal state must also be considered carefully.
Adenilson Cavalcanti says:

2010/06/19 at 00:25

Gustavo

Excellent post, it really is a shame how examples are written assuming that the sky is blue and nothing is going to crash ever.

As I wrote in libgcal website (2 years ago), just before the examples of code usage: “Bellow you can see some examples of libgcal usage (they could have less lines, but checking for function return values is always a good programming practice).”

Best regards

Adenilson
