Exceptional crashes

Last week I was part of a rant with a couple of coworkers around the fact that Go handles errors for expected scenarios by returning an error value instead of using exceptions or a similar mechanism. This is a rather controversial topic because people have grown used to having errors out of their way via exceptions, and Go brings back an improved version of a well-known pattern previously adopted by a number of languages, including C, where errors are communicated via return values. This means that errors are in the programmer’s face and have to be dealt with all the time. In addition, the controversy extends to the fact that, in languages with exceptions, every unadorned error comes with a full traceback of what happened and where, which in some cases is convenient.

All this convenience has a cost, though, which is rather simple to summarize:

Exceptions teach developers to not care about errors.

A sad corollary is that this is relevant even if you are a brilliant developer, as you’ll be affected by the world around you being lenient towards error handling. The problem will show up in the libraries that you import, in the applications that are sitting on your desktop, and in the servers that back your data as well.

Raymond Chen described the issue back in 2004 as:

Writing correct code in the exception-throwing model is in a sense harder than in an error-code model, since anything can fail, and you have to be ready for it. In an error-code model, it’s obvious when you have to check for errors: When you get an error code. In an exception model, you just have to know that errors can occur anywhere.

In other words, in an error-code model, it is obvious when somebody failed to handle an error: They didn’t check the error code. But in an exception-throwing model, it is not obvious from looking at the code whether somebody handled the error, since the error is not explicit.
(…)
When you’re writing code, do you think about what the consequences of an exception would be if it were raised by each line of code? You have to do this if you intend to write correct code.

That’s exactly right. Every line that may raise an exception holds a hidden “else” branch for the error scenario that is very easy to forget about. Even if it sounds like a pointless repetitive task to be entering that error handling code, the exercise of writing it down forces developers to keep the alternative scenario in mind, and pretty often it doesn’t end up empty.
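To make the hidden branch concrete, here is a small sketch in Python. The save_atomically helper is made up for illustration and does not come from any library. Without the except clause the three working lines would look complete, yet a failure in any of them would leave a stray temporary file behind.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write data to path through a temporary file in the same directory."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        # The otherwise hidden branch: a failed write or rename must not
        # leave the temporary file behind. Clean up, then let the error go on.
        os.unlink(tmp)
        raise
```

The error still reaches the caller; the point is only that the local duty (removing the temporary file) is done before it leaves the function.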

This isn’t the first time I’ve written about this, and given the controversy that surrounds these claims, I generally try to find one or two examples that bring the issue home. So here is the best example I could find today, within the pty module of Python 3.3’s standard library:

def spawn(argv, master_read=_read, stdin_read=_read):
    """Create a spawned process."""
    if type(argv) == type(''):
        argv = (argv,)
    pid, master_fd = fork()
    if pid == CHILD:
        os.execlp(argv[0], *argv)
    (...)

Every time someone calls this logic with an improper executable in argv there will be a new Python process lying around, uncollected, and unknown to the application, because execlp will fail, and the process just forked will be disregarded. It doesn’t matter if a client of that module catches that exception or not. It’s too late. The local duty wasn’t done. Of course, the bug is trivial to fix by adding a try/except within the spawn function itself. The problem, though, is that this logic looked fine for everybody that ever looked at that function since 1994 when Guido van Rossum first committed it!
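A minimal sketch of that fix follows. It is not the patch that went into the standard library: to stay self-contained it uses os.fork instead of pty.fork, and the function name spawn_guarded and the exit status 127 (borrowed from the shell convention for a command that cannot be run) are assumptions made for illustration.

```python
import os

CHILD = 0  # os.fork() returns 0 inside the child process


def spawn_guarded(argv):
    """Fork, exec argv in the child, and return the child's wait status.

    Hypothetical helper for illustration; not the stdlib pty.spawn.
    """
    if isinstance(argv, str):
        argv = (argv,)
    pid = os.fork()
    if pid == CHILD:
        try:
            os.execlp(argv[0], *argv)  # on success this call never returns
        except OSError as e:
            # We are still a copy of the parent. Report the failure here...
            os.write(2, f"cannot execute {argv[0]!r}: {e}\n".encode())
        finally:
            # ...and leave right away, so no exception travels up into a
            # second copy of the caller's program.
            os._exit(127)
    # Parent: collect the child so that it never lingers uncollected.
    _, status = os.waitpid(pid, 0)
    return status
```

With an improper executable, the child now reports the problem and exits on the spot, and the parent gets a status it can inspect with os.WIFEXITED and os.WEXITSTATUS.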

Here is another interesting one:

$ make clean
Sorry, command-not-found has crashed! Please file a bug report at:
https://bugs.launchpad.net/command-not-found/+filebug
Please include the following information with the report:

command-not-found version: 0.3
Python version: 3.2.3 final 0
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring
Exception information:

unsupported locale setting
Traceback (most recent call last):
  File "/.../CommandNotFound/util.py", line 24, in crash_guard
    callback()
  File "/usr/lib/command-not-found", line 69, in main
    enable_i18n()
  File "/usr/lib/command-not-found", line 40, in enable_i18n
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python3.2/locale.py", line 541, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

That’s a pretty harsh crash for the lack of locale data in a system-level application that is, ironically, supposed to tell users what packages to install when commands are missing. Note that at the top of the stack there’s a reference to crash_guard. This function has the intent of catching all exceptions right at the edge of the call stack, and displaying a detailed system specification and traceback to aid in fixing the problem.
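For readers unfamiliar with the pattern, such a guard can be sketched roughly as below. This is not the code from CommandNotFound/util.py, which is not reproduced here; the signature, the report_url and out parameters, and the message wording are assumptions for illustration.

```python
import sys
import traceback


def crash_guard(callback, report_url, out=None):
    """Run callback(); on any exception print a crash report and return 1.

    Hypothetical reconstruction of the pattern, not command-not-found's code.
    """
    out = sys.stderr if out is None else out
    try:
        callback()
    except Exception:
        # Nothing is recovered here: the guard only makes the crash prettier.
        print("Sorry, the program has crashed! Please file a bug report at:", file=out)
        print(report_url, file=out)
        print("Exception information:", file=out)
        traceback.print_exc(file=out)
        return 1
    return 0
```

Note that the guard sits at the very edge of the program, so by the time it runs, whatever the callback was in the middle of doing has already been abandoned.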

Such “parachute catching” is a fairly common pattern in exception-oriented programming and tends to give developers the false sense of having good error handling within the application. Rather than actually guarding the application, though, it’s just a useful way to crash. The proper thing to have done in the case above would be to print a warning, if at all, and then let the program run as usual. This would have been achieved by simply wrapping that one line as in:

try:
    locale.setlocale(locale.LC_ALL, '')
except Exception as e:
    print("Cannot change locale:", e)

Clearly, it was easy to handle that one. The problem, again, is that it was very natural to not do it in the first place. In fact, it’s more than natural: it actually feels good to not be looking at the error path. It’s less code, more linear, and what’s left is the most desired outcome.

The consequence, unfortunately, is that we’re immersing ourselves in a world of brittle software and pretty whales. Although more verbose, the error result style builds the correct mindset: does that function or method have a possible error outcome? How is it being handled? Is that system-interacting function not returning an error? What is being done with the problem that, of course, can happen?

A surprising number of crashes, and a good deal of plain misbehavior, are the result of such unconscious negligence.

16 Responses to Exceptional crashes

  1. I wrote a little about why command-not-found is crashing on my blog http://blog.suxx.pl/2013/04/why-is-command-not-found-crashing.html

    You are right that in that case a simple try/catch would have prevented the crash. I think, though, that you have missed the intent of that parachute catch. It is indeed for when all else has failed and we are about to crash anyway. The screen that is being displayed is arguably more friendly than the raw Python backtrace.

    The danger of handling exceptions (and I’m talking about particular functions failing) the way you show is that it might hide genuine problems. Here the problem is in fact in the ssh / pam layer, that allows the user to forward unsupported locale settings across their network connection.

    I agree that command-not-found should not make this a fatal error but it is a problem that is hard to anticipate up front. I think that a bit more friendly locale and i18n routines than what is offered by the standard library would help a lot here as there are a lot of other things that are broken about this.

    Best regards
    Zygmunt Krynicki

  2. niemeyer says:

    Thanks for your considerations, Zygmunt. A few thoughts:

    > (…) you have missed the intent of that parachute catch. It is indeed when all else has failed and we are about to crash anyway.

    I believe I was honest to that intent. It’s my opinion that it still gives a false sense of correctness, as described, but I’m not suggesting it’s a bad practice in the terms you put.

    > The danger of handling exceptions (and I’m talking about particular functions failing) the way you show is that it might hide genuine problems. (…)

    A warning, as shown, would let both you and the user know that something isn’t quite right. That’s handling, rather than hiding. setlocale would still be able to crash either way, though, in the case of actual catastrophic issues such as memory errors.

    > I agree that command-not-found should not make this a fatal error but it is a problem that is hard to anticipate up front. (…)

    We’re in agreement. The post addresses some of the reasons why it feels non-natural to handle these problems.

  3. Gavin Panella says:

    I agree that it’s easy to make a fragile Python program, and that can
    be frustrating. The ability to ignore exceptions can be falsely
    comforting.

    But it’s often an advantage too. By doing nothing except concentrating
    on solving my own problem, I know that my program will crash if it
    encounters an exceptional condition that I haven’t thought about. When
    writing tests I’ll encounter some of these situations and codify how
    to handle them. My program may well ship without considering every
    possible failure, but I’m happy with that because it’s still a useful
    piece of software, and on the whole I write software that I’d rather
    crashed hard than silently continue, possibly making a bad situation
    worse.

    Web applications are an example. It can be efficient to let unhandled
    exceptions cause the whole request to fail, rolling back transactions
    on the way out. The request might be retried by middleware, and
    succeed, or it will manifest as a bug. Given rapid enough development
    and deployment, it might well be a business advantage to do things
    this way, instead of spending a lot of time and money in advance
    trying to plug every hole.

    I would not take this approach to build a software component for an
    aeroplane or a piece of network infrastructure for example. But
    neither would I take Go’s approach:

    – The toolchain doesn’t complain when I forget to capture an error,
    unless I’m trying to capture only one thing from a multiple-value
    return. Even then it doesn’t complain when I send it to _, which is
    a pretty easy thing to do; far less verbose than explicitly ignoring
    an exception.

    – If a function definition changes from returning nothing to returning
    an error, my code will still compile and I will be ignorant of the
    error condition I should be checking. Tests won’t suddenly crash
    because an unhandled exception gets thrown, so the burden is firmly
    on my shoulders to be ever vigilant.

    – There’s no warning if I forget to check err != nil or somesuch if
    err is used elsewhere in the same function. The `if err := blah();
    err != nil` idiom helps with this, but it’s not required, and it
    gets hard to read when the function call takes many arguments.

    – Even though it’s bad style to panic across a package boundary, a
    panic could come from anywhere and crash my program or goroutine.
    There’s unchecked-exception-like behaviour in Go after all.

    Aside: I am one of those people who likes Java’s checked exceptions,
    because I was never tempted to simply catch Exception and throw it
    away. Despite Java’s many other shortcomings, this helps to make it
    attractive for developing network services, for example. It’s not a
    panacea, and Java has unchecked exceptions too, but they can be a
    useful tool.

  4. niemeyer says:

    > But it’s often an advantage too. By doing nothing except concentrating
    on solving my own problem, I know that my program will crash if it
    encounters an exceptional condition that I haven’t thought about.

    Precisely. That’s the practice it encourages at all times, and that is widely spread. Where you see an advantage I see an issue, though.

    > – The toolchain doesn’t complain when I forget to capture an error (…)
    > – There’s no warning if I forget to check err != nil or somesuch (…)

    As the post makes clear, it’s all about the mindset encouraged by either style. Go will not force you to handle errors. People do that in practice, though, as you can see throughout the standard library and in real applications. It’s very obvious when people leave errors unchecked as well, because the well established style and practice is to check them at all times and do something, even if it’s just bubbling them up, so the misses stand out.

    > – Even though it’s bad style to panic across a package boundary, a panic could come from anywhere and crash my program or goroutine. (…)

    Thankfully so. These are truly exceptional situations that do deserve a harsh take down: memory errors, bogus logic that would cause buffer overflows in other languages, etc. That’s all very different from trying to open a non-existent file, or asking to set a locale that doesn’t exist, though. These code paths are handled via exceptions in languages such as Python, and via return values in languages such as C and Go.

    > – If a function definition changes from returning nothing to returning an error, (…)

    That’s obviously a serious API breakage and an issue in any style.

    > Aside: I am one of those people who likes Java’s checked exceptions, (…)

    They certainly help, but it’s my impression that there are related issues there as well, due to the grouping of statements and the common practice of just declaring a general enough re-raise of exception that catches the whole body. I don’t have enough field experience with them, though, so I’ll concede on the doubt.

  5. Nate Finch says:

    As Gustavo says, it’s more about the standards that each style encourages. You can ignore errors in any language. However, by *requiring* error handling code to be near to the source of the error, error handling is generally much more specific and much more accurate in Go.

    Java’s checked exceptions are extremely rigid, which is both good and bad… good because you know that if a function is declared not to throw, it definitely won’t (except unchecked exceptions, which you should probably never catch anyway), and if it’s defined to throw only a few exceptions, you know exactly what it can throw. It’s also bad, because if you’re just letting exceptions fly back up the stack, if a function you call suddenly starts throwing a new type of exception, you now have to change the exceptions each method up the stack is defined to throw…. or do what most people do, and just define everything as throwing “exception”… which eliminates the benefit of checked exceptions entirely.

    Also, checked exceptions are only defined at the function level…. this often means that you have one try/catch wrapping an entire function, and that is almost never good enough to properly understand the context of an exception. And that’s the primary problem with exceptions – it encourages you to move the error handling far away from the source of the error, to a point where you’ve lost all context of the error, and often times *can’t* properly handle it because you don’t really know what went wrong.

  6. James Henstridge says:

    Most Go error handling I’ve seen and written often boils down to checking for the error condition, performing local cleanup and then passing the error back to the caller. While that certainly gets me thinking about releasing resources at the local level, it isn’t clear I’ve done much more than letting an exception propagate, potentially with some try/finally cleanup.

    While I might be handling things correctly at the micro level everywhere, if an unwanted error propagates all the way up the call stack, it isn’t always obvious where it occurred (particularly if there are multiple locations where it could have come from).

    For example, in your command-not-found case you might decide that a failure to set the locale shouldn’t be considered an error. The traceback tells you exactly where to go to fix that. If all I had was the error message “locale.Error: unsupported locale setting”, I’d have to go searching. Admittedly this particular search would be pretty easy, but I’ve had many cases where the traceback was invaluable in tracking down the problem.

    It kind of feels like throwing the baby out with the bath water. If Go had a standard way of building a traceback to go with the errors I was passing up (and all the third party libraries I cared about used it), a lot of my misgivings would disappear.

  7. niemeyer says:

    Go has tracebacks for panics, but not for normal errors. Unless you embed them yourself (see Stack in runtime/debug), you won’t get them, which indeed may make bugs a bit harder to debug in some cases, especially with poor error messages. The counterpoint, though, is that errors tend to be more descriptive of the issues that are actually going on. As a simple example:


    >>> try: re.compile(")(")
    ... except Exception as e: print e
    ...
    unbalanced parenthesis


    _, err := regexp.MatchString(")(", "foo")
    if err != nil {
        fmt.Println(err)
    }
    error parsing regexp: unexpected ): `)(`

  8. James Henstridge says:

    Right. I understand I can provide richer information in my Go errors: but that doesn’t help if none of the third party code I rely on does it too.

    It’d be a shame if the answer was to use panic everywhere because it is easier to debug :-)

    As far as the regexp error goes, the quality of the error message seems to differ package by package in both Go and Python. In your example, you’ve also thrown away the information that this is a regular expression error (i.e. the exception’s type is re.error), which would be apparent if it was left uncaught, or if you were using a more constrained “except” clause.

  9. niemeyer says:

    Whenever you call onto a third party package that lacks proper error reporting, it’s easy to fix that yourself by putting better context on what happened.

    As a bonus point, the *user* ends up knowing what actually happened as well, which is often not the case when we throw a traceback at them, let alone if the error message is bad such as the regular expression example.

    I agree, though. It’s not a clear win on either side.

  11. oelewapperke says:

    The elephant in the room here is that of course Go doesn’t require correct error handling. Ignoring return codes is perfectly valid in Go.

    To test how good your own error handling is, take your nearest Go program that opens a socket. Any socket. Count the number of different error handling paths: do you switch between v4 and v6 (and back, in the case of a v6 lookup returning a mapped v4 address)? Count the number of paths through that code. Is it less than 8? Sorry to tell you, but you don’t handle errors correctly. When opening a listening socket, you are properly switching it to a dual-stack socket in all cases, right?

    (if you’re interested: read why : http://long.ccaba.upc.edu/long/045Guidelines/eva/ipv6.html#guidelines )

    Go doesn’t improve bad programmers who don’t have time to consider all the options. That’s impossible, it’s just partly masked by the fact that mostly only good programmers currently know Go. That’s either going to change (Go succeeds), or not (Go fails to gain traction).

    Then the Go error handling will merely have switched everyone’s code from “on error, abort in a recoverable manner”, to “on error continue”.

    I’m making the same argument as many have done before:

    close_nuclear_bunker()
    set_off_the_bomb()

    In most cases, recoverable abort (ie. exceptions) is the correct course of action, and this all just sounds like an excuse to bring back a C-ism (because I don’t find the “programmers will consider errors if we make them put them in a variable !” convincing, and I’m pretty sure people like the Go authors don’t find it convincing either). Also, it just happens to be the simple solution (because for a compiler the situation is reversed, C/Go-style error handling is much, much easier to compile than exceptions, for obvious reasons. Both languages, of course, found that they did have to implement at least 2 exception types, both introduced that later on. Every line of Go code with very few exceptions can lead to OOM error and there’s things like divide by zero and such).

    C is famous for having extremely brittle programs, and rightfully so. Bringing its error-handling style back is not a good thing. C is also famous for crashing in unpredictable ways with horrendously unclear error messages. And everybody who’s had a Go program with 10 goroutines crash on them knows that Go shares this characteristic too.

    It is a matter of time till Go shares C’s reputation. Or it could fail entirely I guess. I’d prefer it to succeed.

    Until then, can we please stop claiming that making programs continue on encountering error conditions somehow improves programmers ? Because it’s ridiculous.

  12. niemeyer says:

    > I’m making the same argument as many have done before:
    >
    > close_nuclear_bunker()
    > set_off_the_bomb()

    That’s very obviously broken code if you’re using an error result, and any code review would catch it even if the programmers (who should be fired in this specific case) left it off. Now, as described in the post, the curious thing is that I cannot tell if this is correct or not if it’s using exceptions. If I get an exception out of set_off_the_bomb, was the error handled internally to prevent collateral damage, or was it overlooked as the examples in the post demonstrate? The common practice of fixing lack of error handling once an exception shows up would go pretty poorly.

    All of this goes back to the points in the post: the advantage isn’t that Go forces you to do anything. Bad programmers can do bad programming in any language and with any error handling style. The point is that the established conventions encourage keeping the error path in mind, and that does make a great deal of difference in practice when people are actually trying to do a good job, which is my reality.

    > C is also famous for crashing in unpredictable ways with horrendously unclear error messages. And everybody who’s had a Go program with 10 goroutines crash on them knows that Go shares this characteristic too.

    Go actually shows a very clean traceback for all the 10 goroutines when it does so, in a much better fashion than any concurrent programming I’ve done before that I can remember.

    > Also, it just happens to be the simple solution (…) C/Go-style error handling is much, much easier to compile than exceptions. Every line of Go code with very few exceptions can lead to OOM error and there’s things like divide by zero and such).

    These are truly exceptional situations and are handled as panics which work similarly to exceptions, and can be recovered from, although that’s of course rarely effective for cases such as OOM. This also kills the “they did it like that because it’s simple” argument. It does have an exception-like mechanism, and it’s consciously not used for non-exceptional things for good reasons.

    > Until then, can we please stop claiming that making programs continue on encountering error conditions somehow improves programmers ? Because it’s ridiculous.

    This is ridiculous indeed, and you’re the first one to mention it.

  13. Anoop says:

    Good post.

    Having used C++/Java/Python for networking software (shipped as appliances) and cloud hosted services, what I have felt is that for small scripts where special error handling is not required the “parachute exception” is ok.

    But for any serious long-running software, Go’s error handling seems to be the right one, though it may appear slightly verbose. It is really good that the Go APIs clearly tell whether a particular function can fail and encourage handling it. Many times I miss this in Python and have had the experience of a process crashing (and being forced to put blind try:except: blocks at the top level). Temporary negative cases are pretty common in most networking software and need to be handled at the right place. The reason why appliance-based software is stressed in this comment is that applying a fix for 100 customers is not an easy or pleasant task. So an error handling paradigm (like Go’s) which pushes the programmer to think about and handle errors at the place where they happen is the right choice.

  14. Peter Recore says:

    This is not strictly about Exceptions vs error codes. Not all languages allow you to ignore exceptions like Python does. In some, (like Java) your code won’t compile if you don’t handle the appropriate exceptions. In the example:
    > close_nuclear_bunker()
    > set_off_the_bomb()
    If you didn’t catch “DoorJammedException” somewhere, your code won’t compile.

  15. niemeyer says:

    > In some, (like Java) your code won’t compile if you don’t handle the appropriate exceptions.

    Please see comments above.

  16. Evan Jones says:

    I think like many software engineering issues, there are (at least) two sides here, with fundamental tradeoffs. Error handling is one of the “hard problems” in software.

    My (admittedly limited) personal experience with writing some Go programs is that I *want* to crash on the first error by default, since I almost never know what the right thing to do is, at least not the first time I’m writing something. I’ve written my own “assert-like” function that panics to make this easier to implement (asserts are a different debate).

    However, I also agree that to make something more robust and reliable, you want to handle errors correctly. I do think that making recoverable errors more explicitly visible is good, and that Go’s mechanism seems to result in slightly less pain than checked exceptions.

    I do wish there was an easier way for me to get the “crash on error” behavior I want though. I also agree with James Henstridge: when you have “chained” errors, having a stack trace would make it easier to track down the ultimate source of the error.
