Death of goroutines under control

Certainly one of the reasons why many people are attracted to the Go language is its first-class concurrency aspects. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.

If you stay around listening to community conversations for a few days there’s a good chance you’ll hear someone proudly mentioning the tenet:

Do not communicate by sharing memory; instead, share memory by communicating.

There is a blog post on the topic, and also a code walk covering it.

That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that’s not exactly news. What I address in this post is an aspect of this design that remains open in Go today: the termination of background activity.

As an example, let’s build a purposefully simplistic goroutine that sends lines across a channel:

type LineReader struct {
        Ch chan string
        r  *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &LineReader{
                Ch: make(chan string),
                r:  bufio.NewReader(r),
        }
        go lr.loop()
        return lr
}

The type has a channel from which the client can consume lines, and an internal buffer
used to produce the lines efficiently. Then, we have a function that creates an initialized
reader, fires the reading loop, and returns. Nothing surprising there.

Now, let’s look at the loop itself:

func (lr *LineReader) loop() {
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        return
                }
                lr.Ch <- string(line)
        }
}

In the loop we’ll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.

There are two details related to the termination of this logic, though: first, the error information is dropped, and second, there’s no way to interrupt the procedure cleanly from outside. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking into account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.

The claim is not that this is something difficult to do, but rather that there isn’t today an idiom for handling these aspects in a simple and consistent way. Or maybe there wasn’t. The tomb package for Go is an experiment being released today in an attempt to address this problem.

The model is simple: a Tomb tracks whether one or more goroutines are alive, dying, or dead, and the death reason.

To understand that model, let’s see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:

type LineReader struct {
        Ch chan string
        r  *bufio.Reader
        t  tomb.Tomb
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &LineReader{
                Ch: make(chan string),
                r:  bufio.NewReader(r),
        }
        lr.t.Go(lr.loop)
        return lr
}

Looks very similar. Just a new field in the struct, and the goroutine creation is now delegated to the tomb.

Next, the loop function is modified to support tracking of errors and interruptions:

func (lr *LineReader) loop() error {
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        return err
                }
                select {
                case lr.Ch <- string(line):
                case <-lr.t.Dying():
                        close(lr.Ch)
                        return nil
                }
        }
}

Note a few interesting points here: first, there’s now an error result, as is conventional for any Go function or method that can fail. Then, the previously dropped error is now returned, flagging the reason for the goroutine termination. Finally, the channel send was tweaked so that it doesn’t block in case the goroutine is dying for whatever reason.

A Tomb has both Dying and Dead channels returned by the respective methods, which are closed when the Tomb state changes accordingly. These channels make it possible to block explicitly until the state changes, and also to selectively unblock select statements in those cases, as done above.

With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:

func (lr *LineReader) Stop() error {
        lr.t.Kill(nil)
        return lr.t.Wait()
}

In this case the Kill method will put the tomb in a dying state from outside the running goroutine, and Wait will block until the goroutine terminates itself by returning. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Kill with an actual error is recorded as the cause for the goroutine death. The nil value provided to t.Kill is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.

This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what Erlang does with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.

The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, inspired by existing idioms. If you want to make use of it, go get the package with:

$ go get gopkg.in/tomb.v2

The source code and API documentation with more usage details are available at the same URL.

Have fun!

UPDATE 1: there was a minor simplification in the API since this post was originally written, and the post was changed accordingly.

UPDATE 2: there was a second simplification in the API, and the post was changed accordingly.

UPDATE 3: there was a third improvement in the API, and the post was once again changed to serve as reference.


8 Responses to Death of goroutines under control

  1. Alex Plugaru says:

    Gustavo thanks for this package, I wondered how to track the state of goroutines myself and stumbled upon a gonuts post that explained how to do it with channels, but this is better.

  2. Sindre Myren says:

    Nice I guess. But can this not also be done by WaitGroups (Found in package sync)?
    http://golang.org/pkg/sync/

  3. Pavel Korotkov says:

    Gustavo, you do really great job! All the packages you share with community are top notch and extremely useful in practice. The idea to idiomatically shape a transparent goroutine life-cycle control is timely and important to new Go programmers who only become a skilled hand at Go designing and applying Go patterns. Just make a couple of casual remarks on the post.
    1. Why did you export the Ch channel in the example? I’d hide it from the consumer to prevent unwanted closings from outside.
    2. I’d also replace Fatal(…) and Fatalf(…) methods with a single *status*-prominent one:
    // NB: At least either note or reason must be given by a caller
    SetDying(note string, reason os.Error)
    which I believe looks more reasonable in code.

    Best regards,
    PK

  4. Sindre, I was also involved in the implementation of WaitGroup. If you read both this post and the WaitGroup documentation you’ll notice why they’re different.

    Pavel, thanks for the kind words. About (1), as the post explains the example was purposefully simplistic. Regarding (2), Fatal/Fatalf still feels cleaner.

  5. Hi Gustavo,

    inspired by the Erlang supervisors I added something almost similar to my Tideland Common Go Library. Take a look at http://code.google.com/p/tideland-cgl/source/browse/cglsup.go. But my solution only handles errors in goroutines. And an external Supervisor monitors any instance implementing the Recoverable interface.

    I’ll take a deeper look into your solution.

    Warm regards,
    mue

  6. Mikhail says:

    The loop function after tweaking for Tomb support can benefit from moving close(lr.Ch) to defer statement:

    defer lr.t.Done()
    defer close(lr.Ch)

  7. niemeyer says:

    Mikhail,

    Indeed, and that’s how I generally use it in those cases too. I’ve intentionally avoided it in the explanation because it makes the evolution of the description simpler, allowing better focus on the problem being solved.

  8. Michael Meier says:

    Hey Gustavo.

    Thank you for this very useful and lucidly designed gem of a library.

    I use it regularly to manage the lifecycle of goroutines. When doing non-challenge-response communication, I often spawn two goroutines per connection. One is for writing to, the other for reading from the connection. Managing them, their errors and how alive exactly they are often became a very complex task. With tomb such management logic is concise, revealing what I wanted to do in the first place.

    Cheers,
    Michael
