Certainly one of the reasons why many people are attracted to the Go language is its first-class support for concurrency. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.
If you stay around listening to community conversations for a few days there’s a good chance you’ll hear someone proudly mentioning the tenet:
Do not communicate by sharing memory; instead, share memory by communicating.
There is a blog post on the topic, and also a code walk covering it.
That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that’s not exactly news. What I address in this post is an open aspect we have today in Go related to this design: the termination of background activity.
As an example, let’s build a purposefully simplistic goroutine that sends lines across a channel:
type LineReader struct {
	Ch chan string
	r  *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
	lr := &LineReader{make(chan string), bufio.NewReader(r)}
	go lr.loop()
	return lr
}
The type has a channel from which the client can consume lines, and an internal buffer used to produce the lines efficiently. Then, we have a function that creates an initialized reader, fires the reading loop, and returns. Nothing surprising there.
Now, let’s look at the loop itself:
func (lr *LineReader) loop() {
	for {
		line, err := lr.r.ReadSlice('\n')
		if err != nil {
			close(lr.Ch)
			return
		}
		lr.Ch <- string(line)
	}
}
In the loop we'll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.
There are two details related to the termination of this logic, though: first, the error information is being dropped, and second, there's no way to interrupt the procedure from outside in a clean way. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking into account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.
I'm not claiming this is something difficult to do, by any means. What I'm saying is that there isn't today an idiom for handling these aspects in a simple and consistent way. Or maybe there wasn't. The tomb package for Go is an experiment I'm releasing today in an attempt to address this problem.
The model is simple: a Tomb tracks whether the goroutine is alive, dying, or dead, and the death reason.
To understand that model, let's see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:
type LineReader struct {
	Ch chan string
	r  *bufio.Reader
	*tomb.Tomb
}

func NewLineReader(r io.Reader) *LineReader {
	lr := &LineReader{
		make(chan string),
		bufio.NewReader(r),
		tomb.New(),
	}
	go lr.loop()
	return lr
}
Looks very similar. Just a new field in the struct and its respective initialization. We've used it as an embedded field just so we can call the Tomb methods directly on the lr variable.
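For readers less familiar with embedding, this small sketch shows the mechanism in isolation. The tracker and reader types are hypothetical stand-ins invented for the illustration; tracker plays the role of the embedded *tomb.Tomb.

```go
package main

import "fmt"

// tracker is a hypothetical type standing in for *tomb.Tomb.
type tracker struct{ done bool }

// Done records that the tracked activity has finished.
func (t *tracker) Done() { t.done = true }

// reader embeds *tracker, so tracker's methods and fields are
// promoted and can be used directly on a reader value.
type reader struct {
	name string
	*tracker
}

func main() {
	r := &reader{"lines", &tracker{}}
	r.Done()            // shorthand for r.tracker.Done()
	fmt.Println(r.done) // true
}
```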
Next, the loop function is modified to support tracking of errors and interruptions:
func (lr *LineReader) loop() {
	defer lr.Done()
	for {
		line, err := lr.r.ReadSlice('\n')
		if err != nil {
			close(lr.Ch)
			lr.Fatal(err)
			return
		}
		select {
		case lr.Ch <- string(line):
		case <-lr.Dying:
			close(lr.Ch)
			return
		}
	}
}
Note a few interesting points here: first, Done is called to track the goroutine termination right before the loop function returns. Then, the previously loose error now goes into the Fatal Tomb method, flagging the goroutine as dying. Finally, the channel send was tweaked so that it doesn't block in case the goroutine is dying for whatever reason.
A Tomb has both Dying and Dead channels, which are closed when the Tomb state changes accordingly. These channels make it possible to block explicitly until the state changes, and also to selectively unblock select statements in those cases, as done above.
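The reason a closed channel works for this is that a receive from a closed channel never blocks. Here is a minimal sketch of the select pattern on its own, using only plain channels; the trySend helper is a hypothetical name, not something the tomb package provides.

```go
package main

import "fmt"

// trySend attempts to deliver v on ch, giving up as soon as dying is
// closed. It reports whether the value was actually delivered.
func trySend(ch chan<- string, v string, dying <-chan struct{}) bool {
	select {
	case ch <- v:
		return true
	case <-dying:
		return false
	}
}

func main() {
	ch := make(chan string) // unbuffered, and nobody is receiving
	dying := make(chan struct{})
	result := make(chan bool)

	go func() { result <- trySend(ch, "line\n", dying) }()

	// Closing the channel unblocks every pending and future receive on it.
	close(dying)
	fmt.Println(<-result) // false
}
```

Since closing wakes up every receiver at once, a single close works as a broadcast to any number of goroutines waiting on the same channel.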
With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:
func (lr *LineReader) Stop() os.Error {
	lr.Fatal(tomb.Stop)
	return lr.Wait()
}
In this case the Fatal method will put the goroutine in a dying state from outside, and Wait will block until the goroutine terminates itself and notifies via the Done method as seen before. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Fatal with an actual error is recorded as the cause for the goroutine death. The tomb.Stop value is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.
(UPDATE: there was a minor simplification in the API since this post was originally written, and the paragraph above was adapted to cover the new API)
This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what Erlang does with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.
The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, with familiar method names inspired by existing idioms. If you want to make use of it, goinstall the package with:
$ goinstall launchpad.net/tomb
The API documentation with details is available at:
Have fun!
Gustavo thanks for this package, I wondered how to track the state of goroutines myself and stumbled upon a gonuts post that explained how to do it with channels, but this is better.
Nice I guess. But can this not also be done by WaitGroups (Found in package sync)?
http://golang.org/pkg/sync/
Gustavo, you do a really great job! All the packages you share with the community are top notch and extremely useful in practice. The idea of idiomatically shaping transparent goroutine life-cycle control is timely and important to new Go programmers, who only become skilled at Go by designing and applying Go patterns. Just a couple of casual remarks on the post.
1. Why did you export the Ch channel in the example? I’d hide it from the consumer to prevent unwanted closings from outside.
2. I’d also replace Fatal(…) and Fatalf(…) methods with a single *status*-prominent one:
// NB: At least either note or reason must be given by a caller
SetDying(note string, reason os.Error)
which I believe looks more reasonable in code.
Best regards,
PK
Sindre, I was also involved in the implementation of WaitGroup. If you read both this post and the WaitGroup documentation you’ll notice why they’re different.
Pavel, thanks for the kind words. About (1), as the post explains the example was purposefully simplistic. Regarding (2), Fatal/Fatalf still feels cleaner.
Hi Gustavo,
inspired by the Erlang supervisors I added something quite similar to my Tideland Common Go Library. Take a look at http://code.google.com/p/tideland-cgl/source/browse/cglsup.go. But my solution only handles errors in goroutines. And an external Supervisor monitors any instance implementing the Recoverable interface.
I’ll take a deeper look into your solution.
Warm regards,
mue