Good concurrency changes the game

Long before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, I already knew how the mechanics of these systems worked and why people benefited from them. Even so, it wasn’t until I actually started using DVCS tools that I understood how much my daily workflow around code bases would change and improve.

This weekend, while flying home from MongoSV, I experienced that same feeling in relation to first-class concurrency support in programming languages. Everybody knows how the feature may be used, but I have the feeling that until one actually experiences it in practice, it’s very hard to appreciate how much one’s relationship with ordering while developing software may improve.

I was having some fun working on improvements to Goetveld. This package allows Go programs to communicate with Rietveld servers to manipulate code review entries. The Rietveld API is a bit rough in a few places, and as a result some features of the package actually parse an HTML form to extract some data before sending it back. You may have done something similar before while attempting to script a web site that wasn’t originally intended to be scripted.

The interesting fact here is that this is an intrinsically serial procedure: load a form, change it, and send it back, right? Well, not really. As one might intuitively expect, establishing an SSL session and its underlying TCP connection are not instantaneous operations.

To give an idea, here is part of a dump of an SSL connection being initiated (that is, no HTTP data was sent yet) to codereview.appspot.com, originated from my home location:

# tcpdump -ttttt -i wlan0 'host codereview.appspot.com and port 443'
(...)
00:00:00.000000 IP (...)
00:00:00.000063 IP (...)
00:00:00.000562 IP (...)
00:00:00.341627 IP (...)
00:00:00.357009 IP (...)
00:00:00.357118 IP (...)
00:00:00.360362 IP (...)
00:00:00.360550 IP (...)
00:00:00.366011 IP (...)
00:00:00.689446 IP (...)
00:00:00.727693 IP (...)

That’s more than half a second before the application layer was even touched. It turns out that, to save that roundtrip time, we can start both the form loading and the form sending requests at the same time. Once the form loading ends, processing the data locally is extremely fast, and we can complete the sending side by just providing the request body.
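
To make the saving concrete, here is a minimal sketch that compares setting up two connections one after the other against setting them up at the same time. It does not touch the network: `connect` and `setupDelay` are hypothetical stand-ins for the TCP and SSL setup seen in the dump above, and the 50ms value is made up to keep the example quick.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// setupDelay is a made-up stand-in for TCP plus SSL connection setup.
const setupDelay = 50 * time.Millisecond

// connect simulates establishing a connection; no real network is used.
func connect() { time.Sleep(setupDelay) }

// serial sets up the two connections one after the other.
func serial() time.Duration {
	start := time.Now()
	connect()
	connect()
	return time.Since(start)
}

// concurrent sets up both connections at the same time and waits for both.
func concurrent() time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			connect()
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("serial:    ", serial().Round(time.Millisecond))
	fmt.Println("concurrent:", concurrent().Round(time.Millisecond))
}
```

Under these assumptions the serial version pays the setup delay twice, while the concurrent one pays it roughly once.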

At this point you may be thinking something like “Ugh, that’s too much trouble... why bother?”, and that highlights precisely the point I’d like to make: it is too much trouble because most people are used to languages that turn it into too much trouble, but the issue is not inherently complex. In fact, this is the entire implementation of this logic in Go:

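func (r *Rietveld) UpdateIssue(issue *Issue) error {
        op := &opInfo{r: r, issue: issue}
        // errs receives one result from each of the two requests.
        errs := make(chan error)
        // ch carries the form from the loading request to the sending request.
        ch := make(chan map[string]string, 1)
        go func() {
                errs <- r.do(&editLoadHandler{op: op, form: ch})
                // Closing ch unblocks the sending side even if the form
                // was never delivered.
                close(ch)
        }()
        go func() {
                errs <- r.do(&editHandler{op: op, form: ch})
        }()
        return firstError(2, errs)
}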

I'm not cheating. The procedure was being done serially before, with very similar logic. Previously it had to take the form variable itself from the first request and manually provide it to the next one. Now, instead of providing the form, it's providing a channel that will be used to send the form across. Curiously, one might even argue that the channel makes the algorithm more natural.
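
For readers who want to run the pattern in isolation, here is a self-contained sketch of it. This is not Goetveld's code: `loadForm`, `sendForm` and `update` are hypothetical stand-ins for the two requests, and `firstError` is only my guess at what such a helper could look like, since its implementation is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// firstError receives n results from errs and returns the first non-nil
// one. It drains all n values so that no sender stays blocked.
// Hypothetical: the real helper is not shown in this post.
func firstError(n int, errs <-chan error) error {
	var first error
	for i := 0; i < n; i++ {
		if err := <-errs; err != nil && first == nil {
			first = err
		}
	}
	return first
}

// loadForm stands in for the request that fetches and parses the form.
func loadForm(form chan<- map[string]string, fail bool) error {
	defer close(form) // always unblock the receiver, even on failure
	if fail {
		return errors.New("load failed")
	}
	form <- map[string]string{"subject": "hello"}
	return nil
}

// sendForm stands in for the request that posts the form back.
func sendForm(form <-chan map[string]string) error {
	f, ok := <-form
	if !ok {
		// The channel was closed without a value being sent.
		return errors.New("no form delivered")
	}
	_ = f // a real request would encode f as the request body here
	return nil
}

// update runs both sides concurrently and reports the first failure.
func update(fail bool) error {
	errs := make(chan error)
	form := make(chan map[string]string, 1)
	go func() { errs <- loadForm(form, fail) }()
	go func() { errs <- sendForm(form) }()
	return firstError(2, errs)
}

func main() {
	fmt.Println("success case:", update(false))
	fmt.Println("failure case:", update(true))
}
```

In the failure case both goroutines report an error, and which of the two is returned depends on scheduling. The key detail is that a receive on a closed channel returns immediately, so the sending side never hangs waiting for a form that will not arrive.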

This is the kind of procedure that becomes fun and natural to write after having first-class concurrency at hand for some time. But, as in the case of DVCS, it takes a while to get used to the idea that concurrency and simplicity are not necessarily at opposing ends.

5 thoughts on “Good concurrency changes the game”

  1. Antti Rasinen

    Hi!

    As a casual observer I must admit that the example does not make much sense to me. Even with the descriptions I could not follow the logic. May I ask for some clarification?

    First question: Why does one of the goroutines close ch? The other goroutine seems to use it as well. Doesn’t this cause a race condition?

    Second question: Where exactly do you begin the response? Is it even in the code snippet? Nothing seems to read from errs, except firstError. That function, by its name, does not suggest anything responssy.

    Can you perhaps provide the earlier version? That might help to clarify the data flow somewhat.

    Thank you in advance.

  2. Gustavo Niemeyer Post author

    Hi Antti,

    The first goroutine closes ch before returning so that the second one is guaranteed to unblock if the first one fails early (in which case the form might never be delivered).

    The handler types (editHandler and editLoadHandler) have a method responsible for building up the request, and another one for processing the response.

    A serial version of this logic looks something like this:

    func (r *Rietveld) UpdateIssue(issue *Issue) error {
            op := &opInfo{r: r, issue: issue}
            form := make(map[string]string)
            err := r.do(&editLoadHandler{op: op, form: form})
            if err != nil {
                    return err
            }
            return r.do(&editHandler{op: op, form: form})
    }
    

    I hope the picture is more clear.

  3. Antti Rasinen

    Thank you! This was most enlightening.

    I must admit that I found the serial version easier to read and understand. I still live in the single-core mindset, I suppose.

    The goroutined version naturally does look very simple for a concurrent solution.

  4. Gustavo Niemeyer Post author

    Antti,

    It’s comparatively simpler indeed, and more usual for people not familiar with Go syntax and semantics. The point made is that the concurrent version is trivial as well, assuming familiarity with the syntax and semantics of the language.

  5. Elazar Leibovich

    Can’t you get very similar code when using Java with Futures? Push all the futures into a list, and get the errors from them (if any). The code will look very similar to your Go code.

    I’m not saying that Go’s concurrency isn’t good, I’m just not sure your particular example shows it.
