<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Labix Blog &#187; Architecture</title>
	<atom:link href="http://blog.labix.org/tag/architecture/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.labix.org</link>
	<description>by Gustavo Niemeyer</description>
	<lastBuildDate>Mon, 16 Jan 2012 04:02:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Good concurrency changes the game</title>
		<link>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game</link>
		<comments>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game#comments</comments>
		<pubDate>Mon, 12 Dec 2011 17:52:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=886</guid>
		<description><![CDATA[A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited &#8230; <a href="http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited from them. That said, it wasn&#8217;t until I indeed started to use DVCS tools that I understood how much my daily workflow around code bases would be changed and improved.</p>
<p><span id="more-886"></span>This weekend, while flying home from <a href="http://www.10gen.com/events/mongosv-2011">MongoSV</a>, I experienced that same feeling in relation to first-class concurrency support in programming languages. Everybody knows how the feature may be used, but I have the feeling that until one actually experiences it in practice, it&#8217;s very hard to really understand how much one&#8217;s relationship with the ordering of operations while developing software may be improved.</p>
<p>I was having some fun working on improvements to <a href="https://wiki.ubuntu.com/goetveld">Goetveld</a>. This package allows <a href="http://golang.org">Go</a> programs to communicate with <a href="https://codereview.appspot.com">Rietveld</a> servers to manipulate code review entries. The Rietveld API is a bit rough in a few places, and as a result some features of the package actually parse an HTML form to extract some data, before sending it back. You may have done something similar before while attempting to script a web site that wasn&#8217;t originally intended to be scripted.</p>
<p>The interesting fact here is that this is an intrinsically serial procedure: load a form, change it, and send it back, right? Well, not really. As one might intuitively expect, establishing an SSL session and its underlying TCP connection are not instantaneous operations.</p>
<p>To give an idea, here is part of a dump of an SSL connection being <i>initiated</i> (that is, no HTTP data was sent yet) to codereview.appspot.com, originated from my home location:</p>
<pre>
# tcpdump -ttttt -i wlan0 'host codereview.appspot.com and port 443'
(...)
00:00:00.000000 IP (...)
00:00:00.000063 IP (...)
00:00:00.000562 IP (...)
00:00:00.341627 IP (...)
00:00:00.357009 IP (...)
00:00:00.357118 IP (...)
00:00:00.360362 IP (...)
00:00:00.360550 IP (...)
00:00:00.366011 IP (...)
00:00:00.689446 IP (...)
00:00:00.727693 IP (...)
</pre>
<p>That&#8217;s more than half a second before the application layer was even touched. It turns out that, to save that round-trip time, we can start <i>both</i> the form loading and the form sending requests <i>at the same time</i>. By the time the form loading ends, processing the data locally is extremely fast, and we can complete the sending side by just providing the request body.</p>
<p>At this time you may be thinking something like <i>&#8220;Ugh, that&#8217;s too much trouble.. why bother?&#8221;</i>, and that highlights precisely the point I&#8217;d like to make: it is too much trouble because most people are used to languages that <i>turn</i> it into too much trouble, but the issue is not inherently complex. In fact, this is the entire implementation of this logic in Go:</p>
<pre>
func (r *Rietveld) UpdateIssue(issue *Issue) error {
        op := &#038;opInfo{r: r, issue: issue}
        errs := make(chan error)
        ch := make(chan map[string]string, 1)
        go func() {
                errs <- r.do(&#038;editLoadHandler{op: op, form: ch})
                close(ch)
        }()
        go func() {
                errs <- r.do(&#038;editHandler{op: op, form: ch})
        }()
        return firstError(2, errs)
}
</pre>
<p>I'm not cheating. The procedure was being done serially before, with very similar logic. Previously it had to take the form variable itself from the first request and manually provide it to the next one. Now, instead of providing the form, it's providing a channel that will be used to send the form across.  One might even argue that the channel makes the algorithm <i>more natural</i>, curiously.</p>
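<p>The post does not show <i>firstError</i> or the two handlers, so the following is only a minimal sketch of how the same shape might look in a self-contained program. Everything in it is an assumption made for illustration: the <i>load</i> and <i>send</i> callbacks stand in for the two HTTP requests, <i>update</i> stands in for UpdateIssue, and this <i>firstError</i> is a guess at what such a helper could do, not Goetveld&#8217;s actual code.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// errLoadFailed is a made-up error used only by this demo.
var errLoadFailed = errors.New("load failed")

// firstError receives n results from errs and returns the first non-nil one.
// It always drains all n values so that no sending goroutine stays blocked.
func firstError(n int, errs <-chan error) error {
	var first error
	for i := 0; i < n; i++ {
		if err := <-errs; err != nil && first == nil {
			first = err
		}
	}
	return first
}

// update mimics the shape of UpdateIssue: both sides start together, and the
// sending side blocks on the form channel only when it needs the body.
func update(load func() (map[string]string, error), send func(map[string]string) error) error {
	errs := make(chan error)
	ch := make(chan map[string]string, 1)
	go func() {
		form, err := load()
		if err == nil {
			ch <- form
		}
		close(ch)
		errs <- err
	}()
	go func() {
		// A real sender would establish its connection here, before the form arrives.
		form, ok := <-ch
		if !ok {
			errs <- nil // nothing to send; the loading side reports the failure
			return
		}
		errs <- send(form)
	}()
	return firstError(2, errs)
}

func main() {
	send := func(form map[string]string) error { return nil }
	ok := func() (map[string]string, error) {
		return map[string]string{"subject": "demo"}, nil
	}
	bad := func() (map[string]string, error) { return nil, errLoadFailed }
	fmt.Println(update(ok, send))  // <nil>
	fmt.Println(update(bad, send)) // load failed
}
```

<p>Draining all results in the helper matters here because the error channel is unbuffered: returning after the first error would leave the other goroutine blocked on its send forever.</p>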
<p>This is the kind of procedure that becomes fun and natural to write, after having first class concurrency at hand for some time. But, as in the case of DVCS, it takes a while to get used to the idea that concurrency and simplicity are not necessarily at opposing ends.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Death of goroutines under control</title>
		<link>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control</link>
		<comments>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control#comments</comments>
		<pubDate>Sun, 09 Oct 2011 19:53:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=717</guid>
		<description><![CDATA[Certainly one of the reasons why many people are attracted to the Go language is its first-class concurrency aspects. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are &#8230; <a href="http://blog.labix.org/2011/10/09/death-of-goroutines-under-control">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Certainly one of the reasons why many people are attracted to the <a href="http://golang.org">Go</a> language is its first-class concurrency aspects. Features like communication channels, lightweight processes (<i>goroutines</i>), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.</p>
<p><span id="more-717"></span>If you stay around listening to community conversations for a few days there&#8217;s a good chance you&#8217;ll hear someone proudly mentioning the tenet:</p>
<blockquote><p>Do not communicate by sharing memory; instead, share memory by communicating.</p></blockquote>
<p>There is a <a href="http://blog.golang.org/2010/07/share-memory-by-communicating.html">blog post</a> on the topic, and also a <a href="http://golang.org/doc/codewalk/sharemem/">code walk</a> covering it.</p>
<p>That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that&#8217;s not exactly news. What I address in this post is an open aspect we have today in Go related to this design: the <i>termination</i> of background activity.</p>
<p>As an example, let&#8217;s build a purposefully simplistic goroutine that sends lines across a channel:</p>
<pre>
type LineReader struct {
        Ch chan string
        r *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &#038;LineReader{make(chan string), bufio.NewReader(r)}
        go lr.loop()
        return lr
}
</pre>
<p>The type has a channel from which the client can consume lines, and an internal buffer used to produce the lines efficiently. Then, we have a function that creates an initialized reader, fires the reading loop, and returns. Nothing surprising there.</p>
<p>Now, let&#8217;s look at the loop itself:</p>
<pre>
func (lr *LineReader) loop() {
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        return
                }
                lr.Ch <- string(line)
        }
}
</pre>
<p>In the loop we'll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.</p>
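<p>Put together, the two snippets above form a small runnable program. The sketch below repeats them with the imports they need and adds a consumer; the <i>collectString</i> helper and the sample input are assumptions added purely for illustration.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

type LineReader struct {
	Ch chan string
	r  *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
	lr := &LineReader{make(chan string), bufio.NewReader(r)}
	go lr.loop()
	return lr
}

func (lr *LineReader) loop() {
	for {
		line, err := lr.r.ReadSlice('\n')
		if err != nil {
			// Any error, including io.EOF, ends the stream silently.
			close(lr.Ch)
			return
		}
		// string(line) copies the bytes, which is required because
		// ReadSlice reuses its internal buffer on the next call.
		lr.Ch <- string(line)
	}
}

// collectString is a hypothetical helper: it drains the reader's channel
// until the loop closes it and returns every line received.
func collectString(s string) []string {
	var lines []string
	for line := range NewLineReader(strings.NewReader(s)).Ch {
		lines = append(lines, line)
	}
	return lines
}

func main() {
	fmt.Printf("%q\n", collectString("one\ntwo\n")) // ["one\n" "two\n"]
}
```

<p>Ranging over the channel is enough to consume everything, since the loop closes it on the way out. The consumer has no way to tell a clean end of input from a failure, though.</p>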
<p>There are two details related to the termination of this logic, though: first, the error information is being dropped, and second, there's no way to interrupt the procedure from outside in a clean way. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking into account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.</p>
<p>I'm not claiming this is something <i>difficult</i> to do, by any means.  What I'm saying is that there isn't today an <i>idiom</i> for handling these aspects in a simple and consistent way. Or maybe there wasn't. The <i>tomb</i> package for Go is an experiment I'm releasing today in an attempt to address this problem.</p>
<p>The model is simple: a <i>Tomb</i> tracks whether the goroutine is alive, dying, or dead, and the death reason.</p>
<p>To understand that model, let's see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:</p>
<pre>
type LineReader struct {
        Ch chan string
        r *bufio.Reader
        <span style="color: blue">*tomb.Tomb</span>
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &#038;LineReader{
                make(chan string),
                bufio.NewReader(r),
                <span style="color: blue">tomb.New(),</span>
        }
        go lr.loop()
        return lr
}
</pre>
<p>Looks very similar. Just a new field in the struct and its respective initialization. We've used it as an embedded field just so we can call the Tomb methods directly on the <i>lr</i> variable.</p>
<p>Next, the loop function is modified to support tracking of errors and interruptions:</p>
<pre>
func (lr *LineReader) loop() {
        <span style="color: blue">defer lr.Done()</span>
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        <span style="color: blue">lr.Fatal(err)</span>
                        return
                }
                select {
                case lr.Ch <- string(line):
                <span style="color: blue">case <-lr.Dying:</span>
                        close(lr.Ch)
                        return
                }
        }
}
</pre>
<p>Note a few interesting points here: first, <i>Done</i> is called to track the goroutine termination right before the loop function returns. Then, the previously loose error now goes into the <i>Fatal</i> Tomb method, flagging the goroutine as dying. Finally, the channel send was tweaked so that it doesn't block in case the goroutine is dying for whatever reason.</p>
<p>A Tomb has both <i>Dying</i> and <i>Dead</i> channels, which are closed when the Tomb state changes accordingly. These channels make it possible to block explicitly until the state changes, and also to unblock select statements when that happens, as done above.</p>
<p>With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:</p>
<pre>
func (lr *LineReader) Stop() os.Error {
        <span style="color: blue">lr.Fatal(tomb.Stop)</span>
        return <span style="color: blue">lr.Wait()</span>
}
</pre>
<p>In this case the <i>Fatal</i> method will put the goroutine in a dying state from outside, and <i>Wait</i> will block until the goroutine terminates itself and notifies via the <i>Done</i> method as seen before. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Fatal with an actual error is recorded as the cause for the goroutine death. The <i>tomb.Stop</i> value is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.</p>
<p>(<b>UPDATE:</b> there was <a href="http://groups.google.com/group/golang-nuts/browse_thread/thread/383f7cabbb174460">a minor simplification</a> in the API since this post was originally written, and the paragraph above was adapted to cover the new API)</p>
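<p>To make the state tracking described above more concrete, here is a rough sketch of how such a type could be built from a mutex and two channels. It is <b>not</b> the tomb package&#8217;s implementation: <i>miniTomb</i>, <i>errStop</i>, <i>errReadFailed</i> and <i>count</i> are names invented for this example, and the only behaviour it tries to mirror is what the post states (Dying and Dead channels, the first actual error wins, a clean stop makes Wait return nil).</p>

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errStop       = errors.New("clean stop") // plays the role of tomb.Stop
	errReadFailed = errors.New("read failed") // made-up failure for the demo
)

// miniTomb is a hypothetical stand-in for the Tomb type described in the post.
type miniTomb struct {
	Dying  chan struct{} // closed on the first call to Fatal
	Dead   chan struct{} // closed by Done
	mu     sync.Mutex
	reason error
}

func newMiniTomb() *miniTomb {
	return &miniTomb{Dying: make(chan struct{}), Dead: make(chan struct{})}
}

// Fatal flags the goroutine as dying. Only the first actual error is kept;
// errStop is recorded only if nothing else was, and may be replaced later.
func (t *miniTomb) Fatal(reason error) {
	if reason == nil {
		reason = errStop
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.reason == nil {
		close(t.Dying)
	}
	if t.reason == nil || (t.reason == errStop && reason != errStop) {
		t.reason = reason
	}
}

// Done marks the goroutine as dead. It must be called exactly once.
func (t *miniTomb) Done() {
	t.Fatal(errStop) // a goroutine that simply returns died cleanly
	close(t.Dead)
}

// Wait blocks until Done was called and returns the reason, or nil on a clean stop.
func (t *miniTomb) Wait() error {
	<-t.Dead
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.reason == errStop {
		return nil
	}
	return t.reason
}

// count sends increasing integers until it is asked to die.
func count(t *miniTomb, out chan<- int) {
	defer t.Done()
	for i := 0; ; i++ {
		select {
		case out <- i:
		case <-t.Dying:
			return
		}
	}
}

func main() {
	t := newMiniTomb()
	out := make(chan int)
	go count(t, out)
	fmt.Println(<-out, <-out) // 0 1
	t.Fatal(errStop)          // what a Stop method would do
	fmt.Println(t.Wait())     // <nil>

	failed := newMiniTomb()
	failed.Fatal(errReadFailed)
	failed.Done()
	fmt.Println(failed.Wait()) // read failed
}
```

<p>Closing a channel rather than sending on it is what lets any number of goroutines observe the state change, and what keeps a select on Dying from ever blocking once the goroutine is dying.</p>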
<p>This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what <a href="http://www.erlang.org/doc/reference_manual/processes.html">Erlang does</a> with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.</p>
<p>The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, with familiar method names inspired by existing idioms. If you want to make use of it, goinstall the package with:</p>
<pre>
$ goinstall launchpad.net/tomb
</pre>
<p>The API documentation with details is available at:</p>
<p><span style="padding-left: 2em;"><a href="http://goneat.org/lp/tomb">http://goneat.org/lp/tomb</a></span></p>
<p>Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Ensemble, Go, and MongoDB at Canonical</title>
		<link>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical</link>
		<comments>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical#comments</comments>
		<pubDate>Fri, 05 Aug 2011 03:49:14 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=706</guid>
		<description><![CDATA[About 1 year after development started in Ensemble, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we&#8217;ve been jointly &#8230; <a href="http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>About 1 year after development started in <a href="https://ensemble.ubuntu.com">Ensemble</a>, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we&#8217;ve been jointly planning for a long time. This is the system that the Ensemble client will communicate with for discovering which <a href="https://ensemble.ubuntu.com/docs/formula.html">formulas</a> are available, for publishing new formulas, for obtaining formula files for deployment, and so on.</p>
<p><span id="more-706"></span>We of course would have liked for this part of the project to have been specified and written a while ago, but unfortunately that wasn&#8217;t possible for several reasons. That said, there are also good sides to having an important piece flying around in minds and conversations for such a long time: sitting down to specify the system and describe the inner-working details has been a breeze. Even details such as the namespacing of formulas, which hadn&#8217;t been entirely clear in my mind, were just streamed into the document as the ideas we&#8217;ve been evolving finally came together in written form.</p>
<p>One curious detail: this is the first long term project at <a href="https://www.canonical.com">Canonical</a> that will be developed in <a href="http://golang.org">Go</a>, rather than Python or C/C++, which are the most used languages for projects within Canonical. Not only that, but we&#8217;ll also be using <a href="http://www.mongodb.org">MongoDB</a> for a change, rather than the traditional <a href="http://www.postgresql.com">PostgreSQL</a>, and will also use (you guessed) the <a href="http://labix.org/mgo">mgo driver</a> which I&#8217;ve been pushing entirely as a personal project for about 8 months now.</p>
<p>Naturally, with so many moving parts that are new to the company culture, this is still being seen as a closely watched experiment. Still, this makes me highly excited, because when I started developing mgo, the MongoDB driver for Go, my hopes that the Go, MongoDB, and mgo trio would eventually be used at Canonical were very low, precisely because they were all alien to the culture. We only got here after quite a lot of internal debate, experiments, and trust too.</p>
<p>All of that means these are happy times. Important feature in Ensemble being specified and written, very exciting tools, home grown software being useful..</p>
<p>Awesomeness.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Efficient algorithm for expanding circular buffers</title>
		<link>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers</link>
		<comments>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers#comments</comments>
		<pubDate>Thu, 23 Dec 2010 12:57:40 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=580</guid>
		<description><![CDATA[Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the &#8220;Hello world!&#8221; days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc. &#8230; <a href="http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the <i>&#8220;Hello world!&#8221;</i> days.  They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.</p>
<p>One feature which is not always desired, though, is the fact that circular buffers traditionally will either overwrite the oldest element, or raise an overflow error, since they are generally implemented as a buffer of <i>constant</i> size.  This is an unwanted property when one is attempting to <i>consume</i> items from the buffer and it is not an option to blindly drop items, for instance.</p>
<p>This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It&#8217;s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I&#8217;ll publish it below and let you decide.</p>
<p><span id="more-580"></span><b>Traditional circular buffers</b></p>
<p>Before introducing the variant which can actually expand during use, let&#8217;s go through a quick review on traditional circular buffers, so that we can then reuse the nomenclature when extending the concept.  All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.</p>
<p>So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur.  The following snippet demonstrates the concept in practice:</p>
<pre>
buf = [None, None, None, None, None]
bufcap = len(buf)
pushi = 0

for elem in range(7):
    buf[pushi] = elem
    pushi = (pushi + 1) % bufcap

print(buf) # => [5, 6, 2, 3, 4]
</pre>
<p>In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That&#8217;s the specific feature of circular buffers which the proposal in this post will offer an alternative for.</p>
<p>The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur.  Please note that these snippets are not necessarily idiomatic Python.  The intention is to highlight the algorithm itself.</p>
<pre>
class CircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.pushi = self.popi = 0
        self.bufcap = len(self.buf)

    def push(self, x):
        assert self.buflen == 0 or self.pushi != self.popi, \
               "Buffer overflow!"
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.popi = (self.popi + 1) % self.bufcap
        return x
</pre>
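<p>Since the concepts port to other languages, here is a rough Go rendition of the same traditional buffer. It is a sketch under a few assumptions of my choosing: elements are ints, overflow and underflow are reported as error values instead of assertions, and the names <i>Ring</i>, <i>NewRing</i>, <i>ErrOverflow</i> and <i>ErrUnderflow</i> are made up for the example.</p>

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrOverflow  = errors.New("buffer overflow")
	ErrUnderflow = errors.New("buffer underflow")
)

// Ring is a fixed-capacity FIFO of ints mirroring CircBuf above.
type Ring struct {
	buf         []int
	n           int // number of stored items (buflen in the Python version)
	pushI, popI int
}

func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]int, capacity)}
}

// Push stores x at the write position, or fails if every slot is in use.
func (r *Ring) Push(x int) error {
	if r.n == len(r.buf) {
		return ErrOverflow
	}
	r.buf[r.pushI] = x
	r.pushI = (r.pushI + 1) % len(r.buf)
	r.n++
	return nil
}

// Pop removes and returns the oldest item, or fails if the buffer is empty.
func (r *Ring) Pop() (int, error) {
	if r.n == 0 {
		return 0, ErrUnderflow
	}
	x := r.buf[r.popI]
	r.popI = (r.popI + 1) % len(r.buf)
	r.n--
	return x, nil
}

func main() {
	r := NewRing(2)
	r.Push(1)
	r.Push(2)
	fmt.Println(r.Push(3)) // buffer overflow
	x, _ := r.Pop()
	fmt.Println(x) // 1
}
```

<p>Comparing the item count against the capacity is equivalent to the assertion in the Python version: the two indexes only coincide when the buffer is either empty or full.</p>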
<p>With the basics covered, let&#8217;s look at how to extend this algorithm to support dynamic expansion in case of overflows.</p>
<p><b>Dynamically expanding a circular buffer</b></p>
<p>The approach consists in imagining that the same buffer can contain both a circular buffer area (referred to as <i>the ring area</i> from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again.  To clarify what this means, some examples are presented below.  The full algorithm will be presented afterwards.</p>
<p>First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:</p>
<pre>
for i in range(5):
    circbuf.push(i)

circbuf.pop() # => 0
circbuf.pop() # => 1

circbuf.push(5)
circbuf.push(6)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4</font>]
</pre>
<p>At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that those items will be appended at the end of the buffer.  Following the example, pushing two additional elements would behave the following way:</p>
<pre>
circbuf.push(7)
circbuf.push(8)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4,</font> <font style="color: red">7, 8</font>]
</pre>
<p>In that example, elements 7 and 8 are part of the overflow area, and the ring area remains with the same capacity and length of the original buffer. Let&#8217;s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:</p>
<pre>
circbuf.pop() # => 2
circbuf.pop() # => 3
circbuf.push(9)

print(circbuf.buf) # => [<font style="color: blue">5, 6,</font> None, None, <font style="color: blue">4,</font> <font style="color: red">7, 8, 9</font>]
</pre>
<p>In this case, even though there are two free slots available in the ring area, the last item pushed was still appended at the overflow area.  That&#8217;s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).</p>
<p>The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area.  This will happen when the content of the previous partial ring area is fully consumed, as shown below:</p>
<pre>
circbuf.pop() # => 4
circbuf.pop() # => 5
circbuf.pop() # => 6
circbuf.push(10)

print(circbuf.buf) # => [<font style="color: blue">10,</font> None, None, None, None, <font style="color: blue">7, 8, 9</font>]
</pre>
<p>At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.</p>
<p><b>Sample algorithm</b></p>
<p>With some simple modifications in the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur additional memory allocation to support the buffer expansion, but that&#8217;s inherent to the problem being solved.</p>
<pre>
class ExpandingCircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringlen == self.ringcap or \
           self.ringcap != self.bufcap:
            self.buf.append(x)
            self.buflen += 1
            self.bufcap += 1
            if self.pushi == 0: # Optimization.
                self.ringlen = self.buflen
                self.ringcap = self.bufcap
        else:
            self.buf[self.pushi] = x
            self.pushi = (self.pushi + 1) % self.ringcap
            self.buflen += 1
            self.ringlen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.pushi = 0
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p>Note that the above algorithm grows the list one element at a time, but in performance-sensitive situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation.  In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion.  Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.</p>
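<p>For readers who prefer Go, the one-slot-at-a-time ExpandingCircBuf class could be ported roughly as follows. This is only a sketch under assumptions of my own: elements are ints, a cleared slot holds zero rather than None (the counters, not the slot contents, say what is in use), Pop reports underflow with a boolean instead of an assertion, and the names <i>ExpandingRing</i> and <i>NewExpandingRing</i> are invented for the example. The main function replays the first part of the walk-through from the previous section.</p>

```go
package main

import "fmt"

// ExpandingRing is a Go port of ExpandingCircBuf, storing ints.
type ExpandingRing struct {
	buf             []int // len(buf) is the total capacity (bufcap)
	bufLen, ringLen int   // items in the whole buffer / in the ring area
	ringCap         int   // capacity of the ring area
	pushI, popI     int
}

func NewExpandingRing(capacity int) *ExpandingRing {
	return &ExpandingRing{buf: make([]int, capacity), ringCap: capacity}
}

func (r *ExpandingRing) Push(x int) {
	if r.ringLen == r.ringCap || r.ringCap != len(r.buf) {
		// Ring area full, or buffer already split: grow the overflow area.
		r.buf = append(r.buf, x)
		r.bufLen++
		if r.pushI == 0 {
			// The ring content is laid out in order, so the new slot
			// can be absorbed into the ring area right away.
			r.ringLen = r.bufLen
			r.ringCap = len(r.buf)
		}
		return
	}
	r.buf[r.pushI] = x
	r.pushI = (r.pushI + 1) % r.ringCap
	r.bufLen++
	r.ringLen++
}

// Pop returns the oldest item, or false if the buffer is empty.
func (r *ExpandingRing) Pop() (int, bool) {
	if r.bufLen == 0 {
		return 0, false
	}
	x := r.buf[r.popI]
	r.buf[r.popI] = 0
	r.bufLen--
	r.ringLen--
	if r.ringLen == 0 && r.bufLen != 0 {
		// The old ring area is drained: what was the overflow area becomes
		// the content of a ring spanning the whole buffer.
		r.popI = r.ringCap
		r.pushI = 0
		r.ringLen = r.bufLen
		r.ringCap = len(r.buf)
	} else {
		r.popI = (r.popI + 1) % r.ringCap
	}
	return x, true
}

func main() {
	r := NewExpandingRing(5)
	for i := 0; i < 5; i++ {
		r.Push(i)
	}
	r.Pop()
	r.Pop()
	for i := 5; i <= 8; i++ {
		r.Push(i)
	}
	fmt.Println(r.buf) // [5 6 2 3 4 7 8]
}
```

<p>Go&#8217;s append already over-allocates the backing array as a slice grows, so this port gets some amortization for free, but the ring logic itself still only learns about one new slot per push.</p>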
<p><b>UPDATE:</b> Below is shown a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates it in the overflow area immediately so that the allocated space is used optimally.</p>
<pre>
class ExpandingCircBuf2(object):

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringcap != self.bufcap:
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = False
            expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            expansion = [None, None, None]
            self.buf.extend(expansion)
            self.bufcap += len(expansion)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p><b>Conclusion</b></p>
<p>This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics.  When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Vector clock support for Go</title>
		<link>http://blog.labix.org/2010/12/21/vector-clock-support-for-go</link>
		<comments>http://blog.labix.org/2010/12/21/vector-clock-support-for-go#comments</comments>
		<pubDate>Tue, 21 Dec 2010 18:03:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=564</guid>
		<description><![CDATA[One more Go library oriented towards building distributed systems hot off the presses: govclock. This one offers full vector clock support for the Go language. Vector clocks allow recording and analyzing the inherent partial ordering of events in a distributed &#8230; <a href="http://blog.labix.org/2010/12/21/vector-clock-support-for-go">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One more Go library oriented towards building distributed systems hot off the presses: <a href="http://labix.org/govclock">govclock</a>. This one offers full <a href="http://en.wikipedia.org/wiki/Vector_clock">vector clock</a> support for the <a href="http://golang.org">Go language</a>.  Vector clocks allow recording and analyzing the inherent partial ordering of events in a distributed system in a comfortable way.</p>
<p>The following features are offered by govclock, in addition to basic event tracking:</p>
<p><span id="more-564"></span>
<ul>
<li>Compact serialization and deserialization
<li>Flexible truncation (min/max entries, min/max update time)
<li>Unit-independent update times
<li>Traditional merging
<li>Fast and memory efficient
</ul>
<p>If you&#8217;d like to know more about vector clocks, the Basho guys did a great job in the following pair of blog posts:</p>
<ul>
<li><a href="http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/">Why vector clocks are easy</a>
<li><a href="http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/">Why vector clocks are hard</a>
</ul>
<p>The following sample program demonstrates some sequential and concurrent events, dumping and loading, as well as merging of clocks.  For more details, please look at the <a href="http://labix.org/govclock">web page</a>.  The project is available under a BSD license.</p>
<pre>

package main

import (
    "launchpad.net/govclock"
    "fmt"
)

func main() {
    vc1 := govclock.New()
    vc1.Update([]byte("A"), 1)

    vc2 := vc1.Copy()
    vc2.Update([]byte("B"), 0)

    fmt.Println(vc2.Compare(vc1, govclock.Ancestor))   // => true
    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => true

    vc1.Update([]byte("C"), 5)

    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => false
    fmt.Println(vc1.Compare(vc2, govclock.Concurrent)) // => true

    vc2.Merge(vc1)

    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => true

    data := vc2.Bytes()
    fmt.Printf("%#v\n", string(data))
    // => "\x01\x01\x01\x01A\x01\x01\x01B\x01\x00\x01C"

    vc3, err := govclock.FromBytes(data)
    if err != nil { panic(err.String()) }

    fmt.Println(vc3.Compare(vc2, govclock.Equal))      // => true
}
</pre>
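<p>For readers who want to see the partial ordering itself, here is a minimal sketch of the comparison and merge logic behind a vector clock. It is not govclock's implementation: the <i>Clock</i>, <i>Compare</i> and <i>Merge</i> names and the plain map representation are assumptions made for illustration, and truncation and serialization are left out entirely.</p>

```go
package main

import "fmt"

// Clock maps a node identifier to the number of events seen from that node.
// (Hypothetical type for illustration; govclock uses its own representation.)
type Clock map[string]int

// Order describes how one clock relates to another.
type Order string

const (
	Equal      Order = "equal"
	Before     Order = "before"     // a is an ancestor of b
	After      Order = "after"      // a is a descendant of b
	Concurrent Order = "concurrent" // neither clock dominates the other
)

// Compare reports how a relates to b. Missing entries count as zero.
func Compare(a, b Clock) Order {
	aAhead, bAhead := false, false
	for id, n := range a {
		if n > b[id] {
			aAhead = true
		}
	}
	for id, n := range b {
		if n > a[id] {
			bAhead = true
		}
	}
	switch {
	case aAhead && bAhead:
		return Concurrent
	case aAhead:
		return After
	case bAhead:
		return Before
	}
	return Equal
}

// Merge returns a new clock holding the per-node maximum of a and b.
func Merge(a, b Clock) Clock {
	out := Clock{}
	for id, n := range a {
		out[id] = n
	}
	for id, n := range b {
		if n > out[id] {
			out[id] = n
		}
	}
	return out
}

func main() {
	vc1 := Clock{"A": 1}
	vc2 := Clock{"A": 1, "B": 1}
	fmt.Println(Compare(vc1, vc2)) // before

	vc1["C"]++                     // an event vc2 knows nothing about
	fmt.Println(Compare(vc1, vc2)) // concurrent

	merged := Merge(vc1, vc2)
	fmt.Println(Compare(vc1, merged)) // before
}
```

<p>Two clocks are concurrent exactly when each one has seen at least one event the other has not, which is what makes the ordering partial rather than total.</p>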
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/21/vector-clock-support-for-go/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integrating Go with C: the ZooKeeper binding experience</title>
		<link>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience</link>
		<comments>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience#comments</comments>
		<pubDate>Fri, 10 Dec 2010 16:17:16 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=534</guid>
		<description><![CDATA[ZooKeeper is a clever generic coordination server for distributed systems, and is one of the core pieces of software which facilitate the development of Ensemble (project for automagic IaaS deployments which we push at Canonical), so it was a natural choice to &#8230; <a href="http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://zookeeper.apache.org">ZooKeeper</a> is a clever generic coordination server for distributed systems, and is one of the core pieces of software which facilitate the development of Ensemble (project for automagic <a href="http://en.wikipedia.org/wiki/Cloud_computing">IaaS</a> deployments which we push at <a href="http://www.canonical.com">Canonical</a>), so it was a natural choice to experiment with.</p>
<p><a href="https://wiki.ubuntu.com/gozk">Gozk</a> is a complete binding for ZooKeeper which explores the native features of Go to facilitate the interaction with a ZooKeeper server.  To avoid reimplementing the well tested bits of the protocol in an unstable way, Gozk is built on top of the standard C ZooKeeper library.</p>
<p>The experience of integrating ZooKeeper with Go was certainly valuable in itself, and served as a nice way to learn the details of integrating the Go language with a C library. If you&#8217;re interested in learning a bit about Go, ZooKeeper, or other details related to the creation of bindings and asynchronous programming, please fasten your seatbelt now.</p>
<p><span id="more-534"></span><b>Basics of C wrapping in Go</b></p>
<p>Creating the binding was in itself a pretty interesting experiment already.  I have worked on the creation of quite a few bindings and language bridges before, and must say I was pleasantly surprised with the experience of creating the Go binding.  With <i>Cgo</i>, the name given to the &#8220;<i>foreign function interface</i>&#8221; mechanism for C integration, one basically declares a special import statement which causes a pre-processor to look at the comment preceding it.  Something similar to this:</p>
<pre>
// #include &lt;zookeeper.h&gt;
import "C"
</pre>
<p>The comment doesn&#8217;t have to be restricted to a single line, or even to <i>#include</i> statements.  The C code contained in the comment will be transparently inserted into a helper C file which is compiled and linked with the final object file, and the given snippet will also be parsed and inclusions processed.  On the Go side, that &#8220;C&#8221; import is simulated as if it were a normal Go package so that the C functions, types, and values are all directly accessible.</p>
<p>As an example, a C function with this prototype:</p>
<pre>
int zoo_wexists(zhandle_t *zh, const char *path, watcher_fn watcher,
                void *context, struct Stat *stat);
</pre>
<p>may be used in Go as:</p>
<pre>
cstat := C.struct_Stat{}
rc, cerr := C.zoo_wexists(zk.handle, cpath, nil, nil, &#038;cstat)
</pre>
<p>When the C function is used in a context where two result values are requested, as done above, Cgo will save the well known <i>errno</i> variable after the function has finished executing and will return it wrapped into an <i>os.Errno</i> value.</p>
<p>Also, note how the C struct is defined in a way that can be passed straight to the C function.  Interestingly, the allocation of the memory backing the structure is going to be performed and tracked by the Go runtime, and will be garbage collected appropriately once no more references exist <i>within the Go runtime</i>. This fact has to be kept in mind since the application will crash if a value allocated normally within Go is saved with a foreign C function and maintained after all the Go references are gone.  The alternative in these cases is to call the usual C functions to get hold of memory for the involved values.  That memory won&#8217;t be touched by the garbage collector, and, of course, must be explicitly freed when no longer necessary.  Here is a simple example showing explicit allocation:</p>
<pre>
cbuffer := (*C.char)(C.malloc(bufferSize))
defer C.free(unsafe.Pointer(cbuffer))
</pre>
<p>Note the use of the <i>defer</i> statement above. Even when dealing with foreign functionality, it comes in handy. The above call will ensure that the buffer is deallocated right before the current function returns, for instance, so it&#8217;s a nice way to ensure no leaks happen, even if in the future the function suddenly gets a new exit point which didn&#8217;t consider the allocation of resources.</p>
<p>In terms of typing, Go is stricter than C, and Cgo ensures that the values passed into and returned from foreign C functions are type-checked in the same way as native Go values.  Note above, for instance, how the call to the <i>free()</i> function has to explicitly convert the value into an <i>unsafe.Pointer</i>, even though in C no casting would be necessary to pass a pointer into a <i>void *</i> parameter.</p>
<p>The <i>unsafe.Pointer</i> is in fact a very special type within Go. Using it, one can convert any pointer type into any other pointer type in an unsafe way (thus the package name), and also back and forth into a <i>uintptr</i> value with the address of the memory referenced by the pointer.  For every other type conversion, Go will ensure at compilation time that doing the conversion at runtime is a safe operation.</p>
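<p>As a small illustration of those conversions, the sketch below reinterprets the memory of a <i>float32</i> as a <i>uint32</i>, and takes the address of a variable as a <i>uintptr</i>. The <i>float32Bits</i> helper is a hypothetical name introduced here; the standard library already offers <i>math.Float32bits</i> for real use.</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// float32Bits reinterprets the four bytes backing f as a uint32.
// Both types have the same size, which is what makes this conversion valid.
func float32Bits(f float32) uint32 {
	return *(*uint32)(unsafe.Pointer(&f))
}

func main() {
	// IEEE 754 single precision encodes 1.0 as 0x3f800000.
	fmt.Printf("%#x\n", float32Bits(1.0)) // 0x3f800000

	// A pointer may also be turned into a plain integer address. The
	// integer does not keep the variable alive, so converting it back
	// into a pointer later is only valid under narrow rules.
	x := 42
	addr := uintptr(unsafe.Pointer(&x))
	fmt.Println(addr != 0) // true
}
```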
<p>With all of these resources, including the ability to use common Go syntax and functionality even when dealing with foreign types, values, and function calls, the integration task turns out to be quite a pleasant experience.  That said, some of the things may still require some good thinking to get right, as we&#8217;ll see shortly.</p>
<p><b>Watch callbacks and channels</b></p>
<p>One of the most interesting (and slightly tricky) aspects of mapping the ZooKeeper concepts into Go was the &#8220;watch&#8221; functionality.  ZooKeeper allows one to attach a &#8220;watch&#8221; to a node so that the server will report back when changes happen to the given node.  In the C library, this functionality is exposed via a callback function which is executed once the monitored node aspect is modified.</p>
<p>It would certainly be possible to offer this functionality in Go using a similar mechanism, but <a href="http://golang.org/doc/go_spec.html#Channel_types">Go channels</a> provide a number of advantages for that kind of asynchronous notification: waiting for multiple events via the <a href="http://golang.org/doc/go_spec.html#Select_statements">select statement</a>, synchronous blocking until the event happens, testing if the event is already available, etc.</p>
<p>The tricky bit, though, isn&#8217;t the use of channels.  That part is quite simple.  The tricky detail is that the C callback function is executed in a C thread started by the ZooKeeper library, asynchronously, while the Go application is doing its business elsewhere.  Right now, there&#8217;s no straightforward way to transfer the execution of this asynchronous C function back into Go land.  The solution for this problem was found with some help from the folks on the <a href="">golang-nuts</a> mailing list, and luckily it&#8217;s not that hard to support or understand.  That said, this is a good opportunity to get some coffee or your preferred focus-enhancing drink.</p>
<p>The solution works like this: when the ZooKeeper C library gets a watch notification, it executes a C callback function which lives inside a Gozk helper file. Rather than transferring control to Go right away, this C function simply appends data about the event onto a queue, and signals a pthread condition variable to notify that an event is available.  Then, on the Go side, once the first ZooKeeper connection is initialized, a new goroutine is fired and loops waiting for events to be available.  The interesting detail about this loop is that it blocks <i>within a foreign C function</i> waiting for an event to be available, through the signaling of the shared pthread condition variable.  On the Go side, this is what the call looks like, just to give a more practical feeling:</p>
<pre>
// This will block until there's a watch available.
data := C.wait_for_watch()
</pre>
<p>Then, on the C side, here is the function definition:</p>
<pre>
watch_data *wait_for_watch(void) {
    watch_data *data = NULL;
    pthread_mutex_lock(&#038;watch_mutex);
    /* Loop instead of testing once: pthread_cond_wait() may wake up
       spuriously, so the queue must be rechecked before it is used. */
    while (first_watch == NULL)
        pthread_cond_wait(&#038;watch_available, &#038;watch_mutex);
    data = first_watch;
    first_watch = first_watch->next;
    pthread_mutex_unlock(&#038;watch_mutex);
    return data;
}
</pre>
<p>As you can see, not really a big deal.  When that kind of blocking occurs inside a foreign C function, the Go runtime will correctly continue the execution of other goroutines within other operating system threads.</p>
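<p>To make the overall shape of the mechanism easier to follow, here is a rough sketch of the same queue-plus-condition-variable pattern written in pure Go, with a goroutine forwarding events into a channel. This is not Gozk code: <i>watchQueue</i>, <i>push</i> and <i>wait</i> are hypothetical names standing in for the C callback and the blocking C function, and <i>sync.Cond</i> stands in for the pthread condition variable.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// watchQueue is a hypothetical stand-in for the queue kept on the C side.
type watchQueue struct {
	mu        sync.Mutex
	available *sync.Cond
	events    []string
}

func newWatchQueue() *watchQueue {
	q := &watchQueue{}
	q.available = sync.NewCond(&q.mu)
	return q
}

// push plays the role of the C callback: it appends the event to the
// queue and signals that something is available.
func (q *watchQueue) push(ev string) {
	q.mu.Lock()
	q.events = append(q.events, ev)
	q.mu.Unlock()
	q.available.Signal()
}

// wait plays the role of the blocking C function: it sleeps on the
// condition variable until the queue is non-empty, then pops one event.
func (q *watchQueue) wait() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.events) == 0 { // recheck the condition after every wake-up
		q.available.Wait()
	}
	ev := q.events[0]
	q.events = q.events[1:]
	return ev
}

func main() {
	q := newWatchQueue()
	watch := make(chan string)

	// The forwarding goroutine: blocks in wait() and hands each event
	// over to whoever is receiving from the channel.
	go func() {
		for {
			watch <- q.wait()
		}
	}()

	q.push("node created")
	q.push("node changed")
	fmt.Println(<-watch) // node created
	fmt.Println(<-watch) // node changed
}
```

<p>A single forwarding goroutine keeps events in FIFO order, and consumers only ever see a channel, never the lock or the condition variable.</p>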
<p>The result of this mechanism is a nice-to-use interface based on channels, which may be used in different ways depending on the needs of the application.  Here is a simple example blocking on the event synchronously, for instance:</p>
<pre>
stat, watch, err := zk.ExistsW("/some/path")
if stat == nil &#038;&#038; err == nil {
    event := <-watch
    // Use event ...
}
</pre>
<p><b>Concluding</b></p>
<p>Those were some of the interesting aspects of implementing the ZooKeeper binding.  I would like to speak about some additional details, but this post is rather long already, so I'll keep that for a future opportunity.  The code is available under the LGPL, so if you're curious about some other aspect, or would like to use ZooKeeper with Go, please go ahead and <a href="https://wiki.ubuntu.com/gozk">check it out</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Removing seatbelts with the Go language for mmap support</title>
		<link>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support</link>
		<comments>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support#comments</comments>
		<pubDate>Sun, 28 Nov 2010 18:33:29 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=505</guid>
		<description><![CDATA[Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: gommap. As one would imagine, this new package provides access to low-level memory mapping for files and devices, &#8230; <a href="http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: <a href="http://labix.org/gommap">gommap</a>. As one would imagine, this new package provides access to low-level memory mapping for files and devices, and it allowed exploring a few new edges of the language implementation.  Note that, strictly speaking, some of the details ahead are really more about the implementation than the <i>language</i> itself.</p>
<p><span id="more-505"></span>There were basically two main routes to follow when implementing support for memory mapping in Go.  The first one is usually the way higher-level languages handle it.  In Python, for instance, this is the way one may use a memory mapped file:</p>
<pre>
>>> import mmap
>>> file = open("/etc/passwd", "rb")
>>> mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>>> mm[0:4]
b'root'
</pre>
<p>The way this was done has an advantage and a disadvantage which are perhaps not entirely obvious at first glance.  The advantage is that the memory mapped area is truly hidden behind that interface, so any improper attempt to access a region which was already unmapped, for instance, may be blocked within the application with a nice error message which explains the issue.  The disadvantage, though, is that this interface usually comes with the restriction that normal libraries can only use the memory region by copying data out of it.  In the above example, for instance, the &#8220;root&#8221; string isn&#8217;t backed by the original mapped memory anymore, and is rather a copy of its contents (see <a href="http://www.python.org/dev/peps/pep-3118/">PEP 3118</a> for a way to improve this aspect a bit in Python).</p>
<p>The other path, which can be done with Go, is to back a normal native slice type with the mapped memory.  This means that normal libraries don&#8217;t need to copy data out of the mapped memory, or to use a special memory-saving interface, to deal with the memory mapped region.  As a simple example, this would get the first line in the given file:</p>
<pre>
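mmap, err := gommap.Map(file.Fd(), gommap.PROT_READ, gommap.MAP_PRIVATE)
if err == nil {
    // bytes.IndexByte returns -1 when the file contains no newline.
    if end := bytes.IndexByte(mmap, '\n'); end >= 0 {
        firstLine := mmap[:end] // shares the mapped memory, no copy
        // Use firstLine ...
    }
}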
</pre>
<p>In the procedure above, <i>mmap</i> is defined as a named type based on a native <i>[]byte</i> slice, so even though the standard <i>bytes</i> package was used, at no point was the data from the memory mapped region copied out or any auxiliary buffers allocated, so this is a <i>very</i> fast operation.  To give an idea about this, let&#8217;s pretend for a moment that we want to increment a simple 8-bit counter in a file.  This might be done with something as simple as:</p>
<pre>
mmap[13] += 1
</pre>
<p>This line of code would be compiled into something similar to the following assembly (amd64):</p>
<pre>
MOVQ    mmap+-32(SP),BX
CMPL    8(BX),$13
JHI     ,68
CALL    ,runtime.panicindex+0(SB)
MOVQ    (BX),BX
INCB    ,13(BX)
</pre>
<p>As you can see, this is just doing some fast index checking before incrementing the value <i>directly in memory</i>. Given that one of the important reasons why memory mapped files are used is to speed up access to disk files (sometimes <i>large</i> disk files), this advantage in performance is actually meaningful in this context.</p>
<p>Unfortunately, though, doing things this way also has an important disadvantage, at least right now.  There&#8217;s no way at the moment to track references to the underlying memory, which was allocated by means not known to the Go runtime.  This means that <i>unmapping</i> this memory is not a safe operation.  The munmap system call will simply take the references away from the process, and any further attempt to touch those areas will crash the application.</p>
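<p>The constraint can be illustrated with a short sketch that uses the standard <i>syscall</i> package on a Unix-like system instead of gommap. The <i>firstLine</i> and <i>writeTemp</i> helpers are hypothetical names introduced here. The point to notice is that the line is copied into a string <i>before</i> the region is unmapped, because any slice still pointing into the mapping would be invalid afterwards.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"syscall"
)

// firstLine maps the named file read-only and returns a copy of its
// first line. Unix-like systems only, since it relies on syscall.Mmap.
func firstLine(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return "", err
	}
	if info.Size() == 0 {
		return "", nil // a zero-length mapping is an error, so bail out early
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(data)

	line := data
	if end := bytes.IndexByte(data, '\n'); end >= 0 {
		line = data[:end] // still backed by the mapped memory
	}
	// Converting to string copies the bytes, so the result stays valid
	// after the deferred Munmap runs. Returning line itself would not.
	return string(line), nil
}

// writeTemp creates a temporary file holding content and returns its path.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "mmap-sketch")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

func main() {
	path, err := writeTemp("first line\nsecond line\n")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	line, err := firstLine(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // first line
}
```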
<p>To give you an idea about the background &#8220;magic&#8221; which is going on to achieve this support in Go, here is an interesting excerpt from the underlying mmap syscall as of this writing:</p>
<pre>
addr, _, errno := syscall.Syscall6(syscall.SYS_MMAP, (...))
(...)
dh := (*reflect.SliceHeader)(unsafe.Pointer(&#038;mmap))
dh.Data = addr
dh.Len = int(length)
dh.Cap = dh.Len
</pre>
<p>As you can see, this takes the slice value apart into its constituent header structure, and alters it to point to the mapped memory, including information about the length mapped so that the bounds checking observed in the assembly above works correctly.</p>
<p>In case the garbage collector is at some point extended to track references to these foreign regions, it would be possible to implement some kind of <i>UnmapOnGC()</i> method which would only unmap the memory once the last reference is gone.  For now, though, the advantages of being able to reference memory mapped regions directly, at least to me, surpass the danger of having improper slices of the given region being used after unmapping.  Also, I expect that usage of this kind of functionality will generally be encapsulated within higher level libraries, so it shouldn&#8217;t be too hard to keep the constraint in mind while using it this way.</p>
<p>For those reasons, <a href="http://labix.org/gommap">gommap</a> was implemented with the latter approach.  In case you need memory mapping support for Go, just go ahead and <i>goinstall launchpad.net/gommap</i>.</p>
<p><b>UPDATE (2010-12-02):</b> The interface was updated so that mmap itself is an array, rather than mmap.Data, and this post was changed to reflect this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Interfaces and the design of software</title>
		<link>http://blog.labix.org/2010/11/09/interfaces-and-the-design-of-software</link>
		<comments>http://blog.labix.org/2010/11/09/interfaces-and-the-design-of-software#comments</comments>
		<pubDate>Tue, 09 Nov 2010 18:30:29 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=450</guid>
		<description><![CDATA[A while ago Martin Pool made a very interesting post on the design of interfaces, inspired by a talk from Rusty Russell from 2003. Besides the interesting scale of interface quality explained there, this is a very insightful comment, often &#8230; <a href="http://blog.labix.org/2010/11/09/interfaces-and-the-design-of-software">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A while ago Martin Pool made a <a href="http://sourcefrog.net/weblog/software/aesthetics/interface-levels.html">very interesting post</a> on the design of interfaces, inspired by a <a href="http://www.ozlabs.com/~rusty/ols-2003-keynote/ols-keynote-2003.html">talk from Rusty Russell</a> from 2003.</p>
<p>Besides the interesting scale of interface quality explained there, this is a very insightful comment, often overlooked:</p>
<p><span id="more-450"></span><br />
<blockquote>
Once the code gets too big for one person, it&#8217;s all about damage control. Interfaces make damage control possible&#8230; except when the interfaces themselves are the problem.
</p></blockquote>
<p>Designing a system as a group of people requires splitting tasks up among team members, and/or the community, and perhaps even separate teams. Interfaces are the touch points of that splitting, and are what represents the functionality offered within the module/library/file/command/service/whatever.  Too often, people spend a long time working on the implementation details, thinking really deeply about how to obtain the desired behavior, and forget to define clearly what the <i>interface</i> to that behavior is.</p>
<p>Having good interfaces is a key aspect of software development, and getting them right offers a number of important benefits:</p>
<p><b>Good encapsulation</b></p>
<p>Having good encapsulation is pretty much a synonym of having good interfaces. Too often, though, people focus on the encapsulation of the small pieces (the functions, the classes, etc), and forget about the encapsulation of the larger blocks (the libraries, modules, packages, commands, etc).</p>
<p>Also, in my experience trying to encourage good architectures, I have found that stating <i>&#8220;We need good encapsulation!&#8221;</i> gives developers no tangible line of action.  It reminds me of a parent telling the child <i>&#8220;You should be responsible!&#8221;</i>.  Sure, encapsulation and responsibility both sound great, but.. what does that <i>really mean</i>?</p>
<p>When inviting developers to think about the <i>interfaces</i> of the system parts they are responsible for, encapsulation becomes a natural outcome. It&#8217;s clear that there must be a line drawn between that part of the system and the rest, and the shape of this line must be considered while (or even better, <i>before</i>) the behavior is implemented.</p>
<p>Given well designed interfaces, the additional requirement of only using other parts of the system through their public interfaces seals the achievement of good encapsulation. Ideally, this barrier would be a natural property of the language used to develop the system (see the interface quality scale in <a href="http://sourcefrog.net/weblog/software/aesthetics/interface-levels.html">Martin&#8217;s post</a>). In other cases, this must be achieved through conventions, agreements, and good documentation.</p>
<p><b>Improved scope and communication</b></p>
<p>By inviting developers to think about the <i>interfaces</i> of the parts they are responsible for, one is basically encouraging the consideration of the interaction between those pieces and the rest of the system.  This process gives an interesting perspective, both in terms of the external expectations (what do I need to offer other people?), as well as the internal goals (what do I need to implement for satisfying what other people need?).</p>
<p>Besides helping people to figure out the scope and goal of the piece being developed, this will also give a nice structure to some of the communication which must inevitably happen to correctly integrate the separate parts of the system being developed.</p>
<p><b>Improved testing and experimentation</b></p>
<p>If an interface is well designed and defined, and encapsulates part of the functionality of the system well, it significantly improves the testing and experimentation related to that part of the system.  Again, this has an effect internally and externally to the interface.</p>
<p>Internally in the sense that there&#8217;s a clear boundary between the part in development and the rest of the system, and thus it should be easier to verify that the bits which compose it are working according to plan without dragging the whole system along, and also to verify that the interface itself is behaving as intended (and hopefully as documented).</p>
<p>Externally in the sense that, given that there&#8217;s agreement regarding what is the public interface to the part being considered, one may easily provide a test double (a fake, or dummy, or mock) to simulate that part of the system.  This is well known to be useful in a number of ways:</p>
<ul>
<li>Dependent work may be run in parallel by different people</li>
<li>Real implementation backing the given interface may be postponed, until the idea is proven useful, and the interface feels suitable</li>
<li>External systems which would be hard to run locally may be simulated so that tests run fast and cheap, even without network connections</li>
<li>Faults may be injected in the system via the test doubles to verify behavior in hostile conditions</li>
</ul>
<p>and so on.</p>
<p><b>Quality isolation</b></p>
<p>This point is also my understanding of what Rusty refers to as <i>damage control</i> in his talk.  This property is very useful when designing a system, but even then it&#8217;s often missed when discussing interfaces and encapsulation.</p>
<p>If there&#8217;s a well defined interface to a piece of functionality in the system, and that interface was carefully considered to cover the needs of the system, the implementation of that interface may not start as the most beautiful, or most scalable, or even most reliable piece of software. As any developer responsible for a successful startup will happily point out, a half-baked implementation is often good enough to get things going, prove the concept, and extend the project runway.</p>
<p>Good interfaces play an important role in this kind of situation.  They are, in this sense, a way to be better prepared for success (or, for failure, <a href="http://twitter.com">depending on the perspective</a>).  If the interface implementation suddenly becomes an issue for whatever reason, the implementation itself may be replaced by something which better suits the current reality, while preserving the interaction with the rest of the system.</p>
<p>Of course, it&#8217;s still very hard to predict future system behavior when facing a completely different reality.  Changing the scale requirements for the system a few orders of magnitude, for instance, may easily break existing assumptions, and interfaces designed around these assumptions. Still, even if good interfaces won&#8217;t be enough to avoid modifications in the architecture and integration points in many cases, they will certainly help framing the conversations which will take place when this happens and new interfaces must be developed.</p>
<p><b>Conclusion</b></p>
<p>When developing non-trivial software products, there&#8217;s no other way but to split the problem solving into several layers and components.  Looking at the points where these layers and components touch each other is a very useful and natural way to organize conversations and structure work which must take place to push the product forward.</p>
<p>It&#8217;s quite revealing to look at the points above, and note that it&#8217;s not simply the existence of interfaces themselves which presents the advantages described, but the process which they encourage around them.  Software architecture is essentially about people.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/11/09/interfaces-and-the-design-of-software/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gocheck: A rich testing library for Go</title>
		<link>http://blog.labix.org/2010/11/06/gocheck-a-rich-testing-library-for-go</link>
		<comments>http://blog.labix.org/2010/11/06/gocheck-a-rich-testing-library-for-go#comments</comments>
		<pubDate>Sat, 06 Nov 2010 23:31:59 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=433</guid>
		<description><![CDATA[It&#8217;s time to release my &#8220;side project&#8221; which has been evolving over the last several months: Gocheck. I&#8217;ve been watching Go for some time, and have been getting more and more interested in the language. My first attempt to write &#8230; <a href="http://blog.labix.org/2010/11/06/gocheck-a-rich-testing-library-for-go">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time to release my &#8220;side project&#8221; which has been evolving over the last several months: <a href="http://j.mp/dcKGmQ" _href="http://labix.org/gocheck">Gocheck</a>. I&#8217;ve been watching Go for some time, and have been getting more and more interested in the language.  My first attempt to write something interesting in it made it obvious that there would be benefit in having a richer testing platform than what is <a href="http://golang.org/pkg/testing/">available in the standard library</a>.  That said, I do understand why the standard one is slim: it&#8217;s pretty minimalist, because it&#8217;s used by itself to test the rest of the platform.  With Gocheck, though, I don&#8217;t have that requirement.  I&#8217;m able to trust that the standard library works well, and focus on having features which will make me more productive while writing tests, including features such as:</p>
<p><span id="more-433"></span>
<ul>
<li>Better error reporting
<li> Richer test helpers: assertions which interrupt the test immediately, deep multi-type comparisons, string matching, etc
<li> Suite-based grouping of tests
<li> Fixtures: per suite and/or per test set up and tear down
<li> Management of temporary directories
<li> Panic-catching logic, with proper error reporting
<li> Proper counting of successes, failures, panics, missed tests, skips, etc
<li> Support for expected failures
<li> Fully tested (yes, it manages to test itself reliably!)
</ul>
<p>That last point was actually quite fun to get right.  It&#8217;s the first time I&#8217;ve written a testing framework from the ground up, and of course I wanted to have it fully tested by itself, but I didn&#8217;t want to simply use a foreign testing framework to test it.  So it starts with a &#8220;bootstrapping&#8221; phase, which ensures that the very basic parts of the library work without relying on pretty much any internal functionality (e.g. it verifies the number of executed functions, and works with low-level panics). Then, once the lower layers were trusted, tests for higher-level functionality were introduced by building on the trusted bits.</p>
<p>Gocheck has actually been <i>mostly</i> ready for some time now, but I&#8217;ve been polishing edges with some real world usage before releasing it.  Since both the real world usage and Gocheck itself are side projects, you can imagine that took a bit of time.  Today, though, I&#8217;ve managed to fix the last few things which were bothering me, so it&#8217;s up for world consumption.</p>
<p>I hope you enjoy it, and make some good use of it so that we can all have more reliable software. <img src='http://blog.labix.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/11/06/gocheck-a-rich-testing-library-for-go/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Integrating IRC with LDAP and two-way SMSing</title>
		<link>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing</link>
		<comments>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing#comments</comments>
		<pubDate>Sat, 19 Jun 2010 21:56:07 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=296</guid>
		<description><![CDATA[A bit of history I don&#8217;t know exactly why, but I&#8217;ve always enjoyed IRC bots. Perhaps it&#8217;s the fact that it emulates a person in an easy-to-program way, or maybe it&#8217;s about having a flexible and shared &#8220;command line&#8221; tool, &#8230; <a href="http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><b>A bit of history</b></p>
<p>I don&#8217;t know exactly why, but I&#8217;ve always enjoyed IRC bots.  Perhaps it&#8217;s the fact that it emulates a person in an easy-to-program way, or maybe it&#8217;s about having a flexible and shared &#8220;command line&#8221; tool, or maybe it&#8217;s just the fact that it helps people perceive things in an asynchronous way without much effort.  Probably a bit of everything, actually.</p>
<p><span id="more-296"></span></p>
<p>My bot programming started with <a href="http://labix.org/pybot">pybot</a> many years ago, when I was still working at <a href="http://www.conectiva.com.br">Conectiva</a>.  Despite having many interesting features, this bot eventually ended up in an abandonware state, since <a href="http://www.canonical.com">Canonical</a> already had pretty much equivalent features available when I joined, and I had other interests which got in the way.  The code was a bit messy as well.. it was a time when I wasn&#8217;t very used to testing software properly (a friend has a great excuse for that kind of messy software: <i>&#8220;I was young, and needed the money!&#8221;</i>).</p>
<p>Then, a couple of years ago, while working on the <a href="http://landscape.canonical.com">Landscape</a> project, there was an opportunity to make some information more visible to the team.  Coincidentally, it was also a time when I wanted to get some practice with the concepts of <a href="http://erlang.org">Erlang</a>, so I decided to write a bot from scratch with some nice support for plugins, just to get a feeling for how the promised stability of Erlang actually held up in practice.  This bot is called <a href="https://launchpad.net/mup">mup</a> (Mup Pet, more formally), and its code is available publicly through <a href="https://launchpad.net/mup">Launchpad</a>.</p>
<p>This was a nice experiment indeed, and I did learn quite a bit about the ins and outs of Erlang with it.  Somewhat unexpected, though, was the fact that the bot grew a few extra features which multiple teams in Canonical started to appreciate.  This was of course very nice, but it also made it more obvious that the egocentric reason for having a bot written in Erlang would now hurt, because most of Canonical&#8217;s own coding is done in Python, and that&#8217;s what internal tools should generally be written in so that everyone can contribute and help maintain the code.</p>
<p>That&#8217;s where the desire to migrate mup to a Python-based brain came from, and having a new feature to write was the perfect motivator for this.</p>
<p><b>LDAP and two-way SMSing over IRC</b></p>
<p>Canonical is a <i>very</i> distributed company.  Employees are distributed over dozens of countries, literally.  Not only that, but most people also work from their homes, rather than in an office.  Many different countries also means many different timezones, and working from home with people from different timezones means flexible timing.  All of that means communication gets&#8230; well.. interesting.</p>
<p>How do we reach someone that should be in an online meeting and is not?  Or someone that is traveling to get to a sprint?  Or how can someone that has no network connectivity reach an IRC channel to talk to the team?  There are probably several answers to these questions, but one of them is of course SMS.  It&#8217;s not exactly cheap if we consider the cost of the data being transferred, but pretty much everyone has a mobile phone which can do SMS, and the model is not that far away from IRC, which is the main communication system used by the company.</p>
<p>So, the itch was itching.  Let&#8217;s scratch it!</p>
<p>Getting the mobile phone of employees was already a solved problem for mup, because it had a plugin which could interact with the LDAP directory, allowing people to do something like this:</p>
<blockquote><p>
&lt;joe&gt; mup: poke gustavo<br />
&lt;mup&gt; joe: niemeyer is Gustavo Niemeyer &lt;&#8230;@canonical.com&gt; &lt;time:&#8230;&gt; &lt;mobile:&#8230;&gt;
</p></blockquote>
<p>This just had to be migrated from Erlang into a Python-based brain for the reasons stated above. This time, though, there was no reason to write something from scratch.  I could even have used pybot itself, but there was also <a href="http://sourceforge.net/projects/supybot/">supybot</a>, an IRC bot which started around the same time I wrote the first version of pybot, and its author was much more diligent in evolving it than I was with pybot.  There is quite a comprehensive list of plugins for supybot nowadays, and it includes means for testing plugins and so on.  The choice of using it was straightforward, and getting &#8220;<i>poke</i>&#8221; support ported into a plugin wasn&#8217;t hard at all.</p>
<p>So, on to SMSing.  Canonical already had a contract with an SMS gateway company which we established to test-drive some ideas on <a href="https://landscape.canonical.com">Landscape</a>. With the mobile phone numbers coming out of the LDAP directory in hand and an SMS contract established, all that was needed was a plugin for the bot to talk to the SMS gateway.  That &#8220;conversation&#8221; with the SMS gateway allows not only sending messages, but also receiving SMS messages which were sent to a specific number.</p>
<p>In practice, this means that people who are connected to IRC can very easily deliver an SMS to someone using that person&#8217;s nick.  Something like this:</p>
<blockquote><p>
&lt;joe&gt; @sms niemeyer Where are you?  We&#8217;re waiting!
</p></blockquote>
<p>And this would show up in the mobile screen as:</p>
<blockquote><p>
joe&gt; Where are you?  We&#8217;re waiting!
</p></blockquote>
<p>In addition to this, people who have <i>no connectivity</i> can also contact individuals and channels on IRC, with mup working as a middle man.  The message would show up on IRC in a similar way to:</p>
<blockquote><p>
&lt;mup&gt; [SMS] &lt;niemeyer&gt; Sorry, the flight was delayed. Will be there in 5.
</p></blockquote>
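<p>To make the message shapes in the exchanges above concrete, here is a minimal Python sketch of the parsing and formatting involved.  The function names are hypothetical and are not taken from mup or supybot; only the formats themselves mirror the examples shown.</p>

```python
# Hypothetical helpers for illustration; these names do not come from
# mup or supybot. Only the message formats follow the examples above.


def parse_sms_command(line):
    """Split '@sms NICK MESSAGE' into (nick, message), or return None."""
    parts = line.split(None, 2)
    if len(parts) != 3 or parts[0] != "@sms":
        return None
    return parts[1], parts[2]


def format_for_phone(sender, message):
    """Text shown on the phone: the IRC sender's nick, then the message."""
    return "%s> %s" % (sender, message)


def format_for_irc(sender, message):
    """Line the bot says on IRC when relaying an incoming SMS."""
    return "[SMS] <%s> %s" % (sender, message)
```

<p>With these, the line joe typed would be parsed into the nick <i>niemeyer</i> plus the message text, and the reply coming back from the phone would be wrapped in the <i>[SMS]</i> prefix before being said on the channel.</p>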
<p>The communication from the bot to the gateway happens via plain HTTPS.  The communication back is a bit more complex, though.  There is a small proxy service deployed in <a href="http://code.google.com/appengine">Google App Engine</a> to receive messages from the SMS gateway.  This was done to avoid losing messages when the bot itself is taken down for maintenance.  The SMS gateway doesn&#8217;t handle this case very well, so it&#8217;s better to have something which will be up most of the time buffering messages.</p>
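<p>The buffering idea can be sketched in a few lines of Python.  This is only an illustration under assumptions: the class and method names are made up, it keeps messages in memory (a real proxy would presumably persist them), and it assumes the bot pulls pending messages when it is up, which is not something stated above.</p>

```python
from collections import deque


class MessageBuffer:
    """Holds incoming SMS messages until the bot collects them.

    Hypothetical sketch: in-memory only, and the pull-based drain()
    is an assumption about how the bot would fetch messages.
    """

    def __init__(self):
        self._pending = deque()

    def receive(self, sender, text):
        # Called when the gateway delivers a message. It succeeds
        # whether or not the bot is currently running.
        self._pending.append((sender, text))

    def drain(self):
        # Called by the bot once it is up. Returns the messages in
        # arrival order and leaves the buffer empty.
        messages = list(self._pending)
        self._pending.clear()
        return messages
```

<p>The point of the design is that the gateway only ever talks to the always-on buffer, so a bot restart delays messages rather than dropping them.</p>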
<p>A picture is worth 2<sup>10</sup> words, so here is a simple diagram explaining how things got linked together:</p>
<p><a href="http://blog.labix.org/wp-content/uploads/2010/06/mup-sms.png"><img src="http://blog.labix.org/wp-content/uploads/2010/06/mup-sms.png" alt="" title="SMS integration diagram" width="449" height="255" class="aligncenter size-full wp-image-308" /></a></p>
<p>This is now up for experimentation, and so far it&#8217;s working nicely.  I&#8217;m hoping that in the next few weeks we&#8217;ll manage to port the rest of mup into the supybot-based brain.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

