<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Labix Blog &#187; Snippet</title>
	<atom:link href="http://blog.labix.org/tag/snippet/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.labix.org</link>
	<description>by Gustavo Niemeyer</description>
	<lastBuildDate>Mon, 16 Jan 2012 04:02:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Good concurrency changes the game</title>
		<link>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game</link>
		<comments>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game#comments</comments>
		<pubDate>Mon, 12 Dec 2011 17:52:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=886</guid>
		<description><![CDATA[A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited &#8230; <a href="http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited from them. That said, it wasn&#8217;t until I actually started to use DVCS tools that I understood how much my daily workflow around code bases would be changed and improved.</p>
<p><span id="more-886"></span>This weekend, while flying home from <a href="http://www.10gen.com/events/mongosv-2011">MongoSV</a>, I experienced that same feeling in relation to first-class concurrency support in programming languages. Everybody knows how the feature may be used, but I have the feeling that until one actually experiences it in practice, it&#8217;s very hard to really understand how much one&#8217;s relationship with the ordering of operations while developing software may be improved.</p>
<p>I was having some fun working on improvements to <a href="https://wiki.ubuntu.com/goetveld">Goetveld</a>. This package allows <a href="http://golang.org">Go</a> programs to communicate with <a href="https://codereview.appspot.com">Rietveld</a> servers to manipulate code review entries. The Rietveld API is a bit rough in a few places, and as a result some features of the package actually parse an HTML form to extract some data, before sending it back. You may have done something similar before while attempting to script a web site that wasn&#8217;t originally intended to be scripted.</p>
<p>The interesting fact here is that this is an intrinsically serial procedure: load a form, change it, and send it back, right? Well, not really. As one might intuitively expect, establishing an SSL session and its underlying TCP connection are not instantaneous operations.</p>
<p>To give an idea, here is part of a dump of an SSL connection being <i>initiated</i> (that is, no HTTP data was sent yet) to codereview.appspot.com, originating from my home location:</p>
<pre>
# tcpdump -ttttt -i wlan0 'host codereview.appspot.com and port 443'
(...)
00:00:00.000000 IP (...)
00:00:00.000063 IP (...)
00:00:00.000562 IP (...)
00:00:00.341627 IP (...)
00:00:00.357009 IP (...)
00:00:00.357118 IP (...)
00:00:00.360362 IP (...)
00:00:00.360550 IP (...)
00:00:00.366011 IP (...)
00:00:00.689446 IP (...)
00:00:00.727693 IP (...)
</pre>
<p>That&#8217;s more than half a second before the application layer was even touched. So it turns out that, to save that round-trip time, we can start <i>both</i> the form loading and the form sending requests <i>at the same time</i>. By the time the form loading ends, processing the data locally is extremely fast, and we can complete the sending side by just providing the request body.</p>
<p>At this point you may be thinking something like <i>&#8220;Ugh, that&#8217;s too much trouble... why bother?&#8221;</i>, and that highlights precisely the point I&#8217;d like to make: it is too much trouble because most people are used to languages that <i>turn</i> it into too much trouble, but the issue is not inherently complex. In fact, this is the entire implementation of this logic in Go:</p>
<pre>
func (r *Rietveld) UpdateIssue(issue *Issue) error {
        op := &#038;opInfo{r: r, issue: issue}
        errs := make(chan error)
        ch := make(chan map[string]string, 1)
        go func() {
                errs <- r.do(&#038;editLoadHandler{op: op, form: ch})
                close(ch)
        }()
        go func() {
                errs <- r.do(&#038;editHandler{op: op, form: ch})
        }()
        return firstError(2, errs)
}
</pre>
<p>I'm not cheating. The procedure was being done serially before, with very similar logic. Previously it had to take the form variable itself from the first request and manually provide it to the next one. Now, instead of providing the form, it's providing a channel that will be used to send the form across.  One might even argue that the channel makes the algorithm <i>more natural</i>, curiously.</p>
<p>This is the kind of procedure that becomes fun and natural to write, after having first-class concurrency at hand for some time. But, as in the case of DVCS, it takes a while to get used to the idea that concurrency and simplicity are not necessarily at opposing ends.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/12/12/good-concurrency-changes-the-game/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Death of goroutines under control</title>
		<link>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control</link>
		<comments>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control#comments</comments>
		<pubDate>Sun, 09 Oct 2011 19:53:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=717</guid>
		<description><![CDATA[Certainly one of the reasons why many people are attracted to the Go language is its first-class concurrency aspects. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are &#8230; <a href="http://blog.labix.org/2011/10/09/death-of-goroutines-under-control">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Certainly one of the reasons why many people are attracted to the <a href="http://golang.org">Go</a> language is its first-class concurrency aspects. Features like communication channels, lightweight processes (<i>goroutines</i>), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.</p>
<p><span id="more-717"></span>If you stay around listening to community conversations for a few days there&#8217;s a good chance you&#8217;ll hear someone proudly mentioning the tenet:</p>
<blockquote><p>Do not communicate by sharing memory; instead, share memory by communicating.</p></blockquote>
<p>There is a <a href="http://blog.golang.org/2010/07/share-memory-by-communicating.html">blog post</a> on the topic, and also a <a href="http://golang.org/doc/codewalk/sharemem/">code walk</a> covering it.</p>
<p>That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that&#8217;s not exactly news. What I address in this post is an open aspect we have today in Go related to this design: the <i>termination</i> of background activity.</p>
<p>As an example, let&#8217;s build a purposefully simplistic goroutine that sends lines across a channel:</p>
<pre>
type LineReader struct {
        Ch chan string
        r *bufio.Reader
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &#038;LineReader{make(chan string), bufio.NewReader(r)}
        go lr.loop()
        return lr
}
</pre>
<p>The type has a channel from which the client can consume lines, and an internal buffer used to produce the lines efficiently. Then, we have a function that creates an initialized reader, fires the reading loop, and returns. Nothing surprising there.</p>
<p>Now, let&#8217;s look at the loop itself:</p>
<pre>
func (lr *LineReader) loop() {
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        return
                }
                lr.Ch <- string(line)
        }
}
</pre>
<p>In the loop we'll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.</p>
<p>There are two details related to the termination of this logic, though: first, the error information is being dropped, and then there's no way to interrupt the procedure from outside in a clean way. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking into account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.</p>
<p>I'm not claiming this is something <i>difficult</i> to do, by any means.  What I'm saying is that there isn't today an <i>idiom</i> for handling these aspects in a simple and consistent way. Or maybe there wasn't. The <i>tomb</i> package for Go is an experiment I'm releasing today in an attempt to address this problem.</p>
<p>The model is simple: a <i>Tomb</i> tracks whether the goroutine is alive, dying, or dead, and the death reason.</p>
<p>To understand that model, let's see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:</p>
<pre>
type LineReader struct {
        Ch chan string
        r *bufio.Reader
        <span style="color: blue">*tomb.Tomb</span>
}

func NewLineReader(r io.Reader) *LineReader {
        lr := &#038;LineReader{
                make(chan string),
                bufio.NewReader(r),
                <span style="color: blue">tomb.New(),</span>
        }
        go lr.loop()
        return lr
}
</pre>
<p>Looks very similar. Just a new field in the struct and its respective initialization. We've used it as an embedded field just so we can use the Tomb methods directly on the <i>lr</i> variable.</p>
<p>Next, the loop function is modified to support tracking of errors and interruptions:</p>
<pre>
func (lr *LineReader) loop() {
        <span style="color: blue">defer lr.Done()</span>
        for {
                line, err := lr.r.ReadSlice('\n')
                if err != nil {
                        close(lr.Ch)
                        <span style="color: blue">lr.Fatal(err)</span>
                        return
                }
                select {
                case lr.Ch <- string(line):
                <span style="color: blue">case <-lr.Dying:</span>
                        close(lr.Ch)
                        return
                }
        }
}
</pre>
<p>Note a few interesting points here: first, <i>Done</i> is called to track the goroutine termination right before the loop function returns. Then, the previously dropped error now goes into the <i>Fatal</i> Tomb method, flagging the goroutine as dying. Finally, the channel send was tweaked so that it doesn't block in case the goroutine is dying for whatever reason.</p>
<p>A Tomb has both <i>Dying</i> and <i>Dead</i> channels, which are closed when the Tomb state changes accordingly. These channels enable explicit blocking until the state changes, and also make it possible to selectively unblock select statements in those cases, as done above.</p>
<p>With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:</p>
<pre>
func (lr *LineReader) Stop() os.Error {
        <span style="color: blue">lr.Fatal(tomb.Stop)</span>
        return <span style="color: blue">lr.Wait()</span>
}
</pre>
<p>In this case the <i>Fatal</i> method will put the goroutine in a dying state from outside, and <i>Wait</i> will block until the goroutine terminates itself and notifies via the <i>Done</i> method as seen before. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Fatal with an actual error is recorded as the cause for the goroutine death. The <i>tomb.Stop</i> value is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.</p>
<p>(<b>UPDATE:</b> there was <a href="http://groups.google.com/group/golang-nuts/browse_thread/thread/383f7cabbb174460">a minor simplification</a> in the API since this post was originally written, and the paragraph above was adapted to cover the new API)</p>
<p>This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what <a href="http://www.erlang.org/doc/reference_manual/processes.html">Erlang does</a> with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.</p>
<p>The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, with familiar method names inspired by existing idioms. If you want to make use of it, goinstall the package with:</p>
<pre>
$ goinstall launchpad.net/tomb
</pre>
<p>The API documentation with details is available at:</p>
<p><span style="padding-left: 2em;"><a href="http://goneat.org/lp/tomb">http://goneat.org/lp/tomb</a></span></p>
<p>Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/10/09/death-of-goroutines-under-control/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Efficient algorithm for expanding circular buffers</title>
		<link>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers</link>
		<comments>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers#comments</comments>
		<pubDate>Thu, 23 Dec 2010 12:57:40 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=580</guid>
		<description><![CDATA[Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the &#8220;Hello world!&#8221; days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc. &#8230; <a href="http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the <i>&#8220;Hello world!&#8221;</i> days.  They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.</p>
<p>One feature which is not always desired, though, is the fact that circular buffers traditionally will either overwrite the oldest element, or raise an overflow error, since they are generally implemented as a buffer of <i>constant</i> size.  This is an unwanted property when one is attempting to <i>consume</i> items from the buffer and it is not an option to blindly drop items, for instance.</p>
<p>This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It&#8217;s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I&#8217;ll publish it below and let you decide.</p>
<p><span id="more-580"></span><b>Traditional circular buffers</b></p>
<p>Before introducing the variant which can actually expand during use, let&#8217;s go through a quick review of traditional circular buffers, so that we can then reuse the nomenclature when extending the concept.  All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.</p>
<p>So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur.  The following snippet demonstrates the concept in practice:</p>
<pre>
buf = [None, None, None, None, None]
bufcap = len(buf)
pushi = 0   

for elem in range(7):
    buf[pushi] = elem
    pushi = (pushi + 1) % bufcap

print(buf) # => [5, 6, 2, 3, 4]
</pre>
<p>In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That&#8217;s the specific feature of circular buffers which the proposal in this post will offer an alternative for.</p>
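<p>As a small aside, the elements that survive the overwriting can still be read back oldest first, by starting at the write position. The sketch below is only an illustration: the <i>count</i> variable and the two helper functions are hypothetical additions, not part of the snippet above.</p>

```python
def push_overwriting(buf, pushi, count, elem):
    """Write elem at pushi, overwriting the oldest entry once buf is full."""
    buf[pushi] = elem
    pushi = (pushi + 1) % len(buf)
    count = min(count + 1, len(buf))
    return pushi, count


def in_fifo_order(buf, pushi, count):
    """Return the surviving elements, oldest first."""
    # The oldest element sits count slots behind the next write position.
    start = (pushi - count) % len(buf)
    return [buf[(start + i) % len(buf)] for i in range(count)]


buf = [None] * 5
pushi = count = 0
for elem in range(7):
    pushi, count = push_overwriting(buf, pushi, count, elem)

print(buf)                               # => [5, 6, 2, 3, 4]
print(in_fifo_order(buf, pushi, count))  # => [2, 3, 4, 5, 6]
```

<p>The raw list shows the physical layout, while the second line shows the logical order a consumer would observe.</p>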
<p>The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur.  Please note that these snippets are not necessarily idiomatic Python.  The intention is to highlight the algorithm itself.</p>
<pre>
class CircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.pushi = self.popi = 0
        self.bufcap = len(self.buf)

    def push(self, x):
        assert self.buflen == 0 or self.pushi != self.popi, \
               "Buffer overflow!"
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.popi = (self.popi + 1) % self.bufcap
        return x
</pre>
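<p>One caveat with the version above: Python drops <i>assert</i> statements when run with the -O flag, so the overflow and underflow checks would silently disappear. A possible variant, sketched here with a hypothetical class name and an assumed choice of exception types, raises exceptions explicitly instead:</p>

```python
class CheckedCircBuf(object):
    """Fixed-capacity FIFO ring that raises exceptions instead of asserting."""

    def __init__(self, capacity=5):
        self.buf = [None] * capacity
        self.bufcap = capacity
        self.buflen = self.pushi = self.popi = 0

    def push(self, x):
        if self.buflen == self.bufcap:
            raise OverflowError("Buffer overflow!")
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        if self.buflen == 0:
            raise IndexError("Buffer underflow!")
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.popi = (self.popi + 1) % self.bufcap
        self.buflen -= 1
        return x


circbuf = CheckedCircBuf()
for i in range(5):
    circbuf.push(i)

try:
    circbuf.push(5)  # The buffer is full, so this push is rejected.
except OverflowError as exc:
    print("push failed: %s" % exc)

print([circbuf.pop() for _ in range(5)])  # => [0, 1, 2, 3, 4]
```

<p>Comparing <i>buflen</i> with <i>bufcap</i> is equivalent to the index comparison used above, since the two indexes only meet when the buffer is either empty or full.</p>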
<p>With the basics covered, let&#8217;s look at how to extend this algorithm to support dynamic expansion in case of overflows.</p>
<p><b>Dynamically expanding a circular buffer</b></p>
<p>The approach consists of imagining that the same buffer can contain both a circular buffer area (referred to as <i>the ring area</i> from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again.  To clarify what this means, some examples are presented below.  The full algorithm will be presented afterwards.</p>
<p>First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:</p>
<pre>
for i in range(5):
    circbuf.push(i)

circbuf.pop() # => 0
circbuf.pop() # => 1

circbuf.push(5)
circbuf.push(6)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4</font>]
</pre>
<p>At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that any additional items are appended to the end of the buffer instead.  Following the example, pushing two additional elements would behave the following way:</p>
<pre>
circbuf.push(7)
circbuf.push(8)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4,</font> <font style="color: red">7, 8</font>]
</pre>
<p>In that example, elements 7 and 8 are part of the overflow area, and the ring area keeps the same capacity and length as the original buffer. Let&#8217;s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:</p>
<pre>
circbuf.pop() # => 2
circbuf.pop() # => 3
circbuf.push(9)

print(circbuf.buf) # => [<font style="color: blue">5, 6,</font> None, None, <font style="color: blue">4,</font> <font style="color: red">7, 8, 9</font>]
</pre>
<p>In this case, even though there are two free slots available in the ring area, the last item pushed was still appended to the overflow area.  That&#8217;s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade-off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).</p>
<p>The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area.  This will happen when the content of the previous partial ring area is fully consumed, as shown below:</p>
<pre>
circbuf.pop() # => 4
circbuf.pop() # => 5
circbuf.pop() # => 6
circbuf.push(10)

print(circbuf.buf) # => [<font style="color: blue">10,</font> None, None, None, None, <font style="color: blue">7, 8, 9</font>]
</pre>
<p>At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.</p>
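<p>To make the two layouts from the walkthrough easier to compare, the sketch below computes the order in which items would be popped from a given buffer state. The function name is hypothetical, and the index values passed in are the ones the walkthrough implies at each point (worked out by hand here, so treat them as assumptions). It also assumes the overflow area has no unused slots, which holds when items are appended one at a time.</p>

```python
def logical_order(buf, popi, ringlen, ringcap):
    """Return the buffered items in the order pop() would deliver them.

    Assumes the overflow area (buf[ringcap:]) holds no unused slots.
    """
    # Ring items come first, read circularly starting at popi.
    ring = [buf[(popi + i) % ringcap] for i in range(ringlen)]
    # Overflow items are newer than anything in the ring area.
    overflow = buf[ringcap:]
    return ring + overflow


# Split buffer: a ring area of 5 slots followed by 3 overflow items.
split = [5, 6, None, None, 4, 7, 8, 9]
print(logical_order(split, popi=4, ringlen=3, ringcap=5))
# => [4, 5, 6, 7, 8, 9]

# Ring area drained and 10 pushed: the ring now spans all 8 slots.
merged = [10, None, None, None, None, 7, 8, 9]
print(logical_order(merged, popi=5, ringlen=4, ringcap=8))
# => [7, 8, 9, 10]
```

<p>In both states the consumer sees strictly increasing values, which is the FIFO property the expansion is meant to preserve.</p>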
<p><b>Sample algorithm</b></p>
<p>With some simple modifications to the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur additional memory allocation to support the buffer expansion, but that&#8217;s inherent to the problem being solved.</p>
<pre>
class ExpandingCircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringlen == self.ringcap or \
           self.ringcap != self.bufcap:
            self.buf.append(x)
            self.buflen += 1
            self.bufcap += 1
            if self.pushi == 0: # Optimization.
                self.ringlen = self.buflen
                self.ringcap = self.bufcap
        else:
            self.buf[self.pushi] = x
            self.pushi = (self.pushi + 1) % self.ringcap
            self.buflen += 1
            self.ringlen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.pushi = 0
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p>Note that the above algorithm will allocate each element in the list individually, but in performance-sensitive situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation.  In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion.  Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.</p>
<p><b>UPDATE:</b> Below is a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates the extra slots into the overflow area immediately so that the allocated space is used optimally.</p>
<pre>
class ExpandingCircBuf2(object):

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringcap != self.bufcap:
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = False
            expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            expansion = [None, None, None]
            self.buf.extend(expansion)
            self.bufcap += len(expansion)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
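<p>The doubling policy mentioned earlier can be tried out with a small change to the version above. This is only a sketch under stated assumptions: the class name, the growth rule (double the capacity, with a minimum of one slot), the use of IndexError instead of an assertion, and the checking function are all additions for illustration. The check drives the buffer with random pushes and pops and compares the results against a <i>collections.deque</i> used as the reference FIFO.</p>

```python
import random
from collections import deque


class DoublingCircBuf(object):
    """ExpandingCircBuf2 with a growth step that doubles the capacity."""

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = 0

    def push(self, x):
        if self.ringcap != self.bufcap:
            # Split buffer: grow only once the overflow area is full.
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            # Full ring: grow, widening the ring if its data starts at 0.
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            # Assumed policy: double the capacity (at least one slot).
            self.buf.extend([None] * max(1, self.bufcap))
            self.bufcap = len(self.buf)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        if self.buflen == 0:
            raise IndexError("Buffer underflow!")
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            # Ring drained: the overflow area becomes part of the ring.
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x


def fifo_matches_deque(steps=5000, seed=1):
    """Drive the buffer with random pushes/pops and compare with a deque."""
    rng = random.Random(seed)
    circbuf, model = DoublingCircBuf(), deque()
    for n in range(steps):
        if model and rng.random() < 0.5:
            if circbuf.pop() != model.popleft():
                return False
        else:
            circbuf.push(n)
            model.append(n)
    # Drain whatever is left and make sure the order still agrees.
    return all(circbuf.pop() == model.popleft() for _ in range(len(model)))


print(fifo_matches_deque())
```

<p>A randomized comparison like this is not a proof, but it exercises the split and merge transitions far more often than a hand-written sequence would.</p>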
<p><b>Conclusion</b></p>
<p>This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics.  When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up by reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Vector clock support for Go</title>
		<link>http://blog.labix.org/2010/12/21/vector-clock-support-for-go</link>
		<comments>http://blog.labix.org/2010/12/21/vector-clock-support-for-go#comments</comments>
		<pubDate>Tue, 21 Dec 2010 18:03:47 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=564</guid>
		<description><![CDATA[One more Go library oriented towards building distributed systems hot off the presses: govclock. This one offers full vector clock support for the Go language. Vector clocks allow recording and analyzing the inherent partial ordering of events in a distributed &#8230; <a href="http://blog.labix.org/2010/12/21/vector-clock-support-for-go">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One more Go library oriented towards building distributed systems hot off the presses: <a href="http://labix.org/govclock">govclock</a>. This one offers full <a href="http://en.wikipedia.org/wiki/Vector_clock">vector clock</a> support for the <a href="http://golang.org">Go language</a>.  Vector clocks allow recording and analyzing the inherent partial ordering of events in a distributed system in a comfortable way.</p>
<p>The following features are offered by govclock, in addition to basic event tracking:</p>
<p><span id="more-564"></span>
<ul>
<li>Compact serialization and deserialization
<li>Flexible truncation (min/max entries, min/max update time)
<li>Unit-independent update times
<li>Traditional merging
<li>Fast and memory efficient
</ul>
<p>If you&#8217;d like to know more about vector clocks, the Basho guys did a great job in the following pair of blog posts:</p>
<ul>
<li><a href="http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/">Why vector clocks are easy</a>
<li><a href="http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/">Why vector clocks are hard</a>
</ul>
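<p>To make the partial ordering concrete, here is a minimal sketch of the idea in Python. It is not govclock's implementation or API: the function names, the plain dictionaries used as clocks, and the returned strings are all made up for illustration, and features such as truncation and update times are left out.</p>

```python
def tick(clock, node):
    """Return a copy of clock with the counter for node incremented."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Return the element-wise maximum of two clocks."""
    return dict((n, max(a.get(n, 0), b.get(n, 0))) for n in set(a) | set(b))


def relation(a, b):
    """Describe clock a relative to clock b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "ancestor"    # a happened before b
    if b_le_a:
        return "descendant"  # a happened after b
    return "concurrent"      # neither clock dominates the other


vc1 = tick({}, "A")
vc2 = tick(vc1, "B")
print(relation(vc1, vc2))  # => ancestor

vc1 = tick(vc1, "C")
print(relation(vc1, vc2))  # => concurrent

vc2 = merge(vc2, vc1)
print(relation(vc1, vc2))  # => ancestor
```

<p>Two clocks are concurrent exactly when each one has seen an event the other has not, which is why the ordering is only partial.</p>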
<p>The following sample program demonstrates some sequential and concurrent events, dumping and loading, as well as merging of clocks.  For more details, please look at the <a href="http://labix.org/govclock">web page</a>.  The project is available under a BSD license.</p>
<pre>

package main

import (
    "launchpad.net/govclock"
    "fmt"
)

func main() {
    vc1 := govclock.New()
    vc1.Update([]byte("A"), 1)

    vc2 := vc1.Copy()
    vc2.Update([]byte("B"), 0)

    fmt.Println(vc2.Compare(vc1, govclock.Ancestor))   // => true
    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => true

    vc1.Update([]byte("C"), 5)

    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => false
    fmt.Println(vc1.Compare(vc2, govclock.Concurrent)) // => true

    vc2.Merge(vc1)

    fmt.Println(vc1.Compare(vc2, govclock.Descendant)) // => true

    data := vc2.Bytes()
    fmt.Printf("%#v\n", string(data))
    // => "\x01\x01\x01\x01A\x01\x01\x01B\x01\x00\x01C"

    vc3, err := govclock.FromBytes(data)
    if err != nil { panic(err.String()) }

    fmt.Println(vc3.Compare(vc2, govclock.Equal))      // => true
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/21/vector-clock-support-for-go/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integrating Go with C: the ZooKeeper binding experience</title>
		<link>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience</link>
		<comments>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience#comments</comments>
		<pubDate>Fri, 10 Dec 2010 16:17:16 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=534</guid>
		<description><![CDATA[ZooKeeper is a clever generic coordination server for distributed systems, and is one of the core pieces of software which facilitate the development of Ensemble (a project for automagic IaaS deployments which we push at Canonical), so it was a natural choice to &#8230; <a href="http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://zookeeper.apache.org">ZooKeeper</a> is a clever generic coordination server for distributed systems, and is one of the core pieces of software which facilitate the development of Ensemble (a project for automagic <a href="http://en.wikipedia.org/wiki/Cloud_computing">IaaS</a> deployments which we push at <a href="http://www.canonical.com">Canonical</a>), so it was a natural choice to experiment with.</p>
<p><a href="https://wiki.ubuntu.com/gozk">Gozk</a> is a complete binding for ZooKeeper which explores the native features of Go to facilitate the interaction with a ZooKeeper server.  To avoid reimplementing the well tested bits of the protocol in an unstable way, Gozk is built on top of the standard C ZooKeeper library.</p>
<p>The experience of integrating ZooKeeper with Go was certainly valuable in itself, and worked as a nice way to learn the details of integrating the Go language with a C library. If you&#8217;re interested in learning a bit about Go, ZooKeeper, or other details related to the creation of bindings and asynchronous programming, please fasten your seatbelt now.</p>
<p><span id="more-534"></span><b>Basics of C wrapping in Go</b></p>
<p>Creating the binding was in itself a pretty interesting experiment.  I have worked on the creation of quite a few bindings and language bridges before, and must say I was pleasantly surprised with the experience of creating the Go binding.  With <i>Cgo</i>, the name given to the &#8220;<i>foreign function interface</i>&#8221; mechanism for C integration, one basically declares a special import statement which causes a pre-processor to look at the comment preceding it.  Something similar to this:</p>
<pre>
// #include &lt;zookeeper.h&gt;
import "C"
</pre>
<p>The comment doesn&#8217;t have to be restricted to a single line, or even to <i>#include</i> statements.  The C code contained in the comment will be transparently inserted into a helper C file which is compiled and linked with the final object file, and the given snippet will also be parsed and inclusions processed.  On the Go side, that &#8220;C&#8221; import is simulated as if it were a normal Go package so that the C functions, types, and values are all directly accessible.</p>
<p>As an example, a C function with this prototype:</p>
<pre>
int zoo_wexists(zhandle_t *zh, const char *path, watcher_fn watcher,
                void *context, struct Stat *stat);
</pre>
<p>In Go, it may be used as:</p>
<pre>
cstat := C.struct_Stat{}
rc, cerr := C.zoo_wexists(zk.handle, cpath, nil, nil, &#038;cstat)
</pre>
<p>When the C function is used in a context where two result values are requested, as done above, Cgo will save the well known <i>errno</i> variable after the function has finished executing and will return it wrapped into an <i>os.Errno</i> value.</p>
<p>Also, note how the C struct is defined in a way that can be passed straight to the C function.  Interestingly, the allocation of the memory backing the structure is going to be performed and tracked by the Go runtime, and will be garbage collected appropriately once no more references exist <i>within the Go runtime</i>. This fact has to be kept in mind since the application will crash if a value allocated normally within Go is saved with a foreign C function and maintained after all the Go references are gone.  The alternative in these cases is to call the usual C functions to get hold of memory for the involved values.  That memory won&#8217;t be touched by the garbage collector, and, of course, must be explicitly freed when no longer necessary.  Here is a simple example showing explicit allocation:</p>
<pre>
cbuffer := (*C.char)(C.malloc(bufferSize))
defer C.free(unsafe.Pointer(cbuffer))
</pre>
<p>Note the use of the <i>defer</i> statement above. Even when dealing with foreign functionality, it comes in handy. The above call will ensure that the buffer is deallocated right before the current function returns, for instance, so it&#8217;s a nice way to ensure no leaks happen, even if in the future the function suddenly gets a new exit point which didn&#8217;t consider the allocation of resources.</p>
<p>In terms of typing, Go is more strict than C, and Cgo-based logic will also ensure that the types returned and passed into the foreign C functions are correctly typed, in the same way done for the native types.  Note above, for instance, how the call to the <i>free()</i> function has to explicitly convert the value into an <i>unsafe.Pointer</i>, even though in C no casting would be necessary to pass a pointer into a <i>void *</i> parameter.</p>
<p>The <i>unsafe.Pointer</i> is in fact a very special type within Go. Using it, one can convert any pointer type into any other pointer type in an unsafe way (thus the package name), and also back and forth into a <i>uintptr</i> value with the address of the memory referenced by the pointer.  For every other type conversion, Go will ensure at compilation time that doing the conversion at runtime is a safe operation.</p>
<p>With all of these resources, including the ability to use common Go syntax and functionality even when dealing with foreign types, values, and function calls, the integration task turns out to be quite a pleasant experience.  That said, some of the things may still require some good thinking to get right, as we&#8217;ll see shortly.</p>
<p><b>Watch callbacks and channels</b></p>
<p>One of the most interesting (and slightly tricky) aspects of mapping the ZooKeeper concepts into Go was the &#8220;watch&#8221; functionality.  ZooKeeper allows one to attach a &#8220;watch&#8221; to a node so that the server will report back when changes happen to the given node.  In the C library, this functionality is exposed via a callback function which is executed once the monitored node aspect is modified.</p>
<p>It would certainly be possible to offer this functionality in Go using a similar mechanism, but <a href="http://golang.org/doc/go_spec.html#Channel_types">Go channels</a> provide a number of advantages for that kind of asynchronous notification: waiting for multiple events via the <a href="http://golang.org/doc/go_spec.html#Select_statements">select statement</a>, synchronous blocking until the event happens, testing if the event is already available, etc.</p>
<p>The tricky bit, though, isn&#8217;t the use of channels.  That part is quite simple.  The tricky detail is that the C callback function execution happens in a C thread started by the ZooKeeper library, and happens asynchronously, while the Go application is doing its business elsewhere.  Right now, there&#8217;s no straightforward way to transfer the execution of this asynchronous C function back into Go land.  The solution for this problem was found with some help from the folks at the <a href="http://groups.google.com/group/golang-nuts">golang-nuts</a> mailing list, and luckily it&#8217;s not that hard to support or understand.  That said, this is a good opportunity to get some coffee or your preferred focus-enhancing drink.</p>
<p>The solution works like this: when the ZooKeeper C library gets a watch notification, it executes a C callback function which is inside a Gozk helper file. Rather than transferring control to Go right away, this C function simply appends data about the event onto a queue, and signals a pthread condition variable to notify that an event is available.  Then, on the Go side, once the first ZooKeeper connection is initialized, a new goroutine is fired and loops waiting for events to be available.  The interesting detail about this loop is that it blocks <i>within a foreign C function</i> waiting for an event to be available, through the signaling of the shared pthread condition variable.  On the Go side, this is what the call looks like, just to give a more practical feeling:</p>
<pre>
// This will block until there's a watch available.
data := C.wait_for_watch()
</pre>
<p>Then, on the C side, here is the function definition:</p>
<pre>
watch_data *wait_for_watch() {
    watch_data *data = NULL;
    pthread_mutex_lock(&#038;watch_mutex);
    /* Loop rather than test once: pthread_cond_wait() may wake up spuriously. */
    while (first_watch == NULL)
        pthread_cond_wait(&#038;watch_available, &#038;watch_mutex);
    data = first_watch;
    first_watch = first_watch->next;
    pthread_mutex_unlock(&#038;watch_mutex);
    return data;
}
</pre>
<p>As you can see, not really a big deal.  When that kind of blocking occurs inside a foreign C function, the Go runtime will correctly continue the execution of other goroutines within other operating system threads.</p>
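<p>The queue-and-condition handoff described above can be sketched in a few lines of Python, with <i>threading.Condition</i> standing in for the pthread mutex and condition variable pair. This is only an illustration of the pattern, not Gozk code, and all of the names in it are hypothetical.</p>

```python
import threading
from collections import deque

watch_available = threading.Condition()   # a mutex and a condition in one object
pending = deque()                         # events queued by the callback thread


def watch_callback(event):
    """Runs on the library's thread: queue the event and wake the waiter."""
    with watch_available:
        pending.append(event)
        watch_available.notify()


def wait_for_watch():
    """Block until an event has been queued, then return the oldest one."""
    with watch_available:
        while not pending:                # re-check: a wakeup alone proves nothing
            watch_available.wait()
        return pending.popleft()


def demo():
    received = []
    consumer = threading.Thread(
        target=lambda: received.append(wait_for_watch()))
    consumer.start()
    watch_callback({"path": "/some/path", "type": "created"})
    consumer.join()
    return received
```

<p>The consumer thread plays the role of the goroutine parked inside the foreign function; in the binding, the value it gets back would then be delivered through a channel.</p>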
<p>The result of this mechanism is a nice-to-use interface based on channels, which may be explored in different ways depending on the application needs.  Here is a simple example blocking on the event synchronously, for instance:</p>
<pre>
stat, watch, err := zk.ExistsW("/some/path")
if stat == nil &#038;&#038; err == nil {
    event := <-watch
    // Use event ...
}
</pre>
<p><b>Concluding</b></p>
<p>Those were some of the interesting aspects of implementing the ZooKeeper binding.  I would like to speak about some additional details, but this post is rather long already, so I'll keep that for a future opportunity.  The code is available under the LGPL, so if you're curious about some other aspect, or would like to use ZooKeeper with Go, please move on and <a href="https://wiki.ubuntu.com/gozk">check it out</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/10/integrating-go-with-c-the-zookeeper-binding-experience/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Removing seatbelts with the Go language for mmap support</title>
		<link>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support</link>
		<comments>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support#comments</comments>
		<pubDate>Sun, 28 Nov 2010 18:33:29 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=505</guid>
		<description><![CDATA[Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: gommap. As one would imagine, this new package provides access to low-level memory mapping for files and devices, &#8230; <a href="http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: <a href="http://labix.org/gommap">gommap</a>. As one would imagine, this new package provides access to low-level memory mapping for files and devices, and it allowed exploring a few new edges of the language implementation.  Note that, strictly speaking, some of the details ahead are really more about the implementation than the <i>language</i> itself.</p>
<p><span id="more-505"></span>There were basically two main routes to follow when implementing support for memory mapping in Go.  The first one is usually the way higher-level languages handle it.  In Python, for instance, this is the way one may use a memory mapped file:</p>
<pre>
>>> import mmap
>>> file = open("/etc/passwd")
>>> mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>>> mm[0:4]
'root'
</pre>
<p>The way this was done has an advantage and a disadvantage which are perhaps not entirely obvious at first glance.  The advantage is that the memory mapped area is truly hidden behind that interface, so any improper attempt to access a region which was already unmapped, for instance, may be blocked within the application with a nice error message which explains the issue.  The disadvantage, though, is that this interface usually comes with the restriction that normal libraries can only use the memory region by copying data out of it.  In the above example, for instance, the &#8220;root&#8221; string isn&#8217;t backed by the original mapped memory anymore, and is rather a copy of its contents (see <a href="http://www.python.org/dev/peps/pep-3118/">PEP 3118</a> for a way to improve this aspect a bit in Python).</p>
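<p>Both sides of that trade-off can be observed with a small sketch, written here for Python 3 where the buffer protocol from PEP 3118 is available through <i>memoryview</i>. The temporary file and the <i>copy_versus_view</i> name are just assumptions made for the illustration.</p>

```python
import mmap
import tempfile


def copy_versus_view():
    with tempfile.TemporaryFile() as f:
        f.write(b"root:x:0:0:root:/root:/bin/bash\n")
        f.flush()
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)

        copied = mm[0:4]            # slicing the map copies the bytes out
        whole = memoryview(mm)      # buffer protocol: no copy is made
        view = whole[0:4]           # still backed by the mapped memory

        mm[0:4] = b"toor"           # change the mapped region in place
        seen_by_copy = bytes(copied)
        seen_by_view = bytes(view)

        # Outstanding views must be released before the map can be closed.
        view.release()
        whole.release()
        mm.close()

        try:
            mm[0:4]                 # the mapping is gone by now
            blocked = False
        except ValueError:
            blocked = True          # refused with an exception, not a crash
    return seen_by_copy, seen_by_view, blocked
```

<p>The copy keeps the old contents while the view follows the mapped memory, and touching the map after it was closed yields an exception instead of a crash.</p>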
<p>The other path, which can be done with Go, is to back a normal native array type with the allocated memory.  This means that normal libraries don&#8217;t need to copy data out of the mapped memory, or to use a special memory saving interface, to deal with the memory mapped region.  As a simple example, this would get the first line in the given file:</p>
<pre>
mmap, err := gommap.Map(file.Fd(), gommap.PROT_READ, gommap.MAP_PRIVATE)
if err == nil {
    end := bytes.Index(mmap, []byte{'\n'})
    firstLine := mmap[:end]
}
</pre>
<p>In the procedure above, <i>mmap</i> is defined as an alias to a native <i>[]byte</i> array, so even though the standard <i>bytes</i> package was used, at no point was the data from the memory mapped region copied out or any auxiliary buffers allocated, so this is a <i>very</i> fast operation.  To give an idea about this, let&#8217;s pretend for a moment that we want to increment a simple 8-bit counter in a file.  This might be done with something as simple as:</p>
<pre>
mmap[13] += 1
</pre>
<p>This line of code would be compiled into something similar to the following assembly (amd64):</p>
<pre>
MOVQ    mmap+-32(SP),BX
CMPL    8(BX),$13
JHI     ,68
CALL    ,runtime.panicindex+0(SB)
MOVQ    (BX),BX
INCB    ,13(BX)
</pre>
<p>As you can see, this is just doing some fast index checking before incrementing the value <i>directly in memory</i>. Given that one of the important reasons why memory mapped files are used is to speed up access to disk files (sometimes <i>large</i> disk files), this advantage in performance is actually meaningful in this context.</p>
<p>Unfortunately, though, doing things this way also has an important disadvantage, at least right now.  There&#8217;s no way at the moment to track references to the underlying memory, which was allocated by means not known to the Go runtime.  This means that <i>unmapping</i> this memory is not a safe operation.  The munmap system call will simply take the references away from the process, and any further attempt to touch those areas will crash the application.</p>
<p>To give you an idea about the background &#8220;magic&#8221; which is going on to achieve this support in Go, here is an interesting excerpt from the underlying mmap syscall as of this writing:</p>
<pre>
addr, _, errno := syscall.Syscall6(syscall.SYS_MMAP, (...))
(...)
dh := (*reflect.SliceHeader)(unsafe.Pointer(&#038;mmap))
dh.Data = addr
dh.Len = int(length)
dh.Cap = dh.Len
</pre>
<p>As you can see, this is taking apart the memory backing the slice value into its constituent structure, and altering it to point to the mapped memory, including information about the length mapped so that bounds checking as observed in the assembly above will work correctly.</p>
<p>In case the garbage collector is at some point extended to track references to these foreign regions, it would be possible to implement some kind of <i>UnmapOnGC()</i> method which would only unmap the memory once the last reference is gone.  For now, though, the advantages of being able to reference memory mapped regions directly, at least to me, surpass the danger of having improper slices of the given region being used after unmapping.  Also, I expect that usage of this kind of functionality will generally be encapsulated within higher level libraries, so it shouldn&#8217;t be too hard to keep the constraint in mind while using it this way.</p>
<p>For those reasons, <a href="http://labix.org/gommap">gommap</a> was implemented with the latter approach.  In case you need memory mapping support for Go, just move ahead and <i>goinstall launchpad.net/gommap</i>.</p>
<p><b>UPDATE (2010-12-02):</b> The interface was updated so that mmap itself is an array, rather than mmap.Data, and this post was changed to reflect this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The forgotten art of error checking</title>
		<link>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking</link>
		<comments>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking#comments</comments>
		<pubDate>Thu, 17 Jun 2010 15:15:59 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=275</guid>
		<description><![CDATA[I was just rambling randomly yesterday, in the usual microblogging platforms, about how result checking seems to be ignored or done badly. The precise wording was: It&#8217;s really amazing how little attention error handling receives in most software development. Even &#8230; <a href="http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I was just rambling randomly yesterday, in the usual microblogging platforms, about how result checking seems to be ignored or done badly.  The precise wording was:</p>
<blockquote><p>
It&#8217;s really amazing how little attention error handling receives in most software development. Even *tutorials* often ignore it.
</p></blockquote>
<p>It indeed does amaze me.  It sometimes feels like we write code for theoretical perfect worlds.. <i>&#8220;If the processor executes exactly in this order, and the weather is calm, this program will work.&#8221;</i>.  There are countless examples of bad assumptions.. someday I will come up with some statistics of the form <i>&#8220;Every N seconds someone forgets to check the result of write().&#8221;</i>.</p>
<p><span id="more-275"></span></p>
<p>If you are a teacher, or a developer that enjoys writing snippets of code to teach people, please join me in the quest of building a better future.  Do <i>not</i> tell us that you&#8217;re &#8220;avoiding result checking for terseness&#8221;, because that&#8217;s exactly what we people will do (terseness is good, right?).  On the contrary, take this chance to make us feel <i>bad</i> about avoiding result checking.  You might do this by putting a comment like &#8220;If you don&#8217;t do this, you&#8217;re a bad programmer.&#8221; right next to the logic which is handling the result, and might take this chance to teach people how proper result handling is done.</p>
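<p>As one small illustration of the <i>write()</i> case mentioned earlier, here is a sketch of what checking that result may look like in Python. The <i>write_all</i> helper is a hypothetical name for this example, not something from a library.</p>

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, coping with partial writes.

    os.write() returns how many bytes were actually written, which may be
    fewer than requested; ignoring that result silently drops data.
    Failures surface as OSError and are deliberately left to the caller.
    """
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]       # keep only what is still unwritten
```

<p>The loop is a few lines long, and it is the difference between &#8220;usually works&#8221; and actually handing all of the data to the operating system.</p>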
<p>Of course, there&#8217;s another forgotten art related to result checking.  It sits on the other side of the fence.  If you are a library author, do think through how you plan to make us check conditions which happen inside your library, and try to imagine how to make our lives easier.  If we suck at handling results when there are obvious ways to handle them, you can imagine what happens when you structure your result logic badly.</p>
<p>Here is a clear example of what <i>not</i> to do, coming straight from Python&#8217;s standard library, in the <i>imaplib</i> module:</p>
<pre>
    def login(self, user, password):
        typ, dat = self._simple_command('LOGIN', user, self._quote(password))
        if typ != 'OK':
            raise self.error(dat[-1])
        self.state = 'AUTH'
        return typ, dat
</pre>
<p>You see the problem there?  How do you handle errors from this library?  Should we catch the exception, or should we verify the result code? <i>&#8220;Both!&#8221;</i> is the right answer, unfortunately, because the author decided to do us a little favor and check the error condition himself in some arbitrary cases and raise the error, while letting it go through and end up in the result code in a selection of other arbitrary cases.</p>
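<p>One way out of that ambiguity is to pick a single channel for failures and stick to it. The sketch below is purely illustrative: <i>LoginError</i>, <i>simple_command</i>, and this <i>login</i> are hypothetical stand-ins and not the imaplib API.</p>

```python
class LoginError(Exception):
    """Raised for every failed login, whatever the cause."""


def simple_command(user, password):
    """Hypothetical transport call returning a (status, data) pair."""
    if password == "secret":
        return "OK", ["LOGIN completed"]
    return "NO", ["LOGIN failed"]


def login(user, password):
    """Return the server data on success; raise LoginError otherwise.

    The status code never reaches the caller, so there is exactly one
    thing to check: whether an exception was raised.
    """
    status, data = simple_command(user, password)
    if status != "OK":
        raise LoginError(data[-1])
    return data
```

<p>Returning the status code in every case and never raising would be just as consistent; what matters is that the caller only has one place to look.</p>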
<p>I may provide some additional advice on result handling in the future, but for now I&#8217;ll conclude with the following suggestion: please check the results from your actions, and help others to check theirs.  That&#8217;s a good life-encompassing recommendation, actually.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Mocker 0.10 and trivial patch-mocking of existing objects</title>
		<link>http://blog.labix.org/2007/12/09/mocker-010-and-trivial-patch-mocking-of-existing-objects</link>
		<comments>http://blog.labix.org/2007/12/09/mocker-010-and-trivial-patch-mocking-of-existing-objects#comments</comments>
		<pubDate>Sun, 09 Dec 2007 23:07:13 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/2007/12/09/mocker-010-and-trivial-patch-mocking-of-existing-objects/</guid>
		<description><![CDATA[Mocker 0.10 is out, with a number of improvements! While we&#8217;re talking about Mocker, here is another interesting use case, exploring a pretty unique feature it offers. Suppose we want to test that a method hello() on an object will &#8230; <a href="http://blog.labix.org/2007/12/09/mocker-010-and-trivial-patch-mocking-of-existing-objects">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://labix.org/mocker">Mocker</a> 0.10 is out, with a <a href="https://launchpad.net/mocker/trunk/0.10">number of improvements</a>!</p>
<p>While we&#8217;re talking about Mocker, here is another interesting use case, exploring a pretty unique feature it offers.</p>
<p>Suppose we want to test that a method <i>hello()</i> on an object will call <i>self.show(&#8220;Hello world!&#8221;)</i> at some point.  Let&#8217;s say that the code we want to test is this:</p>
<pre>
 class Greeting(object):

     def show(self, sentence):
         print sentence

     def hello(self):
         self.show("Hello world!")
</pre>
<p>This is the <i>entire</i> test method:</p>
<pre>
def test_hello(self):
    # Define expectation.
    mock = self.mocker.patch(Greeting)
    mock.show("Hello world!")
    self.mocker.replay()

    # Rock on!
    Greeting().hello()
</pre>
<p>This has helped me in practice a few times already, when testing some involved situations.</p>
<p>Note that you can also <i>passthrough</i> the call.  In other words, the call may actually be made on the real method, and mocker will just assert that the call was really made, whatever the effect is.</p>
<p>One more important point: mocker ensures that the real method <i>exists</i> in the real object, and has a specification compatible with the call made.  If it doesn&#8217;t, an assertion error is raised in the test with a nice error message.</p>
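<p>For readers without Mocker at hand, a roughly comparable check can be sketched with the <i>unittest.mock</i> module from the Python 3 standard library. It is a different tool with a different API, so take this as an analogy rather than a translation: <i>autospec=True</i> is what makes the mock follow the signature of the real method, and <i>check_hello</i> is just a name chosen for the example.</p>

```python
from unittest import mock


class Greeting(object):

    def show(self, sentence):
        print(sentence)

    def hello(self):
        self.show("Hello world!")


def check_hello():
    # autospec=True builds the mock from the real method, so calls which
    # do not fit its signature are rejected instead of silently accepted.
    with mock.patch.object(Greeting, "show", autospec=True) as show:
        greeting = Greeting()
        greeting.hello()
        # The patched method receives the instance as its first argument.
        show.assert_called_once_with(greeting, "Hello world!")
    return True
```

<p>Unlike the passthrough mode described above, this replaces the real method entirely, so nothing is printed while the patch is active.</p>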
<p><b>UPDATE:</b> <i>The method for doing this is actually mocker.patch() rather than mocker.mock(), as documented. Apologies.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2007/12/09/mocker-010-and-trivial-patch-mocking-of-existing-objects/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Partial stubbing of os.path.isfile() with Mocker</title>
		<link>http://blog.labix.org/2007/11/22/partial-stubbing-of-ospathisfile-with-mocker</link>
		<comments>http://blog.labix.org/2007/11/22/partial-stubbing-of-ospathisfile-with-mocker#comments</comments>
		<pubDate>Thu, 22 Nov 2007 23:27:14 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/2007/11/22/partial-stubbing-of-ospathisfile-with-mocker/</guid>
		<description><![CDATA[One neat feature which Mocker offers is the ability to very easily implement custom behavior on specific functions or methods. Take for instance the case where you want to pretend to some code that a given file exists, but you &#8230; <a href="http://blog.labix.org/2007/11/22/partial-stubbing-of-ospathisfile-with-mocker">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One neat feature which Mocker offers is the ability to very easily implement custom behavior on specific functions or methods.</p>
<p>Take for instance the case where you want to pretend to some code that a given file exists, but you don&#8217;t want to get in the way of everything else which needs the same function:</p>
<pre>
>>> from mocker import *
>>> mocker = Mocker()
>>> isfile = mocker.replace("os.path.isfile", count=False)
>>> _ = expect(isfile("/non/existent")).result(True)
>>> _ = expect(isfile(ANY)).passthrough()

>>> mocker.replay()

>>> import os
>>> os.path.isfile("/non/existent")
True
>>> os.path.isfile("/etc/passwd")
True
>>> os.path.isfile("/other")
False

>>> mocker.restore()

>>> os.path.isfile("/non/existent")
False
</pre>
<p>Notice that the <i>count=False</i> parameter is available in version 0.9.2.  Without it Mocker will act in a more <i>mocking-strict</i> way and enforce that the given expressions should be executed precisely the given number of times (which defaults to one, and may be modified with the <i>count()</i> method). </p>
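<p>The same partial stubbing idea can be approximated with <i>unittest.mock</i> from the Python 3 standard library, for those who do not have Mocker around. This is a different library and only a sketch; <i>fake_isfile</i> and <i>demo</i> are hypothetical names, and the paths are assumed not to exist on the machine running it.</p>

```python
import os.path
from unittest import mock

real_isfile = os.path.isfile    # keep a handle on the real function


def fake_isfile(path):
    """Pretend one path exists; defer to the real function otherwise."""
    if path == "/non/existent":
        return True
    return real_isfile(path)


def demo():
    with mock.patch("os.path.isfile", side_effect=fake_isfile):
        stubbed = os.path.isfile("/non/existent")
        passed_through = os.path.isfile("/other/missing/file")
    restored = os.path.isfile("/non/existent")    # patch is undone here
    return stubbed, passed_through, restored
```

<p>No call counting is enforced by default with this approach, which is close to what <i>count=False</i> gives in the session above.</p>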
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2007/11/22/partial-stubbing-of-ospathisfile-with-mocker/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python&#8217;s os.environ</title>
		<link>http://blog.labix.org/2007/10/17/pythons-osenviron</link>
		<comments>http://blog.labix.org/2007/10/17/pythons-osenviron#comments</comments>
		<pubDate>Wed, 17 Oct 2007 17:16:11 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/2007/10/17/pythons-osenviron/</guid>
		<description><![CDATA[As Chris Armstrong pointed out yesterday, os.environ.pop() is broken in Python versions at least up to 2.5. The method will simply remove the entry from the in-memory dictionary which holds a copy of the environment: >>> import os >>> os.system("echo &#8230; <a href="http://blog.labix.org/2007/10/17/pythons-osenviron">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As <a href="http://radix.twistedmatrix.com/2007/10/dont-expect-osenvironpop-to-do-anything.html">Chris Armstrong pointed out</a> yesterday, os.environ.pop() is broken in Python versions at least up to 2.5.  The method will simply remove the entry from the in-memory dictionary which holds <i>a copy</i> of the environment:</p>
<pre>
>>> import os
>>> os.system("echo $ASD")

0
>>> os.environ["ASD"] = "asd"
>>> os.system("echo $ASD")
asd
0
>>> os.environ.pop("ASD")
'asd'
>>> os.system("echo $ASD")
asd
0
</pre>
<p>I can understand that the interface of dictionaries has evolved since os.environ was originally planned, and the os.environ.pop method was overlooked for a while.  What surprises me a bit, though, is why it was originally designed the way it is.  First, the interface will completely ignore new methods added to the dictionary interface, and they <b>will apparently work</b>. Then, why use a <i>copy</i> of the environment in the first place?  This will mean that any changes to the <i>real</i> environment are not seen.</p>
<p>This sounds like something somewhat simple to do right.  Here is a working hack using ctypes to show an example of the behavior I&#8217;d expect out of os.environ (Python 2.5 on Ubuntu Linux):</p>
<pre>
from ctypes import cdll, c_char_p, POINTER
from UserDict import DictMixin
import os

c_char_pp = POINTER(c_char_p)

class Environ(DictMixin):

    def __init__(self):
        self._process = cdll.LoadLibrary(None)
        self._getenv = self._process.getenv
        self._getenv.restype = c_char_p
        self._getenv.argtypes = [c_char_p]

    def keys(self):
        result = []
        environ = c_char_pp.in_dll(self._process, "environ")
        i = 0
        while environ[i]:
            result.append(environ[i].split("=", 1)[0])
            i += 1
        return result

    def __getitem__(self, key):
        value = self._getenv(key)
        if value is None:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        os.putenv(key, value)

    def __delitem__(self, key):
        os.unsetenv(key)
</pre>
<p>I may be missing some implementation detail which would explain the original design.  If not, I suggest we just change the implementation to something equivalent (without ctypes).</p>
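<p>To find out whether a given interpreter is affected, the interactive session above can be turned into a small script which asks a child process what it sees. The sketch below is written for Python 3, and <i>child_sees</i> is a hypothetical helper introduced just for this check.</p>

```python
import os
import subprocess
import sys


def child_sees(name):
    """Return the value of `name` as seen by a freshly started child."""
    code = "import os, sys; sys.stdout.write(os.environ.get(%r, ''))" % name
    return subprocess.check_output([sys.executable, "-c", code]).decode()


os.environ["ASD"] = "asd"
before = child_sees("ASD")      # value visible to the child after setting

os.environ.pop("ASD")
after = child_sees("ASD")       # empty if pop() reached the real environment
```

<p>If <i>after</i> still holds the old value, the removal only happened in the in-memory dictionary, which is exactly the behavior described in this post.</p>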
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2007/10/17/pythons-osenviron/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

