<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Labix Blog &#187; Python</title>
	<atom:link href="http://blog.labix.org/tag/python/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.labix.org</link>
	<description>by Gustavo Niemeyer</description>
	<lastBuildDate>Mon, 16 Jan 2012 04:02:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Bazaar, the git way</title>
		<link>http://blog.labix.org/2012/01/16/bazaar-the-git-way</link>
		<comments>http://blog.labix.org/2012/01/16/bazaar-the-git-way#comments</comments>
		<pubDate>Mon, 16 Jan 2012 04:02:51 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[RCS]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=935</guid>
		<description><![CDATA[Back at the Ubuntu Platform Rally last week, I pestered some of the Bazaar team with questions about co-location of branches in the same directory with Bazaar. The great news is that this seems to be really coming for the &#8230; <a href="http://blog.labix.org/2012/01/16/bazaar-the-git-way">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Back at the Ubuntu Platform Rally last week, I pestered some of the Bazaar team with questions about co-location of branches in the same directory with Bazaar. The great news is that this seems to be really coming for the next release, with first-class integration of the feature in the command set. Unfortunately, though, it&#8217;s not quite ready for prime time yet, or even for <i>I&#8217;m-crazy-and-want-this-feature</i> time.</p>
<p>Some background on why this feature turns out to be quite important right now may be interesting, since life with Bazaar in the past years hasn&#8217;t really brought that up as a blocker. <span id="more-935"></span>The cause for the new interest lies in some recent changes in the toolset of the Go language. The new <i>go</i> tool not only makes building and interacting with Go packages a breeze, but it also solves a class of problems that existed previously. For the <i>go</i> tool to work, though, it requires $GOPATH to be used consistently, and this means that the package has to live in a <i>well defined directory</i>. The traditional way in which Bazaar keeps each branch in its own directory then becomes a deal breaker.</p>
<p>So, last week I had the chance to exchange some ideas with Jelmer Vernooij and Vincent Ladeuil (both Bazaar hackers) on these problems, and they introduced me to the approach of using lightweight checkouts to work around some of the limitations. Lightweight checkouts in Bazaar make the working tree resemble the old-style VCS tools a little, with the working tree being bound to another location that actually has the core content. The idea is great, and given how well lightweight checkouts work with Bazaar, building a full-fledged solution shouldn&#8217;t be a lot of work really.</p>
<p>After that conversation, I put together a trivial hack that made bzr look like git from the outside, by wrapping the command line, and did a lightning talk demo. This got a few more people interested in the concept, which was enough motivation for me to move the idea forward into a working implementation. Now I just needed the time to do it, but it wasn&#8217;t too hard to find it either.</p>
<p>I happen to be part of the unlucky group that too often takes more than 24 hours to get back home from these events. This is not entirely bad, though.. I also happen to be part of the lucky group that can code while flying and riding buses as a means to relieve the boredom (reading helps too). This time around, <a href="http://labix.org/cobzr">cobzr</a> became the implementation of choice, and given ~10 hours of coding, we have a very neat and over-engineered wrapper for the bzr command.</p>
<p>The core of the implementation is the same as the original hack: wrap bzr and call it from outside to restructure the tree. That said, rather than relying on lazy and hackish line parsing, it actually parses bzr&#8217;s --help output for commands to build a base of supported options, and parses the command line exactly like Bazaar itself would, validating options as it goes and distinguishing flags that take arguments from positional parameters. That enables the proxying to do much more interesting work on the intercepted arguments.</p>
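<p>As a rough illustration of that idea (not cobzr&#8217;s actual code, which is written in Go), the Python sketch below builds an option table from help text and then uses it to tell flags that take arguments apart from positional parameters. The help text, the <i>parse_help</i> and <i>split_args</i> names, and the assumed layout of option lines are all made up for the example.</p>

```python
import re

# Made-up help text in the general style of a command-line tool. It is NOT
# copied from bzr and only serves to exercise the parser below.
SAMPLE_HELP = """\
Options:
  -v, --verbose         Display more information.
  -r ARG, --revision=ARG
                        Revision to operate on.
  -b ARG                Branch name.
  --no-tree             Do not build a working tree.
"""

# An option name, optionally followed by "=ARG" or " ARG".
OPTION_RE = re.compile(r"(--?[A-Za-z][\w-]*)(?:[= ](ARG))?")


def parse_help(text):
    """Map each option name to True if it takes an argument."""
    options = {}
    for line in text.splitlines():
        # Assumption: option definitions are indented by exactly two spaces.
        if not line.startswith("  -"):
            continue
        # The description is separated from the spec by two or more spaces.
        spec = re.split(r"\s{2,}", line.strip())[0]
        for name, arg in OPTION_RE.findall(spec):
            options[name] = bool(arg)
    return options


def split_args(argv, options):
    """Separate flags (with their arguments) from positional parameters."""
    flags, positional = {}, []
    args = iter(argv)
    for arg in args:
        if arg == "--":
            positional.extend(args)  # everything after "--" is positional
        elif arg.startswith("-") and arg != "-":
            name, sep, value = arg.partition("=")
            if name not in options:
                raise ValueError("unknown option: %s" % name)
            if not options[name]:
                flags[name] = True
                continue
            if not sep:
                # The argument is the next word on the command line.
                value = next(args, None)
                if value is None:
                    raise ValueError("missing argument for %s" % name)
            flags[name] = value
        else:
            positional.append(arg)
    return flags, positional


options = parse_help(SAMPLE_HELP)
flags, positional = split_args(["checkout", "-b", "new-feature", "-v"], options)
print(flags)
print(positional)
```

<p>With the table in hand, <i>new-feature</i> is recognized as the argument of <i>-b</i> rather than as a positional parameter, which is the kind of distinction a wrapper needs before it can rewrite a command line safely.</p>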
<p>Here is a quick session that shows a branch being created with the tool. It should look fairly familiar for someone used to git:</p>
<p><code><br />
[~]% bzr branch lp:juju<br />
Branched 443 revisions.</p>
<p>[~]% cd juju<br />
[~/juju]% bzr branch<br />
* master</p>
<p>[~/juju]% bzr checkout -b new-feature<br />
Shared repository with trees (format: 2a)<br />
Location:<br />
  shared repository: .bzr/cobzr<br />
Branched 443 revisions.<br />
Branched 443 revisions.<br />
Tree is up to date at revision 443.<br />
Switched to branch: /home/niemeyer/juju/.bzr/cobzr/new-feature/</p>
<p>[~/juju]% bzr branch other-feature<br />
Branched 443 revisions.</p>
<p>[~/juju]% bzr branch<br />
&nbsp;&nbsp;master<br />
* new-feature<br />
&nbsp;&nbsp;other-feature<br />
</code></p>
<p>Note that cobzr will not reorganize the tree layout until multiple-branch support is actually needed.</p>
<p>Even though the wrapping is taking place and bzr&#8217;s --help output is parsed, there&#8217;s pretty much no noticeable overhead given the use of Go for the implementation and also that the processed output of --help is cached (I <i>said</i> it was overengineered).</p>
<p>As an example, the first command below is the real bzr, while the second is a link to cobzr:</p>
<p><code><br />
[~/juju]% time /usr/bin/bzr status<br />
/usr/bin/bzr status  0.24s user 0.03s system 88% cpu 0.304 total</p>
<p>[~/juju]% time bzr status<br />
bzr status  0.19s user 0.08s system 88% cpu 0.307 total<br />
</code></p>
<p>This should be more than enough for surviving comfortably until bzr itself comes along with first-class support for co-located branches in the next release.</p>
<p>In case you&#8217;re interested in using it or are just curious about the command set or other details, please check out the web page for the project:</p>
<ul>
<li><a href="http://labix.org/cobzr">http://labix.org/cobzr</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2012/01/16/bazaar-the-git-way/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Ensemble, Go, and MongoDB at Canonical</title>
		<link>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical</link>
		<comments>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical#comments</comments>
		<pubDate>Fri, 05 Aug 2011 03:49:14 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=706</guid>
		<description><![CDATA[About 1 year after development started in Ensemble, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we&#8217;ve been jointly &#8230; <a href="http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>About 1 year after development started in <a href="https://ensemble.ubuntu.com">Ensemble</a>, today the stars finally aligned just the right way (review queue mostly empty, no other pressing needs, etc) for me to start writing the specification about the repository system we&#8217;ve been jointly planning for a long time. This is the system that the Ensemble client will communicate with for discovering which <a href="https://ensemble.ubuntu.com/docs/formula.html">formulas</a> are available, for publishing new formulas, for obtaining formula files for deployment, and so on.</p>
<p><span id="more-706"></span>We of course would have liked for this part of the project to have been specified and written a while ago, but unfortunately that wasn&#8217;t possible for several reasons. That said, there are also upsides to having an important piece floating around in minds and conversations for such a long time: sitting down to specify the system and describe the inner-working details has been a breeze. Even details such as the namespacing of formulas, which hadn&#8217;t been entirely clear in my mind, just streamed into the document as the ideas we&#8217;ve been evolving finally came together in written form.</p>
<p>One curious detail: this is the first long term project at <a href="https://www.canonical.com">Canonical</a> that will be developed in <a href="http://golang.org">Go</a>, rather than Python or C/C++, which are the most used languages for projects within Canonical. Not only that, but we&#8217;ll also be using <a href="http://www.mongodb.org">MongoDB</a> for a change, rather than the traditional <a href="http://www.postgresql.com">PostgreSQL</a>, and will also use (you guessed) the <a href="http://labix.org/mgo">mgo driver</a> which I&#8217;ve been pushing entirely as a personal project for about 8 months now.</p>
<p>Naturally, with so many moving parts that are new to the company culture, this is still being seen as a closely watched experiment. Still, this makes me highly excited, because when I started developing mgo, the MongoDB driver for Go, my hopes that the Go, MongoDB, and mgo trio would eventually be used at Canonical were very low, precisely because they were all alien to the culture. We only got here after quite a lot of internal debate, experiments, and trust too.</p>
<p>All of that means these are happy times. Important feature in Ensemble being specified and written, very exciting tools, home grown software being useful..</p>
<p>Awesomeness.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2011/08/05/ensemble-go-and-mongodb-at-canonical/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Efficient algorithm for expanding circular buffers</title>
		<link>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers</link>
		<comments>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers#comments</comments>
		<pubDate>Thu, 23 Dec 2010 12:57:40 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=580</guid>
		<description><![CDATA[Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the &#8220;Hello world!&#8221; days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc. &#8230; <a href="http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the <i>&#8220;Hello world!&#8221;</i> days.  They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.</p>
<p>One feature which is not always desired, though, is the fact that circular buffers traditionally will either overwrite the oldest element, or raise an overflow error, since they are generally implemented as a buffer of <i>constant</i> size.  This is an unwanted property when one is attempting to <i>consume</i> items from the buffer and it is not an option to blindly drop items, for instance.</p>
<p>This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It&#8217;s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I&#8217;ll publish it below and let you decide.</p>
<p><span id="more-580"></span><b>Traditional circular buffers</b></p>
<p>Before introducing the variant which can actually expand during use, let&#8217;s go through a quick review on traditional circular buffers, so that we can then reuse the nomenclature when extending the concept.  All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.</p>
<p>So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur.  The following snippet demonstrates the concept in practice:</p>
<pre>
buf = [None, None, None, None, None]
bufcap = len(buf)
pushi = 0

for elem in range(7):
    buf[pushi] = elem
    pushi = (pushi + 1) % bufcap

print(buf) # => [5, 6, 2, 3, 4]
</pre>
<p>In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That&#8217;s the specific feature of circular buffers which the proposal in this post will offer an alternative for.</p>
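<p>As an aside, the same drop-the-oldest behaviour is available in Python&#8217;s standard library through <i>collections.deque</i> with a <i>maxlen</i>. The short sketch below is only a point of comparison, since a deque exposes its items in logical order rather than showing the wrapped storage:</p>

```python
from collections import deque

# A bounded deque silently discards items from the opposite end when full.
buf = deque(maxlen=5)
for elem in range(7):
    buf.append(elem)

print(list(buf)) # => [2, 3, 4, 5, 6]
```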
<p>The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur.  Please note that these snippets are not necessarily idiomatic Python.  The intention is to highlight the algorithm itself.</p>
<pre>
class CircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.pushi = self.popi = 0
        self.bufcap = len(self.buf)

    def push(self, x):
        assert self.buflen == 0 or self.pushi != self.popi, \
               "Buffer overflow!"
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.popi = (self.popi + 1) % self.bufcap
        return x
</pre>
<p>With the basics covered, let&#8217;s look at how to extend this algorithm to support dynamic expansion in case of overflows.</p>
<p><b>Dynamically expanding a circular buffer</b></p>
<p>The approach consists in imagining that the same buffer can contain both a circular buffer area (referred to as <i>the ring area</i> from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again.  To clarify what this means, some examples are presented below.  The full algorithm will be presented afterwards.</p>
<p>First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:</p>
<pre>
for i in range(5):
    circbuf.push(i)

circbuf.pop() # => 0
circbuf.pop() # => 1

circbuf.push(5)
circbuf.push(6)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4</font>]
</pre>
<p>At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that any additional items are appended to the end of the buffer.  Following the example, pushing two additional elements would behave the following way:</p>
<pre>
circbuf.push(7)
circbuf.push(8)

print(circbuf.buf) # => [<font style="color: blue">5, 6, 2, 3, 4,</font> <font style="color: red">7, 8</font>]
</pre>
<p>In that example, elements 7 and 8 are part of the overflow area, and the ring area remains with the same capacity and length of the original buffer. Let&#8217;s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:</p>
<pre>
circbuf.pop() # => 2
circbuf.pop() # => 3
circbuf.push(9)

print(circbuf.buf) # => [<font style="color: blue">5, 6,</font> None, None, <font style="color: blue">4,</font> <font style="color: red">7, 8, 9</font>]
</pre>
<p>In this case, even though there are two free slots available in the ring area, the last item pushed was still appended to the overflow area.  That&#8217;s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade-off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).</p>
<p>The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area.  This will happen when the content of the previous partial ring area is fully consumed, as shown below:</p>
<pre>
circbuf.pop() # => 4
circbuf.pop() # => 5
circbuf.pop() # => 6
circbuf.push(10)

print(circbuf.buf) # => [<font style="color: blue">10,</font> None, None, None, None, <font style="color: blue">7, 8, 9</font>]
</pre>
<p>At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.</p>
<p><b>Sample algorithm</b></p>
<p>With some simple modifications in the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur additional memory allocation to support the buffer expansion, but that&#8217;s inherent to the problem being solved.</p>
<pre>
class ExpandingCircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringlen == self.ringcap or \
           self.ringcap != self.bufcap:
            self.buf.append(x)
            self.buflen += 1
            self.bufcap += 1
            if self.pushi == 0: # Optimization.
                self.ringlen = self.buflen
                self.ringcap = self.bufcap
        else:
            self.buf[self.pushi] = x
            self.pushi = (self.pushi + 1) % self.ringcap
            self.buflen += 1
            self.ringlen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.pushi = 0
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p>Note that the above algorithm will allocate each element in the list individually, but in performance-sensitive situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation.  In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion.  Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.</p>
<p><b>UPDATE:</b> Below is a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates the new slots in the overflow area immediately so that the allocated space is used optimally.</p>
<pre>
class ExpandingCircBuf2(object):

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringcap != self.bufcap:
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = False
            expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            expansion = [None, None, None]
            self.buf.extend(expansion)
            self.bufcap += len(expansion)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
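<p>The doubling strategy mentioned before the update can be sketched as a small variation of <i>ExpandingCircBuf2</i>. The class name <i>DoublingCircBuf</i> and the choice of growing by the current capacity (with a first allocation of one slot) are assumptions made for this illustration, not part of the original implementation:</p>

```python
class DoublingCircBuf(object):
    """ExpandingCircBuf2 with the fixed 3-slot expansion replaced by doubling."""

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = 0

    def push(self, x):
        if self.ringcap != self.bufcap:
            # Split buffer: grow only once the overflow area is full,
            # which is signalled by pushi having wrapped around to 0.
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            # Pure ring, and it is full. If pushi is 0 the new slots can
            # join the ring right away without breaking FIFO order.
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            grow = max(1, self.bufcap)  # double; the first allocation is 1 slot
            self.buf.extend([None] * grow)
            self.bufcap += grow
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            # Old ring exhausted: the whole buffer becomes the ring.
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x


cb = DoublingCircBuf()
for i in range(5):
    cb.push(i)
popped = [cb.pop() for _ in range(5)]  # items come back in FIFO order
```

<p>Pushing into a full buffer of capacity <i>n</i> then allocates <i>n</i> more slots at once, so a steady producer triggers far fewer reallocations than growing three slots at a time.</p>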
<p><b>Conclusion</b></p>
<p>This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics.  When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Removing seatbelts with the Go language for mmap support</title>
		<link>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support</link>
		<comments>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support#comments</comments>
		<pubDate>Sun, 28 Nov 2010 18:33:29 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=505</guid>
		<description><![CDATA[Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: gommap. As one would imagine, this new package provides access to low-level memory mapping for files and devices, &#8230; <a href="http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Continuing the sequence of experiments I&#8217;ve been running with the Go language, I&#8217;ve just made available a tiny but useful new package: <a href="http://labix.org/gommap">gommap</a>. As one would imagine, this new package provides access to low-level memory mapping for files and devices, and it allowed exploring a few new edges of the language implementation.  Note that, strictly speaking, some of the details ahead are really more about the implementation than the <i>language</i> itself.</p>
<p><span id="more-505"></span>There were basically two main routes to follow when implementing support for memory mapping in Go.  The first one is usually the way higher-level languages handle it.  In Python, for instance, this is the way one may use a memory mapped file:</p>
<pre>
>>> import mmap
>>> file = open("/etc/passwd")
>>> mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>>> mm[0:4]
'root'
</pre>
<p>The way this was done has an advantage and a disadvantage which are perhaps not entirely obvious at first glance.  The advantage is that the memory mapped area is truly hidden behind that interface, so any improper attempt to access a region which was already unmapped, for instance, may be blocked within the application with a nice error message which explains the issue.  The disadvantage, though, is that this interface usually comes with the restriction that normal libraries can only use the memory region by copying data out of it.  In the above example, for instance, the &#8220;root&#8221; string isn&#8217;t backed by the original mapped memory anymore, and is rather a copy of its contents (see <a href="http://www.python.org/dev/peps/pep-3118/">PEP 3118</a> for a way to improve this aspect a bit in Python).</p>
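<p>Both sides of that trade-off can be seen in a small experiment. This is a sketch assuming Python 3, where <i>mmap</i> objects support the buffer protocol described in PEP 3118; the temporary file and the variable names are only there for the demonstration:</p>

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"root:x:0:0\nother:x:1:1\n")
    f.flush()

    mm = mmap.mmap(f.fileno(), 0)   # map the whole file, read/write
    copied = mm[0:4]                # slicing the map copies the bytes out
    whole = memoryview(mm)          # a view shares memory with the mapping
    view = whole[0:4]

    mm[0:4] = b"ROOT"               # modify the mapped memory in place
    view_after = view.tobytes()     # the view sees the change, the copy does not

    view.release()                  # exported views must be released
    whole.release()                 # before the mapping can be closed
    mm.close()

    try:
        mm[0:4]                     # access after unmapping
        closed_error = None
    except ValueError as exc:       # blocked with an error instead of a crash
        closed_error = exc
```

<p>The copy still holds the old contents after the write, while the view reflects the new ones, and touching the mapping after it is closed fails with a regular exception.</p>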
<p>The other path, which can be done with Go, is to back a normal native array type with the allocated memory.  This means that normal libraries don&#8217;t need to copy data out of the mapped memory, or to use a special memory saving interface, to deal with the memory mapped region.  As a simple example, this would get the first line in the given file:</p>
<pre>
mmap, err := gommap.Map(file.Fd(), PROT_READ, MAP_PRIVATE)
if err == nil {
    end := bytes.Index(mmap, []byte{'\n'})
    firstLine := mmap[:end]
}
</pre>
<p>In the procedure above, <i>mmap</i> is defined as an alias to a native <i>[]byte</i> array, so even though the standard <i>bytes</i> package was used, at no point was the data from the memory mapped region copied out or any auxiliary buffers allocated, so this is a <i>very</i> fast operation.  To give an idea about this, let&#8217;s pretend for a moment that we want to increment a simple 8-bit counter in a file.  This might be done with something as simple as:</p>
<pre>
mmap[13] += 1
</pre>
<p>This line of code would be compiled into something similar to the following assembly (amd64):</p>
<pre>
MOVQ    mmap+-32(SP),BX
CMPL    8(BX),$13
JHI     ,68
CALL    ,runtime.panicindex+0(SB)
MOVQ    (BX),BX
INCB    ,13(BX)
</pre>
<p>As you can see, this is just doing some fast index checking before incrementing the value <i>directly in memory</i>. Given that one of the important reasons why memory mapped files are used is to speed up access to disk files (sometimes <i>large</i> disk files), this advantage in performance is actually meaningful in this context.</p>
<p>Unfortunately, though, doing things this way also has an important disadvantage, at least right now.  There&#8217;s no way at the moment to track references to the underlying memory, which was allocated by means not known to the Go runtime.  This means that <i>unmapping</i> this memory is not a safe operation.  The munmap system call will simply take the references away from the process, and any further attempt to touch those areas will crash the application.</p>
<p>To give you an idea about the background &#8220;magic&#8221; which is going on to achieve this support in Go, here is an interesting excerpt from the underlying mmap syscall as of this writing:</p>
<pre>
addr, _, errno := syscall.Syscall6(syscall.SYS_MMAP, (...))
(...)
dh := (*reflect.SliceHeader)(unsafe.Pointer(&#038;mmap))
dh.Data = addr
dh.Len = int(length)
dh.Cap = dh.Len
</pre>
<p>As you can see, this is taking apart the memory backing the slice value into its constituent structure, and altering it to point to the mapped memory, including information about the length mapped so that bounds checking as observed in the assembly above will work correctly.</p>
<p>In case the garbage collector is at some point extended to track references to these foreign regions, it would be possible to implement some kind of <i>UnmapOnGC()</i> method which would only unmap the memory once the last reference is gone.  For now, though, the advantages of being able to reference memory mapped regions directly, at least to me, surpass the danger of having improper slices of the given region being used after unmapping.  Also, I expect that usage of this kind of functionality will generally be encapsulated within higher level libraries, so it shouldn&#8217;t be too hard to keep the constraint in mind while using it this way.</p>
<p>For those reasons, <a href="http://labix.org/gommap">gommap</a> was implemented with the latter approach.  In case you need memory mapping support for Go, just move ahead and <i>goinstall launchpad.net/gommap</i>.</p>
<p><b>UPDATE (2010-12-02):</b> The interface was updated so that mmap itself is an array, rather than mmap.Data, and this post was changed to reflect this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/11/28/removing-seatbelts-with-the-go-language-for-mmap-support/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Python has a GIL, and lots of complainers</title>
		<link>http://blog.labix.org/2010/07/09/python-has-a-gil-and-lots-of-complainers</link>
		<comments>http://blog.labix.org/2010/07/09/python-has-a-gil-and-lots-of-complainers#comments</comments>
		<pubDate>Fri, 09 Jul 2010 19:15:49 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=381</guid>
		<description><![CDATA[I&#8217;ve just read a post by Brett Cannon where, basically, he complains about complainers. If you don&#8217;t know who Brett is, you&#8217;re probably not a heavy Python user. Brett is a very important Python core developer who has been around &#8230; <a href="http://blog.labix.org/2010/07/09/python-has-a-gil-and-lots-of-complainers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just read <a href="http://sayspy.blogspot.com/2010/07/two-types-of-people-who-cause-biggest.html">a post by Brett Cannon</a> where, basically, he complains about complainers.</p>
<p>If you don&#8217;t know who Brett is, you&#8217;re probably not a heavy Python user.  Brett is a very important Python core developer who has been around for a while and does a great job at it.  His post, though, makes me a bit sad.</p>
<p><span id="more-381"></span>Brett points out that there are two types of personalities which do not contribute to open source.  The first one he defines as:</p>
<blockquote><p>
The first type is the &#8220;complainer&#8221;. This is someone who finds something they don&#8217;t like, points out that the thing they don&#8217;t like is suboptimal, but then offers no solutions.
</p></blockquote>
<p>And the second one is defined as:</p>
<blockquote><p>
(&#8230;) This is someone who, upon finding out about a decision that they think was sub-optimal, decides to bring up new ideas and solutions. The person is obviously trying to be helpful by bringing up new ideas and solutions, thinking that the current one is simply going to flop and they need to stop people from making a big mistake.  The thing is, this person is not helping. (&#8230;)
</p></blockquote>
<p>This, in itself, is already shortsighted. If you&#8217;re tired of hearing the same arguments again and again for 10 years, from completely different people, there&#8217;s a pretty good chance that there&#8217;s an actual issue with your project, and your users are trying in their way to contribute and interact with you in the hope that it might get fixed.</p>
<p>This is really important:  They are <i>people</i> who <i>use your project</i> and are trying to <i>improve it</i>. If you can&#8217;t stand that, you should stop maintaining an open source project now, or pick something which no one cares about.</p>
<p>The other issue which caught my attention in his post is his example: the Python GIL.  Look at the way in which Brett dismisses the problem:</p>
<blockquote><p>
(I am ignoring the fact that few people write CPU-intensive code requiring true threading support, that there is the multiprocessing library, true power users have extension  modules which do operate with full threading, and that there are multiple VMs out there with a solution that have other concurrency solutions)
</p></blockquote>
<p>Brett, we can understand that <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=214235">the GIL is hard to remove</a>, but it&#8217;s a <a href="http://www.dabeaz.com/GIL/">fundamental flaw in the most important Python implementation</a>, and being dismissive about it will either draw further complaints, or will simply drive users away from the language entirely.</p>
<p>I can understand why you think this way, though.  Guido has expressed the same kind of feeling about the GIL for a very long time.  Here is one excerpt from a <a href="http://mail.python.org/pipermail/python-3000/2007-May/007414.html">mail thread about it</a>:</p>
<blockquote><p>
Nevertheless, you&#8217;re right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.</p>
<p>Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.
</p></blockquote>
<p>I apologize, but I have a very hard time reading this and not complaining.</p>
<p>In my world, the golden days of geometric growth in vertical processing power are over, multi-processor machines are here to stay, and the amount of traffic flowing through networks just keeps increasing.  It feels reasonable to desire a less naïve approach to dealing with real world problems, such as executing tasks concurrently.</p>
<p>I actually would love to not worry about things like non-determinism and race conditions, and would love even more to have a <i>programming language</i> which helps me with that!</p>
<p>Python, though, has a Global Interpreter Lock (yes, I&#8217;m talking about CPython, the most important interpreter).  Python programs execute in sequence.  No <a href="http://www.infoq.com/interviews/doug-lea-fork-join">Fork/Join frameworks</a>, no <a href="http://golang.org">coroutines</a>, no <a href="http://erlang.org">lightweight processes</a>, nothing.  Your <i>Python</i> code <i>will execute in sequence</i> if it lives in the same process space.</p>
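<p>A rough way to observe this is to run the same CPU-bound function a couple of times in sequence, then in separate threads, and compare the wall-clock time.  The sketch below assumes a CPython build with the GIL; the function names and the loop size are arbitrary choices made for illustration, and the timings will vary from machine to machine.</p>

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop."""
    while n:
        n -= 1


def timed(func):
    """Return the wall-clock seconds taken by calling func()."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def run_sequential(n, workers):
    """Run the busy loop once per worker, one after the other."""
    for _ in range(workers):
        count_down(n)


def run_threaded(n, workers):
    """Run the busy loop once per worker, each in its own thread."""
    threads = [
        threading.Thread(target=count_down, args=(n,))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == '__main__':
    n, workers = 2000000, 2
    print('sequential: %.2fs' % timed(lambda: run_sequential(n, workers)))
    print('threaded:   %.2fs' % timed(lambda: run_threaded(n, workers)))
```

<p>On such an interpreter the threaded run typically takes about as long as the sequential one, since only one thread executes Python bytecode at any given time.</p>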
<p>The answer from Brett and Guido to concurrency?  Develop your code in C, or write your code to execute in multiple processes.  If they really want people to get rid of non-determinism, locking issues, race conditions, and so on, they&#8217;re not helping at all.</p>
<p>I know this is just yet another complaint, though. I honestly cannot fix the problem either, and would rather just talk about it in the hope that someone who&#8217;s able to do it will take care of it.  That said, I wish that the language maintainers would <i>do the same</i>, and tell the world that it&#8217;s an unfortunate problem, and that they wish someone else would go there and fix it!  If, instead, maintainers behave in a ridiculously dismissive way, like Guido did in that mail thread, and like Brett is doing in his post, the smart people that could solve the problem get turned away.  People like to engage with motivated maintainers.. they like to solve problems that others are interested in seeing solved.</p>
<p>Perhaps agreeing with the shortcomings won&#8217;t help, though, and no one will show up to fix the problem either. But then, at least users will know that the maintainers are on the same side of the fence, and the hope that it will get fixed survives.  If the maintainers just complain about the users who complain, and dismiss the problem, users are put in an awkward position.  I can&#8217;t complain.. I can&#8217;t provide ideas or solutions.. I can&#8217;t fix the problem.. they don&#8217;t even <i>care</i> about the problem.  Why am I using this thing at all?</p>
<p>Would you rather have users, or have no complainers?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/07/09/python-has-a-gil-and-lots-of-complainers/feed</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>Mocker 1.0 for Python released!</title>
		<link>http://blog.labix.org/2010/06/20/mocker-1-0-for-python-released</link>
		<comments>http://blog.labix.org/2010/06/20/mocker-1-0-for-python-released#comments</comments>
		<pubDate>Mon, 21 Jun 2010 00:09:02 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=323</guid>
		<description><![CDATA[After a few years in development, version 1.0 of Mocker is now available! Check out the changes since 0.10.1, the supported features, or go straight to the download page.]]></description>
			<content:encoded><![CDATA[<p>After a few years in development, version 1.0 of <a href="http://labix.org/mocker">Mocker</a> is now <a href="https://launchpad.net/mocker/trunk/1.0">available</a>!  Check out the <a href="http://bazaar.launchpad.net/~niemeyer/mocker/trunk/annotate/head:/NEWS">changes since 0.10.1</a>, the <a href="http://labix.org/mocker">supported features</a>, or go straight to the <a href="https://launchpad.net/mocker/+download">download page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/20/mocker-1-0-for-python-released/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Integrating IRC with LDAP and two-way SMSing</title>
		<link>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing</link>
		<comments>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing#comments</comments>
		<pubDate>Sat, 19 Jun 2010 21:56:07 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=296</guid>
		<description><![CDATA[A bit of history I don&#8217;t know exactly why, but I&#8217;ve always enjoyed IRC bots. Perhaps it&#8217;s the fact that they emulate a person in an easy-to-program way, or maybe it&#8217;s about having a flexible and shared &#8220;command line&#8221; tool, &#8230; <a href="http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><b>A bit of history</b></p>
<p>I don&#8217;t know exactly why, but I&#8217;ve always enjoyed IRC bots.  Perhaps it&#8217;s the fact that they emulate a person in an easy-to-program way, or maybe it&#8217;s about having a flexible and shared &#8220;command line&#8221; tool, or maybe it&#8217;s just the fact that they help people perceive things in an asynchronous way without much effort.  Probably a bit of everything, actually.</p>
<p><span id="more-296"></span></p>
<p>My bot programming started with <a href="http://labix.org/pybot">pybot</a> many years ago, when I was still working at <a href="http://www.conectiva.com.br">Conectiva</a>.  Despite having many interesting features, this bot eventually fell into an abandonware state, since <a href="http://www.canonical.com">Canonical</a> already had pretty much equivalent features available when I joined, and I had other interests which got in the way.  The code was a bit messy as well.. it was a time when I wasn&#8217;t very used to testing software properly (a friend has a great excuse for that kind of messy software: <i>&#8220;I was young, and needed the money!&#8221;</i>).</p>
<p>Then, a couple of years ago, while working on the <a href="http://landscape.canonical.com">Landscape</a> project, there was an opportunity to make some information more visible to the team.  Coincidentally, it was also a time when I wanted to get some practice with the concepts of <a href="http://erlang.org">Erlang</a>, so I decided to write a bot from scratch with some nice support for plugins, just to get a feeling of how the promised stability of Erlang held up in practice.  This bot is called <a href="https://launchpad.net/mup">mup</a> (Mup Pet, more formally), and its code is available publicly through <a href="https://launchpad.net/mup">Launchpad</a>.</p>
<p>This was a nice experiment indeed, and I did learn quite a bit about the ins and outs of Erlang with it.  Somewhat unexpected, though, was the fact that the bot grew a few extra features which multiple teams in Canonical started to appreciate.  This was of course very nice, but it also made it more obvious that the egocentric reason for having a bot written in Erlang would now hurt, because most of Canonical&#8217;s own coding is done in Python, and that&#8217;s what internal tools should generally be written in for everyone to contribute and help maintain the code.</p>
<p>That&#8217;s where the desire to migrate mup to a Python-based brain again came from, and having a new feature to write was the perfect motivator for this.</p>
<p><b>LDAP and two-way SMSing over IRC</b></p>
<p>Canonical is a <i>very</i> distributed company.  Employees are distributed over dozens of countries, literally.  Not only that, but most people also work from their homes, rather than in an office.  Many different countries also means many different timezones, and working from home with people from different timezones means flexible timing.  All of that means communication gets&#8230; well.. interesting.</p>
<p>How do we reach someone that should be in an online meeting and is not?  Or someone that is traveling to get to a sprint?  Or how can someone that has no network connectivity reach an IRC channel to talk to the team?  There are probably several answers to these questions, but one of them is of course SMS.  It&#8217;s not exactly cheap if we consider the cost of the data being transferred, but pretty much everyone has a mobile phone which can do SMS, and the model is not that far away from IRC, which is the main communication system used by the company.</p>
<p>So, the itch was itching.  Let&#8217;s scratch it!</p>
<p>Getting the mobile phone numbers of employees was already a solved problem for mup, because it had a plugin which could interact with the LDAP directory, allowing people to do something like this:</p>
<blockquote><p>
&lt;joe&gt; mup: poke gustavo<br />
&lt;mup&gt; joe: niemeyer is Gustavo Niemeyer &lt;&#8230;@canonical.com&gt; &lt;time:&#8230;&gt; &lt;mobile:&#8230;&gt;
</p></blockquote>
<p>This just had to be migrated from Erlang into a Python-based brain for the reasons stated above. This time, though, there was no reason to write something from scratch.  I could even have used pybot itself, but there was also <a href="http://sourceforge.net/projects/supybot/">supybot</a>, an IRC bot which started around the same time I wrote the first version of pybot, and unlike the latter, supybot&#8217;s author was much more diligent in evolving it.  There is quite a comprehensive list of plugins for supybot nowadays, and it includes means for testing plugins and so on.  The choice of using it was straightforward, and getting &#8220;<i>poke</i>&#8221; support ported into a plugin wasn&#8217;t hard at all.</p>
<p>So, on to SMSing.  Canonical already had a contract with an SMS gateway company which we established to test-drive some ideas on <a href="https://landscape.canonical.com">Landscape</a>. With the mobile phone numbers coming out of the LDAP directory in hand and an SMS contract established, all that was needed was a plugin for the bot to talk to the SMS gateway.  That &#8220;conversation&#8221; with the SMS gateway allows not only sending messages, but also receiving SMS messages which were sent to a specific number.</p>
<p>In practice, this means that people who are connected to IRC can very easily deliver an SMS to someone using their nicks.  Something like this:</p>
<blockquote><p>
&lt;joe&gt; @sms niemeyer Where are you?  We&#8217;re waiting!
</p></blockquote>
<p>And this would show up on the mobile screen as:</p>
<blockquote><p>
joe&gt; Where are you?  We&#8217;re waiting!
</p></blockquote>
<p>In addition to this, people who have <i>no connectivity</i> can also contact individuals and channels on IRC, with mup working as a middle man.  The message would show up on IRC in a similar way to:</p>
<blockquote><p>
&lt;mup&gt; [SMS] &lt;niemeyer&gt; Sorry, the flight was delayed. Will be there in 5.
</p></blockquote>
<p>The communication from the bot to the gateway happens via plain HTTPS.  The communication back is a bit more complex, though.  There is a small proxy service deployed in <a href="http://code.google.com/appengine">Google App Engine</a> to receive messages from the SMS gateway.  This was done to avoid losing messages when the bot itself is taken down for maintenance.  The SMS gateway doesn&#8217;t handle this case very well, so it&#8217;s better to have something which will be up most of the time buffering messages.</p>
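<p>The buffering idea can be sketched in a few lines.  This is not the actual proxy code: the class and method names are hypothetical, and a real deployment would keep the backlog in persistent storage rather than in memory.</p>

```python
from collections import deque


class MessageBuffer:
    """Hypothetical in-memory stand-in for the buffering proxy."""

    def __init__(self):
        self._pending = deque()

    def receive(self, sender, text):
        """Called whenever the SMS gateway delivers a message."""
        self._pending.append((sender, text))

    def drain(self):
        """Called by the bot when it is up; hands over the backlog in order."""
        messages = list(self._pending)
        self._pending.clear()
        return messages
```

<p>Messages which arrive while the bot is down simply pile up in the buffer, and are handed over in arrival order the next time the bot asks for them.</p>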
<p>A picture is worth 2<sup>10</sup> words, so here is a simple diagram explaining how things got linked together:</p>
<p><a href="http://blog.labix.org/wp-content/uploads/2010/06/mup-sms.png"><img src="http://blog.labix.org/wp-content/uploads/2010/06/mup-sms.png" alt="" title="SMS integration diagram" width="449" height="255" class="aligncenter size-full wp-image-308" /></a></p>
<p>This is now up for experimentation, and so far it&#8217;s working nicely.  I&#8217;m hoping that in the next few weeks we&#8217;ll manage to port the rest of mup into the supybot-based brain.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/19/integrating-irc-with-ldap-and-two-way-smsing/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Released editmoin 1.15</title>
		<link>http://blog.labix.org/2010/06/19/released-editmoin-1-15</link>
		<comments>http://blog.labix.org/2010/06/19/released-editmoin-1-15#comments</comments>
		<pubDate>Sat, 19 Jun 2010 19:15:18 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=291</guid>
		<description><![CDATA[Version 1.15 of editmoin is now available. The following changes were made: Moin used to work with numerical IDs for identification, and editmoin was still based on this model. This release adds support for direct authentication as available in current &#8230; <a href="http://blog.labix.org/2010/06/19/released-editmoin-1-15">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Version 1.15 of <a href="http://labix.org/editmoin">editmoin</a> is now available.</p>
<p>The following changes were made:</p>
<ul>
<li> Moin used to work with numerical IDs for identification, and editmoin was still based on this model. This release adds support for direct authentication as available in current Moin releases.  This was inspired by Reimar Bauer.</li>
<li> The new file ~/.moin_users is now parsed to obtain usernames, supporting the feature above.  Shortcuts are also supported in this file.</li>
<li> Added support for textcha question handling.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/19/released-editmoin-1-15/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The forgotten art of error checking</title>
		<link>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking</link>
		<comments>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking#comments</comments>
		<pubDate>Thu, 17 Jun 2010 15:15:59 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=275</guid>
		<description><![CDATA[I was just rambling randomly yesterday, on the usual microblogging platforms, about how result checking seems to be ignored or done badly. The precise wording was: It&#8217;s really amazing how little attention error handling receives in most software development. Even &#8230; <a href="http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I was just rambling randomly yesterday, on the usual microblogging platforms, about how result checking seems to be ignored or done badly.  The precise wording was:</p>
<blockquote><p>
It&#8217;s really amazing how little attention error handling receives in most software development. Even *tutorials* often ignore it.
</p></blockquote>
<p>It indeed does amaze me.  It sometimes feels like we write code for theoretical perfect worlds.. <i>&#8220;If the processor executes exactly in this order, and the weather is calm, this program will work.&#8221;</i>.  There are countless examples of bad assumptions.. someday I will come up with some statistics of the form <i>&#8220;Every N seconds someone forgets to check the result of write().&#8221;</i>.</p>
<p><span id="more-275"></span></p>
<p>If you are a teacher, or a developer that enjoys writing snippets of code to teach people, please join me in the quest of building a better future.  Do <i>not</i> tell us that you&#8217;re &#8220;avoiding result checking for terseness&#8221;, because that&#8217;s exactly what we people will do (terseness is good, right?).  On the contrary, take this chance to make us feel <i>bad</i> about avoiding result checking.  You might do this by putting a comment like &#8220;If you don&#8217;t do this, you&#8217;re a bad programmer.&#8221; right next to the logic which is handling the result, and might take this chance to teach people how proper result handling is done.</p>
<p>Of course, there&#8217;s another forgotten art related to result checking.  It sits on the other side of the fence.  If you are a library author, do think through how you plan to make us check conditions which happen inside your library, and try to imagine how to make our lives easier.  If we suck at handling results when there are obvious ways to handle them, you can imagine what happens when you structure your result logic badly.</p>
<p>Here is a clear example of what <i>not</i> to do, coming straight from Python&#8217;s standard library, in the <i>imaplib</i> module:</p>
<pre>
    def login(self, user, password):
        typ, dat = self._simple_command('LOGIN', user, self._quote(password))
        if typ != 'OK':
            raise self.error(dat[-1])
        self.state = 'AUTH'
        return typ, dat
</pre>
<p>You see the problem there?  How do you handle errors from this library?  Should we catch the exception, or should we verify the result code? <i>&#8220;Both!&#8221;</i> is the right answer, unfortunately, because the author decided to do us a little favor and check the error condition himself in some arbitrary cases and raise the error, while letting it go through and end up in the result code in a selection of other arbitrary cases.</p>
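<p>To illustrate the cost on the calling side, here is a minimal sketch of a helper which folds both failure styles into a single exception, so that callers only have one thing to check.  <i>CommandError</i>, <i>checked</i> and <i>FakeClient</i> are hypothetical names which are not part of imaplib; the fake client merely imitates a connection which raises in one method and returns a non-OK result code in another.</p>

```python
import imaplib


class CommandError(Exception):
    """Hypothetical single error type for any failed command."""


def checked(command, *args):
    """Call an imaplib-style method and report failure in one way only."""
    try:
        typ, dat = command(*args)
    except imaplib.IMAP4.error as exc:
        # Failure style 1: the library raised on our behalf.
        raise CommandError(str(exc))
    if typ != 'OK':
        # Failure style 2: the error is hidden in the result code.
        raise CommandError(repr(dat))
    return dat


class FakeClient:
    """Illustration only: mimics both failure styles without a server."""

    def login(self, user, password):
        if password != 'secret':
            raise imaplib.IMAP4.error('LOGIN failed')
        return 'OK', [b'logged in']

    def select(self, mailbox):
        if mailbox != 'INBOX':
            return 'NO', [b'no such mailbox']
        return 'OK', [b'3']
```

<p>With a helper like this, <i>checked(client.login, user, password)</i> and <i>checked(client.select, mailbox)</i> fail in exactly the same way, which is what the library could have offered in the first place.</p>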
<p>I may provide some additional advice on result handling in the future, but for now I&#8217;ll conclude with the following suggestion: please check the results from your actions, and help others to check theirs.  That&#8217;s a good life-encompassing recommendation, actually.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Xpresser &#8211; Python library for GUI automation with image matching</title>
		<link>http://blog.labix.org/2010/05/18/xpresser-python-library-for-gui-automation-with-image-matching</link>
		<comments>http://blog.labix.org/2010/05/18/xpresser-python-library-for-gui-automation-with-image-matching#comments</comments>
		<pubDate>Tue, 18 May 2010 20:43:24 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=267</guid>
		<description><![CDATA[In a hurry? Go check it out! The context A while ago I found out about Sikuli, a very interesting project which allows people to script actions in GUIs based on screenshot excerpts. The idea is that you basically take &#8230; <a href="http://blog.labix.org/2010/05/18/xpresser-python-library-for-gui-automation-with-image-matching">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><b>In a hurry?</b></p>
<p><a href="http://wiki.ubuntu.com/Xpresser">Go check it out!</a></p>
<p><b>The context</b></p>
<p>A while ago I found out about <a href="http://sikuli.org">Sikuli</a>, a very interesting project which allows people to script actions in GUIs based on screenshot excerpts.  The idea is that you basically take images representing portions of your screen, like a button, or a label, or an icon, and then create a script which can detect a position in the screen which resembles one of these images, and perform actions on them, such as clicking, or hovering.</p>
<p><span id="more-267"></span></p>
<p>I had never imagined something like this, and the idea got me really excited about the possibilities.   Imagine, for instance, what can be done in terms of testing.  Testing GUIs is unfortunately still not a trivial task.  We do have frameworks which are based on accessibility hooks, for instance, but these sometimes can&#8217;t be used because the hook is missing, or is even far off in terms of the context being tested (imagine testing that a browser can open a specific Flash site successfully, for instance).</p>
<p>So, Sikuli opened my eyes to the possibility of using image matching technology in a GUI automation context, and I really wanted to play with it.  In the days following the discovery, I fiddled a bit, communicated with the author, and even submitted some changes to make it work well in Ubuntu.</p>
<p>Then, the idea cooled down in my head, and I moved on with life.  Well&#8230; until two weeks ago.</p>
<p>Right before heading to the <a href="http://wiki.ubuntu.com/UDS-M">Ubuntu Developer Summit</a> for the next Ubuntu release, the desire of automating GUIs appeared again in the context of the widely scoped Ubuntu-level testing suite.  Then, over the first few days last week, I was able to catch up with quite a few people who were interested in the concept of automating GUIs, with different purposes (testing, design approval, etc), which of course was all I needed to actually push that old desire forward.</p>
<p>Trying to get Sikuli to work, though, was quite painful.  Even though I had sent patches upstream before, it looks like the build process isn&#8217;t working in Ubuntu again for other reasons (it&#8217;s not a polished build process, honestly), and even if I managed to make it work and contributed that upstream, in the end the path to integrate the Java-based tool in the Python-based testing framework which Ubuntu uses (<a href="https://launchpad.net/mago">Mago</a>) wasn&#8217;t entirely straightforward either.</p>
<p><b>Reinventing the wheel</b></p>
<p>So, the itch was in place, and there was a reason to let the <a href="http://en.wikipedia.org/wiki/Not_Invented_Here">NIH</a> syndrome take over a bit.  Plus, image processing is something I&#8217;d like to get a foot in anyway, so it felt like a good chance to have a closer look and at the same time contribute a small bit to potential quality improvements of Ubuntu.</p>
<p>That&#8217;s when <a href="http://wiki.ubuntu.com/Xpresser">Xpresser</a> was born.  Xpresser is a clean room implementation of the concepts explored by Sikuli, in the form of a Python library which can be used standalone, or embedded into other programs and testing frameworks such as Mago.</p>
<p>The project is sponsored by <a href="http://canonical.com">Canonical</a>, and licensed under the LGPL.</p>
<p>Internally, it makes use of opencv for the image matching, pyatspi for the event generation (mouse clicks, etc), gtk for screen capturing and testing (of itself), and numpy for matrix operations. Clearly, the NIH syndrome wasn&#8217;t <i>entirely</i> active. <img src='http://blog.labix.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   As a side note, I haven&#8217;t played with numpy and gtk for some time, and I&#8217;m always amazed by the quality of these modules.</p>
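<p>For readers new to the concept, the core of image matching can be illustrated with a deliberately naive sketch.  This is not Xpresser&#8217;s code and it does not use opencv: it is a plain Python exhaustive search which scores every placement of a small image inside a larger one by the sum of absolute pixel differences.  The function names are hypothetical, and images are assumed to be lists of rows of grayscale integers.</p>

```python
def difference(screen, template, top, left):
    """Sum of absolute pixel differences with template placed at (top, left)."""
    return sum(
        abs(screen[top + i][left + j] - value)
        for i, row in enumerate(template)
        for j, value in enumerate(row)
    )


def find_best_match(screen, template):
    """Return (top, left, score) for the closest placement of template.

    Both images are lists of rows of grayscale integers, and template
    must fit inside screen.  A score of 0 means a pixel-perfect match.
    """
    rows = len(screen) - len(template) + 1
    cols = len(screen[0]) - len(template[0]) + 1
    score, top, left = min(
        (difference(screen, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )
    return top, left, score
```

<p>A real library has to tolerate small rendering differences and be much faster than this, which is a good reason to delegate the matching to something like opencv.</p>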
<p><b>Contribute code and ideas</b></p>
<p>Concluding this post, which is already longer than I expected, the basics of Xpresser are in place, so go ahead and play with it!  That said, there is quite a bit of low-hanging fruit to pick before it becomes a really compelling GUI-driving library, so if you have any interest in the concept, I invite you to play with the code and submit contributions too.  If you want ideas of what else could be done, let&#8217;s have a chat.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/05/18/xpresser-python-library-for-gui-automation-with-image-matching/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

