<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Labix Blog &#187; Ruby</title>
	<atom:link href="http://blog.labix.org/tag/ruby/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.labix.org</link>
	<description>by Gustavo Niemeyer</description>
	<lastBuildDate>Mon, 16 Jan 2012 04:02:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Efficient algorithm for expanding circular buffers</title>
		<link>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers</link>
		<comments>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers#comments</comments>
		<pubDate>Thu, 23 Dec 2010 12:57:40 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Article]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=580</guid>
		<description><![CDATA[Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the &#8220;Hello world!&#8221; days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc. &#8230; <a href="http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Circular buffers are based on an algorithm well known by any developer who&#8217;s got past the <i>&#8220;Hello world!&#8221;</i> days.  They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.</p>
<p>One feature which is not always desired, though, it the fact that circular buffers traditionally will either overwrite the last element, or raise an overflow error, since they are generally implemented as a buffer of <i>constant</i> size.  This is an unwanted property when one is attempting to <i>consume</i> items from the buffer and it is not an option to blindly drop items, for instance.</p>
<p>This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It&#8217;s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I&#8217;ll publish it below and let you decide.</p>
<p><span id="more-580"></span><b>Traditional circular buffers</b></p>
<p>Before introducing the variant which can actually expand during use, let&#8217;s go through a quick review on traditional circular buffers, so that we can then reuse the nomenclature when extending the concept.  All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.</p>
<p>So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur.  The following snippet demonstrates the concept in practice:</p>
<pre>
buf = [None, None, None, None, None]
bufcap = len(buf)
pushi = 0   

for elem in range(7):
    buf[pushi] = elem
    pushi = (pushi + 1) % bufcap

print buf # => [5, 6, 2, 3, 4]
</pre>
<p>In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That&#8217;s the specific feature of circular buffers which the proposal in this post will offer an alternative for.</p>
<p>The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur.  Please note that these snippets are not necessarily idiomatic Python.  The intention is to highlight the algorithm itself.</p>
<pre>
class CircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.pushi = self.popi = 0
        self.bufcap = len(self.buf)

    def push(self, x):
        assert self.buflen == 0 or self.pushi != self.popi, \
               "Buffer overflow!"
        self.buf[self.pushi] = x
        self.pushi = (self.pushi + 1) % self.bufcap
        self.buflen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.popi = (self.popi + 1) % self.bufcap
        return x
</pre>
<p>With the basics covered, let&#8217;s look at how to extend this algorithm to support dynamic expansion in case of overflows.</p>
<p><b>Dynamically expanding a circular buffer</b></p>
<p>The approach consists in imagining that the same buffer can contain both a circular buffer area (referred to as <i>the ring area</i> from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again.  To clarify what this means, some examples are presented below.  The full algorithm will be presented afterwards.</p>
<p>First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:</p>
<pre>
for i in range(5):
    circbuf.push(i)

circbuf.pop() # => 0
circbuf.pop() # => 1

circbuf.push(5)
circbuf.push(6)

print circbuf.buf # => [<font style="color: blue">5, 6, 2, 3, 4</font>]
</pre>
<p>At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that those items will be appended at the end of the buffer.  Following the example, pushing two additional elements would behave the following way:</p>
<pre>
circbuf.push(7)
circbuf.push(8)

print circbuf.buf # => [<font style="color: blue">5, 6, 2, 3, 4,</font> <font color="red">7, 8</font>]
</pre>
<p>In that example, elements 7 and 8 are part of the overflow area, and the ring area remains with the same capacity and length of the original buffer. Let&#8217;s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:</p>
<pre>
circbuf.pop() # => 2
circbuf.pop() # => 3
circbuf.push(9)

print circbuf.buf # => [<font style="color: blue">5, 6,</font> None, None, <font style="color: blue">4,</font> <font style="color: red">7, 8, 9</font>]
</pre>
<p>In this case, even though there are two free slots available in the ring area, the last item pushed was still appended at the overflow area.  That&#8217;s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).</p>
<p>The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area.  This will happen when the content of the previous partial ring area is fully consumed, as shown below:</p>
<pre>
circbuf.pop() # => 4
circbuf.pop() # => 5
circbuf.pop() # => 6
circbuf.push(10)

print circbuf.buf # => [<font style="color: blue">10,</font> None, None, None, None, <font style="color: blue">7, 8, 9</font>]
</pre>
<p>At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.</p>
<p><b>Sample algorithm</b></p>
<p>With some simple modifications in the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur in additional memory allocation to support the buffer expansion, bu that&#8217;s inherent to the problem being solved.</p>
<pre>
class ExpandingCircBuf(object):

    def __init__(self):
        self.buf = [None, None, None, None, None]
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringlen == self.ringcap or \
           self.ringcap != self.bufcap:
            self.buf.append(x)
            self.buflen += 1
            self.bufcap += 1
            if self.pushi == 0: # Optimization.
                self.ringlen = self.buflen
                self.ringcap = self.bufcap
        else:
            self.buf[self.pushi] = x
            self.pushi = (self.pushi + 1) % self.ringcap
            self.buflen += 1
            self.ringlen += 1

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.pushi = 0
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p>Note that the above algorithm will allocate each element in the list individually, but in sensible situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation.  In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion.  Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.</p>
<p><b>UPDATE:</b> Below is shown a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates it in the overflow area immediately so that the allocated space is used optimally.</p>
<pre>
class ExpandingCircBuf2(object):

    def __init__(self):
        self.buf = []
        self.buflen = self.ringlen = self.pushi = self.popi = 0
        self.bufcap = self.ringcap = len(self.buf)

    def push(self, x):
        if self.ringcap != self.bufcap:
            expandbuf = (self.pushi == 0)
            expandring = False
        elif self.ringcap == self.ringlen:
            expandbuf = True
            expandring = (self.pushi == 0)
        else:
            expandbuf = False
            expandring = False

        if expandbuf:
            self.pushi = self.bufcap
            expansion = [None, None, None]
            self.buf.extend(expansion)
            self.bufcap += len(expansion)
            if expandring:
                self.ringcap = self.bufcap

        self.buf[self.pushi] = x
        self.buflen += 1
        if self.pushi < self.ringcap:
            self.ringlen += 1
        self.pushi = (self.pushi + 1) % self.bufcap

    def pop(self):
        assert self.buflen != 0, "Buffer underflow!"
        x = self.buf[self.popi]
        self.buf[self.popi] = None
        self.buflen -= 1
        self.ringlen -= 1
        if self.ringlen == 0 and self.buflen != 0:
            self.popi = self.ringcap
            self.ringlen = self.buflen
            self.ringcap = self.bufcap
        else:
            self.popi = (self.popi + 1) % self.ringcap
        return x
</pre>
<p><b>Conclusion</b></p>
<p>This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics.  When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/12/23/efficient-algorithm-for-expanding-circular-buffers/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Introducing The Hacking Sandbox</title>
		<link>http://blog.labix.org/2010/09/25/introducing-the-hacking-sandbox</link>
		<comments>http://blog.labix.org/2010/09/25/introducing-the-hacking-sandbox#comments</comments>
		<pubDate>Sat, 25 Sep 2010 16:33:54 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=410</guid>
		<description><![CDATA[When I started programming in Python long ago, one of the features which really hooked me up was the quality interactive interpreter offered with the language implementation. It was (and still is) a fantastic way to experiment with syntax, semantics, &#8230; <a href="http://blog.labix.org/2010/09/25/introducing-the-hacking-sandbox">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When I started programming in Python long ago, one of the features which really hooked me up was the quality interactive interpreter offered with the language implementation. It was (and still is) a fantastic way to experiment with syntax, semantics, modules, and whatnot.  So much so that many first-class Python practitioners will happily tell you that the interactive interpreter is used not only as a programming sandbox, but many times as the their personal calculator too.  This kind of interactive interpreter is also known as a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">REPL</a>, standing for <i>Read Eval Print Loop</i>, and many languages have pretty advanced choices in that area by now.</p>
<p>After much rejoice with Python&#8217;s REPL, though, and as a normal human being, I&#8217;ve started wishing for more.  The problem has a few different levels, which are easy to understand.</p>
<p><span id="more-410"></span>First, we&#8217;re using <a href="http://twistedmatrix.com/">Python Twisted</a> in Ensemble, one of the projects being pushed at Canonical.  Twisted is an event-driven framework, which among other things means it works a lot with closures and callbacks.  Having to redefine multi-line functions frequently to drive experiments isn&#8217;t exactly fun in a line-based interactive interpreter.  Then, some of the languages I&#8217;ve started playing with, such as <a href="http://erlang.org">Erlang</a>, have limited REPLs which differ in functionality significantly compared to what may be done in a text file. And finally, other languages I&#8217;ve been programming with recently, such as <a href="http://golang.org">Go</a>, lack a reasonable REPL altogether (there are only unusable hacks around).</p>
<p>Alright, so here is the idea: what if instead of being given an interactive REPL, you were presented with your favorite text editor, and whenever you wrote the file down, it was executed and results presented?  That&#8217;s The Hacking Sandbox, or <a href="http://labix.org/hsandbox">hsandbox</a>.  It supports 11 different programming languages out of the box, and given its nature it should be trivial to support any other language.</p>
<p>Here is a screenshot to clarify the idea:</p>
<p><a href="http://blog.labix.org/wp-content/uploads/2010/09/hsandbox.png"><img src="http://blog.labix.org/wp-content/uploads/2010/09/hsandbox.png" alt="" title="hsandbox screenshot" width="600" height="359" class="aligncenter size-full wp-image-417" /></a></p>
<p>Note that if you open a sandbox for a language like C or Go, the skeleton of what&#8217;s needed to run a program will already be in place, so you just have to &#8220;fill the blanks&#8221;.</p>
<p>For more details and download information, please check the <a href="http://j.mp/hsandbox">hsandbox web page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/09/25/introducing-the-hacking-sandbox/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>The forgotten art of error checking</title>
		<link>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking</link>
		<comments>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking#comments</comments>
		<pubDate>Thu, 17 Jun 2010 15:15:59 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Go]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Snippet]]></category>
		<category><![CDATA[Test]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=275</guid>
		<description><![CDATA[I was just rambling randomly yesterday, in the usual microblogging platforms, about how result checking seems to be ignored or done badly. The precise wording was: It&#8217;s really amazing how little attention error handling receives in most software development. Even &#8230; <a href="http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I was just rambling randomly yesterday, in the usual microblogging platforms, about how result checking seems to be ignored or done badly.  The precise wording was:</p>
<blockquote><p>
It&#8217;s really amazing how little attention error handling receives in most software development. Even *tutorials* often ignore it.
</p></blockquote>
<p>It indeed does amaze me.  It sometimes feels like we write code for theoretical perfect worlds.. <i>&#8220;If the processor executes exactly in this order, and the weather is calm, this program will work.&#8221;</i>.  There are countless examples of bad assumptions.. someday I will come with some statistics of the form <i>&#8220;Every N seconds someone forgets to check the result of write().&#8221;</i>.</p>
<p><span id="more-275"></span></p>
<p>If you are a teacher, or a developer that enjoys writing snippets of code to teach people, please join me in the quest of building a better future.  Do <i>not</i> tell us that you&#8217;re &#8220;avoiding result checking for terseness&#8221;, because that&#8217;s exactly what we people will do (terseness is good, right?).  On the contrary, take this chance to make us feel <i>bad</i> about avoiding result checking.  You might do this by putting a comment like &#8220;If you don&#8217;t do this, you&#8217;re a bad programmer.&#8221; right next to the logic which is handling the result, and might take this chance to teach people how proper result handling is done.</p>
<p>Of course, there&#8217;s another forgotten art related to result checking.  It sits on the other side of the fence.  If you are a library author, do think through about how you plan to make us check conditions which happen inside your library, and try to imagine how to make our lives easier.  If we suck at handling results when there are obvious ways to handle it, you can imagine what happens when you structure your result logic badly.</p>
<p>Here is a clear example of what <i>not</i> to do, coming straight from Python&#8217;s standard library, in the <i>imaplib</i> module:</p>
<pre>
    def login(self, user, password):
        typ, dat = self._simple_command('LOGIN', user, self._quote(password))
        if typ != 'OK':
            raise self.error(dat[-1])
        self.state = 'AUTH'
        return typ, dat
</pre>
<p>You see the problem there?  How do you handle errors from this library?  Should we catch the exception, or should we verify the result code? <i>&#8220;Both!&#8221;</i> is the right answer, unfortunately, because the author decided to do us a little favor and check the error condition himself in some arbitrary cases and raise the error, while letting it go through and end up in the result code in a selection of other arbitrary cases.</p>
<p>I may provide some additional advice on result handling in the future, but for now I&#8217;ll conclude with the following suggestion: please check the results from your actions, and help others to check theirs.  That&#8217;s a good life-encompassing recommendation, actually.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2010/06/17/the-forgotten-art-of-error-checking/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>MagLev and distributed VMs</title>
		<link>http://blog.labix.org/2008/06/01/maglev-and-distributed-vms</link>
		<comments>http://blog.labix.org/2008/06/01/maglev-and-distributed-vms#comments</comments>
		<pubDate>Mon, 02 Jun 2008 01:04:16 +0000</pubDate>
		<dc:creator>Gustavo Niemeyer</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://blog.labix.org/?p=93</guid>
		<description><![CDATA[Avi Bryant is working on MagLev, a Ruby interpreter, based on Gemstone&#8217;s Smalltalk VM, with some very amazing features, like transactioned objects distributed across several VMs: The integrated VMs, cache, and storage conspire to create an illusion that global state &#8230; <a href="http://blog.labix.org/2008/06/01/maglev-and-distributed-vms">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Avi Bryant is working on <a href="http://www.avibryant.com/2008/06/maglev-recap.html">MagLev</a>, a Ruby interpreter, based on Gemstone&#8217;s Smalltalk VM, with some very amazing features, like transactioned objects distributed across several VMs:</p>
<blockquote><p>
The integrated VMs, cache, and storage conspire to create an illusion that global state is shared across all instances: no matter how many VMs you add, over however many machines, they all see and work with the same set of Ruby objects.
</p></blockquote>
<p>My geek side finds this highly exciting, and eager to see it released to see how people will deal with it in practice.</p>
<p>At the same time, my <i>let&#8217;s-build-stable-and-maintainable-software</i> side is a bit skeptic.  As Joe Armstrong puts <a href="http://www.infoq.com/presentations/erlang-software-for-a-concurrent-world">so enthusiastically</a>, shared state is hard to manage correctly, global transactions reduce scalability, and transparent RPC is seductive, but <a href="http://armstrongonsoftware.blogspot.com/2008/05/road-we-didnt-go-down.html">dangerous</a>.</p>
<p>I&#8217;m also curious about the speed gains pointed out.  It&#8217;s well known that the Ruby VM isn&#8217;t very fast, which means that there must be opportunities for speedups.  Even then, 100x faster is impressive, and <a href="http://www.sidhe.org/~dan/blog/archives/cat_piethon.html">history shows</a> that sometimes the significant improvements are harder when the semantics are precisely the same.  Let&#8217;s hope Avi can manage to <a href="http://headius.blogspot.com/2008/06/maglev.html">run the Ruby tests successfully</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.labix.org/2008/06/01/maglev-and-distributed-vms/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

