Archive for the 'C/C++' Category

Changing people or changing rules

In my previous post I made an open statement which I’d like to clarify a bit further:

(…) when the rules don’t work for people, the rules should be changed, not the people.

This leaves a lot of room for personal interpretation of what was actually meant, and Tim Hoffman pointed that out nicely with the following question in a comment:

I wonder when the rule is important enough to change the people though. For instance [, if your] development process is oriented to TDD and people don’t write the tests or do the job poorly will you change them then?

This is indeed a nice scenario for exploring the idea. If a team claims to be using TDD but in practice no developer actually writes tests first, the rules are clearly not working. If everyone on the team hates doing TDD, enforcing it most probably won’t deliver its intended benefits, and that was the heart of my comment. You can’t simply keep the rule as is if no one follows it, unless you don’t really care about the outcome of the rule.

One interesting point, though, is that when you have a high level of influence over the environment in which people are, it may be possible to tweak the rules or the processes to adapt to reality, and tweaking the processes may change the way that people feel about the rules as a consequence (arguably, changing people as a side effect).

As a more concrete example, if I found myself in the described scenario, I’d try to understand why TDD is not working, and would try to discuss with the team to see how we should change the process so that it starts to work for us somehow. Maybe what would be needed is more discussion to show the value of TDD, and perhaps some pair programming with people that do TDD very well so that the joy of doing it becomes more visible.

In either case, I wouldn’t simply be telling people “Everyone has to do TDD from now on!”. I’d be tweaking the process so that it feels better and more natural to people. Then, if nothing like that works either, well, let’s change the rule. I’d try more conventional unit testing, or some other system which people follow more naturally and which presents similar benefits.

Class member access control: enforcement vs. convention

For a long time I’ve been an advocate of Python’s notion of controlling access to private and protected members (attributes, methods, etc) with conventions, by simply naming them like “_name”, with a leading underscore. Even though Python does support “__name” (with a double underscore) for “private” members (this actually mangles the name rather than hiding it), you’ll notice that even this is rarely used in practice, and the largely agreed mantra is that convention should be enough and thus one underscore suffices. This always resonated quite well with me, since I generally prefer to handle situations by agreement rather than enforcement. Well, I’m now changing my opinion that this works well for this purpose, at least in certain situations.

This methodology may work quite well in situations where the code lives within a very controlled environment, with one or more teams which strictly follow a single development guideline and have the power to refactor the affected code base somewhat easily when the original decisions are too limiting.

Having worked on a few major projects now, some of them libraries used by several teams inside and outside the same company, I now perceive that people very often take shortcuts around these decisions to get their job done quickly. It’s way easier to simply read the code and reach into the private guts of a library than to try to get agreement on the right way to do something, or to send a patch with a carefully architected change.

Many people by now are probably thinking: “Well, that’s their problem, isn’t it? If their code base breaks on the next upgrade they’ll get burned and won’t be able to upgrade cleanly.”, and I can honestly understand this feeling, since I shared it. But, for a number of reasons, I now understand that this isn’t just their problem, it’s very much my problem too.

Most importantly, in any serious software these problems will usually come back to the implementors, and many times the problem will be much larger by then than it was when the change could have been made “the right way” in the implementation, because code that depends on the private bits will have settled.

Most people are optimistic by nature and believe that the implementation won’t change. But, of course, one of the reasons why private information is made private in the first place is precisely that the implementor believes that having the freedom to change these details in the future is important. Quite often there’s already a plan of evolution in place for these private pieces, which may include revamping the implementation entirely for scalability or for other goals.

In the best case, the careless people will get burned on the upgrade and will either ask for support or silently skip the upgrade. Both cases hurt implementors, because providing support for broken software takes time and energy, and amazingly can even hurt the software’s image. Lack of upgrades also means more ancient versions in the wild to support. In the worst case, the careless people have enough influence on the affected project to burden it as much as if the private data had been public in the first place.

As much as I’m a believer in handling situations by agreement rather than enforcement, I’m also a believer that when the rules don’t work for people, the rules should be changed, not the people. So my position now is that language-supported access constraints (public, protected, private), as available in languages like Java and C++, are a better alternative than convention as used today in Python, since they provide an additional layer of encouragement for people not to break the rules carelessly, and that helps in the maintenance and reuse of software that has greater visibility.

Comparing package versions in PostgreSQL

This weekend I played a bit with PostgreSQL extensions written in C.

A while ago I wrote a Python C extension for Smart to compare Debian package versions. Now I was trying to do something similar inside PostgreSQL, and thus ported the original Python C extension code to a PostgreSQL C extension. It enables queries like the following:

# SELECT 'Matched' WHERE deb_version_match('1.2', '<=', '2.0');
  ?column?
----------
 Matched
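To give an idea of the kind of comparison involved, here is a minimal sketch in plain C. It is not the code of the extension: the names order, version_fragment_cmp and version_match are made up for illustration, the set of accepted operators is an assumption, and epoch and revision splitting are left out entirely, so it only compares a single version fragment. The ordering follows the Debian rules as commonly described: digit runs compare numerically, letters sort before other characters, and '~' sorts before everything, even the end of the string.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Sort weight of a character found outside a digit run. */
static int order(unsigned char c)
{
    if (c == '\0' || isdigit(c))
        return 0;
    if (isalpha(c))
        return c;
    if (c == '~')
        return -1;          /* sorts before everything, even end of string */
    return c + 256;         /* other characters sort after letters */
}

/* Compare two version fragments (no epoch or revision handling).
 * Returns a negative, zero, or positive value, like strcmp(). */
int version_fragment_cmp(const char *a, const char *b)
{
    while (*a || *b) {
        int first_diff = 0;

        /* Non-digit run: compare character by character. */
        while ((*a && !isdigit((unsigned char)*a)) ||
               (*b && !isdigit((unsigned char)*b))) {
            int ac = order((unsigned char)*a);
            int bc = order((unsigned char)*b);
            if (ac != bc)
                return ac - bc;
            a++;
            b++;
        }

        /* Digit run: compare numerically, ignoring leading zeros. */
        while (*a == '0')
            a++;
        while (*b == '0')
            b++;
        while (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            if (!first_diff)
                first_diff = *a - *b;
            a++;
            b++;
        }
        if (isdigit((unsigned char)*a))
            return 1;       /* a has more digits, so it is larger */
        if (isdigit((unsigned char)*b))
            return -1;
        if (first_diff)
            return first_diff;
    }
    return 0;
}

/* Hypothetical counterpart of the SQL function: apply a relational
 * operator given as a string. Returns 1 on match, 0 otherwise. */
int version_match(const char *a, const char *op, const char *b)
{
    int cmp;

    assert(a != NULL && op != NULL && b != NULL);
    cmp = version_fragment_cmp(a, b);
    if (strcmp(op, "<") == 0)  return cmp < 0;
    if (strcmp(op, "<=") == 0) return cmp <= 0;
    if (strcmp(op, "=") == 0)  return cmp == 0;
    if (strcmp(op, ">=") == 0) return cmp >= 0;
    if (strcmp(op, ">") == 0)  return cmp > 0;
    return 0;               /* unknown operators never match */
}
```

With these definitions, version_match("1.2", "<=", "2.0") is true, mirroring the query above, and "1.10" compares greater than "1.9" because digit runs are treated as numbers rather than strings.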

The implementation of the PostgreSQL C extension was quite straightforward, but I'm a bit disappointed by the performance of PL/PGSQL.

I've made tests using two environments. One of them is a PL/Python function executed as a trigger, which calls the original Python C extension and executes SQL back in PostgreSQL using the plpy module. The other is a PL/PGSQL function which uses the implemented PostgreSQL C extension directly.

Considering that the function logic consisted of one loop over a SELECT statement, a few tests, and an INSERT statement, I was expecting that the overhead introduced by going back and forth between the PostgreSQL state and the Python interpreter state would be a lot more noticeable when compared with PL/PGSQL executing an internal PostgreSQL function. Instead, the tests showed only a roughly 10% improvement for the PL/PGSQL version when running similar logic over about 5000 items.

I'm not yet sure whether the speed improvement makes up for the limited debugging feedback provided by the PL/PGSQL interpreter on errors.

Smart Package Manager is out!

After 6 months of fun working on the project in silence, Smart Package Manager was finally released.

As the README says:

The Smart Package Manager project has the ambitious objective of creating smart and portable algorithms for solving adequately the problem of managing software upgrading and installation. This tool works in all major distributions, and will bring notable advantages over native tools currently in use (APT, APT-RPM, YUM, URPMI, etc).

Some of the interesting features, which are covered in more detail in the README file:

  • Smart transactions
  • Multiple interfaces
  • Support for several channel formats
  • Priority handling
  • Autobalancing mirror system
  • Parallel downloading mechanism
  • Flexible removable media support

Have fun! ;)

Embedding Lua interpreter into RPM

I’ve recently committed internal support for Lua scripts to the RPM CVS HEAD.

Please note that this is experimental stuff.

Why embed Lua in RPM?

  • Many scripts execute simple operations which in an internal interpreter require no forking at all
  • Internal scripts reduce or eliminate external dependencies related to script slots
  • Internal scripts operate even under unfriendly situations like stripped chroots (anyone said installers?)
  • Internal scripts in Lua are really fast
  • Syntax errors in internal scripts are detected at package building time

How does it work?

Just use -p <lua> in any script slot (%pre, %post, etc).

For example:

%pre -p <lua>
print("Wow! It really works!")

What is accessible from Lua?

The standard Lua library, the posix module (basic system access, by Luiz Henrique de Figueiredo and Claudio Terra), and the rex module (regular expressions, by Reuben Thomas).

Macro support

Support for Lua macros was also introduced. It means that one can create custom content using Lua macros anywhere.

For example:

%{lua: print("Requires: hello-world > 1.0") }

More additions to APT-RPM Lua interface

The APT-RPM Lua interface is constantly being improved. This time, the following functions were added:

pkgid() and verid()

Return a unique integer identifying a package or a version.

verpkg()

Returns the parent package of some given version.

verdeplist()

Returns a list of dependencies for a given version, including complete information about each of them.

These new functions were introduced to give support for something which is frequently asked by APT-RPM users: the ability to discover which installed packages are not required by any other installed package.

Here is a script using these functions to list these packages. This script will be called list-nodeps, and will be available in the contrib/ directory of the next APT-RPM release.

-- Collect dependencies from installed packages
local deplist = {}
local verlist = {}
for _, pkg in ipairs(pkglist()) do
    local ver = pkgvercur(pkg)
    if ver then
        table.insert(verlist, ver)
        for _, dep in ipairs(verdeplist(ver)) do
            for _, depver in ipairs(dep.verlist) do
                -- Mark each version listed for this dependency as required
                deplist[verid(depver)] = true
            end
        end
    end
end

-- Now list all versions which are not dependencies
for _, ver in ipairs(verlist) do
    if not deplist[verid(ver)] then
        local name = pkgname(verpkg(ver))
        if name ~= "gpg-pubkey" then
            -- Print package name and version without epoch
            -- (rpm -e friendly ;-) .
            print(name.."-"..string.gsub(verstr(ver), "^%d+:", ""))
        end
    end
end

More information about the introduced functions is available in https://moin.conectiva.com.br/AptRpm/Scripting

New KDE frontend for APT-RPM

Here is a screenshot of Kynaptic, a new experiment I’m working on. This is a KDE-based frontend for APT-RPM and APT. Kynaptic is based on Synaptic, and shares the same base code on top of the APT library.

On the right side you see a new toy: the interactive search dialog. This should definitely improve the user experience while looking for packages, and could eventually be ported to Synaptic itself.

Another interesting point is that during the development process of Kynaptic I’m also cleaning up the Synaptic code, something I’ve had in mind for some time already.

Toy interpreter for Linear Algebra

While working with Linear Algebra, I decided to build a toy interpreter in C to play around with. This was quite an interesting experiment for me, since it was the first time I built a complete (with tokenizer, compiler, and interpreter) and self-sufficient (no external dependencies, no additional tools) interpreter from the ground up.

The interpreter is currently 1076 lines long, and is based on a tokenizer for the grammar, a compiler which creates a list of trees of expressions, an interpreter for the generated structure, and two modules for symbol maintenance and matrix operations.

Here is a quick example:

a = 2*3+4/2
print(a)

b = [1,2,3|4,5,6]
c = [1,2,3|4,5,6]
print((b+c)/2)
print(trans(b))

d = [1,0|-2,3|5,4|0,1]
e = [0,6,1|3,8,-2]
print(d*e)

f = [1,2|3,4]
print(det(f))
f = [2,3,4|-5,5,6|7,8,9]
print(det(f))
f = [1,-4,2,-2|4,7,-3,5|3,0,8,0|-5,-1,6,9]
print(det(f))

And here is the output:

8
[ 1, 2, 3 | 4, 5, 6 ]
[ 1, 4 | 2, 5 | 3, 6 ]
[ 0, 6, 1 | 9, 12, -8 | 12, 62, -3 | 3, 8, -2 ]
-2
-45
2042
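The determinant results above can be double-checked independently of the interpreter. The following is a minimal sketch, not the interpreter's own code (the post does not say which method it uses): a hypothetical det() function that applies Laplace expansion along the first row to a small integer matrix stored row by row in a flat array. MAX_N is an arbitrary limit chosen for the sketch.

```c
#include <assert.h>

#define MAX_N 8   /* arbitrary upper bound on the matrix size */

/* Determinant of an n x n integer matrix stored row-major in m,
 * computed by Laplace expansion along the first row. */
long det(const long *m, int n)
{
    long minor[(MAX_N - 1) * (MAX_N - 1)];
    long total = 0;
    long sign = 1;
    int col, r, c, k;

    assert(n >= 1 && n <= MAX_N);
    if (n == 1)
        return m[0];

    for (col = 0; col < n; col++) {
        /* Build the minor by dropping row 0 and column col. */
        k = 0;
        for (r = 1; r < n; r++)
            for (c = 0; c < n; c++)
                if (c != col)
                    minor[k++] = m[r * n + c];

        /* Add the signed cofactor term for this column. */
        total += sign * m[col] * det(minor, n - 1);
        sign = -sign;
    }
    return total;
}
```

Feeding it the three matrices assigned to f in the example, row by row, gives -2, -45 and 2042, matching the interpreter's output. Laplace expansion grows factorially with the matrix size, so it is only reasonable for toy cases like these.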

New RPM transaction locking

Last week I worked with Jeff Johnson to fix a small RPM issue which had been bothering us for quite some time. Until this change, RPM allowed two transactions to run at the same time but, unfortunately, this could put the database in an inconsistent state, and some headers could get lost while restoring it with --rebuilddb.

Our fix was rather simple, based on fcntl() locking while transactions are being committed to the system. This should be enough to protect databases until the issue is fixed The Right Way (the time consuming one ;-) ).

Developers of applications linked with RPM should not worry about additional errors, since the current behavior is to wait for the lock to be released when entering the transaction committing function.
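For readers unfamiliar with this kind of locking, here is a minimal sketch of a blocking exclusive lock built on fcntl(). It is not the code that went into RPM: the function names txn_lock_acquire and txn_lock_release and the idea of a separate lock file path are assumptions made for illustration. The relevant detail is F_SETLKW, which makes the caller wait for the lock instead of returning an error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open (creating if needed) a lock file and block until an exclusive
 * lock on the whole file is obtained. Returns the descriptor that
 * holds the lock, or -1 on error. */
int txn_lock_acquire(const char *path)
{
    struct flock fl;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* zero length covers the whole file */

    /* F_SETLKW waits until the lock is free; retry if a signal
     * interrupts the wait. */
    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Release the lock and close the descriptor. Returns 0 on success. */
int txn_lock_release(int fd)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    rc = fcntl(fd, F_SETLK, &fl);
    close(fd);               /* closing would also drop the lock */
    return rc < 0 ? -1 : 0;
}
```

A caller would wrap the critical section between txn_lock_acquire() and txn_lock_release(). A second process calling txn_lock_acquire() on the same path simply sleeps until the first one releases the lock, which is why no new error path shows up for applications.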

Compiler warnings in Python’s SRE

After a long time, and many people complaining about it, I finally took some time to fix some annoying compiler warnings in Python’s regular expression engine. Since it’s a rather uncommon case, I’ll explain it here with a quick example.

Have a look at the following code:

#include <stdio.h>

int main(int argc, char *argv[])
{
   unsigned char c = 1;
   if (c < 256)
      printf("Hello world!\n");
   return 0;
}

If you try to compile that code, you'll probably get a warning like this:

test.c: In function `main':
test.c:7: warning: comparison is always true due to
limited range of data type

Yes, the compiler is right. Our char type will never reach the given limit. On the other hand, suppose that this code is preprocessed, and that the c variable sometimes has a wider character type, like wchar_t, for example. In this case our test is legitimate, and the dozens of warnings being caused by a common macro are really annoying.

There are many different ways to remove these warnings. Unfortunately, the most obvious one, which is casting the variable to a larger type, doesn't work as expected.

My adopted solution was to reimplement the same test in another way, suppressing the compiler warnings for this specific case. Instead of writing (c < 256), I've written (!(c & ~255)). This is 100% safe for any unsigned type, which is my case. Ok, perhaps it's a little bit sick, but a large comment should leave everyone aware of the rationale, and away from the warnings.
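The equivalence is easy to check with a small sketch. The macro name FITS_IN_BYTE and the helper functions below are made up for illustration and are not the names used in SRE. The idea is that ~255 has every bit set except the low eight, so the bitwise AND is zero exactly when no bit above the low eight is set, which for an unsigned value means it is below 256.

```c
#include <assert.h>

/* True when c lies in 0..255. Only meaningful for unsigned c. */
#define FITS_IN_BYTE(c) (!((c) & ~255))

/* The same macro instantiated for a narrow and a wide unsigned type. */
int narrow_fits(unsigned char c) { return FITS_IN_BYTE(c); }
int wide_fits(unsigned int c)    { return FITS_IN_BYTE(c); }

/* Self-check: the mask test agrees with (c < 256) for unsigned values. */
void check_mask_test(void)
{
    unsigned int c;

    for (c = 0; c < 1024; c++)
        assert(FITS_IN_BYTE(c) == (c < 256));
}

/* Why "unsigned only": for a negative signed value the comparison is
 * true but the mask test is false. Returns 1 to show they disagree. */
int signed_disagrees(void)
{
    int c = -1;

    return (c < 256) && !FITS_IN_BYTE(c);
}
```

Here narrow_fits() is true for every possible argument, wide_fits(256) is false, and signed_disagrees() returns 1, which is the reason the trick should stay restricted to unsigned types.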