Today I’ve been working on a fix for an old problem that bites us Python users from time to time: the recursion limit in the regular expression engine, SRE. I’ll probably talk more about this fix in the future. For now, I’d like to explain a technique I’ve used, following a suggestion by Skip Montanaro (thanks Skip!), to improve the test suite of the SRE engine. The technique consists of using the gcov tool, part of gcc, to obtain a coverage report from the test suite.
OK, here we go. The first thing to do is to compile _sre.c with coverage support. The gcov manual says we need the options -fprofile-arcs -ftest-coverage when compiling the _sre.o object, so that it records the necessary information whenever Python runs functions inside it. Let’s do it (I got the original command from the make output):
gcc -fprofile-arcs -ftest-coverage -pthread -g -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -c ./Modules/_sre.c -o Modules/_sre.o
Besides generating an object with coverage support, the compiler will output the following files:
- tmp.stdout.22153.bb
- tmp.stdout.22153.bbg
This is a little strange, since both the manual and the gcov program say these files should be called _sre.bb and _sre.bbg. Luckily, it was just a matter of renaming the files to the expected names.
Now that we have the “right” object, with coverage test support, we can ask make to do the rest of the work for us. After running it you’ll get the python executable we’ll use in our coverage tests.
OK, we have the instrumented executable and some of the information files gcov needs. One more file is missing before we can run gcov: the one generated when the test suite is run, holding the coverage data for _sre.c. To get this file, it’s just a matter of running the test suite with our “magic” python executable:
./python Lib/test/test_re.py
After that, we have the last file we need:
- tmp.stdout.19080.da
Rename it to _sre.da as you did with the other ones. You’ll probably want to automate some of these steps to achieve a fast test cycle.
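Here is a minimal sketch of how the run, rename and report steps could be automated. The helper names rename_gcov_files and coverage_cycle are made up for this post, and the script assumes the stray files always follow the tmp.stdout.<pid>.<ext> pattern seen above and that everything runs from the top of the Python source tree. Adjust it to whatever your build actually produces.

```python
import glob
import os
import subprocess


def rename_gcov_files(directory=".", stem="_sre", extensions=("bb", "bbg", "da")):
    """Rename tmp.stdout.<pid>.<ext> files to <stem>.<ext>; return the new paths.

    Hypothetical helper: the tmp.stdout.* naming is only what was observed
    in this build, not something gcov documents.
    """
    renamed = []
    for ext in extensions:
        pattern = os.path.join(directory, "tmp.stdout.*." + ext)
        matches = glob.glob(pattern)
        if not matches:
            continue
        # If older runs left files behind, keep the most recently modified one.
        newest = max(matches, key=os.path.getmtime)
        target = os.path.join(directory, stem + "." + ext)
        os.replace(newest, target)
        renamed.append(target)
    return renamed


def coverage_cycle(python="./python", test="Lib/test/test_re.py",
                   source="Modules/_sre.c"):
    """Run the tests, fix the file names, then ask gcov for the report."""
    # A failing test still produces coverage data, so the exit code is ignored.
    subprocess.run([python, test])
    rename_gcov_files()
    subprocess.run(["gcov", "-o", ".", source], check=True)
```

Calling coverage_cycle() after each change to the test suite runs the same three steps as doing them by hand.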
Now we’re ready to get our coverage report:
gcov -o . Modules/_sre.c
Bingo! We should have a file named _sre.c.gcov in the current directory. This is a text report which includes the source code with information about how many times each line was executed, and marks like ###### showing which ones weren’t executed at all.
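To skim the report quickly, something like the following could pull out only the lines flagged as never executed. This is a rough sketch: unexecuted_lines is a hypothetical helper, and it assumes nothing about the report layout except that an unexecuted line starts, after leading whitespace, with a run of at least five # characters. The numbers it returns are positions in the report, not necessarily in _sre.c.

```python
def unexecuted_lines(report_text, marker="#####"):
    """Return (report_line_number, text) pairs for lines marked as never run.

    Assumption: gcov flags such lines with a run of '#' characters at the
    start of the line; the exact column layout differs between gcov versions.
    """
    missed = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        if line.lstrip().startswith(marker):
            missed.append((number, line.strip()))
    return missed
```

For instance, reading _sre.c.gcov with open("_sre.c.gcov").read() and passing the text to unexecuted_lines gives a list you can loop over while deciding which tests to write next.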
With this information, we can improve the test suite, making it cover as much of the module code as possible (after some iterations, of course ;-) ).