In a hurry?
A while ago I found out about Sikuli, a very interesting project which allows people to script actions in GUIs based on screenshot excerpts. The idea is that you basically take images representing portions of your screen, like a button, or a label, or an icon, and then create a script which can detect a position in the screen which resembles one of these images, and perform actions on them, such as clicking, or hovering.
I had never imagined something like this, and the idea got me really excited about the possibilities. Imagine, for instance, what can be done in terms of testing. Testing of GUIs is unfortunately not yet a trivial task nowadays. We do have frameworks which are based on accessibility hooks, for instance, but these sometimes can’t be used because the hook is missing, or is even far off in terms of the context being tested (imagine testing that a browser can open a specific flash site successfully, for instance).
So, Sikuli opened my eyes to the possibility of using image matching technology in a GUI automation context, and I really wanted to play with it. In the days following the discovery, I fiddled a bit, communicated with the author, and even submitted some changes to make it work well in Ubuntu.
Then, the idea cooled down in my head, and I moved on with life. Well… until two weeks ago.
Right before heading to the Ubuntu Developer Summit for the next Ubuntu release, the desire of automating GUIs appeared again in the context of the widely scoped Ubuntu-level testing suite. Then, over the first few days last week, I was able to catch up with quite a few people which were interested in the concept of automating GUIs, with different purposes (testing, design approval, etc), which of course was all I needed to actually push that old desire forward.
Trying to get Sikuli to work, though, was quite painful. Even though I had sent patches upstream before, it looks like the build process isn’t working in Ubuntu again for other reasons (it’s not a polished build process, honestly), and even if I managed to make it work and contributed that to the upstream, in the end the path to integrate the Java-based tool in the Python-based testing framework which Ubuntu uses (Mago) wasn’t entirely straightforward either.
Reinventing the wheel
So, the the itch was in place, and there was a reason to let the NIH syndrome take over a bit. Plus, image processing is something I’d like to get a foot in anyway, so it felt like a good chance to have a closer look and at the same time contribute a small bit to potential quality improvements of Ubuntu.
That’s when Xpresser was born. Xpresser is a clean room implementation of the concepts explored by Sikuli, in the form of a Python library which can be used standalone, or embedded into other programs and testing frameworks such as Mago.
The project is sponsored by Canonical, and licensed under the LGPL.
Internally, it makes use of opencv for the image matching, pyatspi for the event generation (mouse clicks, etc), gtk for screen capturing and testing (of itself), and numpy for matrix operations. Clearly, the NIH syndrome, wasn’t entirely active. As a side note, I haven’t played with numpy and gtk for some time, and I’m always amazed by the quality of these modules.
Contribute code and ideas
Concluding this post, which is already longer than I expected, the basics of Xpresser are in place, so go ahead and play with it! That said, there are quite a few low hanging fruits to get it to a point of being a really compelling GUI-driving library, so if you have any interest in the concept, I invite you to play with the code and submit contributions too. If you want ideas of what else could be done, let’s have a chat.