In a hurry?
The context
A while ago I found out about Sikuli, a very interesting project which allows people to script actions in GUIs based on screenshot excerpts. The idea is that you take images representing portions of your screen, like a button, a label, or an icon, and then write a script which finds a region of the screen that resembles one of these images and performs actions on it, such as clicking or hovering.
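To make the idea concrete, here is a minimal sketch of the core step, assuming grayscale images stored as lists of rows. It slides the template over every position of the screenshot, scores each position by sum of squared differences (SSD), and returns the centre of the best match as the point to click. This only illustrates template matching in general, not Sikuli's actual algorithm, and the function name `find_template` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as rows of 0-255 ints.
Image = List[List[int]]


def find_template(screen: Image, template: Image) -> Tuple[int, int, int]:
    """Return (x, y, score) for the best match of `template` in `screen`.

    (x, y) is the centre of the matched region and `score` is its sum of
    squared differences (SSD): 0 is a perfect match, higher is worse.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        raise ValueError("template is larger than the screen")

    best_score, best_left, best_top = None, 0, 0
    # Try every offset at which the template fits entirely on the screen.
    for top in range(sh - th + 1):
        for left in range(sw - tw + 1):
            score = 0
            for dy in range(th):
                screen_row, template_row = screen[top + dy], template[dy]
                for dx in range(tw):
                    diff = screen_row[left + dx] - template_row[dx]
                    score += diff * diff
            if best_score is None or score < best_score:
                best_score, best_left, best_top = score, left, top

    # The centre of the matched region is a natural point to click or hover.
    return best_left + tw // 2, best_top + th // 2, best_score


if __name__ == "__main__":
    # A 6x5 black "screen" with a 2x2 white "button" at columns 3-4, rows 1-2.
    screen = [[0] * 6 for _ in range(5)]
    for y in (1, 2):
        for x in (3, 4):
            screen[y][x] = 255
    button = [[255, 255], [255, 255]]
    print(find_template(screen, button))  # (4, 2, 0)
```

A real tool would typically also compare the best score against a similarity threshold before clicking, so that a missing button is reported instead of being clicked at the wrong place.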
I had never imagined something like this, and the idea got me really excited about the possibilities. Imagine, for instance, what can be done in terms of testing. Testing GUIs is unfortunately still not a trivial task. We do have frameworks based on accessibility hooks, for instance, but these sometimes can’t be used because the hook is missing, or because it is far removed from the context being tested (imagine testing that a browser can successfully open a specific Flash site, for instance).
So, Sikuli opened my eyes to the possibility of using image matching technology in a GUI automation context, and I really wanted to play with it. In the days following the discovery, I fiddled a bit, communicated with the author, and even submitted some changes to make it work well in Ubuntu.
Then, the idea cooled down in my head, and I moved on with life. Well… until two weeks ago.
Right before heading to the Ubuntu Developer Summit for the next Ubuntu release, the desire to automate GUIs came up again in the context of the widely scoped Ubuntu-level testing suite. Then, over the first few days of last week, I was able to catch up with quite a few people who were interested in automating GUIs for different purposes (testing, design approval, etc.), which of course was all I needed to actually push that old desire forward.
Trying to get Sikuli to work, though, was quite painful. Even though I had sent patches upstream before, the build process seems to be broken in Ubuntu again for other reasons (honestly, it’s not a polished build process). And even if I managed to fix it and contributed that upstream, integrating the Java-based tool into the Python-based testing framework which Ubuntu uses (Mago) wasn’t entirely straightforward either.
Reinventing the wheel
So, the itch was in place, and there was a reason to let the NIH syndrome take over a bit. Plus, image processing is something I’d like to get a foothold in anyway, so it felt like a good chance to have a closer look and at the same time contribute a small bit to potential quality improvements in Ubuntu.
That’s when Xpresser was born. Xpresser is a clean room implementation of the concepts explored by Sikuli, in the form of a Python library which can be used standalone, or embedded into other programs and testing frameworks such as Mago.
The project is sponsored by Canonical, and licensed under the LGPL.
Internally, it makes use of OpenCV for the image matching, pyatspi for event generation (mouse clicks, etc.), gtk for screen capturing and for testing itself, and numpy for matrix operations. Clearly, the NIH syndrome wasn’t entirely active. :-) As a side note, I hadn’t played with numpy and gtk for some time, and I’m always amazed by the quality of these modules.
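OpenCV’s `matchTemplate` supports several scoring methods; one common choice is the normalized correlation coefficient (`TM_CCOEFF_NORMED`), which is insensitive to uniform changes in brightness and contrast. The sketch below computes that score for a single pair of equal-sized patches in pure Python, to show what OpenCV evaluates at every offset. It does not claim that Xpresser uses this particular method, and the helper name `ncc` is hypothetical.

```python
import math
from typing import List

Patch = List[List[float]]  # grayscale pixels as rows


def ncc(patch: Patch, template: Patch) -> float:
    """Normalized correlation coefficient of two equal-sized grayscale patches.

    Both patches are shifted to zero mean before correlating, so the result
    ignores uniform brightness offsets and contrast scaling. It lies in
    [-1, 1], where 1 means a perfect match.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if len(a) != len(b):
        raise ValueError("patch and template must have the same size")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        return 0.0  # convention here: a perfectly flat patch counts as uncorrelated
    return numerator / denominator


if __name__ == "__main__":
    button = [[10, 20], [30, 40]]
    brighter = [[110, 120], [130, 140]]  # same pattern, +100 brightness
    print(ncc(brighter, button))  # 1.0
```

Because the score is bounded, a single similarity threshold (say 0.9) can be applied across images regardless of their overall brightness.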
Contribute code and ideas
Concluding this post, which is already longer than I expected: the basics of Xpresser are in place, so go ahead and play with it! That said, there is quite a bit of low-hanging fruit before it becomes a really compelling GUI-driving library, so if you have any interest in the concept, I invite you to play with the code and submit contributions too. If you want ideas about what else could be done, let’s have a chat.
This is really cool. There is definitely a need for an open-source solution in this space.
However, in case you are not aware, there is an existing commercial product that does functional GUI testing and automation using image-matching technology.
http://www.testplant.com/products/eggplant_functional_tester
I don’t know if they have any patents on their technology, but it may be prudent to check.
I use EggPlant at work and it is an amazing tool. My only gripe is that it uses an AppleScript derivative for scripting (called SenseTalk) instead of something more mainstream like Python.
Thanks for the note Warren.
I’ve not heard about this tool before your comment, and I’m not really concerned about patent attacks. If I had to look at what’s patented by someone every time I tried to implement something, I would have to retire as a developer.
Either way, thanks for letting me know.
Excellent! I used Sikuli recently to automate a very boring task and was very impressed with the idea, but I missed recent Python features and wanted more control over the matching, etc.
I’ll be playing with this as soon as I get home. I’d like to be able to use it on Windows (MSAA), so I’ll have a go at that.
Any luck with Windows, Tony? I’m going to fiddle around with that right now. I wish I were lucky enough to need to handle testing and/or automation on Ubuntu machines, but I’m afraid it’s the Windows shops that actually hire people.
I’m not very familiar with Launchpad/bzr, but I’ve created a blueprint for Windows support and I’ll update it with any progress.
https://blueprints.launchpad.net/xpresser/+spec/ms-windows-support
Hello, I created a branch with support for PyQt and Xlib (pure Python) to remove the dependency on Gtk and pyatspi (which needs GNOME).
You can find the code at https://code.launchpad.net/~henning-schroeder/xpresser/pyqt
Thanks for the changeset Henning.
Please note that, even though it depends on gtk to *implement* its features, at this point there’s absolutely nothing gtk-specific in the features and interface that Xpresser offers.
With this in mind, I don’t see a big reason to be pushing two different implementations in parallel.
If we manage to find a way to avoid gtk *and* qt entirely, then that might be an interesting move. It should certainly be possible, given what we’re doing with Xpresser.
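Since the toolkit-specific part is only how pixels and events are obtained, one way to keep the core free of both gtk and Qt is to hide screen capture behind a tiny interface and let each toolkit provide an implementation. The sketch below shows that shape with a `typing.Protocol`; `ScreenGrabber`, `FakeGrabber` and `capture` are hypothetical names, not part of Xpresser.

```python
from typing import List, Protocol

Image = List[List[int]]  # grayscale pixels as rows


class ScreenGrabber(Protocol):
    """Hypothetical interface: anything that can return the current screen."""

    def grab(self) -> Image:
        ...


class FakeGrabber:
    """Hypothetical backend for tests: serves a fixed image, no toolkit needed."""

    def __init__(self, image: Image) -> None:
        self._image = image

    def grab(self) -> Image:
        # Return a copy so callers cannot mutate the stored "screen".
        return [row[:] for row in self._image]


def capture(grabber: ScreenGrabber) -> Image:
    """Core code depends only on the interface, never on gtk or Qt directly."""
    screen = grabber.grab()
    if not screen or not screen[0]:
        raise RuntimeError("backend returned an empty screenshot")
    return screen


if __name__ == "__main__":
    # A gtk-, Qt- or Xlib-based class with the same grab() method would plug in here.
    print(capture(FakeGrabber([[0, 255], [255, 0]])))  # [[0, 255], [255, 0]]
```

The same pattern would apply to event generation (clicks, key presses), which is the other place where a toolkit or accessibility layer is needed.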
As a side note, running tests with your qt interface breaks the test suite, which so far has almost 100% coverage, and ideally the screenshot-taking routine should avoid touching the disk, for speed.
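On avoiding the disk: a toolkit screenshot can usually be read straight from memory. GdkPixbuf, for example, exposes its raw pixel data along with its width, height, number of channels and rowstride (bytes per row, which may include padding). Below is a minimal sketch of turning such a buffer into grayscale rows, assuming 8-bit RGB or RGBA data; `raw_to_gray` is a hypothetical helper, and the buffer is passed in as plain `bytes` so the example does not depend on any toolkit.

```python
from typing import List


def raw_to_gray(pixels: bytes, width: int, height: int,
                rowstride: int, n_channels: int) -> List[List[int]]:
    """Convert a raw 8-bit RGB/RGBA buffer into grayscale rows, in memory.

    `rowstride` is the number of bytes from the start of one row to the start
    of the next; it may exceed width * n_channels because of row padding.
    """
    if n_channels < 3 or rowstride < width * n_channels:
        raise ValueError("unsupported channel count or rowstride")
    gray = []
    for y in range(height):
        start = y * rowstride  # jump over any padding at the end of the previous row
        row = []
        for x in range(width):
            i = start + x * n_channels
            r, g, b = pixels[i], pixels[i + 1], pixels[i + 2]  # alpha, if any, is ignored
            # Integer form of the ITU-R BT.601 luma weights 0.299, 0.587, 0.114.
            row.append((299 * r + 587 * g + 114 * b) // 1000)
        gray.append(row)
    return gray


if __name__ == "__main__":
    # 2x2 RGB image with 2 padding bytes per row (rowstride 8 instead of 6).
    data = bytes([255, 255, 255, 0, 0, 0, 9, 9,
                  255, 0, 0, 0, 255, 0, 9, 9])
    print(raw_to_gray(data, 2, 2, 8, 3))  # [[255, 0], [76, 149]]
```

Because only the first width * n_channels bytes of each row are read, this also works when the last row is not padded out to the full rowstride.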
I see you submitted a bug at:
https://bugs.launchpad.net/xpresser/+bug/583124
Let’s communicate about further ideas regarding this branch there.
Biggest problem I had was finding a windows version of PyGTK with Numeric support. I found one.. but when pixbuf.get_pixels_array() is called it hard crashes without any output.
I.. was not up to trying to compile my own.
So yeah, as for it not depending on GTK.. it doesn’t seem to, except for the screenshot-taking bit. I looked at doing it another way, but I’m a very junior programmer and unfamiliar with much of this :)
I will keep following though, as best I can, to see if it will be useful to me.
Quasar beta Windows branch is available.
Find it through:
https://blueprints.launchpad.net/xpresser/+spec/ms-windows-support
I think for this tool to be successful it needs to be cross-platform. I have been trying extensively to integrate Sikuli-style GUI testing with some applications I’m developing, but it has been a very frustrating process to do anything outside of the Sikuli IDE. I am unable to even check the code out on Windows using Bazaar! I am not an incredibly experienced Python programmer (Java & Flex/Flash), but I have been using it a lot these past few months and really enjoy it. I think tying in Tesseract via the pytesser module might be a good idea. It could help with matching and allow the scripts to “read” information out of images.
Hi Gustavo,
Looks like awesome work! I had been thinking about re-implementing Sikuli in something closer to pure python as well, for use in OS X. It doesn’t look like Xpresser is cross-platform yet; what do you think it would take to get it working in “the other Nix”?
Thanks,
Alex
I had also been thinking of a pure CPython Sikuli as a standalone library rather than an IDE. It seems that the Sikuli project is not very active currently. How is Xpresser going? Can we use it alone, without GTK?
The dependency on gtk is really shallow. In fact, Henning mentions above that he ported it to other backends already.
As for how it’s going, I don’t have much use for it myself, so my own development on it stopped, but I believe Chris Wayne is still maintaining it actively for developing visual test cases.
What is the current status of the GUI test automation? Does it use the X Window System, which would make it useful on Linux, and Ubuntu in particular? I have a parser generator (Yacc, Lex & Perl) background and experience in Eggplant GUI test automation, if you google my name. I would like to see a similar generic testing capability in the open source domain.
Steve, if I understand what you’re asking, it matches the previous question and answer. The dependency on the toolkit is very shallow.