Archive for the 'Architecture' Category

Breaking into an Android password manager – Practice

In the last post, we’ve seen some security issues which exist in the Android password manager gbaSafe version 1.1.0a, by analyzing the security description provided in its web site. As described there, even though the system depends on a “master key” which might be secure, the security of the system is seriously compromised by the encouragement of very weak keys (a few digits only) in what is named an “unlock key”, used to encrypt the master key itself. All of that in an application which claims to strongly protect people’s data from unwanted eyes.

In this post, we will play a bit with the Linux-based Android OS to actually explore these security deficiencies, demonstrating that such issues are very real, and that the claims of being hard to unveil the data is unfounded. Since the most serious weakness lies in the key itself, we’ll run a simple brute force attack to try to find arbitrary unlock keys.

This procedure is actually mentioned by the author of gbaSafe himself in the web page, except he overestimates the work involved in producing such a mechanism:

Theoretically, somebody could write a program that tries to decrypt the master key by trying all possible values of the short key (with 4 digits there are only 10000 possibilities), but this would still be much work, as more information about the crypting algorithm is needed (e.g. salt bytes, iteration count).

So let’s get started.

As a first step, we’ll need the Android SDK with a working emulator (I’ve used API level 5, revision 1), and a copy of the application itself. I got a trial version of the application from AndAppStore.com.

The application downloaded is bundled within that .apk file, which is really a .zip file that may be opened up normally with any tool which understands this file format.

Once that’s done, we get access to all the information needed to run the application, including icons, interface layouts, and most importantly in this case, the bytecode which targets the Dalvik VM. This bytecode is the end result of a sequence of translations which happen when the program’s Java source code is compiled, so that’s what we’ll have to fiddle with to figure details of the application we want to investigate.

The bytecode is located inside the classes.dex file, and as expected it’s not easy to read in its native format. Luckily, though, a smart guy has already written a couple of tools, smali and baksmali, which allow people to decompile and recompile that bytecode format to/from something that is much easier to understand.

After downloading these tools, the following command should decompile the file:

$ java -jar baksmali.jar –output classes classes.dex

We now have a classes/ directory full of .smali files.

Before going any further, let’s ponder for a moment about what we want to do. A brute force attack is when we attempt sequentially many possible keys, and given the context already presented, what we’re looking after is to attempt different “unlock keys”. With that in mind, we’ll introduce a very small modification in the application so that it will attempt to enter the unlock key automatically, rather than reporting an error when the key entered in the unlock dialog is invalid.

With that in mind, after some quick research, it looks like the onClick() method within the DlgUnlock.smali file is a pretty good candidate. This is supposedly called when the button in the unlock dialog is clicked, so something interesting about the password being correct or not must happen there.

Before doing anything there, I’ve increased the number of registers in the function to 12, to get some additional registers to play with, and then initialized a register with the value zero, to serve as a monotonically increasing number (our keys!):

.method public onClick(Landroid/view/View;)V
- .registers 9
+ .registers 12
.parameter “view”
+ const/16 v9, 0×0

Then, we have to instruct the program to use these keys rather than whatever is typed in the dialog box. Some lines down, we’ll see a call to the checkUnlockKey() method, which is certainly what we’re looking after. Let’s do this now:

+ :mykey
+ invoke-static {v9}, Ljava/lang/String;->valueOf(I)Ljava/lang/String;
+ move-result-object v2
invoke-static {v2}, Lcom/gbizapps/safeA/Crypt;->checkUnlockKey(Ljava/lang/String;)I

Now, what if this key is wrong? We don’t want the master key to be removed as mentioned in the software description. We want to simply attempt the next key. With some analysis, we see that in case of errors, the next couple of lines below the above code will instruct the VM to jump to an error branch. Rather than following up with the normal error logic, we’ll increment the key, and jump back to the above code:

:cond_6c
+ add-int/lit8 v9, v9, 0×1
+ goto :mykey

Now we just have to rebundle this and put it into the emulator. I won’t go over it in too much detail here, since there’s plenty of information available online, but the steps to do that are:

  1. Recreate a modified classes.dex with smali
  2. Recreate a modified .apk file by just zipping the modified content
  3. Sign and zipalign the new .apk file
  4. Install it

And that’s it, seriously! This would be enough to break the software security if it was working correctly.

Interestingly, though, the software wasn’t working correctly with this change. Instead, it was Force Closing on certain keys. To test it out, use the master key “master key”, and the unlock key “999999″, and then once you close and open the application again, try to unlock it with the key “1175″. Instead of showing an error message, it will break badly.

Now, for the proof of concept to work, I actually had to fix the bug, which felt a bit funny to do given the context.

Looking at the traceback trough adb logcat, I found out that there was a null being dereferenced in the file Crypt.smali, so I fixed the problem by injecting some error checking at this position and jumping the flow into an existing error branch:

+ if-eqz v3, :cond_5a
const-string v4, “ucpmhkexov85MDKhdfdfFGQPYxywq7209fcrqhghjkuiopy”

With this in place came the biggest surprise of the experiment. The keys which were crashing the application were special, in the sense that they actually decode the master key successfully! That’s right: whatever the algorithm is doing, that six-digit “999999″ encrypts the master key in such a way that attempting the “1175″ key works, so even big keys are rendered extremely weak with the logic used to encrypt the master key.

At this point, I added some trivial logic to display the key found with a Toast, just to ensure the whole thing was working correctly:

Toast displaying unlock key found

Note that the key generation implemented above is a bit simplistic, in the sense that it doesn’t attempt keys with leading zeros, but this would be trivial to implement, and my intention here isn’t to actually break any keys for real, but just to show how the promised security in this application is not to be trusted at all. Just the logic above will already be enough for a brute force attack against the application, and has broken all the keys I’ve tried in mere seconds, in a slow emulator.

As a conclusion, if you want to put your data in a secure place, rather than picking an application which promises security because the salt is hidden somewhere or because it’s too much work to figure its logic, pick an open source application with logic which is publicly verifiable and has already had many eyes over it. Chances are that doing something like what was described in this post won’t be so trivial. Then, choose your keys wisely! The most secure application won’t be enough if you pick a bad key.

Breaking into an Android password manager – Theory

For some time now I’ve been wanting to research more deeply about the internals of Android. Until now, though, this was just a sentiment. Then, a couple of weeks ago I’ve finally managed to replace my iPhone for an Android phone, and that was the final motivator for me to actually get into learning more about the inner workings of the Linux-based OS.

Now, I just had to pick an actual task for digging into. The Dalvik VM is certainly one of the most innovative and advertised technical details about the OS, so something around it would be a nice start.. some kind of bytecode fiddling perhaps, but what? Luckily, even without trying too hard, I eventually stumbled upon an interesting case for researching upon.

The “victim” of this research is the application gbaSafe version 1.1.0a, which claims to protect user passwords using unbreakable algorithms (how’s that for a hint of a Snake oil case?).

Before we get into some hacking, let’s see some words on the software security by the author himself, and then render some analysis on conceptual issues on it:

The confidential data can only be decrypted if the master key is known. You should choose a long key (at least 16 characters) with mixed case and unreadable text. Of course you cannot enter this key each time you want to access the confidential data, so it is stored in the user settings encrypted with a shorter key (4 to 6 digits) and normally you only have to enter this unlock key. Theoretically it is possible to try all possible values (brute force attack), but then you must use another program, since gbaSafe deletes the encrypted master key from the user settings when you enter the unlock key wrong three times repeatedly, and then you must enter the master key. If you wrote a program to decrypt the master key, you would have to know the algorithm used, the salt bytes and iteration count (used to augment the short unlock key), which are very hard to extract from the binary program module gbaSafe.

If you have some security background, I’m sure that by now you’re already counting the issues on this single paragraph.

The most obvious issue is the fact that there’s a “strong key” and a “weak key”, and the strong key is encrypted with the weak one. This is a very common cryptography sin, as would say my friend and coworker Andreas Hasenack (a security researcher himself). A security system is only as secure as its weakest spot. It obviously makes little difference for an attacker if he has to attempt decrypting a master key or the actual data, since decrypting the master key will give access to the data.

Then, it mentions en passant that the software enforces the use of digits for the weak key. This ensures that the weak key is really weak! Four digits is basically ten thousand attempts, which is absolutely nothing for nowadays’s hardware. This number would move up to about 15 million by simply allowing upper and lowercase letters as well (which isn’t great either, but a few orders of magnitude never hurt in this scenario).

It follows up encouraging people to think that it’s actually hard to figure the algorithm and other implementation details. Considering that there’s absolutely nothing preventing people from getting their hands in the implementation itself, this is in fact asserting that the security mechanism is based on the ignorance of the attacker. Counting on the ignorance of people is bad at all times, and in a security context it’s a major error.

There’s a final security issue in this description which is a bit more subtle, but further analysis on the logic used leaves no doubt. In cryptography, the salt is supposed to increase the work needed in a brute force attack by strengthening the number of bits of the actual passphrase, in a case where the salt is actually unavailable, or at least prevent that a single large word dictionary can be used to attack several encryptions or hashes at once, in a case where the salt is known but variable. In the latter case, it helps because encrypting a single key with two different salts must be done twice, rather than once, so it increases the computational task when attacking multiple items. A salt which is known and does not change across all processed items is worth pretty close to nothing.

So, indeed, considering the many security issues here, this isn’t something I’d store my passwords or credit card numbers on, and I suggest you don’t do it either.

In my next post on this topic I’ll actually implement a trivial brute force attack to prove that these issues are very real, and that, actually, it’s not even hard to break into a security system like this.

The application author has been contacted about this blog post, since he’ll likely want to fix some of these issues.

The last 4 years (and the next N?)

Some interesting changes have been happening in my professional life, so I wanted to share it here to update friends and also for me to keep track of things over time (at some point I will be older and will certainly laugh at what I called “interesting changes” in the ol’days). Given the goal, I apologize but this may come across as more egocentric than usual, so please feel free to jump over to your next blog post at any time.

It’s been little more than four years since I left Conectiva / Mandriva and joined Canonical, in August of 2005. Shortly after I joined, I had the luck of spending a few months working on the different projects which the company was pushing at the time, including Launchpad, then Bazaar, then a little bit on some projects which didn’t end up seeing much light. It was a great experience by itself, since all of these projects were abundant in talent. Following that, in the beginning of 2006, counting on the trust of people which knew more than I did, I was requested/allowed to lead the development of a brand new project the company wanted to attempt. After a few months of research I had the chance to sit next to Chris Armstrong and Jamu Kakar to bootstrap the development of what is now known as the Landscape distributed systems management project.

Fast forward three and a half years, in mid 2009, and Landscape became a massive project with hundreds of thousands of very well tested lines, sprawling not only a client branch, but also external child projects such as the Storm Object Relational Mapper, in use also by Launchpad and Ubuntu One. In the commercial side of things it looks like Landscape’s life is just starting, with its hosted and standalone versions getting more and more attention from enterprise customers. And the three guys which started the project didn’t do it alone, for sure. The toy project of early 2006 has grown to become a well structured team, with added talent spreading areas such as development, business and QA.

While I wasn’t watching, though, something happened. Facing that great action, my attention was slowly being spread thinly among management, architecture, development, testing, code reviews, meetings, and other tasks, sometimes in areas not entirely related, but very interesting of course. The net result of increased attention sprawl isn’t actually good, though. If it persists, even when the several small tasks may be individually significant, the achievement just doesn’t feel significant given the invested effort as a whole. At least not for someone that truly enjoys being a software architect, and loves to feel that the effort invested in the growth of a significant working software is really helping people out in the same magnitude of that investment. In simpler words, it felt like my position within the team just wasn’t helping the team out the same way it did before, and thus it was time for a change.

Last July an external factor helped to catapult that change. Eucalyptus needed a feature to be released with Ubuntu 9.10, due in October, to greatly simplify the installation of some standard machine images.. an Image Store. It felt like a very tight schedule, even more considering that I hadn’t been doing Java for a while, and Eucalyptus uses some sexy (and useful) new technology called the Google Web Toolkit, something I had to get acquainted with. Two months looked like a tight schedule, and a risky bet overall, but it also felt like a great opportunity to strongly refocus on a task that needed someone’s attention urgently. Again I was blessed with trust I’m thankful for, and by now I’m relieved to look back and perceive that it went alright, certainly thanks to the help of other people like Sidnei da Silva and Mathias Gug. Meanwhile, on the Landscape side, my responsibilities were distributed within the team so that I could be fully engaged on the problem.

Moving this forward a little bit we reach the current date. Right now the Landscape project has a new organizational structure, and it actually feels like it’s moving along quite well. Besides the internal changes, a major organizational change also took place around Landscape over that period, and the planned restructuring led me to my current role. In practice, I’m now engaging into the research of a new concept which I’m hoping to publish openly quite soon, if everything goes well. It’s challenging, it’s exciting, and most importantly, allows me to focus strongly on something which has a great potential (I will stop teasing you now). In addition to this, I’ll definitely be spending some of that time on the progress of Landscape and the Image Store, but mostly from an architectural point of view, since both of these projects will have bright hands taking care of them more closely.

Sit by the fireside if you’re interested in the upcoming chapters of that story. ;-)

Virtual Private Cloud is not the Private Cloud

More than 40 years ago, a guy named Douglas Parkhill described the concept of utility computing. He described it as containing features such as:

  • Essentially simultaneous use of the system by many remote users.
  • Concurrent running of different multiple programs.
  • Availability of at least the same range of facilities and capabilities at the remote stations as the user would expect if he where the sole operator of a private computer.
  • A system of charging based upon a flat service charge and a variable charge based on usage.
  • Capacity for indefinite growth, so that as the customer load increases, the system can expanded without limit by various means.

Fast forward 40 years, and we now name pretty much this same concept as Cloud Computing, and everyone is very excited about the possibilities that exist within this new world. Different companies are pushing this idea in different ways. One of the pioneers in that area is of course Amazon, which managed to create a quite good public cloud offering through their Amazon Web Services product.

This kind of publicly consumable infrastructure is very interesting, because it allows people to do exactly what Douglas Parkhill described 40 years ago, so individuals and organizations can rent computing resources with minimum initial investment, and pay for as much as they need, no more no less.

This is all good, but one of the details is that not every organization can afford to send data or computations to a public cloud like Amazon’s AWS. There are many potential reasons for this, from legal regulations to volume cost. Out of these issues the term Private Cloud was coined. It basically represents exactly the same ideas that Douglas Parkhill described, but rather than using third party infrastructure, some organizations opt to use the same kind of technology, such as the Eucalyptus project deployed in a private infrastructure, so that the teams within the organization can still benefit from the mentioned features.

So we have the Public Cloud and the Private Cloud. Now, what would a Virtual Private Cloud be?

Well, it turns out that this is just a marketing term, purposefully coined to blur the line between a Private and a Public cloud .

The term was used in the announcement Amazon has made yesterday:

Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated AWS compute resources via a Virtual Private Network (VPN) connection, (…)

So, what is interesting about this is that this is actually not a Private Cloud, because the resources on the other side of the VPN are actually public infrastructure, and as such it doesn’t solve any of the problems which private clouds were created for solving in the first place.

Not only that, but it creates the false impression that organizations would have their own isolated resources. What isolated resources? A physical computer? Storage? Network? Of course, isolating these is not economically viable if you are charging 10 cents an hour per computer instance:

Each month, you pay for VPN Connection-hours and the amount of data transferred via the VPN connections. VPCs, subnets, VPN gateways, customer gateways, and data transferred between subnets within the same VPC are free. Charges for other AWS services, including Amazon EC2, are billed separately at published standard rates.

That doesn’t quite fit together, does it?

To complete the plot, Werner Vogels runs to his blog and screams out loud “Private Cloud is not the Cloud”, while announcing the Virtual Private Cloud which is actually a VPN to his Public Cloud, with infrastructure shared with the world.

Sure. What can I say? Well, maybe that Virtual Private Cloud is not the Private Cloud.

Accessing RESTful information efficiently

Alright, so I appreciate the idea of RESTful Web Services, but I’ve got a small dilemma I’d appreciate some opinions on.

In the RESTful Web Services book, by Leonard Richardson and Sam Ruby, there’s emphasis on making the programmable web look like the human web, by following an architecture oriented to having addressable resources rather than oriented to remote procedure calls. Through the book, the RPC (or REST-RPC, when mixed with some RESTful characteristics), is clearly downplayed. In some cases, though, it’s unclear to me what’s the extent of this advice. Humans and computers are of course very different in the nature of tasks they perform, and how well they perform them. To illustrate the point clearly, let me propose a short example.

Let’s imagine the following scenario: we are building a web site with information on a large set of modern books. In this system, we want to follow RESTful principles strictly: each book is addressable at http://example.com/book/<id>, and we can get a list of book URIs by accessing http://example.com/book/list?filter=<words>.

Now, we want to allow people to easily become aware of the newest edition of a given book. To do that, we again follow RESTful characteristics and add a, let’s say, new-editions field to the data which composes a book resource. This field contains a list of URIs of books which are more recent editions of the given book. So far so good. Looks like a nice design.

Now, we want to implement a feature which allows people to access the list of all recent editions of books in their home library, given that they know the URIs for the books because a client program stored the URIs locally in their machines. How would we go about implementing this? We certainly wouldn’t want to do 200 queries to learn about updates for 200 books which a given person has, since that’s unnecessarily heavy on the client computer, on the network, and on the server. It’s also hard to encode the resource scope (as defined in the book) in the URI, since the amount of data to define the scope (the 200 books in our case) can be arbitrarily large. This actually feels like perfectly fit for an RPC: “Hey, server, here are 200 URIs in my envelope.. let me know what are the updated books and their URIs.” I can imagine some workarounds for this, like saving a temporary list of books with PUT, and then doing the query on that temporary list’s URI, but this feels like a considerably more complex design just for the sake of purity.

When I read examples of RESTful interfaces, I usually see examples about how a Google Search API can be RESTful, for instance. Of course, Google Search is actually meant to be operated by humans, with a simple search string. But computers, unlike humans, can thankfully handle a large volume of data for us, and let us know about the interesting details only. It feels a bit like once the volume of data and the complexity of operations on that data goes up, the ability for someone to do a proper RESTful design goes down, and an RPC-style interface becomes an interesting option again.

I would be happy to learn about a nice RESTful approach to solve this kind of problem, though.