Tuesday, September 27, 2011

Even 'random junk' may need encryption

There appear to be some interesting legal cases springing up around the fact that software developers have been taking "raw" Unique Device IDs and using them to store data against a given user 'in the cloud'. UDIDs essentially resemble a long 'random number' rather than directly encoding any user-identifiable information, so you may be forgiven for wondering what the issue is. The problem comes when applications have then stored user-identifiable information (such as social network data) alongside the UDID and make this information available to other applications, thus leading to a 'leaking' of user-identifiable information on the basis of this shared UDID.

The solution is in principle simple: append the UDID to an application-specific code or 'salt' and then encode this combination using a secure hash scheme such as SHA-256. The application-specific salt doesn't need to be a secret. But doing this means that I cannot compare a hash for user X from one application with a hash for user X from another application and deduce that they are for the same user. Again, in principle.

However, this scheme does rely on UDIDs containing sufficient entropy. Since the number of devices of a particular model sold is maybe in the tens or at most hundreds of millions (iOS devices appear to have sold something in the order of 200 million so far), if it is possible to make a good prediction of which range of the theoretically possible UDIDs has actually been allocated, then I can simply pre-compute all the potential combinations of (allocated hashes, application ID) for two given applications and find all the matches.

I haven't yet looked into all the details, but it appears that iOS 5 will remove the ability to read device-wide UDIDs in favour of application-specific UDIDs, presumably generated using a scheme such as the above. Although this is arguably the scheme that should have been adopted from the outset, if true, it will interesting to see what incompatibilities this throws up.

So why didn't people think of this in the first place? I wonder if one of the issues is that to a human, UDIDs do just look like 'random junk': it's just a string of random numbers, right, so why would you bother encrypting it? It's a good example of how when deciding when and how to employ encryption, we have to think not only about the data itself but also about protocols and practices governing how that data is used and stored.

No comments: