Tuesday, July 7, 2009

Social Security Numbers and "security": a case of misguided assumption

The research on predicting social security numbers published today by Alessandro Acquisti and Ralph Gross from Carnegie Mellon University unfortunately highlights a fairly classic case of something we do all too often: basing security on something that was never secure-- and never really intended to be secure-- in the first place.

Much of the information that we readily pretend is a valid authentication key (such as Mother's Maiden Name, Date of Birth and Post Code-- and indeed Social Security Number) has really always been publicly available information. The parameter that has changed is how financially viable it is for a criminal to access the public records necessary to deduce this "secret" information. The SSN allocation scheme is perfectly well documented, public information, and the scheme clearly has no element of security built into it whatsoever. The historical origins of the scheme are also documented: the scheme has no security now, never did and was never intended to.

So what do we need to do about this? We need to understand where security actually lies, and not pretend that it exists in places where it doesn't. In most cases, the "security" does not currently lie in whether somebody can guess your PIN number, forge your signature, find out your mother's maiden name or guess the last couple of digits of your SSN. Our measures for preventing discovery of these largely unsecret "secrets" are predictably diabolical, and they are thus extremely weak forms of authentication. A transaction that is "authenticated" by an SSN or signature is essentially unauthenticated and the security of that transaction relies on it being quickly reversed in the event of fraud. So long as users, banks and lawmakers all understand this, the situation isn't so dire. The big danger comes when we pretend that there is security and authentication where there really isn't.

Friday, July 3, 2009

Java, XML and XPath

The Javamex site now includes a basic introduction to using XML and XPath in Java. Java provides various means to read XML, but XPath is generally the most practical for moderately-sized XML documents. XPath effectively allows you to treat a document as a file system and refer to elements by their "path" within the document.

Sunday, May 31, 2009

Random numbers in Java

A few updates have been made to the site's section on generating random numbers in Java. Random numbers can crop up in all sorts of applications, and it's worth having a good understanding of them.

Traditionally, the bread-and-butter means of random number generation has been the java.util.Ranom class. The technique it uses (see the section on the java.util.Random algorithm for more details) is still suitable for some very casual applications of random number generation, such as some simple games. But in many cases, you should probably be thinking of moving away from java.util.Random towards a higher quality generator. As discussed in the section, problems with java.util.Random include its low period, biases in the different bits of the numbers generated, and its unsuitability for generating combinations of values. Using a weak random generator can have side effects such as the following:
  • it can skew the results of a test harness that introduces subtle biases in the code path due to the random number generator;
  • when testing the performance of data structures, and in various simulations, a generator such as java.util.Random will not produce a good range of possible combinations of values, giving false results;
  • the series of numbers can be predictable, leading to disasterous results if used as the basis for security (e.g. generating a random encryption key, nonce or session ID).
One interesting technique, published in 2003 by Goerge Marsaglia, is the XORShift generator. This generates medium-quality random numbers in a few simple machine instructions. For example, given a Java long, x, we can generate the next random number as simply as follows (in fact, various combinations of shift values are possible, as discussed in Marsaglia's paper):


x ^= (x << 21);
x ^= (x >>> 35);
x ^= (x << 4);


continually executing these three lines will cycle through all (2^64)-1 possible values of x (i.e. all values of a long except zero) in pseudorandom order. Initialising x with the value of System.nanoTime(), or some other "random" seed, gives us a fast, medium-quality generator suitable for, say, generating random game data. It will make an excellent choice in many J2ME games, for example.

We also consider a Java implementation of the combined generator suggested by the Numerical Recipes authors, which generates numbers within a similar order of execution time as java.util.Random, but with a higher period and quality.

For applications where security depends on the quality and unpredictability of the random numbers generated, Java's SecureRandom class provides cryptographic strength random numbers, though is some 20-30 times slower than the other techniques.

Saturday, May 30, 2009

New article: how to choose a Java collection

A new addition to the Java Collections section of the Javamex site looks at a question that, perhaps unsurprisingly, crops up fairly frequently: which collections class should you use for a given task? The various collection classes provide a powerful means of managing objects and data in memory, but they're so powerful that choosing between them can sometimes be a bit daunting.

The approach that I take is to split the question firstly into two sub-questions. Firstly, what is the general type of structure that you need? That is, how is the data basically going to be organised? Usually, there is not too much confusion between a list and a map, especially if you consider whether or not you need to answer the question "for a given X, what is the Y"? But the difference between a set and a list is sometimes not well understood, or at least, not considered. With a little bit of thought, it is usually possible to decide in advance whether the purpose of your collection is to decide "is something there or not"? The trick is for programmers to remember to ask that question in the first place, and not simply plump for a list regardless.

Then, having decided on a list, map, set or queue, the next subquestion is obviously which particular flavour is required. In the case of a list, the cases where you wouldn't plump for an ArrayList are relatively uncommon and it should be clear in your head if you do choose something else that you have a "special case".

But in the case of maps, sets and queues, there are definitely more "horses for courses". But, especially in the first two, there are essentially two factors to consider: concurrency and ordering. As in the article, when the various choices are presented in tabular form according to these criteria, things become a little easier. As mentioned, a key rule of thumb is to choose the class that provides the minimum features that you require. If you don't need ordering, don't pay for it.

Java queues present a slightly complex set of choices, but in at least some cases-- especially a DelayQueue or SynchronousQueue-- it should be really clear that you need that class for a special case scenario. It should also be borne in mind that the executors framework means that in many common producer-consumer scenarios, you don't explicitly have to deal with the underlying job queue.

As usual, further questions and comments to any of the articles on Javamex are welcome on either this blog or the associated Java forum.

Wednesday, May 27, 2009

Patently stupid...

Like anyone, I have my fair share of stop-the-world-I-want-to-get-off moments. But I really did have to check the calendar to make sure it wasn't April 1st today when I read through IBM's patent application for their "solution for providing real-time validation of text input fields using regular expression evaluation during text entry" (20090132950). Wading through all the verbiage, it really does appear that IBM think they have "invented" evaluating a regular expression inside an onChange event handler.

In terms of this specific patent application, I don't think many programmers are too worried that they're suddenly going to be prosecuted for calling RegExp in the wrong place-- needless to say, plenty of prior art is being cited, in case it were needed.

But the case does leave some other more interesting questions. Why does IBM of all companies think it needs to stoop this low? And how can it help us ditinguish the "bleeding obvious" from the "possibly patentable"?

I'm really at a loss to answer the first question. Possibly we're simply looking at a clerical error on the part of an overenthusiastic junior employee. But for a company that is supposedly at the forefront of computing research and invention, claiming patent rights on what amounts to a chain of JavaScript API calls doesn't really help them uphold their reputation.

In answer to the second question, I'm reminded of a comment by Donald Knuth, that he regularly sees patents granted to "solutions" to problems that he sets as undergraduate homework questions. If "how do you validate input as the user types into a web form?" had been posted as a question on one of the various Internet programming forums (and probably it has...), it would almost certainly have been tagged as "smells like homework". Few programmers, I suspect, would have tagged it as "goodness me how novel-- I think we should patent this".

I really have no idea how in the field of software patents you can concretely separate "significant invention" from "doing the bleeding obvious" (though, as in this case, I can sometimes recognise the latter when I see it!). But in order for a software solution to be patentable, I would at least expect to see:

- evidence of significant empirical research necessary to find the solution
- an invention of an actual algorithm, with perhaps some guage of number of API calls per other instructions/lines of code in order to determine if a new algorithm had actually been invented.

I would also like to see severe financial penalities for cases such as this, where a company is clearly attempting to use its abundant resources to abuse the system. The small developers that the patent system was originally designed to protect really have to think twice before committing to the thousands of dollars per year that it costs to keep a patent going. A company such as IBM really has nothing to loose by continually paying for nonesensical patent applications out of petty cash on the off-chance that at some point, one of them will sucessfully fly over the cuckoo's nest.

In the case of this particular patent application, I really really hope for the sake of human sanity that common sense fianlly prevails...

Saturday, May 23, 2009

Java on your Kindle

It's excellent to see an array of Java programming books now available on the Kindle. Various of these books, which you may not have considered buying due to their cost, are available at a sometimes significantly reduced price on the Kindle.

Of notable interest to Javamex readers will be Brian Goetz's venerable Java Concurrency in Practice, which is really something of a bible for information on the Java Memory Model and the Java 5 concurrency library. As I say in the review, I can pretty much guarantee that if you're writing concurrent code (as many of us either are or will soon have to, not just on the server, but increasingly client-side), then Java Concurrency in Practice will allow you to fix various bugs in your code that you were possibly unaware of (it certainly fixed some in mine!), as well as help you make decisions about how to architecture your program around the Java 5 concurrency utilities.

For a more light-hearted read, but still an extremely enlightening one, Java Puzzlers remains excellent as ever. Josh Bloch's Effective Java got a whole new lease of life when its second edition was published, and its addition to the Kindle repertoire is most welcome.

Also worthy of note are the Core Java books. These, and in particular the second volume of Advanced Features, is not the most succinct of works, but covers various Java topics that you don't readily see explained in detail in other books.

Java for beginners turorial: what would you like to see?

Readers of the Javamex web site may or may not have seen the first pages of the site's Java for beginners tutorial, which covers a number of very basic Java topics such as variables, control structures (for loops, if/else), plus information on Java arrays.

Various topics are going to be added to the tutorial over the coming weeks. But how do you think it should be expanded? Are you just starting out in Java and having trouble with a particular topic? Or are you an experienced Java programmer, but remember having particular trouble with some aspect of learning Java with the tutorials that you used? Any feedback is welcome on this blog entry.