Javamex News and Development

Wednesday, October 8, 2014

The HyperLogLog structure: estimating the number of distinct values on "Big Data" with minimal memory

Imagine you have the following scenario: you want to count the number of "distinct" values that you have seen from a stream of data without actually storing all of the values. (For example, you want to answer a question such as "how many distinct IP addresses have made requests to my server" on the fly.) Conceptually, you want to place all of the values (in this case, IP addresses) into a set and then count the size of the set. But you want to do this efficiently on a potentially huge range of possible values (so that keeping the entire set in memory is not viable and placing the values in a database is not practical), even at the expense of the count being approximate.

For a moderate number of possible values, one potential implementation that I have written about before involves using a Bloom filter (which is essentially an "approximate set").

But a more memory efficient solution comes in the form of the HyperLogLog algorithm. Effectively, this algorithm allows you to estimate the number of distinct values seen by gathering broad statistics on the hash codes seen so far. To use an easy-to-understand analogy presented in the article, if I told you that I had tossed a coin n times and got a maximum run of 10 heads in a row, you would estimate that I had tossed the coin fewer times than if I told you that I had got a maximum run of 20 heads in a row, and you could use some statistics to estimate n. The same essentially goes if I were to say, "out of all the hash codes of values seen so far, the maximum run of 0s in the lower bits was x", and I can take the average of various statistics of this type to hone the estimate.
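The idea can be sketched in Java roughly as follows. This is a minimal illustration rather than a production implementation: the class name HyperLogLogSketch, the parameter p (which gives 2^p small counters, or "registers") and the use of a SplitMix64-style mixing function as the hash are all assumptions made for the example. Each register remembers the longest run of 0s seen in the lower bits of the hash codes routed to it, and the estimate combines the registers using a harmonic mean with the standard correction constant, falling back to simple "linear counting" while many registers are still empty.

```java
// Minimal HyperLogLog sketch (illustrative only; names and hash choice are assumptions).
final class HyperLogLogSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // per register: longest run of low 0 bits seen, plus 1

    HyperLogLogSketch(int p) {
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in the range 7..16");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // 64-bit mixing step (the SplitMix64 finalizer), standing in for a good hash function.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix(value + 0x9E3779B97F4A7C15L);
        int index = (int) (h & (m - 1));   // lowest p bits choose the register
        long rest = h >>> p;               // remaining 64 - p bits
        // Run of 0s in the lower bits of the remainder, plus 1; capped if the remainder is 0.
        int rank = Math.min(Long.numberOfTrailingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    double estimate() {
        double sum = 0.0;
        int emptyRegisters = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                emptyRegisters++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);   // bias correction, valid for m >= 128
        double raw = alpha * m * (double) m / sum;   // harmonic-mean based estimate
        if (raw <= 2.5 * m && emptyRegisters > 0) {
            // Small counts: estimate from the fraction of registers never touched.
            return m * Math.log((double) m / emptyRegisters);
        }
        return raw;
    }
}
```

With p = 12, the structure occupies 4096 bytes however many values are added, and the typical relative error is in the region of 1.04 divided by the square root of the number of registers, or about 1.6%. An IPv4 address could be packed into a long and passed to add(); adding the same address again leaves the estimate unchanged.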

It may not be something that every application needs, but if you are in the business of "counting events on big data", then the HyperLogLog algorithm appears to be a very worthwhile weapon to have in your arsenal. I'm sure I'll be adding it to mine.

An example of simple design at its best

I tried out the iOS game Hexagon! today. If you haven't come across it yet, I think it's an interesting example of how a simple but carefully implemented design can be extremely effective.

Thursday, September 25, 2014

Getting hit by the iOS 8 bug

If you haven't already updated, you will probably feel more pressure to move to iOS 8 this year than you did with similar updates in previous years. Out of the box, you may not immediately notice much of a change after updating from iOS 7 (which, by contrast, marked a major visual redesign compared to its predecessor). But iOS 8 is packed with new features for developers, which will slowly start to trickle into apps over the coming months if they haven't already. There are probably a large number of cautious laggards who would prefer to wait for things to "bed down" before upgrading. But sooner or later, you may well find that your favourite app can no longer be updated without upgrading to iOS 8. In the meantime, most users have surely noticed the flurry of updates arriving in the App Store as developers scramble to fix issues that have come to light as a result of the iOS update.

One of my own apps illustrates the kind of subtle issue that developers are having to grapple with. The Utter French pronunciation guide, as the name suggests, makes fairly extensive use of audio playback. Like many, I tested the app using the iOS 8 beta, but one slightly subtle bug did pass me by.

As mentioned, the iOS 8 issue I discovered relates to audio playback. The issue lies not so much with the audio per se, but rather with what happens after a sound has been played back. At the end of audio playback, the system notifies the app via what in Apple speak is called a delegate method (in effect, a callback to what is akin to an interface method implementation in Java). The system makes a call to the programmer's implementation of the audioPlayerDidFinishPlaying method to indicate the end of playback. Crucially, it passes in a boolean parameter to indicate whether playback was "successful". Subtly, between iOS 7 and iOS 8, for reasons not yet clear, the value of this flag changed from TRUE to FALSE in the cases I observed, despite the fact that audio apparently plays back correctly. Other developers have also reported this issue. The reasons and circumstances remain unclear: perhaps iOS 8 has tightened up on some subtle features of audio files that it now deems to be "incorrect". Or perhaps it's just a bug that will be fixed in a subsequent maintenance release of iOS 8. But the knock-on effect for my app was that it took the FALSE flag as a signal not to play back the remaining items in a list of audio files queued for playback.
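In Java terms, the failure mode can be sketched roughly as follows. This is an analogy only: PlaybackListener and ClipQueue are hypothetical names invented for the illustration, not the iOS API and not the app's actual code, and the "system" is simulated by a simple loop. The trustSuccessFlag switch shows one conceivable way of tolerating the changed flag, which is not necessarily the workaround that was submitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the delegate: called when a clip finishes playing.
interface PlaybackListener {
    void playbackFinished(String clip, boolean successfully);
}

// Hypothetical queue of audio clips that reacts to the "finished" callback.
final class ClipQueue implements PlaybackListener {
    private final Deque<String> pending = new ArrayDeque<>();
    private final boolean trustSuccessFlag;
    final List<String> played = new ArrayList<>();

    ClipQueue(List<String> clips, boolean trustSuccessFlag) {
        this.pending.addAll(clips);
        this.trustSuccessFlag = trustSuccessFlag;
    }

    @Override
    public void playbackFinished(String clip, boolean successfully) {
        played.add(clip);
        if (trustSuccessFlag && !successfully) {
            // Treat the flag as a real failure: abandon the rest of the queue.
            pending.clear();
        }
    }

    // Simulates the system playing each clip and reporting the given flag every time.
    void run(boolean flagReportedBySystem) {
        while (!pending.isEmpty()) {
            String clip = pending.poll();
            playbackFinished(clip, flagReportedBySystem);
        }
    }
}
```

If the simulated system reports true, all queued clips are played. If it starts reporting false while the audio itself still plays, a queue that trusts the flag stops after the first clip, whereas one that ignores it carries on through the list.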

The workaround-- currently in review and hopefully to be released shortly-- is simple in this case. But it illustrates how a very tiny change in the behaviour of an API call can have a knock-on effect. Multiply this kind of subtle change by who knows how many system functions and we start to see why developers are scrabbling to release updates and fixes for iOS 8...

Despite this, my advice to most users would still be to update to iOS 8 fairly soon. The stream of daily updates in the App Store indicates that developers are getting on top of the issues, and it won't be long before a new app or update comes out with an iOS 8 only feature that you want!

Saturday, August 16, 2014

Horses for courses: Stroustrup's InfoWorld interview

In a recent interview for InfoWorld, C++ creator Bjarne Stroustrup talks about why he believes C++ is still going strong in 2014. To me, two statements of his argument stand out:

he attributes the popularity partly to the fact that "nothing that can handle complexity runs as fast as C++";
he acknowledges that "C++ is designed for fairly hardcore applications" and that it can be part of a mix of different languages (he mentions the fact that he himself uses C++ along with a scripting language such as Unix shell script).

Taken together, these points are broadly fair. What I do wonder, however, is to what extent he is characterising as intrinsic language features what arguably are more compiler features than language features. And inasmuch as C++ forces you as a programmer to get a little "nearer the metal" than a language such as Java, to some extent it does so because it is based on C rather than because of the object orientation and other features added in C++.

It's also fair to say that not all uses of C++ historically have been for "fairly hardcore" applications and that, as at least one commentator on the site has pointed out, some of the popularity of C++ is surely attributable to the fact that, once a large-scale application is written in one system, it's difficult to find the momentum to shift to a whole new language or development system.

All in all, though, the interview does go to highlight that language wars are largely futile. Languages like Java have a strong footing in their specific domain. And C++ does in its domain. What is more important is to focus on the right tool for the job.

Sunday, August 10, 2014

What are the top 10 programming languages? (And is HTML a programming language anyway?)

This IEEE survey, based on code available in various repositories, concludes that Java is (by a short margin) the most popular language occurring in these sources. If you had expected choices such as Ruby and Objective-C to be front runners, then you may be surprised to find that overall, the list of top languages has a distinctly "old school" feel to it: C is a close second to Java, followed by C++ and C#. Objective-C comes a distinctly underwhelming 16th.

We will avoid a debate here (though the article's comments, needless to say, have not) around whether choices such as "HTML", "MATLAB" and "R" count as programming languages.

Of course, the choice of languages reflects specifically the languages occurring in code repositories rather than the interest in these languages across the industry as a whole. It's frankly improbable that there are more MATLAB programmers than iOS and Mac OS programmers, and the vast majority of programmers in the universe have probably never even heard of R. What I suspect this graph is showing us to a large extent is the relative proportion of programmers in various languages who share their code in repositories versus those who keep it under their hat.

Still, for those of us who started learning to program in the 80s, it does seem to indicate that our trusty C skills are not going to be obsolete any time soon... :)

Saturday, August 9, 2014

Will new iOS device sizes open up new markets for developers?

A pertinent point is raised in this Wired article on the new iOS device sizes that are apparently coming later this year. In countries like the US and the UK, where the use of multiple devices is typical, the availability of a larger iPhone or smaller iPad is arguably not a game-changer but simply a "small part of the mix".

But in other countries where the mobile phone may be more of a primary device for users, the availability of the phablet form factor may more strongly influence users' choice of platform. For Apple to have such devices as part of its mix may then open up new markets for developers. As I highlighted in a previous post, the option in Xcode to test our apps on arbitrary screen sizes should probably not be overlooked in the coming weeks...!

Thursday, August 7, 2014

Alleged dates for iPhone 6 (and presumably iOS 8) release

The iPhone 6 release date is now being widely touted as 9 September 2014. It seems fairly likely that some differently sized devices will be released on or around that date. Whether the rumours about such devices being specifically sized 4.7" and 5.5" are accurate remains to be seen-- as does whether such devices actually turn out to be telephones or something more like versions of the iPad. Or put another way, is the ability to test your apps on an "iPad" of arbitrary size in Xcode a red herring...?

What this does mean is that as developers, we have a schedule in place for testing and finishing development of new apps to take advantage of the more "iconic" of the new iOS and Mac OS features that Apple will probably be pushing. If you want to be among the first to take advantage of the new app extensions or continuity features, you have a month or so left! Now is also a good time to start testing your existing apps to see how they will cope with arbitrary screen sizes to make sure that-- as and when the newly sized devices actually appear-- it won't be too much effort to complete the process and get new versions of your apps out quickly if necessary...