Thursday, 26 January 2012

When Is Random Really Random?

A surprising amount of what we do in computer security relies upon the use of random numbers, and yet few of us actually take the time to think about how these numbers are being generated.  We blithely assume that our computer, when required, can generate a truly random number.  But wait! How can a machine that works in a deterministic manner generate something that is truly random?  The simple answer is that it can't.  The best we can do is to generate a number that is pseudorandom.

The fact that we rely upon pseudorandom numbers is a potential problem for IT security. After all, if a machine is using a known algorithm to generate a number that your system then treats as random, what is to stop an attacker from calculating that same number if he knows your algorithm? It is a fundamental principle of any strong computer security that you must assume an attacker knows the algorithms you have used. So, perhaps one should spend a little more time understanding how pseudorandom numbers are generated, so that they are not relied upon inappropriately.

The measure of randomness in a number is known as entropy.  Not entropy as physicists use it, but entropy as cryptographers use it.  Entropy is, in essence, how uncertain you are about a number.  This means that the entropy is not necessarily related to the number of bits in a number, but rather to the number of possible values that number could have taken.  Imagine being able to discard the bits of a number about which you were certain.  The number of bits that remain is the number of bits of entropy. Suppose, for example, a 100-bit number could take on only two possible values: you need just one bit to differentiate those two values, so such a number would be described as having 1 bit of entropy.  Obviously this is an extreme case, and I have not complicated the issue by factoring in different types of probability distribution.
Most mathematicians tend to define the entropy of a number X as:

H(X) ≔ − ∑P(X=x) log₂P(X=x)

where the summation is over all possible values x, and P(X=x) is the probability of X taking the value x.
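As a concrete illustration (in Python, my own example rather than anything from the literature), this definition can be computed directly, and it confirms the 100-bit-number example above:

```python
import math

def entropy(probabilities):
    """Shannon entropy H(X) = -sum P(X=x) * log2 P(X=x),
    skipping zero-probability values (which contribute nothing)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A 100-bit number that can only ever take two equally likely values
# carries just 1 bit of entropy, regardless of its length in bits.
print(entropy([0.5, 0.5]))        # 1.0

# A uniformly random 8-bit byte carries the full 8 bits.
print(entropy([1/256] * 256))     # 8.0
```

Note that the length of the number never appears in the calculation: only the distribution over its possible values matters.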

So, how do you generate a number that has the greatest possible entropy? There has been much research on using activities such as mouse movements or keystrokes to generate a number that can be considered random.  But so far the activities picked to generate the numbers have proved not quite as random as first thought: some typists are remarkably accurate and consistent!  Other work has focused on incorporating into a computer some form of physical device that can generate truly random numbers based upon provably chaotic behaviour.  This has two drawbacks: generating large random numbers can take an unacceptably long time, and the device might fail, rendering the numbers predictable, without you knowing.

Which brings us back to the generation of pseudorandom numbers.  Most accept that this is the best method of generating numbers with sufficient entropy, whilst keeping the mechanism efficient and accessible enough for widespread use in computing.  There are many algorithms in the literature, but I would recommend an approach such as that described in Fortuna, an improvement by Ferguson, Schneier and Kohno on a previous method called Yarrow.  Fortuna, like most approaches, relies upon a seed: a relatively small but truly random number.  But it also allows for the use of modern encryption techniques such as AES, which means that although the algorithm is known, it is highly unlikely that the resulting pseudorandom number sequence can be reproduced.  What Fortuna shows is that rather than relying upon standard libraries on their own, you construct a routine that, yes, uses standard libraries, but recognises the potential attacks against them and seeks to minimise the threat by incorporating elements such as AES and SHA.  Some secret information is thereby reserved to the local machine, decreasing the chances that a third party could reproduce your number(s).
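To make the shape of such a design concrete, here is a minimal sketch of my own (emphatically not Fortuna itself, and using SHA-256 from Python's standard library in place of AES): a generator that keeps a secret internal state seeded from the operating system, lets you fold in fresh entropy, and ratchets the state forward after each request so past outputs cannot be reconstructed.

```python
import hashlib
import os

class TinyPRNG:
    """Illustrative hash-based generator in the spirit of Fortuna:
    a secret internal state is stirred with a cryptographic hash,
    so knowing the algorithm alone does not reveal the output stream.
    (Fortuna's real design pools entropy events far more carefully.)"""

    def __init__(self):
        # Seed from the OS entropy pool -- the small "truly random" input.
        self.state = os.urandom(32)
        self.counter = 0

    def add_entropy(self, data: bytes) -> None:
        # Fold fresh entropy into the secret state.
        self.state = hashlib.sha256(self.state + data).digest()

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            self.counter += 1
            block = hashlib.sha256(
                self.state + self.counter.to_bytes(8, "big")
            ).digest()
            out += block
        # Ratchet the state forward so a later compromise of the state
        # does not reveal earlier outputs.
        self.state = hashlib.sha256(self.state + b"ratchet").digest()
        return out[:n]

prng = TinyPRNG()
prng.add_entropy(b"keystroke timings, packet arrivals, ...")
print(prng.random_bytes(16).hex())   # 32 hex characters, unpredictable
```

The point is not the specific hash used, but the structure: a known algorithm wrapped around state that only the local machine knows.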

So, in short, the best way to use random numbers is to generate pseudorandom numbers, but in such a way that "unreliable" sources of randomness are replaced by your own store of entropy upon which you can draw, with the means by which that store is generated kept secret by virtue of a secret key.

For a superb, free book that goes into great depth about random number generation, see Luc Devroye's Non-Uniform Random Variate Generation (originally published by Springer-Verlag, New York, 1986).

Friday, 20 January 2012

An Emerging Threat To Public-Key Encryption

One of the perennial problems with symmetric-key encryption has been establishing a secure channel over which to pass the secret key. It has always been much easier to compromise the transmission of the keys than to try to find some weakness in the encryption algorithm. Military and government organisations have put in place elaborate methods of passing secret keys: they pass secrets routinely, so using similar channels to pass an encryption key is not a great leap.

However, as everyone has become more connected, and especially with the commercialisation of the Internet, encryption has become a requirement for the vast majority of networked users. Thus, the traditional methods of passing secret keys are impractical, if only because you might not know in advance who you want to communicate with in an encrypted fashion.

Diffie-Hellman Key Exchange

It was realised in the 1970s that encryption needed to be more accessible, so a great deal of work was done on algorithms that could ensure that a key passed over a relatively insecure channel was not compromised. Leaders in the field were Whitfield Diffie and Martin Hellman. In doing this work, they realised that there was another way of tackling the problem. In 1976 Diffie and Hellman published their seminal paper entitled “New Directions in Cryptography” and public-key encryption was born.
When the implementation of the early public-key encryption methods was compared to symmetric-key encryption, it was found that public-key encryption was significantly slower, and hence communication using public-key encryption alone was, at the very least, going to require far more processing power than would otherwise have been the case. But wait! The original problem being studied was secure key transmission, so why not use public-key encryption to securely transmit the symmetric key, and then use the faster, more efficient symmetric-key encryption algorithms for the data? In essence, that is how most so-called public-key encryption works today.
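The key-agreement idea itself can be illustrated with deliberately tiny numbers (a toy example of mine; real deployments use moduli of 2048 bits or more). Both parties exchange only public values, yet arrive at the same shared secret:

```python
# Toy Diffie-Hellman exchange. The modulus and generator are public;
# only the exponents a and b are secret. These values are chosen
# purely so the arithmetic is easy to follow -- they offer no security.
p = 23   # public prime modulus
g = 5    # public generator

a = 6    # Alice's secret exponent
b = 15   # Bob's secret exponent

A = pow(g, a, p)   # Alice sends A over the insecure channel
B = pow(g, b, p)   # Bob sends B over the insecure channel

# Each side combines its own secret with the other's public value:
# (g^b)^a = (g^a)^b = g^(ab) mod p.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
print(shared_alice)   # 2 -- the shared secret, never transmitted
```

An eavesdropper sees p, g, A and B, but recovering a or b from them is the discrete logarithm problem, which (like factorisation) is believed to be computationally infeasible at realistic sizes.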

The majority of public-key encryption algorithms rely upon a mathematical subtlety. One of the earliest was based upon prime numbers. If one has a number (N) that is derived from multiplying two prime numbers, and N is very large, it is practically impossible to calculate what the two constituent prime numbers were (a problem known as factorisation). It’s not that you can’t calculate the two prime numbers: there are many algorithms for doing so. It is that running those algorithms takes such a long time (sometimes thousands of years) that by the time you finish, the information you recover is worthless. Even with the huge budgets available to governments, it is possible to reduce these timescales only marginally.

Public-key algorithms such as RSA can therefore publish N itself as part of the public key without fearing that its secret prime factors will be discovered. It’s a fact of life that, even with the massive increases in computing power over recent years, the cost of factorising these large numbers with traditional algorithms means public-key encryption is still quite secure.
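The standard textbook RSA example (with tiny, hopelessly insecure primes, added here purely for illustration) makes clear what is published and what must stay secret:

```python
from math import gcd

# Textbook RSA with toy primes. Real keys use primes of hundreds of
# digits, precisely so that factorising N is computationally infeasible.
p, q = 61, 53            # the SECRET prime factors
N = p * q                # 3233 -- published as part of the public key
phi = (p - 1) * (q - 1)  # 3120 -- computable only if you know p and q
e = 17                   # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)      # private exponent (2753); requires knowing phi

message = 65
ciphertext = pow(message, e, N)    # anyone can encrypt using (N, e)
recovered = pow(ciphertext, d, N)  # only the holder of d can decrypt
assert recovered == message
print(N, ciphertext)               # 3233 2790
```

The attacker's task is exactly the factorisation problem described above: given only N = 3233, recover p and q in order to compute d. At 3233 that is trivial; at 2048 bits it is not.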

Of course, “secure” is a relative word and there are many ways of recovering the private element that was used to derive N. These do not involve some fiendishly clever mathematical method (although a small army of mathematicians is seeking one) but rather simple methods such as accessing the PC that holds the private element! As ever, the weakest link determines the strength of the chain. Having said that, there is now an emerging threat to public-key encryption which does not rely upon such trivial methods: quantum computing, a field first introduced by Richard Feynman in 1982. To understand why quantum computing poses a threat, one first needs to understand a little about how it works.

In the computers that we are familiar with, processing is done using bits. A bit has two possible values: 0 and 1. A quantum computer uses qubits. A qubit can also have the values 0 and 1, but also any combination of the two simultaneously. In quantum physics this is called superposition. So, if you have 2 qubits you can have 4 possible states, 3 qubits gives 8 possible states; and all simultaneously. In a bit-based computer you have the same number of possible states but only one exists at any one time. It is the fact that these states can exist simultaneously in a quantum computer which is both counter-intuitive and extraordinarily powerful.

These qubits are manipulated using quantum logic gates, in the same way that conventional computation is done by manipulating bits using logic gates. The detailed maths is not important here. Suffice to say that you can operate on multiple simultaneous states, thereby increasing the amount of computation you can undertake over what you could otherwise do in a conventional computer. So, if you had a 2-qubit quantum computer you could, in theory, operate on all four states at once, where a 2-bit conventional computer handles only one at a time. There are a few problems, though, in translating theory into practice.

First, the algorithms developed over many years for conventional computers have been optimised for their architecture, and different algorithms are needed to run a quantum computer. Hence, trying to compare speeds of conventional and quantum computers is like comparing apples with bananas. However, one of the earliest algorithms developed for quantum computing was by Peter Shor. Shor’s 1994 algorithm was specifically designed to factorise numbers into their prime number components. Shor’s algorithm runs on a quantum computer in polynomial time: the number of steps grows only polynomially in the number of digits of N (roughly (log N)³), rather than exponentially as with the best classical methods. Even with the largest numbers in use today in public-key encryption, that means it is perfectly feasible to factorise the number in a meaningful timescale.
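The number-theoretic heart of Shor's algorithm can be sketched classically for a tiny N (my illustration). The quantum computer's entire contribution is finding the period r efficiently; here it is found by brute force, which is only feasible because N is small:

```python
from math import gcd

# Classical illustration of the arithmetic behind Shor's algorithm.
N = 15
a = 7                        # chosen base, coprime to N

# Period finding: the smallest r > 0 with a^r = 1 (mod N).
# This is the step a quantum computer performs in polynomial time.
r = 1
while pow(a, r, N) != 1:
    r += 1
# For a = 7, N = 15 the period is r = 4.

# An even period splits N: a^(r/2) - 1 and a^(r/2) + 1
# each share a non-trivial factor with N.
assert r % 2 == 0
x = pow(a, r // 2, N)
factors = sorted({gcd(x - 1, N), gcd(x + 1, N)})
print(r, factors)            # 4 [3, 5]
```

For a 2048-bit N the brute-force loop above would run longer than the age of the universe; the quantum period-finding step is what collapses that to a practical timescale.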

Since Peter Shor developed his quantum computing algorithm many others have followed, and it is worth noting that a significant number of them are aimed at breaking the underlying mathematics that supports more recent public-key encryption. As each new method beyond prime factorisation has been developed, a quantum computing counterpart has followed quickly.

A Young Heisenberg
Second, there is the much quoted, but less well understood, Heisenberg’s Uncertainty Principle. The bottom line is that this principle tells us that if you observe something then you affect it. Hence, measuring a qubit causes it to adopt one state but that state might not have been the state resulting from the computation but could have been altered by you observing it. So, the moment you try to measure the answer calculated by your superfast quantum computer it loses its ability to give you the correct answer. That would seem to render quantum computing rather pointless. But, there is another quantum effect that can be employed: quantum entanglement. This is where, if two objects have previously interacted, they stay “connected” such that by studying the state of one object you can determine the state of the other. Hence, you can determine the state of the qubit with your answer by studying another object with which it is entangled. Again, this is counter-intuitive, but has been proven, so all one really needs to know is that there is a way of getting your answer out of your quantum computer.

Quantum Computer On An Optical Bench
Lastly, there is a small matter of implementation. Until recently this was a show stopper. Typically quantum computers are based on light, as photons are relatively easy to work with and measure. However, anyone who has seen an optical bench will know that they are enormous. They require great weight to prevent even the slightest vibration and they are prone to all manner of environmental factors. Hardly the small “chips” we are used to! Also, typically these implementations are aimed at running one algorithm: the algorithm is built into the design.

Having said that, it wasn’t that long ago that conventional computers with far less processing power than the average PC required air-conditioned luxury and a small army of attendants to keep them functioning. It did not take very long for the size to shrink dramatically and for the environmental requirements to be quite relaxed. Not surprising, then, that there is a company in Canada (D-Wave Systems Inc.) that already offers access to quantum-based computing. It’s expensive and the size of a room, but we all know that won’t be the case for long.
D-Wave's Bold Statement On Their Website

2011 brought about some major developments which might well make 2012 the year quantum computing comes of age. Most significant of these was by a team at Bristol University, who developed a small chip which housed a quantum computer. It had only 2 qubits, but it was of a size that could be used for the mass market, and, crucially, it was programmable.

We are now entering a new era where we have programmable, relatively inexpensive, relatively small, quantum computers visible on the horizon, and we know that such computers have the potential to undermine the mathematics upon which current public-key encryption depends. Given all of that, maybe it’s time to reconsider the role of public-key encryption and where it might be more sensible to rely wholly on symmetric-key encryption.

Thursday, 19 January 2012

Video Introducing Cyber Security At Surrey

Here is a brief video we produced at the University of Surrey to introduce cyber security, and the work we are doing there:

Saturday, 7 January 2012

How People Really Read Your Page

If, like me, you use Facebook, Twitter, LinkedIn (and a whole host of other social media sites) then, again like me, you might have wondered how people actually read your pages.  I've long wondered how much of a page visitors read, especially after seeing the statistics about how little time visitors spend on each web page. In the few seconds available, what information is the reader really scanning?

I've seen a lot of very impressive research being done on the subject and many erudite papers being published.  All of this is backed by a great deal of experimentation which involves complex equipment and analysis.  Hence, when I saw the product being offered by Eyetrack I thought that it represented a fun, accessible way that everyone can answer the question about their own sites.

Using a webcam, plus a bit of calibration, the software will track a reader's eyes as they move over a specific page.  The data collected can then be turned into a report: not some great long-winded piece of statistics, but an easy-to-understand visual map of how the visitor uses your page.

You might not be surprised to learn that the items most Facebook visitors look at are the faces.  In addition to how much time the user spends on each part of the page (and the corollary, what they don't read), you can also track how a visitor visually moves around the page.

And, it's not the same for each type of social networking site.  Take, for example, LinkedIn (below) and compare it with Facebook (above):

It would appear that visitors do actually read the text on LinkedIn, but very much at the "headline" level.  All those extra bits of information added by LinkedIn to the right-hand pane about who else was viewed, etc., seem to go to waste.

And then, of course, there's Twitter:
Are viewers reading only the first one or two words of a tweet? That's all the time you have to capture their attention!  But compare Twitter with Facebook: it does appear to be text, not photos, being scanned on Twitter.

One very interesting area that is worth studying is the deviation in viewing patterns caused by the client being used to view the page.  With so many apps now available for viewing each of the social networking sites, it would be interesting to see whether the viewing pattern is in any way governed by the way the content is presented.  Twitter in particular has a large variety of clients, so how does that affect the viewing?

So, thanks to Eyetrack for enabling us all to do a bit of research that is both fun and produces useful results.