Monday, 9 December 2013

How Are Passwords Stored?


The past two years have seen some dramatic password leaks, including from well-known names such as LinkedIn and Adobe. These events shone a light on how our passwords are being stored. If someone is daft enough to store our passwords as plain text then they do not deserve to be trusted with them. Most organisations instead attempt to protect passwords by storing a “hash” of each one.
Hash functions have been around since the early 1950s and were developed to allow, for example, fast comparison of database entries to see if there were duplicates. Many hash functions have been developed over the years but they all do basically the same thing: they take an arbitrarily long set of characters and transform it into a much shorter, fixed-length string of characters. The same input characters always produce the same output. However, the likelihood of two different inputs producing the same output, known as a “collision”, should be negligible.
Why does that help? Well, on the relatively slow machines of the time it was better to compare shorter strings of characters when looking for matches. Hash functions were also designed to be very fast to compute. Hence, producing hashes and using them to find, for example, a match was significantly faster than searching the original data.
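As a rough illustration of the duplicate-finding idea, here is a minimal Python sketch. It uses CRC-32 from the standard `zlib` module purely as an assumed example of a fast, non-cryptographic hash; it is not one of the historical functions, and the name `find_duplicates` is hypothetical. Records are grouped by their short 32-bit hash, and the full text is compared only when two hashes match, because different inputs can still collide.

```python
import zlib

def find_duplicates(records):
    """Return records that appear more than once, using a short CRC-32 hash as a fast filter."""
    buckets = {}      # maps 32-bit hash -> list of distinct records seen with that hash
    duplicates = []
    for record in records:
        key = zlib.crc32(record.encode("utf-8"))  # always a 32-bit integer, whatever the input length
        candidates = buckets.setdefault(key, [])
        if record in candidates:      # full comparison only among records whose short hashes match
            duplicates.append(record)
        else:
            candidates.append(record)
    return duplicates

print(find_duplicates(["alice", "bob", "alice", "carol"]))  # ['alice']
```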



Then came the development of “cryptographic hashes”, which most people today refer to simply as “hashes”. These secure hashes are like the original hash functions except that they put extra emphasis on preventing anyone from determining anything about the input based solely on the hashed value: a so-called one-way function. This was already difficult, because compressing the data down to the length of the hash always loses information, much like “lossy compression”. But cryptographic hashes are tested specifically for how well they resist being reversed.
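A quick way to see the fixed-length and “avalanche” behaviour of a cryptographic hash is the sketch below. It uses SHA-256 from Python's `hashlib` as a modern stand-in (an assumption; no specific function is named at this point in the text), and `sha256_hex` is a hypothetical helper. An input of one character and an input of a million characters both produce 64 hexadecimal characters, and changing a single letter changes the digest completely.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short = sha256_hex("a")
long_ = sha256_hex("a" * 1_000_000)
print(len(short), len(long_))  # 64 64

# One changed letter gives an unrelated-looking digest.
print(sha256_hex("password"))
print(sha256_hex("Password"))
```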
An obvious use for these cryptographic hashes was password management. Instead of storing our passwords in plain text, a system could now receive our password, hash it and compare the result with the stored hash. If the two matched, it was almost certain that the password we had sent was correct. Hash functions with names like MD5 and SHA1 appeared to do this well, and some became standards recommended by many governments for securing passwords on their systems. As time has passed, researchers have found that some of these hash functions have weaknesses and so are not quite as secure as had been hoped. Hence, major vendors have begun to retire certain algorithms in favour of newer ones.
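The check described above might look something like this minimal sketch. SHA-256 stands in for MD5 or SHA1, and the helper names `hash_password` and `check_password` are hypothetical. This is the unsalted, fast scheme that the rest of the post goes on to show is vulnerable, so it illustrates the idea rather than a recommended design.

```python
import hashlib
import hmac

def hash_password(password):
    """Unsalted SHA-256 hash of a password: the simple scheme described above."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_password(attempt, stored_hash):
    """Hash the login attempt and compare it with the stored hash in constant time."""
    return hmac.compare_digest(hash_password(attempt), stored_hash)

stored = hash_password("my dog has big ears")          # what the system keeps instead of the password
print(check_password("my dog has big ears", stored))  # True
print(check_password("my cat has big ears", stored))  # False
```

Using `hmac.compare_digest` rather than `==` avoids leaking, through timing, how much of the two strings matched; it is a small extra precaution rather than part of the scheme the text describes.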

Unfortunately, as time moved on, computers became faster and faster… and faster still. So much so that even your home computer is capable of undertaking millions of comparisons a second. Plus the hashing algorithms have become well known, if only because systems developers were encouraged to implement them to protect passwords. This led to the development of what is known as the “dictionary attack”, which relies upon simple brute force.
In essence it’s simple. You take a dictionary of words that might be used as passwords, hash each one yourself and compare the resulting hashes with the hashed passwords you have access to. When you find a match, you look back at your dictionary to see what the original plaintext word was, i.e. the password. Because hashing the whole dictionary still takes an appreciable time, people began pre-computing the hashed forms of their dictionaries, and a space-efficient refinement of these precomputed tables became known as “rainbow tables”. Now all you have to do is compare the stolen hashed passwords with your precomputed table, find a match and look up the original word/password.
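Here is a toy version of the precomputed attack, assuming unsalted SHA-256 hashes, a hypothetical four-word dictionary and hypothetical leaked hashes. It is a plain lookup table rather than a true rainbow table: real rainbow tables store chains of alternating hash and “reduction” steps to trade extra computation for far less storage, which this sketch does not attempt.

```python
import hashlib

def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_lookup_table(wordlist):
    """Precompute hash -> word for every candidate password (done once, reused for every victim)."""
    return {sha256_hex(word): word for word in wordlist}

def crack(stolen_hashes, table):
    """Return the plaintext for every stolen hash that appears in the precomputed table."""
    return {h: table[h] for h in stolen_hashes if h in table}

wordlist = ["password", "letmein", "dragon", "qwerty"]          # hypothetical tiny dictionary
stolen = [sha256_hex("dragon"), sha256_hex("Myd0ghasb!gears")]  # hypothetical leaked hashes
table = build_lookup_table(wordlist)
print(list(crack(stolen, table).values()))  # ['dragon']
```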
Hackers have been able to steal huge sets of hashed passwords (sometimes hundreds of thousands) and, using these techniques, recover the original passwords almost before the keeper knows they are missing. The answer is to add a touch of salt.
A “salt” is a randomly generated set of characters which you add before or after your password characters before passing the result through your hashing function. Now the hacker’s dictionary or rainbow tables should theoretically be useless, because they were computed without the salt. But, as ever, whilst the theory is sound, the way system developers sprinkle their salt can give the hackers another route in. Typical mistakes are:
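A minimal sketch of salting, assuming a 16-byte salt from Python's `secrets` module placed before the password and hashed with SHA-256 (both illustrative choices; the function names are hypothetical). The salt is stored next to the hash, since it is needed again to check a login attempt.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password, salt=None):
    """Return (salt, digest) where digest = SHA-256(salt + password)."""
    if salt is None:
        salt = secrets.token_bytes(16)   # fresh random salt; 16 bytes is an assumed, common length
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify(attempt, salt, stored_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_with_salt(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_with_salt("dragon")
print(verify("dragon", salt, stored))   # True
print(verify("qwerty", salt, stored))   # False
```

SHA-256 is still fast, so this defeats precomputed tables but not a patient attacker who also obtains the salt; that is the problem the later paragraphs turn to.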

1.       Choosing a random character string that is not truly random. Computers have great difficulty generating anything that is truly random, and some developers in the past have taken short cuts, assuming that no one would guess how they generated their “random” characters. They were wrong.

2.       Choosing a random character string that is too short. If it is short enough, there are only so many possible values it could take, so it is possible to calculate all of them and simply precompute your dictionary with each one.

3.       Using the same random character set for every password. One of the greatest helps a cryptographer can give a cryptanalyst (who is trying to break their code) is to reuse the same string of characters. Once found, this salt allows the attacker to crack all the passwords almost as if the salt had never been added.
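The sketch below shows one way to avoid the three mistakes, with hypothetical helper names and users. Salts come from `secrets.token_bytes`, which draws on the operating system's cryptographically secure generator (Python's documentation warns against using the `random` module for security purposes); the 16-byte length is an assumed, commonly used size; and each user gets a fresh salt. It also shows a symptom of mistake 3: with a shared salt, two users with the same password end up with the same hash.

```python
import hashlib
import secrets

SALT_BYTES = 16  # assumed length: 128 bits leaves far too many values to precompute

def new_salt():
    """A salt from the OS's cryptographically secure generator (addresses mistakes 1 and 2)."""
    return secrets.token_bytes(SALT_BYTES)

def salted_hash(salt, password):
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

users = {"alice": "dragon", "bob": "dragon"}  # hypothetical users who share a password

# Mistake 3: one salt for everyone, so identical passwords still give identical hashes.
shared = new_salt()
shared_hashes = {name: salted_hash(shared, pw) for name, pw in users.items()}

# Better: a fresh salt per user, stored alongside that user's hash.
per_user = {}
for name, pw in users.items():
    salt = new_salt()
    per_user[name] = (salt, salted_hash(salt, pw))

print(shared_hashes["alice"] == shared_hashes["bob"])  # True
print(per_user["alice"][1] == per_user["bob"][1])      # False (with overwhelming probability)
```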

Ideally, systems would store the salt on a separate system from the username and hashed password. However, practical considerations often mean this is not done, so the hacker might be able to obtain the salt as well as the username and hashed password. From these they can, of course, still run a dictionary attack and compute the original passwords. However, because each password has its own salt, the attacker has to hash the whole dictionary again for every password. That is a much slower process, and if a hacker is attempting to crack thousands of passwords it will take much longer than they want. So hackers have moved on from ordinary processors to harness one particular part of your computer: the Graphics Processing Unit (GPU).
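To see why stolen salts still cost the attacker time, this toy attack counts how many hashes it has to compute. It assumes salted SHA-256 with a 16-byte salt per user; the dictionary, users and function names are hypothetical. Because each user has a different salt, no precomputed table helps and the whole dictionary is hashed again for every user, so the work grows with the number of users multiplied by the size of the dictionary.

```python
import hashlib
import secrets

def salted_hash(salt, password):
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def crack_salted(records, wordlist):
    """Try every word against every (salt, hash) record; return found passwords and the work done."""
    found, hashes_computed = {}, 0
    for user, (salt, target) in records.items():
        for word in wordlist:          # each salt needs its own pass over the dictionary
            hashes_computed += 1
            if salted_hash(salt, word) == target:
                found[user] = word
                break
    return found, hashes_computed

wordlist = ["password", "letmein", "dragon", "qwerty"]  # hypothetical dictionary
records = {}
for user, pw in {"alice": "qwerty", "bob": "Myd0ghasb!gears"}.items():  # hypothetical stolen data
    salt = secrets.token_bytes(16)
    records[user] = (salt, salted_hash(salt, pw))

found, work = crack_salted(records, wordlist)
print(found)  # {'alice': 'qwerty'}
print(work)   # 8: alice took 4 guesses, and all 4 words failed for bob
```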
Whilst most people have been aware that the processors in their home computers have become faster and faster, the GPU has been silently developing to achieve quite astronomical speeds. GPUs can achieve such speeds because they are dedicated to very specific, highly parallel types of computing, such as decoding video or generating 3D graphics. They can devote more of their processing power to these graphics functions because they don’t need to handle the general-purpose work that your Central Processing Unit (CPU), the brain of your computer, must be capable of.
For some time now, hackers (or more specifically “password crackers”) have worked out how to combine many of these GPUs to produce their own mini-supercomputers. These sit on a desktop and can be built from parts routinely available on the Internet. The software needed to run these GPUs in parallel, and the software to use them to crack passwords as explained above, are freely available to download, if you know where to look. Suddenly, although salted hashes make it more difficult, the arms race swings back in favour of those seeking to find your password.
But the war is not over. It might seem obvious, but it is only relatively recently that those seeking to protect passwords have started to research hashing functions that are deliberately slow. Whereas, because of their original purpose, hash functions were always designed to be fast and efficient, these newer functions make every single hash computation expensive. A legitimate login barely notices the delay, but an attacker must pay it again for every guess. The idea is that you cause the hackers/crackers so much inconvenience, even with their home-built supercomputers, that they move on to easier targets. You can’t stop them eventually calculating your password, but you can make it take a long time.
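PBKDF2 is one widely available slow hashing function and is included in Python's standard library as `hashlib.pbkdf2_hmac`; bcrypt, scrypt (`hashlib.scrypt`, where Python's OpenSSL build supports it) and Argon2 are common alternatives. The sketch below assumes PBKDF2-HMAC-SHA256 with a 16-byte salt; the iteration count is an illustrative assumption that should be chosen by timing it on your own hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor; measure on your own hardware and raise it over time

def slow_hash(password, salt=None, iterations=ITERATIONS):
    """Derive a 32-byte digest with PBKDF2, which iterates HMAC-SHA256 `iterations` times."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_slow(attempt, salt, iterations, stored):
    """Repeat the same slow derivation for the attempt and compare in constant time."""
    _, _, digest = slow_hash(attempt, salt, iterations)
    return hmac.compare_digest(digest, stored)

salt, its, stored = slow_hash("Myd0ghasb!gears")
print(verify_slow("Myd0ghasb!gears", salt, its, stored))  # True
```

Storing the iteration count alongside the salt and hash lets a system raise the work factor later, re-hashing each password the next time its owner logs in.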

There is one way that you can help enormously: choose a “strong password”, which is simply a set of characters that is unlikely to appear in the hacker’s dictionary. That’s why many systems insist that you use unusual characters in your password. For example, if you chose a phrase like “my dog has big ears”, you could write that as “Myd0ghasb!gears”. The other thing you can do is not reuse passwords. That is much easier said than done, but sadly not all systems are developed to the same high standards, so your password is only as secure as the weakest of those systems: it is pointless having slow salted hashes on one system if the same password is stored in plaintext on another.