Wednesday, 30 March 2016

Reports Of The Death Of CAPTCHAs May Be Premature

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are not quite as dead as I'd thought.  To stay immune to bots, CAPTCHAs have become so hard that I sometimes struggle to prove I'm human with the images they ask me to identify.  I'd assumed they were becoming so user-unfriendly that they had effectively lost their fight with the bot armies.

However, a paper I've just been reading might have the answer. With the great title "No Bot Expects the Deep CAPTCHA!", it presents a new technique called DeepCAPTCHA, which abandons the ever-increasing amount of general adversarial noise being added to CAPTCHA images and instead introduces a concept called "immutable adversarial noise" (IAN).

We've known for some time that CAPTCHAs are vulnerable to automated systems.  I wrote about some work my colleagues had done on just this problem in various banking systems back in 2012.

Adversarial noise is encountered in a number of digital domains.  There has been much work on how to overcome such noise, and on how to do so with ever-increasing efficiency. It is not entirely surprising, then, that the swarms of bots have been employing some of these techniques to overcome CAPTCHAs that developers had hoped were too noisy for bots to interpret.

The immutable adversarial noise proposed with DeepCAPTCHA is designed specifically to defeat Deep Learning classification tools (which have been used to identify everything from images to acoustics to mobile phone records - try this demo from Toronto to see it work).  Deep Learning is widely considered the main threat to CAPTCHAs, and it has caused many to turn to Google's reCAPTCHA system, even though reCAPTCHA actually breaks some of the rules of how CAPTCHAs should operate.

The level of noise is kept low in DeepCAPTCHA to make things easier for humans, but the noise is structured so that it is far more difficult to filter out, and filtering is how many automated systems improve their ability to defeat CAPTCHAs.  The technique was applied to text CAPTCHAs and then image CAPTCHAs.
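To see why filtering is such a useful trick for a bot, here is a minimal sketch of what happens to ordinary, unstructured noise.  It is not the DeepCAPTCHA method and nothing in it comes from the paper: the toy image, the 5% salt-and-pepper noise and the 3x3 median filter are all my own assumptions, chosen only to show how cheaply random noise can be stripped away.

```python
import random
from statistics import median

def make_image(size=32):
    """Toy greyscale image (assumed for illustration): dark left half, bright right half."""
    return [[50 if c < size // 2 else 200 for c in range(size)]
            for _ in range(size)]

def add_salt_and_pepper(image, fraction, rng):
    """Set a random fraction of pixels to pure black (0) or pure white (255)."""
    noisy = [row[:] for row in image]
    for r in range(len(noisy)):
        for c in range(len(noisy[r])):
            if rng.random() < fraction:
                noisy[r][c] = rng.choice((0, 255))
    return noisy

def median_filter(image):
    """Replace each pixel with the median of its (up to) 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(height, r + 2))
                      for cc in range(max(0, c - 1), min(width, c + 2))]
            out[r][c] = median(window)
    return out

def mean_abs_error(a, b):
    """Average absolute pixel difference between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = make_image()
noisy = add_salt_and_pepper(clean, 0.05, rng)
filtered = median_filter(noisy)

print("error before filtering:", mean_abs_error(clean, noisy))
print("error after filtering: ", mean_abs_error(clean, filtered))
```

Isolated noisy pixels are simply outvoted by their neighbours, so the filtered image ends up much closer to the original.  Noise that is structured to survive this kind of clean-up is a much harder problem for the attacker, which is the property DeepCAPTCHA is after.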

The proof of concept was subjected to a number of different attacks including random guessing, machine learning and filtering. The researchers also made the very wise decision to assume in the attacks that the attacker had full knowledge of the algorithm employed.
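Random guessing is the easiest of those attacks to put numbers on.  The sketch below is plain arithmetic, and the figures in it are hypothetical rather than taken from the paper: it assumes the bot picks uniformly among a fixed number of candidate answers and must get several independent challenges right in a row.

```python
def random_guess_success(num_choices, num_rounds=1):
    """Chance that a bot guessing uniformly at random passes every round.

    num_choices and num_rounds are illustrative parameters, not values
    from the DeepCAPTCHA paper.
    """
    return (1 / num_choices) ** num_rounds

# Hypothetical settings: (candidate answers per challenge, challenges in a row)
for choices, rounds in [(10, 1), (10, 2), (12, 3)]:
    chance = random_guess_success(choices, rounds)
    print(f"{choices} choices x {rounds} round(s): {chance:.6f}")
```

Each extra round multiplies the bot's odds by another factor of one over the number of choices, so even a small number of rounds makes blind guessing a poor strategy.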

The results presented show that it stands up remarkably well whilst also being considered highly usable.  Perhaps, like those of Mark Twain's, reports of the death of CAPTCHAs are premature.