I wrote several weeks ago on research done by King's College London, which appeared to show that just over 50% of hidden services were being used for illicit purposes. I wasn't surprised, then, to read a report from Intelliagg using the facilities from Darksum, which suggests that slightly less than 50% of such sites were involved in activities that would be considered illegal in the UK or US. Such a small variation in results could easily be a result of differences in definitions of illicit behaviour, which sites were available during each survey, and so on.
The search engine used by Darksum does appear to be highly credible. I haven't had a chance to use it but it claims to use the same technology that DARPA used in their MEMEX programme, which was probably the best such search engine two years ago. I'm minded to take the results produced by this search as accurate. However, there are a couple of caveats one needs to apply to the results, exactly as with the results from King's College.
First, the results were produced in February before the recent sudden increases in .onion addresses (discussed previously in this blog - just search for Tor). They claim to have found approximately 30,000 sites, which is quite consistent with the level being reported by Tor as unique .onion addresses prior to the recent rises:
I think we can assume that the number they would find if they repeated the exercise today would be approximately twice that found in this survey.
Second, of the sites noted only 46% were accessible. Many hidden services are only transient, but by basing the analysis of the level of illegal activity on less than half the published addresses I wonder if it might be skewing the results. For example, are the sites that are more readily accessible also those more likely to be involved in illicit activities?
Third, the classification of "illegality" is that which breaks a law in the UK or US. I think this is a good proxy (but then I would, wouldn't I, as I live in the UK) and is probably the one best suited to answering the questions posed by law enforcement agencies on the subject. The Darksum search was categorised specifically as:
Fourth, the proportion of sites involved in illegal activities is not the same as the proportion of hidden service traffic that is for illegal purposes. It is interesting to compare this with the work published by the Global Commission on Internet Governance in September 2015, which looked at the popularity of the various hidden services sites and found that approximately 80% of the requests were for illegal goods and services.
It is always dangerous to conflate two datasets but on balance it would appear that (as of February) there were approximately 15,000 persistent active sites on the dark web, of which only about half offered illegal goods and services, but that half (let's call it 7,500) attracted the vast majority of the traffic on the dark web.
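For anyone who wants to check the arithmetic behind those round numbers, here is a minimal sketch using only the figures quoted above. The function and variable names are purely illustrative, and the 50% illegal share is my assumption standing in for "about half" (the report itself says slightly less).

```python
# Rough check of the site counts discussed above.
# All inputs are figures quoted in the post; names are illustrative only.
SITES_FOUND = 30_000   # approximate number of .onion sites found in February
ACCESSIBLE_PCT = 46    # percentage of those sites that were accessible
ILLEGAL_PCT = 50       # ASSUMPTION: "about half"; the report says slightly less


def estimate(sites_found, accessible_pct, illegal_pct):
    """Return (accessible sites, accessible sites offering illegal content)."""
    accessible = sites_found * accessible_pct // 100
    illegal = accessible * illegal_pct // 100
    return accessible, illegal


accessible, illegal = estimate(SITES_FOUND, ACCESSIBLE_PCT, ILLEGAL_PCT)
print(f"accessible sites: {accessible}")
print(f"of which illegal: {illegal}")
```

Taken literally the inputs give 13,800 accessible sites and 6,900 illegal ones, so the 15,000 and 7,500 used in the text are generous roundings rather than exact counts.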
I think it is fair to say that the data currently available is not definitive but it is highly suggestive that the majority use of the hidden services that remain active on Tor is illegal in nature.
Of course, as I have noted many times before, that doesn't tell us much about the overall nature of the use of Tor. Cloudflare, though, are confident enough to report that 94% of the requests they see across Tor were malicious in some way (often botnet command and control traffic). I'm not sure of their methodology so it's difficult to know whether this is a reliable figure.
Perhaps it would be better to flip the question and ask: if Tor were to disappear tomorrow, how many dissidents, freedom fighters and the like would be left without a channel over which to communicate with the outside world? If only a small number of people were to be disenfranchised if Tor were disabled, would it still be worth tolerating the remainder? Research is required to gather such data.
The Darksum study did show a few very interesting demographics about the hidden services that persist.
The vast majority appears to be in English. It is also interesting when you look at the traffic streams and how they flow in Tor:
Even the base data from Tor shows that the users of Tor (not necessarily of hidden services) are not heavily biased towards countries associated with censorship. The top ten countries by users are:
Mean daily users (share of total)
374616 (19.29 %)
221827 (11.42 %)
192293 (9.90 %)
117395 (6.05 %)
86590 (4.46 %)
56648 (2.92 %)
55469 (2.86 %)
50374 (2.59 %)
49624 (2.56 %)
42160 (2.17 %)
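As a quick sanity check on how concentrated that usage is, the short sketch below simply adds up the ten rows listed above. The list names are illustrative, and nothing here is taken from Tor's own tooling.

```python
# Sum the top-ten rows quoted above (mean daily users and percentage share).
# The values are copied from the list in the post; names are illustrative.
users = [374616, 221827, 192293, 117395, 86590,
         56648, 55469, 50374, 49624, 42160]
shares = [19.29, 11.42, 9.90, 6.05, 4.46,
          2.92, 2.86, 2.59, 2.56, 2.17]

top_ten_users = sum(users)
top_ten_share = round(sum(shares), 2)

print(f"top ten mean daily users: {top_ten_users}")
print(f"top ten share of all users: {top_ten_share}%")
```

On these figures the top ten countries account for roughly 1.25 million mean daily users, or about 64% of the total.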
And when you look at the raw data you find that some countries you might expect Tor use from have only a handful of users (sometimes as low as 5).
From the evidence we have at present it does appear that Tor is being used for illegal activity. Although the precise figures are in flux I would hazard a guess that the data supports the view that the majority of the dark web is for illicit purposes, and a significant portion of Tor in general is likewise. I also think that the argument put forward by many that Tor is there to support those who do not have a free voice in their own country is very much open to question: the data does not support this view.
The question remains "Who is Tor really for?". Based on the data available to date, I would contend that outside of those who are using it for obviously illegal activities, Tor is probably being used by people in outwardly democratic countries where the users do not trust their governments. Maybe that is a good enough reason for it to continue to exist but I don't think we should hide behind an argument that it is really there to support some large swath of people in openly oppressive regimes.
I also suspect that many use Tor simply as a means of evading firewalls or other corporate blocks/tracking to visit sites that are not illegal, but are not necessarily appropriate at a particular time, e.g. visiting Facebook at work.
I still think the question we need to answer is just how many people would be denied a voice if Tor were to disappear. Research is required, and perhaps a different approach from that of the searches conducted to date, such as those above.