As mentioned the other day, security provider Gamballa released a study stating that some 11% of global botnet command & control servers were hosted by 1&1 Internet AG. Heise, presumably Germany’s most influential IT-related news portal, ran the story, mostly citing the findings of the study. 1&1 was not amused by the journalistic performance. The flaws (de) in Gamballa’s study were quickly uncovered by Thorsten Kraft of 1&1’s Anti-Abuse team, which is closely linked to the consumer-focussed German Anti-Botnet advisory centre. Heise released another article explaining the flaws in the Gamballa report, and Gamballa has rightly taken its analysis down. The underlying lapse, according to the reports linked above, was that Gamballa had allegedly added ordinary, non-infected infrastructure servers as well as sinkhole and honeypot machines to the list of C&C servers.
Quantitative analysis of computer security incidents is a terrifically meticulous job. It’s so laborious that useful, reliable, scientifically sound information and knowledge hardly ever comes from studies done by small security consultancies, unless these studies are a spin-off of larger academic research endeavours. (Which is, of course, not a scientifically based assumption either. But anyhow.) If you want to have a look at a prime example of quantitative botnet analysis, download Hadi Asghari’s masterful Master’s thesis on “Botnet Mitigation and the Role of ISPs”, finalised earlier this year. You need to go to great lengths to secure your empirical findings, and the model the thesis uses to measure the botnet and spam activity of ISPs is currently top class. Another good example is a paper that my Delftian colleagues presented at a workshop at Harvard University in June: Van Eeten, M., Bauer, J., Asghari, H., Tabatabaie, S., & Rand, D. (2010). The role of internet service providers in botnet mitigation: an empirical analysis based on spam data. (pdf)
When you, as a researcher, run into a figure as peculiar as those eleven percent of global C&C servers allegedly being hosted by a single German ISP, you should start thinking seriously about your data sources, your conceptual model and any interfering factors that might render your findings useless – or support them. Empirical findings need to be embedded in qualitative discourses – and vice versa – to be societally useful and to help us understand societal and technological complexities.
I asked Hadi whether he had some figures in his raw data set that could show how 1&1 actually performs botnet-wise compared to some other German ISPs. His data set isn’t quite designed for building the numbers that would tell us which national ISP is best at anti-botnetting. But anyhow. I used the data to calculate, for each network operator, the ratio of the number of unique spam sources seen over a year to the number of subscribers to its services. Sounds like a reasonable approach to allow us to compare different ISPs, doesn’t it?
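For the curious, the calculation boils down to something like the following Python sketch. Mind that the record layout, the function names and above all the numbers are made up for illustration; they are neither Hadi’s data nor his code, and the ISPs are deliberately anonymous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IspRecord:
    """One row per network operator (hypothetical layout)."""
    name: str
    unique_spam_sources: int  # distinct spam-sending IPs seen over one year
    subscribers: int          # number of subscribers of the operator


def spam_sources_per_subscriber(record: IspRecord) -> float:
    """Ratio of unique spam sources to subscribers for one operator."""
    if record.subscribers <= 0:
        raise ValueError(f"{record.name}: subscriber count must be positive")
    return record.unique_spam_sources / record.subscribers


def rank_isps(records: list[IspRecord]) -> list[IspRecord]:
    """Sort operators from lowest to highest ratio."""
    return sorted(records, key=spam_sources_per_subscriber)


# Invented example figures, NOT measurements of any real ISP.
records = [
    IspRecord("ISP B", unique_spam_sources=45_000, subscribers=3_000_000),
    IspRecord("ISP A", unique_spam_sources=1_200, subscribers=1_000_000),
    IspRecord("ISP C", unique_spam_sources=9_000, subscribers=500_000),
]

for isp in rank_isps(records):
    print(f"{isp.name}: {spam_sources_per_subscriber(isp):.4f} spam sources per subscriber")
```

Dividing by the subscriber count is what makes a large and a small operator comparable at all. It is also where things can go wrong quietly: a subscriber figure from a different year, or one that counts something other than broadband lines, shifts the whole ranking.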
So, how is 1&1 doing in this playful number-crunching? Have a look at this chart, showing the ratio of unique sources of spam to subscriptions.
Doesn’t this, a day after you could read the headline that 1&1 was the top global host of botnet C&C servers, scream for another headline: “1&1 among the most botnet-resilient ISPs worldwide”? That impression might, however, just as well be caused by a little error in organising or handling the data – or by using the data for purposes it was not originally intended for. Hence, before you start blaming or praising individual ISPs for allegedly being among the worst or best, consider the methodological complexities involved in building and interpreting statistical data. The literature mentioned above serves as a good showcase of how this is done right.
Update: 30.10., 9:50: There were some misunderstandings about how to interpret the data shown above. Hint: Certainly not as scientific, rigorously peer-reviewed findings suited to judging botnet resilience at the level of ISPs. I used the data to build the chart, which I used to make the main argument: be careful what you read into statistics and graphs.