1&1, Damballa, botnets, and quantitative internet security research  28.10.10

As mentioned the other day, security provider Damballa released a study stating that some 11% of global botnet command-and-control (C&C) servers were hosted by 1&1 Internet AG. Heise, presumably Germany’s most influential IT news portal, ran the story, mostly citing the findings of the study. 1&1 was not amused by the journalistic performance. The flaws (de) in Damballa’s study were quickly uncovered by Thorsten Kraft of 1&1’s Anti-Abuse team, which is closely linked to the consumer-focused German Anti-Botnet Advisory Centre. Heise released another article explaining the flaws in the Damballa report, and Damballa has rightly taken its analysis down. The underlying lapse, according to the reports linked above, was that Damballa had allegedly added ordinary, non-infected infrastructure servers as well as sinkhole and honeypot machines to its list of C&C servers.

Quantitative analysis of computer security incidents is a terrifically meticulous job. It is so laborious that useful, reliable, scientifically sound knowledge hardly ever comes from studies done by small security consultancies, unless these studies are a spin-off of larger academic research endeavours. (Which is, of course, not a scientifically based assumption either. But anyhow.) If you want to have a look at a prime example of quantitative botnet analysis, download Hadi Asghari’s masterful Master’s thesis on “Botnet Mitigation and the Role of ISPs”, finalised earlier this year. You need to go to great lengths to secure your empirical findings, and the model for measuring the botnet and spam activity of ISPs used in the thesis is currently top class. Another good example is a paper that my Delftian colleagues presented at a workshop at Harvard University in June: Van Eeten, M., Bauer, J., Asghari, H., Tabatabaie, S., & Rand, D. (2010). The role of internet service providers in botnet mitigation: an empirical analysis based on spam data. (pdf)

When you run into a figure as peculiar as those eleven percent of global C&C servers allegedly being hosted by a single German ISP, you should start thinking seriously about your data sources, your conceptual model, and the interfering factors that might render your findings useless – or support them. Empirical findings need to be embedded in qualitative discourses – and vice versa – to be societally useful and to help us understand societal and technological complexities.

I asked Hadi whether he had some figures in his raw data set that could show how 1&1 actually performs botnet-wise compared to some other German ISPs. His data set isn’t really designed for answering which national ISP is best at anti-botnetting. But anyhow. I used the data to calculate the ratio of the number of unique spam sources observed over a year to the number of subscribers of each network operator. Sounds like a reasonable approach for comparing different ISPs, doesn’t it?
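For readers who want to see the arithmetic, here is a minimal sketch of such a ratio. It assumes a spam-trap log already reduced to (operator, source IP) pairs and a table of subscriber counts per operator. The function name, the operator labels, the IP addresses (taken from documentation ranges), and all numbers are invented for illustration; none of them come from Hadi’s data set or from the thesis.

```python
from collections import defaultdict


def spam_source_ratios(spam_log, subscribers):
    """Return the number of unique spam source IPs per subscriber, by operator.

    spam_log: iterable of (operator, source_ip) pairs (hypothetical format).
    subscribers: dict mapping operator name to subscriber count.
    """
    unique_sources = defaultdict(set)
    for operator, source_ip in spam_log:
        # A set keeps each source IP only once, however often it sent spam.
        unique_sources[operator].add(source_ip)

    return {
        operator: len(unique_sources[operator]) / count
        for operator, count in subscribers.items()
        if count > 0  # skip operators without a usable subscriber figure
    }


# Invented illustration data, not real measurements.
spam_log = [
    ("ISP-A", "192.0.2.1"),
    ("ISP-A", "192.0.2.1"),  # repeat sender, counted once
    ("ISP-A", "192.0.2.7"),
    ("ISP-B", "198.51.100.4"),
]
subscribers = {"ISP-A": 1000, "ISP-B": 4000}

ratios = spam_source_ratios(spam_log, subscribers)
for operator, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{operator}: {ratio:.5f} unique spam sources per subscriber")
```

Note how much the result depends on the denominator: attribute an operator’s access customers to a different network and the ratio shifts, even though nothing about the spam itself has changed.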

So, how is 1&1 doing in this playful number-crunching? Have a look at this chart, showing the ratio of unique sources of spam to subscriptions.

[Chart: ratio of unique spam sources to subscriptions, by ISP]

Doesn’t this, a day after you could read the headline that 1&1 was the top global host of botnet C&C servers, scream for another headline: “1&1 among the most botnet-resilient ISPs worldwide”? That impression might, however, just as well be caused by a small error in organising or handling the data – or by using the data for purposes other than those it was originally intended for. Hence, before you start labelling individual ISPs as being among the best or worst, consider the methodological complexities involved in building and interpreting statistical data. The literature mentioned above is a good showcase of how this is done right.

Update, 30.10., 9:50: There were some misunderstandings about how to interpret the data shown above. Hint: certainly not as scientific, rigorously peer-reviewed findings suited to judging botnet resilience at the level of ISPs. I used the data to build the chart, which I used to make the main argument: be careful what you read into statistics and graphs.

5 Comments on “1&1, Damballa, botnets, and quantitative internet security research”

  1. Alfred Mabuse said at 11:25 on October 29th, 2010:

    Hi,

    obviously, there must be something wrong with this statistic, too.
    If the SPAM percentage of any one of the big players actually was lower than one thirtieth of the next best we would conclude
    – this player should never be found on _any_ blacklist
    – the abuse department is acting over-zealously and actively harassing customers with less than perfect newsletters.
    More realistically, while I have no doubt that the data itself is correct, the presentation is at least a bit biased.
    First of all, 1&1 access (dial-up, DSL, …) customers are not listed in 1&1’s own AS, but in those of the reselling partners (Deutsche Telekom, O2 and probably others). On which side are they included in the number of subscribers?
    Secondly, the calculation of “unique SPAM sources” is technically difficult and error-prone. How can one be sure not to “fall” for every fake “received” header of the spammers while at the same time including all “unique” sources that are routed through the same mail system?
    Given these difficulties, I guess the numbers would become more meaningful if all “access-spam” would be disregarded completely — some don’t even regard it as “real” SPAM as it is obvious and easily filtered.

    All in all this statistic serves more to show what can be done with seemingly objective numbers — in which I regard it as a great addition to the discussion — than it does tell us anything about how spammy one business is in comparison to another.

  2. Andreas Schmidt said at 13:08 on October 29th, 2010:

    That is one of the points I intended to make here. Do statistical graphs represent social and technological realities in a useful way? Given the graph shown above, you could very well think that 1&1 can now legitimately be called the ISP that is best at dealing with bots (I’m not talking about C&C servers, that’s a different story). The difference between 1&1 and other ISPs is so striking that I would like to see these graphs backed up (or falsified) by more qualitative analysis before stating that 1&1 puts its competitors and the whole remaining industry to shame in anti-botnetting. Furthermore, the graph just divides the number of unique spam sources by the number of subscribers. The question you raised, whether this is a fair and appropriate way to represent the anti-botnet performance of an ISP, is a legitimate one.

    The research this data was used for didn’t cover only Germany and compare German ISPs, but a good many more countries and their ISPs. Hence, the focus of the statistical model used in the study might not have been to ensure a fair comparison of anti-botnet performance among distinct ISPs in one country. Your input may help to adapt the existing model and take all the points you’ve mentioned above into consideration. If your criticism is valid. Let’s get to the details.

    (The following arguments are based on a conversation I just had with Hadi Asghari, who is currently attending a workshop and therefore can’t comment here.)

    The question about where 1&1 access customers are listed is a tough one, indeed, and needs to be answered. It is nevertheless correct that at this point an ASN is not broken down on the national level. The simple reason for this is that such detailed IP-range information was not available. If you have such information and are willing to provide it, the stats will happily be recalculated.

    The identification of spam sources is based on logged IP addresses of the spam connection on the spam-trap. Hence, faking headers will not change the results. In addition, spam coming through the mail system is counted. But in fact, when looking at bot infections, access spam might be more relevant.

  3. Damballa withdraws obviously flawed statistics | 1&1 Blog said at 14:54 on October 29th, 2010:

    […] we put a great deal of effort into identifying infected systems and work with various renowned botnet experts on this. As soon as we […] an affected […]

  4. Alfred Mabuse said at 14:54 on October 29th, 2010:

    Hi,
    I’m more than happy to fully agree with you here.
    The point we both wanted to raise was the subjectivity of statistics. I just felt compelled to comment on the original article as it appeared to me more as an afterthought than the main point.
    Getting to hard facts as to how botnet-resilient an ISP is will be very hard in a landscape where we have to deal with resale etc. The most honest thing to do when concentrating on access spam would be to remove the big resellers from the list and add an annotation to the list as well as the most active resale backends.
    The problem is worsened by the fact that the resale-backend-providers will happily assign the same IP to one of their own customers and a resale customer within 24h — impossible to separate those at any stage in the process of data collection and analysis.
    When trying to distinguish ordinary spam from botnet-spam the notion of unique spam sources seems (as used in the above statistic) to be highly relevant.
    So please disregard anything I wrote about mailsystems as I was referring to the fact that each of them may hide an unknown number of unique spam sources (in a broader sense).

  5. Andreas Schmidt said at 19:13 on October 29th, 2010:

    Thanks for your very insightful comments, Alfred. They’ll be quite helpful for future research and it sounds like it’s going to be an interesting challenge to build a bullet-proof model of botnet-resilience on the level of individual ISPs.