Crowdsourcing of political investigation? The problem of web-based ad-hoc collaboration  2.12.09

A couple of days ago, I mentioned Wikileaks' scoop of leaking the apparently horrid contracts between the Federal Republic of Germany and Toll Collect, a joint venture of Daimler-Chrysler, Deutsche Telekom and Cofiroute.

When Germany’s leading web-politics site broke the news (“Toll Collect wird offen”, i.e. “Toll Collect becomes open”), its leading brain Markus Beckedahl asked his broad and usually helpful audience with which tools and techniques some 10,000 pages of contract papers could be collaboratively analyzed to quickly find the rascalities everyone expected to find there. I was split on whether this could work out or not, whether such a task is suited to social ad-hoc collaboration or not.

Back in 2004, I was working with a small team of consultants for an ICT provider that was about to louse up an e-government project and therefore wanted external expertise to learn what was going wrong. After days of interviewing key persons, on-site inspections and analysis of key documents, it was obvious that the ICT provider had developed a prototype that simply didn’t meet its clients’ specifications. Worse, no one actually knew exactly which features should have been implemented in the first place.

It turned out that the contractual basis for the project consisted of a dozen substantially different contracts between the ICT provider on the one hand and individual German Bundesländer (federal states), or groups of them, on the other. As no one had thoroughly read the contracts before, the ICT provider had spent two years implementing functions it merely assumed it had to develop. Just to know how deep the trouble was, several thousand pages of contract paper had to be reviewed very rapidly. On the one hand, you have to dive deep into the text to understand it, and you also need an overview to grasp the complexities of such a set of contracts, a task you simply cannot split up and delegate to several persons. On the other hand, some tasks could be handed over to trainees. They were gathered in a small lab, got copies of text-analysis software installed on their desktops, and produced series of reports and text extracts. The more senior people took care of the overall strategy and the big picture.

In a sense, crowdsourcing has similar characteristics. It is a mode of production that involves a coordinating centre and supportive helpers. The poster child of web-based collaborative production, Wikipedia, is steered by the Wikimedia Foundation, a small organisation with 34 employees and $5.6 million in annual turnover (or expenses). Analogous organisations have been set up for regional Wikipedias in countries all over the world. Tens of thousands of contributors are coordinated by this central organisation and its national siblings. Notwithstanding this centre, Wikipedia’s contents are more the result of egalitarian modes of production than of a crowdsourced mode. But what, then, actually is crowdsourcing?

Jeff Howe, who allegedly first came up with the term “crowdsourcing” in his 2006 Wired article and has acted as its evangelist ever since, offers two definitions. The “White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” And the “Soundbyte Version: The application of Open Source principles to fields outside of software.” (Jeff Howe’s Blog) I would prefer a more generic definition that does not see crowdsourcing merely as the activity of outsourcing to the crowds, but as a distinct mode of production characterised by a designing and controlling centre and by production at the edge by “an undefined, generally large group of people in the form of an open call” (ibid.).

A prominent example of political ad-hoc crowdsourcing was launched earlier this year by the Guardian. The British daily had, within a few days, set up a web-based system that enabled interested users to participate in a distributed analysis of MPs’ filed expenses. Users could contribute to the overall effort of reviewing 457,000 pages by selecting and reviewing a few of them. The designers of the crowdsourced solution later admitted that “keeping up the interest is hard”. Too hard, obviously: barely 50% of all the documents put online have been reviewed by the reading electorate. (Guardian)

It is helpful to know the limits and potentials of crowdsourcing. The recent debates about Wikipedia (Wall Street Journal) point at generic problems of social production: accuracy, breadth and reliability. In a sense, every organisational form has to struggle with these targets, yet crowdsourced production models are especially prone to run into difficulties with them. For products to be reliable, modern production techniques include quality management, training and certified qualifications, none of which a person working for free and for fun is too keen on. At commercial organisations, breadth of service offering is guaranteed by the providers’ economic interests: more services, higher revenues, higher profits. For crowdsourced endeavours, however, breadth of service offering implies more unpaid, if somewhat differently compensated, work. Wikipedia has to go on road shows to sell this work to others; that is not an option for smaller projects. The Guardian’s crowdsourced expenses-intelligence system seems to have stalled once the respective discourse vanished from the news headlines.

Another approach to crowdsourcing involves paying the amateurs for their work, as can be seen on websites such as iStockphoto. Crowdsourcing here creates “distributed labor networks [that] are using the Internet to exploit the spare processing power of millions of human brains” {Howe 2006}. Mechanical Turks, so to speak, according to Amazon: “Mechanical Turk” is the name of a brokerage website for “human intelligence tasks” (HITs) owned by Amazon. (Mechanical Turk) Those seeking Mechanical Turks get “access to a global, on-demand, 24 x 7 workforce” that only gets paid “when you’re satisfied with the results”. A capitalist’s dream come true. According to Jeff Howe, a “network of passionate, geeky volunteers could write code just as well as the highly paid developers at Microsoft or Sun Microsystems”.

The underlying principle of crowdsourcing is “to connect with brainpower outside the company”. Through R&D crowdsourcing, businesses can find people to assist them in developing products and thereby decrease time-to-market. {Howe 2006} On Innocentive, so-called solution seekers “pay solvers anywhere from $10,000 to $100,000 per solution”. Many of these solvers are allegedly hobbyists or undergraduate students. One of Howe’s interviewees stated: “We have 9,000 people on our R&D staff and up to 1.5 million researchers working through our external networks”. An R&D manager’s dream come true.

Now that the mode of crowdsourced production has been around for a few years, it is used in a range of markets. Anjali Ramachandran of the London-based consultancy Many By Many has set up a wiki that lists the types of businesses currently making use of this mode of production. She categorises them into four groups: “1. Individual businesses or sites that channel the power of online crowds 2. Brand-sponsored initiatives or forums that depend on crowdsourcing. I’ve included those that are no longer active as well, for reference. 3. Brand initiatives that allow users to customise their products, 4. Brand-sponsored competitions/challenges focussed on crowdsourcing”.

But what about crowdsourcing in politics? The ideal of a democracy is quite the opposite of a sheep market, a felicitous term for crowdsourcing coined by artist Aaron Koblin, who used Amazon’s Mechanical Turk in one of his art projects. (Interview with Koblin in Wired)

Mary Joyce has summed up the problems of applying crowdsourcing to politics and political activism. The definitional key of crowdsourcing is that “the task is defined at the center, produced at the edge”. (digiactive)

To come back to Wikileaks, Toll Collect and the call for collaborative contract analysis: such a thing would not turn out to be crowdsourced net activism. While some nodes in bottom-up political networks will be more influential than others, none of them will be so influential as to become the node that controls the whole process and chops the project up into small chunks for the masses, into HITs. Or, to use Aaron Koblin’s analogy, none will turn net-freedom activists into sheep. A differentiator between crowdsourcing and peer production is the frequency and intensity of relations among the smaller nodes. Wikipedia’s problem might be that it has morphed from a peer-production model into crowdsourcing. And, by the way, it is peer production, not crowdsourcing, that is going to have an impact on existing political institutions.
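That “chopping up into HITs” step is, mechanically, the easy part, which underlines that the hard part is the controlling centre, not the tooling. As a minimal sketch (using a generated stand-in corpus, since the actual contract text is not at hand), a coordinating node could cut extracted text into fixed-size chunks, one per volunteer reviewer:

```shell
# Stand-in corpus: 100 numbered lines in place of the real 10,000 pages.
seq 1 100 | sed 's/^/contract line /' > corpus.txt

# Chop it into 25-line chunks, one "HIT" per reviewer.
# (GNU split; -d produces numeric suffixes: task_00, task_01, ...)
split -l 25 -d corpus.txt task_

ls task_*
```

Each task_NN file would then be handed to a reviewer, while the centre keeps the mapping and merges the reports back. The point of the post stands: the tooling is trivial; what is missing in bottom-up networks is a node with the authority to run it.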

Readers of the site came up with only a few suggestions for how this massive contractual framework could be collectively analyzed. The aforementioned Guardian solution was mentioned (its developer gave a presentation on the technical details). Another suggestion was a web platform that provides an API and the ability to comment on and tag passages of plenary-session protocols of the German Bundestag.

A third user recommended simply using grep, the Unix command-line tool for searching text files. But nothing was ready to go. A day or two later, journalist Detlef Borchers, a notorious critic of German e-government projects gone wrong, had already published an article with the key statements of the contracts. For such complex tasks, nothing beats a dedicated professional with attitude. As someone who makes a living selling his computational brain cycles, I’m relieved. But where does that leave social ad-hoc investigation? Is there still some collaborative analysis going on in this matter? Maybe peer-produced net politics just needs more time to develop more effective tools and techniques.
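For what it’s worth, the grep suggestion is workable once the leaked documents exist as plain text (the PDFs would first need OCR or text extraction). A minimal sketch, with made-up file names and contents standing in for the real contracts:

```shell
# Hypothetical setup: extracted contract text sits in ./contracts/ as .txt files.
mkdir -p contracts
printf 'Section 12: penalty clause\nSection 13: liability cap\n' > contracts/volume1.txt
printf 'Annex A: termination rights\nSection 14: notice periods\n' > contracts/volume2.txt

# Recursive (-r), case-insensitive (-i) search with file names and line numbers (-n).
grep -rin 'penalty' contracts/
# -> contracts/volume1.txt:1:Section 12: penalty clause
```

This finds keywords fast, but it is exactly the kind of tool that presupposes someone already knows which rascalities to search for, which is what the dedicated professional supplied.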

Update 4.12.2009

“Given enough eyeballs, corruption and waste are similarly shallow problems.” (Brito, J. 2008. Hack, mash & peer: Crowdsourcing government transparency. Colum. Sci. & Tech. L. Rev. 9:119-122.)
