The Scoop

Buying Into Computational Journalism

November 9th, 2009  |  Published in Car Tools, Journalism  |  8 Comments

Update: Duke’s Sarah Cohen responds in the comments.

The intriguing title of a recent report from scholars at Duke is “Accountability Through Algorithm: Developing the Field of Computational Journalism.” Semi-related to CAR, Computational Journalism is defined as “the combination of algorithms, data, and knowledge from the social sciences to supplement the accountability function of journalism.” I take each of those – algorithms, data and knowledge from the social sciences – as separate elements, because while journalists do have plenty to learn from the social sciences, we also operate in an environment that is not quite academic (and sometimes not at all).

The report identifies four areas of potential exploration: techniques for data transformation and pattern discovery in investigative reporting; a digital “dashboard” for journalists; new social and technical structures for interactions among readers and reporters; and sense-making advances from other disciplines. All are interesting and worthy, but to me the first two are particularly so.

On the first, the best investigative journalists have been developing tools for extracting meaning from reams of information for years. The change now is that we have a greater platform for these tools in the Internet, and an effort like DocumentCloud is a clear example of that change. The challenge we face is that patterns are interesting to different people for different reasons; what an accountant finds interesting may not always be of interest to a journalist, and vice versa. The current deficit is not in the area of tools; it is the occasionally trickier area of adapting those for the task of journalism. That requires the guiding influence of people like Sarah Cohen, a newly minted Knight Chair at Duke, who is studying these issues right now. But it also requires the active participation of a wide range of news organizations and journalists. In the Internet, we have a leveling platform, but only if more journalists participate. That may be a greater challenge than the technical one.

One way to get there is the second idea – a journalist’s dashboard. This would provide reporters with a way to keep track of the deluge of information coming into newsrooms. But again, the technological side of that equation, as difficult as it is, is less of a concern to me than the implementation and adoption of the results. We know how to gather various bits of information in one place. We’re not that good at distilling the best of them, or even knowing where to start. The good news is that we have blueprints for this kind of thing: the people and companies who make great Web apps that distill masses of data into understandable results. The bad news is that we, as a business, work very differently. We don’t really share much, outside of experiences at conferences or over drinks, and particularly not at the institutional level. And we’re downright awful, in general, at adapting good ideas for our own uses.

For the idea of Computational Journalism to work, a lot is riding on a movement that is slowly growing but urgently necessary for the news industry: the increasing adoption, use and proliferation of open-source tools. The CAR community has seen an influx of use of various types of open-source software, from databases to GIS systems to web frameworks. More and more reporters and editors are embracing different styles of journalism. But the broader concept of opening up our newsrooms, both philosophically and in terms of our content and efforts, has been slow in coming. It requires not just the creation of tools, but also the development of journalists and readers who will use those tools most effectively. And that’s more than an algorithm – to say nothing of Twitter – can solve alone.

Oh, and Duke folks? Can we get a version of that report that embraces the Web as much as the concept? HTML will do fine.

Responses

  1. Ben says:

    November 10th, 2009 at 12:00 am (#)

    Sounds awesome. I’m down. Where’s the github repo?

  2. Derek says:

    November 10th, 2009 at 12:55 pm (#)

    That’s a good question – we don’t know much about how the Duke folks are setting up a place to see what they’re doing. I hope they get in the habit of releasing early and often.

  3. Jeremy says:

    November 11th, 2009 at 1:18 am (#)

    I, for one, am more looking forward to the sense-making tools aspect, which the report unjustly separates from pattern discovery. I see too many journalists around me talking about searching data sources for the handful of useful nuggets within, but the value is in making sense of the whole thing, which means grown-up no-black-and-white-answers statistics if you take this seriously. It’s as if everyone were trying to dig a water well in the middle of a lake.

  4. Derek gets me thinking about computational journalism says:

    November 12th, 2009 at 1:15 am (#)

    [...] Some thoughts that come to mind while (and after) reading Derek Willis’s post, Buying Into Computational Journalism. [...]

  5. Sarah says:

    November 13th, 2009 at 12:07 pm (#)

    Hey, guys. The github repo will come, I swear.

    We’re really just in start-up phase right now. We hope to address exactly the point that Derek makes: the basic technology already exists (and has for quite a while) to help accountability and investigative reporters do their jobs better and quicker. The problem is that it hasn’t been completely integrated into the newsgathering process or made accessible to normal reporters who have more responsibilities now than ever. So our job is to make things they can use without compromising their skill at uncovering important information that the powerful would rather keep hidden and at comparing how programs actually work to how they are supposed to work.

    We are triaging the best ideas that I’ve gleaned from more than 50 interviews with editors, reporters, bloggers, developers and other folks doing similar work including DocCloud, Sunlight and Brant’s work at University of Illinois. You saw a few examples of them in the report. I’d love further ideas from the folks who hang out here and pointers to great projects in the wild. The goal is to complement, not repeat or compete with, other efforts.

    But rest assured, everything we work on will be released both as free (as in beer and as in speech) tools and as open source code. The platforms will depend on the developers we partner with on various projects and will, I assume, vary — we want to work reasonably quickly and not be tied to an integrated “system”.

    We’re already working with one company (thinkrelevance.com), which has agreed to sponsor a summer project for some of our ideas in a rapid development environment.

    To Jeremy’s point: I’m not the lead on the sense-making aspect of this, but it’s worth remembering that stories are different than studies. I do think we can borrow and adapt visualizations and other techniques from others — accountants, lawyers, doctors as well as social scientists and digital humanities — to help us understand our material more quickly and accurately.

    Keep in touch, and thanks for commenting on the report. Send ideas and thoughts my way.

    Sarah Cohen
    Duke University
    sarah.cohen@duke.edu
    919.613.7348 (work)
    202.213.6980 (mobile)

  6. Cody Brown says:

    November 15th, 2009 at 11:24 pm (#)

    ‘Computational Journalism’

    This is an interesting way of describing what’s about to become a major trend.

    I like where Sarah’s thoughts are going but I think she’s, at least now, bogged down by a trustee model of news production. The giveaway is her repeated use of the word ‘reader’ to describe those who aren’t paid to be journalists. Instead of focusing on creating products for a failing model of news organization (trustee style news room) – start from scratch. Study and deconstruct examples of participatory information systems that have already found wide success (Wikipedia).

    -cody

    http://www.kommons.com

  7. Red Links 16/11/09 : Alexia Golez says:

    November 16th, 2009 at 8:25 am (#)

    [...] On computational journalism. [...]

  8. NewzBeta – Journalism 2.0 says:

    December 14th, 2009 at 11:45 pm (#)

    [...] to Derek Willis, who blogs at The Scoop, a weblog about computer-assisted reporting, the problem is not about the availability of such [...]
