The Scoop

What APIs Mean for Data Journalists

March 6th, 2011  |  Published in Data, Journalism

Anthony DeBarros of USA Today and I talked about APIs at this year’s CAR conference in Raleigh. We got a lot of “Web people”, to use a lame expression, in the audience. If you’re a reporter who works with data, why should you care?

The simple answer is that APIs are an extension of what reporters do every day: ask questions. The difference is that instead of forcing reporters to gather data from multiple sources, format it to fit their local database needs and then update that database when new releases are available, APIs allow them to query live data from all over the Web. If you have experience working with, say, Microsoft Access and setting up an ODBC connection to a remote database, APIs are kind of like that – except that you have near-instant access to more sources of data, more useful tools (like geocoders) and more timely information than ever before.

My path working with data went something like this: spreadsheets came first, which I routinely describe as the “gateway drug” of computer-assisted reporting. Some people become such Excel wizards that it almost doesn’t make sense for them to move beyond that expertise; there is so much you can do in a spreadsheet that it alone would be worth the time to learn. But there were things about spreadsheets that annoyed and frustrated me. Pivot tables were a clumsy fit for me – they got me close to what I wanted in many instances but never quite there. And so I moved on to databases.

Databases are still one of my favorite things. They are powerful, relatively flexible and range in utility from the ultra-portable SQLite to the transactional goodness that is PostgreSQL. But they take time and effort to build, maintain and – perhaps most importantly in the long run – connect to additional sources of information. APIs are not a complete solution to these problems, but they provide a very good one that data journalists should be familiar with and consider incorporating into their work.

A simple example is the reporter who wants to track the votes of his or her state’s delegation in Congress. There are several APIs for this data, including the one I work on and another by OpenCongress. The reporter could build a database of these votes by hand or write scripts to parse the House and Senate vote data and insert it into that database. But why, when the data is freely available via HTTP?
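To make the delegation example concrete, here is a minimal Python sketch of what "available via HTTP" can look like in practice. It is not the interface of any particular Congress API: the base URL, the parameter names and the fields in each vote record (`member`, `state`, `position`) are all assumptions made up for illustration, and a real API documents its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter names, for illustration only;
# a real API documents its own URL scheme and authentication.
BASE_URL = "https://api.example.com/congress/votes"


def build_votes_url(chamber, year, month, api_key):
    """Return the URL for one chamber's roll-call votes in a given month."""
    query = urlencode(
        {"chamber": chamber, "year": year, "month": month, "api-key": api_key}
    )
    return f"{BASE_URL}?{query}"


def fetch_votes(url):
    """Issue the GET request and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def delegation_votes(votes, state):
    """Keep only the vote records cast by members from the given state."""
    return [vote for vote in votes if vote["state"] == state]
```

Under those assumptions, a reporter would call `build_votes_url("house", 2011, 3, "DEMO")`, pass the result to `fetch_votes`, and filter what comes back with `delegation_votes(records, "NC")`. There is no local database to build or refresh; the question is simply asked again whenever the story needs current data.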

It can’t be that simple, can it? Well, no. But it can be simpler. The data you get from APIs usually comes in XML or JSON. Data journalists have, for better or worse, been dealing with XML for a while now. JSON may be less familiar, but it is quite nice to deal with and there are plenty of libraries with which to do so. But even better than that is the fact that other people have already solved that problem for you. Not long after we released the NYT Congress API I noticed a Ruby client library for it on GitHub. I had never met the author; he had never contacted me. Just the same, he made it easier for people using Ruby to query the API and get back data. There’s also an excellent Python library for it, written by NPR’s Chris Amico.
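For readers who have not handled either format, here is a small Python sketch that reads the same two vote records from JSON and from XML using only the standard library. The sample records and their field names are made up for illustration and do not come from any real API response.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample records, not taken from any real API response.
JSON_TEXT = """
{"votes": [
  {"member": "Smith", "state": "NC", "position": "Yes"},
  {"member": "Jones", "state": "NC", "position": "No"}
]}
"""

XML_TEXT = """
<votes>
  <vote><member>Smith</member><state>NC</state><position>Yes</position></vote>
  <vote><member>Jones</member><state>NC</state><position>No</position></vote>
</votes>
"""


def votes_from_json(text):
    """JSON maps directly onto Python lists and dictionaries."""
    return json.loads(text)["votes"]


def votes_from_xml(text):
    """XML needs one extra step: walk each <vote> element's children."""
    root = ET.fromstring(text.strip())
    return [
        {field.tag: field.text for field in vote}
        for vote in root.findall("vote")
    ]
```

Both functions return the same list of dictionaries, so the rest of a script does not need to care which format the API delivered. A client library like the ones mentioned above typically does this parsing for you and hands back ready-made objects.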

Thus can you, the data journalist, benefit from other people who need and use APIs. Check out GovKit, a Ruby wrapper to multiple government and political APIs, created by the folks at the Participatory Politics Foundation. Go play with it, and figure out what sorts of things you can do when the number of data sources you’re able to tap into multiplies overnight. The possibilities for journalists are limited only by the kinds of questions we can imagine and try to answer. APIs can make it easier to act on that greatest of questions: What if?

