The Scoop

The Annotated Archive

Third in a series of essays humbly titled “Fixing Journalism”. First posted on May 20, 2005.

I love archives. As a kid, my favorite books were reference books, the kind that had baseball stats for every team in every year since 1903. My mother, an English teacher, probably wondered what she had done wrong when I opted for lists over literature.

So I was a pretty happy fellow when I first walked into the news library at the Palm Beach Post in May 1995 as an intern. Here was a place loaded with archives, not just in the physical sense but also electronic, where I could search for the proverbial needle in a haystack – and sometimes find it.

So I love archives. But you know what I’d really love?

An annotated archive. An archive that doesn’t just display vertical depth going back years but can show relationships between archived items and the individuals and institutions named within them. An archive that makes connections easier to find and helps new or unfamiliar users get up to speed quickly.

Now this isn’t a knock on current archives; they are indispensable tools for research. But imagine if we could create and grow an archive that provided supplemental information and showed networks between topics, individuals and organizations.

For example, if you have a story on a controversial new development, the annotated archive could have links within the story to information researchers have about the company or its executives. A list of articles that mention a local businessman would be a single click away, along with other important information.

This isn’t impossible; in fact, the tools exist to build a rudimentary example of the idea right now. And unlike my previous essays, for this one I’ve tried to come up with a small example of what an annotated archive might look like.

The tool I chose is the MediaWiki software that powers Wikipedia. It’s free and open-source software, easy to install and administer. The reason I picked MediaWiki is that many folks have heard of Wikipedia and might be able to better envision how a newspaper archive with Wiki principles might work.

Now, on to my example archive. It’s not much – only a few articles that I wrote or co-wrote while I was at Congressional Quarterly. The concept is pretty simple: each article in the archive can contain one or more internal references (in addition to external references) to other pages, which may contain other stories, information about a person, company or organization, or other notes.

So, for example, this story includes an internal link to Tom DeLay, the House Majority Leader. That page currently has links to external Web sites, but it could contain internal-only notes, contact information or pointers to documents or other references.

On each page, there’s a link on the lower left called “What links here,” which pulls up each of the other pages in the archive that reference that page. With the proper tending and weeding, the usefulness of such a feature only grows, illuminating connections that even beat reporters may not have realized.
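To make the “What links here” idea concrete, here is a minimal sketch in Python of how such a reverse index could be built from wiki-style internal links. It is not how MediaWiki itself implements the feature; the page titles, the page text and the function names (`outgoing_links`, `build_backlinks`) are all invented for illustration, and the only thing borrowed from MediaWiki is the `[[Target]]` and `[[Target|label]]` link syntax.

```python
import re
from collections import defaultdict

# Matches wiki-style internal links: [[Target]], [[Target|label]], [[Target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def outgoing_links(text):
    """Return the set of page titles that a page's text links to."""
    return {match.group(1).strip() for match in WIKILINK.finditer(text)}


def build_backlinks(pages):
    """Map each linked title to the sorted list of pages that reference it."""
    backlinks = defaultdict(set)
    for title, text in pages.items():
        for target in outgoing_links(text):
            if target != title:  # ignore a page linking to itself
                backlinks[target].add(title)
    return {target: sorted(sources) for target, sources in backlinks.items()}


# Hypothetical archive contents, made up for this example.
pages = {
    "Story A": (
        "Council approves a project by [[Example Development Corp]], "
        "led by [[Jane Example|its chief executive]]."
    ),
    "Story B": "[[Jane Example]] responds to critics of the project.",
    "Jane Example": "Notes: executive at [[Example Development Corp]].",
}

backlinks = build_backlinks(pages)
print(backlinks["Jane Example"])  # ['Story A', 'Story B']
```

Every new story that mentions a person or company adds an entry to that index without anyone having to update the person’s page by hand, which is why the feature gets more useful as the archive grows.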

Of course, that’s a best-case scenario. Such an annotated archive requires regular maintenance by someone – an archive editor, basically – who is a news generalist and yet recognizes key people and organizations within the community. Those people are inside our newsrooms – we must tap their knowledge in a better way.

One of the best features of an annotated archive is its flexibility. It can grow to incorporate new topics and can be updated when a familiar subject suddenly takes on a new importance. This doesn’t have to be done with Wiki software, either; that’s just an easy way to demonstrate the principles. We have access to a range of technologies that can enable us to think about our information in new and better ways.

Users are already thinking about these issues. Consider The Annotated Times, which essentially tracks New York Times articles by topic and author, and even provides custom RSS feeds so that users can track articles by reporter. Or Chicago Crime, which takes crime reports from the Chicago Police Department and makes them available in ways that are interesting to readers. We need to treat our archives as our most valuable data, because they are.

Archives have long been both an internal resource and an external product for newspapers. They still are, but we don’t have to treat them as the same thing. The archive that newspapers provide to their employees should be more valuable than the one they sell to vendors, and the newsroom is the place that can and must make it happen.

©2010 The Scoop