The Scoop


The Fundamental Training Need

June 25th, 2009  |  Published in Journalism | Comments (7)

It’s good to see recent writings on the importance of training and skill development for journalists.

One of the common responses to such entreaties is exemplified in this comment, which includes this plea: “I understand the need to bolster one’s skill set. But what happened to the days when we actually, you know, worried about reporting rather than slavishly trying to master every piece of technology?”

If only that were the real problem.

The real problem is the way that we as journalists manage information, because that determines so much else: the kinds of stories we’re able to envision and construct, the amount of context we’re able to bring to bear in a short amount of time and our ability to connect the dots. In general, and this is my scientific conclusion, we suck at managing information.

That’s nothing new, you might say, and you’d be right. But what has changed is that a lot of the people and institutions we cover are now getting smarter about this stuff, and are using better tools to help them manage information. From tracking crime to measuring customer loyalty, the sophisticated use of information is a crucial factor in many modern activities. Us? We’re still knocking rocks around hoping to generate a spark.

I’m not knocking learning skills like how to maintain a blog. I’m just saying that if all we do is teach new tools and skills, we’re making the underlying problem harder to solve, not easier, because we’re just encouraging the production of even more separate and disconnected piles of information. More photos over here, more spreadsheets over there. We’re still drowning in information and we can’t figure out how to use it to our best advantage, like finding undiscovered patterns and coming up with definitive explanations instead of the ol’ three-person anecdote story.

So yeah, teach those CAR and multimedia skills. Have everybody Twitter. But please, let’s find a way to address the fact that for many journalists, Microsoft Word is the primary tool for organizing any and all kinds of information. Let’s make sure that our silos of content (text archives, photo archives, databases, etc.) can at least be made to talk to each other, if not naturally, then through APIs or metadata or something. And let’s start talking about how a news organization’s information belongs to the organization, not just to individual reporters and editors, and how our products could be so much better if we adhered to that principle before a story/photo/slideshow is published, not just after.
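Here is one way silos could talk to each other through metadata. This is a minimal Python sketch, assuming a text archive and a photo archive that both record a shared story identifier. The `story_id` field, the sample records and the `link_archives` function are hypothetical illustrations, not any newsroom's actual system.

```python
from collections import defaultdict

# Hypothetical records from two separate archives. The shared "story_id"
# field is the assumed piece of metadata that links them.
stories = [
    {"story_id": "2009-0612-budget", "headline": "Council passes budget"},
    {"story_id": "2009-0613-fire", "headline": "Warehouse fire downtown"},
]
photos = [
    {"photo_id": 101, "story_id": "2009-0613-fire", "caption": "Firefighters at scene"},
    {"photo_id": 102, "story_id": "2009-0613-fire", "caption": "Damaged roof"},
    {"photo_id": 103, "story_id": "2009-0612-budget", "caption": "Council vote"},
]


def link_archives(stories, photos):
    """Attach each photo to its story using the shared story_id key."""
    photos_by_story = defaultdict(list)
    for photo in photos:
        photos_by_story[photo["story_id"]].append(photo)
    # Stories with no matching photos get an empty list.
    return {
        s["story_id"]: {
            "headline": s["headline"],
            "photos": photos_by_story.get(s["story_id"], []),
        }
        for s in stories
    }


linked = link_archives(stories, photos)
for story_id, record in linked.items():
    print(record["headline"], "->", [p["caption"] for p in record["photos"]])
```

The key is agreeing on the identifier before publication; once both archives carry it, the join itself is trivial.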

The Case Against Teaching Access

June 2nd, 2009  |  Published in Journalism, Teaching | Comments (11)

I’ve been at the Medill School of Journalism at Northwestern University since last week, talking to faculty members about using data management and analysis tools (spreadsheets, databases, mapping) in their courses. When they asked me to provide some training on Excel and Access, I agreed, but asked for the chance to make a case for teaching any database but Access to students. Specifically, I suggested that universities and training organizations like IRE teach SQLite, which has the advantages of being cross-platform and accessible via a Firefox add-on. My class this semester at George Washington University and my time here at Medill have only reinforced my conviction on this.

The Case for Access

Before I offer the case against using it, let’s look at why we would use Access to teach database concepts. First, it’s widely available as part of Microsoft’s popular Office suite. A lot of news organizations already have it installed, or can install it without much trouble. Second, it has a familiar look and feel for people who have used Excel (sorting and filtering work nearly the same way, for example), and it imports and exports Excel files with ease. Third, the query grid that Access uses as its default makes it easy to get started on actually getting answers from your data.

That’s about it, as far as I’m concerned. I guess you could throw in the ability to generate reports and construct forms, but these are less of an advantage as web-based apps have become more popular and added features. In fact, the last reason I cited, the query grid, isn’t really an advantage at all, as I’ll explain below.

The Case Against Access

Access costs money. SQLite, MySQL and PostgreSQL are superior database programs that are free and open-source. If you’re asking your students, many of whom may be buying Mac laptops, to get Access, you’re putting an additional burden on them. And if Access is all they know when they graduate and land a job at a place that doesn’t have it, they may need to buy it (or do without a database at all).

The Access query grid hides the fact that underneath, Access runs SQL queries. So a user is able to construct and execute a SQL query without writing any SQL whatsoever. This is, imho, a bad thing, as it makes it possible to get results without actually knowing what you are doing. When we teach the query grid, we’re teaching behavior over understanding, or at the very least we’re allowing behavior to compete with understanding. And that doesn’t even begin to address the issue that the query grid doesn’t do everything that SQL can. In terms of teaching, this is critical; we’re not properly equipping students for the opportunities and challenges they could face.
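For contrast with the query grid, here is what writing the SQL yourself looks like. This is a minimal sketch using Python's built-in `sqlite3` module; the `contributions` table and its rows are invented for illustration. Every clause (grouping, filtering the groups with `HAVING`, sorting) is visible, and each one has to be understood before it can be written.

```python
import sqlite3

# In-memory database with a small, made-up table of contributions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor TEXT, state TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("Smith", "VA", 500),
        ("Jones", "VA", 250),
        ("Lee", "MD", 1000),
        ("Park", "MD", 200),
        ("Diaz", "DC", 100),
    ],
)

# Typed by hand: count and total by state, keep only states whose
# total exceeds $500, and list the largest totals first.
query = """
    SELECT state, COUNT(*) AS n, SUM(amount) AS total
    FROM contributions
    GROUP BY state
    HAVING SUM(amount) > 500
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same statement would run essentially unchanged in MySQL or PostgreSQL, which is part of the argument for learning the language rather than the interface.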

Another issue is data portability: Access can’t dump a database to a .sql file, which is a great way to transfer SQL data without losing data types. Access does export to many formats, including Excel, CSV and XML, but the lack of a SQL dump makes transferring data a pain. If you want to send somebody an Access database, you can either send them the entire file (provided they have Access installed), or you can export each of the tables and have them re-import them. And if you do email that .mdb (or now, .accdb) file, be warned that these files get quite big. To demonstrate this, I loaded the same three tables into Access 2007 and SQLite, and the Access file was nearly three times the size of the SQLite database.
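SQLite, by contrast, can produce a SQL dump directly. Here is a minimal sketch using the `iterdump()` method of Python's `sqlite3` connection objects; the `officials` table and its rows are hypothetical. The dump contains the `CREATE TABLE` statement, with its declared column types, followed by the `INSERT` statements, so loading it elsewhere recreates the table.

```python
import sqlite3

# Source database with a small, made-up table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE officials (name TEXT, district INTEGER, salary REAL)")
src.executemany(
    "INSERT INTO officials VALUES (?, ?, ?)",
    [("Adams", 1, 112500.0), ("Baker", 2, 112500.0)],
)
src.commit()

# iterdump() yields the SQL statements that rebuild the database.
dump_sql = "\n".join(src.iterdump())

# In practice you would save dump_sql to a .sql file and send it along;
# here it is loaded straight into a second, empty database.
dest = sqlite3.connect(":memory:")
dest.executescript(dump_sql)
restored = dest.execute(
    "SELECT name, district, salary FROM officials ORDER BY name"
).fetchall()
print(restored)
```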

Finally, there’s the Web. Know many popular Web sites that run off an Access database? Me neither. If all you know is Access and not the underlying SQL concepts, your transition to popular server software such as MySQL is going to be more difficult. Sure, you say, but it’s better than nothing. But as far as the Web goes, Access is almost nothing itself. So why would you teach a program that has very little future on the Web, the platform of today and tomorrow?

The Case for SQLite

SQLite is my choice for the candidate to replace Access in journalism education. In addition to the advantages listed above, it’s also easy to “install.” If you can download files, unzip them and move them to a location on your hard drive, you can “install” SQLite. If you can install a Firefox add-on, you can manage it in the browser. And you can take your database files home with you or email them around. The add-on supports importing CSV files, SQL dumps and XML (although all databases can have issues with importing XML). It looks and works the same on a PC or a Mac. Most importantly, it demands an understanding of SQL that you can avoid when learning Access.
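The Firefox add-on handles CSV import through its interface, but the same step can also be scripted. Here is a minimal sketch with Python's standard `csv` and `sqlite3` modules, assuming a CSV file with a header row; the `foia_requests` table, its column names and its rows are hypothetical. Because CSV carries no type information, every column arrives as text in this sketch, which is why the query casts `requests` to an integer before summing.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk; the columns and rows are made up.
csv_text = """agency,year,requests
EPA,2008,120
DOJ,2008,340
EPA,2009,95
"""


def load_csv(conn, table, fileobj):
    """Create a table from the CSV header row and insert the remaining rows."""
    reader = csv.reader(fileobj)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Passing a file name such as "foia.db" instead of ":memory:" produces a
# single database file you can copy, take home or email.
conn = sqlite3.connect(":memory:")
load_csv(conn, "foia_requests", io.StringIO(csv_text))
totals = conn.execute(
    "SELECT agency, SUM(CAST(requests AS INTEGER)) FROM foia_requests "
    "GROUP BY agency ORDER BY agency"
).fetchall()
print(totals)
```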

When I first learned SQL at an IRE bootcamp, we were using FoxPro, and we learned how to type the SQL commands. That knowledge only becomes more valuable as you learn the limits and possibilities of SQL. Journalism educators and trainers should commit to teaching SQL on the broadest platform possible and with an emphasis on the syntax and meaning of the language itself, not on which buttons to click. Otherwise we risk sending students out into this new journalism world even less prepared to handle data intelligently, and I don’t think we can afford that.

No, Really, Show Us The Data

March 25th, 2009  |  Published in FOIA, Fed Data | Comments (8)

When it first appeared I was really excited to see Show Us The Data, which gave visitors a chance to list and vote for their “Most Requested Documents” that should be more readily available from the federal government. Sure enough, there were plenty of strong choices for the top 10 list. And then people started voting, and the results were not quite what I had hoped to see. Yes, the items that comprise the Top 10 List (irony alert! it’s a PDF) are worthy documents, but some of them (the Supreme Court website?) reflect a lack of familiarity with the government information that’s truly buried.

What follows is my entirely subjective, data-heavy and document-light version. It’s Congress-heavy, because the executive branch has done much, much better in many ways. No, really, show us the data:

  1. Congressional committee votes. As far as I know, only commercial companies like CQ possess this information in the aggregate. Most committees publish them in committee reports (House Judiciary is one of the better ones) without a standard format and in such a way as to make their gathering prohibitively expensive. And yet these are some of the most telling public actions lawmakers make.
  2. Earmarks. If you don’t think the Appropriations Committees have a database of earmarks, you’re naive. Of course they do; it’s valuable information. Now, about sharing it in anything but an image PDF format… well, let’s just say that Keith Ashdown and the folks at Taxpayers for Common Sense probably aren’t going away soon.
  3. Foreign Travel Reports (Codels). The House publishes PDFs and text files of this data, but they are formatted for reading, not analysis. It would not be hard to change this.
  4. Legal Defense Funds. It’s utterly ridiculous that while House members now file their campaign reports electronically, legal defense fund reports are still filed on paper. This is a no-brainer.
  5. Senate Votes in XML. Go ahead, view source on this page. See where the HTML comment says “****** vote_111_1_00110.xml …”? They already generate these files, but the public can’t have them; they’re only for the use of Senators. There’s absolutely no reason the Senate cannot join the House in doing this, so why won’t they? Update: they have!
  6. Senior Executive Service. This one is particularly egregious, in that the information on senior-level political appointees in the executive branch previously was made available in database-friendly formats, but now is only available via PDF. So OPM chose to make the information less useful.
  7. High-Level Diplomatic Visits. Another “I can’t believe it’s not a database” entry. The State Department offers a list of visits by foreign leaders and lists of visits by the president and secretary.
  8. The CIA World Factbook. Oh, you can download the PDF, but (and I am not making this up): “the search software resides on our server and cannot be distributed with the World Factbook.” Thanks!
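To illustrate why a machine-readable format matters, here is a minimal sketch that tallies a roll-call vote by party from an XML file using Python's `xml.etree.ElementTree`. The XML layout (a `roll_call` element containing `member` elements with `party` and `vote` attributes) is a made-up example and is not the actual schema of the Senate's vote files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A made-up roll-call file. Element and attribute names are hypothetical
# and do NOT reflect the Senate's real schema.
vote_xml = """
<roll_call congress="111" session="1" number="110">
  <member name="Senator A" party="D" vote="Yea"/>
  <member name="Senator B" party="R" vote="Nay"/>
  <member name="Senator C" party="D" vote="Yea"/>
  <member name="Senator D" party="R" vote="Yea"/>
</roll_call>
"""

root = ET.fromstring(vote_xml.strip())

# Count (party, vote) pairs across every member element.
tally = Counter((m.get("party"), m.get("vote")) for m in root.iter("member"))
for (party, vote), count in sorted(tally.items()):
    print(party, vote, count)
```

Doing the same with a PDF or a page formatted for reading means scraping or retyping first, which is exactly the cost the list above complains about.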

That’s eight, and I can already think of some more. What’s on your list? Actual federal data, please, as opposed to documents that are valuable for their full-text content. I’m sure I’m missing some that should be on here.

Come Join Django-District!

March 18th, 2009  |  Published in django | Comments (0)

Thanks to the tireless efforts of Rich Leland, there’s now a django-district user group and mailing list. Django users in the DC area, please sign up and participate as we start to set a schedule for meetings and other events. Rich’s employer, Discovery Communications in Silver Spring, MD, will be hosting group meetings, which is fantastic news. The first meeting should be sometime after PyCon, perhaps in mid-April.

Announcing Python-NYTCongressAPI

March 1st, 2009  |  Published in Python | Comments (3)

So we’ve got this Congress API at The Times, which already has spawned a Ruby shell by Patrick Ewing (a shell being a really neat idea, as well as useful in building apps). I decided to try and get my feet wet with writing a Python library to interact with the API, too, and the initial result is now on Github.

It’s very basic, and owes much to the example put forth by James Turk’s python wrapper for the Sunlight API. At the suggestion of Sean Gillies, I’ve made the calls more in the style of Django’s ORM by using get() and filter() functions. So far, I’ve got functions for members and roll call votes, and I’ll be adding committees soon. Please feel free to fork the code or send suggestions my way.
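Here is a rough idea of what Django-ORM-style `get()` and `filter()` calls can look like in an API wrapper. This is a minimal sketch and not the code in the library: the `MemberClient` class, the placeholder base URL, the URL paths and the `results` response field are all hypothetical. The HTTP call is injectable so the example can run offline against sample data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class MemberClient:
    """Hypothetical ORM-style wrapper. The base URL, paths and response
    shape are placeholders, not the actual Congress API."""

    def __init__(self, api_key, base_url="https://api.example.com/congress", fetch=None):
        self.api_key = api_key
        self.base_url = base_url
        # fetch(url) -> dict; defaults to an HTTP GET that decodes JSON.
        self.fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        with urlopen(url) as resp:
            return json.load(resp)

    def _url(self, path, **params):
        params["api-key"] = self.api_key
        return f"{self.base_url}/{path}.json?{urlencode(params)}"

    def get(self, member_id):
        """Return a single member record, like Model.objects.get()."""
        return self.fetch(self._url(f"members/{member_id}"))["results"][0]

    def filter(self, chamber, **criteria):
        """Return a chamber's members matching every keyword, like .filter()."""
        members = self.fetch(self._url(f"{chamber}/members"))["results"]
        return [m for m in members if all(m.get(k) == v for k, v in criteria.items())]


# Offline sample data and a stand-in for the HTTP call.
SAMPLE = [
    {"id": "A000001", "name": "Member A", "party": "D", "state": "VA"},
    {"id": "B000002", "name": "Member B", "party": "R", "state": "VA"},
    {"id": "C000003", "name": "Member C", "party": "D", "state": "MD"},
]


def fake_fetch(url):
    if "/senate/members.json" in url:
        return {"results": SAMPLE}
    member_id = url.split("/members/")[1].split(".json")[0]
    return {"results": [m for m in SAMPLE if m["id"] == member_id]}


client = MemberClient("YOUR-KEY", fetch=fake_fetch)
print(client.get("B000002")["name"])
print([m["name"] for m in client.filter("senate", party="D")])
```

As in the Django ORM, `get()` returns one record and `filter()` returns a list. This sketch filters client-side after fetching a whole chamber; an API that accepts query parameters would let the server do that work instead.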

Previously


Feb 24, 2009
Django-NYC

by Derek | 8 Comments

Last Tuesday, my colleague Andrei Scheinkman and I were invited to talk about one of our projects at The Times, Represent, before a group of New York City-area Django developers. Unlike similar user groups that are more broadly centered on Python, this group (about 25-30 people attended) was focused on using Django, so it was [...]


Dec 19, 2008
Represent and GeoDjango

by Derek | 7 Comments

For more details on Represent, see our post on the NYT’s Open blog.
Today (well, technically last night) at work we launched a beta Web application called Represent that helps New York City residents keep tabs on what their elected officials are doing. It’s the product of an idea that my colleague Andrei Scheinkman suggested when [...]

About The Scoop

Derek Willis’ weblog on investigative and computer-assisted reporting.


Contributors

  • Derek
  • Matt



  • ©2009 The Scoop