The Scoop

Object-Oriented CAR

October 31st, 2007  |  Published in Car Tools  |  4 Comments

As Adrian points out below, this might be more properly titled Data Normalization and CAR.

Like a lot of CAR folks, I got my start working with rows in tables. For a given story or project, you might need one database table or 20, but basically you “designed” the database - if you had that option - with the idea of getting the best results for the story. Queries were saved, mostly, and the data was kept separate since it made sense to have data associated with the project or story you were working on. The result was directories full of standalone tables or database servers with silos of data that had few obvious ways to connect them.

Working with the Web made me change the way I think about data, and for the better. I’ll stipulate upfront that I’ve never been the most organized person on the planet, but once I wrapped my head around the principles of object-oriented design, the implications for CAR folks were profound. It helped that I’ve been using a Web framework built around OOP principles, too, which is yet another reason why you should learn how this stuff works.

Consider a collection of political candidates, something news organizations routinely compile. What you want to know about a candidate can depend upon what office he or she is seeking and when. That usually means that news organizations end up dividing candidates into tables based on geography or office - local candidates versus statewide or national candidates, for example. But when you have separate tables created over time, their columns might change in ways that make it harder to use the entire set of candidate data, in case you want to track candidacies over time and jurisdiction.

Using OOP principles, you might have a Candidate class, which has certain characteristics (name, date of birth, hometown). Election-specific candidate information could be represented by a subclass of Candidate, say CityCandidate. CityCandidate could add additional characteristics that aren’t relevant to candidates for the House or Senate, for example. But the underlying information common to all candidates is kept in the same format, since it is inherited from the original Candidate class. This means that no matter how many types of candidates you have, all of the common data is stored in the same way.
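As a rough illustration of that idea, here is a minimal Python sketch. The Candidate and CityCandidate names come from the paragraph above; the ward and nonpartisan_race fields, the common_fields helper, and the sample people are made up for the example.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Candidate:
    # Characteristics shared by every candidate, whatever the office.
    name: str
    date_of_birth: date
    hometown: str


@dataclass
class CityCandidate(Candidate):
    # Hypothetical city-only fields; the defaults keep them optional.
    ward: str = ""
    nonpartisan_race: bool = True


def common_fields(candidate: Candidate) -> dict:
    """Return only the fields defined on the base Candidate class."""
    return {f.name: getattr(candidate, f.name) for f in fields(Candidate)}


# Invented sample data, not real candidates.
mayor_hopeful = CityCandidate(
    name="Pat Example",
    date_of_birth=date(1960, 5, 17),
    hometown="Springfield",
    ward="Ward 3",
)
senate_hopeful = Candidate("Lee Sample", date(1955, 1, 2), "Capital City")

# Both kinds of candidate expose the shared data in the same shape.
for person in (mayor_hopeful, senate_hopeful):
    print(common_fields(person))
```

Because CityCandidate inherits from Candidate, any code written against the common fields works on both, while the city-specific extras stay out of the way of House or Senate candidates.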

In a similar way, OOP (and data normalization) encourages you to design a database in which objects, rather than rows, are valuable. When you’re designing that candidates table for the first time, you might not think to make the election itself a separate table linked by a foreign key, since you’re probably only concerned with a single election. OOP principles prod you to make your database design as accommodating as it can be without sacrificing usefulness. That way, adding more elections means adding new election objects, not reworking your candidates table (or worse, creating another table that’s similar).
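One way to picture this is the sketch below, which uses Python's built-in sqlite3 module. The table and column names are assumptions for illustration, and the candidacy linking table goes a small step beyond the text so that one person can appear in more than one election; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical schema: elections and candidates are separate objects,
# and a candidacy row ties one candidate to one election.
conn.executescript("""
    CREATE TABLE election (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        held_on TEXT NOT NULL
    );
    CREATE TABLE candidate (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        hometown TEXT
    );
    CREATE TABLE candidacy (
        id INTEGER PRIMARY KEY,
        candidate_id INTEGER NOT NULL REFERENCES candidate(id),
        election_id INTEGER NOT NULL REFERENCES election(id),
        office TEXT NOT NULL
    );
""")

# Invented sample data.
conn.execute(
    "INSERT INTO candidate (id, name, hometown) VALUES (1, 'Pat Example', 'Springfield')"
)
conn.executemany(
    "INSERT INTO election (id, name, held_on) VALUES (?, ?, ?)",
    [(1, "2005 city general", "2005-11-08"), (2, "2007 city general", "2007-11-06")],
)
# A new election is just new rows; the candidate table is untouched.
conn.executemany(
    "INSERT INTO candidacy (candidate_id, election_id, office) VALUES (?, ?, ?)",
    [(1, 1, "City Council"), (1, 2, "Mayor")],
)

history = conn.execute("""
    SELECT c.name, e.name, cy.office
    FROM candidacy AS cy
    JOIN candidate AS c ON c.id = cy.candidate_id
    JOIN election AS e ON e.id = cy.election_id
    ORDER BY e.held_on
""").fetchall()

for row in history:
    print(row)
```

Tracking a candidate across elections then becomes a join rather than a hunt through several similar tables.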

Most of the CAR folks I know who have dabbled with object-oriented programming languages like Python or Ruby initially were fairly lost, and I definitely include myself in that statement. It didn’t make a lot of sense, mostly because it wasn’t the way that we were used to thinking about databases. But once you clear that conceptual hurdle, OOP opens up a whole new world to you and encourages database design that is useful for the current project and beyond.

Responses

  1. Adrian Holovaty says:

    October 31st, 2007 at 1:28 pm (#)

    I think what you’re describing here is more basic data normalization (the third normal form) than object-oriented programming, per se.

  2. Derek says:

    October 31st, 2007 at 1:58 pm (#)

    See, I knew I’d expose my relative understanding of this! And I hope to write a little more about this as I get more experience.

    For me, the advantages of normalization became easier to appreciate when viewed through the prism of objects rather than rows. A lot of what I learned about building large databases made normalization seem a distant and not always desirable concept.

  3. Peter Harkins says:

    October 31st, 2007 at 4:52 pm (#)

    There’s also the related issue of object-relational impedance. Objects only mostly map to tables. I think this guy may be on the right track… well, I know he certainly is when it comes to working with the current Django ORM that doesn’t do inheritance at all. ActiveRecord is a little better, though its Single Table Inheritance scheme is not exactly a ten-gallon bucket of awesome.

    I’ve been meaning for a few months to check out sqlalchemy, I wonder how it deals with the issue.

  4. Matthew Wensing says:

    November 5th, 2007 at 9:24 am (#)

    Keep in mind: normalization is well and good, but it necessarily costs you some legibility. Hence, when confronted with the massive quantities of data that flow into a newsroom, I believe you should also consider constructing a database that is more of a data warehouse than a paragon of normalization, particularly if the database is going to be consumed internally for years to come.

    Just my thoughts. Interesting post.
