The Scoop


XML Parsing with Python

April 29th, 2005  |  Published in Python, XML  |  1 Comment

Let’s face it: relational database types don’t like XML files. The files are structured, sure, but not in quite the way we’re used to. Pulling them apart is a chore for which there are many tools, but few seem to fit easily into the CAR (computer-assisted reporting) mindset. Enter Python and the Amara toolkit. Amara builds on 4Suite, which processes XML and RDF, and it works in a very Pythonic way, essentially turning XML data into Python objects. If I have to parse XML into a relational database, Amara is my tool of choice.

As an example, consider the weather data that the National Weather Service publishes as XML. Here’s the file with information from San Francisco, which includes the temperature (temp_f), wind direction (wind_dir) and relative humidity (relative_humidity), among other things. Amara “binds” the XML file to a Python structure, which means you can refer to any XML element as an attribute of a Python object. Here’s an example using three of the NWS files:


from amara import binderytools  # Amara requires 4Suite to be installed
import urllib

base_url = 'http://weather.gov/data/current_obs/'
# Station files: San Francisco, South Lake Tahoe and Miami
files = ['KSFO.xml', 'KTVL.xml', 'KMIA.xml']

for filename in files:
    # Download the station's XML file to a local copy
    urllib.urlretrieve(base_url + filename, 'f.xml')
    # Bind the XML document to a Python object
    doc = binderytools.bind_file('f.xml')
    obs = doc.current_observation
    # The trailing '\n' leaves a blank line between stations
    print obs.location, obs.weather, str(obs.temp_f) + '\n'

For each XML file, Amara binds the document to a Python object (doc), and the script prints three data elements by referring to them as attributes of that object (note that Amara lets you pick out some or all of the elements). CAR folks can think of it as calling field names, and instead of printing the elements you could insert them into a database. Nice and easy, the way everybody says XML should be. Here’s the output:

San Francisco Intl Airport, CA Partly Cloudy 64

South Lake Tahoe, CA Overcast 52

Miami International Airport, FL A Few Clouds 80
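
The database step mentioned above might look something like the following. This is only a sketch, and it swaps in tools that are not part of the original example: it uses the standard library's xml.etree.ElementTree and sqlite3 (Python 3) rather than Amara, and it reads an inline sample document instead of fetching a file. The sample XML, the table name observations and the helper functions are all hypothetical; the sample only assumes the current_observation, location, weather and temp_f elements used in the script above, and its wind_dir and relative_humidity values are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an NWS current-observation file.
# A real file carries many more elements; the wind_dir and
# relative_humidity values here are made up for illustration.
SAMPLE_XML = """<current_observation>
  <location>San Francisco Intl Airport, CA</location>
  <weather>Partly Cloudy</weather>
  <temp_f>64</temp_f>
  <wind_dir>West</wind_dir>
  <relative_humidity>67</relative_humidity>
</current_observation>"""


def parse_observation(xml_text):
    """Pick three elements out of one observation document."""
    root = ET.fromstring(xml_text)  # root is <current_observation>
    return (
        root.findtext("location"),
        root.findtext("weather"),
        float(root.findtext("temp_f")),  # fails if temp_f is missing
    )


def store_observations(conn, xml_documents):
    """Create the table if needed and insert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(location TEXT, weather TEXT, temp_f REAL)"
    )
    rows = [parse_observation(text) for text in xml_documents]
    # Parameterized insert: the element values become field values
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    store_observations(conn, [SAMPLE_XML])
    query = "SELECT location, weather, temp_f FROM observations"
    for row in conn.execute(query):
        print(row)
```

Each element name maps to a column, which is the "field names" idea in practice. To load real data, the downloaded contents of each station file would be passed in place of SAMPLE_XML.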

Responses

  1. Copia says:

    May 1st, 2005 at 10:24 am (#)

    Another on Amara

    One thing that I’ve especially appreciated about feedback on Amara is
    the way users cite it as an example of the essential power of Python,
    and why it is a draw from even outside of Python.
