The Scoop

XML Parsing with Python

April 29th, 2005  |  Published in Python, XML  |  1 Comment

Let’s face it: relational database types don’t like XML files. The files are structured, sure, but not in quite the way we’re used to, so pulling them apart is a chore. There are many tools for the job, but few seem to fit easily into the CAR (computer-assisted reporting) mindset. Enter Python and the Amara toolkit. Amara builds on 4Suite, which processes XML and RDF, and it works in a very Pythonic way by essentially turning XML data into Python objects. If I have to parse XML into a relational database, Amara is my tool of choice.

As an example, consider the weather data available in XML from the National Weather Service. The file for San Francisco includes the temperature (temp_f), wind direction (wind_dir) and relative humidity (relative_humidity), among other things. Amara “binds” the XML file to a Python structure, which means that you can refer to any of the XML elements as an attribute of a Python object. Here’s an example using three of the NWS files:


# Python 2 syntax (print statement, urllib.urlretrieve).
# Note that Amara requires 4Suite to work.
from amara import binderytools
import urllib

base_url = 'http://weather.gov/data/current_obs/'
files = ['KSFO.xml', 'KTVL.xml', 'KMIA.xml']

for filename in files:
    # Download the station's file to a local copy, then bind it to a Python object.
    urllib.urlretrieve(base_url + filename, 'f.xml')
    doc = binderytools.bind_file('f.xml')
    obs = doc.current_observation
    # Child elements are read as attributes; the trailing newline separates records.
    print obs.location, obs.weather, str(obs.temp_f) + '\n'

For each of the XML files, Amara binds it to a Python object (doc) and then prints out three data elements, referring to them as attributes of the object (note that Amara can pick out some or all of the elements). CAR folks can think of it as calling field names, and instead of printing out elements you can insert them into a database. Nice and easy - the way everybody says XML should be. Here’s the output:

San Francisco Intl Airport, CA Partly Cloudy 64

South Lake Tahoe, CA Overcast 52

Miami International Airport, FL A Few Clouds 80
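
The post mentions that the same elements could be inserted into a database instead of printed. Here is a minimal sketch of that step, with some loud assumptions: it uses Python 3 and the standard library’s xml.etree.ElementTree and sqlite3 rather than Amara, and it parses trimmed inline stand-ins for the NWS files (containing only the three elements and values shown in the output above) instead of downloading them. The table name observations, the SAMPLE_FILES dictionary and the helper functions are all hypothetical names chosen for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: trimmed stand-ins for the NWS files, holding only the three
# elements used in the post, with the values from the printed output.
SAMPLE_FILES = {
    "KSFO.xml": "<current_observation><location>San Francisco Intl Airport, CA</location>"
                "<weather>Partly Cloudy</weather><temp_f>64</temp_f></current_observation>",
    "KTVL.xml": "<current_observation><location>South Lake Tahoe, CA</location>"
                "<weather>Overcast</weather><temp_f>52</temp_f></current_observation>",
    "KMIA.xml": "<current_observation><location>Miami International Airport, FL</location>"
                "<weather>A Few Clouds</weather><temp_f>80</temp_f></current_observation>",
}


def extract_row(xml_text):
    """Pull location, weather and temp_f out of one <current_observation> document."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("location"),
        root.findtext("weather"),
        float(root.findtext("temp_f")),
    )


def load_observations(conn, documents):
    """Create the (hypothetical) observations table and insert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(location TEXT, weather TEXT, temp_f REAL)"
    )
    rows = [extract_row(text) for text in documents]
    # Parameterized inserts: each XML element becomes a column value.
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_observations(conn, SAMPLE_FILES.values())
    for row in conn.execute("SELECT location, weather, temp_f FROM observations"):
        print(row)
```

With Amara, extract_row would presumably read the same values as attributes of the bound object, as in the example above; the database half of the sketch would stay the same.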

Responses

  1. Copia says:

    May 1st, 2005 at 10:24 am (#)

    Another on Amara

    One thing that I’ve especially appreciated about feedback on Amara is
    the way users cite it as an example of the essential power of Python,
    and why it is a draw from even outside of Python.
