XML Parsing with Python
April 29th, 2005 | Published in Python, XML | 1 Comment
Let’s face it: relational database types don’t like XML files. They’re structured, sure, but not in the way we’re used to. So pulling them apart is a chore, and while there are many tools for the job, few seem to fit easily into the CAR mindset. Enter Python and the Amara toolkit. Amara builds on 4Suite, a library for processing XML and RDF, and works in a very Pythonic way by turning XML data into Python objects. If I have to parse XML into a relational database, Amara is my tool of choice.
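To make the “XML into Python objects” idea concrete, here is a minimal sketch of what binding means. It is not Amara’s implementation or API. It uses only Python’s standard xml.etree.ElementTree module, and the Bound class, the bind_string function and the sample document are hypothetical illustrations. The sample is trimmed to three fields, with values matching the San Francisco observation used in this post. Each attribute lookup simply finds the first child element with that tag name.

```python
import xml.etree.ElementTree as ET


class Bound:
    """Hypothetical, simplified stand-in for an Amara-style bound element."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Called only when normal lookup fails: find the first child
        # element whose tag matches the attribute name
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Bound(child)

    def __str__(self):
        # An element's string form is its stripped text content
        return (self._element.text or "").strip()


def bind_string(xml_text):
    # Wrap the root in a document node so the root element is itself
    # reachable as an attribute, as in doc.current_observation
    root = ET.fromstring(xml_text)
    wrapper = ET.Element("document")
    wrapper.append(root)
    return Bound(wrapper)


sample = """<current_observation>
  <location>San Francisco Intl Airport, CA</location>
  <weather>Partly Cloudy</weather>
  <temp_f>64</temp_f>
</current_observation>"""

doc = bind_string(sample)
obs = doc.current_observation
print(obs.location, obs.weather, obs.temp_f)
```

This toy version only follows the first matching child and ignores XML attributes and namespaces; a real binding toolkit such as Amara handles much more.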
As an example, consider the weather data that the National Weather Service publishes in XML. The San Francisco file (KSFO.xml) includes the temperature (temp_f), wind direction (wind_dir) and relative humidity (relative_humidity), among other things. Amara “binds” the XML file to a Python structure, which means you can refer to any of the XML elements as attributes of an object. Here’s an example using three of the NWS files:
from amara import binderytools
# Note: Amara requires 4Suite to work
import urllib

base_url = 'http://weather.gov/data/current_obs/'
files = ['KSFO.xml', 'KTVL.xml', 'KMIA.xml']
for station_file in files:
    # Download this station's current-observation XML to a local file
    urllib.urlretrieve(base_url + station_file, 'f.xml')
    # Bind the XML document to a tree of Python objects
    doc = binderytools.bind_file('f.xml')
    obs = doc.current_observation
    # Each child element is reachable as an attribute (Python 2 print statement)
    print obs.location, obs.weather, obs.temp_f
For each XML file, Amara binds the document to a Python object (doc), and the script prints three data elements, referring to them as attributes of that object (note that Amara can pick out some or all of the elements). CAR folks can think of it as calling field names, and instead of printing the elements you can insert them into a database. Nice and easy, just the way everybody says XML should be. Here’s the output:
San Francisco Intl Airport, CA Partly Cloudy 64
South Lake Tahoe, CA Overcast 52
Miami International Airport, FL A Few Clouds 80
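Here is one hedged sketch of that database step, assuming SQLite as the relational database. It swaps in the standard library’s xml.etree.ElementTree for Amara and uses the sqlite3 module; the observations table, its columns, the observation_row helper and the sample document are all hypothetical. Element names map directly onto column names, much like calling field names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample trimmed to three fields; values match the
# San Francisco line of the output above
sample = """<current_observation>
  <location>San Francisco Intl Airport, CA</location>
  <weather>Partly Cloudy</weather>
  <temp_f>64</temp_f>
</current_observation>"""


def observation_row(xml_text):
    """Pull the chosen fields out of one current_observation document."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("location"),
        root.findtext("weather"),
        float(root.findtext("temp_f")),  # store temperature as a number
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (location TEXT, weather TEXT, temp_f REAL)"
)
# The ? placeholders let sqlite3 quote the values itself
conn.execute(
    "INSERT INTO observations (location, weather, temp_f) VALUES (?, ?, ?)",
    observation_row(sample),
)
conn.commit()

for row in conn.execute("SELECT location, weather, temp_f FROM observations"):
    print(row)
```

For the three NWS files, you would call observation_row once per downloaded file and run the INSERT inside the loop.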
May 1st, 2005 at 10:24 am
Another on Amara
One thing that I’ve especially appreciated about feedback on Amara is the way users cite it as an example of the essential power of Python, and as an explanation of why Python is a draw even to people outside the Python world.