The Scoop

Using RSS in the Newsroom

Derek Willis
The Washington Post
NICAR 2006
(a copy of this handout can be found at http://www.thescoop.org/projects/irenicar/)

What is RSS?

RSS is a syndication format for delivering content over the Web: text, pictures, audio, video, whatever. It’s all enclosed in XML, which means you can turn a feed into data. It also works the other way: you can scrape HTML or other formats into RSS and push the result out to the newsroom.
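
To make the "it's XML, so it's data" point concrete, here is a minimal sketch that reads a feed with Python's standard library and turns each item into a dictionary. The feed text, the example.com URLs and the `rss_items` helper are made up for illustration; a real feed would be fetched from the web and would usually carry more elements.

```python
import xml.etree.ElementTree as ET

# A made-up two-item feed, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Newsroom Feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def rss_items(xml_text):
    """Return each <item> as a dict mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in item} for item in root.iter("item")]


rows = rss_items(SAMPLE_RSS)
for row in rows:
    print(row["title"], row["link"])
```

Once the items are plain dictionaries, they can be filtered, counted or loaded into a database like any other data.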

Learning About RSS

The technical definition and spec can be found at Harvard. (Pay attention to the required elements.)
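
As a rough illustration of why the required elements matter: the RSS 2.0 spec requires every channel to have a title, a link and a description. The sketch below checks a feed for those three, using only the standard library. The function name and the sample feed are hypothetical, and this is a quick sanity check, not a full validator.

```python
import xml.etree.ElementTree as ET

# The three channel elements that RSS 2.0 requires.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def missing_channel_elements(xml_text):
    """List the required channel elements that are absent or empty."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return list(REQUIRED_CHANNEL_ELEMENTS)
    return [
        name
        for name in REQUIRED_CHANNEL_ELEMENTS
        if not (channel.findtext(name) or "").strip()
    ]


# A made-up feed that forgot its description.
INCOMPLETE_FEED = (
    '<rss version="2.0"><channel>'
    "<title>Example</title><link>http://example.com/</link>"
    "</channel></rss>"
)

missing = missing_channel_elements(INCOMPLETE_FEED)
print(missing)
```

A check like this is handy before publishing a feed you built yourself, since some readers reject feeds that leave these elements out.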

Turning RSS into Data

I recommend Feed Parser, a Python module. Feed Parser turns RSS elements into Python objects, which can be inserted into SQL databases.

Some other options:
Perl: http://www.petercooper.co.uk/archives/000995.html
Perl: http://tageloehner.de/rss2sql.pl.txt
PHP: http://kynikeren.com/tech/category/feedparser/

Turning HTML into RSS – an example

FEC press releases. This Python script scrapes HTML from the FEC web site and builds an RSS feed.
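
The general technique can be sketched with the standard library alone: collect the links from a page, then write each one out as an RSS item. This is not the FEC script itself. The HTML snippet, the example.com URLs, and the `LinkCollector` and `html_to_rss` names are all invented for illustration, and a real page would need more selective scraping than "every link".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Made-up HTML standing in for a press release index page.
SAMPLE_HTML = """
<ul>
  <li><a href="/press/release1.html">Agency issues first release</a></li>
  <li><a href="/press/release2.html">Agency issues second release</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_rss(html, base_url, feed_title):
    """Build an RSS 2.0 document with one item per link found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Scraped from " + base_url
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        # Resolve relative links so feed readers get absolute URLs.
        ET.SubElement(item, "link").text = urljoin(base_url, href)
    return ET.tostring(rss, encoding="unicode")


feed = html_to_rss(SAMPLE_HTML, "http://example.com/", "Example press releases")
print(feed)
```

Run on a schedule and saved to a web-accessible file, output like this gives the newsroom a feed for a site that does not offer one.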

Turning SQL into RSS


RSSonate
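
The idea behind this section, turning the rows of a SQL query into feed items, can be sketched as follows. This is not RSSonate's code. It uses an in-memory SQLite database so it runs anywhere, and the `bills` table, its contents and the `rows_to_rss` helper are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_rss(rows, feed_title, feed_link, feed_description):
    """Build an RSS 2.0 document from (title, link, description) rows."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description
    for title, link, description in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


# Hypothetical table and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (bill TEXT, url TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?)",
    [
        ("SB 2", "http://example.com/sb2", "An act about roads"),
        ("HB 1", "http://example.com/hb1", "An act about schools"),
    ],
)

# Any query that returns title, link and description columns will do.
rows = conn.execute(
    "SELECT bill, url, description FROM bills ORDER BY bill"
).fetchall()
conn.close()

feed_xml = rows_to_rss(
    rows, "Example bills", "http://example.com/", "Hypothetical bill feed"
)
print(feed_xml)
```

The same function would work with rows from MySQL or any other database, since it only needs a sequence of three-value tuples.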

Another example: Delaware legislation signed into law (feed). The following Python code, using Feed Parser, inserts it into a MySQL database:


"""
delaws.py - an example of turning RSS into SQL data.
Derek Willis, The Washington Post
dwillis@gmail.com
March 2006

This script turns an RSS feed into a MySQL database table, using the feedparser module to select specific attributes within the feed.

"""

# import required modules
import time

import feedparser
import MySQLdb

# set up connection to MySQL database (replace the placeholder credentials)
db = MySQLdb.connect(host='host', user='user', passwd='pass', db='dbname')
cursor = db.cursor()

# fetch feed and turn into feedparser object
d = feedparser.parse('http://www.legis.state.de.us/LIS/LIS143.NSF/GovSignedFeed.xml')

# processing loop - for each entry, select certain attributes and insert them
# into a MySQL table, converting the RSS entry date into a MySQL date (YYYY-MM-DD)
for entry in d.entries:
    # date_parsed is a time tuple; format it as a zero-padded date string
    mysql_date = time.strftime('%Y-%m-%d', entry.date_parsed)
    cursor.execute(
        """INSERT INTO delaws (bill, date, description, url) VALUES (%s,%s,%s,%s)""",
        (entry.title, mysql_date, entry.description, entry.link),
    )

# commit insertions to table (needed for transactional tables such as InnoDB)
db.commit()
db.close()
