<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Scoop</title>
	<atom:link href="http://blog.thescoop.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.thescoop.org</link>
	<description>Derek Willis' weblog on investigative and computer-assisted reporting.</description>
	<lastBuildDate>Mon, 08 Mar 2010 01:24:06 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>2010 CAR Conference</title>
		<link>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/</link>
		<comments>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 01:24:06 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5325</guid>
		<description><![CDATA[The 2010 CAR Conference begins on Thursday, and here are some of the sessions I&#8217;m trying not to miss:
Thursday, March 11

Big Data: Analyzing legislation with machine learning. Always good to hear what Chase Davis has been up to.
Open Source GIS. Now that mapping is more and more accessible, it pays to stay on top of [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://data.nicar.org/CAR2010">2010 CAR Conference</a> begins on Thursday, and here are some of the sessions I&#8217;m trying not to miss:</p>
<p><strong>Thursday, March 11</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3757">Big Data: Analyzing legislation with machine learning</a>. Always good to hear what Chase Davis has been up to.</li>
<li><a href="http://data.nicar.org/node/3759">Open Source GIS</a>. Now that mapping is more and more accessible, it pays to stay on top of what people are using.</li>
<li><a href="http://data.nicar.org/node/3763">Juice up your stories with advanced methods</a>. After a few months in the academy, Sarah Cohen should have some good stuff to share.</li>
<li><a href="http://data.nicar.org/node/3766">Some lightning talks!</a></li>
</ul>
<p><strong>Friday, March 12</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3893">Semantic tagging and DocumentCloud</a>. Really need to get more in-depth on this.</li>
<li><a href="http://data.nicar.org/node/3773">New Frontiers in Reporting Tools</a>. More of the New New Stuff.</li>
<li><a href="http://data.nicar.org/node/3778">Forensic Accounting for Reporters</a>. Nice to have some outside expertise.</li>
</ul>
<p><strong>Saturday, March 13</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3787">GeoDjango &#038; OpenLayers</a>. Yes, I&#8217;m on this panel. But you should come anyway to hear from <a href="http://www.palewire.com">Ben Welsh</a> of the LA Times.</li>
<li><a href="http://data.nicar.org/node/3902">Not a programmer? Not a worry</a>. The new software from ProPublica that helps publish data on the Web. Very interested to see this in action.</li>
</ul>
<p>There&#8217;s also the Django bootcamp and plenty of opportunities for demos, discussions and debates. Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Gentle Introduction to Google App Engine</title>
		<link>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 20:41:08 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5300</guid>
		<description><![CDATA[As part of our roll-out of version 3 of the NYT Congress API, I was tasked with coming up with a sample application that uses the API to do something mildly interesting, or at least functional. I had gotten a book on Google App Engine for my birthday and was pretty excited to see that [...]]]></description>
			<content:encoded><![CDATA[<p>As part of our roll-out of version 3 of the <a href="http://developer.nytimes.com/docs/congress_api">NYT Congress API</a>, I was tasked with coming up with a sample application that uses the API to do something mildly interesting, or at least functional. I had gotten a <a href="http://oreilly.com/catalog/9780596522735">book</a> on <a href="http://code.google.com/appengine/">Google App Engine</a> for my birthday and was pretty excited to see that some of the basic philosophies of <a href="http://www.djangoproject.com/">Django</a> were either incorporated directly into GAE or were easy to adapt to it. So when I started on the sample app, I picked GAE and dove in.</p>
<p><a href="http://code.google.com/appengine/docs/python/overview.html">App Engine&#8217;s Python runtime</a>, unsurprisingly, sticks pretty close to the language&#8217;s core tenets: it uses YAML files for configuration (hey, it&#8217;s whitespace!) and can run pretty much an entire app using just 2-3 files. A NYT colleague, Derek Gottfrid, built a <a href="http://github.com/derekg/nyt-trender">sample app for our article search API</a> comprising five files, including the README. Yes, it violates the separation of logic and design that most frameworks try to respect, but it works.</p>
<p>GAE provides the basic building blocks a lot of Web apps need, nearly all optional: a backend in Datastore, a URL Fetch service that is wrapped by Python&#8217;s familiar urllib and urllib2 libraries, mail and messaging services and memcache. <a href="http://code.google.com/appengine/docs/python/tools/webapp/">Webapp</a> is a basic framework for building apps not exactly like Django but not so unfamiliar, either.</p>
<p>The <a href="http://code.google.com/appengine/docs/python/tools/devserver.html">development server</a> will be familiar to anyone who has tinkered with Django, and GAE handles static files via separate servers, which is how it should be. And since it comes with a version of Django built-in, you can bring along some handy utilities, like simplejson, with a single import statement. And as I said earlier, you don&#8217;t have to separate display logic into template files, but you can, and the syntax is nearly identical to Django templates.</p>
<p><a href="http://nytcongress.appspot.com/">The sample app</a> takes two random members of the Senate and compares their voting and bill sponsorship records in the 111th Congress. <a href="http://github.com/dwillis/NYT-Congress-API-Demo">The app&#8217;s code</a> is like the app itself: fairly tightly-focused and without a lot of trappings. It&#8217;s just service calls to the API and a single template for display. In building it, I didn&#8217;t make use of any persistent storage, so I didn&#8217;t delve into <a href="http://code.google.com/appengine/docs/python/datastore/">Datastore</a>, but it looks pretty useful. One of its helpful features is that as you develop your app, it generates indexes used to help return data in the most efficient manner.</p>
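<p>For a rough sense of the comparison at the heart of the app, here&#8217;s a minimal sketch in plain Python. It isn&#8217;t the app&#8217;s code: the <code>VOTES</code> data, the senator names and the <code>pick_two</code> and <code>agreement</code> helpers are all made up for illustration, and a real version would fill in the votes from the API rather than a hard-coded dictionary.</p>
<pre>
```python
import random

# Hypothetical roll-call data: vote id -> position, keyed by senator.
# These names and votes are invented for illustration, not API output.
VOTES = {
    "Senator A": {"v1": "Yes", "v2": "No", "v3": "Yes"},
    "Senator B": {"v1": "Yes", "v2": "Yes", "v3": "Yes"},
    "Senator C": {"v1": "No", "v2": "No"},
}


def pick_two(members, rng=random):
    """Return two distinct members chosen at random."""
    return rng.sample(sorted(members), 2)


def agreement(votes_a, votes_b):
    """Return (shared, agreed) counts for votes both members cast."""
    shared = set(votes_a).intersection(votes_b)
    agreed = sum(1 for vote in shared if votes_a[vote] == votes_b[vote])
    return len(shared), agreed


first, second = pick_two(VOTES)
shared, agreed = agreement(VOTES[first], VOTES[second])
print("%s and %s agreed on %d of %d shared votes" % (first, second, agreed, shared))
```
</pre>
<p>With this made-up data, <code>agreement(VOTES["Senator A"], VOTES["Senator B"])</code> returns <code>(3, 2)</code>: three shared votes, two of them in agreement.</p>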
<p>If you&#8217;re already familiar with Django, making the small step to App Engine isn&#8217;t that big of a trip. <a href="http://www.youtube.com/watch?v=v1gTI4BOPUw">Have Guido explain things to you first</a>, and then try it out. You can also <a href="http://code.google.com/appengine/articles/appengine_helper_for_django.html">run a stripped-down version of Django on GAE</a>, and I&#8217;m looking to see if there&#8217;s a project I can adapt to try it out. In the meantime, if you want to tinker with the sample app, by all means fork it and see what else you can do with the API. And let me know what you come up with!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lightning Talks at NICAR</title>
		<link>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 22:08:04 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5315</guid>
		<description><![CDATA[This year&#8217;s computer-assisted reporting conference in Phoenix has a couple of new sessions on the schedule. One of them is an idea a couple of us have been pushing for a few years: lightning talks.
A staple of technical conferences, lightning talks are based on the notion that while 45- to 50-minute presentations are good, sometimes you [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s <a href="http://data.nicar.org/CAR2010/">computer-assisted reporting conference in Phoenix</a> has a couple of new sessions on the schedule. <a href="http://data.nicar.org/node/3766">One of them</a> is an idea a couple of us have been pushing for a few years: <a href="http://en.wikipedia.org/wiki/Lightning_Talk">lightning talks</a>.</p>
<p>A staple of technical conferences, lightning talks are based on the notion that while 45- to 50-minute presentations are good, sometimes you only need about 5 minutes to express an idea or show off an example of something interesting or useful. And I&#8217;m pretty sure, based on sessions at the hotel bar at past conferences, that there are plenty of ideas out there. Attracting and organizing them is the goal of <a href="http://ire.aronpilhofer.com/">this app</a> built by my boss, <a href="http://aronpilhofer.com/">Aron Pilhofer</a>, and launched today.</p>
<p>This is a new thing for the CAR community, but I think it could be a useful way to draw out ideas from people who aren&#8217;t doing panels or who might shrink from a 50-minute session. So let&#8217;s get started. <a href="http://ire.aronpilhofer.com/">Want to give a talk?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fumblerooski and Raw SQL in Django</title>
		<link>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 13:52:30 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5304</guid>
		<description><![CDATA[Over the weekend I was able to make some upgrades to Fumblerooski (the off-season being a good time for that), and to do so I took advantage of some of the new features in Django&#8217;s development version (with the 1.2 release being just around the corner).
(Yes, I know I shouldn&#8217;t be running beta software in [...]]]></description>
			<content:encoded><![CDATA[<p>Over the weekend I was able to make some upgrades to <a href="http://fumblerooski.org/">Fumblerooski</a> (the off-season being a good time for that), and to do so I took advantage of some of the new features in <a href="http://www.djangoproject.com/">Django</a>&#8217;s development version (<a href="http://www.djangoproject.com/weblog/2010/feb/06/12-beta-1/">with the 1.2 release being just around the corner</a>).</p>
<p>(Yes, I know I shouldn&#8217;t be running beta software in production, but even I have to say that Fumblerooski isn&#8217;t &#8220;mission-critical&#8221; software. Not yet, at least.)</p>
<p>A while back, <a href="http://twitter.com/jeffself">Jeff Self</a> suggested that he&#8217;d like to see head-to-head records for head coaches &#8211; how one person has fared against another. Me too, I thought. But how best to do that? In Fumblerooski&#8217;s models, it&#8217;s not a simple query, since a Coach has many CoachingJobs and CollegeCoach assignments. Plus the Game model refers to the individual College, not a specific head coach. So I <a href="http://github.com/dwillis/fumblerooski/commit/4ce2948cc0f9dbac135ca0e75c1127d4054c3471">added head coaches to the Game model and some utilities to populate those fields</a>. That work continues, especially for people who were head coaches earlier in the decade but are not now. That made it easy to grab games in which <a href="http://fumblerooski.org/coaches/detail/80-mack-brown/vs/32-bob-stoops/">two specific head coaches faced off</a>.</p>
<p>But to create a list of head coaches that Mack Brown has faced since 2000, I turned to one of Django&#8217;s newest features: <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#performing-raw-queries">improved raw SQL querying</a>. Much like <a href="http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002267">ActiveRecord&#8217;s find_by_sql</a> method for Rails, Model.objects.raw() makes it easy to write custom SQL when you need to, and still be able to access instantiated objects as the result.</p>
<p>You can select <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#deferring-model-fields">only those fields you need</a>, deferring others, or <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#passing-parameters-into-raw">pass parameters into the SQL</a>. And you can add additional calculated fields, too. So when I needed to pull in a QuerySet of distinct coaches that Mack Brown had faced, plus the number of games, I could do something like this:<br />
<code><br />
coach = Coach.objects.get(last_name='Brown', first_name='Mack')<br />
coach_list = Coach.objects.raw(<br />
"""SELECT college_coach.id, college_coach.slug, count(college_game.*) as games<br />
from college_coach inner join college_game on college_coach.id = college_game.coach2_id<br />
where coach1_id = %s group by 1,2 order by 3 desc""", [coach.id])<br />
</code><br />
And thus you can see <a href="http://fumblerooski.org/coaches/detail/80-mack-brown/vs/">head coaching opponents for Mack Brown</a> (or another coach). The idea isn&#8217;t new, but I really like the implementation &#8211; it&#8217;s clean and makes the mixing of Python code and SQL about as seamless as it could possibly be.</p>
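<p>If you want to poke at the shape of that query without a Django project handy, here&#8217;s a minimal sketch using Python&#8217;s built-in sqlite3 module. To be clear about the assumptions: the two tables are stripped-down stand-ins that borrow only the column names from the query above, the rows are invented, and the <code>opponents</code> helper is hypothetical. I&#8217;ve also swapped in <code>?</code> placeholders (sqlite3&#8217;s style) and <code>COUNT(college_game.id)</code>, so this is the same idea rather than the same SQL.</p>
<pre>
```python
import sqlite3

# In-memory database with stand-in tables; the schema is an assumption
# that borrows only the column names used in the raw query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE college_coach (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE college_game (
        id INTEGER PRIMARY KEY,
        coach1_id INTEGER,
        coach2_id INTEGER
    );
    INSERT INTO college_coach VALUES (1, 'coach-a'), (2, 'coach-b'), (3, 'coach-c');
    INSERT INTO college_game (coach1_id, coach2_id)
        VALUES (1, 2), (1, 2), (1, 3), (2, 3);
""")


def opponents(conn, coach_id):
    """Return (id, slug, games) rows for each coach that coach_id has faced."""
    sql = """
        SELECT college_coach.id, college_coach.slug,
               COUNT(college_game.id) AS games
        FROM college_coach
        INNER JOIN college_game
            ON college_coach.id = college_game.coach2_id
        WHERE college_game.coach1_id = ?
        GROUP BY college_coach.id, college_coach.slug
        ORDER BY games DESC
    """
    # The coach id is passed as a parameter, never formatted into the SQL.
    return conn.execute(sql, (coach_id,)).fetchall()


for row in opponents(conn, 1):
    print(row)
```
</pre>
<p>With these invented rows, coach 1 has faced coach-b twice and coach-c once, so the most frequent opponent comes back first.</p>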
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Using Geocoders with GeoDjango</title>
		<link>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/</link>
		<comments>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 01:10:21 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5288</guid>
		<description><![CDATA[Update: Simon has updated his library to make it easy to reverse the order of coordinates. Thanks!
For a &#8220;15-minute project&#8221;, Simon Willison&#8217;s geocoders library is pretty handy if you&#8217;re doing geocoding with Python. It offers a common interface to the geocoding services provided by Google, Yahoo and other sources. When we were looking at replacing [...]]]></description>
			<content:encoded><![CDATA[<p><em>Update: Simon has <a href="http://github.com/simonw/geocoders/commit/46fc00bda4cff0fe222d38fe498aa9ba861e8dc1">updated his library</a> to make it easy to reverse the order of coordinates. Thanks!</em></p>
<p>For a &#8220;<a href="http://simonwillison.net/2009/May/27/geocoders/">15-minute project</a>&#8221;, <a href="http://github.com/simonw/geocoders">Simon Willison&#8217;s geocoders library</a> is pretty handy if you&#8217;re doing geocoding with Python. It offers a common interface to the geocoding services provided by Google, Yahoo and other sources. When we were looking at replacing the home-grown geocoding system that <a href="http://andreischeinkman.com/">Andrei Scheinkman</a> built for <a href="http://prototype.nytimes.com/represent/">Represent</a>, Simon&#8217;s project seemed a natural choice.</p>
<p>It was an easy drop-in, but there was one thing about it that was just slightly off. A successful geocoding result looks like this:</p>
<p><code>(u'New York, NY, USA', (40.756053999999999, -73.986951000000005))</code></p>
<p>Notice the coordinate pair is latitude, longitude. For folks using GeoDjango alongside Simon&#8217;s library, the way you build a <a href="http://geodjango.org/docs/geos.html#geometry-objects">Point object</a> from coordinates is to pass the longitude first, like so:<br />
<code><br />
>>> from django.contrib.gis.geos import Point<br />
>>> p = Point((5, 23)) # 2D point, passed in as a tuple<br />
</code><br />
So on Friday I <a href="http://github.com/dwillis/geocoders">forked Simon&#8217;s project</a> and reversed the ordering of the coordinates in a successful result. That way you can pass that portion of the result directly to a Point constructor:<br />
<code><br />
>>> from django.contrib.gis.geos import Point<br />
>>> from geocoders.google import geocoder<br />
>>> geocode = geocoder('GOOGLE-API-KEY')<br />
>>> results = geocode('new york')<br />
>>> results<br />
(u'New York, NY, USA', (-73.986951000000005, 40.756053999999999))<br />
>>> pnt = Point(results[1])<br />
</code><br />
Not a huge deal, but in keeping with the spirit of the library, I think.</p>
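<p>If you&#8217;d rather not use a fork, the same reordering can be done after the fact. Here&#8217;s a minimal sketch in plain Python; the <code>lon_lat_first</code> helper is my own hypothetical name, not part of Simon&#8217;s library, and it assumes the result is the (place name, (latitude, longitude)) tuple shown earlier.</p>
<pre>
```python
def lon_lat_first(result):
    """Reorder a geocoder result's coordinate pair.

    Takes (place_name, (lat, lon)) and returns (place_name, (lon, lat)).
    """
    name, (lat, lon) = result
    return name, (lon, lat)


# Sample result in the library's original latitude-first order.
result = ("New York, NY, USA", (40.756054, -73.986951))
name, coords = lon_lat_first(result)
print(coords)
```
</pre>
<p>The reordered <code>coords</code> tuple is longitude-first, which is the order the Point constructor above expects.</p>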
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Gift of Data</title>
		<link>http://blog.thescoop.org/archives/2009/12/25/the-gift-of-data/</link>
		<comments>http://blog.thescoop.org/archives/2009/12/25/the-gift-of-data/#comments</comments>
		<pubDate>Fri, 25 Dec 2009 22:15:08 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5280</guid>
		<description><![CDATA[Readers aren't dumb; they want to know what we know, and they know that the web makes it possible for us to share with them at a national and even local level. The level of commitment and effort that we put into responding to their need for relevant and meaningful information will go a long way towards building a better relationship with them.]]></description>
			<content:encoded><![CDATA[<p>One of the more challenging and interesting projects at work lately has been the work we&#8217;ve done on the &#8220;<a href="http://projects.nytimes.com/toxic-waters">Toxic Waters</a>&#8221; series by Charles Duhigg. Since the stories have explored water quality throughout the United States, the web component accompanying some of the stories have been national in scope as well. You can&#8217;t provide locally relevant information for a mass audience in a story, even one of Timesian length.</p>
<p>That constraint &#8211; which still exists on the web, albeit it less so than for print publications &#8211;  makes it easier to justify working with hundreds of thousands or millions of rows of data to build an interface that allows readers to find out about <a href="http://projects.nytimes.com/toxic-waters/polluters">polluters</a> or <a href="http://projects.nytimes.com/toxic-waters/contaminants">drinking water systems</a> close to them.</p>
<p>These kinds of apps aren&#8217;t easy; the hardest part is the interface, for which <a href="http://www.tysonevans.com/">Tyson Evans</a> and <a href="http://maps.grammata.com/">Matt Bloch</a> deserve the credit for the water series. But the difference between those apps and the &#8220;<a href="http://www.mattwaite.com/posts/2008/jan/03/data-ghettos/">data ghetto</a>&#8221; kind of app that provides a search box and not much else is more than time and talent. In some cases, it&#8217;s less a &#8220;nice-to-have&#8221; than a must-have. Earlier this month, Duhigg and Griff Palmer wrote a <a href="http://www.nytimes.com/2009/12/08/business/energy-environment/08water.html">powerful piece showing that millions of Americans are drinking contaminated water</a>. A great story, but check out the <a href="http://community.nytimes.com/comments/www.nytimes.com/2009/12/08/business/energy-environment/08water.html?sort=oldest">first dozen or so comments by readers</a>. Half of them asked for a list of drinking water systems or a map or some other way to find out if their local systems were providing unhealthy water. That app accompanied <a href="http://www.nytimes.com/2009/12/17/us/17water.html">the story published later in December</a>.</p>
<p>Readers aren&#8217;t dumb; they want to know what we know, and they know that the web makes it possible for us to share with them at a national and even local level. The level of commitment and effort that we put into responding to their need for relevant and meaningful information will go a long way towards building a better relationship with them. The kind of web application that provides a summary of millions of records requires a different approach from, say, a lookup of government employees. But that doesn&#8217;t necessarily mean <em>less</em> effort and thought, just a willingness to treat both in a manner that respects the unique characteristics of each. So as we are in the season of gift-giving, all of us &#8211; myself included &#8211; need to think more about what we&#8217;re really giving to our readers when we post data on the web.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2009/12/25/the-gift-of-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Future of IRE Training</title>
		<link>http://blog.thescoop.org/archives/2009/11/29/the-future-of-ire-training/</link>
		<comments>http://blog.thescoop.org/archives/2009/11/29/the-future-of-ire-training/#comments</comments>
		<pubDate>Mon, 30 Nov 2009 02:38:06 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5272</guid>
		<description><![CDATA[Anyone in journalism who knows me knows how much of a debt I owe to an organization called Investigative Reporters and Editors. Sure, I liked playing with data before I found out about IRE, but the knowledge and support that I&#8217;ve received from IRE training, conferences and members has been the single most positive influence [...]]]></description>
			<content:encoded><![CDATA[<p>Anyone in journalism who knows me knows how much of a debt I owe to an organization called <a href="http://www.ire.org/">Investigative Reporters and Editors</a>. Sure, I liked playing with data before I found out about IRE, but the knowledge and support that I&#8217;ve received from IRE training, conferences and members has been the single most positive influence on my career.</p>
<p>The trouble is that IRE is a non-profit organization tied to an industry that increasingly has cut back on spending for training, travel and other &#8220;luxuries&#8221;. So while attendance at this year&#8217;s <a href="http://data.nicar.org/2009IREconference">IRE conference in Baltimore</a> was very strong, a lot of folks there were paying their own way. Same deal for the annual <a href="http://www.ire.org/training/conference/indy09/">computer-assisted reporting conference in Indianapolis</a>. That&#8217;s simply not sustainable, given IRE&#8217;s current orientation towards providing hands-on training and data services to news organizations. My IRE membership will be pried from my cold, dead hands (and you should <a href="http://www.ire.org/join/">join</a>, too), but attracting new members and offering them the kinds of training and services they&#8217;ll need will be increasingly difficult.</p>
<p>If you ask me, IRE needs to reorient its <a href="http://www.ire.org/training/">training</a> and <a href="http://www.ire.org/resourcecenter/">service</a> offerings to take advantage of the distributed nature of the Web and the broadening of acts of journalism. Yes, hands-on training must continue, but do trainers need to travel all the time? What about video-based training? Yes, IRE should collect the best expertise of its members, but the age of the tipsheet alone is gone. We have so many other options: screencasts, podcasts, YouTube &#8211; hell, even Google Wave &#8211; to deliver the kind of knowledge that is the lifeblood of IRE.</p>
<p>Crucial to that effort is the recognition that as the potential base of members and users of IRE services expands, so too does the need for individual training modules. Yes, IRE should still offer a 5-day bootcamp on computer-assisted reporting. But it also should offer half-day refreshers on SQL, or 10-minute screencasts on a useful Excel function. Look at what <a href="http://peepcode.com/">PeepCode</a> does &#8211; and I don&#8217;t think IRE would need to have such high production values to be valuable &#8211; and you&#8217;ve got an idea of what I&#8217;m talking about. IRE members are some of the leading experts in journalism on subjects such as the Census, mapping and various obscure datasets. Yet the only option for purchasing audio from conferences is all or nothing.</p>
<p>Similarly, the Resource Center needs a good update. IRE books, which are tremendously useful but can have a short shelf-life, need to be sold in print and revised online to remain attractive and lower the costs of doing new editions. Future tipsheets should be digital-only, and categorized not just by keyword but also by speaker. If I want to overdose on <a href="http://content.usatoday.com/community/tags/reporter.aspx?id=505">Paul Overberg</a>&#8217;s Census material, I should be able to do that without searching (they should also be sold to journalism schools for classes). Nearly all training exercises should be available online &#8211; this would require greater standardization, but that&#8217;s not a terrible thing &#8211; and members or paid users should be able to schedule video/chat time with an IRE trainer or volunteer as a follow-up.</p>
<p>While we&#8217;re at it: <a href="http://data.nicar.org/uplink/">Uplink</a> is, well, I don&#8217;t think even IRE knows for sure, but it definitely isn&#8217;t working. Simplify it. Give it a narrow mission. Make it easier to find the expertise that currently is spread across email, blog posts, tipsheets and tutorials. The value of the organization lies in the ability of its staff and volunteers to intelligently organize and disseminate the unique and valuable information it has within it.</p>
<p>All of these things require changes to the way IRE currently works, and that&#8217;s the tough part; it&#8217;s hard to argue that IRE hasn&#8217;t been doing good work, and I am definitely not making that claim. What I&#8217;m saying is that as both the market for IRE&#8217;s services and the methods for delivering them undergo some significant changes, it&#8217;s time for IRE to meet those changes head-on.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2009/11/29/the-future-of-ire-training/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Keeping It Simple(r)</title>
		<link>http://blog.thescoop.org/archives/2009/11/25/keeping-it-simpler/</link>
		<comments>http://blog.thescoop.org/archives/2009/11/25/keeping-it-simpler/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 19:46:43 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5266</guid>
		<description><![CDATA[I haven&#8217;t mentioned Fumblerooski in a while, but rest assured that work continues, especially during college football season. I&#8217;ve added more coaching information (still a long ways to go on that, though) and will be unveiling player rankings soon. But the biggest thing I&#8217;ve done lately has nothing to do with new features. Instead, as I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t mentioned <a href="http://fumblerooski.org/">Fumblerooski</a> in a while, but rest assured that work continues, especially during college football season. I&#8217;ve added <a href="http://fumblerooski.org/college/teams/pittsburgh/alums-in-coaching/">more coaching information</a> (still a long ways to go on that, though) and will be unveiling player rankings soon. But the biggest thing I&#8217;ve done lately has nothing to do with new features. Instead, as I&#8217;ve become a better coder in general, I&#8217;ve seen how bloat can really hinder a project. So I spent time last week reorganizing <a href="http://github.com/dwillis/fumblerooski">the Fumblerooski code</a> to take advantage of some of <a href="http://www.djangoproject.com/">Django</a>&#8217;s strengths.</p>
<p>This all started back at the <a href="http://www.ire.org/training/conference/indy09/">NICAR conference in Indianapolis</a> where several of us led a mini-bootcamp on using Django. At one point, as we talked about how projects are organized, I showed off the models for Fumblerooski. They went on forever. Looking back, it wasn&#8217;t the message that I wanted to get across &#8211; I think several people gasped.</p>
<p>Fumblerooski still is far more tightly coupled together than I&#8217;d like &#8211; the scrapers can&#8217;t really be separated out as an independent app, which would be <a href="http://www.youtube.com/watch?v=A-S0tqpPga4">the right thing to do</a>. But it&#8217;s getting closer. Same for the <a href="http://github.com/dwillis/fumblerooski/tree/master/rankings/">rankings</a> app. Coaches could be the next one, or maybe players. The <a href="http://github.com/dwillis/fumblerooski/tree/master/scrapers/">scrapers</a>, even though they don&#8217;t constitute an actual app, are better organized. The point is that now the code is probably easier for someone else to follow, but it&#8217;s also easier for me to locate specific functions. I spend less time hunting and more time actually doing things.</p>
<p>How does this actually work? <a href="http://docs.python.org/reference/simple_stmts.html#the-import-statement">Python&#8217;s ability to import</a> libraries into others means that Django apps can share common code (and, if you&#8217;re working in the same database, data) inside a single project just by typing an import statement:</p>
<p><code>from fumblerooski.rankings.models import RushingSummary</code></p>
<p>And I get access to the rushing rankings wherever I need to use them. Because this is so trivial, it sometimes led me to think that where I put things didn&#8217;t matter. But it does, it really does, for your sake and the sake of anyone who attempts to look at your code.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2009/11/25/keeping-it-simpler/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Question of Emphasis</title>
		<link>http://blog.thescoop.org/archives/2009/11/21/a-question-of-emphasis/</link>
		<comments>http://blog.thescoop.org/archives/2009/11/21/a-question-of-emphasis/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 23:18:14 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Journalism]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5258</guid>
		<description><![CDATA[The job cuts at the Washington Post on Friday have produced a round of comments, broadly summed up by Steve Yelvington earlier today. They certainly raised the question that occurred to me as a former employee of both the Post and WPNI, its soon-to-be-merged online operation: &#8220;What explains this kind of decision?&#8221;
First, let me [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.politico.com/blogs/michaelcalderone/1109/Layoffs_at_WaPo_.html">job cuts at the Washington Post on Friday</a> have produced a round of comments, <a href="http://twitter.com/yelvington/status/5927713381">broadly summed up by Steve Yelvington</a> earlier today. They certainly raised the question that occurred to me as a former employee of both the Post and WPNI, its soon-to-be-merged online operation: &#8220;What explains this kind of decision?&#8221;</p>
<p>First, let me say that my observations about the general history of WPNI and its relationship with the paper are colored by my own experiences, but I agree with folks like <a href="http://twitter.com/jayrosen_nyu/status/5921241591">Jay Rosen</a> who say that at one point, washingtonpost.com was clearly a national leader &#8211; not just in technical capability, but in the kind of mindset necessary for a news organization prepared to take advantage of the Internet&#8217;s possibilities. I supported the creation of WPNI as a separate operation, to allow it more creative freedom, but both the people of WPNI and their colleagues at the Post should have done more to foster a better environment for working together. It&#8217;s something that I failed at when I was there.</p>
<p>But back to the kind of environment that leads to the departures, voluntarily or otherwise, of so many talented and dedicated employees. I don&#8217;t know the people who currently run the Washington Post, but I do think I understand a bit about how the organization works and thinks, having spent about three years there (more than two at the paper and about nine months at WPNI). When I wrote about <a href="http://blog.thescoop.org/archives/2007/02/04/why-the-web/">moving from the paper to the website back in 2007</a>, I left out some details about how that process happened. And I think, in hindsight, that they shed some light on how the organization operates.</p>
<p>It&#8217;s true that there were a number of people at the Post who were supportive and encouraging of my ambitions to work at WPNI. Among them were my supervisor at the time, Lucy Shackelford, and the paper&#8217;s editor, Len Downie. But once I had seriously pursued the idea of working on the website, it took months for the move to happen, and not just for reasons of simple corporate bureaucracy. In a very real way, my transition was held up &#8211; I (jokingly at first, and then angrily) referred to it as a filibuster or a senatorial hold &#8211; by a few people at the paper. These people, most of whom no longer occupy the positions they held then, are not stupid. They are among the smartest folks I&#8217;ve ever worked with, and I have a high regard for their journalistic abilities. But the thinking that <em>caused the editor of the paper to become involved in whether a mid-level staffer moved to the website</em> was, in essence, this: this is a bad idea, because it will hurt the paper. My ego might like to think that this was really true, but I think the reality is that these people could not weigh the value of my work for the website against its value to the paper because they did not understand what it was I wanted to do. So they went with what they knew, and that seemed to be a net deficit for them. And thus it was that I mooted the option of simply resigning from the paper in order to join its website.</p>
<p>I don&#8217;t envy the people who run the Washington Post (or any news organization) today. They have a ton of thankless choices to make, and critics on every side. From a certain standpoint, I can appreciate the idea that the paper edition, which generates the overwhelming share of the revenue, should be protected and bolstered as much as possible. But I cannot agree with the idea that this means that you take employees who have proven expertise doing valuable and informative things that don&#8217;t always translate into print and cannibalize (or toss away) their talents for the sake of the paper.</p>
<p>My fear as a Washington Post subscriber and reader of washingtonpost.com is that, when the folks running the organization turn things around (and I believe that it is not an impossibility or even a long shot), what emerges will be not only a news organization that is a shadow of its former self &#8211; most orgs will have to face that reality &#8211; but also one that has put so much emphasis on the paper that it cannot take advantage of the possibilities online. I fear that the folks running things are literally rolling back the progress and smart work that has been done, and will not be able to get it back as fast as they might think. And the people who remain &#8211; those who will be charged with the task of rebuilding a news operation that embraces all of the ways that its readers and users can gain value &#8211; will have neither the support nor the depth to make it happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2009/11/21/a-question-of-emphasis/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Buying Into Computational Journalism</title>
		<link>http://blog.thescoop.org/archives/2009/11/09/buying-intocomputational-journalism/</link>
		<comments>http://blog.thescoop.org/archives/2009/11/09/buying-intocomputational-journalism/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 03:02:15 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Car Tools]]></category>
		<category><![CDATA[Journalism]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5253</guid>
		<description><![CDATA[Update: Duke&#8217;s Sarah Cohen responds in the comments.
The intriguing title of a recent report from scholars at Duke is &#8220;Accountability Through Algorithm: Developing the Field of Computational Journalism&#8221;. Semi-related to CAR, Computational Journalism is defined as &#8220;the combination of algorithms, data, and knowledge from the social sciences to supplement the accountability function of journalism.&#8221; I [...]]]></description>
			<content:encoded><![CDATA[<p><em>Update: Duke&#8217;s Sarah Cohen <a href="http://blog.thescoop.org/archives/2009/11/09/buying-intocomputational-journalism/comment-page-1/#comment-142486">responds in the comments</a>.</em></p>
<p>The intriguing title of a recent report from scholars at Duke is &#8220;<a href="http://dewitt.sanford.duke.edu/images/uploads/About_3_Research_B_cj_1_finalreport.pdf">Accountability Through Algorithm: Developing the Field of Computational Journalism</a>&#8221;. Semi-related to <a href="http://en.wikipedia.org/wiki/Database_journalism">CAR</a>, Computational Journalism is defined as &#8220;the combination of algorithms, data, and knowledge from the social sciences to supplement the accountability function of journalism.&#8221; I take each of those &#8211; algorithms, data and knowledge from the social sciences &#8211; as separate elements, because while journalists do have plenty to learn from the social sciences, we also operate in an environment that is not quite academic (and sometimes not at all).</p>
<p>The report identifies four areas of potential exploration: techniques for data transformation and pattern discovery in investigative reporting; a digital “dashboard” for journalists; new social and technical structures for interactions among readers and reporters; and sense-making advances from other disciplines. All are interesting and worthy, but to me the first two are particularly so. </p>
<p>On the first, the best investigative journalists have been developing tools for extracting meaning from reams of information for years. The change now is that we have a greater platform for these tools in the Internet, and an effort like <a href="http://documentcloud.org/">DocumentCloud</a> is a clear example of that change. The challenge we face is that patterns are interesting to different people for different reasons; what an accountant finds interesting may not always be of interest to a journalist, and vice versa. The current deficit is not in the area of tools; it is the occasionally trickier area of adapting those for the task of journalism. That requires the guiding influence of people like <a href="http://www.sanford.duke.edu/graduate/mpp/faculty_new.php#cohen">Sarah Cohen</a>, a newly minted Knight Chair at Duke, who is studying these issues right now. But it also requires the active participation of a wide range of news organizations and journalists. In the Internet, we have a leveling platform, but only if more journalists participate. That may be a greater challenge than the technical one.</p>
<p>One way to get there is the second idea &#8211; a journalist&#8217;s dashboard. This would provide reporters with a way to keep track of the deluge of information coming into newsrooms. But again, the technological side of that equation, as difficult as it is, is less of a concern to me than the implementation and adoption of the results. We know how to gather various bits of information in one place. We&#8217;re not that good at distilling the best of them, or even knowing where to start. The good news is that we have blueprints for this kind of thing: the people and companies who make great Web apps that distill masses of data into understandable results. The bad news is that we, as a business, work very differently. We don&#8217;t really share much, outside of experiences at conferences or over drinks, and particularly not at the institutional level. And we&#8217;re downright awful, in general, at adapting good ideas for our own uses.</p>
<p>For the idea of Computational Journalism to work, a lot is riding on a movement that is slowly growing but urgently necessary for the news industry: the increasing adoption, use and proliferation of open-source tools. The CAR community has seen an influx of use of various types of open-source software, from databases to GIS systems to web frameworks. More and more reporters and editors are embracing different styles of journalism. But the broader concept of opening up our newsrooms, both philosophically and in terms of our content and efforts, has been slow in coming. It requires not just the creation of tools, but also the development of journalists and readers who will use those tools most effectively. And that&#8217;s more than an algorithm &#8211; to say nothing of Twitter &#8211; can solve alone.</p>
<p>Oh, and Duke folks? Can we get a version of that report that embraces the Web as much as the concept? HTML will do fine.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2009/11/09/buying-intocomputational-journalism/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>
