<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Scoop</title>
	<atom:link href="http://blog.thescoop.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.thescoop.org</link>
	<description>Derek Willis' weblog on investigative and computer-assisted reporting.</description>
	<lastBuildDate>Sat, 31 Jul 2010 21:27:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>A GitHub for Data?</title>
		<link>http://blog.thescoop.org/archives/2010/07/31/a-github-for-data/</link>
		<comments>http://blog.thescoop.org/archives/2010/07/31/a-github-for-data/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 21:06:40 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5474</guid>
		<description><![CDATA[Clay Johnson, late of Sunlight Labs and now writing at the splendidly-named InfoVegan, says that what the &#8220;Open Data&#8221; movement needs is a better way to store data on the Web. Something like a GitHub for data: Why can I not type into a console gitdata install census-2010 or gitdata install census-2010 --format=mongodb and have [...]]]></description>
			<content:encoded><![CDATA[<p>Clay Johnson, late of Sunlight Labs and now writing at the splendidly-named InfoVegan, says that what the &#8220;Open Data&#8221; movement needs is a better way to store data on the Web. Something <a href="http://infovegan.com/2010/07/30/github-for-data">like a GitHub for data</a>:</p>
<blockquote><p>Why can I not type into a console gitdata install census-2010 or gitdata install census-2010 --format=mongodb and have everything I need to interface with the coming census data?</p></blockquote>
<p>Technically, there&#8217;s not much reason why this couldn&#8217;t happen. Sure, some government datasets are very large, and some are in arcane and oddball formats, but these are technical problems that can be overcome. But the biggest issue, for data-driven apps contests and pretty much any other use of government data, is not that data isn&#8217;t easy to store on the Web. It&#8217;s that data is hard to understand, no matter where you get it.</p>
<p>In a sense, a GitHub for data could help solve this problem, too, because you can write documentation, and many GitHub projects have <a href="http://github.com/jashkenas/coffee-script">excellent documentation</a>. But there also are projects with very limited documentation &#8211; heck, some of them are mine. This is the biggest obstacle to better apps: so few people really understand the data and its pitfalls. I&#8217;d like to see what Clay wants to see, too, but right now I&#8217;m more interested in:</p>
<p><code>gitdata install census-2010</code></p>
<p>If the person executing that command is, say, <a href="http://census.pewsocialtrends.org/2010/covering-census-2010-a-workshop-for-journalists">Paul Overberg</a>.</p>
<p>That&#8217;s not to say that I&#8217;m in favor of a situation where only those with expertise have access to data. What I&#8217;m saying is that the very act of what Clay describes as a hassle:</p>
<blockquote><p>A developer has to download some strange dataset off of a website like data.gov or the National Data Catalog, prune it, massage it, usually fix it, and then convert it to their database system of choice, and then they can start building their app.</p></blockquote>
<p>Is in fact what helps a user learn more about the dataset he or she is using. Even a well-documented dataset can have its quirks that show up only in the data itself, and the act of importing often reveals more about the data than the documentation does. We need to import, prune, massage, convert. It&#8217;s how we learn.</p>
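<p>To make that concrete, here is a minimal sketch of the kind of quick profiling an import step tends to force on you. The sample rows, column names and the <code>profile</code> helper are all invented for illustration; a real dataset will have its own quirks.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration only.
SAMPLE = """state,county,population
TX,Ward,10658
TX,ward ,10658
tx,Loving,
NM,Eddy,N/A
"""


def profile(text):
    """Count blank cells and distinct raw values for each column."""
    reader = csv.DictReader(io.StringIO(text))
    blanks = Counter()
    values = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            if cell.strip() == "":
                blanks[name] += 1
            values[name][cell] += 1
    return blanks, values


blanks, values = profile(SAMPLE)
print("blank cells per column:", dict(blanks))
print("distinct state codes:", sorted(values["state"]))
print("distinct county spellings:", sorted(values["county"]))
```

<p>Even on four made-up rows, the counts surface inconsistent capitalization, a trailing space, a blank cell and a text placeholder in a numeric column, none of which documentation is likely to mention.</p>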
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/07/31/a-github-for-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How APIs Help the Newsroom</title>
		<link>http://blog.thescoop.org/archives/2010/07/12/how-apis-help-the-newsroom/</link>
		<comments>http://blog.thescoop.org/archives/2010/07/12/how-apis-help-the-newsroom/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 13:09:05 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[API]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5457</guid>
		<description><![CDATA[As nice as it is to get praised for the civic-mindedness of your work, the not-so-secret secret about APIs at The Times is that we&#8217;re the biggest consumer of them. The flexibility and convenience that the APIs provide make it easier to cut down on repetitive manual work and bring new ideas to fruition. Other [...]]]></description>
			<content:encoded><![CDATA[<p>As nice as it is to <a href="http://blog.programmableweb.com/2010/07/08/nytimes-shows-how-civic-apis-should-be-built/">get praised for the civic-mindedness of your work</a>, the not-so-secret secret about <a href="http://developer.nytimes.com/">APIs at The Times</a> is that we&#8217;re the biggest consumer of them. The flexibility and convenience that the <a href="http://en.wikipedia.org/wiki/Application_programming_interface">APIs</a> provide make it easier to cut down on repetitive manual work and bring new ideas to fruition. Other news organizations can do the same.</p>
<p>This week, for example, we launched a page to <a href="http://www.nytimes.com/interactive/2010/07/07/us/politics/20100707-kagan-vote-tracker.html">track Republican senators&#8217; positions on the nomination of Elena Kagan to the Supreme Court</a>. The fabulous graphics department has done things like this in the past, such as with the <a href="http://www.nytimes.com/interactive/2010/03/16/us/politics/20100316-health-care-dems.html">House vote on health care</a>. Both of those graphics were assembled from lots of different pieces of information &#8211; electoral results and previous votes among them &#8211; and the Kagan data includes stuff like whether the senator in question is running for re-election this year.</p>
<p>You could, of course, ask people to gather up all that information, but if you&#8217;re going to do something like this more than once, it makes sense to have a way to automate as much as possible. That&#8217;s where the APIs come in. For the Kagan graphic, we used the <a href="http://developer.nytimes.com/docs/congress_api">NYT Congress API</a> to pull in information on senators and their votes, which leaves the gathering of information about their statements on Kagan as the lone manual task. In other words, only the stuff that is specific to this app requires manual effort.</p>
<p>Similarly, the new <a href="http://open.blogs.nytimes.com/2010/07/01/introducing-the-districts-api/">Districts API we released</a> plays well with our other APIs, so that I was able to build a <a href="http://districts-demo.heroku.com/">simple demo app</a> that takes advantage of the fact that our Congress API, among others, can return <a href="http://developer.nytimes.com/docs/congress_api#h3-current-member">the current member for a particular district</a>.</p>
<p>For newsrooms, the utility of APIs goes beyond creating Web apps. Making data available via APIs is a little like giving the newsroom the ability to ask and answer questions without having to tie down a CAR person for long periods of time. APIs can provide data in whatever format you choose, which means that a wider range of people can take advantage, from graphic artists used to working with XML to reporters comfortable with CSV files. When your data is more accessible and flexible, <a href="http://blog.thescoop.org/thefix/the-information-gap/">the possibilities for doing things with it expand</a>.</p>
<p>So if you have a big local election coming up, having an API for candidate summary data makes it easier to do a quick-and-dirty internal site for reporters and editors to browse, but also gives graphics folks a way to pull in the latest data without having to ask for a spreadsheet. Chances are that if serious data analysis is what you need, that&#8217;ll be done in some desktop application or database server anyway. The API is just a messenger, albeit one that is always on and able to spawn lots of ideas and experiments.</p>
<p>If you&#8217;re looking to build an API, remember that it&#8217;s just a Web application delivering data in a structured format (XML and JSON being two popular formats these days). There are lots of options in terms of what you use to build and serve an API, <a href="http://help.hackshackers.com/questions/421/how-can-a-news-and-content-platform-build-a-great-api">so it&#8217;s important to pay attention to the design</a>: which information you&#8217;ll deliver, and how. Being a significant user of your own API is really important, too; it&#8217;ll give you the best sense of how well you&#8217;ve designed your responses, and what you might be missing.</p>
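<p>As a rough illustration of &#8220;the same data in whatever format you choose,&#8221; here is a small sketch that serializes one record as JSON, XML and CSV using only the Python standard library. The record, its field names and the <code>candidate</code> element name are hypothetical, not taken from any Times API.</p>

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical candidate summary record; names and values are invented.
record = {"name": "Jane Doe", "office": "Mayor", "receipts": "12500"}


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialize the record as one XML element with a child per field."""
    root = ET.Element("candidate")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(rec):
    """Serialize the record as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


print(to_json(record))
print(to_xml(record))
print(to_csv(record))
```

<p>The design question raised above is which fields go into that record and how they are named; the serialization itself is the easy part.</p>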
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/07/12/how-apis-help-the-newsroom/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Big Numbers, Low Impact</title>
		<link>http://blog.thescoop.org/archives/2010/06/27/big-numbers-low-impact/</link>
		<comments>http://blog.thescoop.org/archives/2010/06/27/big-numbers-low-impact/#comments</comments>
		<pubDate>Sun, 27 Jun 2010 22:22:35 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Fed Data]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5433</guid>
		<description><![CDATA[From the perspective of someone who uses government data pretty often, Data.gov and its state progeny (Massachusetts, D.C., Minnesota &#8211; the &#8220;data deli&#8221; is a great name &#8211; among them) are better than what we used to have. They make the acquisition of data by journalists, regulated communities and the general public much easier than [...]]]></description>
			<content:encoded><![CDATA[<p>From the perspective of someone who uses government data pretty often, Data.gov and its state progeny (<a href="http://www.mass.gov/data">Massachusetts</a>, <a href="http://data.octo.dc.gov/">D.C.</a>, <a href="http://deli.dnr.state.mn.us/data_catalog.html">Minnesota</a> &#8211; the &#8220;data deli&#8221; is a great name &#8211; among them) are better than what we used to have. They make the acquisition of data by journalists, regulated communities and the general public much easier than it has been. But there are two related issues with such efforts that I can see: in general, both producers and consumers of government data tend to operate in vacuums.</p>
<p>Many governments want to put data online. It&#8217;s a bit of a PR win for them, and it does provide a service that otherwise might occupy the time of a government employee who has other tasks to accomplish. But, as <a href="http://goodspeedupdate.com/2010/2976">Rob Goodspeed</a> notes, many of them are trying to figure out what exactly to post online. And in dealing with that question, the transparency movement isn&#8217;t exactly providing a lot of clarity, or even the right kind of input.</p>
<p>Rob accurately reports that the common answer to the question of what to post is &#8220;everything,&#8221; but he quickly points out that this isn&#8217;t possible in many cases. And sometimes &#8211; I&#8217;m looking at you, <a href="http://www.data.gov/">Data.gov</a> &#8211; posting everything actually obscures the absence of good stuff. Sure, it seems impressive that nearly 275,000 datasets are available on Data.gov, but that has the secondary effect of making the most popular dataset on the site (as I write this) the &#8220;<a href="http://www.data.gov/geodata/g647269/">US Topo 7.5-minute map for Imperial, TX</a>.&#8221; Seriously, more than 200 people have downloaded it.</p>
<p>The state of Data.gov &#8211; which will greatly influence state and local government data efforts &#8211; is skewed in a way that gives rise to frustration among users of government data. Even folks in the transparency community &#8211; <a href="http://sunlightlabs.com/blog/2010/coming-data-flood/">who previously celebrated the sheer number of datasets released</a> &#8211; are starting to recognize this situation for what it is. Ellen Miller, executive director of the Sunlight Foundation, wrote that &#8220;<a href="http://blog.sunlightfoundation.com/2010/06/25/whither-transparency/">[t]he torrent of data we expected to see at Data.gov isn’t materializing</a>.&#8221; She cites a colleague&#8217;s calculation that 99 percent of the files available there are GIS data, which while useful, are not quite as accessible as, say, a CSV or Excel file. Other datasets are flawed in significant ways, rendering them nearly useless. More Miller: &#8220;Call it a hot Friday afternoon, and maybe I’m cranky, but I think it’s time to begin to ask some tough questions of the White House. Whither or wither transparency?&#8221; I&#8217;m sure many folks who work with government data have thought or uttered similar sentiments.</p>
<p>But before we prepare the tar and feathers, let&#8217;s pause for a bit and consider a few things. First, what is Data.gov supposed to be? Here&#8217;s what <a href="http://www.data.gov/about">the site</a> says: &#8220;The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.&#8221;</p>
<p>The only bit you can argue with there is the term &#8220;high value&#8221;. And yes, Sunlight and its allies have a point. There&#8217;s very little in the way of data from the Justice or Interior departments (although the DOJ&#8217;s regular <a href="http://www.data.gov/catalog/raw/category/0/agency/7,29/filter/jails/type//sort//page/1/count/25">survey of jails</a> is quite well-represented, dating back to 1985). Since Data.gov is an executive branch joint, there&#8217;s nothing from Congress or the judicial branch, either, but that&#8217;s pretty clearly stated.</p>
<p>So the valid objection is that the tens of thousands of datasets on Data.gov tend to clutter up the joint, and some of the best data isn&#8217;t there. To which I say, &#8220;Hello, my name is crucial information. I&#8217;m really valuable, which is why it&#8217;s probably hard to get your hands on me.&#8221; In Washington, as in state capitals, this is no surprise at all. Data.gov is the fulfillment of the phrase &#8220;low-hanging fruit&#8221;: you can fill up on it, but there always will be stuff out of reach.</p>
<p><a href="http://goodspeedupdate.com/2010/2976">Rob Goodspeed suggests five categories of data</a> that should come first in government transparency efforts. They&#8217;re all worthy candidates, and it&#8217;s important to note that many are related to current policy questions. But leaving it up to the government to decide which data fulfills those criteria (or others) &#8211; which is what we&#8217;re really doing when we emphasize the sheer number of datasets available &#8211; seems to be inviting more of the same.</p>
<p>So what can the good folks at Sunlight and others interested in a more transparent and accountable government do? One option would be to stop highlighting the number of datasets and start focusing on quality. The <a href="http://nationaldatacatalog.com/">National Data Catalog</a> is an interesting and useful project with a great goal, but you know what would be better to do in parallel? The National Really Useful Data Catalog. A NRUDC (admittedly, a lousy acronym) would include a targeted selection of existing datasets *and* a list of really useful data that isn&#8217;t currently available. It would focus less on the number of datasets available and more on what&#8217;s good, bad and ugly about ones of particular note. What&#8217;s more, the NRUDC would be more than a catalog &#8211; it would include a group of people tasked with investigating, testing and analyzing the uses and flaws of those datasets. People like, say, <a href="http://reporting.sunlightfoundation.com/">Sunlight&#8217;s Reporting Group</a>.</p>
<p>Not saying the Reporting Group doesn&#8217;t do good work. But the various data catalogs are crying out for some semblance of an editor function &#8211; someone who can say, &#8220;Hey, don&#8217;t bother with this, it&#8217;s missing key elements,&#8221; or &#8220;This agency&#8217;s forms contain much more information than is available in the data, and here&#8217;s why.&#8221; What we get now are lists of datasets, sometimes with links to documentation.</p>
<p>Some of this information can already be found in, say, the <a href="http://www.ire.org/membership/subscribe/nicar-l.html">NICAR-L archives</a> or <a href="http://www.ire.org/resourcecenter/tipsheets.php">IRE&#8217;s tipsheets</a>, where reporters have helped each other to understand the caveats of using this dataset or that. But in general, we haven&#8217;t done enough to make this kind of helpful information for users available to the broader public. Sunlight and other groups could really help with that effort, and in the process make a valuable contribution to meaningful transparency, the kind that isn&#8217;t so impressed by big numbers.</p>
<p>And what about government&#8217;s role? It&#8217;s the rare person who works with data who doesn&#8217;t want users of that information to understand how it works and why. By digging deeper into individual datasets instead of growing horizontally toward an ever-larger number of them, it might be possible to extend the expertise of those individuals to better use. Yes, it&#8217;s far easier to count every dataset and say you&#8217;re accomplishing transparency. But what kind of transparency, <a href="http://www.aaronsw.com/weblog/transparencybunk">and to what end?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/06/27/big-numbers-low-impact/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Using the NYT Congress API with &#8230; Excel?</title>
		<link>http://blog.thescoop.org/archives/2010/05/11/using-the-nyt-congress-api-with-excel/</link>
		<comments>http://blog.thescoop.org/archives/2010/05/11/using-the-nyt-congress-api-with-excel/#comments</comments>
		<pubDate>Tue, 11 May 2010 13:10:54 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Car Tools]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5344</guid>
		<description><![CDATA[It&#8217;s true that Excel has been a decreasing part of my toolkit for several years now, and that I never quite had the love for it that I do for various database managers. But I&#8217;m guessing that&#8217;s the exception, not the rule, in the broader journalism community. So when it came time to propose a [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s true that Excel has been a decreasing part of my toolkit for several years now, and that I never quite had the love for it that I do for various database managers. But I&#8217;m guessing that&#8217;s the exception, not the rule, in the broader journalism community. So when it came time to propose a <a href="http://data.nicar.org/node/3766">lightning talk</a> for the 2010 CAR Conference last week, I chose to pull out the ol&#8217; spreadsheet and show how you could get started with the <a href="http://developer.nytimes.com/docs/congress_api">NYT&#8217;s Congress API</a> with a familiar tool.</p>
<p>To do this, I had to not only drag out Excel but also do it on Windows, since <a href="http://www.mrexcel.com/tip103.shtml">Excel&#8217;s Web Query feature</a> isn&#8217;t available on the Mac. (You could also do this, albeit in a slightly different manner, using OpenOffice and Google Spreadsheets. In the comments, Chris Amico <a href="http://blog.thescoop.org/archives/2010/05/11/using-the-nyt-congress-api-with-excel/#comment-149933">shows you how using Google Spreadsheets</a>.) Here&#8217;s how it works using Excel.</p>
<p>First, you&#8217;ll need an API key. To get one, go to <a href="http://developer.nytimes.com/">The Times Developer Network</a> and <a href="http://developer.nytimes.com/apps/register">register</a> (note: you&#8217;ll need to be a registered user of nytimes.com first).</p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/key_signup.png"><img src="http://blog.thescoop.org/wp-content/uploads/key_signup-150x150.png" alt="" title="key_signup" width="150" height="150" class="aligncenter size-thumbnail wp-image-5358" /></a></div>
</p>
<p>You&#8217;re registering an &#8220;application&#8221;, and then you can add specific API keys to that account. Let&#8217;s add one for the Congress API. The key itself is a longish string of letters and numbers that gets appended to every API request URL, including the ones we&#8217;ll make from Excel. Let&#8217;s copy the API key so we can easily grab it (note that this particular key has been disabled, so using it won&#8217;t work).</p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/api_key.png"><img src="http://blog.thescoop.org/wp-content/uploads/api_key-300x130.png" alt="" title="api_key" width="300" height="150" class="aligncenter size-thumbnail wp-image-5358" /></a></div>
</p>
<p>Let&#8217;s find an API call that we can use by looking at the <a href="http://developer.nytimes.com/docs/congress_api">Congress API&#8217;s documentation</a>. Let&#8217;s pick the <a href="http://developer.nytimes.com/docs/congress_api#h3-members-leaving">&#8220;members leaving office&#8221; response</a>, otherwise known as the casualty list. All that&#8217;s required is the chamber (&#8216;house&#8217; or &#8216;senate&#8217;) and the congress (currently only the 111th is supported). If we choose the House, the URL <a href="http://api.nytimes.com/svc/politics/v3/us/legislative/congress/111/house/members/leaving?api-key=YOUR-API-KEY">will look like this</a>, except that you&#8217;ll need to specify your Congress API Key.</p>
<p>The version number should be &#8220;v3&#8221; and you don&#8217;t need to specify a format after <code>leaving</code> (xml is the default). You should quickly get an xml file that looks roughly like this:</p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/request.png"><img src="http://blog.thescoop.org/wp-content/uploads/request-300x225.png" alt="" title="request" width="150" height="150" class="aligncenter size-thumbnail wp-image-5358" /></a></div>
</p>
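<p>If you would rather assemble that request URL in code than by hand, a minimal sketch might look like the following. It assumes the URL pattern shown in the link above; the <code>leaving_url</code> helper is a hypothetical name, and <code>YOUR-API-KEY</code> is a placeholder you would swap for your own key.</p>

```python
from urllib.parse import urlencode

# Base path taken from the example URL in this post.
BASE = "http://api.nytimes.com/svc/politics/v3/us/legislative/congress"


def leaving_url(chamber, api_key, congress=111):
    """Build the 'members leaving office' request URL for one chamber."""
    if chamber not in ("house", "senate"):
        raise ValueError("chamber must be 'house' or 'senate'")
    query = urlencode({"api-key": api_key})
    return f"{BASE}/{congress}/{chamber}/members/leaving?{query}"


print(leaving_url("house", "YOUR-API-KEY"))
```

<p>The printed URL is the same string you would paste into Excel in the next step.</p>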
<p>To get that xml into Excel, we&#8217;re going to use Excel&#8217;s Import Data feature. I&#8217;m not one of those cool kids who has Excel 2007 at their fingertips, so I&#8217;m going to use Excel 2002. Import Data can be found at Data -> Get External Data -> Import Data.</p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/import_data.png"><img src="http://blog.thescoop.org/wp-content/uploads/import_data-300x225.png" alt="" title="import_data" width="150" height="150" class="aligncenter size-thumbnail wp-image-5358" /></a></div>
</p>
<p>Then change the file type to xml and paste the full API url into the box just above the file type.</p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/import_data2.png"><img src="http://blog.thescoop.org/wp-content/uploads/import_data2-300x225.png" alt="" title="import_data2" width="150" height="150" class="aligncenter size-thumbnail wp-image-5358" /></a></div>
</p>
<p>It works for local files and Web urls. Then click on &#8220;Open&#8221; to start the process. The import process consists of Excel asking you where to put the file. Just click &#8220;OK&#8221; and you should soon see something like this: </p>
<p>
<div class="caption center"><a href="http://blog.thescoop.org/wp-content/uploads/results.png"><img src="http://blog.thescoop.org/wp-content/uploads/results-150x150.png" alt="" title="results" width="150" height="150" class="aligncenter size-thumbnail wp-image-5427" /></a></div>
</p>
<p>The header row in row 2 isn&#8217;t perfect, but it should suffice. You probably don&#8217;t need the copyright statement in column A. But now you&#8217;ve got a way to pull data into Excel from an API! If you have questions or comments, please don&#8217;t hesitate to post them below. If you&#8217;re having issues with the API, <a href="http://developer.nytimes.com/forum">the forum</a> is the best place to head.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/05/11/using-the-nyt-congress-api-with-excel/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>An Even Better CAR Conference?</title>
		<link>http://blog.thescoop.org/archives/2010/05/03/an-even-better-car-conference/</link>
		<comments>http://blog.thescoop.org/archives/2010/05/03/an-even-better-car-conference/#comments</comments>
		<pubDate>Mon, 03 May 2010 13:06:39 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5329</guid>
		<description><![CDATA[It&#8217;s dangerous to blog late at night, so take what follows with a grain of salt. It&#8217;s more stream of (semi-)consciousness than anything else, but I&#8217;m curious what other folks think. Note: I wrote this before Aron&#8217;s thread about conferences on Hacks/Hackers, but some of it relates to that question, too. Give IRE a lot [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s dangerous to blog late at night, so take what follows with a grain of salt. It&#8217;s more stream of (semi-)consciousness than anything else, but I&#8217;m curious what other folks think. Note: I wrote this before <a href="http://help.hackshackers.com/questions/269/journo-tech-conference-that-doesnt-suck">Aron&#8217;s thread about conferences on Hacks/Hackers</a>, but some of it relates to that question, too.</p>
<p>Give <a href="http://ire.org/">IRE</a> a lot of credit for what is a difficult task: putting on a <a href="http://data.nicar.org/conference/schedule/7">computer-assisted reporting conference</a> that appeals to both novice and expert. The CAR confab in Phoenix in March, like every year&#8217;s, tries to bring together a set of journalists who can teach each other and themselves about everything from basic spreadsheets to Web frameworks. For the most part it works, even if it means that a sizable chunk of the attendees are also instructors and speakers (often for more than one session). The energy and enthusiasm at Phoenix were great to see, and the sessions had a good variety of topics and formats.</p>
<p>But an IRE veteran would notice that quite a few of the big names in CAR work weren&#8217;t in Phoenix, or even at the last few conferences. Shrinking travel budgets are a factor, of course, but the IRE conferences have always had a segment of people who paid their own way. I suspect that for a small number of &#8220;high-end&#8221; CAR practitioners, however, the CAR conference doesn&#8217;t offer them much anymore, because of its long-standing tradition of appealing to a broad base of people.</p>
<p>That tradition is no bad thing at all: journalism needs to keep bringing in more and more people who want to learn these techniques, and expose those who already have some experience to greater challenges. This isn&#8217;t a call to drop introductory sessions. But I wonder if there aren&#8217;t some changes that could be made to make the conference irresistible for those who don&#8217;t see many chances for growth in the schedule.</p>
<p>For example, there were at least three sessions this year generically focused on &#8220;new tools&#8221; for reporting: <a href="http://data.nicar.org/node/3757">machine learning</a>, <a href="http://data.nicar.org/node/3763">advanced methods</a> and <a href="http://data.nicar.org/node/3773">new frontiers</a>. What if, instead, we blocked out some afternoon time on Thursday and actually tried out some of this software together? Bring a laptop and some legislation, and with a group of people figure out entity extraction and other classification techniques and then present it later in the conference and/or write it all up for <a href="http://data.nicar.org/uplink/">Uplink</a>. What if we voted on some new federal or state dataset and ran the traps on it together, finding out its pitfalls and uses, or brainstormed about better tools for newsrooms? What if some sessions were recast to produce something &#8211; the best documentation for a particular data source, for example &#8211; rather than a collection of tipsheets that might never be assembled into a coherent guide (or say, a <a href="https://www.ire.org/store/formengine.php?form=books&#038;status=1">beat book</a>)? What if we turned the evening bar sessions &#8211; ok, ok, too much change. But still.</p>
<p>It&#8217;ll be difficult to appeal to absolutely everyone, but if we made it easier to do more than talk for 50 minutes at a time, perhaps by providing the opportunity to get together with a range of folks and produce something that we couldn&#8217;t do alone, IRE might be able to attract even more people new and old. There are an increasing number of people attending the CAR conference who are in a position to evaluate and develop tools for newsrooms, and they want to do this. Pairing them with folks who have spent years combing through data and documents while reporting can only be a good thing &#8211; we might end up with a base FEC data parser that newsrooms could customize, or the best set of documentation for IRS migration data, or even some cool dashboards to help reporters spot trends. Maybe we could designate a theme for a particular conference.</p>
<p>If you spotted the influence of open-source development in this post, you&#8217;d be a keen observer. One of IRE&#8217;s defining moments, the <a href="http://www.ire.org/history/arizonaproject.html">Arizona Project</a>, was in part about doing a public service by marshaling a wide set of talent. It was a fairly radical act of selflessness that not all IRE members agreed with, but to me it represents a key strength of the organization: collaborative learning. A lot of open source software projects out there would kill for the dedication of IRE members. What else can we do together so that we all benefit even more?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/05/03/an-even-better-car-conference/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>2010 CAR Conference</title>
		<link>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/</link>
		<comments>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 01:24:06 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5325</guid>
		<description><![CDATA[The 2010 CAR Conference begins on Thursday, and here are some of the sessions I&#8217;m trying not to miss: Thursday, March 11 Big Data: Analyzing legislation with machine learning. Always good to hear what Chase Davis has been up to. Open Source GIS. Now that mapping is more and more accessible, it pays to stay [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://data.nicar.org/CAR2010">2010 CAR Conference</a> begins on Thursday, and here are some of the sessions I&#8217;m trying not to miss:</p>
<p><strong>Thursday, March 11</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3757">Big Data: Analyzing legislation with machine learning</a>. Always good to hear what Chase Davis has been up to.</li>
<li><a href="http://data.nicar.org/node/3759">Open Source GIS</a>. Now that mapping is more and more accessible, it pays to stay on top of what people are using.</li>
<li><a href="http://data.nicar.org/node/3763">Juice up your stories with advanced methods</a>. After a few months in the academy, Sarah Cohen should have some good stuff to share.</li>
<li><a href="http://data.nicar.org/node/3766">Some lightning talks!</a></li>
</ul>
<p><strong>Friday, March 12</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3893">Semantic tagging and DocumentCloud</a>. Really need to get more in-depth on this.</li>
<li><a href="http://data.nicar.org/node/3773">New Frontiers in Reporting Tools</a>. More of the New New Stuff.</li>
<li><a href="http://data.nicar.org/node/3778">Forensic Accounting for Reporters</a>. Nice to have some outside expertise.</li>
</ul>
<p><strong>Saturday, March 13</strong></p>
<ul>
<li><a href="http://data.nicar.org/node/3787">GeoDjango &#038; OpenLayers</a>. Yes, I&#8217;m on this panel. But you should come anyway to hear from <a href="http://www.palewire.com">Ben Welsh</a> of the LA Times.</li>
<li><a href="http://data.nicar.org/node/3902">Not a programmer? Not a worry</a>. The new software from ProPublica that helps publish data on the Web. Very interested to see this in action.</li>
</ul>
<p>There&#8217;s also the Django bootcamp and plenty of opportunities for demos, discussions and debates. Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/03/07/2010-car-conference/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Gentle Introduction to Google App Engine</title>
		<link>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 20:41:08 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5300</guid>
		<description><![CDATA[As part of our roll-out of version 3 of the NYT Congress API, I was tasked with coming up with a sample application that uses the API to do something mildly interesting, or at least functional. I had gotten a book on Google App Engine for my birthday and was pretty excited to see that [...]]]></description>
			<content:encoded><![CDATA[<p>As part of our roll-out of version 3 of the <a href="http://developer.nytimes.com/docs/congress_api">NYT Congress API</a>, I was tasked with coming up with a sample application that uses the API to do something mildly interesting, or at least functional. I had gotten a <a href="http://oreilly.com/catalog/9780596522735">book</a> on <a href="http://code.google.com/appengine/">Google App Engine</a> for my birthday and was pretty excited to see that some of the basic philosophies of <a href="http://www.djangoproject.com/">Django</a> were either incorporated directly into GAE or were easy to adapt to it. So when I started on the sample app, I picked GAE and dove in.</p>
<p><a href="http://code.google.com/appengine/docs/python/overview.html">App Engine&#8217;s Python runtime</a>, unsurprisingly, sticks pretty close to the language&#8217;s core tenets: it uses YAML files for configuration (hey, it&#8217;s whitespace!) and can run pretty much an entire app using just 2-3 files. A NYT colleague, Derek Gottfrid, built a <a href="http://github.com/derekg/nyt-trender">sample app for our article search API</a> comprising five files, including the README. Yes, it violates the separation of logic and design that most frameworks try to respect, but it works.</p>
<p>GAE provides the basic building blocks a lot of Web apps need, nearly all optional: a backend in Datastore, a URL Fetch service that is wrapped by Python&#8217;s familiar urllib and urllib2 libraries, mail and messaging services and memcache. <a href="http://code.google.com/appengine/docs/python/tools/webapp/">Webapp</a> is a basic framework for building apps: not exactly like Django, but not so unfamiliar, either.</p>
<p>The <a href="http://code.google.com/appengine/docs/python/tools/devserver.html">development server</a> will be familiar to anyone who has tinkered with Django, and GAE handles static files via separate servers, which is how it should be. And since it comes with a version of Django built-in, you can bring along some handy utilities, like simplejson, with a single import statement. And as I said earlier, you don&#8217;t have to separate display logic into template files, but you can, and the syntax is nearly identical to Django templates.</p>
<p><a href="http://nytcongress.appspot.com/">The sample app</a> takes two random members of the Senate and compares their voting and bill sponsorship records in the 111th Congress. <a href="http://github.com/dwillis/NYT-Congress-API-Demo">The app&#8217;s code</a> is like the app itself: fairly tightly-focused and without a lot of trappings. It&#8217;s just service calls to the API and a single template for display. In building it, I didn&#8217;t make use of any persistent storage, so I didn&#8217;t delve into <a href="http://code.google.com/appengine/docs/python/datastore/">Datastore</a>, but it looks pretty useful. One of its helpful features is that as you develop your app, it generates indexes used to help return data in the most efficient manner.</p>
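<p>To give a feel for the kind of comparison the app makes, here is a minimal sketch in plain Python. It is not the app&#8217;s code and does not call the Congress API: the function name <code>vote_agreement</code>, the dictionary layout and the sample positions are all assumptions made up for illustration.</p>

```python
def vote_agreement(votes_a, votes_b):
    """Return (shared_votes, agreed, percent) for two {roll_call_id: position} dicts."""
    # Only roll calls that appear in both records can be compared.
    shared = set(votes_a) & set(votes_b)
    agreed = sum(1 for rc in shared if votes_a[rc] == votes_b[rc])
    percent = 100.0 * agreed / len(shared) if shared else 0.0
    return len(shared), agreed, percent


# Made-up positions keyed by made-up roll-call numbers (assumption, not API output).
senator_a = {1: "Yes", 2: "No", 3: "Yes", 4: "Not Voting"}
senator_b = {1: "Yes", 2: "Yes", 3: "Yes", 5: "No"}

summary = vote_agreement(senator_a, senator_b)
print("shared: %d, agreed: %d, agreement: %.1f%%" % summary)
```

<p>In a real app the two dictionaries would presumably be built from the API responses, and the summary handed to the template for display.</p>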
<p>If you&#8217;re already familiar with Django, making the small step to App Engine isn&#8217;t that big of a trip. <a href="http://www.youtube.com/watch?v=v1gTI4BOPUw">Have Guido explain things to you first</a>, and then try it out. You can also <a href="http://code.google.com/appengine/articles/appengine_helper_for_django.html">run a stripped-down version of Django on GAE</a>, and I&#8217;m looking to see if there&#8217;s a project I can adapt to try it out. In the meantime, if you want to tinker with the sample app, by all means fork it and see what else you can do with the API. And let me know what you come up with!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/23/a-gentle-introduction-to-google-app-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lightning Talks at NICAR</title>
		<link>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 22:08:04 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[IRE]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5315</guid>
		<description><![CDATA[This year&#8217;s computer-assisted reporting conference in Phoenix has a couple of new sessions on the schedule. One of them is an idea a couple of us have been pushing for a few years: lightning talks. A staple of technical conferences, lightning talks are based on the notion that while 45-50 minute presentations are good, sometimes [...]]]></description>
			<content:encoded><![CDATA[<p>This year&#8217;s <a href="http://data.nicar.org/CAR2010/">computer-assisted reporting conference in Phoenix</a> has a couple of new sessions on the schedule. <a href="http://data.nicar.org/node/3766">One of them</a> is an idea a couple of us have been pushing for a few years: <a href="http://en.wikipedia.org/wiki/Lightning_Talk">lightning talks</a>.</p>
<p>A staple of technical conferences, lightning talks are based on the notion that while 45-50 minute presentations are good, sometimes you only need about 5 minutes to express an idea or show off an example of something interesting or useful. And I&#8217;m pretty sure, based on sessions at the hotel bar at past conferences, that there are plenty of ideas out there. Attracting and organizing them is the goal of <a href="http://ire.aronpilhofer.com/">this app</a> built by my boss, <a href="http://aronpilhofer.com/">Aron Pilhofer</a>, and launched today.</p>
<p>This is a new thing for the CAR community, but I think it could be a useful way to draw out ideas from people who aren&#8217;t doing panels or who might shrink from a 50-minute session. So let&#8217;s get started. <a href="http://ire.aronpilhofer.com/">Want to give a talk?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/18/lightning-talks-at-nicar/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fumblerooski and Raw SQL in Django</title>
		<link>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/</link>
		<comments>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 13:52:30 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5304</guid>
		<description><![CDATA[Over the weekend I was able to make some upgrades to Fumblerooski (the off-season being a good time for that), and to do so I took advantage of some of the new features in Django&#8217;s development version (with the 1.2 release being just around the corner). (Yes, I know I shouldn&#8217;t be running beta software [...]]]></description>
			<content:encoded><![CDATA[<p>Over the weekend I was able to make some upgrades to <a href="http://fumblerooski.org/">Fumblerooski</a> (the off-season being a good time for that), and to do so I took advantage of some of the new features in <a href="http://www.djangoproject.com/">Django</a>&#8217;s development version (<a href="http://www.djangoproject.com/weblog/2010/feb/06/12-beta-1/">with the 1.2 release being just around the corner</a>).</p>
<p>(Yes, I know I shouldn&#8217;t be running beta software in production, but even I have to say that Fumblerooski isn&#8217;t &#8220;mission-critical&#8221; software. Not yet, at least.)</p>
<p>A while back, <a href="http://twitter.com/jeffself">Jeff Self</a> suggested that he&#8217;d like to see head-to-head records for head coaches &#8211; how one person has fared against another. Me too, I thought. But how best to do that? In Fumblerooski&#8217;s models, it&#8217;s not a simple query, since a Coach has many CoachingJobs and CollegeCoach assignments. Plus the Game model refers to the individual College, not a specific head coach. So I <a href="http://github.com/dwillis/fumblerooski/commit/4ce2948cc0f9dbac135ca0e75c1127d4054c3471">added head coaches to the Game model and some utilities to populate those fields</a>, which made it easy to grab games in which <a href="http://fumblerooski.org/coaches/detail/80-mack-brown/vs/32-bob-stoops/">two specific head coaches faced off</a>. Populating those fields is still a work in progress, especially for people who were head coaches earlier in the decade but are not now.</p>
<p>But to create a list of head coaches that Mack Brown has faced since 2000, I turned to one of Django&#8217;s newest features: <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#performing-raw-queries">improved raw SQL querying</a>. Much like <a href="http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002267">ActiveRecord&#8217;s find_by_sql</a> method for Rails, Model.objects.raw() makes it easy to write custom SQL when you need to, and still be able to access instantiated objects as the result.</p>
<p>You can select <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#deferring-model-fields">only those fields you need</a>, deferring others, or <a href="http://docs.djangoproject.com/en/dev/topics/db/sql/#passing-parameters-into-raw">pass parameters into the SQL</a>. And you can add additional calculated fields, too. So when I needed to pull in a QuerySet of distinct coaches that Mack Brown had faced, plus the number of games, I could do something like this:<br />
<code><br />
coach = Coach.objects.get(last_name='Brown', first_name='Mack')<br />
coach_list = Coach.objects.raw(<br />
"""SELECT college_coach.id, college_coach.slug, count(college_game.*) as games<br />
from college_coach inner join college_game on college_coach.id = college_game.coach2_id<br />
where coach1_id = %s group by 1,2 order by 3 desc""", [coach.id])<br />
</code><br />
And thus you can see <a href="http://fumblerooski.org/coaches/detail/80-mack-brown/vs/">head coaching opponents for Mack Brown</a> (or another coach). The idea isn&#8217;t new, but I really like the implementation &#8211; it&#8217;s clean and makes the mixing of Python code and SQL about as seamless as it could possibly be.</p>
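<p>If you want to poke at the shape of that query without a Django project, here is a rough, self-contained sketch using Python&#8217;s built-in sqlite3 module. The table and column names come from the query above, but the cut-down schema, the third coach and the game rows are invented for illustration. Two differences from the Django version are worth flagging: sqlite3 uses <code>?</code> rather than <code>%s</code> as its placeholder, and it returns plain tuples rather than Coach instances.</p>

```python
import sqlite3

# Hypothetical, stripped-down versions of the two tables named in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE college_coach (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE college_game (
        id INTEGER PRIMARY KEY,
        coach1_id INTEGER REFERENCES college_coach(id),
        coach2_id INTEGER REFERENCES college_coach(id)
    );
    INSERT INTO college_coach VALUES
        (80, 'mack-brown'), (32, 'bob-stoops'), (7, 'made-up-coach');
    -- Invented game rows, only here to give the aggregate something to count.
    INSERT INTO college_game (coach1_id, coach2_id) VALUES
        (80, 32), (80, 32), (80, 7), (32, 7);
""")


def opponents(conn, coach_id):
    """Return (opponent id, slug, games) rows, most frequent opponent first."""
    sql = """
        SELECT c.id, c.slug, COUNT(*) AS games
        FROM college_coach AS c
        INNER JOIN college_game AS g ON c.id = g.coach2_id
        WHERE g.coach1_id = ?
        GROUP BY c.id, c.slug
        ORDER BY games DESC
    """
    # The coach id is bound as a parameter, never formatted into the SQL string.
    return conn.execute(sql, (coach_id,)).fetchall()


rows = opponents(conn, 80)
for coach_id, slug, games in rows:
    print(coach_id, slug, games)
```

<p>The extra <code>games</code> column plays the same role as the calculated field in the raw() call: it is not a column on the coach table, just something the query computes per opponent.</p>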
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/02/15/fumblerooski-and-raw-sql-in-django/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Using Geocoders with GeoDjango</title>
		<link>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/</link>
		<comments>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 01:10:21 +0000</pubDate>
		<dc:creator>Derek</dc:creator>
				<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://blog.thescoop.org/?p=5288</guid>
		<description><![CDATA[Update: Simon has updated his library to make it easy to reverse the order of coordinates. Thanks! For a &#8220;15-minute project&#8221;, Simon Willison&#8217;s geocoders library is pretty handy if you&#8217;re doing geocoding with Python. It offers a common interface to the geocoding services provided by Google, Yahoo and other sources. When we were looking at [...]]]></description>
			<content:encoded><![CDATA[<p><em>Update: Simon has <a href="http://github.com/simonw/geocoders/commit/46fc00bda4cff0fe222d38fe498aa9ba861e8dc1">updated his library</a> to make it easy to reverse the order of coordinates. Thanks!</em></p>
<p>For a &#8220;<a href="http://simonwillison.net/2009/May/27/geocoders/">15-minute project</a>&#8221;, <a href="http://github.com/simonw/geocoders">Simon Willison&#8217;s geocoders library</a> is pretty handy if you&#8217;re doing geocoding with Python. It offers a common interface to the geocoding services provided by Google, Yahoo and other sources. When we were looking at replacing the home-grown geocoding system that <a href="http://andreischeinkman.com/">Andrei Scheinkman</a> built for <a href="http://prototype.nytimes.com/represent/">Represent</a>, Simon&#8217;s project seemed a natural choice.</p>
<p>It was an easy drop-in, but there was one thing about it that was just slightly off. A successful geocoding result looks like this:</p>
<p><code>(u'New York, NY, USA', (40.756053999999999, -73.986951000000005))</code></p>
<p>Notice the coordinate pair is latitude, longitude. For folks using GeoDjango alongside Simon&#8217;s library, the way you build a <a href="http://geodjango.org/docs/geos.html#geometry-objects">Point object</a> from coordinates is to pass the longitude first, like so:<br />
<code><br />
>>> from django.contrib.gis.geos import Point<br />
>>> p = Point((5, 23)) # 2D point, passed in as a tuple<br />
</code><br />
So on Friday I <a href="http://github.com/dwillis/geocoders">forked Simon&#8217;s project</a> and reversed the ordering of the coordinates in a successful result. That way you can pass that portion of the result directly to a Point constructor:<br />
<code><br />
>>> from django.contrib.gis.geos import Point<br />
>>> from geocoders.google import geocoder<br />
>>> geocode = geocoder('GOOGLE-API-KEY')<br />
>>> results = geocode('new york')<br />
>>> results<br />
(u'New York, NY, USA', (-73.986951000000005, 40.756053999999999))<br />
>>> pnt = Point(results[1])<br />
</code><br />
Not a huge deal, but in keeping with the spirit of the library, I think.</p>
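<p>If you would rather not fork anything, the same swap can be done on your side of the call. This is only a sketch: <code>to_lon_lat</code> is a made-up helper name, not part of the geocoders library, and it assumes the result has the (place name, (latitude, longitude)) shape shown earlier.</p>

```python
def to_lon_lat(result):
    """Take a (place_name, (lat, lon)) result and return (place_name, (lon, lat))."""
    place, (lat, lon) = result
    # Cheap sanity check: a latitude outside [-90, 90] usually means the pair
    # was already in longitude-first order.
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % (lat,))
    return place, (lon, lat)


# Same shape as the latitude-first geocoding result shown earlier in the post.
result = ("New York, NY, USA", (40.756054, -73.986951))
place, xy = to_lon_lat(result)
print(place, xy)
```

<p>The swapped tuple is in the longitude-first order that the Point constructor expects, so <code>Point(xy)</code> would work the same way as in the example above.</p>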
]]></content:encoded>
			<wfw:commentRss>http://blog.thescoop.org/archives/2010/01/24/using-geocoders-with-geodjango/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
