Chris Essig

Walkthroughs, tips and tricks from a data journalist in eastern Iowa

Turning Blox assets into timelines: Part 2

with 2 comments

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

Also note: You will need to be running the Blox CMS for this to work. That said, you could probably learn a thing or two about web scraping even if you don’t use Blox.

For part one of this tutorial, click here. For part three, click here.

 

In my last post, I discussed how you can turn Blox assets into a timeline using a tool made available by ProPublica called TimelineSetter.

If you recall, most of the magic happens with a little Python script called Timeline.py. It scrapes information from a page and puts it into a CSV file, which can then be used with TimelineSetter.

So what’s behind this Timeline.py file? I’ll go through the code by breaking it down into chunks. The full code is here and is heavily commented to help you follow along.

(NOTE: This python script is based off this tutorial from BuzzData. You should definitely check it out!)

– The first part of the script is basically the preliminary work. We’re not actually scraping the web page yet. This code first imports the necessary libraries for the script to run. We are using a Python library called BeautifulSoup that was designed for web scraping.

We then create a CSV file to hold the data with the open function and write an initial header row to it with the file’s write method. Also be sure to enter the URL of the page you want to scrape.

Note: For now, ignore the line “now = datetime.datetime.now().” We will discuss it later.

import urllib2
from BeautifulSoup import BeautifulSoup
import datetime
import re

now = datetime.datetime.now()

# Create a CSV where we'll save our data. See further docs:
# http://propublica.github.com/timeline-setter/#csv
f = open('timeline.csv', 'w')

# Make the header rows. These are based on headers recognized by TimelineSetter.
f.write("date" + "," + "description" + "," + "link" + "," + "html" + "\n")

# URL we will scrape
url = 'http://wcfcourier.com/test/scrape/dunkerton/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
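The header row above is pieced together by hand with string concatenation. As a hedged alternative, here is a minimal sketch that uses Python’s built-in csv module, which adds quotes for you when a headline or description contains a comma. It assumes Python 3 (the script above is Python 2), it writes to an in-memory buffer rather than timeline.csv so it can run anywhere, and the sample row is a made-up placeholder, not data from the real page.

```python
import csv
import io

# Column names taken from the header row in the script above.
HEADERS = ["date", "description", "link", "html"]


def write_timeline_csv(rows, out):
    """Write a header row plus one row per event to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    for row in rows:
        writer.writerow(row)


# Hypothetical placeholder event; note the comma inside the description.
sample_rows = [
    ["2012-03-07", "Headline one, with a comma", "http://example.com/story", "<p>First paragraph</p>"],
]

buffer = io.StringIO()
write_timeline_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, you could pass in an object opened with open('timeline.csv', 'w', newline='') in place of the buffer.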

– Before we go any further, we need to look at the page we are scraping, which in this example is this page. It’s basically a running list of articles about a particular subject. All of these stories will go on the timeline.

Now we’ll ask: what do we actually want to pull from this page? For each article we want to pull: the headline, the date, the photo, the first paragraph of the story and the link to the article.

Now we need to become familiar with the HTML of the page so we can tell BeautifulSoup what HTML attributes we want to pull from it. Go ahead and open the page up and view its source (Right click > View page source for Firefox and Chrome users).

One of the easiest things we can do is search for the headline of the first article. So search the source for “Mayor’s arrest rattles Dunkerton.” This will take us to the chunk of code for that article. You’ll notice that the headline and all the other attributes for the story are contained in a DIV with the class ‘story-block.’

All stories on this page are formatted the same, so every story is put into a DIV with the class ‘story-block.’ Thus, the number of DIVs with the class ‘story-block’ is also equal to the number of articles on the page we want to scrape.
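If you want to check that count yourself, here is a rough sketch that counts the ‘story-block’ DIVs. It is an assumption-heavy stand-in: it uses Python 3’s built-in html.parser instead of BeautifulSoup so it runs without installing anything, and the HTML string is a tiny made-up sample rather than the real page source. StoryBlockCounter is a name I made up for the example.

```python
from html.parser import HTMLParser


class StoryBlockCounter(HTMLParser):
    """Count opening <div> tags whose class list includes 'story-block'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if "story-block" in classes:
            self.count += 1


# Hypothetical stand-in markup; the real page has more inside each block.
SAMPLE_HTML = """
<div class="story-block"><h3>First headline</h3></div>
<div class="story-block"><h3>Second headline</h3></div>
<div class="sidebar"><h3>Not a story</h3></div>
"""

counter = StoryBlockCounter()
counter.feed(SAMPLE_HTML)
print(counter.count)
```

With the sample markup, only the two DIVs carrying the ‘story-block’ class are counted; the sidebar DIV is ignored.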

– For the next line of code, we will gather all of those DIVs into a variable called ‘events.’ The line after that is what is known as a ‘for loop.’ Because ‘events’ holds one entry per article, these two lines determine how many times the ‘for loop’ runs.

So if we have five articles we want to scrape, the ‘for loop’ will run five times. If we have 25 articles, it will run 25 times.

events = soup.findAll('div', attrs={'class': 'story-block'})
for x in events:

– Inside the ‘for loop,’ we need to tell it what information from each article we want to pull. Now go back to the source of the page we are scraping and find the headline, the date, the photo, the first paragraph of the story and the link to the article. You should see that:

  • The date is in an em tag inside a paragraph tag with the class ‘story-more’
  • The link appears several times, including within a tag called ‘fb:like,’ which is the Facebook like button people can click to share the article on Facebook.
  • The headline is in an h3 tag, which is a header tag.
  • The first few paragraphs of the story are contained within a DIV with the id ‘blox-story-text.’ Note: In the Python script, we will tell BeautifulSoup to pull only the first paragraph.
  • The photo is contained within an img tag, which shouldn’t be a surprise.

So let’s put all of that in the ‘for loop’ so it knows what we want from each article. The code below uses BeautifulSoup syntax, which you can find out about by reading their documentation.

    # Information on the page that we will scrape
    date = x.find('p', attrs={'class': 'story-more'})('em')
    link = x.find('fb:like')['href']
    headline = x.find('h3').text
    description = x.find('div', attrs={'id': 'blox-story-text'})('p', limit=1)
    image = x.find('img')

One note about the above code: The ‘x’ stands for the article that the ‘for loop’ is currently on. More precisely, it holds that article’s ‘story-block’ DIV, not a number. For example, say we want to scrape 20 articles. The first time we run the ‘for loop,’ the ‘x’ will be the first article’s DIV. The second time through, the ‘x’ will be the second article’s DIV. The last time through, it will be the 20th.

We use the ‘x’ so we pull information from a different article each time we go through the ‘for loop.’ Because x.find() searches only inside the current DIV, the first time through the ‘for loop’ we will pull information from the first article. And the second time through, we pull information from the second article.

If we searched the whole page instead of using ‘x,’ we’d run through the ‘for loop’ 20 times but we’d pull the same information from the same article each time. The ‘x’ in combination with the ‘for loop’ basically tells BeautifulSoup to start with one article, then move on to the next and then the next and so on until we’ve scraped all the articles we want to scrape.
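To see what the ‘x’ holds on each pass, here is a minimal Python 3 sketch. The list of strings is a hypothetical stand-in for the DIVs that findAll returns, so nothing here touches the real page. It also shows enumerate, which is one way to get a running number alongside each item if you ever need one.

```python
# Hypothetical stand-ins for the 'story-block' DIVs that findAll returns.
events = ["first article block", "second article block", "third article block"]

seen = []
for x in events:
    # x is the block itself on each pass, not a counter.
    seen.append(x)

# enumerate() pairs each block with a running number, starting at 1 here.
numbered = [(number, block) for number, block in enumerate(events, start=1)]

print(seen)
print(numbered)
```

The loop runs once per item in the list, which is why the number of ‘story-block’ DIVs decides how many times the scraping code runs.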

– Now you should be well on your way to creating timelines with Blox assets. For the third and final part of this tutorial, we will just clean up the data a little bit so it looks nice on the page. Look for the final post of this series soon!

Written by csessig

March 7, 2012 at 2:21 pm

Turning Blox assets into timelines: Part 1

with 3 comments

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

Also note: You will need to run on the Blox CMS for this to work. For part two of this tutorial, click here. For part three, click here.

 

A couple weeks ago I blogged about the command line and a few reasons journalists should learn it. Among the reasons was a timeline tool made available by ProPublica called TimelineSetter, which is shown to the left. Here are two live examples to give you an idea of what the tool looks like.

To create the timeline, you will first need to make a specially-structured CSV file (CSV is a type of spreadsheet file. Excel can export CSV files). Rows in the CSV file will represent particular events on the timeline. Columns will include information on those events, like date, description, photo, etc.
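To make the row-and-column layout concrete, here is a small sketch that reads a timeline-style CSV with Python 3’s built-in csv module. The two events are made-up placeholders (including the date format), and the column names are the ones Timeline.py writes in its header row. ProPublica’s documentation has the full list of columns TimelineSetter recognizes.

```python
import csv
import io

# Hypothetical two-event CSV; each line after the header is one timeline event.
SAMPLE_CSV = """date,description,link,html
2012-03-02,First event,http://example.com/one,<p>One</p>
2012-03-07,Second event,http://example.com/two,<p>Two</p>
"""

# DictReader turns each row into a mapping of column name -> value.
events = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for event in events:
    print(event["date"], "-", event["description"])
```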

ProPublica has a complete list of available columns here. To give you an idea of what the final product will look like BEFORE you make the timeline, you can download one CSV file we used to make a timeline by clicking here.

After you have your CSV file, you run a simple command and voilà! A beautiful timeline will be created. For more information on the command you have to run, check out the TimelineSetter page. (Hint: The command you run is: timeline-setter -c timeline.csv)

By far the most tedious part of all this is tracking down events and articles you want to include in the timeline and making your CSV file. That is why I wrote a simple Python script that will help turn Blox assets into a CSV file you can use with TimelineSetter.

Here’s a walkthrough of how to use it:

1. The first thing you need to do is go to this GitHub page and follow the instructions in the ReadMe file. After that you will have a page set up with all of the events you want to include in the timeline. Here’s an example of what that page should look like.

2. Download the Python script (Timeline.py). What this script will actually be doing is scraping the web page we just created. And by that I mean it will be going to that page and pulling out all the information we need to create our timeline. So it will be grabbing photos, headlines, dates, etc. It will then create a CSV file with all that information. We can then use that CSV file with TimelineSetter.

3. The script uses a Python library called Beautiful Soup. If you don’t have that downloaded already, click here. It takes only a few seconds to install.

4. In the Timeline.py file on line 16 is a spot for the URL of the page we want to scrape. Make sure you change that to whatever URL you created.

5. Run the command python timeline.py from your command line in the same directory as the Python script you downloaded. This will output a CSV file.

6. You will need to download TimelineSetter, which is also really easy to do. Just run this command: gem install timeline_setter. For more information, click here.

7. Now navigate to the folder with the CSV file and run this command: timeline-setter -c timeline.csv. (Or whatever your CSV file is called).

8. You should end up with a directory of JavaScript files, a CSS file and a Timeline.html file. This is your timeline. Now put it on your server and embed it using an HTML asset in Blox (or whatever you want to do with it).

9. Do the happy dance (Mandatory step)

This will get you pushing out timelines in no time. In my next post, I will be going through that Python script (Timeline.py) and what it actually does to create the CSV file.

Written by csessig

March 2, 2012 at 9:59 am

Multiple layers and rollover effects for Fusion Table maps

with 3 comments

Note: Because of a change in the Fusion Tables API, the method for using rollover effects no longer works.

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

If you haven’t noticed by now, a lot of journalists are in love with Google Fusion Tables. I’m one of them. It’s helped me put together a ton of handy maps on deadline with little programming needed.

For those getting started, I suggest these two walkthroughs on Poynter. For those who have some experience with FT, here are a couple of options that may help you spruce up your next map.

Multiple Fusion tables on one map

Hypothetically speaking, let’s say you have two tables with information: one has county data and the other city data. And you want to display that data on just one map.

Fusion Tables makes it very easy to display both at the same time. We’ll start from the top and create a simple Javascript function to display our map (via the Google Maps API):

function initialize() {
	map = new google.maps.Map(document.getElementById('map_canvas'), {
		center: new google.maps.LatLng(42.5, -92.2),
		zoom: 10,
		minZoom: 8,
		maxZoom: 15,
		mapTypeId: google.maps.MapTypeId.TERRAIN
	});
	loadmap();
}

At the end of the initialize function we call a function named “loadmap().” With this function, we will actually pull in our Fusion Tables layers. For this example we’ll bring in two layers instead of one. Notice how strikingly similar the two are:

function loadmap() {
	layer2 = new google.maps.FusionTablesLayer({
		query: {
			select: 'geometry',
			from: 2814002
		}
	});
	layer2.setMap(map);

	layer = new google.maps.FusionTablesLayer({
		query: {
			select: 'Mappable_location',
			from: 2813443
		}
	});
	layer.setMap(map);
}

That’s it! You now have one map pulling in two sets of data. To see the full code for this map, click here.

Rollover effects for Fusion Tables

One feature often requested in the Fusion Tables forums is to enable mouse rollover events for Fusion Table layers. Typically, readers who look at a map have to click on a point, polygon, etc. to open up new data about that point, polygon, etc. A mouseover event would allow new data to pop up if a reader hovers over a point, polygon, etc. with their mouse.

A few months ago, some very smart person rolled out a “workable solution” for the rollover request. Here’s the documentation and here’s an example of it in the wild.

Another example is this map on poverty rates in Iowa. The code below is from this map and is very similar to the code on the documentation page:

layer.enableMapTips({
		select: "'Number', 'Tract', 'County', 'Population for whom poverty status is determined - Total', 'Population for whom poverty status is determined - Below poverty level', 'Population for whom poverty status is determined - Percent below poverty level', 'One race - White', 'One race - Black', 'Other', 'Two or more races'", // list of columns to query, typically need only one column.
		from: 2415095, // fusion table name
		geometryColumn: 'geometry', // geometry column name
		suppressMapTips: true, // optional, whether to show map tips. default false
		delay: 1, // milliseconds mouse pause before send a server query. default 300.
		tolerance: 6 // tolerance in pixel around mouse. default is 6.
		});

	//here's the pseudo-hover
	google.maps.event.addListener(layer, 'mouseover', function(fEvent) {
		// "Number" value of the row (polygon) currently under the mouse
		var NumVal = fEvent.row['Number'].value;
		layer.setOptions({
			styles: [{
				where: "'Number' = " + NumVal,
				polygonOptions: {
					fillColor: "#4D4D4D",
					fillOpacity: 0.6
				}
			}]
		});
	});

Note: It’s easiest to think of your Fusion Table as a list of polygons with certain values attached to it. For this poverty map, each row represents a Census Tract with values like Tract name, number of people within that tract that live in poverty, etc. And for this map, I made it so each polygon in the Fusion Table has its own, unique number in the “Number” column.

Here’s a run through of what the above code does:

1.  We’ve already declared “layer” as a particular Google Fusion Table layer (see the second box of code above). Now the “layer.enableMapTips” will allow us to add rollover effects to that Fusion Table layer. The “select” option represents all the columns in that Fusion Table layer that you want to use with the rollover effect.

For instance, here’s the Fusion Table I’m calling in the above “enableMapTips” function. Notice how I’ve called all the columns with data (‘Tract’, ‘County’, etc.). I then told it which Fusion Table to look for with “from: 2415095.” Each Fusion Table has its own unique number. The number for my poverty Fusion Table is 2415095, which is the value passed to “from.” To find out what number your Fusion Table is, click File > About.

Finally, I’ve told it what column contains the geometry information for this Fusion Table (Again, go through this Poynter walkthrough to find out all you need to know about the geometry field). Mine is simply called “geometry.” Each row in the “geometry” column represents one polygon.

2. The second step is the “google.maps.event.addListener(layer, ‘mouseover’, function(fEvent)” call. Basically this says “anytime the reader rolls over a polygon, the following will happen.”

In this function, “fEvent” represents the polygon that the reader is currently hovering over. Let’s say I’m rolling over the polygon that is represented by the first row in the Fusion Table. It’s Census Tract 9601 and has the value of “1” in the “Number” column.

Every time a reader rolls over Census Tract 9601, the code “fEvent.row[‘Number’].value” goes into the Google Fusion Table, finds the Census Tract 9601 row and returns the value of the Number column, which is “1.” So var “NumVal” would become “1” when a reader rolls over Census Tract 9601.

The next part changes that polygon’s color. This happens with the “where” statement. This is saying, “when I roll over a polygon, find the polygon in the Fusion Table that represents ‘NumVal’ and change its color.” Since the variable “NumVal” represents the polygon currently being hovered over, this is the polygon that changes colors. For a reader, the output is simple: I roll over a polygon. It changes colors.

In short: I roll over Census Tract 9601, firing off the “google.maps.event.addListener” function. This function finds the value of the “Number” column. It returns “1.” The code then says change the color of the polygon that has a value of “1” in the “Number” column. This is Census Tract 9601, which I rolled over. It changes colors and life is good.
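The lookup described above can also be sketched outside the browser. This is only an illustration in Python 3, not part of the map’s JavaScript: the dictionary is a hypothetical stand-in for “fEvent.row,” and build_highlight_style is a made-up helper name. It just shows how the “Number” value ends up inside the “where” string.

```python
def build_highlight_style(event_row):
    """Mimic the rollover handler: read the 'Number' value, build the style rule."""
    num_val = event_row["Number"]["value"]
    return {
        "where": "'Number' = " + str(num_val),
        "polygonOptions": {"fillColor": "#4D4D4D", "fillOpacity": 0.6},
    }


# Hypothetical stand-in for fEvent.row when hovering over Census Tract 9601.
row = {"Number": {"value": "1"}, "Tract": {"value": "9601"}}
style = build_highlight_style(row)
print(style["where"])
```

Because every polygon has its own value in the “Number” column, the resulting rule matches exactly one polygon: the one being hovered over.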

MapTips

If you go back up to “layer.enableMapTips” in the third box of code, you’ll notice there is an option for “suppressMapTips.” For the poverty map, I have it set to true. But what if you set it to false? Basically, any time a reader hovers over a point or polygon, a small box shows up next to it containing information on that point or polygon. Notice the small yellow box that pops up on this example page.

This is a nifty feature and a great replacement for the traditional InfoWindow (the box that opens when you click on a point in a Google map). The only problem is the default text size is almost too small to read. How do we change that? Fairly easily:

1. Download a copy of the FusionTips javascript file.

2. Copy the file to the same folder your map is in and add this at the top of your document header:

<script type="text/javascript" src="fusiontips.js"></script>

3. Open the FusionTips file and look for “var div = document.createElement(‘DIV’).” It’s near the top of the Javascript file.

4. This ‘DIV’ represents the MapTips box. By editing this, you can change how the box and its text will display when a reader hovers over a point on the map. For instance, this map of historical places in Iowa used MapTips but the text is larger, the background is white, etc. Here’s what the DIV looks like in my FusionTips javascript file:

FusionTipOverlay.prototype.onAdd = function() {
    var div = document.createElement('DIV');
    div.style.border = "1px solid #999999";
    div.style.opacity = ".85";
    div.style.position = "absolute";
    div.style.whiteSpace = "nowrap";
    div.style.backgroundColor = "#ffffff";
    div.style.fontSize = '13px';
    div.style.padding = '10px';
    div.style.fontWeight = 'bold';
    div.style.margin = '10px';
    div.style.lineHeight = '1.3em';
    if (this.style_) {
      for (var x in this.style_) {
        if (this.style_.hasOwnProperty(x)) {
          div.style[x] = this.style_[x]
        }
      }
    }

Much better! Here’s the code for this map. And here are my three Fusion Tables.

I hope some of these tips help and if you have any questions, send me an e-mail. I’d be more than happy to help.

Written by csessig

February 7, 2012 at 4:34 pm

A few reasons to learn the command line

leave a comment »

Note: This is my first entry for Lee Enterprises’ data journalism blog. Reporters at Lee newspapers can read the blog by clicking here.

As computer users, we have grown accustomed to what is known as the graphical user interface (GUI). What’s GUI, you ask? Here are a few examples: When you drag and drop a text document into the trash, that’s GUI in action. Or when you create a shortcut on your desktop, that’s GUI in action. Or how about simply navigating from one folder to the next? You guessed it: that’s GUI in action.

GUI, basically, is the process of interacting with images (most notably icons on computers) to get things done on electronic devices. It’s easy and we all do it all the time. But there is another way to accomplish many tasks on a computer: using text-based commands. Welcome to the command line.

So where do you enter these text-based commands and accomplish these tasks? There is a nifty little program called the Terminal on your computer that does the work. If you’ve never opened up your computer’s Terminal, now would be a good time. On my Mac, it’s located in the Applications > Utilities folder.

A scary black box will open up. Trust me, I know: I was scared of it just a few months ago. But I promise there are compelling reasons for journalists to learn the basics of the command line. Here are a few:

 

1. Several programs created by journalists for journalists require the command line.

Two of my favorite tools out there for journalists come from ProPublica: TimelineSetter and TableSetter.

The first makes it easy to create timelines. We’ve made a few at the Courier. The second makes easily searchable tables out of spreadsheets (more specifically, CSV files), which we’ve also used at the Courier. But to create the timelines and tables, you’ll need to run very basic commands in your Terminal.

It’s worth noting the LA Times also has its own version of TableSetter called TableStacker that offers more customizations. We used it recently to break down candidates running in our local elections. Again, these tables are created after running a simple command.

The New York Times has a host of useful tools for journalists. Some, like Fech, require the command line to run. Fech can help journalists extract data from the Federal Election Commission to show who is spending money on whom in the current presidential campaign cycle.

 

2. Other programs/tools that journalists can use:

Let’s say you want to pull a bunch of information from a website to use in a story or visualization, but copying and pasting the text is not only tedious but also very time consuming.

Why not just scrape the data using a program made in a language like Python or Ruby and put it in a spreadsheet or Word document? After all, computers are great at performing tedious tasks in just a few minutes.

One of my favorite web scraping walkthroughs comes from BuzzData. It shows how to pull water usage rates for every ward in Toronto and can easily be applied to other scenarios (I used it to pull precinct information from the Iowa GOP website). The best way to run this program and scrape the data is to run it through your command line.

Another great walkthrough on data scraping is this one from ProPublica’s Dan Nguyen. Instead of using the Python programming language, like the one above, it uses Ruby. But the goal remains the same: making data manageable for both journalists and readers.

A neat mapping service that is gaining popularity at news organizations is TileMill. Here are a few examples to help get you motivated.

One of the best places to start with TileMill is this walkthrough from the application team at the Chicago Tribune. But beware: you’ll need to know the command line before you start (trust me, I learned the hard way).

 

3. You’ll impress your friends because the command line kind of looks like the Matrix

And who doesn’t want that?

 

Okay I’m sort of interested…How do I learn?

I can’t tell people enough how much these two command line tutorials from PeepCode helped me. I should note that each costs $12 but is well worth it, in my opinion.

Also, there is this basic tutorial from HypeXR that will help. And these shortcuts from LifeHacker are also great.

Otherwise, a quick Google or YouTube search will turn up thousands of results. A lot of programmers use the command line and, fortunately, many are willing to help teach others.

Written by csessig

January 31, 2012 at 9:21 am

Better map rollover option

with 2 comments

A month ago, I blogged about an attempt I made to use a new feature from Google Fusion Tables that allowed map makers to customize their maps based on mouseovers. The idea for users was you could roll over a point/state/census tract/whatever on a map and some sort of data would pop up on that map. You can also customize it so the polygon changes colors, polygon borders grow in size or whatever option you decide to use to let the reader know they have rolled over a particular object. I used it on a recent map of poverty rates in Iowa. The result worked but mouseover events seemed delayed and clunky. Not so user friendly, especially if you have a slow Internet connection. So I looked for a new option.

What I found was this great library from The New York Times’ Albert Sun for polygons and rollover effects. Granted, this takes much longer to put together. But the result is much smoother and more user friendly IMO. I used it on a recent map of heroin rates in the U.S.

Here’s a brief synopsis of how I put it together:

1. First I grabbed data from the Substance Abuse and Mental Health Data Archive on reported heroin cases at substance abuse centers broken down by state. I then grabbed a shapefile from the U.S. Census bureau of each state in the U.S. This is the file that maps each state based on its boundaries.

2. The data related to drug cases from the SAMHDA was sorted by state and year. You can download individual spreadsheets of data for years 2009 and before. I downloaded spreadsheets for 2005 through 2009. I then pulled out the information I needed from each one and merged them, ending up with a final spreadsheet that had the following data for each state dating back to 2005: number of people who were admitted to a substance abuse treatment center for heroin, number of people who were admitted to a substance abuse treatment center for any drug and the percentage of people who were heroin users.

3. I first uploaded that final spreadsheet to Google Fusion Tables. I then uploaded the shapefile I downloaded into Google Fusion Tables using the AWESOME Shpescape tool. Important: Make sure you select “Create a Simplified Geometry column” under advanced options before you import the shapefile, otherwise it will be too big. Finally I merged the shapefile with the spreadsheet and ended up with this final table.

4. I exported that table into a KML (If you click Visualize > Map, you’ll see the option to download a KML file) and converted that to a GeoJSON file using this Ogre web client. I only did this because the KML wasn’t working on Internet Explorer. Anyways, converting it into a GeoJSON file was a TOTAL PAIN IN THE ASS that required me to mess with the GeoJSON file much more than I wanted. Finally, I ended up with this GeoJSON File.

5. Now for the code. My final script is here. I’m not going to run through it all but will point out a few lines of note:

  • Line 179 brings in the JSON file. Underneath it are polygon highlighting options based on Albert Sun’s code. I have it set up so the opacity on the state polygons is light enough to see the state names underneath it when you first pull up the map. When you roll over a state, the polygon gets shaded in all the way and you can’t read the text underneath it (on the Google Map). But if you click a polygon, you can once again see the text. I also have a red border pop up when you roll over a state (strokeColor and strokeWeight).
  • Line 207 is the highlightCallback function. This creates a variable for each line of data: number of heroin users, number of drug users, percentage, etc. for 2005 to 2009. It’s what you see under “Figures” in the DataTable on the map when you roll over a state. I first made each line a string by adding quotes to each variable.
  • Each variable is called into function selected_district (line 229). This function creates the Google DataTable via “new google.visualization.DataTable().” I’ve used this table in the past on a map for prep high school football teams. Check this past blog post for more information.
  • Line 255 is a function that puts in commas for numbers in the thousands…I didn’t make it. It’s freely available online. Please take it and use it as you see fit.
  • Line 107 to 154 is the legend.
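The comma function mentioned above lives in the map’s JavaScript. Purely as an illustration of the same idea, here is a minimal Python 3 sketch; it is not the function from my script, and add_commas is a name I made up for the example. It leans on Python’s built-in format function and its “,” format specifier.

```python
def add_commas(number):
    """Return the number as a string with commas between thousands groups."""
    return format(number, ",")


print(add_commas(1234))
print(add_commas(1234567))
print(add_commas(987))
```

Numbers below 1,000 come back unchanged, so the same helper can be applied to every figure in the table.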

Per usual, I used Colorbrewer to come up with the colors…

I’m happy with the resulting map and hope to use this polygon feature in the future. If you have any questions, feel free to leave a comment. I’d be more than happy to pass my limited amount of knowledge along to others.

Written by csessig

January 21, 2012 at 5:07 pm

The caucus night that almost didn’t end

leave a comment »

All eyes were on Iowa last night as the Iowa caucuses took place. It was pretty much the longest work day I’ve ever had…By a lot. Anyways, we did a ton of updating on WCFCourier.com all day and night…I wish I had a screen shot of all the photos/stories we put on the front of our website. I did take one at about 2:30 a.m., which is shown above. It’s the site after Romney was (finally) declared the winner. The template is now being used on our Iowa caucus website.

Here’s a quick summary of our online coverage: It started with general Iowa caucus coverage, switched over to our live coverage from the UNI-Dome (which hosted the largest caucus in the state Tuesday night) and then to the statewide race between Santorum-Romney-Paul and, finally, the grudge match between Santorum and Romney. At 1:30 a.m., the Iowa GOP announced Romney had won by eight flippin’ votes. At one point, Santorum was leading by ONE vote with ONE precinct left. What are the odds?

My schedule on caucus night went something like this:

– 9 a.m. – 2:30 p.m. –  Preparing for the day / arranging plans with reporters /posting stories, photos and other content. We had caucus stories going up all day, obviously. I also posted and helped monitor a live chat, which was shared with other Iowa newspapers and was active all day, as well as posted live video from KCRG, which played from about 7 p.m. to 12 a.m. I opted to put both the live video and the live chat on the same page, making it easier for readers to follow action at home.

– 2:30 p.m. – 2:31 p.m. – Lunch

– 2:31 p.m. – 4:30 p.m. – More preparing and posting. We also posted two maps with our coverage: one of live caucus results (which started coming in after the 7 p.m. caucus start). This map was provided by the Iowa Republican Party and is pictured to the left. You may have seen it on several news sites… Many had it or a variation of it.

The second map I made myself and featured all (I believe, although I haven’t counted) 1,700 caucus locations in the state of Iowa. The addresses were pulled from the Iowa GOP website, which listed every site. Basically, I wrote a Python program that scraped the data from their site and put it into a spreadsheet, pulled it into Google Fusion Tables and mapped the locations based on their addresses. The Python scraper is based on this FANTASTIC walk-through by BuzzData on how to scrape data from websites. Check it out!

(NOTE: Here’s the code for the Python scraper. Here’s my Google Fusion Table.)

At about 3 p.m., we rolled over the site to feature one huge photo and story (see the screenshot at the top of this post). It was caucus night, after all,  so we had to go big.

– 4:30 p.m. – 5 p.m. – Mad dash to the UNI-Dome, where Black Hawk County was caucusing. The doors opened at 5:30 p.m. and I wanted to get there and set up before either Bachmann or Gingrich stopped by to speak.

– 5 p.m. – 10 p.m. – Posted up at the UNI-Dome. At about 5:30 p.m., we switched our main story to our Dome coverage…This was basically when our first photo and update came in. Throughout the evening, we posted small updates from the Dome and new photos. We also had three videos from the Dome.

At about 6:30 p.m., Bachmann and Gingrich spoke. I took a few photos for our live chat (which, BTW, had more than 5,000 viewers at one point!) and posted a fresh candidate Dome story when it came in.

At about 8:30 p.m., the Dome action was winding down and our attention turned to the statewide race between Santorum-Romney-Paul and then Santorum-Romney. We relied on the AP and our Lee Des Moines bureau for our main story on the site, adding photos from the Dome and the wire with it.

– 10 p.m. – 10:30 p.m. – Mad dash back to the newsroom. I was actually afraid they might announce the winner while I was on the road back to the newsroom but I was off by about three hours.

– 10:30 p.m. – 12:30 a.m. – We waited. And waited. And made some jokes. And waited some more. The precinct results continued to flood in and amazingly the number of votes between Romney and Santorum dwindled. Santorum was actually in the lead for much of the night. By 16 votes. Then one vote. Then four votes. Just ridiculous.

– 12:30 a.m. – 1:30 a.m. – At about this time, they announced they were down to three precincts then one precinct…At that point, I knew I would be in the newsroom until the final precinct was counted. The lone holdout was in Clinton County (eastern Iowa along the river) and apparently there was some confusion about whether or not they had submitted their results to the state yet.

– 1:30 a.m. – 2 a.m. – The Iowa GOP finally announced Romney won by eight (!!!) votes. Hurray! I slapped a quick update on top of the story we had online and added a new photo. At this point, I just wanted to make sure those who got up in the morning would see the final results.

– 2 a.m. – 3 a.m. – The longest day ever came to a close. I took down the big photo, big story template we had used all night (see screenshot at the top) and returned the site to our standard carousel template with five rotating stories on the front (see WCFCourier.com). I also added a teaser to our Iowa caucus site on the top so people could see all of our caucus coverage from the night/morning. Because there was a ton of it.

– 3 a.m. – Sleep

Here’s what the Courier’s front page looked like on Wednesday. We’re a morning paper so we were able to get the final results in:

Written by csessig

January 4, 2012 at 1:42 pm

Map mouseover test

A couple of weeks ago I saw some tweets about a new feature that would allow Google Fusion Tables map makers to customize their maps based on mouseovers (here’s some background and the ‘workable’ solution that was released earlier this month). This was exciting news. In the past, people looking at a map would typically have to click on a point or a polygon or whatever to open up new data about that point, polygon, etc. Now, the mouseover effect allows new data to pop up when a reader hovers over a point, polygon, etc. with their mouse.

It sounds great so I tried it out over the weekend with this map on poverty rates in Iowa. The map is broken into Census tracts and when a reader hovers over a polygon, poverty data about that particular tract pops up. I also set it so the polygon changes colors on a mouse rollover.

It didn’t turn out too bad. But I feel the polygon color changes take a while to load and frankly don’t feel that slick. There are plenty of other options for messing with polygons (I’ve heard the Raphael Javascript library works great and Albert Sun with the NY Times has a great library for polygon effects) but nothing I’ve seen is as simple and as quick to turn around as this workable solution… Overall, there seems to be a ton of promise here, especially for us in the news business who are trying to turn around maps on deadline.

Anyways, anybody who wants to build off this map should check out the code. And I’d love to hear any suggestions on how to improve on it.

Written by csessig

December 14, 2011 at 4:33 pm

Graph: Hispanic population growing in Iowa

This graph I put together with a weekend story breaks down Iowa’s Hispanic population by county. It was based almost entirely off a JavaScript chart walkthrough put together by Michelle Minkoff, Interactive Producer for the Associated Press. So check it out now because it’s FANTASTICO!

I did make a couple of minor tweaks, which may be helpful for others so I will outline them here.

My first finished version looked like this. Go ahead and roll over the bars like the blue one, for instance, which represents the number of Hispanic people in Iowa who said they were also white. Notice how the bars in 2000 and 2010 are both the same length, even though the value of the 2000 bar (38,000) is almost half that of the 2010 bar (80,000).

The data was retrieved from Census.IRE.org. I just selected “Iowa,” then “County,” then “Hispanic or Latino origin by race.” You can then download a CSV of the data and chop it up as you see fit. Also click an option after “Browse data for…” on the Census.IRE.org page for a great breakdown of what each of the headers in the CSV files means. Here’s Iowa’s page, for instance.

The Javascript file that makes these graphs calls a JSON file containing information retrieved from Census.IRE.org. My JSON file initially looked like this:

  headers = ["White","Black / African American","American Indian / Alaska Native","Asian","Native Hawaiian / Other Pacific Islander","Some other race","Two or more races"]
  allCountyData = [
  ["Iowa",38296,1109,1034,290,121,35317,6306,80438,2242,2503,497,206,54000,11658],
  (Enter data for every county in Iowa here)

The headers represent the different races of Hispanic people. The line for “Iowa” holds the number of people of each race, first for 2000 and then for 2010. 38296 = Number of Hispanic people in 2000 who said they were White, 1109 said they were black, etc. 80438 = Number of Hispanic people in 2010 who said they were White, 2242 said they were black, etc.

Here’s my tweak: Instead of having both 2000 and 2010 data on the same line, I broke this out onto two separate lines. So my new JSON file includes this:

  allCountyData = [
    ["Iowa",38296,1109,1034,290,121,35317,6306],
  (Enter data for every county in Iowa here)

And this:

  allCountyData2 = [
    ["Iowa",80438,2242,2503,497,206,54000,11658],
  (Enter data for every county in Iowa here)
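I split the rows up by hand, but with 99 counties a few lines of Python could do it for you. This is only a sketch of the idea: split_row is a made-up helper name, and it assumes every combined row is a name followed by an even number of values, with the 2000 numbers first and the 2010 numbers second.

```python
def split_row(row):
    """Split [name, 2000 values..., 2010 values...] into two rows."""
    name, values = row[0], row[1:]
    half = len(values) // 2
    return [name] + values[:half], [name] + values[half:]


# The combined "Iowa" row from the original JSON file.
combined = ["Iowa", 38296, 1109, 1034, 290, 121, 35317, 6306,
            80438, 2242, 2503, 497, 206, 54000, 11658]

row_2000, row_2010 = split_row(combined)
print(row_2000)  # goes into allCountyData
print(row_2010)  # goes into allCountyData2
```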

From there, I needed to edit the Javascript file to call the data contained in allCountyData2. Here’s my Javascript file. Go ahead and click on it. I know you want to. And scroll down to “function changeGraph(stateText)” and notice how it is calling both selectedData, which includes the value for everything in allCountyData (in the JSON file), AND selectedData2, which contains the value for everything in allCountyData2 (in the JSON file). The “drawVisualization();” inside the function also contains two arguments now: “selectedData, selectedData2.”

As a result, the “function drawVisualization()” at the very top of the Javascript file must also contain two arguments: “newData, newData2.” Initially, my drawVisualization function (here’s my first Javascript file) contained just one argument because it was only calling newData, which contains data from allCountyData. At the very end of the function, it broke newData (allCountyData) into two because remember, allCountyData contained data for both 2000 and 2010:

  //the first half of our data is for 2000, so we fill the row in with appropriate numbers
  //we start at 1 to leave out the years, remember data structured this way starts at 0
  for (var i = 1; i <=(newData.length-1)/2; ++i) {
    data.setValue(0, i, newData[i]);
    data.setFormattedValue(0,i, numberFormat(newData[i]));
  }
  //now, the second half of the data is for 2010, so we'll fill that in
  for (var i = 1; i<=(newData.length-1)/2; ++i) {
    data.setValue(1, i, newData[i]);
    data.setFormattedValue(1,i, numberFormat(newData[i+(newData.length-1)/2]));
  }

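I needed to change that last “for” loop because we no longer wanted newData to be broken in half. (Look closely at that second loop and you can see why the 2000 and 2010 bars came out the same length: setValue reads newData[i], the 2000 number, and only setFormattedValue reaches into the second half of the array for the 2010 number.) Instead, we want to call that second argument (newData2) in the drawVisualization function because it carries the data from allCountyData2: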

  //newData holds the 2000 data, so we fill the first row in with those numbers
  //we start at 1 to skip the name in position 0, remember data structured this way starts at 0
  for (var i = 1; i <=(newData.length-1); ++i) {
    data.setValue(0, i, newData[i]);
    data.setFormattedValue(0, i, numberFormat(newData[i]));
  }
  //newData2 holds the 2010 data, so we fill the second row in with those numbers
  for (var i = 1; i<=(newData2.length-1); ++i) {
    data.setValue(1, i, newData2[i]);
    data.setFormattedValue(1, i, numberFormat(newData2[i]));
  }

If I’m not mistaken, that’s all I did. If you are interested, my HTML file is here and my CSS file is here. Again, Michelle Minkoff deserves all the credit in the world for putting together her great walkthrough. So get over there and check it out!

Written by csessig

November 28, 2011 at 11:29 pm

Map Mania


Here are a few maps I’ve worked on in the last couple of months:

1. City breakdown of property taxes in Iowa

– This map (shown above) includes property tax rates for every city in Iowa (900+) dating back to 2010 and color-codes them based on their rate. I won’t go into too much detail on how I went about doing it; instead, I’ll refer you to this great Poynter tutorial on how to make a heat map using Google Fusion Tables (which is what the above map is based on). When you’re done with that, check out their more recent post on mapping data by county, cities, etc. using what are known as shapefiles (Don’t worry. I didn’t know what they were either until I read the post).

For this map, I first found tax rates for every city in Iowa on the Iowa Department of Management’s website. These rates were contained in several spreadsheets (City Tax Rates FY12, City Consolidated Tax Rates FY12, FY 11, etc.). For each spreadsheet I used (six in all: three for city tax rates dating back to 2010 and three for consolidated tax rates dating back to 2010), I basically went in and chopped out all the information I didn’t need (Example spreadsheet), which wound up being most of it. I then merged the six spreadsheets by city name using Google Fusion Tables.
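Fusion Tables did the merging for me, but if you’re wondering what “merged by city name” boils down to, here is a rough Python sketch of the same idea. To be clear, this is plain Python and not how Fusion Tables works under the hood. The function names, column names and numbers are all made up. It keeps every city in the first table and adds columns from the others wherever the city name matches.

```python
def normalize(city):
    """Match city names regardless of case or stray spaces."""
    return city.strip().upper()


def merge_by_city(base, *others):
    """Add the columns from each table in others to the matching row in base."""
    merged = {normalize(row["city"]): dict(row) for row in base}
    for table in others:
        for row in table:
            key = normalize(row["city"])
            if key in merged:  # cities missing from base are dropped
                extra = {k: v for k, v in row.items() if k != "city"}
                merged[key].update(extra)
    return list(merged.values())


# Made-up rows standing in for two of the trimmed spreadsheets.
rates_fy12 = [
    {"city": "Waterloo", "rate_fy12": 1.0},
    {"city": "Cedar Falls", "rate_fy12": 2.0},
]
rates_fy11 = [
    {"city": "CEDAR FALLS ", "rate_fy11": 3.0},
    {"city": "Elsewhere", "rate_fy11": 4.0},
]

merged_rows = merge_by_city(rates_fy12, rates_fy11)
for row in merged_rows:
    print(row)
```

Normalizing the names first matters more than you might think: one spreadsheet with “CEDAR FALLS ” and another with “Cedar Falls” will not match otherwise.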

I then found the shapefile that maps every city in Iowa on the Census site, which contained the geographical data that Google uses when it maps out the cities. I then merged that spreadsheet with the one containing all the tax rates and pulled it into Google Fusion Tables.

Google did the heavy lifting, so to speak, and mapped out all the cities on the map. I then told FT to color the cities based on their tax rate. In the final product, cities with higher tax rates are red and cities with lower tax rates are yellow. The legend is HTML/CSS, which I copied heavily from the Chicago Tribune News App and their wonderful map walkthrough. When you get the hang of Fusion Tables, check it out. Their maps are much prettier than mine.

The property tax map is on my (barren) Github account. Please copy anything you find helpful.

2. 2008 Flood Buyouts

Compared to the above map, the flood buyout map was a piece of cake to make. We got a spreadsheet from the city of Cedar Falls that listed every home that was offered a buyout from the federal government, along with its address. Besides mapping shapefiles (like above), Google can also map simple addresses (obviously), so it can map a spreadsheet of addresses as well.

3. Road to the Dome: A look at the prep teams in the semifinals

I think high school football is popular everywhere. The Cedar Valley is certainly no exception. In Iowa, we have six football classes based on the size of the schools. Each year, the semifinal and final rounds of each class are played at the UNI-Dome in Cedar Falls. This package breaks down each of the 24 teams in the semifinals and maps them based on their school address. Readers can then click on each pointer on the map and read more about the team.

From a technical standpoint, this is the first map I’ve put together where an info box DOESN’T open up when a reader clicks on a point on the map. Instead, the information on the teams opens in a table format to the right of the map. This uses Google’s API; I borrowed heavily from this map + chart example provided by Google. The main difference is that I called a table visualization instead of the chart visualization used in the example.

The table below the map (the one where you can select a class and see the team’s playoff schedule) was borrowed heavily off this Google table example.

Here’s the table I ended up with on Google Fusion Tables. If you’re looking for the map source, go to this webpage and click view source. This is where the map is housed.

My Google-powered Javascript is pasted below for anyone who is curious:

google.load('visualization', '1', {'packages':['table']});

function initialize() {
  // Center the map on Iowa
  var map = new google.maps.Map(document.getElementById('map_canvas'), {
    center: new google.maps.LatLng(42.3, -93.4),
    zoom: 7,
    minZoom: 5,
    maxZoom: 11,
    mapTypeControl: false,
    streetViewControl: false,
    mapTypeId: google.maps.MapTypeId.TERRAIN
  });

  // Plot the schools from the Fusion Table and turn off the default info boxes
  var layer = new google.maps.FusionTablesLayer({
    query: {
      select: 'mappable_address',
      from: 2102557
    },
    suppressInfoWindows: true
  });
  layer.setMap(map);

  // Add a listener to the layer that constructs a table from
  // the data returned on click
  google.maps.event.addListener(layer, 'click', function(e) {

    var data = new google.visualization.DataTable();
    data.addColumn('string', 'About:');
    data.addColumn('string', e.row['School'].value);
    data.addRows([
      ['Class', e.row['Class'].value],
      ['Mascot', e.row['Mascot'].value],
      ['Address', e.row['Address'].value],
      ['League', e.row['League'].value],
      ['League Record', e.row['League Record'].value],
      ['Regular Season Record', e.row['Season Record'].value],
      ['Playoffs - Round 1', e.row['Round 1'].value],
      ['Playoffs - Round 2', e.row['Round 2'].value],
      ['Quarterfinals', e.row['Q-Finals'].value],
      ['Semifinals', e.row['Semifinals'].value]
    ]);

    // Draw the table in the element to the right of the map
    var chart = new google.visualization.Table(document.getElementById('chart'));
    var options = {
      'title': e.row['School'].value + ' '
    };
    chart.draw(data, options);
  });
}

// Called when a reader picks a class from the menu below the map
function changeData(team) {
  var whereClause = "";
  if(team) {
    whereClause = " WHERE 'Class' = '" + team + "'";
  }
  // Ask the Fusion Table for the playoff schedule of the selected class
  var queryText = encodeURIComponent("SELECT 'School', 'Round 1', 'Round 2', 'Q-Finals', 'Semifinals' FROM 2102557" + whereClause);
  var query = new google.visualization.Query('http://www.google.com/fusiontables/gvizdata?tq=' + queryText);

  query.send(getData);
}

// Draw the query results as a table below the map
function getData(response) {
  var table = new google.visualization.Table(document.getElementById('visualization'));
  table.draw(response.getDataTable());
}

Written by csessig

November 10, 2011 at 10:20 pm

ProPublica to the rescue

ProPublica, known for producing excellent investigative journalism, also has a wonderful staff of developers who have put out several tools to help fellow journalists like myself. Here’s a quick run-through of two of their tools I’ve used in the last month.

TimelineSetter – I’m a big fan of timelines. They can help newspapers show passage of time, obviously, as well as keep their stories on a particular subject in one central location. This is exactly how we used ProPublica’s TimelineSetter tool when ‘Extreme Makeover: Home Edition’ announced they were going to build a new home for a local family. From the print side, we ran several stories, including about one a day during the week of the build. The photo department also put out four photo galleries on the build and a fundraiser. Finally, our videographer shot several videos. Our audience ate up our coverage, racking up more than 100,000 page views on the photo galleries alone. But unless we attached every story, gallery and video to each new story we did (which would be both cumbersome and unattractive), it would have been hard for readers to get the full scope of our coverage. That’s where the ProPublica tool came into play. Simply, it helped compile all of our coverage of the event on one page.

I’m not going to go into detail on how I put together the timeline. Instead, I will refer you to their fantastic and easy-to-use documentation. Best of all, the timeline is easy to customize and upload to your site. It’s also free, unlike the popular timeline-maker Dipity. Check it out!

TableSetter – This tool is equally impressive and fairly easy to use. The final output is basically an easy-to-navigate-and-sort spreadsheet. And, again, the documentation is comprehensive. Run through it and you’ll have a sortable table up in no time! I’ve put together two already in the last week or so.

The first is a list of farmers markets in Iowa, with links to their home page (if available) and a Google map link, which was formatted using a formula in Microsoft Excel. The formula for the first row looked like this: =CONCATENATE("http://www.google.com/maps?q=", " ", B2, " ", C2, " ", E2, " ", "Iowa")

The first part is the Google Maps link, obviously. B2 represented the cell with the address; C2 = City; E2 = Zip code and finally “Iowa” so Google Maps knows where to look. In between each field I put in a space so Google can read the text and try to map it out using Google Maps (I should note that not every location was able to be mapped out). Then I just copied and pasted this for every row in the table. At this point, I had a standard XLS Excel file, which I saved as a CSV file. TableSetter uses that CSV file and formats it using a YML file to produce the final output. Go and read the docs…It’s easier than having me try to explain it all. Here’s what my CSV looked like; here’s my YML file; and finally the table, which was posted on our site.
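For anyone who would rather build those map links in a script than in Excel, here is a rough Python sketch of the same idea as the formula above. The function name maps_link and the sample address are made up for illustration; the only piece taken from the formula is the http://www.google.com/maps?q= prefix. Unlike the Excel version, it URL-encodes the text (spaces become +) and skips any empty cells.

```python
from urllib.parse import quote_plus

MAPS_PREFIX = "http://www.google.com/maps?q="


def maps_link(address, city, zip_code, state="Iowa"):
    """Build a Google Maps search link from the pieces of an address."""
    parts = [address, city, zip_code, state]
    # Join the non-empty pieces with spaces, like the spaces in the formula.
    query = " ".join(part.strip() for part in parts if part and part.strip())
    return MAPS_PREFIX + quote_plus(query)


# A made-up market address.
link = maps_link("100 Main St", "Exampleville", "50000")
print(link)
```

You could run this over every row of the CSV and write the result into a new column before handing the file to TableSetter.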

In the same vein, I put together this table on what each state department is requesting from the government in the upcoming fiscal year.

I should also note here that the Data Desk at the LA Times has a variation of ProPublica’s TableSetter that offers more features (like embedding photos into the table, for instance). It’s called Table Stacker and works in a very similar fashion to TableSetter. I recommend checking it out after you get a feel for ProPublica’s TableSetter.

Learning the Command Line: Both of these tools require the use of the command line using the Terminal program installed on your computer. If you are deathly afraid of that mysterious black box like I was, I highly recommend watching PeepCode’s video introduction to the command line called “Meet the Command Line.” And when you’re done, go ahead and jump into their second part called “Advanced Command Line.” Yes, they both cost money ($12 apiece), but there is so much information packed into each hour-long screencast that they are both completely worth it. I was almost instantly comfortable with the command line after watching both screencasts.

Written by csessig

October 21, 2011 at 3:23 pm