Chris Essig

Walkthroughs, tips and tricks from a data journalist in eastern Iowa

Archive for January 2012

A few reasons to learn the command line


Note: This is my first entry for Lee Enterprises’ data journalism blog. Reporters at Lee newspapers can read the blog by clicking here.

As computer users, we have grown accustomed to what is known as the graphical user interface (GUI). What’s GUI, you ask? Here are a few examples: When you drag and drop a text document into the trash, that’s GUI in action. Or when you create a shortcut on your desktop, that’s GUI in action. Or how about simply navigating from one folder to the next? You guessed it: that’s GUI in action.

GUI, basically, is the process of interacting with images (most notably icons on computers) to get things done on electronic devices. It’s easy and we all do it all the time. But there is another way to accomplish many tasks on a computer: using text-based commands. Welcome to the command line.

So where do you enter these text-based commands and accomplish these tasks? There is a nifty little program called the Terminal on your computer that does the work. If you’ve never opened up your computer’s Terminal, now would be a good time. On my Mac, it’s located in the Applications > Utilities folder.

A scary black box will open up. Trust me, I know: I was scared of it just a few months ago. But I promise there are compelling reasons for journalists to learn the basics of the command line. Here are a few:

 

1. Several programs created by journalists for journalists require the command line.

Two of my favorite tools out there for journalists come from ProPublica: TimelineSetter and TableSetter.

The first makes it easy to create timelines. We’ve made a few at the Courier. The second makes easily searchable tables out of spreadsheets (more specifically, CSV files), which we’ve also used at the Courier. But to create the timelines and tables, you’ll need to run very basic commands in your Terminal.

It’s worth noting the LA Times also has its own version of TableSetter called TableStacker that offers more customizations. We used it recently to break down candidates running in our local elections. Again, these tables are created after running a simple command.

The New York Times has a host of useful tools for journalists. Some, like Fech, require the command line to run. Fech can help journalists extract data from the Federal Election Commission to show who is spending money on whom in the current presidential campaign cycle.

 

2. Other programs/tools that journalists can use:

Let’s say you want to pull a bunch of information from a website to use in a story or visualization, but copying and pasting the text is both tedious and time-consuming.

Why not just scrape the data using a program made in a language like Python or Ruby and put it in a spreadsheet or Word document? After all, computers are great at performing tedious tasks in just a few minutes.

One of my favorite web scraping walkthroughs comes from BuzzData. It shows how to pull water usage rates for every ward in Toronto and can easily be applied to other scenarios (I used it to pull precinct information from the Iowa GOP website). The best way to run this program and scrape the data is to run it through your command line.

Another great walkthrough on data scraping is this one from ProPublica’s Dan Nguyen. Instead of using the Python programming language, like the one above, it uses Ruby. But the goal remains the same: making data manageable for both journalists and readers.
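To give a feel for what a scraper like the ones in those walkthroughs does, here is a minimal sketch in Python. It is not the code from either walkthrough: the `TableRowParser` class, the `table_to_csv` function and the sample HTML (with invented names and numbers) are all made up for illustration, and it assumes the data you want sits in a plain HTML table.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Turn the rows of an HTML table into CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Area</th><th>Value</th></tr>
  <tr><td>Area A</td><td>120</td></tr>
  <tr><td>Area B</td><td>95</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

Save it as a file (say, `scraper.py`, a made-up name) and run it from the Terminal with `python3 scraper.py`. For a real site you would fetch the page first, for example with `urllib.request.urlopen`, and the parsing would need to match that site's markup.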

A neat mapping service that is gaining popularity at news organizations is TileMill. Here are a few examples to help get you motivated.

One of the best places to start with TileMill is this walkthrough from the application team at the Chicago Tribune. But beware: you’ll need to know the command line before you start (trust me, I learned the hard way).

 

3. You’ll impress your friends because the command line kind of looks like the Matrix

And who doesn’t want that?

 

Okay, I’m sort of interested… How do I learn?

I can’t tell people enough how much these two command line tutorials from PeepCode helped me. I should note that each costs $12 but is well worth it, in my opinion.

Also, there is this basic tutorial from HypeXR that will help. And these shortcuts from LifeHacker are also great.

Otherwise, a quick Google or YouTube search will turn up thousands of results. A lot of programmers use the command line and, fortunately, many are willing to help teach others.


Written by csessig

January 31, 2012 at 9:21 am

Better map rollover option


A month ago, I blogged about an attempt I made to use a new feature from Google Fusion Tables that allowed map makers to customize their maps based on mouseovers. The idea was that users could roll over a point/state/census tract/whatever on a map and some sort of data would pop up on that map. You can also customize it so the polygon changes colors, polygon borders grow in size or whatever option you decide to use to let the reader know they have rolled over a particular object. I used it on a recent map of poverty rates in Iowa. The result worked, but mouseover events seemed delayed and clunky. Not so user-friendly, especially if you have a slow Internet connection. So I looked for a new option.

What I found was this great library from The New York Times’ Albert Sun for polygons and rollover effects. Granted, this takes much longer to put together. But the result is much smoother and more user-friendly IMO. I used it on a recent map of heroin rates in the U.S.

Here’s a brief synopsis of how I put it together:

1. First I grabbed data from the Substance Abuse and Mental Health Data Archive on reported heroin cases at substance abuse centers broken down by state. I then grabbed a shapefile from the U.S. Census Bureau of each state in the U.S. This is the file that maps each state based on its boundaries.

2. The data related to drug cases from the SAMHDA was sorted by state and year. You can download individual spreadsheets of data for years 2009 and before. I downloaded spreadsheets for 2005 through 2009. I then pulled out the information I needed from each one and merged them, ending up with a final spreadsheet that had the following data for each state dating back to 2005: the number of people who were admitted to a substance abuse treatment center for heroin, the number of people who were admitted to a substance abuse treatment center for any drug and the percentage of admissions that were for heroin.
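I did the merging by hand in a spreadsheet, but the same step could be sketched in Python along these lines. Everything here is an assumption for illustration: the column names (`state`, `heroin`, `all_drugs`), the `merge_years` function and the sample figures are invented and are not the SAMHDA file layout or real numbers.

```python
import csv
import io


def merge_years(yearly_csv):
    """Merge per-year CSV text into one row per state.

    yearly_csv maps a year to CSV text with the hypothetical columns
    state, heroin, all_drugs.
    """
    merged = {}
    for year in sorted(yearly_csv):
        for row in csv.DictReader(io.StringIO(yearly_csv[year])):
            heroin = int(row["heroin"])
            total = int(row["all_drugs"])
            # Share of all admissions that were for heroin, as a percentage.
            pct = round(100 * heroin / total, 1) if total else 0.0
            state = merged.setdefault(row["state"], {"state": row["state"]})
            state[f"heroin_{year}"] = heroin
            state[f"all_drugs_{year}"] = total
            state[f"pct_heroin_{year}"] = pct
    return [merged[name] for name in sorted(merged)]


# Invented placeholder data, not real admissions figures.
SAMPLE = {
    2008: "state,heroin,all_drugs\nExampleland,50,200\n",
    2009: "state,heroin,all_drugs\nExampleland,30,240\n",
}

if __name__ == "__main__":
    for row in merge_years(SAMPLE):
        print(row)
```

Each state ends up as a single row with one set of columns per year, which is the shape a mapping tool wants when it joins the data to a shapefile by state.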

3. I first uploaded that final spreadsheet to Google Fusion Tables. I then uploaded the shapefile I downloaded into Google Fusion Tables using the AWESOME Shpescape tool. Important: Make sure you select “Create a Simplified Geometry column” under advanced options before you import the shapefile, otherwise it will be too big. Finally I merged the shapefile with the spreadsheet and ended up with this final table.

4. I exported that table into a KML (if you click Visualize > Map, you’ll see the option to download a KML file) and converted that to a GeoJSON file using this Ogre web client. I only did this because the KML wasn’t working on Internet Explorer. Anyways, converting it into a GeoJSON file was a TOTAL PAIN IN THE ASS that required me to mess with the GeoJSON file much more than I wanted. Finally, I ended up with this GeoJSON file.

5. Now for the code. My final script is here. I’m not going to run through it all but will point out a few lines of note:

  • Line 179 brings in the JSON file. Underneath it are polygon highlighting options based on Albert Sun’s code. I have it set up so the opacity on the state polygons is light enough to see the state names underneath it when you first pull up the map. When you roll over a state, the polygon gets shaded in all the way and you can’t read the text underneath it (on the Google Map). But if you click a polygon, you can once again see the text. I also have a red border pop up when you roll over a state (strokeColor and strokeWeight).
  • Line 207 is the highlightCallback function. This creates a variable for each line of data: number of heroin users, number of drug users, percentage, etc. for 2005 to 2009. It’s what you see under “Figures” in the DataTable on the map when you roll over a state. I first made each line a string by adding quotes to each variable.
  • Each variable is called into function selected_district (line 229). This function creates the Google DataTable via “new google.visualization.DataTable().” I’ve used this table in the past on a map for prep high school football teams. Check this past blog post for more information.
  • Line 255 is a function that puts in commas for numbers in the thousands…I didn’t make it. It’s freely available online. Please take it and use it as you see fit.
  • Lines 107 to 154 are the legend.
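The comma function mentioned above isn’t reproduced in this post. As a rough illustration of what such a helper does, here is a Python equivalent; the `add_commas` name is made up, and this is not the function from my script.

```python
def add_commas(number):
    """Format a number with commas as thousands separators."""
    # The "," option in Python's format spec inserts the separators.
    return f"{number:,}"


if __name__ == "__main__":
    print(add_commas(1234567))  # 1,234,567
    print(add_commas(999))      # 999
```

Numbers below 1,000 come back unchanged, so the helper can be applied to every figure in a table without checking its size first.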

Per usual, I used Colorbrewer to come up with the colors…

I’m happy with the resulting map and hope to use this polygon feature in the future. If you have any questions, feel free to leave a comment. I’d be more than happy to pass my limited amount of knowledge along to others.

Written by csessig

January 21, 2012 at 5:07 pm

The caucus night that almost didn’t end


All eyes were on Iowa last night as the Iowa caucuses took place. It was pretty much the longest work day I’ve ever had… By a lot. Anyways, we did a ton of updating on WCFCourier.com all day and night… I wish I had a screenshot of all the photos/stories we put on the front of our website. I did take one at about 2:30 a.m., which is shown above. It’s the site after Romney was (finally) declared the winner. The template is now being used on our Iowa caucus website.

Here’s a quick summary of our online coverage: It started with general Iowa caucus coverage, switched over to our live coverage from the UNI-Dome (which hosted the largest caucus in the state Tuesday night) and then to the statewide race between Santorum-Romney-Paul and, finally, the grudge match between Santorum and Romney. At 1:30 a.m., the Iowa GOP announced Romney had won by eight flippin’ votes. At one point, Santorum was leading by ONE vote with ONE precinct left. What are the odds?

My schedule on caucus night went something like this:

– 9 a.m. – 2:30 p.m. –  Preparing for the day / arranging plans with reporters /posting stories, photos and other content. We had caucus stories going up all day, obviously. I also posted and helped monitor a live chat, which was shared with other Iowa newspapers and was active all day, as well as posted live video from KCRG, which played from about 7 p.m. to 12 a.m. I opted to put both the live video and the live chat on the same page, making it easier for readers to follow action at home.

– 2:30 p.m. – 2:31 p.m. – Lunch

– 2:31 p.m. – 4:30 p.m. – More preparing and posting. We also posted two maps with our coverage. The first showed live caucus results (which started coming in after the 7 p.m. caucus start). This map was provided by the Iowa Republican Party and is pictured to the left. You may have seen it on several news sites… Many had it or a variation of it.

The second map I made myself, and it featured all (I believe, although I haven’t counted) 1,700 caucus locations in the state of Iowa. The addresses were pulled from the Iowa GOP website, which listed every site. Basically, I wrote a Python program that scraped the data from their site and put it into a spreadsheet, pulled it into Google Fusion Tables and mapped the locations based on their addresses. The Python scraper is based on this FANTASTIC walk-through by BuzzData on how to scrape data from websites. Check it out!
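The scraper-to-spreadsheet step can be sketched roughly like this. It is not my actual scraper: the field names (`precinct`, `site`, `street`, `city`), the `locations_to_csv` function and the sample location are all invented, and it assumes the scraped records have already been collected into a list. The point is simply that each row carries one complete address column for the mapping tool to place.

```python
import csv
import io


def locations_to_csv(locations):
    """Write scraped locations as CSV text with a single full-address column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["precinct", "site", "address"])
    for loc in locations:
        # Join street and city into one string; "IA" is appended because
        # every caucus site is in Iowa.
        address = f'{loc["street"]}, {loc["city"]}, IA'
        writer.writerow([loc["precinct"], loc["site"], address])
    return out.getvalue()


# Invented placeholder record, not a real caucus site.
SAMPLE_LOCATIONS = [
    {
        "precinct": "Example Precinct 1",
        "site": "Example Community Center",
        "street": "100 Main St",
        "city": "Exampleville",
    },
]

if __name__ == "__main__":
    print(locations_to_csv(SAMPLE_LOCATIONS))
```

Because the address contains commas, the `csv` module wraps it in quotes automatically, so the file still opens cleanly as a three-column spreadsheet.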

(NOTE: Here’s the code for the Python scraper. Here’s my Google Fusion Table.)

At about 3 p.m., we rolled over the site to feature one huge photo and story (see the screenshot at the top of this post). It was caucus night, after all,  so we had to go big.

– 4:30 p.m. – 5 p.m. – Mad dash to the UNI-Dome, where Black Hawk County was caucusing. The doors opened at 5:30 p.m. and I wanted to get there and set up before either Bachmann or Gingrich stopped by to speak.

– 5 p.m. – 10 p.m. – Posted up at the UNI-Dome. At about 5:30 p.m., we switched our main story to our Dome coverage…This was basically when our first photo and update came in. Throughout the evening, we posted small updates from the Dome and new photos. We also had three videos from the Dome.

At about 6:30 p.m., Bachmann and Gingrich spoke. I took a few photos for our live chat (which, BTW, had more than 5,000 viewers at one point!) and posted a fresh candidate Dome story when it came in.

At about 8:30 p.m., the Dome action was winding down and our attention turned to the statewide race between Santorum-Romney-Paul and then Santorum-Romney. We relied on the AP and our Lee Des Moines bureau for our main story on the site, adding photos from the Dome and the wire to it.

– 10 p.m. – 10:30 p.m. – Mad dash back to the newsroom. I was actually afraid they might announce the winner while I was on the road back to the newsroom but I was off by about three hours.

– 10:30 p.m. – 12:30 a.m. – We waited. And waited. And made some jokes. And waited some more. The precinct results continued to flood in and amazingly the number of votes between Romney and Santorum dwindled. Santorum was actually in the lead for much of the night. By 16 votes. Then one vote. Then four votes. Just ridiculous.

– 12:30 a.m. – 1:30 a.m. – At about this time, they announced they were down to three precincts then one precinct…At that point, I knew I would be in the newsroom until the final precinct was counted. The lone holdout was in Clinton County (eastern Iowa along the river) and apparently there was some confusion about whether or not they had submitted their results to the state yet.

– 1:30 a.m. – 2 a.m. – The Iowa GOP finally announced Romney won by eight (!!!) votes. Hurray! I slapped a quick update on top of the story we had online and added a new photo. At this point, I just wanted to make sure those who got up in the morning would see the final results.

– 2 a.m. – 3 a.m. – The longest day ever came to a close. I took down the big photo, big story template we had used all night (see screenshot at the top) and returned the site to our standard carousel template with five rotating stories on the front (see WCFCourier.com). I also added a teaser to our Iowa caucus site on the top so people could see all of our caucus coverage from the night/morning. Because there was a ton of it.

– 3 a.m. – Sleep

Here’s what the Courier’s front page looked like on Wednesday. We’re a morning paper so we were able to get the final results in:

Written by csessig

January 4, 2012 at 1:42 pm