Chris Essig

Walkthroughs, tips and tricks from a data journalist in eastern Iowa

Archive for the ‘Charts’ Category

Going mobile with our news apps

leave a comment »

ImageIt’s been about two months since we launched our crime map for the city of Waterloo and so far, the response has been overwhelmingly positive.

I’ve been proud of our fans on Facebook, in particular, who are using the map to make informed opinions on crime in the area. One of our fans actually counted every dot on the map to counter another fan’s claim that crime only happens on the east side of town. He found crime was almost equally spread across the town.

In hopes that this data will be digested by an even larger audience, I launched the mobile/tablet equivalent of the map last week. The new app is fully responsive thanks to Twitter’s Bootstrap framework. If you haven’t check out Bootstrap, you really should. It makes designing websites for all platforms — desktops, tablets, mobile phones — about as easy as its going to get.

I also went in and modified  a lot of the CSS, changing things like the width of objects on the page from pixels to percentages. This ensures that the app looks great no matter how wide the screen you’re viewing it from is.

Serving up the tablet version of the map wasn’t particularly difficult. It’s basically the same map on our site without the header and footer, which seems to load slower on iPads. It’s also designed to be flexible regardless of how wide your tablet screen is.

The mobile version was more difficult. At this time, the mobile app does not have a map of the crimes. Instead, it’s just the color charts comparing different crimes and a table of all the crime data. I stripped out the map mostly because it’s difficult to click individual points on the map on small mobile screens. But screens continue to get bigger and nicer so hopefully this won’t be a problem in the future.

One pro tip: I set the padding-right CSS property on the table of crimes to 10 percent. This gives a nice cushion to the right of the table, making it easier for people to scroll past it on smartphones with touch screens.

For this project, I went about making the mobile version the completely wrong way: I opted to create just the desktop version at first and then go back and make one for tablets and phones.

Ideally I would have done both at the same time, which is easy to do with Bootstrap. And that’s exactly what I did for another project we launched this weekend on campaign finance reports. The project exams the finance reports for candidates running in four local races. The reports are made available by the Iowa Ethics and Campaign Disclosure Board, which fortunately allows users to export the data into CSV files.

We broke down the data and created bar charts using Bootstrap to compare the figures. The framework has several bar options to easily create the bar charts, which are worth checking out. The best part is they are designed to look good on all platforms.

We also have databases of all the contributions that are searchable. This allows readers to see exactly where the money is coming from.

For this project, I created the mobile and tablet equivalents of app as I was creating the desktop version. When viewing it on a desktop computer, the app is embedded on our website so it has the election header, footer and colors. The same app was created with responsive design in mind. So if you open it on a mobile phone, the app will look just as good on a tablet or smartphone as it does on a desktop computer.

Many studies show that more and more people are getting their news on smartphones. It is imperative that we keep those readers in mind when designed full-scale apps for our websites.

Hopefully we can continue this trend at the Courier and make sure our projects are reaching our full audience.

Advertisements

Written by csessig

October 21, 2012 at 1:58 pm

How We Did It: Waterloo crime map

with 3 comments

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

Last week we launched a new feature on the Courier’s website: A crime map for the city of Waterloo that will be updated daily Monday through Friday.

The map uses data provided by the Waterloo police department. It’s presented in a way to allow readers to make their own stories out of the data.

(Note: The full code for this project is available here.)

Here’s a quick run-through of what we did to get the map up and running:

1. Turning a PDF into manageable data

The hardest part of this project was the first step: Turning a PDF into something usable. Every morning, the Waterloo police department updates their calls for service PDF with the latest service calls. It’s a rolling PDF that keeps track of about a week of calls.

The first step I took was turning the PDF into a HTML document using the command line tool PDFtoHTMLFor Mac users, you can download it by going to the command line and typing in “brew install pdftohtml.” Then run “pdftohtml -c (ENTER NAME OF PDF HERE)” to turn the PDF into an HTML document.

The PDF we are converting is basically a spreadsheet. Each cell of the spreadsheet is turned into a DIV with PDFtoHTML. Each page of the PDF is turned into its own HTML document. We will then scrape these HTML documents using the programming language Python, which I have blogged about before. The Python library that will allow us to scrape the information is Beautiful Soup.

The “-c” command adds a bunch of inline CSS properties to these DIVs based on where they are on the page. These inline properties are important because they help us get the information off the spreadsheet we want.

All dates and times, for instance, are located in the second column. As a result, all the dates and times have the exact same inline left CSS property of “107” because they are all the same distance from the left side of the page.

The same goes for the dispositions. They are in the fifth column and are farther from the left side of the page so they have an inline left CSS property of “677.”

We use these properties to find the columns of information we want. The first thing we want is the dates. With our Python scraper, we’ll grab all the data in the second column, which is all the DIVs that have an inline left CSS property of “107.”

We then have a second argument that uses regular expressions to make sure the data is in the correct format i.e. numbers and not letters. We do this to make sure we are pulling dates and not text accidently.

The second argument is basically an insurance policy. Everything we pull with the CSS property of “107” should be a date. But we want to be 100% so we’ll make sure it’s integers and not a string with regular expressions.

The third column is the reported crimes. But in our converted HTML document, crimes are actually located in the DIV previous to the date + time DIV. So once we have grabbed a date + time DIV with our Python scraper, we will check the previous DIV to see if it matches one of the seven crimes we are going to map. For this project, we decided not to map minor reports like business checks and traffic stops. Instead we are mapping the seven most serious reports.

If it is one of our seven crimes, we will run one final check to make sure it’s not a cancelled call, an unfounded call, etc. We do this by checking the disposition DIVs (column five in the spreadsheet), which are located before the crime DIVs. Also remember that all these have an inline left CSS property of “677”.

So we check these DIVs with our dispositions to make sure they don’t contain words like “NOT NEEDED” or “NO REPORT” or “CALL CANCELLED.”

Once we know it’s a crime that fits into one of our seven categories and it wasn’t a cancelled call, we add the crime, the date, the time, the disposition and the location to a CSV spreadsheet.

The full Python scraper is available here.

2. Using Google to get latitude, longitude and JSON

The mapping service I used was Leaflet, as opposed to Google Maps. But we will need to geocode our addresses to get latitude and longitude information for each point to use with Leaflet. We also need to convert our spreadsheet into a Javascript object file, also known as a JSON file.

Fortunately that is an easy and quick process thanks to two gadgets available to us using Google Docs.

The first thing we need to do is upload our CSV to Google Docs. Then we can use this gadget to get latitude and longitude points for each address. Then we can use this gadget to get the JSON file we will use with the map.

3. Powering the map with Leaflet, jQRangeSlider, DataTables and Bootstrap

As I mentioned, Leaflet powers the map. It uses the latitude and longitude points from the JSON file to map our crimes.

For this map, I created my own icons. I used a free image editor known as Seashore, which is a fantastic program for those who are too cheap to shell out the dough for Adobe’s Photoshop.

The date range slider below the map is a very awesome tool called jQRangeSlider. Basically every time the date range is moved, a Javascript function is called that will go through the JSON file and see if the crimes are between those two dates.

This Javascript function also checks to see if the crime has been selected by the user. Notice on the map the check boxes next to each crime logo under “Types of Crimes.”

If the crime is both between the dates on the slider and checked by the users, it is mapped.

While this is going on, an HTML table of this information is being created below the map. We use another awesome tool called DataTables to make that table of crimes interactive. With it, readers can display up to a 100 records on the page or search through the records.

Finally, we create a pretty basic bar chart using the Progress Bars made available by Bootstrap, an awesome interface released by the people who brought us Twitter.

Creating these bars are easy: We just need to create DIVs and give them a certain class so Bootstrap knows how to style them. We create a bar for each crime that is automatically updated when we tweak the map

For more information on progress bars, check out the documentation from Bootstrap. I also want to thank the app team at the Chicago Tribune for providing the inspiration behind the bar chart with their 2012 primary election app.

The full Javascript file is available here.

4. Daily upkeep

This map is not updated automatically so every day, Monday through Friday, I will be adding new crimes to our map.

Fortunately, this only takes about 5-10 minutes of work. Basically I scrape the last few pages of the police’s crime log PDF, pull out the crimes that are new, pull them into Google Docs, get the latitude and longitude information, output the JSON file and put that new file into our FTP server.

Trust me, it doesn’t take nearly as long as it sounds to do.

5. What’s next?

Besides minor tweaks and possible design improvements, I have two main goals for this project in the future:

A. Create a crime map for Cedar Falls – Cedar Falls is Waterloo’s sister city and like the Waterloo police department, the Cedar Falls police department keeps a daily log of calls for service. They also post PDFs, so I’m hoping the process of pulling out the data won’t be drastically different that what I did for the Waterloo map.

B. Create a mobile version for both crime maps – Maps don’t work tremendously well on the mobile phone. So I’d like to develop some sort of alternative for mobile users. Fortunately, we have all the data. We just need to figure out how to display it best for smartphones.

Have any questions? Feel free to e-mail me at chris.essig@wcfcourier.com.

Graph: Hispanic population growing in Iowa

leave a comment »

This graph I put together with a weekend story breaks down Iowa’s Hispanic population by county. It was based almost entirely off a JavaScript chart walkthrough put together by Michele Minkoff, Interactive Producer for the Associated Press. So check it out now because it’s FANTASTICO!

I did make a couple of minor tweaks, which may be helpful for others so I will outline them here.

My initial, final product looked like this. Go ahead and roll over the bars like the blue one, for instance, which represents the amount of Hispanic people in Iowa who said they were also white. Notice how the bars in 2000 and 2010 are both the same length, even though the value of the 2000 bar (38,000) is almost half of the 2010 bar (80,000).

The data was retrieved from Census.IRE.org. I just selected “Iowa,” then “County,” then “Hispanic or Latino origin by race.” You can then download a CSV of the data and chop it up as you see fit. Also click an option after “Browse data for…” on the Census.IRE.org page for a great breakdown of what each of the headers in the CSV files means. Here’s Iowa’s page, for instance.

The Javascript file that makes these graphs calls a JSON file containing information retrieved from Census.IRE.org. My JSON file initially looked like this:

  1. headers = [“White”,”Black / African American”,”American Indian / Alaska Native”,”Asian”,”Native Hawaiian / Other Pacific Islander”,”Some other race”,”Two or more races”]
  2.  allCountyData = [
  3. [“Iowa”,38296,1109,1034,290,121,35317,6306,80438,2242,2503,497,206,54000,11658],
  4. (Enter data for every county in Iowa here)

The headers represent the different races of Hispanic people. The line for “Iowa” represents the amount of people per particular race in first 2000 and then 2010. 38296 = Number of Hispanic people in 2000 who said they were White, 1109 said they were black, etc. 80438 = Number of Hispanic people in 2010 who said they were White, 2242 said they were black, etc.

Here’s my tweak: Instead of having both 2000 and 2010 data on the same line, I broke this out onto two separate lines. So my new JSON file includes this:

  1. allCountyData = [
  2.   [“Iowa”,38296,1109,1034,290,121,35317,6306],
  3. (Enter data for every county in Iowa here)

And this:

  1. allCountyData2 = [
  2.   [“Iowa”,80438,2242,2503,497,206,54000,11658],
  3. (Enter data for every county in Iowa here)

From there, I needed to edit the Javascript file to call the data contained in allCountyData2. Here’s my Javascript file. Go ahead and click on it. I know you want to. And scroll down to “function changeGraph(stateText)” and notice how it is calling both selectedData, which includes the value for everything in allCountyData (in the JSON file), AND selectedData2, which contains the value for everything in allCountyData2 (in the JSON file). The “drawVisualization();” inside the function also contains two arguments now: “selectedData, selectedData2.”

As a result, the “function drawVisualization()” at the very top of the Javascript file must also contain two arguments: “newData, newData2.” Initially, my drawVisualization function (here’s my first Javascript file) contained just one argument because it was only calling newData, which contains data from allCountyData. At the very end of the function, it broke newData (allCountyData) into two because remember, allCountyData contained data for both 2000 and 2010:

  1. //the first half of our data is for 2000, so we fill the row in with appropriate numbers
  2. //we start at 1 to leave out the years, remember data structured this way starts at 0
  3. for (var i = 1; i <=(newData.length-1)/2; ++i) {
  4. data.setValue(0, i, newData[i]);
  5. data.setFormattedValue(0,i, numberFormat(newData[i]));
  6. }
  7. //now, the second half of the data is for 2010, so we’ll fill that in
  8. for (var i = 1; i<=(newData.length-1)/2; ++i) {
  9. data.setValue(1, i, newData[i]);
  10. data.setFormattedValue(1,i, numberFormat(newData[i+(newData.length-1)/2])); 
  11. }

I needed to change that last “for” function because we no longer wanted newData to be broken in half. Instead, we want to call that second argument (newData2) in the drawVisualization function because it calls the data from allCountyData2:

  1. //the first half of our data is for 2000, so we fill the row in with appropriate numbers
  2. //we start at 1 to leave out the years, remember data structured this way starts at 0
  3. for (var i = 1; i <=(newData.length-1); ++i) {
  4. data.setValue(0, i, newData[i]);
  5. data.setFormattedValue(0, i, numberFormat(newData[i]));
  6. }
  7. //now, the second half of the data is for 2010, so we’ll fill that in
  8. for (var i = 1; i<=(newData2.length-1); ++i) {
  9. data.setValue(1, i, newData2[i]);
  10. data.setFormattedValue(1, i, numberFormat(newData2[i])); 
  11. }

If I’m not mistaken, that’s all I did. If you are interested, my HTML file is here and my CSS file is here. Again, Michele Minkoff deserves all the credit in the world for putting together her great walkthrough. So get over there and check it out!

Written by csessig

November 28, 2011 at 11:29 pm