Chris Essig

Walkthroughs, tips and tricks from a data journalist in eastern Iowa

Archive for the ‘Blox’ Category

Our experiment with an ultra-light CMS

leave a comment »

So I was featured in Source recently. No not that Source. This Source.  It’s a blog for journalists who code. It’s a Knight-Mozilla OpenNews project, and it’s maintained by some of the brightest journo-techies in the industry.

The article was about newspapers using “ultra-light CMSes” instead of their own full-blown CMSes to put publish web projects. I interviewed with Katie Zhu, a very talented digital journalist who currently works for the New York Times. She talked to me about how we use Tarbell to maintain our news apps landing page. We used Tarbell instead of BLOX, which is our day-to-day CMS, to deploy the app outside of the CMS.

The article covers Tarbell — and other ultra-light CMSes — in great detail so I won’t go into detail about it here. Instead, you should read the entire blog post. I’m quoted near the bottom!

Advertisements

Written by csessig

August 12, 2013 at 9:40 am

Turning Excel spreadsheets into searchable databases in five minutes

with 21 comments

Note: This is cross-posted from Lee’s data journalism blog and includes references to the Blox content management system, which is what Lee newspapers use. Reporters at Lee newspapers can read my blog over there by clicking here.

data_tables_screenshotHere’s a scenario I run into all the time: A government agency sends along a spreadsheet of data to go along with a story one of our reporters is working on. And you want to post the spreadsheet(s) online with the story in a reader-friendly way.

One way is to create a sortable database with the information. There are a couple of awesome options on the table put out by other news organizations. One is TableSetter, published by ProPublica, which we’ve used in the past. The other one is TableStacker, which is a spin off of TableSetter. We have also used this before.

The only problem is making these tables typically take a little bit of leg work. And of course, you’re on deadline.

Here’s one quick and dirty option:

1. First open the spreadsheet in Excel or Google Docs and highlight the fields you want to make into the sortable database.

2. Then go to this wonderful website called Mr. Data Converter and paste the spreadsheet information into the top box. Then select the output as “HTML.”

3. We will use a service called DataTables to create the sortable databases. It’s a great and easy jQuery plugin to create the sortable tables.

4. Now create an HTML asset in Blox and paste in this DataTables template below:

Note: I’ve added some CSS styling to make the tables look better.

<html>
<head>
<link rel="stylesheet" type="text/css" href="http://wcfcourier.com/app/special/data_tables/media/css/demo_page.css">
<link rel="stylesheet" type="text/css" href="http://wcfcourier.com/app/special/data_tables/media/css/demo_table.css">

<style>
table {
	font-size: 12px;
	font-family: Arial, Helvetica, sans-serif;
float: left
}
table th, table td {
    text-align: center;
}

th, td {
	padding-top: 10px;
	padding-bottom: 10px;
	font-size: 14px;
}

label {
	width: 100%;
	text-align: left;
}

table th {
	font-weight: bold;
}

table thead th {
    vertical-align: middle;
}

label, input, button, select, textarea {
    line-height: 30px;
}
input, textarea, select, .uneditable-input {
    height: 25px;
    line-height: 25px;
}

select {
    width: 100px;
}

.dataTables_length {
    padding-left: 10px;
}
.dataTables_filter {
	padding-right: 10px;
}

</style>

<script type="text/javascript" language="javascript" src="http://wcfcourier.com/app/special/data_tables/media/js/jquery.js"></script>
<script type="text/javascript" language="javascript" src="http://wcfcourier.com/app/special/data_tables/media/js/jquery.dataTables.min.js"></script>

<script type="text/javascript" charset="utf-8">
$(document).ready(function() {
$('#five_year').dataTable({
"iDisplayLength": 25
});
});
</script>
</head>

<body>

<--- Enter HTML table here --->

</body>

</html>

– This will link the page to the necessary CSS spreadsheets and Javascript files to get the DataTable working. The other option is go to the DataTable’s website and download the files yourself and post them on your own server, then link to those files instead of the ones hosted by WCFCourier.com.

5. Where you see the text “<— Enter HTML table here —>,” paste in your HTML table from Mr. Data Converter.

6. The last thing you will need to do is create an “id” for the table and link that “id” to the DataTable’s plugin. In the example above, the “id” is “five_year.” It is noted in this line of code in the DataTable template:

<script type="text/javascript" charset="utf-8">
$(document).ready(function() {
$('#five_year').dataTable({
"iDisplayLength": 25
});
});
</script>

– The header on your HTML table that you post into the template will look like so:

<table id="five_year" style="width: 620px;">
  <thead>
    <tr>
      <th class="NAME-cell">NAME</th>
      <th class="2008 Enrollment-cell">2008 Enrollment</th>
      <th class="2012 Enrollment-cell">2012 Enrollment</th>
      <th class="Increase/Decrease-cell">Increase/Decrease</th>
      <th class="Percent Increase/Decrease-cell">Percent Increase/Decrease</th>
    </tr>
  </thead>

– Here’s an live example of two sortable tables. The first table has an “id” of “five_year.” The second has an “id” of “one_year.” The full code for the two tables is available here.

– As an alternative, you can use a jQuery plugin called TableSorter (not to be confused with the TableSorter project mentioned above). The process of creating the table is very similar.

7. That’s it! Of course, DataTables provides several customization options that are worth looking into if you want to make your table look fancier.

Written by csessig

January 3, 2013 at 4:34 pm

Tip: Embedding Vmix videos into a Google Fusion table

leave a comment »

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

For any map makers out there, here’s a walk-through on how to take a Vmix video and post it into a Google Fusion table. It’s a perfect follow-up to the tutorials Chris Keller and I held a month ago with Lee journalists on how to build maps with Google Fusion Tables.

1. First we need a Vmix video so post one onto your website like you normally would by uploading it to Vmix and pulling it into the Blox CMS. I’m going to use this video in our example.

2. View the source of the page by right clicking on the page and selecting “View page source.” Then search for a DIV with the class of “vmix-player”. You can do this by searching for “vmix-player”.

3. Under that should be a Javascript file with a source that starts with “http://media.vmixcore.com/&#8221;. Click on that link to open up the source in a new window. You should now see a screen with a huge “Not Found” warning. But don’t be discouraged.

4. Now view the source of that page by doing the same thing you did before (Right click > “View page source”).

5. You should now see a page with three variables: t, u and h. The variable we want is “h”, which is the object tag we will embed into the map.

The page should look something like this.

6. Clean up the variable tag by removing these tags:

var h = ”

(This marks the beginning of the variable.)

h += ”

(There should be several of these. Basically this adds whatever follows  it to the “h” variable, hints the plus sign.)

“;

(These are at the end of every line of code.)

7. Now we need to replace all references to the “t” and “u” variables with their actual value. You’ll notice that “t” and “u” appear in the code for the “h” variable and are surrounded by plus signs. Basically that is just telling Javascript to put whatever “t” equals into that spot. We’ll do that manually:

So replace:

” + t + ”

With:

location.href

And replace:

” + u + ”

With:

http://cdn-akm.vmixcore.com/player/2.0/player.swf?player_id=48df95747124fbbad1aea98cee6e46e4

(Your link will be different than mine)

– It’s important to note that you need to delete the equal signs and the plus signs before and after “t” and “u”. Your final code should not have a double quote next to a single quote. We should have just single quotes around “location.href” and our “http://cdn-adk.vmixcore.com&#8221; link.

– It’s also important to note that when we grab the “t” and “u” variables, we don’t grab the semi-colon at the end of the variable or the quotes around the “u” variable.

For instance, let’s say we have this for our “u” variable:

var u = "http://cdn-akm.vmixcore.com/player/2.0/player.swf?player_id=48df95747124fbbad1aea98cee6e46e4";

So on our movie parameter, we’re turned this line of code:

<param name='movie' value='" + u + "'/>

Into this line of code:

<param name='movie' value='http://cdn-akm.vmixcore.com/player/2.0/player.swf?player_id=48df95747124fbbad1aea98cee6e46e4'/>

– Repeat this for every reference of “t” and “u” in the code.

Our final piece of code should look like this garbled mess.

8. The final step is to post that object tag above into a Google Fusion Table. The easiest way to do this is create a new column called “video” and simply put the above code into that column’s row.

9. Then configure the info window (Visualize > Map > Configure info window) and make sure the “video” column name appears in the HTML of the info window.

If you want to pull in just the “video” column and nothing else, you’re HTML would look like this:

<div class="googft-info-window">{video}</div>

The result looks something like this map. Click on the blue marker labeled “3” to see the video.

I am using the Fusion Table API to make my map instead of using the embed code provided by Google. It seems to work better with videos. If you are interested in see my full code for this map, click here.

That’s it. If you have any questions or something doesn’t make sense, please leave a comment or e-mail at chris.essig@wcfcourier.com.

Written by csessig

July 28, 2012 at 5:49 pm

Turning Blox assets into timelines: Last Part

with 2 comments

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

Also note: You will need to run on the Blox CMS for this to work. That said you could probably learn a thing or two about web-scraping even if you don’t use Blox.

For part one of this tutorial, click here. For part two, click here

 

If you’ve followed along thus far, you’ll see we’re almost done turning a collection of Blox assets — which can include articles, photos and videos — into  timelines using a tool made available by ProPublica called TimelineSetter.

Right now, we are in the process of creating a CSV file, which we will use with the program TimelineSetter. This programs takes the CSV file and creates a nice looking timeline out of it.

The CSV file is created by using a Python script that scrapes a web page we created with all of our content on it. The full code for this script is available here.

In the second part of the series, we went ahead and created a Python scraper that will pull all the information on the page we need.

1. If you’ve gone through the second part, you can go ahead and run python timeline.py now on your command line (More information on how to run the scraper is available in the first part of the blog).

You’ll notice the script will output a CSV that has all the information we need. But some of it is ugly. We need to delete certain tags, for instance, that appear before and after some of the text.

But before we can run Python commands to delete these unnecessary tags, we need to convert all of the text into strings. This is basically just text. Integers, by contrast, are numbers.

So let’s do that first:

# Extract that information in strings
    date2 = str(date)
    link2 = str(link)
    headline2 = str(headline)
    image2 = str(image)

    # These are pulled from outside the loop
    description2 = str(description)

Note: This code and the following chunks should go inside the loop statement we created in the second part of the series. This is because we want these changes to take effect on every item we scrape. If you are wondering, I tried to explain loop statements in my second blog. Your best bet though is to ask Google about loop statements.

2. Now that we’ve converted everything into strings, we can get rid of the ugly tags. You’ll notice that the descriptions of our stories start and end with with a “p” tag. Similarly, all dates start and end with an “em” tag.

Because of this, I added a few lines to the Python script. These lines will replace the ugly tags with nothing…effectively deleting them before they are put into the CSV file:

# Extra formatting needed for dates to get rid of em tags and unnecessary formatting
    date4 = date3.replace('[<em>', "")
    date5 = date4.replace('</em>]', "")
    date6 = date5.replace('- ', "")
    date7 = date6.replace("at ", "")

    # Extra formatting is also need for the description to get rid of p tags and new line returns
    description4 = description3.replace('[<p>', "")
    description5 = description4.replace('</p>]', "")
    description6 = description5.replace('\n', " ")
    description7 = description6.replace('[]', "")

3. For images, the script spits out “\d\d\d” for the width property. We will change this so it’s 300px, making all images on the page 300px wide. Also, we will delete the text “None,” which shows up if an image doesn’t exist for a particular asset.

# We will adjust the width of all images to 300 pixels. Also, Python spits out the word 'None' if it doesn't find an image. Delete that.
    image4 = re.sub(r'width="\d\d\d"', 'width="300"', image3)
    image5 = image4.replace('None', "")

4. If you are at all familiar with the way articles work in Blox, you know that when you update them, red text shows up next to the headline telling you when the story was last updated. The code that formats that text and makes it red is useless to us now. So we will delete this and replace it with the current time and date using the Python datetime module.

To use the datetime module, we have to first import it at the top of our Python script. We then need to call the object within the module that we want. A good introduction to Python modules is available here.

The module we want is called “datetime.now()”. As the name suggests, it returns the current date and time. I then put it in a variable called “now”, which makes it more reader friendly to use later in the script.

So the top of our page should look like this:

import urllib2
from BeautifulSoup import BeautifulSoup
import datetime
import re

now = datetime.datetime.now()

Inside the loop we call the “datetime.now()” object and replace the text “Updated” with the current date and time:

# If the story has been updated recently, an em class tag will appear on the page showing the time but not the date. We will delete the class and         replace it with today's date. We can change the date in the CSV if we need to.
    date8 = date7.replace('[<em class="item-updated badge">Updated:', str(now.strftime("%Y-%m-%d %H:%M")))

5. Now there is just one last bit of cleaning up we will need to do. For those who don’t know, CSV stands for comma-separated values. This means that basically the columns we are creating in our spreadsheet are separated by commas. This is the preferred type of spreadsheet for most programs because it’s simple.

We can run into problems, however, if some of our data includes commas. So these next few lines in our script will replace all of the commas we scraped with dashes. You can change it to whatever character or characters you want:

# We'll replace commas with dashes so we don't screw up the CSV. You can change the dash to whatever character you want
    date3 = date2.replace(",", " -")
    link3 = link2.replace(",", " -")
    headline3 = headline2.replace(",", " -")
    description3 = description2.replace(",", " -")
    image3 = image2.replace(",", " -")

If you want to put the commas back into the timeline, you can do so after the final HTML file is created (after you run python timeline.py). Typically I’ll replace all the commas with “////” and then do a simple find and replace on the final HTML file with a text editor and change every instance of “////” back into a comma.

Now we have clean, concise data! We will now put this into the CSV file using the “write” command. Again, all these commands are put inside the loop statement we created in the second part of the series, so every image, description, date, etc. we scrape will be cleaned up and ready to go:

# Write the information to the file. The HTML code is based on coding recognized by TimelineSetter
    f.write(date8 + "," + description7 + "," + link3 + "," + '<h2 class="timeline-img-hed">' + headline3 + '</h2>' + image5 + "\n")

The headlines, you’ll notice, are put into an HTML tag “h2” tag. This will bold and increase the size of the headlines so they stick out when a reader clicks on a particular event in the timeline.

That’s it for the loop statement. Now we will add a simple line outside of the loop statement that closes the CSV file when all the loops we want run are done:

#You're done! Close file.
f.close()

And there you have it folks. We are now done with our Python script. We can now run it (see part 1 of the series) and have a nice, clean looking CSV file we can turn into a timeline.

If you have any questions, PLEASE don’t hesitate to e-mail or Tweet me. I’d be more than happy to help.

Written by csessig

March 16, 2012 at 8:38 am

Turning Blox assets into timelines: Part 1

with 3 comments

Note: This is cross-posted from Lee’s data journalism blog. Reporters at Lee newspapers can read my blog over there by clicking here.

Also note: You will need to run on the Blox CMS for this to work. For part two of this tutorial, click here. For part three, click here.

 

A couple weeks ago I blogged about the command line and a few reasons journalists should learn it. Among the reasons was a timeline tool made available by ProPublica called TimelineSetter, which is shown to the left. Here are two live examples to give you an idea of what the tool looks like.

To create the timeline, you will first need to make a specially-structured CSV file (CSV is a type of spreadsheet file. Excel can export CSV files). Rows in the CSV file will represent particular events on the timeline. Columns will include information on those events, like date, description, photo, etc.

ProPublica has a complete list of available columns here. To give you an idea of what the final product will look like BEFORE you make the timeline, you can download one CSV file we used to make a timeline by clicking here.

After you have your CSV file, you run a simple command and walla! A beautiful timeline will be created. For more information on the command you have to run, check out the TimelineSetter page. (Hint: The command you run is: timeline-setter -c timeline.csv)

By far the most tedious part of all this is tracking down events and articles you want to include in the timeline and making your CSV file. That is why I wrote a simple Python script that will help turn Blox assets into a CSV file you can use with TimelineSetter.

Here’s a walkthrough of how to use it:

1. The first thing you need to do is go to this GitHub page and follow the instructions in the ReadMe file. After that you will have a page set up will all of the events you want to include in the timeline. Here’s an example of what that page should look like.

2. Download the Python script (Timeline.py). What this script will actually be doing is scraping the web page we just created. And by that I mean it will be going to that page and pulling out all the information we need to create our timeline. So it will be grabbing photos, headlines, dates, etc. It will then create a CSV file with all that information. We can then use that CSV file with TimelineSetter.

3. The script uses a Python library called Beautiful Soup. If you don’t have that downloaded already, click here. It takes only a few seconds to install.

4. In the Timeline.py file on line 16 is a spot for the URL of the page we want to scrape. Make sure you change that to whatever URL you created.

5. Run the command python timeline.py from your command line in the same directory as the Python script you downloaded. This will output a CSV file.

6. You will need to download TimelineSetter, which is also really easy to do. Just run this command: gem install timeline_setter. For more information, click here.

7. Now navigate to the folder with the CSV file and run this command: timeline-setter -c timeline.csv. (Or whatever your CSV file is called).

8. You should end up with a directory of javascript files, a CSS file and a Timeline.html file. This is your timeline. Now put in on your server and embed it using an HTML asset in Blox (or whatever you want to do with it).

9. Do the happy dance (Mandatory step)

This will get you pushing out timelines in no time. On my next blog, I will be going through that Python script (Timeline.py) and what it actually does to create the CSV file.

Written by csessig

March 2, 2012 at 9:59 am