WengerOut was my first proper foray into using the Twitter API to gather real-time data. At the start of the 2013/2014 Premier League season, Arsene Wenger came under a lot of pressure from Arsenal fans after a heavy 3-1 home defeat to Aston Villa. Many disgruntled fans took to Twitter to express their lack of support using the #WengerOut hashtag.

I chose such an odd hashtag to track because I knew there would be a constant flow of new tweets coming in at regular intervals. The hashtag would also be less active than, say, anything related to One Direction. That suited me: since this was my first time using the API, I didn't want to handle too much data.

To collect the data, a cron job runs every 15 minutes and unleashes some PHP onto the Twitter API. The API then returns all the tweets posted since the cron job last ran, and I store each of them in a SQL database. From there the data is picked up by the WengerOut webpage and displayed in some nice graphical charts. This project was purely about using the Twitter API, so I decided against developing my own JavaScript charts and display the data using ChartJS instead.
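
One common way to fetch "only the tweets since the last run" is to remember the highest tweet ID already stored and pass it to Twitter's search endpoint as its `since_id` parameter. A crontab schedule of `*/15 * * * *` gives the 15-minute interval. The sketch below shows that loop in Python with SQLite rather than the PHP and SQL database the project actually used. `fetch_tweets_since` is a hypothetical stub standing in for the real API call, and the table layout is an assumption.

```python
import sqlite3


def init_db(conn):
    # One row per tweet; the tweet ID is the primary key, so a re-run cannot duplicate rows.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               created_at TEXT NOT NULL,
               text TEXT NOT NULL,
               country TEXT,
               city TEXT
           )"""
    )


def last_seen_id(conn):
    # Highest tweet ID already stored, or None on the very first run.
    return conn.execute("SELECT MAX(id) FROM tweets").fetchone()[0]


def fetch_tweets_since(since_id):
    # Hypothetical stub for the Twitter search call (q=#WengerOut, since_id=...).
    # Returns made-up sample tweets newer than since_id.
    sample = [
        {"id": 101, "created_at": "2013-08-17 15:02:00", "text": "#WengerOut",
         "country": "United Kingdom", "city": "London"},
        {"id": 102, "created_at": "2013-08-18 09:15:00", "text": "#WengerOut now",
         "country": None, "city": None},
    ]
    return [t for t in sample if since_id is None or t["id"] > since_id]


def run_once(conn):
    # What each cron invocation does: fetch only new tweets, then store them.
    new_tweets = fetch_tweets_since(last_seen_id(conn))
    with conn:  # commits the inserts as one transaction
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, text, country, city) "
            "VALUES (:id, :created_at, :text, :country, :city)",
            new_tweets,
        )
    return len(new_tweets)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    print(run_once(db))  # first run stores both sample tweets
    print(run_once(db))  # second run finds nothing newer
```

Keying on the tweet ID means a cron run that overlaps the previous one cannot insert the same tweet twice.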

I display three different types of data on the WengerOut homepage:

  • Total Number of #WengerOut tweets per day
  • Top Countries using the #WengerOut hashtag
  • Top Cities using the #WengerOut hashtag

Total Number of #WengerOut Tweets Per Day

I felt the best way to display this data was a simple line chart, which lets viewers quickly see how the number of tweets changes from day to day.
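
Behind a chart like this there is usually a single grouped count. The Python/SQLite sketch below shows what such a query might look like. It assumes timestamps are stored as ISO-8601 text such as `2013-08-17 15:02:00`; Twitter's own `created_at` format would need converting before SQLite's `date()` could parse it. The output is shaped roughly like the `labels`/`datasets` object a ChartJS line chart takes. The function names and sample rows are made up for illustration.

```python
import sqlite3


def make_sample_db():
    # Made-up rows purely for illustration, timestamps stored as ISO-8601 text.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO tweets (created_at) VALUES (?)",
        [("2013-08-17 15:02:00",), ("2013-08-17 18:40:00",), ("2013-08-18 09:15:00",)],
    )
    return conn


def tweets_per_day(conn):
    # One (day, count) pair per point on the line chart, oldest day first.
    return conn.execute(
        "SELECT date(created_at) AS day, COUNT(*) FROM tweets GROUP BY day ORDER BY day"
    ).fetchall()


def to_line_chart_data(rows):
    # Split into the parallel label/value lists a line chart expects.
    return {
        "labels": [day for day, _ in rows],
        "datasets": [{"data": [count for _, count in rows]}],
    }


if __name__ == "__main__":
    print(to_line_chart_data(tweets_per_day(make_sample_db())))
```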

WengerOut Chart

Top Countries using the #WengerOut hashtag

Displaying this data was a little trickier than calculating the number of tweets per day, because Twitter does not require each tweet to have a location attached to it (and for good reason). This means I can only use the WengerOut tweets that have a geolocation attached. I could have used another line chart, but with a possible 196 countries tweeting, a line chart could get very confusing. I therefore opted for a pie chart (everyone loves pie charts). A pie chart has the same problem of a possible 196 segments, so I have limited it to the top 5 countries. I could have applied the same limit to a line chart, but a pie chart makes comparing countries a lot easier in this scenario. The image below shows one limitation of ChartJS: the smallest segment is not big enough to contain its text overlay.
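
The "geotagged tweets only, top 5 countries" logic can be expressed compactly. This Python sketch assumes each tweet is a decoded JSON dict whose `place` field is either `None` or an object with a `country` key, which is how Twitter's v1.1 tweet objects represent a location. `top_countries` is a hypothetical helper, not the site's actual code.

```python
from collections import Counter


def top_countries(tweets, n=5):
    # Tweets without a place (None or missing) cannot be attributed to a country, so skip them.
    counts = Counter(
        t["place"]["country"]
        for t in tweets
        if t.get("place") and t["place"].get("country")
    )
    # most_common(n) caps the pie chart at n segments.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        {"place": {"country": "United Kingdom"}},
        {"place": None},
        {"place": {"country": "United Kingdom"}},
        {"place": {"country": "Nigeria"}},
        {},
    ]
    print(top_countries(sample))
```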

WengerOut Countries

Top Cities using the #WengerOut hashtag

Once again, I could only use the tweets with a geolocation attached. Unlike countries, Twitter does not seem to verify the data stored in the city/town field of the JSON response, which is why you sometimes end up with “Nigeria” in the city data. I could have created a separate database table listing every country and compared each result against it, but that would have added overhead to the queries and slowed the loading time of the webpage. I also wanted to point out that this is an issue with the current version of the Twitter API, not with WengerOut itself.
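
For comparison, the country check described above does not necessarily need a database table. Loaded once into an in-memory set, each lookup is roughly constant time. This Python sketch assumes the raw city values are already extracted as strings. `KNOWN_COUNTRIES` is an illustrative subset (a real list would contain every country name), and `clean_city` and `top_cities` are hypothetical helpers.

```python
from collections import Counter

# Illustrative subset only; a real list would hold every country name, lower-cased.
KNOWN_COUNTRIES = frozenset({"nigeria", "united kingdom", "france", "kenya"})


def clean_city(name):
    # Drop empty values and values that are really a country name; trim whitespace.
    if not name:
        return None
    value = name.strip()
    if not value or value.casefold() in KNOWN_COUNTRIES:
        return None
    return value


def top_cities(city_values, n=5):
    # Count only the values that survive cleaning, then keep the n most common.
    counts = Counter(c for c in map(clean_city, city_values) if c)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_cities(["London", "Nigeria", "London", " Lagos ", None, ""]))
```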

WengerOut Cities

Map Data

As well as displaying the data in the ways above, I also played around with the OpenStreetMap API. I wanted to see if it was possible to plot the tweets on a map using the data I obtained from Twitter. I found the OpenStreetMap API a little confusing to use at first and slightly clunky, so my next foray into mapping will use Google Maps, as I've heard lots of good things about it. One issue I found is that the API does not like it when you use custom icons for plotting data. The image below shows two different icons being used: the red icon is nowhere to be found on my server. It is being pulled in from the OpenStreetMap API, and I am not entirely sure why…

Another issue I found is that you can’t just plot a point on the map: you have to go through the OpenStreetMap API to convert a location into something the map can understand. This means that for each tweet I want to plot, I have to send its geolocation to an external server and wait for the response before I can plot it. I currently plot only the top 100 tweets in the database, and even that is incredibly slow. I wouldn’t like to think how long plotting every tweet I’ve collected would take.
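
If the map only needs a change of projection, this step can be computed locally. That would mean converting GPS-style latitude/longitude (EPSG:4326) into the spherical Mercator coordinates (EPSG:3857) used by OpenStreetMap tiles, which is a closed-form formula requiring no network round trip per tweet. The Python sketch below assumes that is the conversion involved. It does not cover turning a place name into coordinates, which does need a geocoding service. `to_web_mercator` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius in Web Mercator
MAX_LAT = 85.05112878  # Web Mercator is clipped at about ±85.05° so the world map is square


def to_web_mercator(lat, lon):
    # Project latitude/longitude in degrees to EPSG:3857 x/y in metres.
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clamp: the formula diverges at the poles
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    # Approximate coordinates for north London, projected without any network request.
    print(to_web_mercator(51.555, -0.108))
```

Because this is plain arithmetic, projecting a few thousand points this way should take a fraction of a second. Any remaining slowness would then come from the map library rather than from the conversion.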

Open Street Maps WengerOut

Link to WengerOut: http://wengerout.joshjordan.co.uk