Monday, 28 September 2015

When BBC stands for Bad, Bad Charting!

The British Broadcasting Corporation is an iconic news organisation, known the world over for cutting-edge reporting and for breaking news stories in an innovative way. The BBC has slowly become the website through which I consume the majority of my news, whether it is politics, business or sport (other news channels are available).

With the rise of data being collected about sports games to help spectators understand the game further, I was excited to see how the BBC would play this data back to their audience. The results have not been of the quality I would normally expect from an organisation the British nation largely takes pride in. All of the examples below come from this year’s Premier League season – enjoy and be ready to cringe.

Taking a bite out of the donut
Let’s talk about Donut Pie charts. This is actually a tough subject as I know it evokes a lot of emotion from a lot of people. Some people love the aesthetic they create, some people love their infographic feel and a whole lot of those who understand data visualisation really, really, really don’t like them.
I found Andy Kirk’s tweet below summarised a beautiful example.



So what’s wrong with this chart? The chart itself actually doesn’t stand out to me. Andy is right: the dots and labels are more dominant than the visual that should be helping the viewer see the story in the data. The story of the chart challenges the commonly held belief that more possession equals more wins. The data shows that the team with the most possession has won 41% of its games, so a majority of results (59%) have not been a win. So why doesn’t the chart use colour to pick this story out? Why use two shades of blue to show win and loss (opposite results), with grey used to show a draw? Would using one colour for non-wins and a different colour for wins be more effective? Or a colour scale showing the league table points returned by each result?

On top of this, the distracting dots, which are not centred in the section they relate to, really do not help. The labels are huge and, once your eyes spot the dots, they can’t drag themselves back to the data visualisation. I would simply use the ‘donut’s hole’ to host the 59% message rather than Total Games (Total Games of what, by the way?).

I’m not going to say you have to have a bar chart for everything but there are a lot of ways to improve this chart… a lot.

Love any other chart type (except pies)
Like Andy Kirk, Chris Love knows a lot about data visualisation.



Pie charts are all about showing a part of an overall whole. The English football team have struggled for the last few years, but in its entire history the team have definitely scored more than 279 goals and had more than 6 different goal scorers (who did ‘Own Goals’ play for?).

Chris actually found a couple of examples of this chart ‘style’ being used (same colours) and he is absolutely correct that this chart is ‘Useless’. The differentiation in scale is completely lost in this chart. I really can’t spot who had the most goals. Wouldn’t Yellow make more sense for own goals if you had to use this colour scheme? (Data visualisation and the brand police still have a lot of conversations to have to reconcile their differences.) Again, a bar chart ordered by goal scorer would be really useful for seeing the message that is screaming out from the data. I’d prefer a scatterplot showing how many games it took to score those goals instead, but I love scatterplots!

Making the difference seem overly large and therefore… newsworthy
The debate about charts with non-zero axes has raged long and hard. This article by Andy Cotgreave (http://gravyanecdote.com/visual-analytics/breaking-the-real-chart-rules-to-follow/) swayed my opinion more than most. The next example from the BBC shows a case where a zero axis definitely is needed.



I could run 0 sprints and I could ‘run’ at 0 km/h (and on a basketball court a coach has accused me of this), so zero is a meaningful baseline and this chart really needs to show the relative difference in speed between Billy Jones and Theo Walcott. Rob completely (and sarcastically) nails the chart by highlighting how slow Theo Walcott seems in comparison when in fact he runs a mere 1.1% slower than Billy. Add the zero back on the axis and this chart would show… very little, and that, I fear, is why the zero has been removed from the axis.
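To put a rough number on the distortion, here is a minimal Python sketch. It assumes a bar-style chart, and the two speeds are made up purely so that the gap is the 1.1% quoted above; they are not the real figures from the BBC graphic. The function name and the axis start of 34 km/h are also just illustrative choices.

```python
def drawn_length_ratio(slower, faster, axis_start=0.0):
    """Ratio of the slower bar's drawn length to the faster bar's drawn length."""
    if axis_start >= slower:
        raise ValueError("axis_start must be below both values")
    return (slower - axis_start) / (faster - axis_start)


# Hypothetical speeds in km/h, chosen only so that the gap is 1.1%.
FASTER = 35.0
SLOWER = FASTER * (1 - 0.011)

if __name__ == "__main__":
    # Compare an honest zero-based axis with a truncated one.
    for start in (0.0, 34.0):
        ratio = drawn_length_ratio(SLOWER, FASTER, start)
        print(f"axis from {start} km/h: slower bar drawn at {ratio:.1%} of the faster bar")
```

With these invented numbers, a zero-based axis draws the slower bar at about 99% of the faster one, while an axis starting at 34 km/h draws it at roughly 62%, which is how a 1.1% gap ends up looking newsworthy.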

The average without context (or perspective) can be confusing
The goalkeeper spent the majority of their time during the 2nd half in their penalty area. Boom! Good insight that.

When Match of the Day first flashed this chart up, I let out an audible “ooh” as my eyes tore themselves off Tableau to stare at the screen in wonder. John (another good authority to listen to on data viz) laid out a series of arguments that were similar to my thoughts about this chart. The main point that struck a chord with me was that the chart misses the context: where were the opposition? What was their formation? This is exactly why I created this (http://datajedininja.blogspot.co.uk/2015/09/tableau-and-nba-moving-past-static-shot.html): to understand positioning on the pitch / court, you need to understand whether your team are responding correctly to how the other team are set up. When attacking, are you exploiting the gaps the defence is leaving?

In retrospect, what I find hardest to understand about this chart is why the pitch tapers away from the camera when the point being made by the ‘analysts’ was that the team wasn’t playing closer to the opposition goal (or using the width of the pitch extensively enough). The taper of the pitch will always make the players seem closer together than if the image of the pitch was reversed, with the defensive team’s goal closest to the camera. The arrows on the pitch tell me what the direction of play was; I don’t need to focus on the Chelsea goalkeeper, as this doesn’t help me understand the distribution of the players.



John makes many other valid points about average position and how that is a potentially misleading metric but I will let you read his tweets as he articulates those points a lot better than I could in the same number of characters.

Overall
I am excited that data visualisation is moving closer to the heart of how journalistic points are communicated. We live in an age where data is becoming increasingly available for everything we do and therefore we can quantify and prove elements of the world (even if they are as flippant as sport) that we previously just made guesses about. I have spent the majority of my career trying to take dry data and turn it into something that is more easily consumed and more attractive to the casual observer, while always trying to avoid creating a chart ‘for the sake of it’.

If the public continue to see a greater variety of charting, their ability to consume data-led messages and make data-led decisions in everyday life will increase, but it won’t if all they ask for is a donut pie chart like the one seen on the BBC. Banks and other service providers need to get across really complex messages with data, but there is no way they can do that if the data viz guys and girls are restricted to bar charts because that is all the public knows how to read instantly. That will only change when the public see other types of visualisation that capture their attention on subjects they are passionate about.
So BBC, please keep making the British nation and world proud by being the leader in everything you do and just spend a bit more time challenging the clarity of your visualisations before unleashing them on us.


Thursday, 17 September 2015

Tableau and the NBA - moving past a static shot chart

For those who know me, I have always wanted to achieve one thing with data - understand the NBA better. If you were (un)lucky enough to hear me talk in Seattle at the Tableau Public session in 2014 you know that the great basketball analysis I had come up with was this:


The Interactive Shot Chart: click on the image to go to Tableau Public


Don’t get me wrong, I was massively proud of creating this. It was the first time that I had used path, background images and a number of other techniques in Tableau. The visualisation allowed me to understand not just whether a shot was made, but added context to the shot by showing additional information about the pass that led to the shot. However, this was not sustainable. It took me four hours to map one quarter of one game. I’m a dedicated fan but there are limits (even when the Spurs are beating the Lakers).


I took a step forward when I stumbled upon the ‘Shot Logs’ on the NBA.com website. Have a look at all of the richness that is available from this part of the site:


Stats.nba.com - All thanks to SportVU data

I have already written about how I got the statistics out of this part of the NBA site in the following blog: http://www.theinformationlab.co.uk/2015/08/11/the-nba-letting-you-get-closer-to-the-game/ All the goodies in these pages (and the accompanying Rebound logs) gave me a lot to analyse but still didn’t get me any closer to what I really wanted to create.


This is why I got really excited when a colleague sent me a link to this article in which Savvas shows how to get the X/Y movement data that I have been striving to get my hands on for the last few years. All of this is available from the NBA API. Savvas uses Python and, as I’m not the greatest coder in the world, I took the path of ‘least coding’ and set up Alteryx to do all the hard work for me.


With Alteryx you have the ability to connect to the API and grab the JSON output. By generating numbers, you can create a whole set of URLs from a base to scroll through the various events within a game. As a single game returns circa 2 million rows of data through this macro, I haven’t built the macro to return multiple games. Alteryx’s text input allows me to enter a Game ID once and then reuse it in multiple places throughout the macro. I have used this functionality because a second API call is made to bring back the play-by-play information.
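For anyone who would rather see the URL-generation idea in code, here is a minimal Python sketch of it. This is not the Alteryx macro itself: the base URL is a placeholder, and the `gameid` and `eventid` parameter names, the function name and the game ID used in the example are all assumptions for illustration, so check them against the real endpoint before relying on any of it.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real NBA API address.
BASE_URL = "https://example.com/moments"


def build_event_urls(game_id, first_event, last_event, base_url=BASE_URL):
    """Return one URL per event number, all sharing the same game ID."""
    if last_event < first_event:
        raise ValueError("last_event must be >= first_event")
    urls = []
    for event_id in range(first_event, last_event + 1):
        # 'gameid' and 'eventid' are assumed parameter names.
        query = urlencode({"gameid": game_id, "eventid": event_id})
        urls.append(f"{base_url}?{query}")
    return urls


if __name__ == "__main__":
    # The game ID is entered once and reused for every generated URL.
    for url in build_event_urls("0000000000", 1, 3):
        print(url)
```

Entering the game ID once and passing it into every URL mirrors what the text input does in the macro; the same ID could then be handed to a second request for the play-by-play data.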





You can download the module from here and see what I did at each stage.   


My favourite visualisation that I have created so far with this data allows you to pick any play throughout the game and watch the play unfold. As the ‘Pages’ shelf on Tableau Server doesn’t allow you to ‘play’ the visualisation, I have uploaded a YouTube video of how this looks on Tableau Desktop.


But you can also get the workbook and ‘play’ it on your Tableau Desktop by downloading it from Tableau Public.