A couple of days ago, I was able to catch snippets of the India-Bangladesh Test match.
The pitch at the venue tended to become slower and more docile over time. There was also an interesting statistic about how West Indies had chased down an imposing target on the fifth day, a chase that included a double century.
They then showed a statistic about how only five wickets, on average, had fallen on the fifth day across all Test matches played at the venue thus far. That only reinforces the idea that the pitch becomes slower over time.
But then one of the commentators said something that I found fascinating. It should be noted, he said, that maybe there were only five wickets remaining in the fourth innings by the time the fifth day started! That is, the low count of five wickets on Day 5 may be due to either
a) The pitch becoming slower over time
b) The match having played out in such a way that only five wickets remained by the time the fifth day started.
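If you'd like to see how far explanation (b) can go on its own, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the pitch behaves identically on all five days, the number of wickets that could fall in a day is drawn from a rough bell curve around nine, and a match has at most 40 wickets to give. Real Test matches end for other reasons too (a target gets chased, rain intervenes), so treat this as a toy model rather than a description of any actual venue.

```python
import random


def simulate_match(rng, wickets_per_day_mean=9.0, total_wickets=40, days=5):
    """Return the wickets fallen on each day of one hypothetical match."""
    remaining = total_wickets
    per_day = []
    for _ in range(days):
        # Assumption: the pitch is equally helpful to bowlers on every day,
        # so the potential number of wickets has the same distribution.
        potential = max(0, round(rng.gauss(wickets_per_day_mean, 2.0)))
        # A day cannot see more wickets fall than are still left in the match.
        fallen = min(potential, remaining)
        per_day.append(fallen)
        remaining -= fallen
    return per_day


def average_by_day(n_matches=10_000, seed=0, days=5):
    """Average wickets fallen on each day across many simulated matches."""
    rng = random.Random(seed)
    totals = [0] * days
    for _ in range(n_matches):
        for day, fallen in enumerate(simulate_match(rng, days=days)):
            totals[day] += fallen
    return [total / n_matches for total in totals]


averages = average_by_day()
for day, avg in enumerate(averages, start=1):
    print(f"Day {day}: {avg:.2f} wickets on average")
```

In this toy model the Day 5 average comes out lower than the Day 1 average even though the pitch never changes, simply because most of the 40 wickets have already fallen by then. The data alone cannot tell (a) and (b) apart.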
And the lesson for those of us who are students of statistics is to ask about the context before jumping to conclusions from the data we’re looking at. It is surprising how often we forget to do this!
For years, I used to ask students in my statistics classes why this chart looks the way it does:
Why, I would ask them, do Indians tend to search for cats towards the end of the year? And I would get lots of interesting responses. Maybe Indians gift cats towards the end of the year, some would say. Maybe cats fall ill in India towards the end of the year? Maybe there’s a festival involving cats in some parts of the country?
The clue lies in the fact that this spike in searches for “cat” towards the end of the year stopped happening after 2010 or so. And the answer lies in the fact that the entrance exam for the IIMs (the Common Admission Test, or CAT) changed around that time. And that, of course, is the point common to this exercise and the Test match in Bangladesh. Data matters, sure, but the context of the data matters even more.
And finally, this example is perhaps the most famous of them all – but do read the rest of the article too. And always remember – the story behind the data is at least as important as the story that you will be able to tell by analyzing the data!