No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

And just today morning, I chanced upon this:

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

  1. If there is correlation, there must be causation.

    I think we can all agree that this is not true.
  2. If there is causation, there must be correlation.

    That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.
  3. If there is no correlation, there can be no causation

    That is what Rachael Meager is saying the book is saying. I spent a fair bit of time trying to understand if this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:
    If B exists, A must exist. (B is causation, A is correlation – this is just 2. above)
    If we can show that A doesn’t exist, are we guaranteed the non-existence of B?
    And having thought about it, I think it to be true. 3. is the same as 2.1

Rachael Meager then provides this example as support for her argument:

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the phsyics and accelaration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. That is causally linked, and therefore there is (must be!) correlation.2

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.


But even if my argument above turns out to be correct, I still was wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.

Ben Golub made the same argument (I think) that I did:

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:

Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

  1. Correlation doesn’t imply causation
  2. I still think that if there is causation, there must be correlation association. But that being said, I should be pushing The Mixtape to the top of the list.
  3. Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

  1. Anybody who has studied logic, please let me know if I am correct![]
  2. Association, really. See below[]

Leave a Reply