The Tea Statistic

I’m sorry, but the pun was just too hard to resist.

Cyril Morong, author of the blog The Dangerous Economist, links to a lovely write-up by Francisca Antman about drinking tea. Now, you might wonder what is so special about drinking tea. As it turns out, it can save your life.

Well, technically speaking, it probably would have saved your life if you had started drinking it in 18th-century England.

my research adds to both the historical and development literatures by exploiting a natural experiment into the effects of water quality on mortality that occurred prior to the understanding that water contamination could compromise health. This occurred through the widespread adoption of tea drinking in England which began in the 18th century. Since brewing tea required boiling water, and boiling water is a method of water purification, the rise of tea consumption in 18th century England would have resulted in an accidental improvement in the relatively poor quality of water available during the Industrial Revolution. To what extent can the rise of tea drinking account for a drop in mortality rates at this crucial juncture in economic history?

https://broadstreet.blog/2022/06/13/did-tea-drinking-cut-mortality-rates-in-england/

You should, of course, read the whole thing, but here is the gist: the author took data from 18th-century England and showed that the period in which tea imports into England rose from about 1 pound per person to 3 pounds per person was also the period in which the English death rate fell from 28 per 1,000 to 23 per 1,000.

Now, you might argue that this could be because of a variety (pun unintended) of reasons – how do we know that this is because of an increase in tea consumption?

Fair question!

What the author does is use data on the causes of death in the same period. She notes that there was a marked decrease in deaths caused by waterborne diseases, while deaths due to airborne diseases did not go down over the same period. In fact, there was a marginal increase in deaths due to airborne diseases.
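One simple way to formalize this comparison is a difference-in-differences: take the change in waterborne deaths and subtract the change in airborne deaths, with airborne diseases acting as a rough control group that boiling water should not affect. This is a minimal sketch of that logic, not necessarily the model Antman actually estimates, and the death rates below are made up purely for illustration; they are not from her data.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the 'treated' series minus change in the 'control' series.

    Here 'treated' = deaths from waterborne diseases (plausibly affected
    by boiling water for tea) and 'control' = deaths from airborne
    diseases (plausibly unaffected by it).
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical death rates per 1,000 people, invented for illustration only.
waterborne_before, waterborne_after = 12, 8
airborne_before, airborne_after = 9, 10  # a marginal increase, as in the post

effect = difference_in_differences(waterborne_before, waterborne_after,
                                   airborne_before, airborne_after)
print(effect)  # -5: waterborne fell by 4 while airborne rose by 1
```

The control group is what makes the argument work: if airborne deaths had fallen just as much, the drop in waterborne deaths would look less like a tea effect and more like something that lowered mortality across the board.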

So the simple act of having to boil water in order to make tea went a very long way towards saving lives. Or at least, that’s the hypothesis being advanced.

Now, fans of statistics among my readers will want to carefully look over the model that has been developed, and ask some probing questions – which, of course, should happen. But if you are a student starting out in the fields of development economics, health economics, statistics or econometrics, this is a great example of what is known as a natural experiment.

Natural experiments are hard to come by in economics, so any insight that can be gleaned by figuring out how to “set” one up is worth its weight in gold.

Making tea can save lives, who’d have thought?

And that, given the weather in Pune, is as good a reason as any to brew myself a mug. I make mine with ginger, just to make the lives of future statisticians that much more difficult.

No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.


But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

https://econforeverybody.com/2021/05/19/correlation-causation-and-thinking-things-through/

And just this morning, I chanced upon this:

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

  1. If there is correlation, there must be causation.

    I think we can all agree that this is not true.
  2. If there is causation, there must be correlation.

    That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.
  3. If there is no correlation, there can be no causation.

    That is what Rachael Meager says the book is saying. I spent a fair bit of time trying to understand whether this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:

    If B exists, A must exist. (B is causation, A is correlation – this is just 2. above.)

    If we can show that A doesn’t exist, are we guaranteed the non-existence of B?

    And having thought about it, I think the answer is yes: 3. is the same as 2.[1]
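For anyone who, like me, hasn’t studied logic, a brute-force check helps: enumerate every combination of true and false for A (correlation) and B (causation), and compare the statements row by row. This is a minimal sketch; `implies` is just a helper defined here for the logician’s “if p then q”.

```python
from itertools import product


def implies(p, q):
    """'If p then q' is false only when p is true and q is false."""
    return (not p) or q


# A = "there is correlation", B = "there is causation"
rows = list(product([True, False], repeat=2))  # every (A, B) combination

# Statement 2: if B then A.   Statement 3: if not A then not B.
statement_2 = [implies(b, a) for a, b in rows]
statement_3 = [implies(not a, not b) for a, b in rows]

# Statement 1: if A then B (the converse of statement 2).
statement_1 = [implies(a, b) for a, b in rows]

print(statement_2 == statement_3)  # True: 3 is the contrapositive of 2
print(statement_1 == statement_2)  # False: 1 is a genuinely different claim
```

Note that this only confirms the logic. It says nothing about whether statement 2 is itself true, which is the real question here.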

Rachael Meager then provides this example as support for her argument:

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the physics and acceleration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. These two are causally linked, and therefore there is (must be!) correlation.[2]

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.

No?


But even if my argument above turns out to be correct, I was still wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.

Ben Golub made the same argument (I think) that I did:

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:


Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

  1. Correlation doesn’t imply causation
  2. I still think that if there is causation, there must be association (not, as I first wrote, correlation). But that being said, I should be pushing The Mixtape to the top of the list.
  3. Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

  1. Anybody who has studied logic, please let me know if I am correct!
  2. Association, really. See below.

Correlation, Causation and Thinking Things Through

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.

Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

For example, this one chart from this magnificent website (and please, do take a look at all the charts):

https://www.tylervigen.com/spurious-correlations

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

OK, you might be saying right now. So what?

Well, how about using this to figure out what ingredients were being used to make nuclear bombs? Say the government would like to keep the recipe (and the ingredients) for the nuclear bomb a secret. But what if you decide to take a look at the stock market data? What if you try to see if there is an increase in the stock price of firms that make the ingredients likely to be used in a nuclear bomb?

If demand for the stuff your firm produces goes up (call this x), your firm’s stock price will go up (call this y). If y has gone up, it will (almost certainly) be because x has gone up. So if I can check whether y has gone up, I can infer that x has gone up, and hey, I can figure out the ingredients for a nuclear bomb.

Sounds outlandish? Try this on for size:

Realizing that positive developments in the testing and mass production of the two-stage thermonuclear (hydrogen) bomb would boost future cash flows and thus market capitalizations of the relevant companies, Alchian used stock prices of publicly traded industrial corporations to infer the secret fuel component in the device in a paper titled “The Stock Market Speaks.” Alchian (2000) relates the story in an interview:
We knew they were developing this H-bomb, but we wanted to know, what’s in it? What’s the fissile material? Well there’s thorium, thallium, beryllium, and something else, and we asked Herman Kahn and he said, ‘Can’t tell you’… I said, ‘I’ll find out’, so I went down to the RAND library and had them get for me the US Government’s Dept. of Commerce Yearbook which has items on every industry by product, so I went through and looked up thorium, who makes it, looked up beryllium, who makes it, looked them all up, took me about 10 minutes to do it, and got them. There were about five companies, five of these things, and then I called Dean Witter… they had the names of the companies also making these things, ‘Look up for me the price of these companies…’ and here were these four or five stocks going like this, and then about, I think it was September, this was now around October, one of them started to go like that, from $2 to around $10, the rest were going like this, so I thought ‘Well, that’s interesting’… I wrote it up and distributed it around the social science group the next day. I got a phone call from the head of RAND calling me in, nice guy, knew him well, he said ‘Armen, we’ve got to suppress this’… I said ‘Yes, sir’, and I took it and put it away, and that was the first event study. Anyway, it made my reputation among a lot of the engineers at RAND.

https://www.sciencedirect.com/science/article/abs/pii/S0929119914000546
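The logic Alchian describes can be sketched as a crude event study: compute each firm’s return over the window, subtract the average return of the other firms as a rough benchmark, and see which one stands out. The firm names and all prices other than the $2-to-$10 move Alchian recalls are made up for illustration, and real event studies use more careful benchmarks (a market model, for instance) than this simple peer average.

```python
def abnormal_returns(prices):
    """prices maps firm -> (price before window, price after window).

    Returns firm -> own return minus the average return of the other firms.
    """
    returns = {firm: after / before - 1 for firm, (before, after) in prices.items()}
    result = {}
    for firm, r in returns.items():
        others = [v for f, v in returns.items() if f != firm]
        result[firm] = r - sum(others) / len(others)
    return result


# Hypothetical suppliers of candidate materials; only firm_D's $2 -> $10
# move echoes Alchian's recollection, the other prices are invented.
prices = {
    "firm_A": (5.0, 5.2),
    "firm_B": (8.0, 7.8),
    "firm_C": (3.0, 3.1),
    "firm_D": (2.0, 10.0),
    "firm_E": (4.0, 4.0),
}

ar = abnormal_returns(prices)
standout = max(ar, key=ar.get)
print(standout, round(ar[standout], 2))  # firm_D 3.99
```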

I learnt about this while reading Navin Kabra’s Twitter round-up from yesterday. Navin also mentions the discovery of Neptune using the same underlying principle, and then asks this question:

Do you know other, more recent examples of people deducing important information by guessing from correlated data?

https://futureiq.substack.com/p/best-of-twitter-antifragility-via

… and I was reminded of this tweet:


Whether it is Neptune, the nuclear bomb or the under-reporting of Covid deaths, the lesson for you as a student of economics is this: when you marry the ability to connect the dots with the ability to understand and apply statistics, truly remarkable things can happen.

Of course, the reverse is equally true, and perhaps even more important. When you marry the ability to connect the dots with a misplaced confidence in your ability to understand and apply statistics, truly horrific things can happen.

Tread carefully when it comes to statistics!