correlation – EconForEverybody

Correlation, Causation and Cricket

(1) Cricket's recently seen an explosion of TERRIFIC data-based analysis, but some questions @ABdeVilliers17 raises here are 👌
Analysts often mistake correlation for causation, completeness for universality, absence of evidence for evidence of absence + ignore spillovers 👇 https://t.co/znhbvGF52D
— M.R. Sharan (@sharanidli) July 12, 2023

What is common to Butyrylcholinesterase and Vitamin D, or why English is an underrated skill in statistics

Today’s blog post title is in the running for the longest title that I have come up with, but let’s ignore this particular bit of potential trivia and get on with it.

Today’s story really begins with the tragic tale of Sally Clark. What follows is a very lengthy extract from a piece I wrote along with a friend some months ago. Lengthy, but fascinating:

In November of the year 1999, an English solicitor named Sally Clark was convicted on two charges of murder, and sentenced to life imprisonment. This tragic case is notable for many reasons. One of those reasons was the fact that her alleged victims were her own sons. Another was the fact that both were infants, only weeks old, when they died.
The cause of death in both cases was initially attributed to sudden infant death syndrome (SIDS), also known as cot death in the United Kingdom. We did not know then, and do not know to this day, the specific causes of SIDS. But suspicion grew on account of the fact that two children from the same family had died due to unspecified causes, and shortly after the death of her second child, Sally Clark was arrested, tried and convicted.
One of the clinching pieces of evidence was expert testimony provided by the pediatrician Professor Sir Roy Meadow. He put the odds of two children from the same family dying of SIDS at 1 in 73 million — in other words, an all but impossible eventuality. On the back of this testimony, and others, Sally Clark was convicted of the crime of murdering her own sons, and sent to prison for life.
One cannot help but ask the question: how did Sir Roy Meadow arrive at this number of 1 in 73 million? Succinctly put, here is the theory: for the level of affluence that Sally Clark’s family possessed, the chance of one infant dying of SIDS was 1 in 8543. This was simply an empirical observation. What then, were the chances that two children from the same family would die of SIDS?
The answer to this question, statisticians tell us, depends on whether the two deaths are independent of each other. If one assumes that they are, then the probability of two deaths in the same family is simply the multiplicative product of the two probabilities. That is, 1 in 8543 multiplied by itself, which is 1 in 73 million and that would be enough to convince any “reasonable man” that the deaths were deliberate and could not have been just coincidence.
But on the other hand, if the two events are not independent of each other — say, for example, that there are underlying genetic or environmental reasons that we simply are not aware of just yet — then it is entirely possible that multiple children from the same family may die of SIDS. In fact, given a SIDS death in a family, research shows that the likelihood of a second SIDS death goes up.
Sally Clark’s convictions were overturned on her second appeal, and she was released from prison. She died four years later due to alcohol poisoning.
https://www.scconline.com/blog/post/2021/12/14/data-analysis-an-essential-skill-for-the-legal-community/
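
The arithmetic behind the 1 in 73 million figure, and why the independence assumption carries so much weight, can be sketched in a few lines of Python. The 1 in 8543 figure comes from the extract above. The 1 in 100 conditional probability used for the dependent case is purely a made-up placeholder to show the mechanics, not an estimate from any study.

```python
from fractions import Fraction

# Figure quoted in the extract: chance of one SIDS death in a family like the Clarks'.
p_single = Fraction(1, 8543)

# If the two deaths are independent, the joint probability is just the product.
p_both_independent = p_single * p_single


def p_both_dependent(p_first, p_second_given_first):
    """Joint probability when the second event depends on the first."""
    return p_first * p_second_given_first


# ASSUMPTION: 1 in 100 is a hypothetical placeholder, not a research estimate.
p_both_hypothetical = p_both_dependent(p_single, Fraction(1, 100))

print(f"Independent: 1 in {p_both_independent.denominator:,}")
print(f"Dependent (hypothetical): 1 in {p_both_hypothetical.denominator:,}")
```

Whatever the true conditional probability is, the point is that the answer can move by orders of magnitude depending on an assumption that was never tested.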

We’ll get back to this truly tragic tale, but let’s go off on a tangent for a second.

Today’s a day for extracts from my own earlier work, it would seem, for I have another one for your consideration:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.
For example, this one chart from this magnificent website (and please, do take a look at all the charts):
https://atomic-temporary-112243906.wpcomstaging.com/2021/05/19/correlation-causation-and-thinking-things-through/

https://www.tylervigen.com/spurious-correlations

Hold on to this line of thinking, and let’s get back to the tragic Sally Clark story, but with a twist towards the rather more optimistic side of things.

They found the cause of SIDS
THEY FOUND THE CAUSE OF SIDS

Excuse me while I cry for all the parents, including lead researcher Dr Carmel Harrington, who lived with guilt. And cry happy tears for parents in the future who will have access to screening and prevention. 😭 pic.twitter.com/LCp4A63HXd
— Debbie Mia (@TheDebbieMia) May 12, 2022

Great news, right? We’ve found what causes SIDS!

Well, that’s where it gets tricky, and we go off on yet another tangent.

Do Vitamin D supplements help? We know that sunlight gives us Vitamin D, and that’s A Good Thing. So if we don’t get enough sunlight, hey, let’s get Vitamin D injections or supplements:

In interpreting vitamin D-related study results, correlation should not be understood as causation. Diets composed of vitamin D–rich foods such as dairy products and salmon also contain high levels of other healthy nutrients. Those who have a high vitamin D level are likely to participate in active outdoor activities and exercises, to be interested in health issues, and to have a healthy lifestyle. Without considering these confounders, misleading results can be obtained. In the study by Kim et al.,4) a univariate analysis revealed a correlation between a low vitamin D level and a low quality of life score; however, its significance was lost when age, sex, income, education level, and disease state were considered.
Sometimes, correlations shown in cross-sectional studies are used as evidence for requiring vitamin D supplements. A recent increasing trend of taking vitamin D supplements may be due to these effects.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4961851/

What if Vitamin D is just a marker? That is, what if sunlight causes a lot of good things in our bodies, and it also causes Vitamin D levels in our body to go up? In that case, the story isn’t that sunlight raises vitamin D and vitamin D then improves our wellbeing. Maybe sunlight improves our wellbeing directly and, separately, raises the Vitamin D levels in our body. They (health and vitamin D levels) may just be correlated, without there being any causation.

What I’m about to say is important: I’m not a doctor. All I’m saying is, I’ve been confused often enough about correlation and causation to wonder about whether vitamin D causes good health. It is correlated, there’s no arguing with that. But causation? Ah, that’s another (very tricky) thing altogether.
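
To see how a marker can look like a cause, here is a small simulation of a toy world. Everything in it is an assumption made for illustration: sunlight drives both vitamin D and wellbeing, vitamin D has no effect on wellbeing at all, and the numbers are random draws rather than real data. The `pearson` and `residuals` helpers are written out by hand so the sketch needs nothing beyond the standard library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)
n = 20_000

# ASSUMPTION: a toy world where sunlight drives both variables and
# vitamin D has no effect at all on wellbeing.
sunlight = [rng.gauss(0, 1) for _ in range(n)]
vitamin_d = [s + rng.gauss(0, 1) for s in sunlight]
wellbeing = [s + rng.gauss(0, 1) for s in sunlight]

raw_corr = pearson(vitamin_d, wellbeing)
adjusted_corr = pearson(residuals(vitamin_d, sunlight),
                        residuals(wellbeing, sunlight))

print(f"vitamin D vs wellbeing:              {raw_corr:.2f}")
print(f"same, after accounting for sunlight: {adjusted_corr:.2f}")
```

In this made-up world the raw correlation is clearly positive, yet it all but disappears once sunlight is accounted for. That proves nothing about real vitamin D, of course. It only shows that a strong correlation is exactly what a pure marker would produce.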

And now that we have the mise en place of this blogpost done, let’s get the dish together.

Low levels of Butyrylcholinesterase don’t necessarily cause SIDS in infants. Infants who die of SIDS stop breathing (for reasons that are still not understood clearly), and these infants have low levels of Butyrylcholinesterase. Low Butyrylcholinesterase may not even be what causes breathing to stop in infants. It is just a marker – there is correlation there, but we don’t know if there is causation.

In fact, the paper’s title itself says as much:

“Butyrylcholinesterase is a potential biomarker for Sudden Infant Death Syndrome”

But the tweet above speaks about how we’ve found the cause, and that’s not quite right.

Again, please don’t misunderstand me – the fact that this has been discovered is awesome, it is fantastic, and the joy, the relief and the euphoria should absolutely be there.

But:

Sally Clark lost her life at least in part to a fundamental misunderstanding of statistical theory, and we still don’t know what causes SIDS. We understand it better, but there is a ways to go.

The most underrated skill in statistics is the English language.

Words matter, and we all (myself included!) need to be more careful about what exactly we mean when we speak about statistics.

And thank god we’re closer to figuring out how to deal with the horrible, horrible thing that SIDS is.

But if you’re teaching or learning statistics, tread very, very carefully.

No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.
…
…
But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.
https://atomic-temporary-112243906.wpcomstaging.com/2021/05/19/correlation-causation-and-thinking-things-through/

And just today morning, I chanced upon this:

Ok so chapter 12 of “Noise” by Kahneman, Sibony and Sunstein contains a repeated, incorrect claim about stats / causality: the claim that no correlation implies no causation. Easy to disprove this claim, no math needed, see my next tweet. pic.twitter.com/AM2uaSXwmx
— Grand Theft Autocorrelation (@economeager) May 21, 2021

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

1. If there is correlation, there must be causation.

I think we can all agree that this is not true.

2. If there is causation, there must be correlation.

That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.

3. If there is no correlation, there can be no causation.

That is what Rachael Meager is saying the book is saying. I spent a fair bit of time trying to understand if this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:
..
..
If B exists, A must exist. (B is causation, A is correlation – this is just 2. above)
..
..
If we can show that A doesn’t exist, are we guaranteed the non-existence of B?
..
..
And having thought about it, I think it to be true. 3. is the same as 2. (Anybody who has studied logic, please let me know if I am correct!)
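
For what it’s worth, the question of whether 3. is the same as 2. can be checked mechanically with a truth table. The sketch below treats “if … then …” as material implication, which is the usual reading in introductory logic; that reading is an assumption on my part about what the statements are meant to say. It only checks whether the statements are logically equivalent, not whether any of them is actually true of the world.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' fails only when p is true and q is false."""
    return (not p) or q


# A = correlation exists, B = causation exists; each row is one (A, B) combination.
rows = list(product([False, True], repeat=2))

# Statement 2: if B then A.  Statement 3: if not A then not B.
statement_2 = [implies(b, a) for a, b in rows]
statement_3 = [implies(not a, not b) for a, b in rows]

# Statement 1: if A then B (the converse of statement 2).
statement_1 = [implies(a, b) for a, b in rows]

print(statement_2 == statement_3)  # True: they agree on every row
print(statement_1 == statement_2)  # False: the converse is a different claim
```

Statement 3 is what logicians call the contrapositive of statement 2, so the two stand or fall together. Statement 1 is the converse, which is a separate claim.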

Rachael Meager then provides this example as support for her argument:

Imagine driving a car, reaching a hill and pumping the gas as you begin to go up so that your speed is constant. The correlation between pressing on the gas and the speed of the car is zero but they’re obviously causally related, it’s that the agent is optimizing speed!
— Grand Theft Autocorrelation (@economeager) May 21, 2021

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the physics and acceleration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. That is causally linked, and therefore there is (must be!) correlation. (Association, really. See below.)

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.

No?
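
Here is a toy simulation of the car example, to make both sides of the argument concrete. The physics is deliberately fake and entirely my own assumption: speed is simply gas minus slope plus a little noise, and the driver on the hilly road presses exactly hard enough to offset each slope. The `speed` function, the target of 5.0 and the noise level are all invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
n = 10_000


def speed(gas, slope):
    # ASSUMPTION: toy physics in which more gas always causes more speed.
    return gas - slope + rng.gauss(0, 0.5)


# Flat road (ceteris paribus): slope is held at zero, gas varies freely.
flat_gas = [rng.uniform(0, 10) for _ in range(n)]
flat_speed = [speed(g, 0.0) for g in flat_gas]

# Hilly road: the driver offsets every slope to hold a target speed.
target = 5.0
slopes = [rng.uniform(0, 10) for _ in range(n)]
hill_gas = [target + s for s in slopes]
hill_speed = [speed(g, s) for g, s in zip(hill_gas, slopes)]

flat_corr = pearson(flat_gas, flat_speed)
hill_corr = pearson(hill_gas, hill_speed)
gas_slope_corr = pearson(hill_gas, slopes)

print(f"flat road, gas vs speed:  {flat_corr:.2f}")
print(f"hilly road, gas vs speed: {hill_corr:.2f}")
print(f"hilly road, gas vs slope: {gas_slope_corr:.2f}")
```

On the flat road the gas and the speed move together. On the hills that correlation is roughly zero even though the causal mechanism is unchanged, while the gas and the slope are almost perfectly correlated. So in this toy world the association is still there, just not between the two things you were looking at.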

But even if my argument above turns out to be correct, I still was wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.

Ben Golub made the same argument (I think) that I did:

https://twitter.com/ben_golub/status/1395823496404557831

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:

Causation does imply "some association", not necessarily correlation, and not necessarily between the two things you are looking at? https://t.co/ILlWMDYMT0
— Enrique Otero (@eoteromuras) May 21, 2021

Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

Correlation doesn’t imply causation
I still think that if there is causation, there must be ~~correlation~~ association. But that being said, I should be pushing The Mixtape to the top of the list.
Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

Correlation, Causation and Thinking Things Through

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.

Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

For example, this one chart from this magnificent website (and please, do take a look at all the charts):

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

OK, you might be saying right now. So what?

Well, how about using this to figure out what ingredients were being used to make nuclear bombs? Say the government would like to keep the recipe (and the ingredients) for the nuclear bomb a secret. But what if you decide to take a look at the stock market data? What if you try to see if there is an increase in the stock price of firms that make the ingredients likely to be used in a nuclear bomb?

If the stuff that your firm produces (call this x) is in high demand, your firm’s stock price will go up (call this y). If y has gone up, it (almost certainly) will be because of x going up. So if I can check if y has gone up, I can assume that x will be up, and hey, I can figure out the ingredients for a nuclear bomb.

Sounds outlandish? Try this on for size:

Realizing that positive developments in the testing and mass production of the two-stage thermonuclear (hydrogen) bomb would boost future cash flows and thus market capitalizations of the relevant companies, Alchian used stock prices of publicly traded industrial corporations to infer the secret fuel component in the device in a paper titled “The Stock Market Speaks.” Alchian (2000) relates the story in an interview:
We knew they were developing this H-bomb, but we wanted to know, what’s in it? What’s the fissile material? Well there’s thorium, thallium, beryllium, and something else, and we asked Herman Kahn and he said, ‘Can’t tell you’… I said, ‘I’ll find out’, so I went down to the RAND library and had them get for me the US Government’s Dept. of Commerce Yearbook which has items on every industry by product, so I went through and looked up thorium, who makes it, looked up beryllium, who makes it, looked them all up, took me about 10 minutes to do it, and got them. There were about five companies, five of these things, and then I called Dean Witter… they had the names of the companies also making these things, ‘Look up for me the price of these companies…’ and here were these four or five stocks going like this, and then about, I think it was September, this was now around October, one of them started to go like that, from $2 to around $10, the rest were going like this, so I thought ‘Well, that’s interesting’… I wrote it up and distributed it around the social science group the next day. I got a phone call from the head of RAND calling me in, nice guy, knew him well, he said ‘Armen, we’ve got to suppress this’… I said ‘Yes, sir’, and I took it and put it away, and that was the first event study. Anyway, it made my reputation among a lot of the engineers at RAND.
https://www.sciencedirect.com/science/article/abs/pii/S0929119914000546
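
The logic of what Alchian describes can be sketched as a tiny screen for the one stock that breaks away from the pack. To be clear about what is invented: the company names are placeholders, and every price below is made up except the “$2 to around $10” move mentioned in the interview. The threshold is arbitrary too. This is a sketch of the idea, not a reconstruction of his paper.

```python
from statistics import median

# ASSUMPTION: company names and all prices except the "$2 to around $10"
# move mentioned in the interview are invented for illustration.
prices = {
    "Company A": (5.0, 5.2),
    "Company B": (8.0, 7.9),
    "Company C": (2.0, 10.0),
    "Company D": (3.0, 3.1),
    "Company E": (6.0, 6.3),
}

# Return over the window for each stock: (end - start) / start.
returns = {name: (end - start) / start for name, (start, end) in prices.items()}
typical = median(returns.values())

# Flag any stock whose return is far above the typical one.
# ASSUMPTION: the cut-off of 1.0 (100 percentage points) is arbitrary.
standouts = [name for name, r in returns.items() if r - typical > 1.0]
print(standouts)
```

Comparing each stock with the median of the group, rather than looking at it in isolation, is what separates “this firm’s product is suddenly in demand” from “the whole market went up”.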

I learnt about this while reading Navin Kabra’s Twitter round-up from yesterday. Navin also mentions the discovery of Neptune using the same underlying principle, and then asks this question:

Do you know other, more recent examples of people deducing important information by guessing from correlated data?
https://futureiq.substack.com/p/best-of-twitter-antifragility-via

… and I was reminded of this tweet:

Gujarat issued 1.23 lakh death certificates in March 1-May 10 period this year in comparison to 58 thousand issued in the same period last year: Divya Bhaskar

This means Gujarat issued 65,085 more death certificates in March 1-May 10 period this year.

65,085.

(1/n) pic.twitter.com/9a6QZPIKHF
— Deepak Patel (@deepakpatel_91) May 14, 2021

Whether it is Neptune, the nuclear bomb or the under-reporting of Covid deaths, the lesson for you as a student of economics is this: when you marry the ability to connect the dots with the ability to understand and apply statistics, truly remarkable things can happen.

Of course, the reverse is equally true, and perhaps even more important. When you marry the ability to connect the dots with a misplaced ability to understand and apply statistics, truly horrific things can happen.

Tread carefully when it comes to statistics!

EC101: Links for 3rd October, 2019

Everything is correlated.
..
..
For students at Gokhale Institute for sure, but elsewhere too: the Stiglitz essay prize.
..
..
Capitalism vs Socialism.
..
..
On reforming the PhD.
..
..
On complements, substitutes, YouTube and reading.

Tech: Links for 13th July, 2019

Five articles by Michael Nielsen. If you aren’t familiar with Michael Nielsen, this is a great place to start!

His version of how to write better.
..
..
A scientist’s explanation of Arrow’s Impossibility Theorem.
..
..
May this come true, and right soon.
..
..
“In the US House of Representatives, 61 percent of Democrats voted for the Civil Rights Act, while a much higher percentage, 80 percent, of Republicans voted for the Act. You might think that we could conclude from this that being Republican, rather than Democrat, was an important factor in causing someone to vote for the Civil Rights Act. However, the picture changes if we include an additional factor in the analysis, namely, whether a legislator came from a Northern or Southern state. If we include that extra factor, the situation completely reverses, in both the North and the South. Here’s how it breaks down:
North: Democrat (94 percent), Republican (85 percent)
South: Democrat (7 percent), Republican (0 percent)

Yes, you read that right: in both the North and the South, a larger fraction of Democrats than Republicans voted for the Act, despite the fact that overall a larger fraction of Republicans than Democrats voted for the Act.”
..
..
One of my favorite problems from statistics: Simpson’s Paradox. And an old frenemy: correlation is not causation.
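
The reversal in the excerpt is easier to believe once you see the head counts. The (yes, no) tallies below are an assumption on my part: they are one commonly cited breakdown of the House vote and are not part of the excerpt, which gives only percentages. They do reproduce every percentage quoted above.

```python
# ASSUMPTION: (yes, no) vote counts from one commonly cited tally of the
# House vote; the excerpt itself gives only percentages.
votes = {
    ("North", "Democrat"): (145, 9),
    ("North", "Republican"): (138, 24),
    ("South", "Democrat"): (7, 87),
    ("South", "Republican"): (0, 10),
}


def pct_yes(yes, no):
    """Share of 'yes' votes, rounded to the nearest whole percent."""
    return round(100 * yes / (yes + no))


def overall(party):
    """Pool North and South for one party and return its 'yes' percentage."""
    yes = sum(counts[0] for (_, p), counts in votes.items() if p == party)
    no = sum(counts[1] for (_, p), counts in votes.items() if p == party)
    return pct_yes(yes, no)


by_group = {key: pct_yes(*counts) for key, counts in votes.items()}

for (region, party), pct in by_group.items():
    print(f"{region} {party}: {pct} percent")
for party in ("Democrat", "Republican"):
    print(f"Overall {party}: {overall(party)} percent")
```

The trick is in the group sizes. Most Democrats in this tally came from the South, where almost nobody voted for the Act, and that drags the party’s overall figure down even though Democrats led in each region.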
..
..
Memory, and how to get better at it.

EC101: Links for 20th June, 2019

“One needs to be cautious in these type of businesses trading at higher multiples as slip in any one of the parameters – decline in sales and profit growth, build up of debt, deterioration in working capital, capital misallocation – wrong acquisitions and expansions will lead to derating of the stock quickly. The company has shown no signs of these as of now and investors need to keep a close look at these.”
..
..
A vastly under-rated skill among economics students. The theory of (and in this case also the application of) reading a balance sheet. Read this article to get a sense of how to read one – and in an ideal world, try to write a similar article about a firm of your choice.
..
..
“In other words, to quote Simon, “so long as the rate of interest remains constant, an advance in technology can only produce a rising level of real wages. The only route through which technological advance could lower real wages would be by increasing the capital coefficient (the added cost being compensated by a larger decline in the labor coefficient), thereby creating a scarcity of capital and pushing interest rates sharply upward.” In other words, the price of capital would have to rise by more than the price of consumption.”
..
..
Under what circumstances will advances in technology cause the real wage rate to go down? The vastly under-rated Herbert Simon provided an answer to this question way back when – read this article to find out about its rediscovery.
..
..
“Now that the crisis is in the rearview mirror and the current expansion is nearing the longest on record, is it possible to go back to having a balance sheet as small as in 2007? The answer is no. The amount of currency in circulation has grown so much that it is not possible to shrink the balance sheet to its earlier size. This is good news because it reflects a growing economy. The larger balance sheet also reflects banks wanting to hold more reserves at the Fed. Banks partly hold these highly liquid and essentially risk-free assets to meet new liquidity regulations designed to improve the resilience of the overall financial system.”
..
..
A short, but useful essay about the huge expansion to the Federal Reserve’s balance sheet, and why it is unlikely to shrink anytime soon. A useful read for students of monetary economics.
..
..
“The correlation phrase has become so common and so irritating that a minor backlash has now ensued against the rhetoric if not the concept. No, correlation does not imply causation, but it sure as hell provides a hint. Does email make a man depressed? Does sadness make a man send email? Or is something else again to blame for both? A correlation can’t tell one from the other; in that sense it’s inadequate. Still, if it can frame the question, then our observation sets us down the path toward thinking through the workings of reality, so we might learn new ways to tweak them. It helps us go from seeing things to changing them.”
..
..
The phrase is burned onto my brain, as it is for everybody else who ever attended a statistics class: “Correlation is not causation.” Sure, it isn’t – but this article warns us against the over-use of this phrase, and shows how it might have ended up making us think less deeply.
..
..
“The Baumol effect reminds us that all prices are relative prices. An implication is that over time prices have very little connection to affordability. If the price of the same can of soup is higher at Wegmans than at Walmart we understand that soup is more affordable at Walmart. But if the price of the same can of soup is higher today than in the past it doesn’t imply that soup was more affordable in the past, even if we have done all the right corrections for inflation.”
..
..
A short, but very readable interpretation of the Baumol effect – and as this excerpt makes clear, also a great reminder of the fact that all prices, everywhere and always, are relative.

Links for 17th May, 2019

“Despite the 73rd and 74th Constitutional amendments, except in a few states, there has been little progress at decentralization—to both rural and urban local bodies. Most state governments have been reluctant to devolve the functions, funds and functionaries for delivering public services at the local level. The functions assigned are unclear, funds uncertain and inadequate, and decision-making functionaries are mostly drawn from the state bureaucracy. Local bodies do not even have powers to determine the base and rate structure of the taxes assigned to them. The states have not cared to create institutions and systems mandated in the Constitution, including the appointment of the State Finance Commissions, and even when they are appointed, states have not found it obligatory to place their reports in the legislature. In fact, the local bodies are not clear about delivering local public goods, with the prominent agenda of implementing central schemes obscuring their functions.”
..
..
M. Govinda Rao pulls no punches in pointing out how and why decentralization hasn’t taken place in India (and likely will not). This is a conversation more people need to be having in India – and in particular, to aid meaningful urbanization.
..
..
“I love this paper because it is ruthless. The authors know exactly what they are doing, and they are clearly enjoying every second of it. They explain that given what we now know about polygenicity, the highest-effect-size depression genes require samples of about 34,000 people to detect, and so any study with fewer than 34,000 people that says anything about specific genes is almost definitely a false positive; they go on to show that the median sample size for previous studies in this area was 345.”
..
..
Slate Star Codex helps us understand the importance of learning (and applying!) statistics. The website is more than worth following, by the way.
..
..
“Sucking the life out of a mango is one of those primal pleasures that makes life feel worthwhile. The process is both elaborate and rewarding. The foreplay that loosens up the pulp inside, the careful incision at the top that allows access without a juice overrun, and then the sustained act of sucking every bit juice from the helpless peel. Senses detach themselves from the body and attach themselves to the mango, and even mobile phones stop ringing. The world momentarily rests in our mouths as we slurp, suck and slaver at the rapidly disappearing pulp. The mango is manhandled vigorously till only the gutli remains which is scraped off till it has nothing left to confess. As is evident, there is no elegant way to eat this kind of mango, no delicate and dignified method that approximates any form of refinement, which is just as well, for the only way to enjoy a mango is messily.”
..
..
An excellent column about an excellent fruit – there isn’t that much more to say! I completely agree with the bit about serving aamras front and center, rather than as an afterthought, by the way.
..
..
“Welcome to the 4th Annual Top Economics Blogs list. For the 2019 edition, we’ve added many newcomers, as well as favorites which continue to provide quality insight year after year. Like lists in previous years (2018, 2017, 2016), the new 2019 list features a broad range of quality blogs in practically every economic discipline. Whether you are interested in general economics or prefer more specific topics such as finance, healthcare economics, or environmental economics; there is something here for you. You will also find blogs which focus on microeconomics, macroeconomics, and the economics of specific geographical regions. Whether you are a student, economics professional, or just someone with a general interest in how economic issues affect the world around you, you’re certain to find the perfect blog for your specific needs.”
..
..
The most comprehensive answer to that most perennial of questions: what should I read?
Bonus! If you’re wondering how to keep up with all of this, this might help.
..
..
“India should do the same with our state capitals. The Union government can create fiscal and other incentives to encourage state governments to shift their capitals to brown- or green-field locations. Mumbai, Bengaluru, Hyderabad, Chennai, Jaipur or Lucknow, for instance, will continue to thrive even if the state government offices move out. Their respective states will benefit from a new urban engine powered by government.”
..
..
I have been sceptical about the feasibility of doing something like this – my reading of urbanization has always been that it is more of an organic process – cities grow (or not) of their own accord, and rarely as a planned endeavor. But maybe I’m wrong?
