No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.


But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

https://econforeverybody.com/2021/05/19/correlation-causation-and-thinking-things-through/

And just this morning, I chanced upon this:

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

  1. If there is correlation, there must be causation.

    I think we can all agree that this is not true.
  2. If there is causation, there must be correlation.

    That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.
  3. If there is no correlation, there can be no causation

    That is what Rachael Meager is saying the book is saying. I spent a fair bit of time trying to understand if this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:
    ..
    ..
    If B exists, A must exist. (B is causation, A is correlation – this is just 2. above)
    ..
    ..
    If we can show that A doesn’t exist, are we guaranteed the non-existence of B?
    ..
    ..
    And having thought about it, I think it to be true. 3. is the same as 2. [1]
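For anybody who would rather check the logic than trust my intuition, here is a minimal sketch in Python. It treats “causation” and “correlation” as plain true/false values (a simplification, and my own assumption) and lists every combination. The helper `implies` is a name I made up for the illustration.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' fails only when p is true and q is false."""
    return (not p) or q


# Every possible truth assignment for (causation, correlation).
rows = list(product([False, True], repeat=2))

# Statement 1: if there is correlation, there must be causation.
statement_1 = [implies(correlation, causation) for causation, correlation in rows]
# Statement 2: if there is causation, there must be correlation.
statement_2 = [implies(causation, correlation) for causation, correlation in rows]
# Statement 3: if there is no correlation, there can be no causation.
statement_3 = [implies(not correlation, not causation) for causation, correlation in rows]

# Two statements are logically the same if they agree on every row.
same_2_and_3 = statement_2 == statement_3
same_1_and_2 = statement_1 == statement_2

print("2 and 3 equivalent:", same_2_and_3)
print("1 and 2 equivalent:", same_1_and_2)
```

Statement 3 is what logicians call the contrapositive of statement 2, and the two agree in every row of the table. Statement 1 is the converse of 2, and it does not.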

Rachael Meager then provides this example as support for her argument:

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the physics and acceleration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. That is causally linked, and therefore there is (must be!) correlation. [2]

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.

No?
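To make the car example concrete, here is a rough simulation. Everything in it is an assumption of mine: the toy `speed` formula, the numbers, and the idea that the driver offsets the slope perfectly. It is a sketch of the argument, not a model of a real car.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def speed(gas, slope):
    """Hypothetical toy model: more gas raises speed, more slope lowers it."""
    return 60 + 10 * gas - 10 * slope


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Flat road: slope is held at zero (ceteris paribus), gas varies freely.
gas_flat = [rng.uniform(0, 5) for _ in range(n)]
speed_flat = [speed(g, 0.0) + rng.gauss(0, 1) for g in gas_flat]

# Hilly road: the driver presses exactly hard enough to cancel the slope.
slope_hill = [rng.uniform(0, 5) for _ in range(n)]
gas_hill = list(slope_hill)
speed_hill = [speed(g, s) + rng.gauss(0, 1) for g, s in zip(gas_hill, slope_hill)]

corr_flat = pearson(gas_flat, speed_flat)      # gas vs speed, slope fixed
corr_hill = pearson(gas_hill, speed_hill)      # gas vs speed, slope changing
corr_gas_slope = pearson(gas_hill, slope_hill) # gas vs slope on the hills

print(round(corr_flat, 2), round(corr_hill, 2), round(corr_gas_slope, 2))
```

On the flat road, gas and speed move together almost perfectly. On the hills the gas pedal still causes speed in exactly the same way, but the observed correlation between gas and speed is close to zero, while gas and slope are tightly linked. Whether you read that as “causation without correlation” or as “ceteris isn’t paribus” is precisely the question above.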


But even if my argument above turns out to be correct, I still was wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.

Ben Golub made the same argument (I think) that I did:

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:
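The tweets themselves are not reproduced here, so what follows is only one standard textbook illustration of the gap between association and correlation, and not necessarily the example used in that exchange. Assume y is caused entirely by x through y = x², with x spread symmetrically around zero. The numbers and the `pearson` helper are mine, for illustration only.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# x runs symmetrically from -3 to 3; y is fully determined by x.
xs = [i / 100 for i in range(-300, 301)]
ys = [x * x for x in xs]

# Linear (Pearson) correlation between x and y.
corr_x_y = pearson(xs, ys)

# The dependence shows up once we look at the size of x rather than its sign.
corr_abs_x_y = pearson([abs(x) for x in xs], ys)

print(round(corr_x_y, 6), round(corr_abs_x_y, 2))
```

Here x completely determines y, so the two are as associated as two variables can be, yet the Pearson correlation between them is essentially zero because it only measures straight-line relationships. That is the sense in which “causation implies correlation” is too strong, while “causation implies association” is a safer way to put it.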


Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

  1. Correlation doesn’t imply causation
  2. I still think that if there is causation, there must be association (rather than correlation). But that being said, I should be pushing The Mixtape to the top of the list.
  3. Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

  1. Anybody who has studied logic, please let me know if I am correct!
  2. Association, really. See below.

Correlation, Causation and Thinking Things Through

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.

Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

For example, this one chart from this magnificent website (and please, do take a look at all the charts):

https://www.tylervigen.com/spurious-correlations

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

OK, you might be saying right now. So what?

Well, how about using this to figure out what ingredients were being used to make nuclear bombs? Say the government would like to keep the recipe (and the ingredients) for the nuclear bomb a secret. But what if you decide to take a look at the stock market data? What if you try to see if there is an increase in the stock price of firms that make the ingredients likely to be used in a nuclear bomb?

If the stuff that your firm produces (call this x) is in high demand, your firm’s stock price will go up (call this y). If y has gone up, it (almost certainly) will be because of x going up. So if I can check if y has gone up, I can assume that x will be up, and hey, I can figure out the ingredients for a nuclear bomb.

Sounds outlandish? Try this on for size:

Realizing that positive developments in the testing and mass production of the two-stage thermonuclear (hydrogen) bomb would boost future cash flows and thus market capitalizations of the relevant companies, Alchian used stock prices of publicly traded industrial corporations to infer the secret fuel component in the device in a paper titled “The Stock Market Speaks.” Alchian (2000) relates the story in an interview:
We knew they were developing this H-bomb, but we wanted to know, what’s in it? What’s the fissile material? Well there’s thorium, thallium, beryllium, and something else, and we asked Herman Kahn and he said, ‘Can’t tell you’… I said, ‘I’ll find out’, so I went down to the RAND library and had them get for me the US Government’s Dept. of Commerce Yearbook which has items on every industry by product, so I went through and looked up thorium, who makes it, looked up beryllium, who makes it, looked them all up, took me about 10 minutes to do it, and got them. There were about five companies, five of these things, and then I called Dean Witter… they had the names of the companies also making these things, ‘Look up for me the price of these companies…’ and here were these four or five stocks going like this, and then about, I think it was September, this was now around October, one of them started to go like that, from $2 to around $10, the rest were going like this, so I thought ‘Well, that’s interesting’… I wrote it up and distributed it around the social science group the next day. I got a phone call from the head of RAND calling me in, nice guy, knew him well, he said ‘Armen, we’ve got to suppress this’… I said ‘Yes, sir’, and I took it and put it away, and that was the first event study. Anyway, it made my reputation among a lot of the engineers at RAND.

https://www.sciencedirect.com/science/article/abs/pii/S0929119914000546

I learnt about this while reading Navin Kabra’s Twitter round-up from yesterday. Navin also mentions the discovery of Neptune using the same underlying principle, and then asks this question:

Do you know other, more recent examples of people deducing important information by guessing from correlated data?

https://futureiq.substack.com/p/best-of-twitter-antifragility-via

… and I was reminded of this tweet:


Whether it is Neptune, the nuclear bomb or the under-reporting of Covid deaths, the lesson for you as a student of economics is this: when you marry the ability to connect the dots with the ability to understand and apply statistics, truly remarkable things can happen.

Of course, the reverse is equally true, and perhaps even more important. When you marry the ability to connect the dots with a misplaced ability to understand and apply statistics, truly horrific things can happen.

Tread carefully when it comes to statistics!

EC101: Links for 3rd October, 2019

  1. Everything is correlated.
    ..
    ..
  2. For students at Gokhale Institute for sure, but elsewhere too: the Stiglitz essay prize.
    ..
    ..
  3. Capitalism vs Socialism.
    ..
    ..
  4. On reforming the PhD.
    ..
    ..
  5. On complements, substitutes, YouTube and reading.

Tech: Links for 13th July, 2019

Five articles by Michael Nielsen. If you aren’t familiar with Michael Nielsen, this is a great place to start!

  1. His version of how to write better.
    ..
    ..
  2. A scientist’s explanation of Arrow’s Impossibility Theorem.
    ..
    ..
  3. May this come true, and right soon.
    ..
    ..
  4. “In the US House of Representatives, 61 percent of Democrats voted for the Civil Rights Act, while a much higher percentage, 80 percent, of Republicans voted for the Act. You might think that we could conclude from this that being Republican, rather than Democrat, was an important factor in causing someone to vote for the Civil Rights Act. However, the picture changes if we include an additional factor in the analysis, namely, whether a legislator came from a Northern or Southern state. If we include that extra factor, the situation completely reverses, in both the North and the South. Here’s how it breaks down:

    North: Democrat (94 percent), Republican (85 percent)

    South: Democrat (7 percent), Republican (0 percent)

    Yes, you read that right: in both the North and the South, a larger fraction of Democrats than Republicans voted for the Act, despite the fact that overall a larger fraction of Republicans than Democrats voted for the Act.”
    ..
    ..
    One of my favorite problems from statistics: Simpson’s Paradox. And an old frenemy: correlation is not causation.
    ..
    ..

  5. Memory, and how to get better at it.
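The Civil Rights Act excerpt in item 4 above is easier to believe once you see counts behind the percentages. The vote counts in this sketch are an assumption on my part: they are illustrative figures chosen to reproduce the quoted percentages, they are not taken from the quoted passage, and published tallies differ a little by source.

```python
# (yes votes, total votes) by region and party.
# ASSUMPTION: illustrative counts picked to match the quoted percentages.
votes = {
    ("North", "Democrat"): (145, 154),
    ("North", "Republican"): (138, 162),
    ("South", "Democrat"): (7, 94),
    ("South", "Republican"): (0, 10),
}


def pct(yes, total):
    """Share of yes votes as a whole-number percentage."""
    return round(100 * yes / total)


# Within each region, Democrats have the higher yes share.
by_region = {key: pct(yes, total) for key, (yes, total) in votes.items()}

# Pooling the regions flips the comparison.
overall = {}
for party in ("Democrat", "Republican"):
    yes = sum(v[0] for (region, p), v in votes.items() if p == party)
    total = sum(v[1] for (region, p), v in votes.items() if p == party)
    overall[party] = pct(yes, total)

print(by_region)
print(overall)
```

The reversal comes from the mix. In these counts most Democrats sat in the South, where almost nobody voted yes, while nearly all Republicans sat in the North. The pooled percentage is a weighted average, and the weights differ sharply between the two parties.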

EC101: Links for 20th June, 2019

  1. “One needs to be cautious in these type of businesses trading at higher multiples as slip in any one of the parameters – decline in sales and profit growth, build up of debt, deterioration in working capital, capital misallocation – wrong acquisitions and expansions will lead to derating of the stock quickly. The company has shown no signs of these as of now and investors need to keep a close look at these.”
    ..
    ..
    A vastly under-rated skill among economics students. The theory of (and in this case also the application of) reading a balance sheet. Read this article to get a sense of how to read one – and in an ideal world, try to write a similar article about a firm of your choice.
    ..
    ..
  2. “In other words, to quote Simon, “so long as the rate of interest remains constant, an advance in technology can only produce a rising level of real wages. The only route through which technological advance could lower real wages would be by increasing the capital coefficient (the added cost being compensated by a larger decline in the labor coefficient), thereby creating a scarcity of capital and pushing interest rates sharply upward.” In other words, the price of capital would have to rise by more than the price of consumption.”
    ..
    ..
    Under what circumstances will advances in technology cause the real wage rate to go down? The vastly under-rated Herbert Simon provided an answer to this question way back when – read this article to find out about its rediscovery.
    ..
    ..
  3. “Now that the crisis is in the rearview mirror and the current expansion is nearing the longest on record, is it possible to go back to having a balance sheet as small as in 2007? The answer is no. The amount of currency in circulation has grown so much that it is not possible to shrink the balance sheet to its earlier size. This is good news because it reflects a growing economy. The larger balance sheet also reflects banks wanting to hold more reserves at the Fed. Banks partly hold these highly liquid and essentially risk-free assets to meet new liquidity regulations designed to improve the resilience of the overall financial system.”
    ..
    ..
    A short, but useful essay about the huge expansion to the Federal Reserve’s balance sheet, and why it is unlikely to shrink anytime soon. A useful read for students of monetary economics.
    ..
    ..
  4. “The correlation phrase has become so common and so irritating that a minor backlash has now ensued against the rhetoric if not the concept. No, correlation does not imply causation, but it sure as hell provides a hint. Does email make a man depressed? Does sadness make a man send email? Or is something else again to blame for both? A correlation can’t tell one from the other; in that sense it’s inadequate. Still, if it can frame the question, then our observation sets us down the path toward thinking through the workings of reality, so we might learn new ways to tweak them. It helps us go from seeing things to changing them.”
    ..
    ..
    The phrase is burned onto my brain, as it is for everybody else who ever attended a statistics class: “Correlation is not causation.” Sure, it isn’t – but this article warns us against the over-use of this phrase, and how it might have ended up making us not think deeper.
    ..
    ..
  5. “The Baumol effect reminds us that all prices are relative prices. An implication is that over time prices have very little connection to affordability. If the price of the same can of soup is higher at Wegmans than at Walmart we understand that soup is more affordable at Walmart. But if the price of the same can of soup is higher today than in the past it doesn’t imply that soup was more affordable in the past, even if we have done all the right corrections for inflation.”
    ..
    ..
    A short, but very readable interpretation of the Baumol effect – and as this excerpt makes clear, also a great reminder of the fact that all prices, everywhere and always, are relative.

Links for 17th May, 2019

  1. “Despite the 73rd and 74th Constitutional amendments, except in a few states, there has been little progress at decentralization—to both rural and urban local bodies. Most state governments have been reluctant to devolve the functions, funds and functionaries for delivering public services at the local level. The functions assigned are unclear, funds uncertain and inadequate, and decision-making functionaries are mostly drawn from the state bureaucracy. Local bodies do not even have powers to determine the base and rate structure of the taxes assigned to them. The states have not cared to create institutions and systems mandated in the Constitution, including the appointment of the State Finance Commissions, and even when they are appointed, states have not found it obligatory to place their reports in the legislature. In fact, the local bodies are not clear about delivering local public goods, with the prominent agenda of implementing central schemes obscuring their functions.”
    ..
    ..
    M. Govinda Rao pulls no punches in pointing out how and why decentralization hasn’t taken place in India (and likely will not). This is a conversation more people need to be having in India – and in particular, to aid meaningful urbanization.
    ..
    ..
  2. “I love this paper because it is ruthless. The authors know exactly what they are doing, and they are clearly enjoying every second of it. They explain that given what we now know about polygenicity, the highest-effect-size depression genes require samples of about 34,000 people to detect, and so any study with fewer than 34,000 people that says anything about specific genes is almost definitely a false positive; they go on to show that the median sample size for previous studies in this area was 345.”
    ..
    ..
    Slate Star Codex helps us understand the importance of learning (and applying!) statistics. The website is more than worth following, by the way.
    ..
    ..
  3. “Sucking the life out of a mango is one of those primal pleasures that makes life feel worthwhile. The process is both elaborate and rewarding. The foreplay that loosens up the pulp inside, the careful incision at the top that allows access without a juice overrun, and then the sustained act of sucking every bit juice from the helpless peel. Senses detach themselves from the body and attach themselves to the mango, and even mobile phones stop ringing. The world momentarily rests in our mouths as we slurp, suck and slaver at the rapidly disappearing pulp. The mango is manhandled vigorously till only the gutli remains which is scraped off till it has nothing left to confess. As is evident, there is no elegant way to eat this kind of mango, no delicate and dignified method that approximates any form of refinement, which is just as well, for the only way to enjoy a mango is messily.”
    ..
    ..
    An excellent column about an excellent fruit – there isn’t that much more to say! I completely agree with the bit about serving aamras front and center, rather than as an afterthought, by the way.
    ..
    ..
  4. “Welcome to the 4th Annual Top Economics Blogs list. For the 2019 edition, we’ve added many newcomers, as well as favorites which continue to provide quality insight year after year. Like lists in previous years (2018, 2017, 2016), the new 2019 list features a broad range of quality blogs in practically every economic discipline. Whether you are interested in general economics or prefer more specific topics such as finance, healthcare economics, or environmental economics; there is something here for you. You will also find blogs which focus on microeconomics, macroeconomics, and the economics of specific geographical regions. Whether you are a student, economics professional, or just someone with a general interest in how economic issues affect the world around you, you’re certain to find the perfect blog for your specific needs.”
    ..
    ..
    The most comprehensive answer to that most perennial of questions: what should I read?
    Bonus! If you’re wondering how to keep up with all of this, this might help.
    ..
    ..
  5. “India should do the same with our state capitals. The Union government can create fiscal and other incentives to encourage state governments to shift their capitals to brown- or green-field locations. Mumbai, Bengaluru, Hyderabad, Chennai, Jaipur or Lucknow, for instance, will continue to thrive even if the state government offices move out. Their respective states will benefit from a new urban engine powered by government.”
    ..
    ..
    I have been sceptical about the feasibility of doing something like this – my reading of urbanization has always been that it is more of an organic process – cities grow (or not) of their own accord, and rarely as a planned endeavor. But maybe I’m wrong?