Should students of law be taught statistics?

I teach statistics (and economics) for a living, so I suppose asking me this question is akin to asking a barber if you need a haircut.

But my personal incentives in this matter aside, I would argue that everybody alive today needs to learn statistics. Data about us is collected, stored, retrieved, combined with other data sources and then analyzed to reach conclusions about us, and at a pace that is now incomprehensible to most of us.

This is done by governments, and private businesses, and it is unlikely that we’re going to revert to a world where this is no longer the case. You and I may have different opinions about whether this is intrusive or not, desirable or not, good or not – but I would argue that this ship has sailed for the foreseeable future. We (and that’s all of us) are going to be analyzed, like it or not.

And conclusions are going to be made about us on the basis of that analysis, like it or not. This could be, for example, a computer in a company analyzing us as a high value customer and according us better service treatment when we call their call center. Or it could be a computer owned by a government that decides that we were at a particular place at a particular time on the basis of the footage from a security camera.

In both of these cases (and there are millions of other examples besides), there is no human being who makes these decisions about us. Machines do. This much is obvious, because it is now beyond the capacity of our species to deal manually with the amount of data that we generate on a daily basis. And so the machines have taken over. Again, you and I may differ on whether this is a good thing or a bad thing, but the fact is that it is a trend that is unlikely to be reversed in the foreseeable future.

Are the conclusions that these machines reach infallible in nature? Much like the humans that these machines have replaced, no. They are not infallible. They process information much faster than we humans can, so they are definitely better at handling much more data, but machines can make errors in classification, just like we can. Here, have fun understanding what this means in practice.

Say this website asks you to draw a sea turtle. And so you start to draw one. The machine “looks” at what you’ve drawn, and starts to “compare” it with its rather massive data bank of objects. It identifies, very quickly, those objects that seem somewhat similar in shape to those that you are drawing, and builds a probabilistic model in the process. And when it is “confident” enough that it is giving the right answer, it throws up a result. And as you will have discovered for yourself, it really is rather good at this game.

But is it infallible? That is, is it perfect every single time? Much like you (the artist) are not, so also with the machine. It is also not perfect. Errors will be made, but so long as they are not made very often, and so long as they aren’t major bloopers, we can live with the trade-off. That is, we give up control over decision making, and we gain the ability to analyze and reach conclusions about volumes of data that we cannot handle.

But what, exactly, does “very often” mean in the previous paragraph? One error in ten? One in a million? One in an impossibly-long-word-that-ends-in-illion? Who gets to decide, and on what basis?

What does the phrase “major blooper” mean in that same paragraph? What if a machine places you on the scene of a crime on the basis of security camera footage when you were in fact not there? What if that fact is used to convict you of a crime? If this major blooper occurs once in every impossibly-long-word-that-ends-in-illion times, is that ok? Is that an acceptable trade-off? Who gets to decide, and on what basis?

If you are a lawyer with a client who finds themselves in such a situation, how do you argue this case? If you are a judge listening to the arguments being made by this lawyer, how do you judge the merits of this case? If you are a legislator framing the laws that will help the judge arrive at a decision, how do you decide on the acceptable level of probabilities?
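To make the "how often" question a little more concrete, here is a minimal sketch of the arithmetic. The error rate and the number of comparisons are invented purely for illustration, and the calculation assumes that each comparison errs independently of the others, which real systems need not satisfy.

```python
def expected_false_matches(error_rate: float, comparisons: int) -> float:
    """Expected number of wrong matches if each comparison errs independently."""
    return error_rate * comparisons


def chance_of_at_least_one(error_rate: float, comparisons: int) -> float:
    """Probability that at least one comparison in the batch is wrong."""
    return 1 - (1 - error_rate) ** comparisons


# Hypothetical figures: a one-in-a-million error rate,
# applied to ten million camera-frame comparisons.
rate = 1e-6
checks = 10_000_000
print(expected_false_matches(rate, checks))
print(chance_of_at_least_one(rate, checks))
```

The point of the sketch is only that a rate that sounds tiny per comparison can still produce errors once the number of comparisons is large enough, which is why "one in a million" cannot be judged without asking "out of how many?".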

It needn’t be something as dramatic as a crime, of course. It could be a company deciding to downgrade your credit score, or a company that decides to shut off access to your own email, or a bank that decides that you are not qualified to get a loan, or any other situation that you could come up with yourself. Each of these decisions, and so many more besides, are being made by machines today, on the basis of probabilities.

Should members of the legal fraternity know the nuts and bolts of these models, and should we expect them to be experts in neural networks and the like? No, obviously not.

But should members of the legal fraternity know the principles of statistics, and have an understanding of the processes by which a probabilistic assessment is being made? I would argue that this should very much be the case.

But at the moment, to the best of my knowledge, this is not happening. Lawyers are not trained in statistics. I do not mean to pick on any one college or university in particular, and I am not reaching a conclusion on the basis of just one data point. A look at other universities’ websites, and conversations with friends and family who are practicing lawyers or are currently studying law, yield the same result. (If you know of a law school that does teach statistics, please do let me know. I would be very grateful.)

But because of whatever little I know about the field of statistics, and for the reasons I have outlined above, I argue that statistics should be taught to the students of law. It should be a part of the syllabus of law schools in this country, and the sooner this happens, the better it will be for us as a society.

The Economist on How To Compile an Index

I had blogged recently about a Tim Harford column. In that column, he had spoken about the controversy surrounding the Ease of Doing Business rankings, and ruminated about why the controversy was, in a sense, inevitable.

Alex Selby-Boothroyd, the head of data journalism at The Economist magazine, has a section in one of their newsletters titled “How to compile an index”:

In any ranking of our Daily charts, it is no small irony that some of the most viewed articles will be those that use indices to rank countries or cities. The cost-of-living index from the EIU that we published last week is a case in point. It was the most popular article on our website for much of the week. Readers came to find out not just which city was the world’s most expensive, but also where their own cities were placed. The popularity of such lists is unsurprising: most people take pride in where they live and want to see how it compares with other places, and there’s also a desire to “locate yourself within the data”. But how are these rankings created?

Source: Off The Charts Newsletter From The Economist

Alex makes the same point that Tim did in his column – rankings just tend to be more viral. What that says about us as a society is a genuinely interesting question, but we won’t go down that path today. We will learn instead the concepts behind the creation of an index.

There are, as the newsletter mentions, two different kinds of indices you want to think about. One is, relatively speaking, simpler to work with, because it is quantitative. Now, if you are just about beginning your journey into the dark arts of stats and math, you might struggle to wrap your head around the fact that making something quantitative makes it simpler. And trust me, I know the feeling. And I’ll get to why qualitative data is actually harder in just a couple of paragraphs.

But for the moment, let’s focus on the cost-of-living index that the excerpt was referring to:

EIU correspondents visit shops in 173 cities around the world and collect multiple prices for each item in a globally comparable basket of goods and services. These prices are averaged and weighted and then converted from local currency into US dollars at the prevailing exchange rate. The overall value is then indexed relative to New York City’s basket, the cost of which is set at 100.

Source: Off The Charts Newsletter From The Economist

Here are some questions you should be thinking about after having read the paragraph:

  • Why these 173 cities and no other? Has the list changed over time? Whether yes or no, why?
  • How does one decide upon a “globally comparable basket of goods and services”? No such list can ever be perfect, so how does one decide when it is “good enough”?
  • How are these prices averaged and weighted? Weighted by what?
  • Why does The Economist magazine not use the purchasing power parity adjusted exchange rate?
  • Why New York City’s basket? Why no other city?

I do not for a minute mean to suggest that these should be your only questions – see if you can come up with more, and try and bug your friends and stats professor with these questions. Even better, see if you can do this as an in-class exercise!
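The steps in the cost-of-living excerpt (average the collected prices, weight them, convert to US dollars, index against New York at 100) can be sketched in a few lines. This is only a sketch under assumptions: the items, weights, prices and exchange rate below are all invented, the second city is hypothetical, and the newsletter does not say how the EIU actually weights its basket.

```python
from statistics import mean


def basket_cost_usd(price_quotes, weights, usd_per_local_unit):
    """Average the quotes for each item, apply weights, convert to US dollars."""
    cost_local = sum(
        weights[item] * mean(quotes) for item, quotes in price_quotes.items()
    )
    return cost_local * usd_per_local_unit


def index_against_base(costs_usd, base_city):
    """Rescale basket costs so that the base city's basket is set at 100."""
    base_cost = costs_usd[base_city]
    return {city: 100 * cost / base_cost for city, cost in costs_usd.items()}


# Hypothetical basket: weights are assumed quantities, all prices are invented.
weights = {"bread": 2, "bus_fare": 10}
costs = {
    "New York": basket_cost_usd(
        {"bread": [4.0, 6.0], "bus_fare": [3.0, 3.0]}, weights, 1.0
    ),
    "Hypothetical City": basket_cost_usd(
        {"bread": [40.0, 60.0], "bus_fare": [20.0, 20.0]}, weights, 0.0125
    ),
}
index = index_against_base(costs, "New York")
print(index)
```

Try changing the weights or the exchange rate and see how far the second city's index moves. That sensitivity is exactly why the questions above about weights and exchange rates matter.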

Not all indices are so straightforward. Sometimes they are used to measure something more subjective. The EIU has another index that ranks cities by the quality of life they provide. For this, in-country experts assess more than 30 indicators such as the prevalence of petty crime, the quality of public transport or the discomfort of the climate for travellers. Each indicator is assigned a qualitative score: acceptable, tolerable, uncomfortable, undesirable or intolerable. These words are assigned a numerical value and a ranking begins to emerge. The scoring system is fine-tuned by giving different weightings to each category (the EIU weights the “stability” indicators slightly higher than the “infrastructure” questions, for example). Further tweaking of the weights might be required, such as when the availability of health care becomes more important during a pandemic.

Source: Off The Charts Newsletter From The Economist

You see why qualitative data is more problematic? Just who, exactly, are in-country experts? Experts on what basis? As decided by whom?

I should be clear – this is in no way a criticism of the methodology used by The Economist. In fact, in the very next paragraph, the newsletter explains the problems with a qualitative index. And in much the same vein, I am simply trying to explain to you why a qualitative index is so problematic, regardless of who tries to build one.

But the problem is a real one! Expertise in matters such as these is all but impossible to assess accurately, and the inherent biases of these experts are also going to get baked into these assessments. And not just biases, their moods and state of mind are also going to be baked into these assessments. Again, this is not a criticism, it is inevitable.

And the biggest problem of them all: the subjectivity of not the experts, but rather the scale itself!

Qualitative rankings are built on subjective measures. Perhaps “tolerable” means almost the same to someone as “uncomfortable”—whereas “intolerable” might feel twice as bad as “undesirable”? On ordinal scales the distance between these measures is subjective—and yet they have to be assigned a numerical score for the ranking to work.

Source: Off The Charts Newsletter From The Economist

Statistical analysis of qualitative data is problematic, and I cannot begin to tell you how often statistical tools are misapplied in this regard. If you are learning statistics for the first time, take it from me: spend hours understanding the nature of the data you are working with. It will save you hours of rework later.
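Here is a small sketch of why the choice of numbers for an ordinal scale matters. The two cities, their ratings and both scoring schemes are invented for illustration. The only thing borrowed from the excerpt is the five labels.

```python
# Two ways of turning the same five ordinal labels into numbers.
# Both schemes are assumptions made up for this example.
EVENLY_SPACED = {
    "acceptable": 5, "tolerable": 4, "uncomfortable": 3,
    "undesirable": 2, "intolerable": 1,
}
UNEVENLY_SPACED = {
    "acceptable": 5, "tolerable": 4.5, "uncomfortable": 4,
    "undesirable": 2, "intolerable": 0,
}

# Hypothetical expert ratings on three indicators per city.
ratings = {
    "City A": ["acceptable", "acceptable", "undesirable"],
    "City B": ["tolerable", "tolerable", "uncomfortable"],
}


def rank_cities(ratings, scale):
    """Return city names ordered from highest to lowest total score."""
    totals = {
        city: sum(scale[label] for label in labels)
        for city, labels in ratings.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


print(rank_cities(ratings, EVENLY_SPACED))
print(rank_cities(ratings, UNEVENLY_SPACED))
```

The experts' words are identical in both runs. Only the distance between the labels changes, and that alone is enough to swap the order of the two cities.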

And finally, have fun exploring some of The Economist’s own indices (if these happen to be behind a paywall, my apologies!):

On the Etymology of Risk

I often like to begin classes on statistics by talking about the etymology of the word average, and it is such a lovely story:

Everybody associated with transporting goods by sea had to deal with the chance that only a part of the consignment would actually reach the intended destination. There was always the chance that a part of the consignment would go bad, or needed to be jettisoned, or some such. Who bears the loss of this part of the total consignment? Should it be the sending merchant, the receiving merchant or should it be the captain of the ship?

Thus, when for the safety of a ship in distress any destruction of property is incurred, either by cutting away the masts, throwing goods overboard, or in other ways, all persons who have goods on board or property in the ship (or the insurers) contribute to the loss according to their average, that is, according to the proportionate value of the goods of each on board. [Century Dictionary]

The latter half of that excerpt above is nothing but “sigma x by n” – the total losses divided by the number of people involved. This, of course, is nothing but the formula for average. The word itself comes from the Arabic word for loss – awargi, or awariya. Or as I like to tell my students, you’re really speaking Arabic when you’re saying “average”.

There is a lovely essay on both the etymology of, and the emergence of the concept of, risk. Authored by Karla Mallette, it is a rumination on both the meaning of the word, and how it has evolved over time.

The first known usage of the Latin word resicum – cognate and distant ancestor of the English risk – occurs in a notary contract recorded in Genoa on 26 April 1156. The captain of a ship contracts with an investor to travel to Valencia with the sum invested. The contract allocates the ‘resicum’ to the investor.

This is entirely speculative on my part, because I know next to nothing about Latin, but a simple Google search for the meaning/etymology of resicum tells me that it means “that which cuts, rock, crag”. If one agrees with the notion that ship voyages at the time must have been fraught with risk, then the etymology of risk begins to make eminent sense – the entirety of the prospective profit from such a voyage can end up being cut down to zero. One could earn all of it, or one could get none of it – that, of course, is the risk involved in such a structure.

The essay remains of interest beyond just this point:

Before the innovation of the resicum, captain and crew took on the risks of the journey alone: only they would shoulder the burdens (and pocket the profits). But resicum shared out potential profit and loss among a broader community. It put a number on contingency, and in so doing it rationalised risk.

In this context, one needs to realize that the author is talking not about the original meaning of the word resicum. Rather, she is implying that resicum has a modern, institutional meaning now – the idea that resicum (or risk) is being diversified. The captain doesn’t bear the risk alone, although he does bear part of it (typically 25% in those times). Somewhat analogous to what we could call sweat equity these days, I suppose. The rest of the risk, or resicum, is parcelled out to investors who are willing to stump up the cost of the voyage. If the captain comes back empty handed, they lose their investment. If the captain comes back from the voyage, his ship laden with precious cargo, then the investors reap the benefits of having funded the voyage.

This arrangement was called resicum, and it seems to have meant an arrangement which had the ability (but not the guarantee) to provide sustenance.

Historians believe that resicum derived from an Arabic word, al-rizq. The Arabic rizq is Quranic. It refers to God’s provision for creation. This verse, for instance, uses the noun and a verb derived from the same lexical root, and refers to the sustenance that God provides for all of creation: ‘And how many a creature does not carry its own provision [rizq]! God provides for them and for you: he is the All-Hearing, the All-Knowing.’ During the Middle Ages, the word was used to name the daily subsistence pay given to soldiers. In the dialect of al-Andalus (Arab Spain), it referred to chance or good fortune. Rizq, it seems, bounced from port to port around the Mediterranean, until it landed on the worktable of a scribe in Genoa recording a strategy used to share out the risk of trans-Mediterranean trading ventures by betting against catastrophe.

So from providing for, to meaning good fortune, to our modern understanding of the word risk, the word has been on quite a journey, and is in fact a good way to understand all of what risk means.

A little postscript: I came across this article via The Browser. And second, if you haven’t read it, Against the Gods: The Remarkable Story of Risk by Peter Bernstein is a good introductory book to read about the topic.

Bibek Debroy on loopholes in the CPC

That’s the Civil Procedure Code.

The average person will not have heard of Dipali Biswas or Nirmalendu Mukherjee and may not be aware of the case decided by the Supreme Court on October 5, 2021. The case was decided by a division bench, consisting of Hemant Gupta and V Ramasubramanian and the judgment was authored by Justice V Ramasubramanian. Justice Ramasubramanian observed (not part of the judgment), “Not to be put off by repeated failures, the appellants herein, like the tireless Vikramaditya, who made repeated attempts to capture Betal, started the present round and hopefully the final round.” Other than smiling about a case that took 50 years to be resolved and making wisecracks about “tareekh pe tareekh”, shouldn’t we be concerned about rules and procedures (all in the name of natural justice) that permit a travesty of justice?

I know (alas) next to nothing about the law, but there were two excerpts in this article that I wanted to highlight as a student of statistics and economics. We’ll go with statistics first.

Whenever I start to teach a new course, I always tell my students that there are two kinds of errors I can make. I can either make sure that I complete the syllabus, irrespective of whether everybody has understood it or not. Or I can make sure that everybody has understood whatever I have taught, irrespective of whether the syllabus is completed or not. Speed versus thoroughness, if you will – and both cannot be optimized for at the same time. If you’re wondering, I prefer to err on the side of making sure everybody has understood, even if it comes at the cost of an incomplete syllabus.

This is, of course, closely related to formulating the null hypothesis and asking which type of error one would rather avoid. And the reason I bring it up is because of this excerpt:

Innumerable judgments have quoted the maxim, “justice hurried is justice buried”. By the same token, justice tarried is also justice buried and inordinate delays mean the legal system doesn’t provide adequate deterrence to mala fide action. In my view, for most civil cases, the moment issues are framed, one can predict the outcome within a range, with a reasonable degree of certainty. (Obviously, I don’t mean constitutional cases before the Supreme Court.) With no disrespect to the legal system, I think AI (artificial intelligence) is capable of delivering judgments in such cases, freeing court time for non-trivial cases.

“Justice hurried is justice buried” and “Justice tarried is justice buried” are both problems, and optimizing for one means not optimizing for the other. What Bibek Debroy is saying here is that we have ended up choosing to optimize for the former. We make sure that every case has the opportunity to be heard at great length, and with sufficient maneuvering room for both parties.

And that’s great, but the opportunity cost is the fact that sometimes judgments can take over fifty years (and counting!).

And what is Bibek Debroy’s solution? When he suggests that AI is capable of delivering judgments in such cases, he is not saying that the AI will give a perfect judgment every time. He is not even saying that one should use AI (I think the point is rhetorical, although of course I could be wrong). He is saying that the gains in efficiency are worth the occasional case being incorrectly judged. In other words, he is optimizing for justice tarried is also justice buried – he would rather avoid the error of taking up too much time for each case, and would (presumably) be fine paying the price of having the occasional case being misjudged.

It is up to you to agree or disagree with him, or with me when it comes to how I conduct classes. But all of us should be cognizant of the opportunity costs when we decide which error we’d rather avoid!
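The "which error would you rather avoid" trade-off can be seen in a toy example. The scores and outcomes below are made up, and the threshold rule is just one simple way of turning a score into a decision. It is not a description of how any court, or any AI system, actually works.

```python
# Hypothetical (score, actually_true) pairs. Not real data.
cases = [
    (0.10, False), (0.20, False), (0.35, False), (0.40, True),
    (0.55, False), (0.60, True), (0.80, True), (0.90, True),
]


def error_counts(cases, threshold):
    """Count both kinds of error when we say 'yes' to scores >= threshold."""
    false_positives = sum(
        1 for score, truth in cases if score >= threshold and not truth
    )
    false_negatives = sum(
        1 for score, truth in cases if score < threshold and truth
    )
    return false_positives, false_negatives


for threshold in (0.3, 0.5, 0.7):
    print(threshold, error_counts(cases, threshold))
```

As the threshold goes up, one kind of error shrinks and the other grows. With this data there is no threshold that drives both to zero, so somebody has to decide which error costs more.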

And economics second:

Litigants and lawyers (at least on one side of a civil case) have no incentive to finish a case fast (Does the judiciary have it?).

This is more of a question (or rumination) on my part – what are the incentives of the judiciary? I can imagine scenarios in which those “on one side of a civil case” can use both official rules and underhanded stratagems to delay the eventual judgment. And since there is no incentivization in terms of speedier resolutions, are we just left with a system that is geared towards moving along ponderously forever more?

And if so, how might this be changed for the better? This is, and I’m not joking, (more than) a trillion dollar question.

And finally, as a bonus, culture:

My friend Murali Neelakantan makes the point here that this isn’t really about incentive design at all, and that the problem is more rooted in how we, the people of India, use and abuse the provisions of the CPC.

That takes me into even deeper and ever more unfamiliar waters, so I shall think more about this before trying to write about it!

The Data and The Narrative

This week is Back to College at the Gokhale Institute. A podcast that I started a couple of years ago has become a tradition of sorts at the start of each semester at the BSc programme.

For about a week, we have people come and speak to us. All of them answer a simple question in a variety of ways. And that question is this: what would you do differently if you got the chance to go back to college? It’s a simple question, and can be answered in myriad ways. Here are some of the past talks, if you’re interested.

There’s one theme that has come up in all of the talks so far, and often enough for me to want to emphasize it further. All of the speakers have spoken about the importance of doing the analysis, but also having the ability to build a story around it. Most folks are perhaps good at one, but not the other, and rarely both.

As economists, almost all of the speakers have said, we nowadays have the ability to build models and run regressions. Building out a more sophisticated model, tweaking it, refining it, is either already possible, or can be learnt relatively easily. But where we lose out, as young economists entering the workforce, is in our ability to explain what we’ve done.

I often say in my classes on statistics that the most underrated skill that a statistician possesses is the English language. I usually get confused laughter by way of response, but I am, of course, getting at much the same point. Unless you have the ability to explain what your model implies for the business problem at hand, you haven’t really done your work. And when I say explain, I mean using the English language.

Each of our speakers for the week so far has made the same point in their own way. Technical ability is table stakes. The differentiator is the ability to expand on what you’ve done, in a way that resonates with the listener. And resonance means the ability to tell a story about how what you’ve done is A Good Thing For The Business.

There are many other lessons to have come out of this week’s talks, and more, I’m sure, to come. But this is worth internalizing and working upon for all of us (myself included): it’s about the analysis and the narrative.

JEP, p-values and tests of statistical significance

The Summer 2021 issue of the Journal of Economic Perspectives came out recently:

I have been the Managing Editor of the Journal of Economic Perspectives since the first issue in Summer 1987. The JEP is published by the American Economic Association, which decided about a decade ago–to my delight–that the journal would be freely available on-line, from the current issue all the way back to the first issue. You can download individual articles or the entire issue, and it is available in various e-reader formats, too. Here, I’ll start with the Table of Contents for the just-released Summer 2021 issue, which in the Taylor household is known as issue #137.

(JEP is a great journal to read as a student. If you’re looking for a good place to start, may I recommend the Anomalies column?)

Of particular interest this time around is the section on statistical significance. This paper, in particular, was an enjoyable read.

And reading that paper reminded me of a really old blogpost written by an ex-colleague of mine:

The author starts off by emphasizing the importance of developing a statistical toolbox. Indeed statistics is a rich subject that can be enjoyed by thinking through a given problem and applying the right kind of tools to get a deeper understanding of the problem. One should approach statistics with a bike mechanic mindset. A bike mechanic is not addicted to one tool. He constantly keeps shuffling his tool box by adding new tools or cleaning up old tools or throwing away useless tools etc. Far from this mindset, the statistics education system imparts a formula oriented thinking amongst many students. Instead of developing a statistical or probabilistic thinking in a student, most of the courses focus on a few formulae and teach them null hypothesis testing.

If you are a student of statistics, and think that you “get” statistics, please read the post in its entirety. Don’t worry if you get confused – that is, in a way, the point of that post. It challenges you by asking a very simple question: do you really “get” statistics? And the answer is almost always in the negative (and that goes for me too!)

And my final recommendation du jour is this (extremely passionately) written tirade:

We want to persuade you of one claim: that William Sealy Gosset (1876-1937)—aka “Student” of “Student’s” t-test—was right, and that his difficult friend, Ronald A. Fisher (1890-1962), though a genius, was wrong. Fit is not the same thing as importance. Statistical significance is not the same thing as scientific importance or economic sense. But the mistaken equation is made, we find, in 8 or 9 of every 10 articles appearing in the leading journals of science, economics to medicine. The history of this “standard error” of science involves varied characters and plot twists, but especially R. A. Fisher’s canonical translation of “Student’s” t. William S. Gosset aka “Student,” who was for most of his life Head Experimental Brewer at Guinness, took an economic approach to the logic of uncertainty. Against Gosset’s wishes his friend Fisher erased the consciously economic element, Gosset’s “real error.” We want to bring it back.

Although it might help to read this review first:

However, thanks to an arbitrary threshold set by statistics pioneer R.A. Fisher, the term ‘significance’ is typically reserved for P values smaller than 0.05. Ziliak and McCloskey, both economists, promote a cost-benefit approach instead, arguing that decision thresholds should be set by considering the consequences of wrong decisions. A finding with a large P value might be worth acting upon if the effect would be genuinely clinically important and if the consequences of failing to act could be serious.
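To see why statistical significance and importance are different things, consider this rough sketch. It uses a one-sample z-test with a known standard deviation, which is a simplification chosen here for brevity and not a method taken from the paper or the review. The effect sizes and sample sizes are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(effect, std_dev, n):
    """p-value of a z-test of 'no effect', assuming a known standard deviation."""
    z = effect / (std_dev / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A tiny effect measured on a very large sample...
tiny_effect_p = two_sided_p_value(effect=0.01, std_dev=1.0, n=100_000)

# ...versus an effect fifty times larger measured on a small sample.
large_effect_p = two_sided_p_value(effect=0.5, std_dev=1.0, n=10)

print(tiny_effect_p < 0.05, large_effect_p < 0.05)
```

The tiny effect clears the 0.05 bar and the large one does not. The p-value is telling you about the sample size at least as much as it is telling you about whether the effect is worth caring about.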

Statistics is a surprisingly, delightfully conceptual subject, and I’m still peeling away at the layers. Every year I think I understand it a little bit more, and every year I discover that there is much more to learn. The symposium on statistical significance in this summer’s issue of the JEP, RK’s blogpost and Deirdre McCloskey’s paper are good places to get started on unlearning what you’ve been taught in stats.

On Confidence Intervals

As with practically every other Indian household, so with mine. Trudging back home after having written the math exam was never much fun.

It wasn’t fun because most of your answers wouldn’t tally with those of your friends. But it wasn’t fun most of all because you knew the conversation that waited for you at home. Damocles had it easy in comparison.

“How was the exam?”, would be the opening gambit from the other side.

And because Indian kids had very little choice but to become experts at this version of chess very early on in life, we all knew what the safest response was.

“Not bad”.

Safe, you see. Non-committal, and just the right balance of being responsive without encouraging further questioning.

It never worked, of course, because there always were follow-up questions.

“So how much do you think you’ll get?”

There are, as any kid will tell you, two possible responses to this. One brings with it temporary relief, but payback can be hellish come the day of the results. This is the Blithely Confident™ method.

“Oh, it was awesome! I’ll easily get over 90!”

The other response involves a more difficult conversation at the present juncture, but as any experienced negotiator will tell you, expectations setting is key in the long run.

“Not sure, really.”

Inwardly, you’re praying for a phone call, a doorbell ring, the appearance of a lizard in the kitchen – anything, really, that will serve as a distraction. Alas, miracles occur all too rarely in real life.

“Well, ok”, the pater would say, “Give me a range, at least.”

We’ve all heard the joke where the kid goes “I’ll definitely get somewhere between 0 and 100!”.

Young readers, a word of advice: this never works in real life. Don’t try it, trust me.

But joke apart, there was a grain of truth in that statement. That was the range that I (and every other student) was most comfortable with.

Or, in the language of the statistician, the wider the confidence interval, the more confident you ought to be that the parameter will lie within it.1

What range should one go with? 0-100 is out unless you happen to like a stinging sensation on your cheek.

You’re reasonably confident that you’ll pass – it wasn’t that bad a paper. And if you’re lucky, and if your teacher is feeling benevolent, you might even inch up to 80. So, maybe 40-80?

“I’ll definitely pass, and if I’m lucky, could get around 60 or so”, you venture.

“Hmmm,” the pater goes, ever the contemplative thinker. “So around 60, you’re saying?”

“Well yeah, around that”, you say, hoping against hope that this conversation is approaching the home stretch now.

“Around could mean anything!”, is the response. “Between 50 and 70, or between 40 and 80?! Which is it?!”

And that, my friends, is the intuition behind confidence intervals. Your parents are optimizing for precise estimates (a narrower range), and you want to tell them that sure, you can have a narrower range – but the price they must pay is lesser confidence on your part.

And if they say, well, no, we want you to be more confident about your answer, you want to tell them that sure, I can be more confident – but the price they must pay is lower precision (a broader range).

And sorry, you can’t have both.

(Weird how parents get to say that all the time, but children, never!)

But be careful! This little story helps you get the intuition only. The truth is a little more subtle, alas:

The confidence interval can be expressed in terms of samples (or repeated samples): “Were this procedure to be repeated on numerous samples, the fraction of calculated confidence intervals (which would differ for each sample) that encompass the true population parameter would tend toward 90%.”

Or, in the case of our little story, this is what an Indian kid could tell their parents:

Were I to give the math exam a hundred times over, I would score somewhere between 50 and 70 about ninety times. And I would score between 40 and 80 about 95 times.

Now, if you ask where we get those specific sets of numbers from ([50-70, 90], [40-80, 95]), that takes us into the world of computation and calculation. Time to whip out the textbook and the calculator.
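If you would like to see the repeated-samples idea in action before opening the textbook, here is a minimal simulation sketch in Python. Everything in it is an assumption made purely for illustration and is not part of the story: a true average of 60, a spread of 10, samples of 25 marks at a time, and a simple z-interval that pretends the spread is known. The function name `coverage` is made up as well.

```python
import random
from statistics import NormalDist, mean


def coverage(level, true_mean=60.0, sigma=10.0, n=25, trials=2000, seed=0):
    """Simulate many samples and report how often the interval catches true_mean.

    Returns (fraction of intervals containing true_mean, half-width of each interval).
    All default numbers are illustrative assumptions.
    """
    rng = random.Random(seed)
    # z-value for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials, half_width


cov90, width90 = coverage(0.90)
cov95, width95 = coverage(0.95)
print(f"90% interval: half-width {width90:.2f}, caught the truth {cov90:.1%} of the time")
print(f"95% interval: half-width {width95:.2f}, caught the truth {cov95:.1%} of the time")
```

On a run like this, the 95% intervals should come out wider than the 90% ones and should also catch the true average more often, which is exactly the trade-off from the story.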

But if you are clear about why broader intervals imply higher confidence, and narrow intervals imply lower confidence, then you are now comfortable about the intuition.

And I hope you are clear, because that was my attempt in this blogpost.

Kids, trust me. Never try this at home.

But please, do read the Wikipedia article.

  1. Statisticians reading this, I know, I know. Let it slide for the moment. Please.

Probability, Expected Value…

… in No Country For Old Men

No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.

Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

And just this morning, I chanced upon this:

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

  1. If there is correlation, there must be causation.

    I think we can all agree that this is not true.
  2. If there is causation, there must be correlation.

    That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.
  3. If there is no correlation, there can be no causation

    That is what Rachael Meager is saying the book is saying. I spent a fair bit of time trying to understand if this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:
    If B exists, A must exist. (B is causation, A is correlation – this is just 2. above)
    If we can show that A doesn’t exist, are we guaranteed the non-existence of B?
    And having thought about it, I think it to be true. 3. is the same as 2. [1]
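For what it is worth, the question in 3. can be checked by brute force, since there are only four combinations of causation being present or absent and correlation being present or absent. Here is a small sketch in Python. The helper name `implies` is made up, and treating the two statements as plain true/false propositions is a simplifying assumption.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


rows = []
for causation, correlation in product([False, True], repeat=2):
    # Statement 2: if there is causation, there must be correlation.
    statement_2 = implies(causation, correlation)
    # Statement 3: if there is no correlation, there can be no causation.
    statement_3 = implies(not correlation, not causation)
    rows.append((causation, correlation, statement_2, statement_3))

# The two statements are equivalent if they agree in every one of the four cases.
equivalent = all(s2 == s3 for _, _, s2, s3 in rows)
print(equivalent)
```

The two statements agree in all four cases. Statement 3. is what logicians call the contrapositive of statement 2., and a statement and its contrapositive are always true or false together.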

Rachael Meager then provides this example as support for her argument:

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the physics and acceleration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. That is causally linked, and therefore there is (must be!) correlation. [2]

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.
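Here is a toy simulation of that reasoning, as a sketch only. The numbers are invented, the formula for speed is a made-up linear model and not real physics, and the driver is assumed to match the gas to the slope perfectly. On flat ground the gas varies freely; in the hills the driver presses harder exactly as the road gets steeper.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)


def speed(gas, slope):
    # Hypothetical model: gas pushes speed up, slope pulls it down, plus a little noise.
    return 50 + 10 * gas - 10 * slope + rng.gauss(0, 1)


# Flat road: slope is held at zero (ceteris paribus), gas varies freely.
gas_flat = [rng.uniform(0, 3) for _ in range(5000)]
speed_flat = [speed(g, 0.0) for g in gas_flat]

# Hilly road: the driver presses the gas exactly as hard as the slope demands.
slopes = [rng.uniform(0, 3) for _ in range(5000)]
gas_hilly = list(slopes)
speed_hilly = [speed(g, s) for g, s in zip(gas_hilly, slopes)]

r_flat = pearson(gas_flat, speed_flat)
r_hilly = pearson(gas_hilly, speed_hilly)
r_gas_slope = pearson(gas_hilly, slopes)
print(f"flat road,  gas vs speed: {r_flat:.2f}")
print(f"hilly road, gas vs speed: {r_hilly:.2f}")
print(f"hilly road, gas vs slope: {r_gas_slope:.2f}")
```

In this toy world, gas and speed move together strongly on the flat road, hardly at all in the hills, and gas tracks slope closely in the hills. The causal effect of gas on speed is the same in both settings; what changed is that the other things stopped being held constant.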


But even if my argument above turns out to be correct, I still was wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.
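A standard textbook illustration of the difference (not necessarily the example used in the tweets) goes like this: let y be completely determined by x through y = x², and let x be spread symmetrically around zero. Correlation in the usual (Pearson) sense only measures linear association, so it comes out as zero even though x fully determines y. A minimal sketch in Python, with made-up numbers and a hand-rolled `pearson` helper:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# x is symmetric around zero; y is caused entirely by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson(xs, ys)
print(f"Pearson correlation between x and x squared: {r:.2f}")
```

So x and y are clearly associated here (know x and you know y exactly), yet the linear correlation between them is zero. That is the gap between “association” and “correlation”.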

Ben Golub made the same argument (I think) that I did:

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:

Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

  1. Correlation doesn’t imply causation
  2. I still think that if there is causation, there must be association (I originally wrote “correlation”). But that being said, I should be pushing The Mixtape to the top of the list.
  3. Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

  1. Anybody who has studied logic, please let me know if I am correct!
  2. Association, really. See below.

Correlation, Causation and Thinking Things Through

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.

Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.

For example, this one chart from this magnificent website (and please, do take a look at all the charts):

But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

OK, you might be saying right now. So what?

Well, how about using this to figure out what ingredients were being used to make nuclear bombs? Say the government would like to keep the recipe (and the ingredients) for the nuclear bomb a secret. But what if you decide to take a look at the stock market data? What if you try to see if there is an increase in the stock price of firms that make the ingredients likely to be used in a nuclear bomb?

If the stuff that your firm produces (call this x) is in high demand, your firm’s stock price will go up (call this y). If y has gone up, it (almost certainly) will be because of x going up. So if I can check if y has gone up, I can assume that x will be up, and hey, I can figure out the ingredients for a nuclear bomb.

Sounds outlandish? Try this on for size:

Realizing that positive developments in the testing and mass production of the two-stage thermonuclear (hydrogen) bomb would boost future cash flows and thus market capitalizations of the relevant companies, Alchian used stock prices of publicly traded industrial corporations to infer the secret fuel component in the device in a paper titled “The Stock Market Speaks.” Alchian (2000) relates the story in an interview:
We knew they were developing this H-bomb, but we wanted to know, what’s in it? What’s the fissile material? Well there’s thorium, thallium, beryllium, and something else, and we asked Herman Kahn and he said, ‘Can’t tell you’… I said, ‘I’ll find out’, so I went down to the RAND library and had them get for me the US Government’s Dept. of Commerce Yearbook which has items on every industry by product, so I went through and looked up thorium, who makes it, looked up beryllium, who makes it, looked them all up, took me about 10 minutes to do it, and got them. There were about five companies, five of these things, and then I called Dean Witter… they had the names of the companies also making these things, ‘Look up for me the price of these companies…’ and here were these four or five stocks going like this, and then about, I think it was September, this was now around October, one of them started to go like that, from $2 to around $10, the rest were going like this, so I thought ‘Well, that’s interesting’… I wrote it up and distributed it around the social science group the next day. I got a phone call from the head of RAND calling me in, nice guy, knew him well, he said ‘Armen, we’ve got to suppress this’… I said ‘Yes, sir’, and I took it and put it away, and that was the first event study. Anyway, it made my reputation among a lot of the engineers at RAND.

I learnt about this while reading Navin Kabra’s Twitter round-up from yesterday. Navin also mentions the discovery of Neptune using the same underlying principle, and then asks this question:

Do you know other, more recent examples of people deducing important information by guessing from correlated data?

… and I was reminded of this tweet:

Whether it is Neptune, the nuclear bomb or the under-reporting of Covid deaths, the lesson for you as a student of economics is this: when you marry the ability to connect the dots with the ability to understand and apply statistics, truly remarkable things can happen.

Of course, the reverse is equally true, and perhaps even more important. When you marry the ability to connect the dots with a misplaced ability to understand and apply statistics, truly horrific things can happen.

Tread carefully when it comes to statistics!