Team “Kam Nahi Padna Chahiye”

Every time we host a party at our home, we engage in a brief and spirited… let’s go with the word “discussion”.

Said discussion is not about what is going to be on the menu – we usually find ourselves in agreement about this aspect. It is, instead, about the quantity.

In every household around the world, I suppose, this discussion plays out every time there’s a party. One side of the debate will worry about how to fit in the leftovers in the refrigerator the next day, while the other will fret about – the horror! – there not being enough food on the table midway through a meal.

There is, I should mention, no “right” answer over here. Each side makes valid arguments, and each side has logic going for it. Now, me, personally, I quite like the idea of leftovers, because what can possibly be better than waking up at 3 in the morning for no good reason, waddling over to the fridge, and getting a big fat meaty slice of whatever one may find in there? But having been a part of running a household for a decade and change, I know the challenges that leftovers can pose in terms of storage.

You might by now be wondering about where I am going with this, but asking yourself which side of the debate you fall upon when it comes to this specific issue is also a good way to understand why formulating the null hypothesis can be so very challenging.

Let’s assume that there are going to be four adults and two kids at a party.

How many chapatis should be made?

Should the null hypothesis be: We will eat exactly 16 chapatis tonight

With the alternate then being: 16 chapatis will either be too much or too little

Or should the null hypothesis be: We will eat 20 chapatis or more

With the alternate being: We will definitely eat less than 20 chapatis tonight.

The reason we end up having a “discussion” is that we can’t agree on which outcome we would rather avoid: that of potentially being embarrassed as hosts, or the one of standing, arms exasperatedly akimbo, in front of the refrigerator post-party.

It is the outcome we would rather avoid that guides us in our formation of the null hypothesis, in other words. We give it every chance to be true, and if we reject it, it is because we are almost entirely confident that we are right in rejecting it.

What is “almost entirely”?

That is the point of the “significant at 1%” or “5%” or “10%” sentence in academic papers.

Which, of course, is another way to think about it. This set of the null and the alternate…

H0: We will eat 20 chapatis or more

Ha: We will eat less than 20 chapatis

… I am not ok rejecting the null at even 1%. Or in the language of statistics, I am not ok with committing a Type I error, even at a significance level of 1%.

A Type I error is rejecting the null when it is true. So even a 1% chance that we and our guests would have wanted to eat more than 20 chapatis* to me means that we should get more than 20 chapatis made.
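For students who like to see the machinery, here is a minimal sketch of how the "significant at 1%, 5% or 10%" decision plays out. It rests on assumptions that are mine, not part of the story above: the null is read as a statement about average consumption (H0: mean of 20 or more), the four past-party counts are made up, the spread (`sigma`) is assumed known, and a one-sided z-test is used. The function name `one_sided_p_value` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sided_p_value(sample, mu0, sigma):
    """p-value for H0: mu >= mu0 against Ha: mu < mu0 (z-test, sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # A sample mean far below mu0 gives a very negative z and a small p-value.
    return NormalDist().cdf(z)


# Made-up chapati counts from four past parties; not real data.
past_parties = [17, 19, 18, 18]
p_value = one_sided_p_value(past_parties, mu0=20, sigma=2.0)

# The same evidence leads to different decisions at different significance levels.
decisions = {
    alpha: ("reject H0" if p_value < alpha else "do not reject H0")
    for alpha in (0.10, 0.05, 0.01)
}

print(f"p-value: {p_value:.4f}")
for alpha, decision in decisions.items():
    print(f"significance level {alpha:.0%}: {decision}")
```

With these invented numbers the null is rejected at 10% and 5% but not at 1%, which is the sense in which a stricter significance level gives the null "every chance to be true".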

At this point in our discussions (we’re both economists, so these discussions really do take place at our home), my wife exasperatedly points out that not once has the food actually fallen short.

Ah, I say, triumphantly. Can you guarantee that it won’t this time around? 100% guarantee?

No? So you’re saying there’s a teeny-tiny 1% chance that we’ll have too few chapatis?

Well, then.


Kam nahi padna chahiye!

*Don’t judge us, ok. Sometimes the curry just is that good.

What do Income Tax Returns, Demonetization, and Fast Tag have in common?

It may help to read last Thursday’s post before you start reading this one.

Why are there such long lines at all the toll plazas across India at the moment? You may give a lot of answers, and if you have recently passed through a toll plaza yourself, your answer may well be unprintable.

Here’s mine though: you are, currently, assumed guilty until proven innocent.

All cars must wait in line, pay cash/have the RFID tag scanned, and for each car, once the payment is done, the barrier is raised, and you may pass through. The barrier stays put until the verification is done: that’s another way of saying guilty until proven innocent.

But the cool thing, to me, about implementing Fast Tag, is that once a certain percentage of vehicles in India is equipped with Fast Tags, the barriers can stay up. We will transition to a regime in which all vehicles are assumed to be innocent.

Now, as we learnt the previous week, with a large sample, there will be problems. In the new system, in which vehicles just pass through because we assume all of them have Fast Tag implemented, there will be exceptions. There will be vehicles that don’t, in fact, have Fast Tag implemented, and so they may end up not paying the toll.

But the vast majority will have Fast Tag, and they will no longer have to pay in waiting time on top of the toll. The government will miss out on catching a few bad apples, but a lot of Indians will save a lot of time. On balance, everybody wins.
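The "on balance" claim is really a back-of-the-envelope comparison, and it can be sketched in a few lines. Every number below is hypothetical (traffic, share of vehicles without a tag, toll amount, minutes saved), and the function `barrier_up_tradeoff` is just an illustrative name. The sketch also assumes, for simplicity, that every vehicle saves the same waiting time once the barrier stays up.

```python
def barrier_up_tradeoff(vehicles_per_day, share_without_tag, toll_rupees,
                        minutes_saved_per_vehicle):
    """Compare toll revenue missed with waiting time saved at one plaza per day."""
    # Worst case: every vehicle without a tag gets through without paying.
    missed_toll = vehicles_per_day * share_without_tag * toll_rupees
    # Assumption: each vehicle saves the same number of minutes.
    hours_saved = vehicles_per_day * minutes_saved_per_vehicle / 60
    return missed_toll, hours_saved


# All inputs are invented for illustration.
missed_toll, hours_saved = barrier_up_tradeoff(
    vehicles_per_day=20_000,
    share_without_tag=0.02,
    toll_rupees=100,
    minutes_saved_per_vehicle=6,
)

# Value of an hour at which the time saved just offsets the missed toll.
break_even_rupees_per_hour = missed_toll / hours_saved

print(f"missed toll per day: Rs. {missed_toll:,.0f}")
print(f"hours saved per day: {hours_saved:,.0f}")
print(f"break-even value of an hour: Rs. {break_even_rupees_per_hour:.0f}")
```

Under these made-up inputs, keeping the barrier up pays off whenever an hour of a motorist's time is worth more than the break-even figure. Different inputs will move that figure, but the structure of the comparison stays the same.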

And of course, given technology, it should be possible to have notifications sent to those vehicles that pass through without paying. Yes, I know it seems a long way off right now, but the point, to a statistician, is that we move to a world where we assume all vehicles are innocent until proven guilty, rather than the other way around.

Fast Tag implementation, when fully functional, will get the null hypothesis right.

And pre-filled income tax returns, sent to us by the government, with a minimum of audits and notices, are exactly the same story. The government assumes innocence until proven otherwise, leading to a system in which every tax-paying Indian is assumed to be an honest tax-payer until proven otherwise. We already have a system that is closer to this ideal than was the case earlier, and hopefully, it will become better still with time.

And now that we’re on a roll, that’s the problem with demonetization, if you were to ask a statistician! All notes were presumed guilty, until proven innocent.

Here’s the point: if you are a student of statistics, struggling with the formation of the null, and wondering what the point is anyways*, the example from last Thursday and the three noted above should help make the topic more relatable.

And to the extent that it does, statistics becomes more relatable, more understandable and – dare I say it – fun!


*Trust me, we’ve all been there


Statistics and the NRC

Imagine the following: you are the judge who has been hearing a case in which somebody has been accused of murder, and you go to sleep one night at the end of the trial, knowing that you must give your judgment tomorrow.
God appears in your dream, and tells you that he is displeased with you. As punishment, he decrees that whatever judgment you make tomorrow will be wrong. If you say that the accused didn’t commit the murder, then he did in fact commit the murder. If you say that the accused did commit the murder, then he didn’t in fact commit the murder. He’s god, so he gets to have all of this be true.
You wake up from your sleep convinced that this wasn’t a dream, and what god said will actually happen. When you announce your judgment, whatever you say is going to be wrong.
Would you rather send an innocent man to jail, or would you rather send a guilty man free? Remember, it must be one of the two. Don’t take a peek at what follows, and try and answer this question before you read further.
My bet is that you likely chose to send a guilty man free. I have been using a variant of this exercise for years in my classes on statistics, and there appears to be something within us that rebels at the idea of sending an innocent man to jail. One reason, maybe, that explains the enduring popularity of the Shawshank Redemption.
It is at least partially for this reason that we say innocent until proven guilty. That should be, for reasons outlined above, the null hypothesis. We give it every chance to be true, and assure ourselves that the chance we’re wrong is small enough to feel safe in declaring the defendant guilty (note to statistics students: that’s one way of understanding the p-value right there)
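Statistics students may find it useful to see the judge's dilemma as a trade-off between the two kinds of error. The sketch below is a toy model, not a description of any real court: it assumes the strength of the evidence can be boiled down to a single score, distributed N(0, 1) for an innocent defendant and N(2, 1) for a guilty one. Both distributions, and the function name `error_rates`, are hypothetical.

```python
from statistics import NormalDist

# Hypothetical evidence-score distributions (assumptions for illustration only).
innocent = NormalDist(mu=0.0, sigma=1.0)
guilty = NormalDist(mu=2.0, sigma=1.0)


def error_rates(alpha):
    """Return (Type I rate, Type II rate) when the conviction cutoff is set by alpha."""
    # Convict only if the score exceeds the cutoff, so that an innocent
    # defendant is convicted with probability alpha.
    cutoff = innocent.inv_cdf(1 - alpha)
    # A guilty defendant goes free when their score falls below the cutoff.
    type_ii = guilty.cdf(cutoff)
    return alpha, type_ii


rates = {alpha: error_rates(alpha)[1] for alpha in (0.10, 0.05, 0.01)}

for alpha, beta in rates.items():
    print(f"chance of jailing the innocent: {alpha:.0%}, "
          f"chance of freeing the guilty: {beta:.1%}")
```

In this toy setup, pushing the chance of jailing an innocent person down pushes the chance of freeing a guilty one up. Choosing the null is choosing which of the two errors gets the tight leash.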
So what does this have to do with the NRC?
Well, if you were one of the officials charged with designing this scheme, what would you say the hypothesis should be about, say, me? Indian until proven otherwise, or not Indian until proven Indian?
Like the judge, you can end up making two errors. Declaring me as not an Indian when I am, in fact, an Indian. Or declaring me as an Indian when I am, in fact, not an Indian.
To me, personally, declaring an Indian to not be an Indian is morally more problematic than declaring a non-Indian to be an Indian. And therefore my answer to my own question would be that the hypothesis ought to be Indian until proven otherwise.
But the NRC is, of course, designed exactly the other way around. Everybody is assumed to not be an Indian until proven otherwise. The burden of proof rests on the defense, not the prosecution. We are assumed guilty until proven innocent.
Not only is this problematic for reasons stated above, it will also mean that we minimize the chance of mistakenly declaring someone to be Indian. Now, one may view that as a good thing, but the price we pay is the following:
We can’t control for the other kind of error. We lose control over the chance of mistakenly declaring somebody as a non-Indian.
And given that there’s 1,300,000,000 of us (and counting), there will be a lot of Indians who will mistakenly be identified as non-Indian.
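To put rough numbers on "a lot", here is a quick piece of arithmetic. The error rates are purely hypothetical, since nobody knows what the actual rate of wrongful exclusion would be, and the sketch assumes for simplicity that essentially all of the 1.3 billion are in fact Indian. The function name `expected_wrongly_excluded` is illustrative.

```python
POPULATION = 1_300_000_000  # the figure used in this post


def expected_wrongly_excluded(population, false_exclusion_rate):
    """Expected number of Indians mistakenly classified as non-Indian."""
    return round(population * false_exclusion_rate)


# Hypothetical error rates: 0.1%, 1% and 5%.
hypothetical_rates = (0.001, 0.01, 0.05)
wrongly_excluded = {
    rate: expected_wrongly_excluded(POPULATION, rate)
    for rate in hypothetical_rates
}

for rate, count in wrongly_excluded.items():
    print(f"error rate {rate:.1%}: about {count:,} people wrongly excluded")
```

Even the smallest of these made-up rates, applied to a population this size, works out to over a million people.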
Viewed this way, the CAA is potentially a useful tool to undo the inevitable errors that will occur.
And what has a lot of people upset (myself included) is the fact that the CAA has, to the extent that I understand it, the power to undo the errors the NRC in its current form will commit, but contingent on religious faith.
My being upset about this is me expressing my opinion, and your opinion might be the same, or it might be different – and that’s fine! Debate is an awesome way to learn.
But everything that preceded the last paragraph is not opinion. If the NRC is formulated the way it is described above, there will be far too many Indians who end up being classified as non-Indians.
If the NRC is not formulated the way it is described above, then the government needs to, 1.3 billion times, try us – in the legal sense – assuming we’re Indians, and they need to prove otherwise. To say that this will be expensive, and beyond our existing state capacity, is to state the obvious.
There are many reasons to be against the NRC (if and when it will be implemented). But this post isn’t about my opinion about the NRC as an Indian. It is about my view of the NRC as a statistical exercise.
And for a statistician, there can be only one view of the NRC: it fails the most basic criterion.
It gets the null hypothesis wrong.
The link to Shruti Rajagopalan’s article, which served as inspiration for this post.

My thanks to Prof. Alex Tabarrok for making this essay much, much more readable than it was at the outset (imagine!), and to Prof. Pradeep Apte for reminding me of the concept of falsifiability, which we’ll get to next Thursday.

Links for 27th March, 2019

  1. “As a program adapts and serves more people and more functions, it naturally requires tighter regulation. Software systems govern how we interact as groups, and that makes them unavoidably bureaucratic in nature. There will always be those who want to maintain the system and those who want to push the system’s boundaries. Conservatives and liberals emerge.”
    Here’s a useful thumb-rule. Read anything written by Atul Gawande. In this article, he speaks, nominally, about the difficulty of adapting to a new computer system that is being foisted upon the medical community. But there’s much more to unpack here! Adapting to systems, mutations within systems, the difficulty of scaling, substitutes and complements, opportunity costs – and much, much more.
  2. “Pig facial recognition works the same way as human facial recognition, the companies say. Scanners and software take in the bristles, the snout, the eyes and ears. The features are mapped. Pigs don’t all look alike when you know what to look for, they said.”
    The intersection of technology, pork and the culture that is China today. Some might call this dystopian, others might fret at how slow progress is – but the article is fascinating.
  3. “The level of u* is not fixed. It changes over time, driven by changes in labor laws, the minimum wage, government benefit programs, demographics and technology. For instance, u* might decline if workers, on average, are older; older workers are less likely to be unemployed. The level of u* might rise if unemployment benefits become more generous and this leads unemployed workers to be more picky about taking jobs.”
    NAIRU, or the Non-Accelerating Inflation Rate of Unemployment, was one of the more nerdy acronyms I learnt when I was a student. This article does a good job of explaining exactly what this is, and why it matters. And most importantly, it does so in a way that isn’t confusing for the layperson.
  4. “These calculations make clear why economists so often argue against light rail and subway construction projects. They are so expensive that ridership can only begin to cover construction and maintenance costs if the systems operate at close to their physical capacity most of the time; that is, if there are enough riders to fill up the cars when they run on two- to three-minute headways for many hours per day. Since most proposed projects do not meet this standard, economists generally argue against them. Buses can usually move the projected numbers of riders at a fraction of the cost.”
    I am, and probably always will be, a huge fan of buses over other forms of public transport. And I will always be a big fan of public transport over private transport. This article explains why not just I, but other economists will also tend to favor buses over other forms of public transport.
  5. “The principle that you are presumed to be innocent unless and until you are convicted, after a fair trial, turns out, in practice, to be a different principle altogether: for the purposes of compensation, once you are convicted your conviction is deemed to be correct. You are presumed guilty for the rest of your life, irrespective of whether your trial was fair or unfair. It makes no difference that your conviction has been quashed. It makes no difference that new evidence – which ought to have been obtained by the police before your trial – shows that you are probably innocent. Those acting on behalf of the state may have bungled the investigation, and possibly even bent the rules to get you convicted. None of that is of any consequence. All that matters is whether you can prove that you suffered a “miscarriage of justice:” ”
    I teach statistics, and would happily spend an entire semester explaining how to frame the null, and more importantly, how to not frame the null. This article does an excellent job of providing an all too important example of the latter.