Statistics and the NRC

Imagine the following: you are the judge hearing a case in which somebody has been accused of murder. You go to sleep one night at the end of the trial, knowing that you must deliver your judgment tomorrow.
God appears in your dream and tells you that he is displeased with you. As punishment, he decrees that whatever judgment you make tomorrow will be wrong. If you say that the accused didn’t commit the murder, then he did in fact commit it. If you say that the accused did commit the murder, then he didn’t in fact commit it. He’s God, so he gets to make all of this true.
You wake up convinced that this wasn’t just a dream, and that what God said will actually happen. When you announce your judgment, whatever you say is going to be wrong.
Would you rather send an innocent man to jail, or set a guilty man free? Remember, it must be one of the two. Don’t peek at what follows; try to answer this question before you read further.
My bet is that you chose to set the guilty man free. I have been using a variant of this exercise for years in my statistics classes, and there appears to be something within us that rebels at the idea of sending an innocent man to jail. That may be one reason for the enduring popularity of The Shawshank Redemption.
It is at least partly for this reason that we say “innocent until proven guilty”. That, for the reasons outlined above, should be the null hypothesis. We give it every chance to be true, and declare the defendant guilty only when the chance of seeing evidence this damning against an innocent person is small enough for us to feel safe (note to statistics students: that’s one way of understanding the p-value right there).
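For the statistics students, here is a minimal Python sketch of the courtroom as a hypothesis test. The setup is a toy assumption, not anything from a real court: suppose there are 10 independent pieces of evidence, and each would look incriminating purely by chance with probability 0.2 if the defendant were innocent. The null hypothesis is innocence. The p-value is the chance that an innocent person would produce at least as much incriminating evidence as we actually observed, and we declare "guilty" only if it falls below a significance level (the conventional 0.05, also an assumption). The function names `binomial_p_value` and `verdict` are hypothetical, chosen for illustration.

```python
from math import comb

def binomial_p_value(incriminating: int, pieces: int, p_chance: float) -> float:
    """One-sided p-value: P(X >= incriminating) when X ~ Binomial(pieces, p_chance).

    Under the null hypothesis (innocence), each piece of evidence looks
    incriminating by chance alone with probability p_chance (an assumption).
    """
    return sum(
        comb(pieces, k) * p_chance**k * (1 - p_chance) ** (pieces - k)
        for k in range(incriminating, pieces + 1)
    )

def verdict(incriminating: int, pieces: int = 10,
            p_chance: float = 0.2, alpha: float = 0.05) -> tuple[str, float]:
    """Reject 'innocent' only if this much evidence is very unlikely for an innocent person."""
    p = binomial_p_value(incriminating, pieces, p_chance)
    return ("guilty" if p < alpha else "not guilty"), p

for observed in (3, 7):
    decision, p = verdict(observed)
    print(f"{observed}/10 incriminating: p = {p:.4f} -> {decision}")
```

With these made-up numbers, 3 incriminating pieces out of 10 give p = 0.3222 (not guilty), while 7 out of 10 give p = 0.0009 (guilty). Note what `alpha` controls: the chance of convicting an innocent person, which is exactly the error the judge in the dream found hardest to live with.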
So what does this have to do with the NRC?
Well, if you were one of the officials charged with designing this scheme, what would you say the null hypothesis should be about, say, me? Indian until proven otherwise, or not Indian until proven Indian?
Like the judge, you can end up making one of two errors: declaring me not to be an Indian when I am, in fact, an Indian, or declaring me to be an Indian when I am, in fact, not one.
To me, personally, declaring an Indian to not be an Indian is morally more problematic than declaring a non-Indian to be an Indian. My answer to my own question, therefore, is that the null hypothesis ought to be “Indian until proven otherwise”.
But the NRC is, of course, designed exactly the other way around. Everybody is assumed to not be an Indian until proven otherwise. The burden of proof rests on the defense, not the prosecution. We are assumed guilty until proven innocent.
Not only is this problematic for the reasons stated above, it also means that we keep the chance of mistakenly declaring someone to be Indian small. One may view that as a good thing, but the price we pay is the following:
We can’t control the other kind of error. We lose control over the chance of mistakenly declaring somebody to be a non-Indian.
And given that there are 1,300,000,000 of us (and counting), a lot of Indians will be mistakenly identified as non-Indian.
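To see why controlling one error costs us control of the other, and why scale matters, here is a minimal Python sketch. Everything in it except the 1.3 billion figure is an assumption for illustration: imagine each person gets a hypothetical "documentation score", with citizens' scores drawn from a normal distribution centred at 1 and non-citizens' centred at -1, both with standard deviation 1, so the two groups overlap. Someone is declared Indian only if their score clears a threshold. Under "not Indian until proven Indian", the error we control is wrongly including a non-citizen, so we push the threshold up; the sketch shows what that does to the rate of wrongly excluding citizens. The head count of wrongly excluded people treats the whole 1.3 billion as citizens, which is a simplifying assumption.

```python
from statistics import NormalDist

POPULATION = 1_300_000_000  # the figure used in the post

# Hypothetical score distributions (assumptions, not data): citizens tend
# to score higher than non-citizens, but the two groups overlap.
citizens = NormalDist(mu=1.0, sigma=1.0)
non_citizens = NormalDist(mu=-1.0, sigma=1.0)

def error_rates(threshold: float) -> tuple[float, float]:
    """Declare someone a citizen only if their score exceeds `threshold`.

    Returns (P(citizen wrongly excluded), P(non-citizen wrongly included)).
    """
    wrongly_excluded = citizens.cdf(threshold)            # citizen scores below the bar
    wrongly_included = 1.0 - non_citizens.cdf(threshold)  # non-citizen scores above it
    return wrongly_excluded, wrongly_included

for t in (0.0, 1.0, 2.0):
    excluded, included = error_rates(t)
    print(f"threshold={t:.1f}  "
          f"wrongly excluded={excluded:.3f} (~{excluded * POPULATION:,.0f} people)  "
          f"wrongly included={included:.3f}")
```

With these assumed distributions, raising the threshold from 0 to 2 cuts the wrongly-included rate from about 16% to about 0.1%, while the wrongly-excluded rate climbs from about 16% to about 84%. Real rates would differ, but the arithmetic of scale does not: even a wrongful-exclusion rate of just 0.1% applied to 1,300,000,000 people is 1,300,000 Indians.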
Viewed this way, the CAA is potentially a useful tool to undo the inevitable errors that will occur.
And what has a lot of people upset (myself included) is the fact that the CAA has, to the extent that I understand it, the power to undo the errors the NRC in its current form will commit, but contingent on religious faith.
My being upset about this is me expressing my opinion. Your opinion might be the same, or it might be different, and that’s fine! Debate is an awesome way to learn.
But everything that preceded the last paragraph is not opinion. If the NRC is formulated the way described above, far too many Indians will end up being classified as non-Indians.
If the NRC is not formulated that way, then the government needs to try us, in the legal sense, 1.3 billion times over: each time presuming that we are Indians, and each time bearing the burden of proving otherwise. It is obvious that this would be expensive, and beyond our existing state capacity.
There are many reasons to be against the NRC (if and when it is implemented). But this post isn’t about my opinion of the NRC as an Indian. It is about my view of the NRC as a statistical exercise.
And as a statistician, there can be only one view of the NRC: it fails the most basic criterion.
It gets the null hypothesis wrong.
..
..
..
The link to Shruti Rajagopalan’s article, which served as inspiration for this post.

My thanks to Prof. Alex Tabarrok for making this essay much, much more readable than it was at the outset (imagine!), and to Prof. Pradeep Apte for reminding me of the concept of falsifiability, which we’ll get to next Thursday.

8 thoughts on “Statistics and the NRC”

  1. Really liked the way you explained type 1 error and type 2 error. Coming to the main point of CAA potentially helping to reduce the error, I think the devil lies in the detail.
    The CAA applies to citizens of 3 countries (Pakistan, Afghanistan and Bangladesh) who belong to one of 6 (non-Muslim) religions and who came to India before 2015. Now, to take the benefit of the law you need to show proof that you meet these three criteria. So it will only help individuals who meet these criteria, not anyone else, and definitely not Indians.

    • Thank you for reading, and taking the time to write, Smit! Glad you liked it 🙂
      I’m sorry you thought the main point was about the CAA; I clearly didn’t do a good enough job. The main point was the formation of the null hypothesis where the NRC is concerned, and how the idea is incorrect from a statistical theory viewpoint.
      I’ll say this much though: as a policymaker worried about the inevitable fallout of the implementation of the NRC (if and when it happens), it sure would be nice to have a tool like the CAA in one’s toolbox. But I assure you, that was not the main point of this blog post at all 🙂
      Again, thanks for writing in!
