You Haven’t Really Read it, Have You?

I’ve been giving introductory talks on behavioral economics for a while now, and this is usually my last slide:

Dan Ariely and the problems that have surfaced in the recent past are a whole other story. But that apart, I think this is a fairly good list of books to begin with if you want to learn more about behavioral economics. Shleifer’s book is slightly more advanced, and Stumbling on Happiness isn’t formally about behavioral economics – but that being said, both are worth your time. Especially Stumbling on Happiness – please do read it if you haven’t already.

But I always say, whenever I show this slide, that Thinking, Fast and Slow is a book that everybody claims to have read. But hardly anybody, in fact, has actually read it. It is not because this is a difficult book to read in the sense that it has a lot of mathematics, or theorems, or diagrams. Nor is the prose especially difficult. And even the ideas in the book aren’t impossibly difficult to digest.

It is just that this happens to be a book that is incredibly thoughtful. It is thoughtfully written, and every single line matters. The book is heavy reading, in other words, precisely because some heavy thinking has gone into it. I’m not exaggerating!

When I first met Kahneman he was making himself more miserable about his unfinished book than any writer I’d ever seen. It turned out merely to be a warm-up for the misery to come, the beginning of an extraordinary act of literary masochism. In effect, the psychologist kept trying to trick himself into doing things he didn’t want to do and failing to fall for the ruse. “I had this idea at first that I could do it easily,” he said. “I thought, you know, that I could talk it” to a ghostwriter, but then he seized on another approach: a series of lectures, delivered to Princeton undergraduates who knew nothing about the subject, that he could transcribe and publish more or less as spoken. “I paid someone to transcribe them,” he says. “But when I read them I could see that they were very bad.” Next, he set out to write the book by himself, as he suspected he should have done all along. He quit and re-started so many times he lost count, and each time he quit he seemed able to convince himself that he should never have taken on the project in the first place. Last October he quit for what he swore was the last time. One morning I went up the hill to have coffee with him and found that he was no longer writing his book. “This time I’m really finished with it,” he said.
Then, after I left him, he sat down and reviewed his own work. The mere fact that he had abandoned it probably raised the likelihood that he would now embrace it: after all, finding merit in the thing would now prove him wrong, and he seemed to take pleasure in doing that. Sure enough, when he looked at his manuscript his feelings about it changed again. That’s when he did the thing that I find not just peculiar and unusual but possibly unique in the history of human literary suffering. He called a young psychologist he knew well and asked him to find four experts in the field of judgment and decision-making, and offer them $2,000 each to read his book and tell him if he should quit writing it. “I wanted to know, basically, whether it would destroy my reputation,” he says. He wanted his reviewers to remain anonymous, so they might trash his book without fear of retribution. The endlessly self-questioning author was now paying people to write nasty reviews of his work. The reviews came in, but they were glowing. “By this time it got so ridiculous to quit again,” he says, “I just finished it.” Which of course doesn’t mean that he likes it. “I know it is an old man’s book,” he says. “And I’ve had all my life a concept of what an old man’s book is. And now I know why old men write old man’s books. My line about old men is that they can see the forest, but that’s because they have lost the ability to see the trees.”

https://www.vanityfair.com/news/2011/12/michael-lewis-201112

In fact, I tell my audience during these talks that if they are really serious about reading the book, they should plan to read no more than three to four pages per day. Take as long as you like to read those three to four pages, then talk about what you’ve read with somebody (talk to yourself if nobody else has the patience or inclination). Then tomorrow, read another three to four pages. Keep at it until you’re done.

Again, none of this is meant as criticism of the book. Far from it: it is a magnificent book. So magnificent, in fact, that you’re better off sipping it rather than chugging it. But please do sip it!


But as I was saying, a lot of people claim to have read it, but they haven’t, not actually. Turns out this phenomenon has a name:

So which are the three most unread books of all time? Hard Choices, by Hillary Clinton; Capital in the Twenty-First Century, by Thomas Piketty; and Infinite Jest, by David Foster Wallace. And check out number 5!

Shashikant was surprised to see it at number 5, and so was I, but for different reasons. I think Shashikant was surprised that it ranked that high on the list.

Me, I’m surprised it is that low.

But hey, if you haven’t read it already, please do start today. I’ll check in on your progress a year from now.

No Such Thing As Too Much Stats in One Week

I wrote this earlier this week:

Us teaching type folks love to say that correlation isn’t causation. As with most things in life, the trouble starts when you try to decipher what this means, exactly. Wikipedia has an entire article devoted to the phrase, and it has occupied space in some of the most brilliant minds that have ever been around.
Simply put, here’s a way to think about it: not everything that is correlated is necessarily going to imply causation.


But if there is causation involved, there will definitely be correlation. In academic speak, if x and y are correlated, we cannot necessarily say that x causes y. But if x does indeed cause y, x and y will definitely be correlated.

https://atomic-temporary-112243906.wpcomstaging.com/2021/05/19/correlation-causation-and-thinking-things-through/

And just this morning, I chanced upon this:

And so let’s try and take a walk down this rabbit hole!

Here are three statements:

  1. If there is correlation, there must be causation.

    I think we can all agree that this is not true.
  2. If there is causation, there must be correlation.

    That is what the highlighted excerpt is saying in the tweet above. I said much the same thing in my own blogpost the other day. The bad news (for me) is that I was wrong – and I’ll expand upon why I was wrong below.
  3. If there is no correlation, there can be no causation

    That is what Rachael Meager is saying the book is saying. I spent a fair bit of time trying to understand if this is the same as 2. above. I’ve never studied logic formally (or informally, for that matter), but I suppose I am asking the following:

    If B exists, A must exist. (B is causation, A is correlation – this is just 2. above.)

    If we can show that A doesn’t exist, are we guaranteed the non-existence of B?

    And having thought about it, I think it to be true: 3. is the same as 2. – see the sketch just after this list.((Anybody who has studied logic, please let me know if I am correct!))
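
The reason 3. is the same as 2. is that one is the contrapositive of the other, and a conditional is logically equivalent to its contrapositive. Written out the way a logic text would put it (my own rendering, not something taken from the book or the tweet), with B standing for “there is causation” and A for “there is correlation”:

```latex
% B = "there is causation", A = "there is correlation"
% Statement 2 is the conditional, statement 3 is its contrapositive,
% and the two are logically equivalent:
\[
(B \Rightarrow A) \iff (\lnot A \Rightarrow \lnot B)
\]
```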

Rachael Meager then provides this example as support for her argument:

This is not me trying to get all “gotcha” – and I need to say this because this is the internet, after all – but could somebody please tell me where I’m wrong when I reason through the following:

Ceteris paribus, there is a causal link between pressing on the gas and the speed of the car. (Ceteris paribus is just fancy pants speak – it means holding all other things constant.)

But when you bring in the going up a hill argument, ceteris isn’t paribus anymore, no? The correlation is very much still there. But it is between pressing on the gas and the speed of the car up the slope.

Forget the physics and acceleration and slope and velocity and all that. Think of it this way: the steeper the incline, the more you’ll have to press the accelerator to keep the speed constant. The causal link is between the degree to which you press on the gas and the steepness of the slope. That is causally linked, and therefore there is (must be!) correlation.((Association, really. See below))

Put another way:

If y is caused by x, then y and x must be correlated. But this is only true keeping all other things constant. And going from flat territory into hilly terrain is not keeping all other things constant.

No?
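
Here is a toy simulation of that gas-and-hill argument, in case it helps. The scenario is from the tweet; the code and the numbers are mine, purely for illustration. When the slope is held constant, gas and speed are strongly correlated; when the driver adjusts the gas precisely to offset a varying slope, the measured correlation between gas and speed all but vanishes, even though the causal link never went away.

```python
# Toy simulation of the gas-pedal-on-a-hill argument. My own sketch with
# invented numbers, purely to illustrate the "ceteris paribus" point.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Flat road: slope held constant, so ceteris really is paribus.
gas_flat = rng.uniform(0, 1, n)
speed_flat = 100 * gas_flat + rng.normal(0, 5, n)      # gas causes speed

# Hilly terrain: the driver presses harder on steeper slopes, precisely
# to keep the speed (roughly) constant.
slope = rng.uniform(0, 1, n)
gas_hill = slope + rng.normal(0, 0.02, n)              # gas tracks the slope
speed_hill = 60 + 100 * (gas_hill - slope) + rng.normal(0, 5, n)  # gas still causes speed; the slope pushes back

print(np.corrcoef(gas_flat, speed_flat)[0, 1])   # strongly positive
print(np.corrcoef(gas_hill, speed_hill)[0, 1])   # close to zero
```

The causation is identical in both scenarios; the only thing that changes is whether everything else is being held constant.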


But even if my argument above turns out to be correct, I was still wrong when I said that causation implies correlation. I should have been more careful about distinguishing between association and correlation.

Ben Golub made the same argument (I think) that I did:

… and Enrique Otero pointed out the error in his tweet, and therefore the error in my own statement:
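
The cleanest way I know to see the association-versus-correlation distinction is the standard y = x² counterexample – my illustration, to be clear, not the content of either tweet. Here y is entirely caused by x, the two are about as strongly associated as two variables can be, and yet the Pearson correlation is zero:

```python
# Causation (indeed, complete dependence) with zero Pearson correlation:
# the usual y = x**2 counterexample. My own illustration, not from the tweets.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 100_000)   # symmetric around zero
y = x ** 2                        # y is fully determined by x

print(np.corrcoef(x, y)[0, 1])    # roughly 0: no linear correlation, total dependence
```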


Phew, ok. So: what have we learnt, and what do we know?

Here is where I stand right now:

  1. Correlation doesn’t imply causation
  2. I still think that if there is causation, there must be association, not correlation. But that being said, I should be pushing The Mixtape to the top of the list.
  3. Words matter, and I should be more careful!

All in all, not a bad way to spend a Saturday morning.

On Signals and Noise

Have you ever walked out of a classroom as a student wondering what the hell went on there for the past hour? Or, if you are a working professional, have you ever walked out of a meeting wondering exactly the same thing?

No matter who you are, one of the two has happened to you at some point in your life. We’ve all had our share of monumentally useless meetings/classes. Somebody has droned on endlessly about something, and after an eternity of that droning, we’re still not sure what that person was on about – to the point that we still don’t know what the meeting/class was actually for.

One of the great joys in my life as a person who tries to teach statistics to students comes when I say that if you have experienced this emotion, you know what statistics is about. Well, that’s a stretch, but allow me to explain where I’m coming from.


Image taken from here: https://en.wikipedia.org/wiki/Z-test
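
In case the image doesn’t come through, the one-sample version of that statistic – written out here from the standard textbook definition rather than copied from the page – is:

```latex
% One-sample z-statistic: the "signal" (how far the sample mean sits from the
% hypothesised mean) divided by the "noise" (the standard error of that mean).
\[
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]
```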

Don’t be scared by looking at that formula. We’ll get to it in a bit.


Take your mind back to the meeting/class. When you walked out of it, did you find yourself plaintively asking a fellow victim, “But what was the point?”

And if you are especially aggrieved, you might add that the fellow went on for an hour, but you’re still not sure what that was all about. What you’re really saying is that there was a lot of noise in that meeting/class, but not nearly enough signal.

You’re left unsure about the point of the whole thing, but you and your ringing ears can attest to the fact that a lot was said.


Or think about a phone call, or a WhatsApp call. If there is a lot of disturbance on the call, it is likely that the call won’t last for very long, and you may well be unclear about what the other person on the call was trying to say.

What you’re really saying is that there was a lot of noise on the call, but not nearly enough signal.


That is what the signal-to-noise ratio is all about. The clearer the signal, the better it is. The lower the noise, the better it is. And the ratio simply puts the two together: signal divided by noise.

A class that ends with you being very clear about what the professor said is a good class. A good class is “high” on the signal that the professor wanted to leave you with. And if it is a class in which the professor didn’t deviate from the topic, didn’t wander down side-alleys and didn’t spend too much time cracking unnecessary jokes, it is an even better class, because it was “low” on disturbance (or to use another word that means the same thing as disturbance: noise).


That, you see, is all that the formula up there is saying. How high is the signal (x minus mu), relative to the noise (sigma, or s)? The higher the signal, and the lower the noise, the clearer the message from the data you are working with.

And it has to be both! A clear signal with insane amounts of noise ain’t a good thing, and an unclear signal with next to no noise is also not a good thing.

And all of statistics can be thought of this way: what is the signal from the data I am examining, relative to the noise in this dataset? The formula can look plenty scary, but this is all it is really saying.
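
If code is easier to read than symbols, here is the same idea as a minimal sketch – the data, the benchmark mean and the assumed sigma below are all made up:

```python
# The "signal over noise" reading of the one-sample z-statistic.
# A minimal sketch; the data, mu_0 and sigma below are invented.
import numpy as np

data = np.array([52.1, 48.3, 50.9, 51.7, 49.8, 53.2, 50.4, 49.1])
mu_0 = 50.0                          # the benchmark we are testing against
sigma = 1.5                          # assumed known population standard deviation

signal = data.mean() - mu_0          # how far the data sit from the benchmark
noise = sigma / np.sqrt(len(data))   # how much wobble we expect in the sample mean
z = signal / noise

print(z)                             # a big |z| means the signal stands out from the noise
```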

Even this monster, for example:

https://www.statsdirect.co.uk/help/parametric_methods/utt.htm

Looks scary, but in English, it is asking the same question: how high is the signal, relative to the noise? It’s just that the formula for calculating the noise is exuberantly, ebulliently expansive. Leave all that to us, the folks who think this is fun. All you need to understand is that this is what we’re asking:


What is the signal, relative to the noise?
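
And to underline the “leave all that to us” point: in practice you would let software do the expansive noise calculation for you. A sketch using scipy’s Welch t-test, with invented numbers:

```python
# The two-sample t-test is still just signal over noise; scipy handles the
# "exuberantly expansive" denominator. The scores below are made up.
import numpy as np
from scipy import stats

class_a = np.array([71, 74, 68, 77, 73, 70, 75])
class_b = np.array([65, 69, 64, 70, 66, 68, 63])

# Welch's t-test: difference in means (signal) over its estimated noise.
result = stats.ttest_ind(class_a, class_b, equal_var=False)
print(result.statistic, result.pvalue)
```

Same question as before – how big is the difference, relative to the noise – just with a fancier denominator.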


And finally, speaking of noise, that happens to be the title of Daniel Kahneman’s latest book. I have just downloaded it, and will get to it soon (hopefully). But before recommending that you read it, I wanted to explain what the title means.

And if you’re wondering why I would recommend something that I haven’t read yet, well, let me put it this way: it’s Daniel Kahneman.

High signal, no noise.