JEP, p-values and tests of statistical significance

The Summer 2021 issue of the Journal of Economic Perspectives came out recently:

I have been the Managing Editor of the Journal of Economic Perspectives since the first issue in Summer 1987. The JEP is published by the American Economic Association, which decided about a decade ago–to my delight–that the journal would be freely available on-line, from the current issue all the way back to the first issue. You can download individual articles or the entire issue, and it is available in various e-reader formats, too. Here, I’ll start with the Table of Contents for the just-released Summer 2021 issue, which in the Taylor household is known as issue #137.

https://conversableeconomist.wpcomstaging.com/2021/07/29/summer-2021-journal-of-economic-perspectives-available-online/

(JEP is a great journal to read as a student. If you’re looking for a good place to start, may I recommend the Anomalies column?)

Of particular interest this time around is the section on statistical significance. This paper, in particular, was an enjoyable read.


And reading that paper reminded me of a really old blogpost written by an ex-colleague of mine:

The author starts off by emphasizing the importance of developing a statistical toolbox. Indeed statistics is a rich subject that can be enjoyed by thinking through a given problem and applying the right kind of tools to get a deeper understanding of the problem. One should approach statistics with a bike mechanic mindset. A bike mechanic is not addicted to one tool. He constantly keeps shuffling his tool box by adding new tools or cleaning up old tools or throwing away useless tools etc. Far from this mindset, the statistics education system imparts a formula oriented thinking amongst many students. Instead of developing a statistical or probabilistic thinking in a student, most of the courses focus on a few formulae and teach them null hypothesis testing.

https://radhakrishna.typepad.com/rks_musings/2015/09/mindless-statistics.html

If you are a student of statistics, and think that you “get” statistics, please read the post in its entirety. Don’t worry if you get confused – that is, in a way, the point of that post. It challenges you by asking a very simple question: do you really “get” statistics? And the answer is almost always in the negative (and that goes for me too!)


And my final recommendation du jour is this (extremely passionately written) tirade:

We want to persuade you of one claim: that William Sealy Gosset (1876-1937)—aka “Student” of “Student’s” t-test—was right, and that his difficult friend, Ronald A. Fisher (1890-1962), though a genius, was wrong. Fit is not the same thing as importance. Statistical significance is not the same thing as scientific importance or economic sense. But the mistaken equation is made, we find, in 8 or 9 of every 10 articles appearing in the leading journals of science, economics to medicine. The history of this “standard error” of science involves varied characters and plot twists, but especially R. A. Fisher’s canonical translation of “Student’s” t. William S. Gosset aka “Student,” who was for most of his life Head Experimental Brewer at Guinness, took an economic approach to the logic of uncertainty. Against Gosset’s wishes his friend Fisher erased the consciously economic element, Gosset’s “real error.” We want to bring it back.

https://www.deirdremccloskey.com/docs/jsm.pdf

Although it might help to read this review first:

However, thanks to an arbitrary threshold set by statistics pioneer R.A. Fisher, the term ‘significance’ is typically reserved for P values smaller than 0.05. Ziliak and McCloskey, both economists, promote a cost-benefit approach instead, arguing that decision thresholds should be set by considering the consequences of wrong decisions. A finding with a large P value might be worth acting upon if the effect would be genuinely clinically important and if the consequences of failing to act could be serious.

https://www.nature.com/articles/nm0209-135
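
To make the "significance is not importance" point concrete, here is a minimal sketch of my own (it is not taken from the paper or the review). It assumes a simple two-sample z-test with a known standard deviation, and every number in it is made up purely for illustration; the function name `two_sided_p` is hypothetical.

```python
from math import erfc, sqrt

def two_sided_p(effect, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a z-test with a known, common standard deviation."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Illustrative, made-up numbers: a trivially small effect measured on a
# million people per group, versus a large effect measured on ten per group.
tiny_effect_huge_sample = two_sided_p(effect=0.01, sd=1.0, n_per_group=1_000_000)
large_effect_small_sample = two_sided_p(effect=0.5, sd=1.0, n_per_group=10)

print(f"tiny effect, huge sample:   p = {tiny_effect_huge_sample:.2e}")
print(f"large effect, small sample: p = {large_effect_small_sample:.2f}")
```

Under these assumptions, the effect that nobody would care about clears the 0.05 bar comfortably, while the effect that might matter a great deal does not. The p-value is telling you about sample size at least as much as it is telling you about importance.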

Statistics is a surprisingly, delightfully conceptual subject, and I’m still peeling away at the layers. Every year I think I understand it a little bit more, and every year I discover that there is much more to learn. The symposium on statistical significance in this summer’s issue of the JEP, RK’s blogpost and Deirdre McCloskey’s paper are good places to get started on unlearning what you’ve been taught in stats.

Links for Friday, 30th October, 2020

On the Foxconn, um, factory in Wisconsin:

Such announcements are far from unusual for Gou, and often, nothing comes of them. In Vietnam in 2007, in Brazil in 2011, in Pennsylvania in 2013, and in Indonesia in 2014, Foxconn announced enormous factories that either fell far short of promises or never appeared. Just this year, the industries minister of Maharashtra, India, which aggressively pursued one of Gou’s multibillion-dollar projects in 2015, finally confirmed the factory isn’t coming, saying the state had learned a lesson about believing businesses promising big investments.
In China, where Foxconn employs the vast majority of its million workers, these sorts of announcements are called “state visit projects,” according to Willy Shih, a Harvard business school professor and former display industry consultant. Officials get a ribbon-cutting photo op, the company gets political goodwill, and everyone understands that the details of the contract are just an opening bid by a company that will ultimately do whatever makes economic sense.

https://www.theverge.com/21507966/foxconn-empty-factories-wisconsin-jobs-loophole-trump

I wish I could explain statistics as clearly as this:

Let’s say we have 100 people who have volunteered for the trials. We’ve divided them into two groups of 50 each. One will be administered the experimental drug, the other a placebo — i.e. something that looks identical, but has no medicinal value at all. There are rules for administering a placebo correctly, and I’ll come to those. For now, let’s assume they have been followed.
The trial runs its course. The placebo group reports that one person has recovered, whereas the group that got the actual drug reports that five have recovered. What, if anything, can we conclude? Is this just chance? Is there a real difference between the groups? Is this enough to conclude anything about the efficacy of the drug?

https://www.livemint.com/opinion/columns/opinion-significance-of-double-blind-drug-trials-11602211204718.html
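
If you would like to play with the numbers in that excerpt yourself, here is a rough sketch. It uses Fisher's exact test, which is my choice of tool and not necessarily the one the column goes on to use. The idea: if the drug did nothing, the six recoveries would be scattered at random across the 100 volunteers, so we can ask how often five or more of them would land in the drug group by chance alone. The function name `chance_of_k_in_drug_group` is hypothetical.

```python
from math import comb

def chance_of_k_in_drug_group(k, recovered=6, group_size=50, total=100):
    """Probability that exactly k of the recoveries fall in the drug group
    if the drug has no effect (a hypergeometric probability)."""
    return (comb(recovered, k)
            * comb(total - recovered, group_size - k)
            / comb(total, group_size))

observed = 5  # recoveries in the drug group; the placebo group had 1
probs = [chance_of_k_in_drug_group(k) for k in range(7)]  # k = 0..6

# One-sided: a split at least this favourable to the drug.
one_sided = sum(probs[observed:])

# Two-sided: any split that is at least as unlikely as the observed one.
# The small tolerance guards against floating-point ties.
two_sided = sum(p for p in probs if p <= probs[observed] * (1 + 1e-9))

print(f"one-sided p = {one_sided:.3f}")
print(f"two-sided p = {two_sided:.3f}")
```

With these numbers neither p-value gets under the conventional 0.05, even though five recoveries against one looks like a big difference. Whether that should settle the matter is, of course, exactly what the rest of this post has been arguing about.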

Old men, friendships, and chimpanzees.

As they got older, the chimps developed more mutual friendships and fewer one-sided friendships. They also exhibited a more positive approach to their whole community, continuing grooming of other chimps, including those that weren’t close friends, at the same rate, but with a drop in aggression. Other primates don’t necessarily follow this pattern as they grow older, according to the authors. Some monkeys tend to withdraw from social relationships and their aggression levels stay high.

https://www.nytimes.com/2020/10/22/science/aging-chimps-friendship.html

Krish Ashok writes a passionate (and erudite, but that’s a given, no?) defence of… maida.

Maida is technically more all-purpose than all-purpose flour because, with a little bit of food science, you can turn this unfairly maligned flour into flaky Malabar parottas, crisp luchis, fluffy naans and kulchas, airy bhaturas, pillowy soft loaves of bread, crunchy-yet-chewy pizzas and delectable cakes without having to buy multiple kinds of flours to do it all.

https://lifestyle.livemint.com/food/discover/masala-lab-why-maida-is-not-the-flour-world-villain-111603383026971.html

And while on food, this excellent, entertaining article on custard:

Corn flour comes from pounding the kernel into a white powder that forms a non-Newtonian fluid––a liquid that doesn’t change viscosity under stress––when mixed with water. Its greatest virtue is that it contributes to thickness and volume without tasting like anything.
Its use as a food product was patented in Britain in 1854 by a man named John Polson Jr., who began manufacturing it in a factory in Paisley, Scotland owned by his father, John Polson, and his partners William Polson and John Brown. Some of their first advertisements declared that the product “was preferred on account of its plainness.”

https://fiftytwo.in/story/powder/

Links for 5th April, 2019

  1. “And almost invariably, I see the same colleague in our communal kitchen, who asks with delight, “Joe, what are you having for lunch today?” The types of bean and cheese rotate, as does the fruit—which depends on the season—but I do not inform my co-worker of these variations when I laugh off her very clever and funny question.”
    In an article about the comforts of routine and habit when it comes to food, I found this excerpt to be pleasingly meta. You know who should especially read this article? Statisticians – especially aspiring statisticians.
  2. “Take his celebrated work with David Card on the minimum wage. They looked at how relative hiring patterns changed when one state raised its minimum wage and one right on its border did not. Not much except the minimum wage differed between the two situations, so it was about as close to a controlled experiment as economists will ever get. Alan was a pioneer in the exploitation of such natural experiments. After Alan showed what kind of evidence can be marshaled to study a labor-market intervention, economists have raised their standard of what constitutes convincing evidence. What followed has been called a “credibility revolution” in empirical economics.”
    Unless you are a student of economics, it is unlikely that you will have heard of Alan Krueger. More’s the pity – for as the title of this article will tell you, his work likely has already affected you, no matter where in the world you are reading this.
  3. “The issue, Statistical Inference in the 21st Century: A World Beyond P<0.05, calls for an end to the practice of using a probability value (p-value) of less than 0.05 as strong evidence against a null hypothesis or a value greater than 0.05 as strong evidence favoring a null hypothesis. Instead, p-values should be reported as continuous quantities and described in language stating what the value means in the scientific context.”
    Statistics is harder, and more confusing, than you think. Yet another example is this article – each of the quotes in it makes for thoughtful reading.
  4. “…Section 230 has proved an “awesome benefit” for the tech platforms. It has encouraged astonishing innovation and accelerated the growth of some of the richest companies on the planet. But it has also allowed billions of people to post anything they like online with almost no constraint. Some of that content is inspirational, much of it trivial, and a small sliver grotesque and harmful. Social networks do not discriminate.”
    The FT on whether Facebook and its ilk are publishers or postmen. The import of section 230 is quite staggering, and I’d like to read the book mentioned in the article for that reason.
  5. “We ran similar regressions controlling for industry and found that — even after controlling for industry — elite MBAs did not produce positive statistically significant alpha. Elite MBAs did perform relatively well as CEOs in healthcare and consumer staples, but relatively poorly in energy and materials businesses, though those results were not statistically significant. Our study is not the only one to come to this conclusion. A study by economists at the University of Hawaii asked similar questions and found that firm performance is not predicted by the educational background of the CEOs.”
    A regression-based exercise (which of course comes with its own set of problems) on whether education (type and quality) and experience matter for CEO performance. Short answer: they don’t. I was tempted to excerpt the concluding paragraph, but I’ll leave it to the reader to discover.

Links for 14th March, 2019

  1. “It is an example of the paradox of India’s state capacity that it can execute well-defined tasks like elections, census, disaster relief etc with unparalleled proficiency and do the simplest things like running mid-day meal kitchens in the most appalling manner.”
    The always excellent Gulzar Natarajan on the paradox that is Indian bureaucracy. Given the remarkable efficiency with which we run our elections – and read the article to find out just how efficient it really is – why not other stuff in India? I do not have a clue. Will headline converge to core, or will core converge to headline?
  2. “While a Chinese depreciation would be a negative to shock to the world, China’s apparent willingness to use fiscal tools to restart its economy should be helpful to the world, at least directionally.*”
    The asterisk is at least as important as the excerpt, because the nature of the fiscal stimulus will matter more than the extent of it in the long run – but an update on what is fast becoming a mini-series – the state of China’s economy.
  3. “A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so – and yet these misinterpretations dominate much of the scientific literature.”
    I don’t know if I’ve fully understood p-values, and I don’t know if I do a good job of teaching them – to the extent that I understand them myself. Reading, and occasionally re-reading, this blog post is therefore a useful thing to do. Assuming it is correct in the first place!
  4. “…boosting an intermediate range of labor-intensive, low-skilled economic activities. Tourism and non-traditional agriculture are the prime examples of such labor-absorbing sectors. Public employment (in construction and service delivery), long scorned by development experts, is another area that may require attention. But government efforts can go much further.”
    Buried in this article is a whole range of papers waiting to be written – but that’s for academicians to salivate over. The article is a wonderful summary of what good jobs are, why they are difficult to come by today, and what can be done to make sure that they do come by.
  5. “I suppose it’s worth trying to measure economic growth, but don’t take the findings too seriously.”
    Words to live by, and I mean that. Trying to measure economic growth is, ultimately, an unsolvable problem. Consequently, any estimate that we have is always going to be off the mark. Think through the implications (and the implications of the implications!)