A Walk Through a Simple T-Test Problem

If you’ve been reading the posts of the past few days, I invite you to watch this video carefully, and ask yourselves which parts of it you agree with, which parts you disagree with, and why.

Let’s Brew Some Beer

Back when I used to work at the Gokhale Institute, I would get a recurring request every year without fail. What request, you ask? To get AB InBev to come on campus. To the guys at AB InBev – if you’re reading this, please do consider going to GIPE for placements. The students are thirsty to, er, learn.

But what might their work at AB InBev look like?

I don’t know for sure, but it probably will not involve working with barley and hops. It should, though, if you ask me. Building statistical models about other aspects of selling beer might rake in the moolah today, but there would be a pleasing historical symmetry about using stats to actually brew beer.

You see, you can’t – just can’t – make beer without barley and hops. And to make beer, these two things should have a number of desirable characteristics. Barley should have optimum moisture content, for example. It should have high germination quality. It needs to have an optimum level of proteins. And so on. Hops, on the other hand, should have beefed up on their alpha acids. They should be brimming with aroma and flavor compounds. There’s a world waiting to be discovered if you want to be a home-brewer, and feel free to call me over for extended testing once you have a batch ready. I’ll work for free!

But in a beer brewing company, it’s a different story. There, given the scale of production, one has to check for these characteristics. And many, many years ago – a little more than a century ago, in fact – there was a guy who was working at a beer manufacturing enterprise. And this particular gentleman wanted to test these characteristics of barley and hops.

So what would this gentleman do? He would walk along the shop-floor of the firm he worked in, and take some samples from the barley and hops that were going to be used in the production of beer. Beer aficionados who happen to be reading this blog might be interested to know the name of the firm bhaisaab worked at. Guinness – maybe you’ve heard of it?

So, anyway, off he’d go and test his samples. And if the results of the testing were encouraging, bhaisaab would give the go-ahead, and many a pint of Guinness would be produced. Truly noble and critical work, as you may well agree.

But Gosset – for that was his name, this hero of our tale – had a problem. You see, he could never be sure if the tests he was running were giving trustworthy results. And why not? Well, because he had to make accurate statements about the entire batch of barley (and hops). But in order to make an accurate statement about the entire batch, he would have liked to take larger samples.

Imagine you’re at Lasalgaon, for example, and you’ve been tasked with making sure that an entire consignment of onions is good enough to be sold. How many sacks should you open? The more sacks you open, the surer you are. But on the other hand, the more sacks you open, the smaller the amount left to be sold. (Assume that once a sack is open, it can’t be sold. No, I know that’s not how it works. Play along for the moment.)

A student of statistics faced with a real world problem (Source)

So how many sacks should you open? Well, unless your boss happens to be a lover of statistics and statistical theory for its own sake, the answer is as few as possible.

The problem is that you’re then trying to reach a conclusion about a large population by studying a small sample. And you don’t need highfalutin statistical theory to realize that this will not work out well.

Your sample might be the Best Thing Ever, but does that mean that you should conclude that the entire population of barley and hops is also The Best Thing Ever? Or, on the other hand, imagine that you have a strictly so-so sample. Does that mean that the entire batch should be thought of as so-so? How to tell for sure?
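If you would like to see the problem rather than take my word for it, here is a minimal sketch. Everything in it is made up for illustration: the "population" is a set of quality scores with mean 50 and standard deviation 10, and the sample sizes of 4 and 100 are arbitrary. It draws many samples of each size and checks how much the sample averages bounce around.

```python
import random
import statistics


def spread_of_sample_means(sample_size, repetitions=2000, seed=42):
    """Draw many samples of one size from a made-up population and
    return the standard deviation of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        # Hypothetical population: quality scores, mean 50, sd 10.
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


small = spread_of_sample_means(4)    # only a few sacks opened
large = spread_of_sample_means(100)  # lots of sacks opened

print(f"spread of sample means, n=4:   {small:.2f}")
print(f"spread of sample means, n=100: {large:.2f}")
```

The averages from the tiny samples wander much further from 50 than the averages from the big ones, which is exactly why a single small sample, good or bad, is a shaky guide to the whole batch.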


Worse, the statistical tools available to Gosset back then weren’t good enough to help him with this problem. The tools back then would give you a fairly precise estimate for the population, sure – but only if you took a large enough sample in the first place. And every time Gosset went to obtain a large enough sample, he met an increasingly irate superior who told him to make do with what he already had.

Or that is how I like to imagine it, at any rate.

So what to do?

Well, what our friend Gosset did is that he came up with a whole new way to solve the problem. I need, he reasoned, reasonably accurate estimates for the population. Plus, khadoos manager says no large samples.

Ergo, he said, new method to solve this problem. Let’s come up with a whole new distribution to solve the problem of talking usefully about population estimates by studying small samples. Let’s have this distribution be a little flatter around the centre, and a little fatter around the tails. That way, I can account for the greater uncertainty given the smaller sample.

And if my manager wants to be a little less khadoos, and he’s ok with me taking a larger sample, well, I’ll make my distribution a little taller around the center, and a little thinner around the tails. A large enough sample, and hell, I don’t even need my new method.

https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Continous-Random-Variables/Students-t-Distribution/index.html
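Here is a rough way to check the "flatter in the centre, fatter in the tails" story for yourself. The sketch below uses the standard density formula for the t-distribution and compares it with the standard normal curve at the centre (0) and out in the tail (3). The particular degrees of freedom are just my picks for illustration; for a single sample, degrees of freedom are the sample size minus one.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Illustrative degrees of freedom, from a tiny sample to a large one.
for df in (2, 5, 30, 100):
    print(f"df={df:>3}: centre={t_pdf(0, df):.4f}  tail={t_pdf(3, df):.4f}")

print(f"normal: centre={normal_pdf(0):.4f}  tail={normal_pdf(3):.4f}")
```

With few degrees of freedom the curve is lower at the centre and higher in the tail than the normal curve. As the degrees of freedom grow, both numbers creep towards the normal ones.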

And that, my friends, is how the t-distribution came to be.


You need to know who Gosset was, and why he did what he did, for us to work towards understanding how to resolve Precision and Oomph. But it’s going to be a grand ol’ detour, and we must meet a gentleman, a lady, and many cups of tea before we proceed.

The Gift That Keeps on Giving: The p-value

Naman Mishra, a friend and a junior from the Gokhale Institute, was kind enough to read and comment on my post about Abhinav Bindra and the p-value. Even better, he had a little “gift” for me – a post written by somebody else about the p-value:

P values are the probability of observing a sample statistic that is at least as different from the null hypothesis as your sample statistic when you assume that the null hypothesis is true. That’s a pretty convoluted but technically correct definition—and I’ll come back to it later on!

https://statisticsbyjim.com/hypothesis-testing/p-values-misinterpreted/
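A tiny worked example might make that definition less slippery. This one is mine, not from the quoted post, and the numbers are invented: suppose the null hypothesis is "this coin is fair", and you see 60 heads in 100 flips. The sketch adds up the probability, under a fair coin, of every head count at least as far from 50 as the one you saw.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Probability, assuming a fair coin, of a head count at least as far
    from flips / 2 as the observed one."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        # Keep only outcomes at least as extreme as what we observed.
        if abs(k - expected) >= observed_gap:
            total += comb(flips, k) * 0.5 ** flips
    return total


# Hypothetical data: 60 heads in 100 flips.
p = two_sided_p_value(60, 100)
print(f"p-value: {p:.4f}")
```

Note what the number is and is not. It is computed entirely under the assumption that the coin is fair, so it cannot be the probability that the coin is fair.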

It is convoluted, of course, but that’s not a criticism of the author. It is, instead, an acknowledgement of how difficult this concept is.

So difficult, in fact, that even statisticians have trouble explaining the concept. (Not, I should be clear, understanding it. Explaining it, and there’s a world of a difference).

Well, you have my explanation up there in the Abhinav Bindra post, and hopefully it works for you. But here is the problem with the p-value, not in terms of how difficult the concept is, but rather in terms of its limitations:

We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.

https://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/

In other words, the p-value is not the probability of rejecting the null when it is true. And here’s where it gets really complicated. I myself have in classes told people that the lower the p-value, the safer you should feel in rejecting the null hypothesis! And that’s not incorrect, and it’s not wrong… but well, it ain’t right either.

Consider these two paragraphs, each from the same blogpost:

https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/

But also, there’s this, from earlier on in the same blogpost:

https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/

“This,” you can practically hear generation after generation of statistics students say with righteous anger. “This is why statistics makes no sense.”

“Boss, which is it? Can p-values help you reject the null hypothesis, or not?”


Fair question.

Here’s the answer: no.

P-values cannot help you reject the null hypothesis.

You knew there was a “but”, didn’t you? You knew it was coming, didn’t you? Well, congratulations, you’re right. Here goes.

But they’re used to reject the null anyway.


Why, you ask?

Well, because of four people. And because of beer and tea. And other odds and ends, and what a story it is.

And so we’ll talk about beer, and tea and other odds and ends over the days to come.

But as with all good things, let’s begin with the beer. And with the t*!

*I’ve wanted to crack a stats based dad joke forever. Yay.