Elementary, My Dear Excel

This broke my heart:

But some researchers are calling Ariely’s large body of work into question after a 17 August blog post revealed that fabricated data underlie part of a high-profile 2012 paper about dishonesty that he co-wrote. None of the five study authors disputes that fabrication occurred, but Ariely’s colleagues have washed their hands of responsibility for it. Ariely acknowledges that only he had handled the earliest known version of the data file, which contained the fabrications.
Ariely emphatically denies making up the data, however, and says he quickly brought the matter to the attention of Duke’s Office of Scientific Integrity. (The university declined to say whether it is investigating Ariely.) The data were collected by an insurance company, Ariely says, but he no longer has records of interactions with it that could reveal where things went awry. “I wish I had a good story,” Ariely told Science. “And I just don’t.”


I’ve been recommending Dan Ariely’s books and talks to students for years now, and with good reason. But whether or not he himself was responsible for this, a thorough investigation is certainly warranted, not just of this specific paper but of his entire body of work.

But the point of this post isn’t just to point out this rather depressing fact. The blogpost that broke the story is worth reading in full for the following reasons:

  1. The admirable clarity in how it is written. Anybody who knows the very basics of math and statistics (and I do mean the very basics) will be able to understand what is going on.
  2. You don’t need to know any coding to figure out how they uncovered the fraud. Simple Excel is enough.
  3. The researchers have provided the data for you to play along with as you read the blogpost.

So if you are a student of statistics (and that is all of us, like it or not), I’d strongly encourage you to set aside a couple of hours, and work your way through the post and the Excel file(s).

And finally, a word of advice if you are a student who is just about beginning to play around with data:

  1. Don’t commit fraud. It sounds stupid, almost, to dispense this advice, but please, resist the temptation.
  2. Double check data that has been sent to you by somebody else. Triple check it! And checking means running sanity checks. You may still fail to detect fraud, if it has been committed, but you can at least minimize the chances of missing it. Get better at asking questions of the data you are working with!
  3. Stuff like this is, trust me on this, the best way to learn statistics. No amount of end-of-chapter problem solving will help you get your basics clear like a statistical whodunnit. Or a what-was-done, as in this case.
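
Excel is enough for all of this, but if you do end up writing code, the "sanity checks" from point 2 might look something like the Python below. This is only a minimal sketch with made-up numbers, and it is not the method used in the blogpost. The function name `sanity_report`, the rounding threshold of 1000 and the sample values are all assumptions for illustration.

```python
from collections import Counter


def sanity_report(values, round_to=1000, bins=5):
    """Collect a few basic facts worth eyeballing before trusting a numeric column.

    `round_to` and `bins` are illustrative defaults, not recommended settings.
    """
    n = len(values)
    lo, hi = min(values), max(values)

    # Share of values that are exact multiples of `round_to`. Numbers reported
    # by people are often rounded, so a share close to zero is worth a question.
    round_share = sum(1 for v in values if v % round_to == 0) / n

    # How many entries carry a value that appears more than once.
    counts = Counter(values)
    duplicated = sum(c for c in counts.values() if c > 1)

    # Equal-width histogram, to see the rough shape of the distribution.
    width = (hi - lo) / bins or 1  # fall back to 1 if every value is identical
    histogram = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        histogram[idx] += 1

    return {
        "n": n,
        "min": lo,
        "max": hi,
        "round_share": round_share,
        "duplicated": duplicated,
        "histogram": histogram,
    }


# Made-up values, purely for illustration.
sample = [12000, 15000, 15000, 18250, 20000, 23470, 30000, 41000]
for name, value in sanity_report(sample).items():
    print(name, value)
```

None of these numbers proves anything on its own. They are prompts for questions: does the range make sense, is the shape of the histogram plausible for this kind of measurement, and do the values look like they were written down by real people?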

A lengthy excerpt, but a necessary one. What follows are the last three paragraphs of the blogpost that broke this story:

We have worked on enough fraud cases in the last decade to know that scientific fraud is more common than is convenient to believe, and that it does not happen only on the periphery of science. Addressing the problem of scientific fraud should not be left to a few anonymous (and fed up and frightened) whistleblowers and some (fed up and frightened) bloggers to root out. The consequences of fraud are experienced collectively, so eliminating it should be a collective endeavor. What can everyone do?
There will never be a perfect solution, but there is an obvious step to take: Data should be posted. The fabrication in this paper was discovered because the data were posted. If more data were posted, fraud would be easier to catch. And if fraud is easier to catch, some potential fraudsters may be more reluctant to do it. Other disciplines are already doing this. For example, many top economics journals require authors to post their raw data [16]. There is really no excuse. All of our journals should require data posting.
Until that day comes, all of us have a role to play. As authors (and co-authors), we should always make all of our data publicly available. And as editors and reviewers, we can ask for data during the review process, or turn down requests to review papers that do not make their data available. A field that ignores the problem of fraud, or pretends that it does not exist, risks losing its credibility. And deservedly so.


If you’re writing a paper, put your data up for public scrutiny. Always, and without fail. It matters.