An account of the genomics scandal at Duke University has appeared in Significance (a journal sponsored by British and American statistical societies). The scandal caused the end of a clinical trial — it had been based on fraudulent data — and the resignation of assistant professor Anil Potti, who had among other things falsified his resume.
It reminded me of the Ranjit Chandra case. Similarities:

1. Reconstruction impossible. The published results could not be reconstructed from the data. In Chandra's case, some of the results were statistically impossible. In the Potti case, two statisticians were unable to go from the raw data they were given to the published results.

2. Outsiders important. Saul Sternberg and I, who are psychology professors, not nutrition professors, wrote an article that drew attention to what Chandra had done and caused retraction of one of his papers. As far as I could tell, at least a few nutrition professors had believed for many years that Chandra made up data. In Potti's case, the deception was revealed by two statisticians. Perhaps Chandra and Potti both believed (a) hardly anyone will notice and (b) if anyone notices, they won't do anything.

3. Incidental fabrication. In one paper, Chandra said that everyone asked to be in the study agreed to participate, even though the study involved having blood drawn many times. Potti claimed to be something similar to a Rhodes Scholar.

4. Found innocent. Years before Sternberg and I got involved, Chandra had been accused by his research assistant, a nurse. A Memorial University committee found him innocent of her accusations; at least, her accusations were not upheld. Chandra then sued the nurse. In the Potti case, a Duke University committee looked into the case and found no serious wrongdoing. A clinical trial based on the Potti results, which had been stopped, was resumed.
Factor 2 (outsiders important) is no surprise to readers of this blog, although the new account doesn’t mention it. But Factors 1 (reconstruction impossible) and 3 (incidental fabrication) mean that the fabrication should have been relatively easy to confirm. Yet Factor 4 seems to suggest it was hard to confirm. Factor 4 — in spite of Factors 1 and 3 — implies there is something mysterious and important going on here, more mysterious and interesting than someone lying. But I cannot say what.
The Significance article, which is by Darrel Ince, a professor of computing at the Open University, includes several suggestions for improving the system. I fail to see why they would help, and they have significant costs. One of them is to put the original data and software in an independent repository. I think this would make things worse. People would continue to fake research; they would now also fake the raw data, in addition to the graphs and tables needed for publication. In the past, thinking they wouldn't be caught, fakers would either (a) not make up the raw data (Chandra) or (b) do so carelessly (Potti). Their overconfidence was key to catching them.
My suggestion along these lines is a requirement that researchers make available upon request the raw data and any original software. They store it themselves, in other words. If they fail to fulfill outside requests for these materials within one month, that would be grounds for immediate retraction of the paper. Without something like this, a store-it-yourself requirement means little. I once requested the raw data for a paper that had appeared in a journal with a make-data-available policy. The authors refused my request. The editor did nothing. As A. W. Montford makes clear in The Hockey Stick Illusion, we would all be better off if Michael Mann and other authors had simply handed over the raw data behind their "hockey stick" temperature graphs when requested, rather than fighting a long string of FOIA battles (and mulling over which emails to delete).
As I understand it, the Duke case was not about fraudulent data but sloppy data analysis. Some may feel that “sloppy” is being too generous and that there was fraud in the analysis. But I don’t believe anyone is questioning the data itself, only the analysis.
Anil Potti avoided sharing his raw data with his bioinformatics and statistician colleagues. They had suspected fishy data handling well before the scandal broke. He made Joe Nevins believe he had done a better job analyzing their data than the statisticians could.
I like this idea. Presumably competing "data banks" would arise, offering to store the data and fulfill requests for a fee.
John, you’re right, I have not heard the data “questioned” — just the whole analysis, which could be said to include more than the data. On the other hand, you are the first person I’ve heard say that perhaps Potti was merely sloppy — that is, made honest mistakes. The whole picture doesn’t support that conclusion. There were too many problems, for one thing. And the effect of all the supposed sloppiness was far too favorable for the sloppy person.
Factor 4 does prove that the oversight committees are incompetent at oversight. They’re either not using the obvious metrics or invalidly explaining away negative results.
As always, I distrust official pronouncements unless I have specific evidence to confirm them.
Historically, such committees have been rubber-stamp providers, used as political gatekeepers. For example, if these scientists and committees were Chinese, everyone would just assume it was all political.
In this case, we have specific evidence that they are similar to the historical norm.
Fact is, unravelling the support for the corrupt institution is highly complex. (That's why they're the historical norm.) For example, having the entire voting public well-versed in history would do it, since it would then be well known that such committees are usually 100% political, but that's impossible. In any case, the perverse incentives have to be disrupted, and these particular ones stem from system-wide perverse incentives.
Though, eventually it will go away on its own. Rubber-stamp enough fraudulent studies, and even the thickest observer will realize the scientists aren’t reliable.
Both researchers are Indians. Indians cheat a lot.
You are missing the most important part of the story: what happened to the nurse? Was she punished? If so, the most important lesson we can draw is that if you see scientific misconduct on the part of a superior, keep your mouth shut.
“was the nurse punished?” As I said, she was sued. That is punishment, for sure.
I followed the Potti story more closely at the beginning than now. My colleagues were the ones who tried and failed to reproduce the analysis. At first it looked like incompetence. Now more people are suggesting fraud.
It’s sobering to realize how difficult it was to get anyone to acknowledge there was a problem here. Baggerly and Coombes had to be very persistent in their criticism before people started to pay attention. Eventually it became a scandal, but it would not have been if they had given up after, say, a year.
“At first it looked like incompetence.” At first it was interpreted as incompetence, I suppose, only because that was the more charitable and safer interpretation. It was not a likely explanation. Incompetence turns great raw data into bad surface data. Never the opposite. Potti had great surface data.
I believe you’re arguing that the probability of a positive result given an error is small. And I agree. Mistakes usually don’t help your case. But consider the opposite probability, the probability of an error given a positive result.
Suppose you’re working in an area where positive results are few and far between. What is the probability that a promising result resulted from a bungled analysis? Possibly high. If nearly every result is truly negative, the positive ones that catch our attention are likely to be in error. There could be an effect analogous to stochastic resonance.
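A back-of-the-envelope illustration of that reversal, with made-up numbers (the rates below are hypothetical, not taken from the Duke case): even if a bungled analysis only rarely looks promising, most promising results can still come from bungled analyses when real effects are rare.

```python
# Bayes-rule sketch with made-up rates; none of these numbers are from the Duke case.

p_real_effect = 0.01             # real effects are few and far between
p_promising_if_real = 0.80       # a sound analysis of a real effect usually looks promising
p_promising_if_no_effect = 0.05  # an error only occasionally makes nothing look promising

# Total probability of seeing a promising result.
p_promising = (p_real_effect * p_promising_if_real
               + (1 - p_real_effect) * p_promising_if_no_effect)

# Bayes' rule: how often a promising result reflects no real effect.
p_no_effect_if_promising = ((1 - p_real_effect) * p_promising_if_no_effect) / p_promising

print(f"P(promising | no real effect) = {p_promising_if_no_effect:.2f}")  # small (0.05)
print(f"P(no real effect | promising) = {p_no_effect_if_promising:.2f}")  # large (~0.86)
```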
Here’s a new article about the Potti case from The Economist that was just posted. https://www.economist.com/node/21528593
Here’s my favorite line from the article: “I find it ironic that we have been yelling for three years about the science, which has the potential to be very damaging to patients, but that was not what has started things rolling.”
Bad science didn’t bring Potti down; padding his resume did.
Thanks for the link, John. Excellent article. My favorite line is: Dr. Nevins “could not explain why he had not detected the problems even when alerted to anomalies.” In other words, Nevins did not understand the research in a paper that he co-authored.
Your comment about Nevins reminds me of someone — I forget who, I may have seen it on Andrew Gelman’s blog — who blamed problems with his article on a graduate student who had not been listed as a co-author. In other words, “I get the credit and my peons take the blame.” What a deal!