Stiffening the Standards of Scientific Research

Oct 24, 2013

The Economist has published a scathing indictment of scientific research titled “Unreliable research: Trouble at the lab.” The headline proclaims: “Scientists like to think of science as self-correcting. To an alarming degree it is not.”

An accompanying article, “Problems with scientific research: How science goes wrong,” tells us, “Scientific research has changed the world. Now it needs to change itself.”

While I agree that changes are called for in certain standards and practices, it is wrong to conclude that there are any fundamental flaws in the basic methods of science. When science is done properly, it remains the most powerful force for human advancement the world has ever seen.

The Economist reports on attempts to replicate widely cited biomedical experiments. The results are admittedly dismal. In one study, published in Nature, only six of 53 so-called “landmark” studies in cancer research could be replicated. An official at the National Institutes of Health estimates that three-quarters of all published biomedical findings would be hard to reproduce.

I am not the slightest bit surprised. For years I have been crusading against the low publication standards used in biomedical and psychological journals, among others. The Economist cites one of these common standards: “When testing a specific hypothesis, scientists run statistical checks to work out how likely it would be for data which seem to support the idea to have come about simply by chance. If the likelihood of such a false-positive conclusion is less than 5 percent, they deem the evidence that the hypothesis is true ‘statistically significant.’ They are thus accepting that one result in 20 will be falsely positive–but one in 20 seems a satisfactorily low rate.”

And that’s the problem. One in 20 is far from a satisfactorily low rate. Consider what that implies. If 100 experiments in a variety of fields test effects that do not actually exist, then on average about five of them will show effects that are taken to be significant but are really just statistical fluctuations. But it’s worse than that. A good number of the other 95 experiments, which saw “no effect,” are likely not to be published, since only positive results tend to be published, at least in some fields. In fact, it would be a reasonable conclusion that most, if not all, published claims of new phenomena at the 5 percent level in these fields are simply wrong.
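
The arithmetic in the paragraph above can be checked with a small simulation. This is a minimal sketch, not anything taken from the studies discussed: it assumes every one of the 100 experiments tests an effect that does not exist, so each p-value is uniformly distributed between 0 and 1, and it counts how many fall below the 5 percent threshold by chance. The function name and its parameters are illustrative.

```python
import random


def average_false_positives(n_experiments=100, alpha=0.05, n_batches=10_000, seed=1):
    """Average number of "significant" results per batch of null experiments.

    Assumption: every experiment tests an effect that does not exist, so its
    p-value is uniformly distributed on [0, 1) and falls below `alpha` with
    probability `alpha`.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(n_batches):
        # Count the experiments in this batch that cross the threshold by chance.
        total += sum(1 for _ in range(n_experiments) if rng.random() < alpha)
    return total / n_batches


if __name__ == "__main__":
    print(f"{average_false_positives():.2f} false positives per 100 null experiments")
```

If only the significant results from such a batch reach print, every published “finding” in it is a fluke.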

Indeed, that was exactly the conclusion of Stanford epidemiologist John Ioannidis in a 2005 paper titled “Why Most Published Research Findings Are False.”
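
Ioannidis’s argument rests on a simple calculation. In its simplest form, ignoring the bias term his paper also includes, the probability that a statistically significant finding is true (the positive predictive value, PPV) depends on the significance threshold alpha, the study’s power (1 - beta), and the pre-study odds R that a tested relationship is real: PPV = (1 - beta) R / ((1 - beta) R + alpha). The sketch below just evaluates that expression; the function name is hypothetical, and the values of R and power in the example are assumptions chosen for illustration, not figures from the paper.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Probability that a 'significant' finding reflects a real effect.

    prior_odds: ratio of true to false hypotheses among those tested (R).
    power:      probability a study detects an effect that is real (1 - beta).
    alpha:      false-positive rate of the significance test.
    Bias is ignored in this simplified form.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative assumption: 1 real effect for every 10 null hypotheses tested.
    for power in (0.8, 0.35, 0.2):
        ppv = positive_predictive_value(prior_odds=0.1, power=power)
        print(f"power {power:.2f}: PPV = {ppv:.2f}")
```

With these assumed inputs, PPV falls below one half whenever power drops below 50 percent, which is one route to the conclusion that most published findings can be false.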

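More recently, Ioannidis and colleagues reported that the median statistical power of neuroscience studies, that is, the chance that a study will detect an effect that is really there, is only about 21 percent. Another report puts the typical power of psychology studies at a ridiculously low 35 percent.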

Back in the 1960s, when I was still a graduate student at UCLA, high-energy particle accelerators were producing new elementary particles left and right. Not all of those reported were being replicated, and eventually the field of particle physics adopted the “5-sigma” rule for the official announcement of any new phenomenon. This corresponds to a statistical significance of one part in 3.5 million, to be compared with one in 20 for the biomedical standard (2-sigma). The discovery of the Higgs boson, reported in July 2012, was at the 5-sigma level in each of two independent experiments, implying an overall probability that the effect was a statistical fluctuation of about one in 10 trillion.
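
The conversions between sigmas and odds in this paragraph follow from the tail of the normal (Gaussian) distribution. The sketch below uses Python’s math.erfc; it assumes a one-sided tail for 5 sigma, the convention particle physicists use for a discovery, and a two-sided tail for the biomedical 2-sigma figure. Multiplying the two 5-sigma probabilities reproduces the back-of-the-envelope estimate for two independent experiments. The helper names are illustrative.

```python
import math


def one_sided_tail(n_sigma):
    """P(Z > n_sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


def two_sided_tail(n_sigma):
    """P(|Z| > n_sigma) for a standard normal variable Z."""
    return math.erfc(n_sigma / math.sqrt(2))


if __name__ == "__main__":
    p5 = one_sided_tail(5)
    print(f"5 sigma, one-sided: 1 in {1 / p5:,.0f}")                  # about 3.5 million
    print(f"2 sigma, two-sided: 1 in {1 / two_sided_tail(2):.0f}")    # about 22, roughly 1 in 20
    # Two independent 5-sigma results, naively multiplied together:
    print(f"two independent 5-sigma results: 1 in {1 / p5**2:.1e}")  # about 1.2e13
```

The product comes out near one in 12 trillion, in line with “about one in 10 trillion.” Strictly, one in 20 corresponds to about 1.96 sigma on a two-sided test.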

The Economist mentions the particle physics standard but properly points out that maximizing a single figure of merit is “never enough.” It gives the example of the report of a “pentaquark” by several research groups in the mid-2000s that met the 5-sigma test. This was a particle that appeared to be composed of five quarks.

I remember this incident well because if the pentaquark existed I should have seen it in my PhD thesis experiment, published in 1963 (fifty years ago this month!). I didn’t. And that’s because it isn’t there. As The Economist reports, the experiments were not properly blinded. When this was corrected, the pentaquark disappeared.

However, this example should not be taken as an indictment of science. Rather, it shows that good science is still self-correcting. As in the case of the false report last year of neutrinos moving faster than the speed of light, when the scientific method is carried out competently and honestly, the natural human mistakes that are inevitably going to be made in the process are eventually uncovered.

Now, I am not suggesting that the biomedical sciences adopt the 5-sigma criterion and require independent replications before any new potential cure is clinically applied. Because these standards would be almost impossible to meet, the result would be very few cures indeed. Medicine is under pressure to produce cures quickly, while no one would have died had the Higgs boson not been discovered for another ten years or so.

However, the current low standards in medicine and psychology can’t be doing anyone any good if most published findings are in fact false. At the very least, the statistical standard for all publications should be raised from 2-sigma to 3-sigma, which corresponds to a false-positive rate of about one in 370. Also, more replication should be encouraged (and funded).
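
Raising the bar has a cost as well as a benefit, which is the tension described for medicine above. Here is a minimal sketch under loudly stated assumptions: a two-sided test with its threshold set at k sigma, real effects whose expected test statistic sits 3 sigma from zero (hypothetical), and pre-study odds of one real effect per 10 hypotheses tested (also hypothetical). It computes the false-positive rate, the power, and the resulting share of significant findings that are real, using the same simplified positive-predictive-value expression associated with Ioannidis. All function names are illustrative.

```python
import math


def upper_tail(x):
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def false_positive_rate(k_sigma):
    """Two-sided false-positive rate for a threshold of k sigma."""
    return 2 * upper_tail(k_sigma)


def power(k_sigma, effect_sigma):
    """Chance a real effect, expected at `effect_sigma`, clears the threshold.

    The tiny chance of crossing the threshold on the wrong side is ignored.
    """
    return upper_tail(k_sigma - effect_sigma)


def share_of_findings_real(k_sigma, effect_sigma, prior_odds):
    """Fraction of significant results that are real (positive predictive value)."""
    hits = power(k_sigma, effect_sigma) * prior_odds
    return hits / (hits + false_positive_rate(k_sigma))


if __name__ == "__main__":
    # Hypothetical inputs: effects expected at 3 sigma, 1 real effect per 10 tested.
    for k in (2, 3):
        print(
            f"{k}-sigma threshold: false positives 1 in {1 / false_positive_rate(k):.0f}, "
            f"power {power(k, 3):.2f}, "
            f"real share {share_of_findings_real(k, 3, 0.1):.2f}"
        )
```

With these made-up inputs, the 3-sigma rule lifts the share of significant findings that are real from about 65 to about 95 percent, but power falls from about 84 to 50 percent, so more real effects are missed unless studies grow larger.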

Furthermore, the value of publishing negative results should be recognized. Again referring to my own field, one of the most important negative results has been the continual failure to detect proton decay. Each new published experiment sets a more stringent limit on the decay lifetime that helps theorists rule out various models and guides experimentalists on where to look next.
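
How a run of null results turns into a tighter bound can be sketched with the textbook counting argument. Assuming no background and zero candidate events, Poisson statistics give a 90 percent confidence upper limit of ln 10, about 2.3 expected decays, so the lifetime must exceed (protons × years × detection efficiency) / 2.3. The detector numbers in the example are round, hypothetical values for illustration, not those of any particular experiment, and the function names are illustrative; real analyses also account for background and quote limits for each decay mode separately.

```python
import math


def poisson_upper_limit_zero_events(confidence=0.9):
    """Upper limit on the expected count when zero events are observed.

    Solves exp(-mu) = 1 - confidence, i.e. the Poisson chance of seeing
    nothing falls to 1 - confidence. Background is assumed to be zero.
    """
    return -math.log(1 - confidence)


def lifetime_lower_limit(n_protons, years, efficiency, confidence=0.9):
    """Lower limit on the proton lifetime (in years) from a null search.

    For a lifetime much longer than the exposure, the expected number of
    detected decays is n_protons * years * efficiency / lifetime.
    """
    exposure = n_protons * years * efficiency
    return exposure / poisson_upper_limit_zero_events(confidence)


if __name__ == "__main__":
    # Hypothetical detector: 7.5e33 protons watched with 50% efficiency.
    for years in (5, 10, 20):
        limit = lifetime_lower_limit(7.5e33, years, 0.5)
        print(f"{years:>2} years, no decays seen: lifetime > {limit:.1e} years")
```

Because the bound grows in proportion to exposure, each additional year of seeing nothing pushes it further out, which is why these negative results keep carrying information.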

And mark my words: someday protons will be seen to decay and help tell us why the universe is still here after starting out with an equal amount of matter and antimatter that should have totally annihilated 13.8 billion years ago. If it had, medical standards would be moot.

But one part in a billion of the protons and antiprotons did not annihilate and we are here as the result. We may not be here for much longer, though, unless the highest possible standards are applied in the conduct of science of every kind.