Earlier in this series, I talked about the “quality assuring” (QA) aspects of formal peer review; then we took a look at preprints, postprints and Versions of Record. In this post, I’m focusing on the Why and the How of replication studies, and at the end I’ll wrap up with a brief look at retractions, the ultimate “undo” button for papers. Together, these make up the parts of the standard research and reporting process aimed at improving the quality and reliability of research outputs. The effort put into them, which can be considerable, serves to improve the scientific record, which in turn helps all of us.
A long time ago (1934 or so; from 1959 in English), the Viennese philosopher of science Karl Popper articulated what has since become known as the “criterion of falsifiability”: the notion that, for a proposition to be considered scientific, it must be capable, at least in principle, of being refuted by an experiment. It may not be a perfect criterion, but applying it does serve to separate science from pseudo-science, and that, in the words of Martha Stewart, is “a good thing.”
In his book, The Logic of Scientific Discovery, Popper wrote:
“We do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them. Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence,’ but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.”
Admittedly, that is all a bit jargon-y, but what I see Popper as getting at here is reflected in the noble practice, in scientific research and publishing, of demanding that experiments be re-run to see whether their results hold up. Ideally, the initial results will check out; if they don’t, maybe the methods were invalid or the data wasn’t of sufficient quality in the first place. In other words, if something is amiss, it is much better to find out as soon as possible. In biomedicine, this rigor is enforced by testing a new substance or device in stages: first in vitro (essentially, in a dish), then in vivo (i.e., in live subjects, starting with animal testing), and only then in human subjects (control groups and all that). Then, with luck, final approvals will be obtained from the FDA (or other authorities) for specific uses. What we see here is the scientific process at work, striving to provide treatments that are both safe and effective for doctors to prescribe and for people to use.
At the most abstract level, scientific studies (specifically, experiments designed to ferret out new knowledge) produce outcomes that indicate . . . something. Maybe the something is a result consistent with “the null hypothesis”: effectively, no meaningful relation between this and that shows up in the results. Alternatively, some new and interesting result may be uncovered, as when penicillin was shown to have a strong antibacterial effect. But such discoveries are not complete, and are not considered reliable, until they have been replicated by others. In basic terms, a replication study is one in which the conditions, data collection and procedures of an original experiment are repeated to see whether the results come out the same. Of course, there is more to it than that.
Although I am sure there are many others, the greatest failure-to-replicate example that I know of is the Fleischmann-Pons “cold fusion” debacle of 1989. (I recall following the controversy in Usenet’s sci.physics newsgroups in near-real time. It was fascinating, even to an outsider like me.) Bear in mind that “cold fusion” (sometimes referred to as “desktop fusion”) would certainly have changed the world of energy and fuel production, if the effect were real.
The summary paragraph provided in the Wikipedia entry on “Cold Fusion” is precise and to the point:
“In 1989, two electrochemists, Martin Fleischmann and Stanley Pons, reported that their apparatus had produced anomalous heat (“excess heat”) of a magnitude they asserted would defy explanation except in terms of nuclear processes. They further reported measuring small amounts of nuclear reaction byproducts, including neutrons and tritium. The small tabletop experiment involved electrolysis of heavy water on the surface of a palladium (Pd) electrode. The reported results received wide media attention and raised hopes of a cheap and abundant source of energy.”
Boiled down to essentials, the Fleischmann-Pons team’s initial claims concerning detectable energy production, and particularly their inference as to its source, failed to replicate and failed to hold up under closer scrutiny. The more specialists refined the experimental procedure, the smaller the apparent net energy effect became, when it appeared at all. Although this was unfortunate for the reputations of the two scientists, the replication step did its job: relatively quickly, it showed that there were significant problems with the experiment and that the claimed results were not to be relied on.
Note: At this distance in time, it appears that the Fleischmann-Pons and follow-on experiments found something interesting, but not energy at the levels they thought, and certainly nothing that should be referred to as “cold fusion.”
And that brings us to retractions.
Even where the underlying research and the resulting article have cleared all the usual hurdles and been published in a reputable (not fly-by-night) journal, major problems with the overall work are sometimes (rarely) identified after the fact. The procedure for addressing such problems is known as “making a retraction” (aka “pulling the paper”), and only an unusual sort of person would enjoy doing it. Even though many (most?) retractions may be the result of honest and very subtle mistakes on the part of the research team, it is understandable that those involved often feel frustration, sadness and perhaps even a touch of anger. The author or research team might feel defeated or even rejected, the editor and journal likely perceive that they have suffered a loss of reputation, and readers may feel let down by the whole business. Retractions can happen quickly, or they can be slow and take years to complete. For those who need to follow such things, Retraction Watch is a good aggregator of retraction events. All that said, retractions are the “after-the-fact” part of the quality process in publishing, and in that way they are a good thing, although perhaps a temporarily painful one: their existence is evidence that the system, however imperfectly, is working.
According to the Oxford Dictionaries, “quality assurance” may be functionally defined as “the maintenance of a desired level of quality in a service or product, especially by means of attention to every stage of the process of delivery or production.” Throughout this short series, we’ve focused on the many steps, at every stage, used in scientific and scholarly publishing to ensure and improve the quality and reliability of published articles. As is often said about peer review (and, I think, equally true of the other steps), these are the inglorious (or, maybe, simply non-glorious) but necessary procedures that must be established and followed if a top-quality product and reputation are to be earned and maintained.