I have been asked to write six columns about common scientific mistakes for the journal Nutrition. This is a draft of the first. I am very interested in feedback, especially about what you don’t like.
Lesson 1. Doing something is better than doing nothing.
“You should go to the studio every day,” a University of Michigan art professor named Richard Sears told his students. “There’s no guarantee that you’ll make something good — but if you don’t go, you’re guaranteed to make nothing.” The same is true of science. Every research plan has flaws, often big ones — but if you don’t do anything, you won’t learn anything.
I have been asked to write six columns about common scientific mistakes. The mistakes I see are mostly mistakes of omission: things not done, rather than things done badly.
A few years ago I visited a pediatrician in Stockholm. She was interested in the connection between sunlight and illness (children are much healthier in the summer) and had been considering doing a simple correlational study. When she told her colleagues about it, they said: Your study doesn’t control for X. You should do a more difficult study. It was awful advice. In the end, she did nothing.
Science is all about learning from experience. It is a kind of fancy trial and error. But this modest description is not enough for some scientists, who create rules about proper behavior. Rule 1. You must do X (e.g., double-blind placebo-controlled experiments). Rule 2. You must not do Y (e.g., “uncontrolled” experiments). Such ritualistic thinking is common in scientific discussions, and it hurts not only the discussants (it makes them dismissive) but also those they might help. Sure, some experimental designs are better than others. It’s the overstatement, the notion that experiments of a certain kind are not worth doing, that is the problem. It is likely that the forbidden experiments, whatever their flaws, are better than nothing.
One group that has suffered from this way of thinking is people with bipolar disorder. Over the last thirty years, few new treatments for this problem have been developed. According to Post and Luckenbaugh (2003, p. 71), “many of us in the academic community have inadvertently participated in the limitation of a generation of research on bipolar illness . . . by demands for methodological purity or study comprehensiveness that can rarely be achieved.”
Rituals divide behavior into right and wrong; science is more practical. The statistician John Tukey wrote about ritualistic thinking among psychologists in an article called “Analyzing data: Sanctification or detective work?” (Tukey, 1969). One of his examples involved measurement typology. The philosopher of science N. R. Campbell had come up with the notion, popularized by Stevens (1946), that scales of measurement could be divided into four types: ratio, interval, ordinal, and nominal. Weight and age are ratio scales, for example; rating how hungry you are is an ordinal measure. The problem, said Tukey, was the accompanying prohibitions. Campbell said you can add two measurements (e.g., two heights) only if the scale is ratio or interval; if you are dealing with ordinal or nominal measures, you cannot. The effect of such prohibitions, said Tukey, is to make it less likely that you will learn something you could have learned. (See Velleman and Wilkinson, 1993, for more about what’s wrong with this typology.)
I fell victim to right-and-wrong thinking as a graduate student. I had started to use a new way to study timing and had collected data from ten rats. I plotted the data from each rat separately and looked at the ten graphs. I did not plot the average of the rats because I had read an article about how, with data like mine, averages can be misleading — they can show something not in any of the data being averaged. For example, if you average bimodal distributions you may get a unimodal distribution and vice versa. After several months, however, I averaged my data anyway; I can’t remember why. Looking at the average, I immediately noticed a feature of the data (symmetry) that I hadn’t noticed when looking at each rat separately. The symmetry was important (Roberts, 1981).
A corollary is this: If someone (else) did something, they probably learned something, and you can probably learn something from what they did. For a few years, I attended a meeting called Animal Behavior Lunch where we discussed new animal behavior articles. All of the meetings consisted of graduate students talking at great length about the flaws of that week’s paper. The professors in attendance knew better, but somehow we did not manage to teach this. The students seemed to have a very strong bias to criticize. Perhaps they had been told that “critical thinking” is good; they may never have been told that appreciation should come first. I suspect that this failure to teach graduate students to see clearly the virtues of flawed research is the beginning of the problem I discuss here: mature researchers who don’t do this or that because they have been told not to do it (it is “flawed”) and as a result do nothing.
References
Post, R. M., & Luckenbaugh, D. A. (2003). Unique design issues in clinical trials of patients with bipolar affective disorder. Journal of Psychiatric Research, 37(1), 61-73.
Roberts, S. (1981). Isolation of an internal clock. Journal of Experimental Psychology: Animal Behavior Processes, 7, 242-268.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24, 83-91.
Velleman, P. F., & Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician, 47(1), 65-72.
“only if the scale is ratio or interval; if you are dealing with interval measures, you cannot.”
Did you mean “only if the scale is ratio or ordinal” here?
I’m sympathetic to your argument; I’d like it more if it addressed the most popular and/or best-in-your-opinion counterarguments. Given that you are encouraging scientists to generate results more aggressively, you might start with “how can false results do harm?”
That’s a good point, thanks.
It isn’t easy to argue how more evidence can be worse. It’s not “results” that are false (unless fabricated); it’s the conclusions drawn from them.
Generally, I’d like to see clarification that distinguishes processes that lead to knowledge or hypotheses, but cannot be used to completely confirm hypotheses or establish knowledge, from processes or methods that can.
Or perhaps activities that lead to greater observation or awareness of phenomena, or to further hypotheses. Part of what you’re identifying is disjunctive thinking and an unwillingness in scientists to acknowledge a huge grey area. There is a huge continuum between partial confirmation of a hypothesis and totally unquestionable science. What determines what falls where on that continuum? Of course a controlled 5-year study with thousands of subjects might be ideal in confirming the Shangri-la Diet, but as more people have success with it over longer periods of time, it becomes more and more trivial.
A glaring error related to people who read about your diet: there seem to be doctors and experts who dismiss it as “placebo” when they could confirm that extra light olive oil, and associated methods, reduce hunger in the time between breakfast and lunch. Can you articulate the mind-set that leads to such intellectual incompetence? An enthusiastic teenager might be more likely to confirm what’s going on with the olive oil than many educated or mis-educated people. Much of the problem seems to be letting the perfect be the enemy of the good. There is a belief that because I know the hypothesis that I am trying to confirm, I cannot experiment on myself. But it would be pretty bizarre if my beliefs could alone take away hunger.
Part of the problem is that you are threatening some people’s sense of their own authority and power. To try out your method and see that it reduces hunger would be ego-threatening.
It’s true that one person cannot completely confirm something through self-experimentation, because there may be something unusual about them resulting in the effect. But even 5 or 10 people all getting the same effect in self-experimentation gives me, at least, a lot of confidence that something interesting is going on, though we always have to be careful about self-fulfilling prophecies.
I agree that there’s too much emphasis on the negative, but I would say that grad students are encouraged to be negative: I actually failed one of my qualification exams and had to take the entire thing the next year for including the following in the evaluation of a study:
“The lack of a real control group, subject self-selection, and other threats to internal validity mentioned above make it impossible to distinguish regression to the mean from a treatment effect… However, this study takes place in the real world in which the prospects for good multi-year studies are severely limited by funding, compliance with protocols, and cooperative participants. Policy decisions need to be made even on programs whose effectiveness have not been evaluated at all. Assessing the study from a pragmatic policy-making perspective, interventions such as these have great promise… so some generosity should be shown in interpreting the results… [The program] is relatively low cost compared with alternatives… and unlike these programs, it has been studied. We can’t prove that the intervention had any effect…, but it may very well be the best option among the set of alternatives.”
Janet R., you failed your qualifying exam because you included praise — pointing out virtues — in one of your answers? Astonishing. Can you explain why? I’ve never heard of such a thing.
On the internet we call them trolls.