Suppose you ask several experts how to choose a good car. Their answers reveal they don’t know how to drive. What should you conclude? Suppose these experts build cars. Should we trust the cars they’ve built?
Gina Kolata writes that “experts agree that there are three basic principles that underlie the search for medical truth and the use of clinical trials to obtain it.” Kolata’s “three basic principles” reveal that her experts don’t understand experimentation.
Principle 1. “It is important to compare like with like. The groups you are comparing must be the same except for one factor — the one you are studying. For example, you should compare beta carotene users with people who are exactly like the beta carotene users except that they don’t take the supplement.” An expert told her this. This — careful equation of two groups — is not how experiments are done. What is done is random assignment, which roughly (but not perfectly) equates the groups on pre-experimental characteristics. A more subtle point is that the X versus No X design is worse than a design that compares different dosages of X. The latter design makes it less likely that control subjects will get upset because they didn’t get X and makes the two groups more equal.
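To make the randomization point concrete, here is a minimal sketch (mine, not Kolata's or her experts'), using made-up numbers, of how random assignment roughly but imperfectly equates groups on a pre-experimental characteristic:

```python
# A minimal sketch (my own, with made-up numbers): random assignment roughly,
# but not exactly, equates the two groups on a pre-experimental characteristic.
import numpy as np

rng = np.random.default_rng(0)

n = 200                                # hypothetical number of participants
age = rng.normal(50, 10, size=n)       # a pre-experimental characteristic

order = rng.permutation(n)             # random assignment: shuffle, then split
treatment, control = order[:n // 2], order[n // 2:]

print(f"mean age, treatment group: {age[treatment].mean():.1f}")
print(f"mean age, control group:   {age[control].mean():.1f}")
# The two means are close but not identical; the same rough (not perfect)
# balance holds for characteristics nobody thought to measure.
```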
Principle 2. “The bigger the group studied, the more reliable the conclusions.” Again, this is not what happens. No one with statistical understanding judges the reliability of an effect by the size of the experiment; they judge it by the p value (which takes account of sample size). The more subtle point is that the smaller the sample size, the stronger the effect must be to get reliable results. Researchers want to conserve resources, so they keep experiments as small as possible. Small experiments with reliable results are more impressive than large experiments with equally reliable results — because the effect must be stronger. This is basically the opposite of what Kolata says.
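A rough illustration of this point (my own simulation, not from the article): a small experiment and a large experiment can both give reliable results, but the small one needs a much stronger effect to do so.

```python
# A rough illustration (my own simulation): similar p values can come from a
# small experiment with a strong effect or a large experiment with a weak one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def experiment_p_value(n_per_group, effect_sd):
    """Simulate a two-group experiment; return the two-sample t-test p value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(effect_sd, 1.0, n_per_group)
    return stats.ttest_ind(treated, control).pvalue

print("n=10 per group, effect of 1.5 sd:    p =", experiment_p_value(10, 1.5))
print("n=1000 per group, effect of 0.15 sd: p =", experiment_p_value(1000, 0.15))
# Reliability is judged by the p value, which already takes sample size into
# account; given equally small p values, the smaller study implies the
# stronger effect.
```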
Principle 3. In the words of Kolata’s expert, it’s “Bayes’ theorem”. He means: consider other evidence — evidence from other studies. This is not only banal, it is meaningless. It is unclear — at least from what Kolata writes — how to weigh the various sources of evidence (what if the other evidence and the clinical trials disagree?).
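For what it's worth, here is one toy version (my numbers, purely illustrative) of what "use Bayes' theorem to weigh other evidence" could mean in practice:

```python
# A toy Bayes calculation (my own numbers, purely illustrative) of weighing a
# positive trial against other, weaker evidence.
prior = 0.10    # other evidence: 10% chance the treatment really works
power = 0.80    # P(positive trial | treatment works)
alpha = 0.05    # P(positive trial | treatment doesn't work)

posterior = power * prior / (power * prior + alpha * (1 - prior))
print(f"P(treatment works | positive trial) = {posterior:.2f}")  # about 0.64
# Even a "significant" trial leaves plenty of doubt when the prior evidence is
# weak; that is exactly the kind of weighing Kolata's account never spells out.
```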
Kolata also quotes David Freedman, a Berkeley professor of statistics who knew the cost of everything and the value of nothing. Perhaps it starts in medical school. As I blogged, working scientists, who have a clue, don’t want to teach medical students how to do research.
If this is the level of understanding of the people who do clinical trials, how much should we trust them? Presumably Kolata’s experts were better than average — a scary thought.
Well I don’t know about clinical trials, but I know we shouldn’t ever trust Kolata. She had an illuminating back-and-forth with Gary Taubes at one point (you documented it) which demonstrated her inability to read and understand simple English sentences, much less basic science.
Andrew, she wanted to understand clinical trials, so she talked to some experts who do them. What she found revealed incompetence, which is interesting. Talking to quantitative political scientists or psychometricians wouldn’t be a good way to learn about clinical trials.
Principle 2: An experimental psychologist who read a report of an experiment (in psychology) with a large sample size (e.g., n = 20) would be suspicious: Why such a large sample size? It must mean the effect is weak, or maybe they first ran the experiment with n = 10 (a typical size) and didn’t find anything. Either way, it would mean something was off. In this sense, the larger the sample size, the less trustworthy the result.
Andrew, by “effect” I meant experimental effect. The whole discussion is about experiments. The sex ratio stuff in your Amer Scientist article isn’t experimental (= does not come from experiments). I’m happy to learn about an example that contradicts what I said but it would need to be an experiment.
Your sister’s research isn’t experimental psychology, it’s developmental psychology. I agree, the term experimental psychology (= perceptual and cognitive psychology and animal learning) isn’t terribly clear to outsiders. Developmental psychology experiments tend to have larger n’s than experimental psychology experiments.
You’re being unusually uncharitable in your reading here, Seth. I don’t see anything inaccurate in her article. It’s a bit imprecise or unclear in places (for instance, she shouldn’t have said “exactly”), and it all seems pretty basic, but I don’t see this deep ignorance of research design that you’re reading into her article.
Her first principle is that you need to eliminate confounding variables so that you can be confident that differences are due to the factor that you’re trying to study. She describes random assignment as the standard way to do this (I’m not sure why you think she doesn’t understand random assignment when she discusses it right there, explaining why randomization is better than observational studies that try to statistically control for differences).

The second principle is saying (correctly) that larger studies give you a more precise estimate of the effect size. Studies with a smaller sample size have wider confidence intervals. A point estimate of a 20% reduction in risk may be misleading if the confidence interval is a 5%-35% reduction.

The third principle is that other evidence can continue to be relevant after you’ve done a full study with random assignment. She seems to reach the correct conclusions about the two examples that she describes, one (prayer) where she thinks you should doubt the results of the study because of other evidence and one (beta carotene) where she thinks that you should trust the results of the study despite the other evidence, although you’re right that she doesn’t give much of an explanation of how to reach these conclusions.
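To put the precision point in numbers, here is a small sketch (my own, with made-up event counts): the same 20% relative risk reduction comes with a much wider confidence interval in a small trial than in a large one.

```python
# A small sketch (my own, made-up counts): the same 20% relative risk
# reduction has a much wider 95% confidence interval in a smaller trial.
import numpy as np

def relative_risk_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk and 95% CI via the usual log-scale normal approximation."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = np.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    return rr, np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)

# 10% risk in control, 8% in treatment (a 20% relative reduction) in both cases.
print(relative_risk_ci(80, 1000, 100, 1000))      # small trial: wide interval, crosses 1
print(relative_risk_ci(800, 10000, 1000, 10000))  # bigger trial: same estimate, tight interval
```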
True, she does mention randomization. Maybe her mistake was to ask an epidemiologist about clinical trials, not realizing that epidemiologists do surveys, not experiments. Equating the groups being compared is a much bigger deal for epidemiologists than for experimenters.
I don’t think it’s obvious that the beta-carotene clinical trials are more trustworthy than the other beta-carotene studies. I’d have to know a lot more about the details before I’d reach that conclusion. For example, large clinical trials allow vast possibilities for data entry errors, which will reduce differences between groups. I know an example where a transcription error wasn’t noticed for 40 years. Did the MRFIT clinical trial reach the right conclusion (of no effect)? It’s still hard to know.
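One way data-entry errors can shrink group differences, sketched as a quick simulation (my own, hypothetical numbers): randomly mis-entered group labels pull the two groups toward each other.

```python
# A quick simulation (my own, hypothetical numbers): randomly mis-entered group
# labels attenuate the apparent treatment effect toward zero.
import numpy as np

rng = np.random.default_rng(2)

n = 5000
group = np.repeat([0, 1], n)               # 0 = control, 1 = treatment
outcome = rng.normal(0.5 * group, 1.0)     # true effect: 0.5 sd

error_rate = 0.10                          # 10% of labels entered incorrectly
flipped = rng.random(2 * n) < error_rate
recorded = np.where(flipped, 1 - group, group)

true_diff = outcome[group == 1].mean() - outcome[group == 0].mean()
observed_diff = outcome[recorded == 1].mean() - outcome[recorded == 0].mean()
print(f"difference with correct labels:     {true_diff:.2f}")
print(f"difference with mis-entered labels: {observed_diff:.2f}")  # noticeably smaller
```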
What neither Kolata nor her experts understand is that until something more accurate than “randomized clinical trials” comes along, we have no way of generally assessing their accuracy — just as the problem with eyewitness testimony only became apparent when DNA testing came along.
Another NY Times expert: The fat gent heading up Yale’s Obesity Center:
https://www.nytimes.com/2009/07/04/health/04patient.html
“What matters most is your level of motivation and your willingness to change,” says Kelly D. Brownell, a psychologist and director of the Rudd Center for Food Policy and Obesity at Yale.
Really, Dr. Brownell? How would you know?