Adding to earlier discussion of that question, here is an excellent article in the LA Times.
Addendum: A long article in the NY Times on exactly this question appeared minutes after I posted the above.
An interesting Economist article about sex differences in a visual task calls an evolutionary explanation a “just-so story.” I don’t know if the late Stephen Jay Gould, evolutionary theorist, Harvard professor, and “one of the most influential and widely-read writers of popular science of his generation” (Wikipedia), invented this form of dismissal, but certainly he was fond of it. Here, for example:
Evolutionary biology has been severely hampered by a speculative style of argument that . . . tries to construct historical or adaptive explanations for why this bone looked like that or why this creature lived here. These speculations have been charitably called “scenarios”; they are often more contemptuously, and rightly, labeled “stories” (or “just-so stories” if they rely on the fallacious assumption that everything exists for a purpose). Scientists know that these tales are stories; unfortunately, they are presented in the professional literature where they are taken too seriously and literally.
Well, this is seriously wrong. My work contains several just-so stories — evolutionary explanations of the morning-faces effect and of the mechanism behind the Shangri-La Diet, for example. My theory of human evolution might be called a just-so saga.
These explanations made me (at least) believe more strongly in the result or theory they explained — which turned out to be a good thing. My morning-faces result was at first exceedingly implausible. The evolutionary explanation encouraged me to study it more. After repeating it hundreds of times, I no longer need the evolutionary explanation to believe it, but the explanation may help convince others to take it seriously. The evolutionary explanation connected with the Shangri-La Diet had the same effect. My evolutionary explanation of the effect of breakfast on sleep led me to do the experiment that discovered the morning-faces effect. My theory of human evolution led me to try new ways of teaching, with good results.
Why did Gould make this mistake? Thorstein Veblen wrote about our fondness for “invidious comparisons.” We like to say our X is better than someone else’s X. Sure, evolutionary explanations may be hard to test. That doesn’t mean they’re worthless. Like many scientists, Gould failed to grasp that something is better than nothing.
Addendum: Perhaps the Economist writer had read a recent Bad Science column that began:
I want you to know that I love evolutionary psychologists, because the ideas, like “girls prefer pink because they need to be better at hunting berries” are so much fun. Sure there are problems, like, we don’t know a lot about life in the pleistocene period through which humans evolved; their claims sound a bit like “just so” stories, relying on their own internal, circular logic; the existing evidence for genetic influence on behaviour, emotion, and cognition, is coarse; they only pick the behaviours which they think they can explain while leaving the rest; and they get themselves in massive trouble as soon as they go beyond examining broad categories of human behaviors across societies and cultures, becoming crassly ethnocentric.
“They only pick the behaviours which they think they can explain” — how dare they!
A friend of mine has started to wonder how to find scientists he will feel comfortable working with. For the past year, he has been working in a lab in a very prestigious institution. He wrote me about it:
The director of my lab is a very successful scientist. She is also director of the research facility. Our personalities blended well initially, but then we grew apart. She is very nice, very busy, and impressively ambitious. Despite her genuine desire to be nice, honest, and a good teacher, her ambition is supreme — above honesty and integrity from my point of view.
My biggest issue has been her caring more about her own advancement than about the discovery of truth. She does not blatantly lie about her research results, but she profoundly modulates her research efforts based upon what she believes will give her the success she seeks. I realize that on the face of it there is not anything unethical about ambition directing the evolution of research. However, I am not comfortable with the degree to which the research in this group is shaped by its leader’s ambition.
What has had the biggest effect on me is realizing that it isn’t just her. The rest of my group has allowed her to pursue her strategies. I have realized that I don’t want to pursue research in a culture where ambition is above all, particularly the pursuit of truth.
What’s an example? I asked. He replied:
We had an interesting result in a study we did. Accompanying this result was an unusual artifact. It is my impression that my director did not want to publish our good result because she was hesitant to admit that we observed this unusual artifact. I believe that the unusual artifact could negatively impact the use of fMRI to investigate pharmacological drugs that affect the brain — a big research market. It is not a lie to not publish a result. However, I don’t like not being able to speak frankly about the implications of a result.
This is an advantage of self-experimentation I hadn’t thought of.
Addendum: “It is not a lie to not publish a result.” In The Shangri-La Diet I use Vladimir Nabokov’s term doughnut truth — the whole truth, nothing but the truth, with a hole in the truth.
I read The Theory of the Leisure Class by Thorstein Veblen during college and was very impressed. One of the book’s main points is that wealthy people advertise their avoidance of “dirty” work. Long fingernails on women. Obscure and elaborate phrases in academic articles. “The advantage of the accredited locutions lies in their reputability; they are reputable because they are cumbrous and out of date, and therefore argue waste of time and exemption from the use and the need of direct and forcible speech,” wrote Veblen.
A friend of mine does research for an oil company. Several years ago, the oil company he worked for (Company X) was bought by another oil company (Company Y), which merged their research departments. Company X’s research group moved to the research campus of Company Y. Following the move, each Company X researcher was asked to give a talk about his recent work. My friend wrote an abstract for his talk. The seminar coordinator — from Company Y — came into my friend’s office with his abstract and said to him, “Could you deemphasize the parts involving real data? We don’t deal with real data here.”
This was true. The Company Y researchers included many theorists, heavily into abstruse mathematical models. Others were coding new algorithms and relied on model “data” for testing, but not actual data. In contrast, many of Company X’s researchers, including my friend, “got their hands dirty.” After my friend’s talk, several people told him how nice it was to hear about real data.
You can see this tendency everywhere at UC Berkeley, from English to Statistics to Engineering to Psychology. Disciplines that began closely connected with reality and everyday concerns moved farther and farther away. A few days ago someone complained to me about a class where students graded each other’s papers. That’s academia, I said.
In 2000, Hal Pashler and I published a paper called “How persuasive is a good fit? A comment on theory testing.” For more than 50 years, psychologists had supported mathematical theories by showing that the equations of the theory could fit data. We pointed out that this was a mistake because no account was taken of the flexibility of the theory. A too-flexible theory can fit anything. However obvious this may sound to outsiders, the practice we criticized was common (and continues).
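For readers who want the flavor of the problem, here is a toy illustration (not from our paper; the polynomial simply stands in for any theory with many adjustable parameters): a model that is flexible enough can produce an impressive-looking fit to data that contain no structure at all.

```python
# Toy illustration (not from Roberts & Pashler, 2000): a flexible model
# "fits" pure noise. The degree-9 polynomial stands in for any theory
# with many adjustable parameters. Assumes numpy is installed.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)

r2_values = []
for _ in range(1000):
    y = rng.normal(size=x.size)          # data with no structure at all
    coeffs = np.polyfit(x, y, deg=9)     # 10 free parameters for 12 points
    fit = np.polyval(coeffs, x)
    r2 = 1 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)
    r2_values.append(r2)

# The median R^2 comes out high even though there is nothing to explain,
# which is why a good fit by itself says little about a flexible theory.
print(f"median R^2 on pure noise: {np.median(r2_values):.2f}")
```

The point is not that fitting is useless, only that the persuasiveness of a fit depends on how hard it would have been for the theory to fail.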
Recently I asked Hal: Is the problem we pointed out an example of something more general? Neither Hal nor I had a good answer to this. Both of us thought the practice we had criticized was what Feynman called cargo-cult science — looks like science but isn’t — but that was more of a derogatory description than anything else.
Now I think I have a helpful answer: What we pointed out was an example of the general point Thorstein Veblen made in The Theory of the Leisure Class: The growth of worse-than-useless practices among the well-off. Foot-binding. Hood ornaments. Long words and bad writing in scholarly articles. Conspicuous waste. The last chapter of Veblen’s book is about academia.
The Wall Street Journal’s Numbers Guy columnist recently wrote about how the average cost of weddings is reported. He pointed out that the averages are means, not medians, that they don’t include certain groups, and so on. It was one of the better numerical discussions I’ve seen in a newspaper.
However, it was about 25% of an ideal discussion. When I was a freshman in college, I went to a talk about life on other planets. The speaker wrote a bunch of numbers on the board, multiplied them together, and came up with something that was supposed to estimate the number of other planets with life. After the talk, I asked, “What’s the error in that number?” The speaker had no idea.
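Here, with numbers I have made up (they are not the speaker’s), is the kind of answer I was hoping for: multiply several uncertain factors together and the uncertainties compound, which a quick Monte Carlo run makes visible.

```python
# Sketch with invented numbers (not the speaker's): propagate uncertainty
# through a Drake-style product of factors by Monte Carlo. Assumes numpy.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def factor(center, spread=3.0):
    # Each factor is only known to within a factor of ~3 (illustrative),
    # modeled as lognormal around a guessed central value.
    return rng.lognormal(mean=np.log(center), sigma=np.log(spread), size=n)

# Five made-up factors multiplied together, Drake-equation style.
estimate = factor(1e11) * factor(0.1) * factor(0.5) * factor(0.01) * factor(0.001)

lo, mid, hi = np.percentile(estimate, [5, 50, 95])
print(f"median ~ {mid:.1e}, 90% range ~ {lo:.1e} to {hi:.1e}")
# The 90% range spans several orders of magnitude. That spread is the
# "error in that number" the speaker could not supply.
```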
If the Numbers Guy gave his column as a talk, during the question period I would say: “You’ve told us what’s wrong with those numbers. Thanks. I’d also like to know what’s good about them.” His column and blog contain nothing about this.
Here’s my answer:
1. Sure, the median is more interesting than the mean. Because the distribution is obviously skewed positive (like the distribution of incomes), the mean provides an upper bound on the median. If the mean is $30,000, for example, the median must be less. That’s helpful to know.
2. Assuming the distribution of wedding costs resembles the distribution of incomes, I’d guess that the median is somewhere between half and two-thirds of the mean. So the mean is providing even more useful information. (A rough simulation of this guess follows this list.)
3. The false precision of some estimates (e.g., “$27,852”) indicates the numerical savvy of their source. That too is helpful to know. I will take the rest of what they say less seriously. In a talk I attended, Richard Herrnstein, the Harvard psychologist, said a certain t value was so large that he had to use a special table to find the associated p value. This was an accurate foreshadowing of the quality of The Bell Curve, which Herrnstein co-authored.
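Here is the rough simulation promised in point 2. It assumes wedding costs are lognormal, which is a guess about the shape of the distribution rather than something the column reports, and the particular sigma is invented too.

```python
# Rough check of point 2, assuming (a guess, not survey data) that
# wedding costs are lognormal with a mean of $30,000.
import numpy as np

rng = np.random.default_rng(0)
mean_cost = 30_000
sigma = 1.0                              # income-like right skew, invented
mu = np.log(mean_cost) - sigma ** 2 / 2  # chosen so the mean comes out right

costs = rng.lognormal(mu, sigma, size=1_000_000)
print(f"mean   ~ ${costs.mean():,.0f}")
print(f"median ~ ${np.median(costs):,.0f}")
print(f"median/mean ~ {np.median(costs) / costs.mean():.2f}")
# With this sigma the ratio lands near 0.6; a different sigma gives a
# different ratio. The robust part is that the mean bounds the median
# from above whenever the distribution is skewed this way.
```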
That brings us to about half of a good discussion. The other half would come from eliminating the long discussion of sampling bias. Yes, the wedding industry loves sampling methods that overestimate the average cost. I knew that before I read the column. What I don’t know is a method that will tend to underestimate the average cost and thus provide a lower bound. That’s what I’d like to read about.
Something is better than nothing. Micronutrient requirements.
In a recent post I guessed that it would be better to begin to study the effects of omega-3 and other fats on the brain with healthy subjects than with “unhealthy” ones — that is, persons with obvious brain dysfunction. So far, almost all behavioral studies of omega-3 have used unhealthy subjects — adults with bipolar disorder or depression, children with coordination problems, autism, or ADHD. My guess was based on three things:
1. A thought experiment. Imagine trying to learn how cars work. You’d rather experiment with working cars than broken cars.
2. Healthy subjects are far more available and easier to study.
3. The work of Saul Sternberg, who pioneered the study of memory using tests on which subjects are very accurate (e.g., 95% correct). The main measure of performance on these tasks was speed (called reaction time) rather than accuracy. After his work, reaction-time experiments became far more popular.
In my study of the effects of flaxseed oil, I had directly compared high- and low-accuracy tasks: I measured the effects of flaxseed oil with two high-accuracy tasks (arithmetic and memory-scanning) and a low-accuracy task (digit span). The effects were much clearer (smaller p values) with the high-accuracy tasks.
I asked Sternberg what he thought of my guess. He wrote, “I certainly agree that it is worth studying the effects of X on ‘normal’ brains, where X can be many things,” and later added:
I suspect my decision to measure [reaction] time under conditions of high accuracy was multiply determined, and that the determinants included some speculative notions. E. g. I may have thought that the variety of strategies is greater when the system is overloaded and errors are occurring than when it is functioning smoothly, so one was more likely to get clear answers about an underlying mechanism. Also, there was something of a tradition of measuring RT in experiments on “information processing” that weren’t normally described as memory experiments, but could be. Another reason was probably that I felt that RT – a continuous measure – probably contained more “information” than errors, with a few discrete possibilities, did.
It is possible that the emphasis in memory experiments on studying accuracy when the relevant brain system is failing was influenced by the study of sensory processes, where the experimental and analytic techniques (e.g., for measuring discriminability and detectability) were well worked out, and where it is believed that the enterprise has been highly successful. Also, sensory detectability and discriminability may be more intrinsically interesting and more closely related to actual situations of practical concern than accurate performance.
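A small simulation may make the high-accuracy point concrete. The effect sizes below are invented (they are not my flaxseed data), and it assumes numpy and scipy are available: a small improvement is much easier to detect in a continuous reaction-time measure than in accuracy that is already near ceiling.

```python
# Invented numbers (not the flaxseed-oil data): a small improvement shows
# up clearly in reaction time but is hard to detect in near-ceiling
# accuracy. Assumes numpy and scipy are installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 200  # trials per condition

# Reaction times (ms): a 20 ms speed-up against 80 ms trial-to-trial noise.
rt_before = rng.normal(600, 80, n_trials)
rt_after = rng.normal(580, 80, n_trials)
_, p_rt = stats.ttest_ind(rt_before, rt_after)

# Accuracy: an improvement from 95% to 96% correct. Each trial yields
# only a right/wrong outcome, and the ceiling leaves little room to move.
acc_before = (rng.random(n_trials) < 0.95).astype(float)
acc_after = (rng.random(n_trials) < 0.96).astype(float)
_, p_acc = stats.ttest_ind(acc_before, acc_after)

# Across reruns with different seeds, the reaction-time comparison is
# usually significant and the accuracy comparison usually is not.
print(f"reaction time: p = {p_rt:.4f}")
print(f"accuracy:      p = {p_acc:.4f}")
```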
In Forbes, Nassim Taleb, author of The Black Swan, made some comments I like:
Things, it turns out, are all too often discovered by accident. . . . Academics are starting to realize that a considerable component of medical discovery comes from the fringes, where people find what they are not exactly looking for. It is not just that hypertension drugs led to Viagra or that angiogenesis drugs led to the treatment of macular degeneration, but that even discoveries we claim come from research are themselves highly accidental. They are the result of undirected tinkering narrated after the fact, when it is dressed up as controlled research. The high rate of failure in scientific research should be sufficient to convince us of the lack of effectiveness in its design. If the success rate of directed research is very low, though, it is true that the more we search, the more likely we are to find things “by accident,” outside the original plan.
If the success rate per test is low, a good research strategy is to start with low-cost tests. Ants do this: They search with low-cost tests (single ants), exploit with high-cost tests (many ants). I don’t think the need to use different tools at different stages in the scientific process is well understood. John Tukey used the terms exploratory data analysis and confirmatory data analysis to make this point about data analysis but distinguishing exploratory and confirmatory experimental design is much less common.
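Here is a back-of-the-envelope version of the ant strategy, with invented numbers: when most ideas are wrong, screening everything with cheap exploratory tests and saving the expensive confirmatory tests for the survivors lowers the cost per real discovery.

```python
# Back-of-the-envelope sketch, all numbers invented: cost per real
# discovery with expensive tests only vs. a cheap screen followed by
# expensive confirmation (the ant strategy).
p_true = 0.05        # fraction of candidate ideas that are actually real
cheap_cost = 1.0     # cost of a quick exploratory test (a self-experiment, say)
costly_cost = 100.0  # cost of a confirmatory test (a formal experiment)
cheap_hit = 0.8      # chance the cheap test flags a real effect
cheap_false = 0.1    # chance the cheap test flags a non-effect anyway

n = 1000  # candidate ideas

# Strategy A: run the expensive confirmatory test on everything.
cost_a = n * costly_cost
found_a = n * p_true

# Strategy B: cheap screen first, confirm only what survives the screen
# (assume the confirmatory test then sorts real effects from false alarms).
survivors = n * (p_true * cheap_hit + (1 - p_true) * cheap_false)
cost_b = n * cheap_cost + survivors * costly_cost
found_b = n * p_true * cheap_hit

print(f"cost per discovery, confirm everything:  {cost_a / found_a:,.0f}")
print(f"cost per discovery, screen then confirm: {cost_b / found_b:,.0f}")
```

With these numbers the two-stage strategy finds slightly fewer of the real effects but at a small fraction of the cost per discovery, which seems to be the trade-off the ants make.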
I think my self-experimentation has been productive partly because it is a low-cost way of testing. All my interesting discoveries were accidents. My latest omega-3 research started with an accidental observation.
Yesterday I attended the annual convention of the American Psychological Association (APA) in San Francisco. I was also measuring (again) the time course of omega-3 effects. The exhibits hall was full of books. I picked up three introductory psychology texts to see what they said about nutrition. None of their indexes listed nutrition; apparently they said nothing about it. None of the hundreds of books I saw was about nutrition — that is, about how to nourish the brain. Yet the APA is mainly about mental health.
It’s not just the APA. At Berkeley, I’ve attended dozens of talks in the Nutrition Department. I have never seen another psych professor or grad student at any of them. Nor have I seen a nutrition professor at any Psychology Department talk. Both disciplines have Annual Review series. In the last seven years, there hasn’t been a single article in the Nutrition series about behavior or cognition (aside from eating) nor a single article in the Psychology series about nutrition (aside from an article about weight control).
Sometimes interdisciplinary work is hard. Cognitive science has tried to unite computer scientists with linguists and philosophers and psychologists. That’s hard because computer scientists are engineers, not scientists, and philosophers are neither. But nutrition and psychology are both experimental sciences. Nutrition centers on an independent variable (food), psychology on a dependent variable (behavior). They naturally go together, especially if you are concerned with mental health.
Now and then someone will study how Disorder X responds to Nutritional Treatment Y — how depression responds to omega-3, for example. Better than nothing, absolutely, but not the best approach. By the time something is broken it is likely to be (a) a mess and therefore hard to measure and (b) hard to put back together. If you want to learn how a car works, should you study a car that works or a car that doesn’t work? The answer isn’t obvious, at least to cognitive psychologists, because for half a century they mainly studied how memory, perception, etc., failed. In the 1960s, Saul Sternberg taught the rest of the profession a better approach — namely, study a car that works. Sternberg made popular the kinds of experiments usually done today: reaction time experiments with easy problems that subjects almost always get right. My omega-3 research has illustrated the truth of Sternberg’s general point. I found much clearer effects of flaxseed oil on easy tasks (easy arithmetic, an easy memory task) than on a difficult task (digit span). A better way to learn how food affects our brains will be to study the effect of food on healthy brains. Such experiments will be much much easier than studying people who are depressed, children with ADD, schizophrenics, autistic children, drug addicts, and so on. I’m sure that the conclusions from healthy brains will generalize to malfunctioning brains, just as all cars — working or broken — work the same way.
In a recent post I said that scientists are often much too dismissive. They are “evidence snobs,” Alex Tabarrok might say. A letter in the current issue of the American Journal of Clinical Nutrition criticizes an important example of just such dismissiveness:
In conclusion, whereas we agree that policy decisions should be evidence-based and not hasty, we do not agree that the evidence base [used to make those decisions] should be constrained to one type of study [long-term randomized controlled trials]—in particular, not to a study design that is inherently limited. Do we really want to wait perhaps decades for results of long-term RCTs, which almost certainly will not provide definitive evidence, while ignoring other relevant evidence involving shorter-term endpoints? An example is provided in the panel’s own summary statement (2). In lauding RCTs as the “gold standard for evidence-based decision making,” the panel proudly points to the fact that, even though folate was well known to decrease the risk of neural tube defects in animal studies, policy recommendations for folate supplementation to prevent neural tube defects were delayed while authorities waited some years for confirmation from RCTs. One can only wonder how many infants were born with neural tube defects while authorities waited.
“Proudly,” huh? Inclusion of that word shows how pissed the authors of the letter are — and rightly so. One author is Bruce Ames, a neighbor of mine, for whom I have great respect; another is Walter Willett, the Harvard epidemiologist. In 1998, Willett wrote a smart article challenging the popular belief that a low-fat diet is a good way to lose weight.
Here is part of the reply from the authors of the report that Ames et al. criticized:
It is important to note that our panel was not charged with asking whether vitamins and minerals play a role in human disease — a topic that occupies much of the letter by Ames et al., and for which observational evidence is indeed central — but, as a State-of-the-Science Panel, was charged to reflect on the state of the available evidence for a treatment recommendation on the use of vitamins and minerals in the general population. For treatment decisions, the RCT is the established standard. No better proof of this principle can be found than in the RCTs reviewed in our report, which showed serious harm from vitamin ingestion in certain circumstances.
A less-than-reassuring answer. A commentator on my earlier post thought I should address the strongest arguments on the other side. I had trouble thinking of any. It’s hard to argue that less evidence is better. You can see that those who wrote this paragraph — some of the most prominent nutrition scientists in the country — were equally baffled.
I will revise my “common mistakes” article to mention the Ames et al. letter.