Will Eating Half a Stick of Butter a Day Make You Smarter?

To my pleasant surprise, Mark Frauenfelder posted this call for volunteers. Will eating half a stick of butter per day or a similar amount of coconut fat improve your performance on arithmetic problems? Eri Gentry is organizing a simple trial to find out. The trial is inspired by my recent Quantified Self talk. Study details.

During the question period of my talk, I responded to a question about a trial with 100 volunteers by saying I would suggest starting with 2 volunteers. A reader has written to ask why.

What’s your reasoning behind suggesting only 2 volunteers to test the eating more butter results? You seem highly convinced earlier in the video, but if you were so convinced why not have a larger trial?

Because the trial will be harder than the people running it expect. If you’re going to make mistakes, make small ones.

This is my first rule of science: Do less. A grad student in English once told me that a little Derrida goes a long way and a lot of Derrida goes a little way. Same with data collection. A little goes a long way and a lot goes a little way. A tiny amount of data collection will teach you more than you expect. A large amount will teach you less.

My entire history of self-experimentation started with a small amount of data collection: an experiment on the effectiveness of an acne medicine. It was far more informative than I expected. My doctor was wrong, I was wrong, and it had been so easy to find out.

This may sound like I am criticizing Eri’s study. I’m not. What’s important is to do something, however flawed, that can tell you something you didn’t know. Maybe that should be the first rule, or the zeroth rule. It has the pleasant and unusual property of being easier than you might think.

Thanks to Carl Willat.

The Contribution of John Ioannidis

From an excellent Atlantic article about John Ioannidis, who has published several papers saying that medical research is far less reliable than you might think:

A different oak tree at the site provides visitors with a chance to try their own hands at extracting a prophecy. “I [bring] all the researchers who visit me here, and almost every single one of them asks the tree the same question,” Ioannidis tells me . . . “‘Will my research grant be approved?’”

A good point. I’d say his main contribution, based on this article, is pointing out the low rate of repeatability of major medical findings. Until someone actually calculated that rate, it was hard to know what it was, unless you had inside experience. The rate turned out to be lower than a naive person might think. It was not lower than an insider might think, which explains the lack of disagreement:

David Gorski . . . noted in his prominent medical blog that when he presented Ioannidis’s paper on [lack of repeatability of] highly cited research at a professional meeting, “not a single one of my surgical colleagues was the least bit surprised or disturbed by its findings.”

I also like the way Ioannidis has emphasized the funding pressure that researchers face, as in that story about the oak tree. Obviously it translates into pressure to get positive results, which translates into overstatement.

I also think his critique of medical research has room for improvement:

1. Black/white thinking. He talks in terms of right and wrong. (“We could solve much of the wrongness problem, Ioannidis says, if the world simply stopped expecting scientists to be right. That’s because being wrong in science is fine.”) This is misleading. There is signal in all that medical research he criticizes; it’s just not as strong a signal as the researchers claimed. In other words, the research he says is “wrong” has value. He’s doing the same thing as all those meta-analyses that ignore all research that isn’t of “high quality”.

2. Nihilism (which is a type of black/white thinking). For example,

How should we choose among these dueling, high-profile nutritional findings? Ioannidis suggests a simple approach: ignore them all.

I’ve paid a lot of attention to health-related research and benefited greatly. Many of the treatments I’ve studied through self-experimentation were based on health-related research. An example is omega-3. There is plenty of research suggesting its value and this encouraged me to try it. Likewise, there is plenty of evidence supporting the value of fermented foods. That evidence and many other studies (e.g., of hormesis) paint a large consistent picture.

3. Bias isn’t the only problem, but, in this article, he talks as if it is. Bias is a relatively minor problem: you can allow for it. Other problems you can’t allow for. One is the Veblenian tendency to show off. Thus big labs count as better than small ones, regardless of which would make more progress; big studies count as better than small ones, expensive equipment better than cheap, and so on. Above all, useless counts as better than useful. The other problem is a fundamental misunderstanding of what causes disease and how to fix it. A large fraction of health research money goes to researchers who think that studying this or that biochemical pathway or genetic mechanism will make a difference, when the disease in question has an environmental cause. They are simply looking in the wrong place. I think the reason is at least partly Veblenian: studying genes is more “scientific” (= high-tech = expensive) than studying environments.

Thanks to Gary Wolf.

How to Lie with Meta-Analysis

Michael Constans drew my attention to a Consumer Reports article about spinal surgery. According to the article, a popular type of spinal surgery called vertebroplasty (injecting bone cement into fractured vertebrae) doesn’t work and should be stopped:

Despite the popularity of the procedure, the American Academy of Orthopedic Surgeons has just released a guideline [actually a meta-analysis of available data] saying it doesn’t work, and shouldn’t be used.

They reviewed all the literature about the procedure, and found two good-quality studies (randomized controlled trials) that show vertebroplasty works no better than a fake (placebo) procedure. There were no clinically significant differences in pain or disability, they say.

No significant difference means it “doesn’t work”? At best, it’s absence of evidence. And it’s not even that, because we don’t know what the rest of the research suggests. The meta-analysis might have ignored thirty studies; it doesn’t say how many studies were ignored. Nor does it say what “placebo” means. Patients are interested in pain relief. Whether the pain relief is “all in their head” or whatever hardly matters. From a patient’s view, and a clinician’s view, a better comparison is a group that gets another plausible treatment or no treatment. The meta-analysis reports one study that compared vertebroplasty to “conservative” treatment, which makes more sense.

So how come vertebroplasty has been used so often? Other experts have recommended the treatment in the past, including the United Kingdom’s National Institute for Health and Clinical Excellence, which said most people got some pain relief from the procedure.

This suggests that ignoring of evidence and poor choice of comparison group made a difference.

In this case, it probably comes down to what you mean by significant pain relief. The US surgeons set a strict definition—they said a difference in pain relief of less than 2 points on a 10-point scale was not meaningful for patients. Smaller differences in pain relief were recorded in the studies, but, say the surgeons, they were not big enough to make a real difference.

I think patients would prefer to decide for themselves what degree of pain relief is meaningful.

The surgeons say the evidence against vertebroplasty is strong, and they don’t expect future studies to overturn their recommendations.

Haha! Two studies with inferior comparison groups that failed to find a difference, plus an unknown number of ignored studies and an arbitrarily raised bar, add up to “strong evidence against”!
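
To see why “no clinically significant difference” is such weak evidence, here is a small simulation of my own. The effect size, variability, and trial size below are assumptions for illustration, not numbers from the studies; the point is that a real benefit smaller than the surgeons’ 2-point bar will fall below that bar most of the time.

    # A real but modest benefit can easily look like "no clinically
    # significant difference" in a small trial. All numbers below are
    # assumptions chosen for illustration, not taken from the studies.
    import numpy as np

    rng = np.random.default_rng(0)

    true_benefit = 1.5   # assumed true extra pain relief, on a 10-point scale
    sd = 2.5             # assumed patient-to-patient variability
    n_per_arm = 40       # assumed trial size
    bar = 2.0            # the surgeons' "clinically significant" cutoff

    below_bar = 0
    trials = 10_000
    for _ in range(trials):
        treated = rng.normal(true_benefit, sd, n_per_arm)
        control = rng.normal(0.0, sd, n_per_arm)
        if treated.mean() - control.mean() < bar:
            below_bar += 1

    print(f"simulated trials where a real {true_benefit}-point benefit "
          f"looked 'not clinically significant': {below_bar / trials:.0%}")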

Still, the Guideline and Evidence Report is useful. It reviews a range of treatments, provides citations (so you can look much further), and isn’t wedded to the placebo-comparison group. It includes studies with other comparison groups; the Consumer Reports article just doesn’t mention them.

And I completely agree with the article’s conclusion:

If you’re considering any type of surgery, ask your surgeon to show you data about how likely it is to solve your problem.

And don’t take your surgeon’s word for it that such data exists.

Science Journalism Cliches

I enjoyed this funny article about science-journalism cliches. Via Andrew Gelman. At the moment it has 643 comments. The five posts that preceded it (none of which Andrew linked to) have 19, 7, 6, 11, and 20 comments. Correlation or causation?

Last night someone asked me if it was hard to write scientific articles. I said no. As a friend said to me about her copy-editing job at The New Yorker, a trained monkey could do it. My articles are just as formulaic as everyone else’s. I hope the content isn’t formulaic, but the structure is.

Why Psychologists Don’t Imitate Economists

Justin Wolfers, an economist, via Marginal Revolution:

When I watch and speak with my friends in psychology, very little of their work is about analyzing observational data. It’s about experiments, real experiments, with very interesting interventions. So they have a different method of trying to isolate causation. I am certain that we have an enormous amount to learn from them. But I am curious why we have not been able to convince them of the importance of careful analysis of observational data.

By “careful analysis of observational data” I think Wolfers means the way economists search within observational data for comparisons in which the factor of interest is the only thing that changes (which is why he says “isolate” rather than “infer”). He’s right — it really is a methodological innovation that psychologists are unfamiliar with. It lies between ordinary survey data and experiments.
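
To make that concrete, here is one standard tool from the economists’ kit, difference-in-differences: compare the change over time in a group exposed to the factor of interest with the change in an unexposed group, so that influences common to both groups cancel out. This is only a sketch with made-up numbers (the effect, the group gap, and the time trend are all invented), not a claim about any particular study.

    # Difference-in-differences on made-up data. The treated group's change
    # minus the control group's change recovers the effect of the factor of
    # interest, even though the groups differ and everyone changes over time.
    import numpy as np

    rng = np.random.default_rng(1)

    n = 500
    true_effect = 2.0    # invented effect of the factor of interest
    group_gap = 5.0      # fixed difference between the two groups
    time_trend = 1.0     # change over time that affects everyone

    treated_before = rng.normal(group_gap, 1.0, n)
    treated_after = rng.normal(group_gap + time_trend + true_effect, 1.0, n)
    control_before = rng.normal(0.0, 1.0, n)
    control_after = rng.normal(time_trend, 1.0, n)

    estimate = (treated_after.mean() - treated_before.mean()) - (
        control_after.mean() - control_before.mean()
    )
    print(f"difference-in-differences estimate: {estimate:.2f} "
          f"(true effect: {true_effect})")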

Here’s why I think this innovation has had (and will have) little effect on psychology:

1. Most psychology professors are bad at math. They still use SPSS! It is terrible, but they think R is too difficult. Economics papers are full of math; that is part of the problem. Being bad at math also means they have trouble with basic statistical ideas. When analyzing data, they’re afraid they’ll do the wrong thing. For example, most psychology professors don’t transform their data; it wasn’t in some crummy textbook, so they are afraid of it (see the sketch after this list). Lack of confidence about math makes them resistant to new methods of analysis. Experimental data is much easier to analyze than observational data: you don’t need to be good at math to do a good job. So they not only cling to SPSS, they cling to experimental data.

2. Psychology studies smaller entities than economics. Study of the parts often influences study of the whole; the influence rarely goes the other way. This is why, when it comes to theory, physics will always have a much bigger effect on chemistry than vice-versa, chemistry a much bigger effect on biology than vice-versa. Method is different than theory but if you aren’t reading the papers — and physicists don’t read a lot of chemistry — you won’t pick up the methods.

3. There is a long history of longitudinal research in psychology: studying one or more groups of children year after year into adulthood. The Terman Genius project is the most famous example. I find these studies unimpressive. They haven’t found anything I would teach in an introductory psychology class. I think most psychologists would agree. This makes observational data less attractive by association.

4. Like everyone else, psychologists have been brainwashed with “correlation does not equal causation”. I have heard many psychology professors repeat this; I have never heard one say how misleading it is. To the extent they believe it, it pushes them away from observational data.

5. Psychologists rarely use observational data at all. To get them to appreciate sophisticated analysis of observational data is like getting someone who has never drunk any wine to appreciate the difference between a $20 wine and a $40 wine.
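
Here is the kind of transformation I mean in point 1, written in Python for the sake of a self-contained example (the reaction times below are simulated). Reaction times are right-skewed, and a one-line log transform makes them far better behaved before a standard test is applied.

    # Log-transforming skewed reaction-time data before a t-test.
    # The data are simulated; the point is only that the transform is
    # a one-line step, not something to be afraid of.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Simulated reaction times in seconds, right-skewed as real ones are.
    condition_a = rng.lognormal(mean=-0.5, sigma=0.4, size=30)
    condition_b = rng.lognormal(mean=-0.3, sigma=0.4, size=30)

    print(f"skewness before transform: {stats.skew(condition_a):.2f}")
    print(f"skewness after transform:  {stats.skew(np.log(condition_a)):.2f}")

    # The test itself is the same either way; only the data change.
    result = stats.ttest_ind(np.log(condition_a), np.log(condition_b))
    print(f"t-test on log(times): p = {result.pvalue:.3f}")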

Web Alternative to Peer Review

From a NY Times article:

Mixing traditional and new methods, the journal [“the prestigious Shakespeare Quarterly”] posted online four essays not yet accepted for publication, and a core group of experts . . . were invited to post their signed comments on the Web site MediaCommons, a scholarly digital network. Others could add their thoughts as well, after registering with their own names. In the end 41 people made more than 350 comments, many of which elicited responses from the authors. The revised essays were then reviewed by the quarterly’s editors, who made the final decision to include them in the printed journal, due out Sept. 17.

The NY Times article never says how many of the four posted essays were published. If all of them made the cut, then perhaps the web stuff was just for show. And if any of them didn’t make the cut, the public embarrassment would be great. Perhaps too great. I suspect that all of them made the cut and the whole thing was closer to a publicity stunt than something that you could plausibly do again and again. If the probability of acceptance given that your essay is posted is 100%, what matters is getting posted. Peer review wasn’t replaced by web review; it was replaced by behind-closed-doors review.

Another instance of academics outwitting this particular journalist:

To Mr. Cohen, the most pressing intellectual issue in the next decade is this tension between the insular, specialized world of expert scholarship and the open and free-wheeling exchange of information on the Web. “And academia,” he said, “is caught in the middle.”

Haha! Poor poor professors! Caught in the middle! I was under the impression that professors = expert scholarship. Anything to distract attention from the real change: The more education you can get from the Web, the less you need to get from professors. The more evaluation you can get from the Web (e.g., by reading someone’s blog), the less you need to get from professors. The less professors are needed, the fewer of them there will be.

Thanks to Dave Lull.

Open-Access Publication Fees at the BMJ

Open access is why you’re reading this. Because my long self-experimentation paper was in an open-access journal, many people could easily read it. I’m sure this is why I managed to get a contract to write The Shangri-La Diet.

The BMJ is experimenting with a way to support open access: Ask for publication fees from authors with grants that include the appropriate support.

We are introducing this policy as the next step in our efforts to ensure the sustainability of open access publication of research in the BMJ, and we are doing so in the spirit of experimentation. Many research funding organisations, sponsors, and universities now provide grants that cover journals’ fees for open access publication.

Wise. While I was writing The Shangri-La Diet, I visited Alice Waters’s Edible Schoolyard. I learned that it cost hundreds of thousands of dollars a year, provided by foundations. As far as I could tell, the people in charge were doing nothing to reduce the subsidy required. Yet they wanted the idea to spread.

Arithmetic and Butter

On Tuesday I gave a talk called “Arithmetic and Butter” at the Quantified Self meeting in Sunnyvale. I had about 10 slides but this one mattered most:

It shows how fast I did simple arithmetic problems (e.g., 2*0, 9-6, 7*9) before and after I started eating 1/2 stick (60 g) of butter every day. The x axis covers about a year. The butter produced a long-lasting improvement of about 30 msec.
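
My own testing program isn’t shown here, but a minimal version of that kind of test is easy to write. The sketch below (the problem mix, trial count, and scoring are illustrative, not exactly what I used) times one-digit arithmetic problems and reports the median time in milliseconds, which is the kind of number plotted above.

    # A minimal timed-arithmetic test: one-digit problems, answer times in
    # milliseconds, median time over the correct answers. Illustrative only;
    # the real test's problem mix, trial count, and scoring may differ.
    import operator
    import random
    import time
    from statistics import median

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def run_session(trials=32):
        times_ms = []
        for _ in range(trials):
            a, b = random.randint(0, 9), random.randint(0, 9)
            symbol, fn = random.choice(list(OPS.items()))
            if symbol == "-" and b > a:   # keep subtraction answers non-negative
                a, b = b, a
            start = time.perf_counter()
            answer = input(f"{a} {symbol} {b} = ")
            elapsed_ms = (time.perf_counter() - start) * 1000
            if answer.strip() == str(fn(a, b)):   # keep correct answers only
                times_ms.append(elapsed_ms)
        return median(times_ms)

    if __name__ == "__main__":
        print(f"median time for correct answers: {run_session():.0f} ms")

Run once a day, the session median gives one point for a graph like the one above.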

I think the hill shape of the butter function is due to running out of omega-3 in Beijing — my several-months-old flaxseed oil had gone bad, even though it had been frozen. When I returned to Berkeley and got fresh flaxseed oil, my scores improved.

This isn’t animal fat versus no animal fat. Before I was eating lots of butter, I was eating lots of pork fat. It’s one type of animal fat versus another type. Nor is it another example of modern processing = unhealthy. Compared to pork fat, butter is recent.

Most scientists think philosophy of science is irrelevant. Yet this line of research (measuring my arithmetic speed day after day, in hopes of accidental discovery) derived from a philosophy of science, which has two parts. First, scientific progress has a power-law distribution. Each time we collect data, we sample from a power-law-like distribution. Almost all samples produce tiny progress; a very tiny fraction produce great progress. Each time you collect data, in other words, it’s like buying a lottery ticket. I realized that a short easy brain-function test allowed me to buy a large number of lottery tickets at low cost. Second, we underestimate the likelihood of extreme events. Nassim Taleb has argued this about the likelihood of extreme negative events (which presumably have a power-law distribution); I’m assuming the same thing about extreme positive events (with a power-law distribution). We undervalue these lottery tickets, in other words. Perhaps all scientists hope for accidental discoveries. I seem to be the first to use a research strategy that relies on accidental discoveries.
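
The lottery-ticket argument is easy to simulate. Suppose each data-collection episode pays off according to a heavy-tailed (Pareto-like) distribution; the tail exponent below is arbitrary, chosen only to illustrate the shape of the claim, not estimated from anything.

    # If payoffs from data collection are power-law distributed, most of the
    # total value comes from a tiny fraction of the "tickets" -- which is why
    # many cheap measurements beat a few expensive ones. The tail exponent is
    # arbitrary; this illustrates the argument, it does not estimate anything.
    import numpy as np

    rng = np.random.default_rng(3)

    tickets = rng.pareto(a=1.2, size=100_000) + 1   # heavy-tailed payoffs
    tickets.sort()

    top = len(tickets) // 100                       # the top 1% of tickets
    share = tickets[-top:].sum() / tickets.sum()

    print(f"share of total payoff from the top 1% of tickets: {share:.0%}")
    print(f"median payoff: {np.median(tickets):.2f}   "
          f"largest payoff: {tickets.max():.0f}")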

In the graph, note that one point (actually, two) is down at 560 msec. This suggests there’s room for improvement.

How Well Do Authors of Scientific Papers Respond to Criticism?

This BMJ research asked how well authors responded to criticism in emailed letters to the editor. A highly original subject, but the researchers, one of whom (Fiona Godlee) is the top BMJ editor, appear lost. They summarize the results but appear to have no idea what to learn from them, ending their paper with this:

Editors should ensure that authors take relevant criticism seriously and respond adequately to it.

That would have been a perfectly reasonable thing to say before any data were collected, so it is not a good conclusion; the data added nothing to it.

The real conclusion is this: The letters to the editor were far better than nothing because authors responded to their criticisms about half the time.

Quantity Versus Quality of Research

In this interview, Craig Venter says that sequencing the human genome has had “close to zero” medical benefits so far. I thought this comment was even more interesting:

The human genome project was . . . supposed to be the biggest thing in the history of biological sciences. Billions in government funding for a single project — we had never seen anything like that before in biology. And then a single person comes along and beats scientists who have been working on it for years.

The government-funded people used inferior methods, said Venter:

Initially, Francis Collins and the other people on the Human Genome Project claimed that my methods would never work. When they started to realize that they were wrong, they began personal attacks against me.

The government-funded research was high in quantity (“billions”) but low in quality.

A similar story emerged from the Netflix Prize competition. Netflix had in-house researchers who had tried to do the same thing as the competitors for the prize: predict ratings. The algorithm they’d developed took two weeks to run. According to my friend David Purdy, one of the competitors for the prize managed to compute the same thing in an hour, the same sort of speed-up that Venter is talking about. The in-house research was high in quantity (it had been going on for years) but low in quality.

From my point of view, a similar story comes from my self-experimentation. Working alone, with no funding, I found several ways to improve my sleep — avoiding breakfast, standing a lot, standing on one foot, eating pork fat, etc. In contrast, professional sleep researchers have found nothing that has helped me improve my sleep. There are hundreds of sleep researchers and they’ve received hundreds of millions of dollars in funding.

Why such big differences in outcome? I think it has to do with the price of failure. When the government-funded genome researchers used inferior methods, nothing happened. They’d already gotten the grant. In contrast, Venter’s group got nothing until they succeeded. In the case of the Netflix in-house researchers, use of inferior methods cost them nothing; they still got paid. Whereas the prize competitors didn’t get paid unless they won. Use of inferior methods would cause them to lose. In the case of the sleep researchers, lack of practical results cost them nothing. They could still have a successful career. Whereas to me, without practical results I had nothing.

Thanks to Paul Sas.