The Ketogenic Diet and Evidence Snobs

If we can believe a movie based on a true story, the doctors consulted by the family with an epileptic son in …First Do No Harm knew about the ketogenic diet but (a) didn’t tell the parents about it, (b) didn’t take it seriously, and (c) thought that irreversible brain surgery should be done before trying the diet, even though the diet was of course much safer. Moreover, these doctors had an authoritative book to back up these remarkably harmful and unfortunate attitudes. The doctors in …First Do No Harm, as far as I can tell, reflected (and still reflect) mainstream medical practice.

Certainly the doctors were evidence snobs — treating evidence not from a double-blind study as worthless. Why were they evidence snobs? I suppose the universal tendency toward snobbery (we love feeling superior) is one reason but that may be only part of the explanation. In the 1990s, Phillip Price, a researcher at Lawrence Berkeley Labs, and one of his colleagues were awarded a grant from the Environmental Protection Agency (EPA) to study home radon levels nationwide. They planned to look at the distribution of radon levels and make recommendations for better guidelines. After their proposal was approved, some higher-ups at EPA took a look at it and realized that the proposed research would almost surely imply that the current EPA radon guidelines could be improved. To prevent such criticism, the grant was canceled. Price was told by an EPA administrator that this was the reason for the cancellation.

This has nothing to do with evidence snobbery. But I’m afraid it may have a lot to do with how the doctors in …First Do No Harm viewed the ketogenic diet. If the ketogenic diet worked, it called into question their past, present, and future practices — namely, (a) prescribing powerful drugs with terrible side effects and (b) performing damaging and irreversible brain surgery of uncertain benefit. If something as benign as the ketogenic diet worked some of the time, you’d want to try it before doing anything else. This hadn’t happened: The diet hadn’t been tried first; it had been ignored. Rather than allow evidence of the diet’s value to be gathered, which would open them up to considerable criticism, the doctors did their best to keep the parents from trying it. Much like canceling the radon grant.

The ketogenic diet.

The Scientific Method, Half-Finished but Wholly Accepted

In a science classroom at a middle school I saw a poster about “the scientific method.” There were seven steps; one was “analyze your data.” According to the poster, you use the data you’ve collected to say if your hypothesis was right or wrong. Nothing was said about using data to generate new hypotheses. Yet coming up with ideas worth testing is just as important as testing them.

It’s like teaching the alphabet and omitting half of the letters. Or teaching French and omitting half the common words. No one actually teaches only half the alphabet or half the common French words, of course, but this is how science is taught, not just in middle school but everywhere. The poster correctly reflects the usual understanding. I have seen dozens of books about scientific method. They usually say almost nothing about how to come up with a new idea worth testing. An example is Statistics for Experimenters, a well-respected book by Box, Hunter, and Hunter. One of the authors (George Box) is a famous statistician.

The curious part of this omission is how unnecessary it is. Every scientific idea we now take for granted started somewhere. It would be no great effort to find where a bunch of them came from.

Before There Was News, There Was Gossip

Did the professionalization of science (the fact that people could make a living doing science) cause harm? More science was done, but scientists, now professionals, were no longer free to pursue the truth in any direction, because their jobs and status were at stake. It’s plausible. Recall that Mendel and Darwin were amateurs. A more recent example is Alister Hardy, the Oxford professor who conceived the aquatic ape theory of evolution. He didn’t pursue it because he feared loss of reputation. The more sophisticated conclusion, I suppose, isn’t that professionalization was bad but that loss of diversity was bad. We need both amateur and professional scientists because each can do things the other can’t. Right now we have only professional ones. No one encourages amateur science; amateurs have no way to publish their work. (Unless, like Elaine Morgan, who wrote several books about the aquatic ape theory, you’re a professional writer.)

These thoughts were prompted by this remarkable blog post, which has nothing to do with science. What an amazing piece of writing, I thought. I don’t even agree with it, and here I am staring at it. A work of genius? No, lots of blog posts are really good. This one was merely better than most. Would something this brazen and effective appear in any major magazine, newspaper, TV show, radio ad, etc.? No, not even. Do we realize that, all these years, stuff like this has been missing from our media consumption? No, we don’t. Before there was news, there was gossip, I realized; news (such as newspapers) was a kind of professionalization of gossip. The blog post I admired was a bit of riveting creative gossip. Blogs are just new-fangled gossip. Bloggers are endlessly scandalized, indignant, judgmental, just as gossips are. Just as gossip is usually “passed on,” most blog posts have links and many posts consist almost entirely of “passing on” something. Just as gossip can be anything, bloggers can say what they really think, as Tyler Cowen pointed out. That’s why they’re so successful, so easy to write and read. Gossip is good for our mental ecology, just as science is. Mark Liberman’s Language Log blog is a blend of (good) gossip and science; as you can see from my interview with him, it filled a gap. I hope blogs will provide a kind of support structure on which amateur science can grow.

Tools Not Rules

I am fascinated by how human nature interferes with science. This article in the Wall Street Journal helped me understand one way this happens.

A civility campaign in Howard County, Maryland, centered on a book called Choosing Civility: The Twenty-five Rules of Considerate Conduct (2002) by P. M. Forni, a Johns Hopkins professor of romance languages. Rule 7, for example, is “don’t speak ill.” The book bothered Heather Kirk-Davidoff, a pastor. She visited Professor Forni. “Jesus didn’t say, ‘I am the rule,’ right?” she told him. Professor Forni agreed. “Yes, Jesus said, ‘I am the way.’ If I had met you before, probably I would have used way. The 25 Ways of Being Considerate and Kind,” he said.

Hmm. The way versus the rule: similar. The way versus a way: big difference. Neither the professor nor the pastor noted that a better title would omit the: 25 Ways of Being…

The writer of a book about civility — in that very book — fails to grasp a big point about civility. The pastor who points out the problem makes a similar omission. Our tendency to turn tools into rules must be strong.
If you invent a useful tool, you have made the world a better place. If you denigrate non-users, the improvement is less obvious. Randomization, for example, is a tool. Many scientists treat it like a rule. Were I to write a book on scientific method, it would contain a paragraph beginning: “A few years ago, the head librarian of the Howard County, Maryland, county library bought 2300 copies of a book called . . .”
Twisted skepticism.

Buried Treasure (part 2)

Before the invention of statistical tests, such as the t test, science moved forward. People gathered data, computed averages, drew reasonable conclusions. As far as I can tell, modern ways of analyzing data improved the linkage between data and conclusion because they reduced a big source of noise: How the data were analyzed. Procedures became standardized. Hypothesis testing improved. Hypothesis formation, however, did not improve. Knowing how to do a t test and the philosophy behind it will not help you come up with new ideas. Yet data can be used to generate new ideas, not just test the ones you already have.

Our understanding of outliers is in a kind of pre-t-test era. People use them in an unstructured way. As Howard Wainer’s analysis of his blood sugar data indicates, better use of them will improve hypothesis formation. A kind of standardized treatment should help generate ideas, just as the t test and related ideas helped test ideas. Here are some questions I think can be answered:

1. Cause. What causes outliers? It’s a step forward to realize that outliers are often caused by other outliers. Howard has found that unusually high blood sugar readings are caused by eating unusual (for him) foods.

2. Inference. I’m fond of saying lightning doesn’t strike twice in one place for different reasons. The longer version: if two outliers could have the same explanation, they probably do. I think this principle can be improved.

3. Methodology. To test ideas, you want variation to be low. To generate ideas, you want the outlier rate to be high. Howard could make progress in understanding what controls his blood sugar by deliberately testing foods that might produce outliers. In genetics, X-rays and chemical mutagens have been used to increase mutation rates; mutations are outliers. (Discovery of a white-eyed mutant fruit fly led to a wealth of new genetic ideas.) In physics, particle accelerators increase the outlier rate in order to discover new subatomic particles. There are no comparable procedures for psychology. Self-experimentation increased my rate of new ideas because it increased my outlier detection rate. It increased that rate for three reasons: 1. I kept numerical records. 2. I analyzed my data using the same methods as Howard. 3. I did experiments. Travel is like experimentation; there, too, it helps to keep numerical records and analyze them. The question: What are the basic principles for increasing the outlier rate?
Part 1.

Buried Treasure (part 1)

Not long ago, Howard Wainer, the statistician I mentioned recently, learned that his blood sugar was too high. His doctor told him to lose weight or risk losing his sight. He quickly lost about 50 pounds, which put him below 200 pounds. He also started making frequent measurements of his blood sugar, on the order of 6 times per day, with the goal of keeping it low.

It was obvious to him that the conventional (meter-supplied) analysis of these measurements could be improved. The conventional analysis emphasized means. You could get the mean of your last n (20?) readings, for example. That told you how well you were doing, but didn’t help you do better.

Howard, who had written a book about graphical discovery, made a graph: blood sugar versus time. It showed that his measurements could be divided into three parts:

measurement = average + usual variation + outlier (= unusual variation)

Of greatest interest to Howard were the outliers. Most were high. They always happened shortly after he ate unusual food. Before a reading of 170, for example, he had eaten a pretzel. He had not realized a pretzel could do this. He stopped eating pretzels.
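To make the decomposition concrete, here is a minimal sketch with made-up readings and notes (only the pretzel/170 example comes from Howard’s account): estimate the average robustly, treat the typical spread around it as the usual variation, and flag anything far outside that spread as an outlier worth explaining.

```python
# A minimal sketch with hypothetical readings (not Howard's data): split each
# measurement into average + usual variation + outlier by flagging readings
# far outside the usual spread, estimated robustly with the median and MAD.
import numpy as np

readings = np.array([105, 112, 98, 110, 103, 170, 108, 101, 115, 96, 160, 107])
notes = ["", "", "", "", "", "pretzel", "", "", "", "", "birthday cake", ""]

center = np.median(readings)                # the "average" part
mad = np.median(np.abs(readings - center))  # robust estimate of the usual variation
threshold = 5 * mad                         # beyond this, call it an outlier

for value, note in zip(readings, notes):
    if abs(value - center) > threshold:
        print(f"outlier: {value} (deviation {value - center:+.1f})  note: {note or '?'}")
```

The particular threshold doesn’t matter much; the point is that once the outliers are isolated, each one becomes a question (“what did I eat before this reading?”) rather than noise.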

When Howard told me this, it was like a door had opened a tiny crack. Recently a deep-sea treasure-hunting company found a shipwreck off the coast of Spain. They named it Black Swan, apparently a reference to Nassim Taleb’s book. Shipwrecks are black swans on the ocean floor; black-swan weather had sunk the ship. For Howard, outliers were another kind of buried treasure: the key to saving his sight.

It isn’t just Howard. Outliers are buried treasure in all science. They are a source of new ideas, especially the new ideas that lead to whole new theories. The Shangri-La Diet derived from an outlier: Unusually low hunger in Paris. My self-experimentation about faces and mood started with an outlier: One morning I felt remarkably good. My discovery that standing improved my sleep started with a series of days when I slept unusually well.

Modern statistics began a hundred years ago with the t test and the analysis of variance and p values — very useful tools. Almost all scientists use them or their descendants. Almost all statistics professors devote themselves to improvements along these lines. However, conventional statistical methods, the t test and so on, deal only with the usual variation. (Exploratory data analysis is still unconventional.) As Taleb has emphasized, outliers remain not studied, not understood, and, especially, not exploited.

Stoplights, Experimental Design, Evidence-Based Medicine, and the Downside of Correctness

The Freakonomics blog posted a letter from reader Jeffrey Mindich about an interesting traffic experiment in Taiwan. Timers were installed alongside red and green traffic lights:

At 187 intersections which had the timers installed, those that counted down the remaining time on green lights saw a doubling in the number of reported accidents . . . while those that counted down until a red light turned green saw a halving in . . . the number of reported accidents.

Great research! Unexpected results. Simple, easy-to-understand design. Large effects — to change something we care about (such as traffic accidents) by a factor of two in a new way is a great accomplishment. This reveals something important — I don’t know what — about what causes accidents. I expect it can be used to reduce accidents in other situations.

It’s another example (in addition to obstetrics) of what I was talking about in my twisted skepticism post — the downside of “correctness”. There’s no control group, no randomization (apparently), yet the results are very convincing (that adding the timers caused the changes in accidents). The evidence-based medicine movement says treatment decisions should be guided by results from randomized controlled trials, nothing less. This evidence would fail their test. Following their rules, you would say: “This is low-quality evidence. Controlled experiment needed.” The Taiwan evidence is obviously very useful — it could lead to a vast worldwide decrease in traffic accidents — so there must be something wrong with their rules, which would delay or prevent taking this evidence as seriously as it deserves.

Twisted Skepticism (continued)

Writing about advances in obstetrics, Atul Gawande, like me, suggests there is a serious downside to being methodologically “correct”:

Ask most research physicians how a profession can advance, and they will talk about the model of “evidence-based medicine”—the idea that nothing ought to be introduced into practice unless it has been properly tested and proved effective by research centers, preferably through a double-blind, randomized controlled trial. But, in a 1978 ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results. . . . Doctors in other fields have always looked down their masked noses on their obstetrical colleagues. Obstetricians used to have trouble attracting the top medical students to their specialty, and there seemed little science or sophistication to what they did. Yet almost nothing else in medicine has saved lives on the scale that obstetrics has. In obstetrics . . . if a strategy seemed worth trying doctors did not wait for research trials to tell them if it was all right. They just went ahead and tried it, then looked to see if results improved. Obstetrics went about improving the same way Toyota and General Electric did: on the fly, but always paying attention to the results and trying to better them. And it worked.

Is there a biological metaphor for this? A perfectly good method (say, randomized trials) is introduced into the population of medical research methods. Unfortunately for those in poor health, the new method becomes the tool of a dogmatic tendency, which uses it to reduce medical progress.

Twisted Skepticism

Scientists are fond of placing great value on what they call skepticism: Not taking things on faith. Science versus religion is the point. In practice this means wondering about the evidence behind this or that statement, rather than believing it because an authority figure said it. A better term for this attitude would be: Value data.

A vast number of scientists have managed to convince themselves that skepticism means, or at least includes, the opposite of value data. They tell themselves that they are being “skeptical” — properly, of course — when they ignore data. They ignore it in all sorts of familiar ways. They claim “correlation does not equal causation” — and act as if the correlation is meaningless. They claim that “the plural of anecdote is not data” — apparently believing that observations not collected as part of a study are worthless. Those are the low-rent expressions of this attitude. The high-rent version is when a high-level commission delegated to decide some question ignores data that does not come from a placebo-controlled double-blind study, or something similar.

These methodological beliefs — that data above a certain threshold of rigor are valuable but data below that threshold are worthless — are based on no evidence, and the complexity and diversity of research make it highly unlikely that such a binary weighting is optimal. Human nature is hard to avoid, huh? Organized religions exist because they express certain aspects of human nature, including certain things we want (such as certainty); and scientists, being human, have a hard time not expressing the same desires in other ways. The scientists who condemn and ignore this or that bit of data desire a methodological certainty, a black-and-whiteness, a right-and-wrongness, that doesn’t exist.

How to be wrong.

In Science, What Matters?

And how do you learn what matters?

When I was a grad student, I read Stanislav Ulam’s memoir Adventures of a Mathematician. I was impressed by something Ulam said about John von Neumann: that he grasped the difference between the trunk of the tree of mathematics and the branches. Between core issues and lesser ones. Between what matters more and what matters less. I wanted to make similar distinctions within psychology. Nobody talked about this, however. Other books didn’t discuss it either.

Some research will be influential, will be built upon. Some won’t. To put it bluntly, some research will matter, some won’t. I once thought of teaching a graduate course where students learn to predict how many citations an article will receive. You take a 10-year-old journal issue, for example, and try to predict how many citations each article has received. I like to think it would have been a helpful class: The key to a successful scientific career is writing articles that are often cited. I even had a title: “What Will You Do After You Stop Imitating Your Advisor?”

When I was a grad student the short answer to “what matters?” in experimental psychology was clear enough:

1. New methods. The Skinner box, for example, was a new way to study instrumental learning. Skinner didn’t discover or create the first laboratory demonstration of instrumental learning; he simply made it easier to study.

2. New effects. New cause-and-effect linkages. For example, John Garcia discovered that if you make a rat sick after it experiences a new flavor it will avoid foods with that flavor.

My doctoral dissertation was about a new way to study animal timing.

A few months ago I had coffee with Glen Weyl, a graduate student in economics at Princeton. We discussed his doctoral research, which is about how to test theories. One of Glen’s advisors had told him about a paper by Hal Pashler and me on the subject. Hal and I argued that fitting a model to data is a poor way to test the model because there is no allowance for the model’s flexibility. The first reviewers of our paper didn’t like it. “You don’t realize how hard it is to find a model that fits,” one of them wrote.

Glen’s interest in this question began during a seminar in Italy, when he realized the speaker was more or less ignoring the problem. The speaker was comparing how well two different theories could explain the same data without taking into account their different amounts of flexibility. Glen’s thesis proposes a Bayesian framework that allows you to do this. His main example uses data from Charness and Rabin’s choice experiments. (Matt Rabin is a MacArthur Fellow.) Taking flexibility into account, he reaches a different conclusion than they did.
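To make the flexibility problem concrete, here is a minimal sketch on hypothetical data. It is not Glen’s framework (which I haven’t seen in detail); it uses the standard Bayesian information criterion as a stand-in penalty for flexibility. A more flexible model always fits at least as well, so raw goodness of fit cannot distinguish a better theory from a more flexible one; a criterion that charges for extra parameters can.

```python
# A minimal sketch (hypothetical data, not Glen's framework): raw fit rewards
# flexibility, while a penalty such as BIC takes flexibility into account.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)   # truth: a simple linear rule

def bic(y, y_hat, k):
    """BIC for Gaussian errors, up to an additive constant: k*ln(n) + n*ln(RSS/n)."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return k * np.log(n) + n * np.log(rss / n)

for degree in (1, 5):                               # rigid model vs. flexible model
    coeffs = np.polyfit(x, y, degree)               # degree-d polynomial: d+1 parameters
    y_hat = np.polyval(coeffs, x)
    rss = np.sum((y - y_hat) ** 2)
    print(f"degree {degree}: RSS={rss:.2f} (raw fit: flexible model wins), "
          f"BIC={bic(y, y_hat, degree + 1):.1f} (lower is better; flexibility penalized)")
```

Any penalty of this general kind (AIC, BIC, Bayes factors) makes the same point: the comparison has to account for how much each model could have fit, not just how well it did fit.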

I wondered how Glen decided this was important. (It’s a method, yes, but a highly abstract one.) I asked him. He replied:

Sadly, despite my interest in the history of economic thought, I don’t have a lot of insight about why I came upon these thoughts. But one thing: my interests are very interdisciplinary . . . My work is based on drawing connections between economics, philosophy of science, and computer science (and meta-analysis from psychology and bio-statistics). Most of my work takes this form: as you’ll see on my website, I’ve used theoretical insights from economics and computer science as well as evidence from neuroscience, psychology and biology to critique the individualist foundations of liberal rights theory; I’ve used ideas from decision theory to lay firmer foundations for goals set out by computer scientists designing algorithms; I’ve used tools from information theory to instantiate insights from psychology to help understand the design of auctions; and I’ve used computational neuroscience to model biases in economic information processing. Broad interests are hard to have, because they limit the time for learning a particular area in depth, but I prefer to read broadly and draw connections rather than to read deeply and chip away at open questions.

That was interesting. I read broadly, and so does Hal, who knows more about the philosophy of science than I do. I wrote to Glen:

The usual comment about interdisciplinary knowledge is that it’s good because you can bring ideas from one area, including solutions and methods, to solve problems in another area. . . . But maybe it’s also good because by learning about different areas you absorb a range of different value systems and this makes you less sensitive to fads (which vary from field to field), more sensitive to longer-lasting and more broadly-held values.

The more trees you know, the easier it is to see the forest.

Evaluating new product ideas.