Testing Treatments: Nine Questions For the Authors

From this comment (thanks, Elizabeth Molin) I learned of a British book called Testing Treatments (pdf), whose second edition has just come out. Its goal is to make readers more sophisticated consumers of medical research. To help them distinguish “good” science from “bad” science. Ben Goldacre, the Bad Science columnist, fulsomely praises it (“I genuinely, truly, cannot recommend this awesome book highly enough for its clarity, depth, and humanity”). He wrote a foreword. The main text is by Imogen Evans (medical journalist), Hazel Thornton (writer), Iain Chalmers (medical researcher), and Paul Glasziou (medical researcher, editor of the Journal of Evidence-Based Medicine).

To me, as I’ve said, medical research is almost entirely bad. Almost all medical researchers accept two remarkable rules: (a) first, let them get sick and (b) no cheap remedies. These rules severely limit what is studied. In terms of useful progress, the price of these limits has been enormous: near total enfeeblement. For many years the Nobel Prize in Medicine has documented the continuing failure of medical researchers all over the world to make significant progress on all major health problems, including depression, heart disease, obesity, cancer, diabetes, stroke, and so on. It is consistent with their level of understanding that some people associated with medicine would write a book about how to do something (good science) the whole field manifestly can’t do. Testing Treatments isn’t just a fat person writing a book about how to lose weight; it’s the author failing to notice he’s fat.

In case the lesson of the Nobel Prizes isn’t clear, here are some questions for the authors:

1. Why no chapter on prevention research? Failing to discuss prevention at length, when prevention should be at least half of health care, is like writing a book using only half the letters of the alphabet. The authors appear unaware they have done so.

2. Why are practically all common medical treatments expensive?

3. Why should some data be ignored (“clear rules are followed, describing where to look for evidence, what evidence can be included”)? The “systematic reviews” that Goldacre praises here (p. 12) may ignore 95% of available data.

4. The book says: “Patients with life-threatening conditions can be desperate to try anything, including untested ‘treatments’. But it is far better for them to consider enrolling in a suitable clinical trial in which a new treatment is being compared with the current best treatment.” Really? Perhaps an ancient treatment (to authors, untested) would be better. Why are there never clinical trials that compare current treatments (e.g., drugs) to ancient treatments? The ancient treatments, unlike the current ones, have passed the test of time. (The authors appear unaware of this test.) Why is the comparison always one relatively new treatment versus another even newer treatment?

5. Why does all the research you discuss center on reducing symptoms rather than discovering underlying causes? Isn’t the latter vastly more helpful than the former?

6. In a discussion of how to treat arthritis (pp. 170-172), why no mention of omega-3? Many people (with good reason, including this) consider omega-3 anti-inflammatory. Isn’t inflammation a major source of disease?

7. Why is there nothing about how to make your immune system work better? Why is this topic absent from the examples? The immune system is mentioned only once (“Bacterial infections, such as pneumonia, which are associated with the children’s weakened immune system, are a common cause of death [in children with AIDS]”).

8. Care to defend what you say about “ghostwriting” (where med school professors are the stated authors of papers they didn’t write)? You say ghostwriting is when “a professional writer writes text that is officially credited to someone else” (p. 124). Officially credited? Please explain. You also say “ghostwritten material appears in academic publications too – and with potentially worrying consequences” (p. 124). Potentially worrying consequences? You’re not sure?

9. Have you ever discovered a useful treatment? No such discoveries are described in “About the Authors” nor does the main text contain examples. If not, why do you think you know how? If you’re just repeating what others have said, why do you think your teachers are capable of useful discovery? The authors dedicate the book to someone “who encouraged us repeatedly to challenge authority.” Did you ever ask your teachers for evidence that evidence-based medicine is an improvement?

The sad irony of Testing Treatments is that it glorifies evidence-based medicine. According to that line of thinking, doctors should ask for evidence of effectiveness. They should not simply prescribe the conventional treatment. In a meta sense, the authors of Testing Treatments have made exactly the mistake that evidence-based medicine was supposed to fix: failure to look at evidence. They have failed to see abundant evidence (e.g., the Nobel Prizes) that, better or not, evidence-based medicine is of little use.

Above all, the authors of Testing Treatments and the architects of evidence-based medicine have failed to ask: How do new ideas begin? How can we encourage them? Healthy science is more than hypothesis testing; it includes hypothesis generation — and therefore includes methods for doing so. What are those methods? By denigrating and ignoring and telling others to ignore what they call “low-quality evidence” (e.g., case studies), the architects of evidence-based medicine have stifled the growth of new ideas. Ordinary doctors cannot do double-blind clinical trials. Yet they can gather data. They can write case reports. They can do n=1 experiments. They can do n=8 experiments (“case series”). There are millions of ordinary doctors, some very smart and creative (e.g., Jack Kruse). They are potentially a great source of new ideas about how to improve health. By denigrating what ordinary doctors can do (the evidence they can collect) — not to mention what the rest of us can do — and by failing to understand innovation, the architects of evidence-based medicine have made a bad situation (the two rules I mentioned earlier) even worse. They have further reduced the ability of the whole field to innovate, to find practical solutions to common problems.

Evidence-based medicine is religion-like in its emphasis on hierarchy (grades of evidence) and rule-following. In the design of religions, these features made sense (to the designers). You want unquestioning obedience (followers must not question leaders) and you want the focus to be on procedure (rules and rituals) rather than concrete results. Like many religions, evidence-based medicine draws lines (on this side “good”, on that side “bad”) where no lines actually exist. Such line-drawing helps religious leaders because it allows their followers to feel superior to someone (to people outside their religion). When it comes to science, however, these features make things worse. Good ideas can come from anybody, high or low in the hierarchy, on either side of any line. And every scientist comes to realize, if they didn’t already know, that you can’t do good science simply by following rules. It is harder than that. You have to pay close attention to what happens and be flexible. Evidence-based medicine is the opposite of flexible. “There is considerable intellectual tyranny in the name of science,” said Richard Feynman.

Testing Treatments has plenty of stories. Here I agree with the authors — good stories. It’s the rest of the book that shows their misunderstanding. I would replace the book’s many pages of advice and sermonizing with a few simple words: Ask your doctor for the evidence behind his or her treatment recommendation. He or she may not want to tell you. Insist. Don’t settle for vague banalities (“It’s good to catch these things early”). Don’t worry about being “difficult”. You won’t find this advice anywhere in Testing Treatments. If I wanted to help patients, I would find out what happens when it is followed.

More: Two of the authors respond in the comments. And I comment on their response.

Brain Surprise! Why Did I Do So Well?

For the last four years or so I have measured daily how well my brain is working, using balance measurements and mental tests. For three years I have used a test of simple arithmetic (e.g., 7 * 8, 2 + 5). I try to answer as fast as possible. I take faster answers to indicate a better-functioning brain.

Yesterday my score was much better than usual. This shows what happened.

My usual average is about 550 msec or more; my score yesterday was 525 msec. An unexplained improvement of 25 msec.

What caused the improvement? I came up with a list of ways that yesterday was different from usual, that is, ways in which it was an outlier. These are possible causes. From more to less plausible:

1. I had 33 g extra flaxseed last night. (By mistake. I’m not sure about this.)

2. The test came at the perfect time after I had my afternoon yogurt with 33 g flaxseed. When I took flaxseed oil (now I eat ground flaxseed), it was clear that there was a short-term improvement for a few hours.

3. Many afternoons I eat 33 g ground flaxseed with yogurt. Yesterday I ground the afternoon flaxseed for an unusually long time, which may have made the omega-3 more digestible.

4. I did kettlebell swings and a kettlebell walk about 2 hours before the test. These exercises are not new, but usually I do them on different days. Yesterday was the first time I’ve done them on the same day. I’m sure ordinary walking improves performance for perhaps 30 minutes after I stop walking.

5. I had duck and miso soup a half-hour before the test. I almost never eat this.

6. I had a fermented egg (“thousand-year-old egg”) at noon. I rarely eat them.

7. I had peanuts with my yogurt and ground flaxseed. Peanuts alone seem to have no effect. Perhaps something in the peanuts improves digestion of the omega-3 in the flaxseed.

8. I started watching faces at 7 am that morning instead of 6:30 am or earlier.

Here are eight ideas to test. Perhaps one or two will turn out to be important. Perhaps none will.

After I made this list, I read student papers. The assignment was to comment on a research article. One of the articles was about the effect of holding a warm versus cold coffee cup. Holding a warm coffee cup makes you act “warmer,” said the article. Commenting on this, a student said she thought it was ridiculous until she remembered going to the barber. She sees the person who washes her hair (in warm water) as friendly, the barber as cold. Maybe this is due to the warm water used to wash her hair, she noted. This made me realize another unusual feature of yesterday: I had washed my hair in warm water longer than usual. I think I did it at least 30 minutes before the arithmetic test but I’m not sure. In any case, here is another idea to test. I found earlier that cold showers slowed down my arithmetic speed.

This illustrates a big advantage of personal science (science done for personal gain) over professional science (science done because it’s your job): The random variation in my life may suggest plausible new ideas. As far as I can tell, professional scientists have learned almost nothing about practical ways to make your brain work better. You can find many lists of “brain food” on the internet. Inevitably the evidence is weak. I’d be surprised if any of them helped more than a tiny amount (in my test, a few msec). The real brain foods, in my experience, are butter and omega-3. Perhaps my tests will merely confirm the value of omega-3 (Explanations 1-3). But perhaps not (Explanations 4-8 and head heating).

Nobel Prize Report Card: Economics

The Nobel Prizes awarded each year resemble a kind of report card where each prize-worthy discipline (Physics, Chemistry, etc.) gets a grade that depends on the prize-winning research. If the prize-winning research is useful and surprising, the grade is high. If not, the grade is low. More generally, at least to me, the intellectual history of the prize winners sheds light on the whole profession. Perhaps some biologists were unaware of the behavior of Eric Kandel described in Explorers of the Black Box when he was awarded the Physiology or Medicine prize. Kandel, I hasten to add, is an unusual case.

Thomas Sargent is one of the winners of this year’s Economics prize. In 2007, he gave a graduation speech at Berkeley to economics majors (via Marginal Revolution). In the speech, Sargent called economics “organized common sense”. He went on to list 12 common-sense ideas that economists believe, such as “Individuals and communities face trade-offs” and “governments and voters respond to incentives”. The reasons for their belief weren’t stated.

When I started as a professor (at Berkeley) I did many experiments with rats and, to my annoyance, discovered an inconvenient truth: I understood rats less well than I thought. Even in a heavily controlled, heavily studied situation (a Skinner box), my rats often did not do what I expected. My common sense was often wrong, in other words. This experience made me considerably more skeptical of other people’s “common sense”.

To me, and I think to most scientists, science begins with common sense. Experimental psychology certainly does. I used common sense to design my experiments. Had I not done those experiments, I would not have learned that my common sense was wrong. So relying on common sense was helpful — as a place to start. As a way to begin to understand. You begin with common-sense ideas and you test them. That common sense is often wrong is a theme of Freakonomics, in agreement with my experience. Yet Sargent seemed content (he called economics “our beautiful subject”) to end with common sense, perhaps tidied up.

This is really unfortunate because economics, beautiful or not, is so important. If you ignore data, the answer to every hard question is the same: the most powerful people are right. That way lies stagnation (problems build up unsolved because powerful people prefer the status quo) and collapse (when the problems become overwhelming). Alan Greenspan’s faith-based belief in free markets and the 2008 financial crisis — after Sargent’s speech — is an example. In 2009, Sargent’s speech might have been less well-received.


Acupuncture Critic Misses Big Points

Recently the Guardian ran an article by David Colquhoun, a professor of pharmacology at University College London, complaining about peer review. His complaints were innocuous; what was interesting was his example. How bad is peer review? he said. Look what gets published! He pointed to a study of the efficacy of acupuncture and included graphs of the results. “It’s obvious at a glance that acupuncture has at best a tiny and erratic effect on any of the outcomes that were measured,” he wrote.

Except it wasn’t. There were four graphs. Each had two lines — one labelled “acupuncture,” the other labelled “control”. You might think that to assess the effect of acupuncture you should compare the two lines. Not so: the labels were misleading. The “acupuncture” group got acupuncture early in the experiment; the “control” group got acupuncture late in the experiment. Better names would have been early treatment and late treatment. You could not allow for this “at a glance”; it was too complicated. With this design, if acupuncture were effective, the difference between the two lines should be “erratic”.

The paper’s data analysis is poor. To judge the efficacy of acupuncture, their main comparison used only the data from the first 26 weeks. They could have used data from all 52 weeks. That is, they ignored half of their data when trying to answer their main question. Colquhoun could have criticized that, but he didn’t.

Colquhoun’s criticism was so harsh and so shallow that he appears biased against acupuncture. But there are two big things few pharmacology professors appear to know. One is how to stimulate the immune system. This should be central in pharmacology, but it isn’t. Half of why I think fermented foods are so important is that I think they stimulate the immune system. (The other half is that they improve digestion.) There are plenty of less common ways to do this. The phenomenon of hormesis suggests that small doses of all sorts of poisons, including radiation, stimulate repair systems. The evidence behind the hygiene hypothesis suggests that dirt improves the immune systems of children. Bee stings have been used to treat arthritis. And so on. In this context, sticking needles into someone, which puts a small amount of bacteria into their blood, is not absurd. Acupuncture also allows patients to share their symptoms, the value of which Jon Cousins has emphasized.

The other big thing Colquhoun doesn’t seem to know is the absurdity of the chemical imbalance theory of depression. Speaking of ridiculous, that’s ridiculous. Which plays a larger role in modern medicine — antidepressants or acupuncture? If you want to criticize peer review, criticize the chemical imbalance theory. It is as if, for fifty years, peer reviewers have been saying, yes, the earth really is flat. Perhaps this is ending. During a talk that Robert Whitaker gave at the Massachusetts General Hospital in January, he was told by doctors there that the chemical-imbalance theory was an “outdated model”.

Thanks to Dave Lull and Gary Wolf.


Poor Replication Rate in Psychiatric Genetics Research

With the ability to measure individual genes has come interest in learning what they do. Perhaps Person X is depressed and Person Y is not depressed because Person X’s genes differ from Person Y’s. A whole generation of psychiatry researchers now believes this is plausible. There are “general reasons to expect that GxEs [gene by environment interactions] are common,” says a new review paper in the American Journal of Psychiatry. By “common” they mean large enough and common enough to do research about.

I don’t agree with this conclusion. Sure, twin studies show that genes matter for psychiatric diagnoses. Identical twins are more likely to be concordant (= have the same diagnosis) than fraternal twins, for example. But this is a very long way from indicating that single genes matter. Twin results are entirely consistent with the possibility that a large number of genes each matter a little. If this is true — and I find it far more plausible, when it comes to psychiatry, than the single-gene idea — then searching for one gene that does this or that is a waste of time. Individual genes are too weak. To do psychiatric gene research you have to dismiss or ignore the many-tiny-effects possibility, because if it were true it would mean what you are doing is bound to fail. The new review paper I mentioned ignores it.

The new review paper surveys all of the research papers about GxEs during the first decade of research (2000-2009) in this area — about 100 papers. It asks (a) whether initial findings have been repeatable and (b) how much we should trust the repetition attempts. To answer the first question, the authors found that only a third (10 of 37) of initial findings were repeated when tested a second time. If things were working well, all of the initial findings would have been repeatable. The low replication rate doesn’t mean that two-thirds of the initial findings were false. Perhaps the replication attempts were poorly done and all of the initial findings would have held up if the attempts were better done (e.g., larger samples). Or perhaps the replication attempts were biased toward positive results and none of the initial findings would have held up if the attempts were better done.

The review paper also found that positive replication attempts had much smaller samples (median sample size about 150) than negative replication attempts (median sample size about 380). This suggests that the negative replication attempts are more trustworthy than the positive ones. The true replication rate is probably lower than one-third.

The findings, in other words, support my initial belief that the whole field is a waste of time. Amusingly, the authors of the review (one at Harvard, the other at the University of Colorado) conclude the opposite. Here’s what they say:

This review should not be taken as a call for skepticism about the G×E field in psychiatry. . . . True progress in understanding G×Es in psychiatry requires investigators, reviewers, and editors to agree on standards that will increase certainty in reported results. By doing so, the second decade of G×E research in psychiatry can live up to the promises made by the first.

Of course their findings support skepticism about GxE research. This isn’t slanting your conclusions to be more convenient; this is bending them backwards. And the failure to mention the many-tiny-effects possibility, a plausible explanation for all the results they describe, is another sign that this area of research is not to be trusted.

The Continued Existence of Acne Reveals the Perverse Incentives of Modern Medicine

Yesterday I wrote how Alexandra Carmichael’s headache story illustrated a large and awful truth about modern healthcare: It happily provides expensive relief of symptoms while ignoring investigation of underlying causes. If we understood underlying causes (e.g., causes of migraines), prevention would be easy. Let people get sick so that we can make money from them. There should be a name for this scam. In law enforcement, it’s called entrapment.

Sensible prevention research would start small. Not by trying to prevent breast cancer, or heart disease, or something like that: those diseases take many years to develop and therefore are hard to study. Sensible prevention research would focus on things that are easy to measure and happen soon after their causative agents. One example is migraines. Migraines happen hours after exposure to their triggers. The fact that Chemical X causes migraines means it is likely that Chemical X is bad for us, even if it doesn’t cause migraines in everyone. This is the canary-in-a-coal-mine idea. Migraines are the canary.

Acne is another canary. Acne is easy to measure. Figuring out how to prevent it would be a good way to begin prevention research. To prevent acne would be to take the first steps toward preventing many more diseases. A high-school student could do ground-breaking research — research that would improve the lives of hundreds of millions of people — about how to prevent acne but somehow this never happens. In spite of this possibility, grand-prize-winning high-school science projects, from the most brilliant students in the whole country, are always about trivia.

A just-published review in The Lancet reveals once again the unfortunate perspective of medical school professors. The abstract ends with this:

New research is needed into the therapeutic comparative effectiveness and safety of the many products available, and to better understand the natural history, subtypes, and triggers of acne.

Actually, finding out what causes acne is all that’s needed.

To figure out what causes acne (and thereby how to prevent it) three things are necessary: (a) study of environmental causes, such as diet, (b) starting with n=1, and (c) willingness to test many ideas that might be wrong (because it’s far from obvious how to prevent acne). All three of these things are exactly what the current healthcare research system opposes. It opposes prevention research because drug companies don’t fund it. It opposes n=1 studies because they are small and cheap, which is low-status. To do such a study would be like driving a Corolla. It opposes studies that could take an indefinitely long time because such studies are bad for a researcher’s career. Researchers need a steady stream of publications.

High school students, who aren’t worried about status or number of publications, could make a real contribution here. You don’t need fancy equipment to measure acne.

Thanks to Michael Constans.

Better To Do Than To Think

The most important thing I learned in graduate school — or ever — about research is: Better to do than to think. By do I mean collect data. It is better to do an experiment than to think about doing an experiment, in the sense that you will learn more from an hour spent doing (e.g., doing an experiment) than from an hour thinking about what to do. Because 99% of what goes on in university classrooms and homework assignments is much closer to thinking than doing, and because professors often say they teach “thinking” (“I teach my students how to think”) but never say they teach “doing”, you can see this goes against prevailing norms. I first came across this idea in an article by Paul Halmos about teaching mathematics. Halmos put it like this: “The best way to learn is to do.” When I put it into practice, it was soon clear he was right.

I have never heard a scientist say this. But I recently heard a story that makes the same point. A friend wrote me:

I met Kary Mullis after high school. I knew that PCR was already taught in some high schools (like mine) and was curious how he discovered it. He said that he had some ideas about how to make the reaction work and discussed them with others, who explained why it wouldn’t work. He wasn’t insightful enough to understand their explanations so he had to go to the lab and see for himself why it wouldn’t work. It turned out it worked.

An example of better to do than to think.

Better to do than to think is not exactly anti-authoritarian but it is close. I was incredibly lucky to learn it from Halmos. It isn’t obvious how else I might have learned it. It took me many years to learn Research Lesson #2: Do the smallest easiest thing. And I learned this only because of all my self-experimentation. I started doing self-experimentation because of better to do than to think.


Causal Reasoning in Science: Don’t Dismiss Correlations

In a paper (and blog post), Andrew Gelman writes:

As a statistician, I was trained to think of randomized experimentation as representing the gold standard of knowledge in the social sciences, and, despite having seen occasional arguments to the contrary, I still hold that view, expressed pithily by Box, Hunter, and Hunter (1978) that “To find out what happens when you change something, it is necessary to change it.”

Box, Hunter, and Hunter (1978) (a book called Statistics for Experimenters) is well-regarded by statisticians. Perhaps Box, Hunter, and Hunter, and Andrew, were/are unfamiliar with another quote (modified from Beveridge): “Everyone believes an experiment except the experimenter; no one believes a theory except the theorist.”

Box, Hunter, and Hunter were/are theorists, in the sense that they don’t do experiments (or even collect data) themselves. And their book has a massive blind spot. It contains 500 pages on how to test ideas and not one page — not one sentence — on how to come up with ideas worth testing. Which is just as important. Had they considered both goals — idea generation and idea testing — they would have written a different book. It would have said much more about graphical data analysis and simple experimental designs, and, I hope, would not have contained the flat statement (“To find out what happens …”) Andrew quotes.

“To find out what happens when you change something, it is necessary to change it.” It’s not “necessary” because belief in causality, like all belief, is graded: it can take on an infinity of values, from zero (“can’t possibly be true”) to one (“I’m completely sure”). And belief changes gradually. In my experience, significant (substantially greater than zero) belief in the statement A changes B usually starts with the observation of a correlation between A and B. For example, I began to believe that one-legged standing would make me sleep better after I slept unusually well one night and realized that the previous day I had stood on one leg (which I almost never do). That correlation made the idea that one-legged standing improves sleep more plausible, taking it from near zero to some middle value of belief (“might be true, might not be true”). Experiments in which I stood on one leg various amounts pushed my belief in the statement close to one (“sure it’s true”). In other words, my journey “to find out what happens” to my sleep when I stood on one leg began with a correlation. Not an experiment. To push belief from high (say, 0.8) to really high (say, 0.99) you do need experiments. But to push belief from low (say, 0.0001) to medium (say, 0.5), you don’t need experiments. To fail to understand how beliefs begin, as Box et al. apparently do, is to miss something really important.

Science is about increasing certainty — about learning. You can learn from any observation, as distasteful as that may be to evidence snobs. By saying that experiments are “necessary” to find out something, Box et al. said the opposite of you can learn from any observation. Among shades of gray, they drew a line and said “this side white, that side black”.

The Box et al. attitude makes a big difference in practice. It has two effects:

  1. Too-complex research designs. Just as researchers undervalue correlations, they undervalue simple experiments. They overdesign. Their experiments (or data collection efforts) cost far more and take much longer than they should. The self-experimentation I’ve learned so much from, for example, is undervalued. This is one reason I learned so much from it — because it was new.
  2. Existing evidence is undervalued, even ignored, because it doesn’t meet some standard of purity.

In my experience, both tendencies (too-complex designs, undervaluation of evidence) are very common. In the last ten years, for example, almost every proposed experiment I’ve learned about has been more complicated than I think wise.

Why did Box, Hunter, and Hunter get it so wrong? I think it gets back to the job/hobby distinction. As I said, Box et al. didn’t generate data themselves. They got their data from professional researchers — mostly engineers and scientists in academia or industry. Those engineers and scientists have jobs. Their job is to do research. They need regular publications. Hypothesis testing is good for that. You do an experiment to test an idea, you publish the result. Hypothesis generation, on the other hand, is too uncertain. It’s rare. It’s like tossing a coin, hoping for heads, when the chance of heads is tiny. Ten researchers might work for ten years, tossing coins many times, and generate only one new idea. Perhaps all their work, all that coin tossing, was equally good. But only one researcher came up with the idea. Should only one researcher get credit? Should the rest get fired, for wasting ten years? You see the problem, and so do the researchers themselves. So hypothesis generation is essentially ignored by professionals because they have jobs. They don’t go to statisticians asking: How can I better generate ideas? They do ask: How can I better test ideas? So statisticians get a biased view of what matters, do biased research (ignoring idea generation), and write biased books (that don’t mention idea generation).

My self-experimentation taught me that the Box et al. view of experimentation (and of science — that it was all about hypothesis testing) was seriously incomplete. It could teach me this because it was like a hobby: I had no need for publications or other steady output. Over thirty years, I collected a lot of data, did a lot of fast-and-dirty experiments, noticed informative correlations (“accidental observations”) many times, and came to see the great importance of correlations in learning about causality.

Mercury Damage Revealed by Brain Test

For several years I have been doing simple daily tests to measure my brain function. I got the idea when I noticed that a few capsules of flaxseed oil improved my balance. Flaxseed oil also improved other measures of brain function, such as digit span. I wasn’t surprised I could do better; what was surprising was how easy it was. It revealed a big gap in our understanding of nutrition. I do the daily tests not only to improve brain function but also to improve the rest of my body. I think the brain is like a canary in a coal mine — especially sensitive to bad environments. Learning what environment was best for the brain would suggest what environment is best for the rest of the body. When I started taking an optimal amount of flaxseed oil, my gums turned from red (inflamed) to pink (not inflamed), supporting this assumption.

I tried six or seven mental tests and eventually settled on a test of arithmetic (how fast I could do simple problems such as 5-3). I hoped that now and then my score would change (in either direction, faster or slower) and that these changes would point to new things that control brain function. No one had/has done such a thing. I had no idea if unexpected changes would show up or, if they did, how often. I didn’t know what the score changes would look like (their size and shape) nor, of course, what would cause them. Would all of them involve diet? Would all of them make sense in terms of what we already know? (Flaxseed oil makes sense because the brain contains lots of omega-3.)

The first two surprises were these: 1. My score suddenly improved a few days after switching from Chinese flaxseed oil to American flaxseed oil. This made sense: It is easy to destroy omega-3 if flaxseed oil is kept at room temperature. 2. My score suddenly improved when I switched from pig fat to butter. This was counter-intuitive: pig fat is more paleo than butter.

Last fall, there was another surprise: My score had greatly improved since the summer. I was much faster than ever before. At first I thought the improvement was due to moving to Beijing. I had moved from Berkeley to Beijing in early September. My Beijing life differed in a thousand ways from my Berkeley life. I had three ideas about which differences might matter. 1. Walnuts. Perhaps I ate more walnuts in Beijing. Walnuts are supposed to be good for brain function. 2. Heat. It was much hotter in Beijing than in Berkeley. Maybe that improved brain function. 3. Vitamins. I took fewer vitamin supplements in Beijing. Maybe they harmed brain function.

I tested these possibilities. 1. I stopped eating walnuts. My arithmetic score did not clearly change. 2. Winter came and it got much colder. The improvement did not go away. 3. I took the same amount of vitamins I’d taken in Berkeley. My arithmetic score didn’t change. So all of these ideas were wrong.

Because they were wrong, I considered a fourth possibility: The improvement was due to removal of two mercury amalgam fillings on July 28, 2010. They were replaced with non-amalgam fillings. I’d had them removed for precautionary reasons. I wasn’t suffering from any signs of mercury poisoning. Hair tests had repeatedly shown mildly high amounts of mercury in my hair (75th percentile of an unspecified sample). Measurements of the mercury in my breath had come out higher than usual but it was hard to be sure the machine was working correctly.

I looked again at my data. It showed something I hadn’t noticed: the improvement started before I went to Beijing. It started very close to July 28. That was good evidence that the mercury explanation was correct. Now the evidence is even stronger. I’ve returned to Berkeley and thereby made my life quite similar to what it was before the improvement, when my scores were much higher (slower) than now. The improvement has remained.

The evidence for causality — removal of mercury amalgam fillings improved my arithmetic score — rests on three things: 1. Four other explanations made incorrect predictions. 2. The improvement, which lasted months, started within a few days of the removal. Long-term improvements (not due to practice) are rare — this is the only one I’ve noticed. 3. Mercury is known to harm neural function (“mad as a hatter”). As far as I’m concerned, that’s plenty.

A long Wikipedia article describes evidence on both sides of the question of whether mercury amalgam fillings cause damage. In 2009, the American Dental Association stated in a press release “the overwhelming weight of scientific evidence supports the safety and efficacy of dental [mercury-containing] amalgam.” As recently as 1991, Consumer Reports told readers “if a dentist wants to remove your fillings because they contain mercury, watch your wallet.” (Dental insurance will pay most of the cost of removing my remaining amalgam fillings.) In an essay last revised in 2006, Stephen “Quackwatch” Barrett explained at length why mercury toxicity is a “scam”. According to Barrett, “there is overwhelming evidence that amalgam fillings are safe.”

Ask your dentist some pointed questions.

Albert Einstein: Out-of-Touch Theorist

Martin Wolf relays what passes for wisdom:

Albert Einstein is reported to have said that insanity consists of doing the same thing over and over again and expecting different results.

Which, if true, shows that Einstein was a theorist.

Call me insane. Based on many years of data collection, I believe scientific progress has a power-law distribution. You sample from this distribution when you collect data. You collect data again and again — “doing the same thing over and over again”. Almost all the data you collect produces little progress; a tiny fraction produces great progress. The secret to scientific progress is doing the same thing over and over — and being wise enough to grasp that the results will vary greatly. (Nassim Taleb understands this.) In the short term, it seems like you are getting nowhere.

I learned this lesson from my sleep research. For ten years I tried various solutions to my problem of early awakening. Nothing worked. All my ideas were wrong. Eventually I got “lucky” but actually I made my own luck by persisting so long.

Once you realize the distribution of progress, you grasp that the secret of success is making the cost per sample as low as possible. Few scientists, in my experience, have figured this out. They prefer expensive experiments because larger grants signal higher status. Won’t fancy equipment tell me more? they rationalize. Grant givers, also failing to understand the basic point, are happy to oblige the status-seekers: Much easier to administer one $200,000 grant than ten $20,000 grants. And progress slows to a crawl.

More: Rita Mae Brown is a more likely source of this saying than Albert Einstein.