The Blindness of Scientists: The Problem isn’t False Positives, It’s Undetected Positives

Suppose you have a car that can only turn right. Someone says, “Your car turns right too much.” You might wonder why they don’t see the bigger problem (it can’t turn left).

This happens in science today. People complain about how well the car turns right, failing to notice (or at least say) it can’t turn left. Just as a car should turn both right and left, scientists should be able to (a) test ideas and (b) generate ideas worth testing. Tests are expensive. To be worth the cost of testing, an idea needs a certain plausibility. In my experience, few scientists have clear ideas about how to generate ideas plausible enough to test. The topic is not covered in any statistics text I have seen, even though the same books spend many pages on how to test ideas.

Apparently not noticing the bigger problem, scientists sometimes complain that this or that finding “fails to replicate”. My former colleague Danny Kahneman is an example. He complained that priming effects were not replicating. Implicit in a complaint that Finding X fails to replicate is a complaint about testing. If you complain that X fails to replicate, you are saying that something was wrong with the tests that established X.

There is a connection between replication failure and failure to generate ideas worth testing. If you cannot generate new ideas, you are forced to test old ideas. You cannot test an old idea exactly (that would be boring/repetitive). So you give an old idea a slight tweak and test the variation. For example, someone has shown that X is true in North America. You ask if X is true in South America. You hope you haven’t tweaked X too much. No idea is true everywhere, except maybe in physics, so as this process continues (it goes on for decades) the tested ideas gradually become less true and the experimental effects get weaker.

This is what happened in the priming experiments that Kahneman complained about. At the core of priming (the priming effects studied 30 years ago) is a true phenomenon. After reading “doctor” it becomes easier to decide that “nurse” is a word, for example. This was followed by 30 years of drift away from word recognition. Not knowing how to generate new ideas worth testing, social psychologists have ended up studying weak effects (recent priming effects) that are random walks away from strong effects (old priming effects). The weak effects cannot bear the professional weight (people’s careers rest on them) they are asked to carry and sometimes collapse (“failure to replicate”).

Sheena Iyengar, a Columbia Business School professor and social psychologist, got a major award (best dissertation) for and wrote a book about a new effect that has turned out to be very close to non-existent. Inability to generate ideas (not understanding how to do so) means that what appear to be new ideas, not just variations of old ideas, are more likely to be mistakes. I have no idea whether Iyengar’s original effect was true or not. I am sure, however, that it was weak and made little sense.

Statistics textbooks ignore the problem. They say nothing about how to generate ideas worth testing. I haven’t asked statisticians about this, but they might respond in one of two ways: 1. That’s someone else’s problem. Statistics is about what to do with data after you gather it. That makes as much sense as teaching someone how to land a plane but not how to take off. 2. That’s what exploratory data analysis is for. If I said “Exploratory data analysis can only identify effects of factors that the researcher decided to vary or track. Which is expensive. What about other factors?” they’d be baffled, I believe. In my experience, exploratory data analysis = full analysis of your data. (Many people do only a small fraction, such as 10%, of all reasonable analyses of their data.) Full analysis is better than partial analysis, but calling it a way to find new ideas fails to understand that professional scientists study the same factors over and over.

I suppose many scientists feel the gap acutely. I did. I became interested in self-experimentation most of all because it generated new ideas at a much higher rate (per year) than my professional experiments with rats. I had no idea why, at first, but as it kept happening (my self-experimentation generated one new idea after another) I came to believe that by accident I was doing something “right”. I was doing something that fit a general rule of how to generate ideas, even though I didn’t know what the general rule was.

The sciences I know about (psychology and nutrition) have great trouble coming up with new ideas. The paleo movement is a response to stagnation in the field of nutrition. The Shangri-La Diet shows what a new idea looks like in the area of weight control. The failure of nutritionists to study fermented foods is ongoing. Stagnation in psychology can be seen in the fact that antidepressants remain heavily prescribed, many years after the introduction of Prozac (my work on morning faces and mood suggests a much different approach), lack of change in treatments for bipolar disorder over the last 50 years (again, my morning-faces work suggests another approach), and in the failure of social psychologists to discover any big new effects in the last ten years.

Here is the secret to idea generation: Cheaper tests. To find ideas plausible enough to be worth testing with Test X, you need a way of testing ideas that is cheaper than Test X. The cheaper your test, the larger the region of cause-effect space you can explore. Let’s say Test Y is cheaper than Test X. With Test Y, you can explore more of cause-effect space than you can explore with Test X. In the region unexplored by Test X, you can find points (cause-effect relationships) that pass Test Y. They are worth testing with Test X. My self-experimentation generated new ideas worth testing with more expensive tests because it was much cheaper than existing tests. Via self-experimentation, I could test many ideas too implausible or too expensive to be tested conventionally. Even cheaper than a self-experiment was simply monitoring myself — tracking my sleep, for example. Again and again, this generated ideas worth testing via self-experimentation. I did what all scientists should do: use cheaper tests to generate ideas worth testing with more expensive tests.
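The arithmetic behind the cheaper-test argument can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the budget, the costs of Test X and Test Y, the fraction of candidate ideas that are true, and the accuracy of Test Y are all hypothetical numbers chosen for the example, and Test X is assumed to be perfectly accurate. The function names are likewise invented for this sketch.

```python
def direct_testing(budget, base_rate, cost_x):
    """Expected number of true ideas found when every idea goes straight to Test X."""
    ideas_tested = budget / cost_x
    return ideas_tested * base_rate


def screened_testing(budget, base_rate, cost_x, cost_y,
                     sensitivity, false_positive_rate):
    """Expected number of true ideas found when cheap Test Y screens ideas for Test X.

    sensitivity: chance that a true idea passes Test Y (hypothetical value).
    false_positive_rate: chance that a false idea passes Test Y (hypothetical value).
    """
    # Probability that a randomly chosen idea passes the cheap test
    pass_rate = base_rate * sensitivity + (1 - base_rate) * false_positive_rate
    # Every idea pays for Test Y; only ideas that pass also pay for Test X
    cost_per_idea = cost_y + pass_rate * cost_x
    ideas_screened = budget / cost_per_idea
    # True ideas that survive Test Y and therefore reach Test X
    return ideas_screened * base_rate * sensitivity


if __name__ == "__main__":
    # All numbers below are assumptions for illustration only
    budget, base_rate = 10_000, 0.01   # 1% of candidate ideas are true
    cost_x, cost_y = 100, 1            # Test Y is 100 times cheaper than Test X
    print(direct_testing(budget, base_rate, cost_x))
    print(screened_testing(budget, base_rate, cost_x, cost_y,
                           sensitivity=0.8, false_positive_rate=0.1))
```

With these invented numbers, the same budget yields about one true idea when everything goes straight to Test X and about seven when Test Y screens first, because the cheap test lets you look at roughly 850 ideas instead of 100. Different assumptions would change the size of the gain, but a sufficiently cheap and informative Test Y should give the same direction.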

More About Magic Dots

Govind M., the Stanford grad student who recommended brown noise, has good things to say about magic dots:

I have been using magic dots for about two months now and they work. I have no idea why they work — maybe it’s the reinforcement — but they do. I enjoy making them and for me, I have to finish them. I use 9 min/mark for 90 min intervals, which also provides a very easy way to track time. A four box day is enormously productive, though the fourth box typically gets torpedoed by a meeting or something.

One of the advantages of magic dots is that instead of setting down an intimidating 90-minute chunk of time, my mental horizon is shortened to the next 9 minutes. After that, the box takes over. So in situations in which (1) it is difficult to get started and (2) I want to add structure to the day, I use magic dots.

I asked, “When you are using the magic dots, do you work for longer periods of time before taking a break?” Govind said:

Yes. However, it is possible that the goal gets shifted from “be focused and attentive and not goofing off on Facebook” to “work long enough to make 10 marks on a piece of paper.” It makes it easier to start and to continue working.

I too find that magic dots make it easier to start work. I think this happens because the task in front of me (getting work done) seems more doable.

Dangerous Noise and “Doctors Hurt You”

I have a friend with life-altering hyperacusis, a hearing problem where ordinary sounds can cause pain. It started after she worked in a noisy workplace for three years.

“People are always told about things they should do for good health: eat right, exercise, wear sunscreen, don’t smoke,” said my friend. “But they are almost never warned about loud noise, and if they are, it’s only about hearing loss far off in the future.” Her healthcare philosophy is doctors hurt you, which she finds so self-evident that she can barely explain why she believes it.

Her husband has hyperacusis, too, even worse than hers. His came from too many rock concerts. He sought medical treatment for a disorder that even Google has barely heard of, and now takes a staggering amount of pain medicine. His philosophy, at least historically, has been doctors help you. She has done her best to keep him away from doctors, but there is no doubt that, through a combination of bad advice and bad treatment, doctors have made his health much worse. (The pain medicines do reduce pain — but much of his pain was caused by doctors.) Judging by his and her experience, doctors hurt you is more accurate.

I am writing this in the loudest Starbucks I have ever been in, in New York City. (I have been in hundreds of Starbucks.) Three employees have told me they cannot control the volume of the music. Even with my Bose noise-cancelling headphones, it is too loud. I must find somewhere else. A friend who used to work at Starbucks disputes their claim that they cannot control the volume. She says the content of the music is set by corporate but the volume is controllable at individual stores. A customer at the loud Starbucks told me he thought the employees made the music so loud to drive customers away.

Exhibit 1 in the argument that doctors hurt you is tonsillectomies, probably the most common operation ever. Your tonsils are part of your immune system — removing them makes as much sense as removing part of your brain. Tonsillectomies remained common long after it was clear that tonsils were part of the immune system. Perhaps doctors didn’t understand high school biology? Or they didn’t care? Either answer suggests that doctors should be avoided.

Deirdre McCloskey and Me

In an appreciation of Ronald Coase, I came across an article by Deirdre McCloskey, the economist. It reminded me of our back and forth emails in 2007 about her and Lynn Conway’s treatment of Michael Bailey, who had written a book they hated. I reread the emails and found them still interesting, especially McCloskey’s claim that she and Conway have/had no special power. Is there a variant of sophistry that refers to self-deception? You can read the whole correspondence, McCloskey’s version, which omits my final email, or my version (“McCloskey and Me: A Back-and-Forth”, plus plenty of context — my article starts on p. 117 of the 139 pp).

Thank god she and Conway failed to end Bailey’s career. The Man Who Would Be Queen (pdf) — about male homosexuals and cross-dressers — remains the best psychology book I have ever read. Last year I assigned my Tsinghua students to read a third of it (any third they wanted). One student said it was so good she read the whole thing.

A Little-Noticed Male/Female Difference: Pressure to Conform

In Americanah, Chimamanda Adichie’s new novel, she writes (p. 240):

Ojiugo wore orange lipstick and ripped jeans, spoke bluntly, and smoked in public, provoking vicious gossip and dislike from other girls, not because she did those things but because she dared to without having lived abroad, or having a foreign parent, those qualities that would have made them forgive her lack of conformity.

Here is another example, from a profile of Claire Danes:

She changed schools twice, “fleeing one mean girl only to find another incarnation of that same girl in the next school.” She was targeted for her looks, her nerdy curiosity, her refusal to conform.

My impression is that these examples illustrate a large male/female difference: Women will commonly criticize another woman for lack of conformity (unless somehow “earned”); men are much less likely to criticize another man this way. When women do it, it is called being catty. There is no equivalent term when men do it — presumably because no one invents a term for something that doesn’t happen.

I have never seen this mentioned in the literature on male/female differences (nor in Sheryl Sandberg’s Lean In). It isn’t easy to explain. Could it be learned? Well, in my experience girls are under more pressure to “act a certain way” than boys (Japan is an example), but I can’t explain that, either, nor can I see why that would translate to women putting pressure on other women to conform.

One reason this tendency is hard to explain is its effect on leadership. Putting pressure on other women to conform makes it harder for women to become leaders — leadership is the opposite of conformity. Making it harder for women to be leaders makes it easier for men to be leaders. It is hard to see how this particular effect (there are many others) benefits women.

Magnesium and Rectum Healing

After I posted a link to an article about magnesium deficiency (“50 studies suggest that magnesium deficiency is killing us”), a reader who wishes to be anonymous looked into it.

After reading your post about magnesium oil, I read up on it, and thought I’d try it. I didn’t notice any difference, but I have a report. In my reading, I came across stories of people who sprayed the oil on wounds.

I have a recurring minor irritation that, when it occurs, usually takes weeks to heal. Passing a large stool can cause small tears in the rectum, so small they don’t even bleed but nonetheless can be felt. If another stool, even a regular-sized one, passes before the tears heal, they are painfully re-opened, though not re-opened fully. The pain is not severe but is, frankly, a pain in the ***. In my case it usually takes weeks for the tears to completely heal.

I was a couple weeks into this cycle when my bottle of magnesium oil arrived. I had read that it promotes healing and some people spray it on wounds. So I sprayed it on my irritated area once a day for three days, and on the third day when I passed a stool there was no pain! Never before had it healed so quickly, and I’ve had this problem at least once a year for over ten years.

I’m impressed. This resembles a theory making an unlikely prediction that turns out to be true. Other examples of magnesium benefits are here and here. Maybe magnesium will improve my sleep. That should be easy to test.

Assorted Links

  • fruit and diabetes. Blueberries good, cantaloupe bad.
  • R most popular language for “analytics/data mining/data science work” among survey respondents. I wish I could describe the respondents, but I can only say they are people who might call what they do “data mining” or “data science”. In addition, the use of R is growing. Most psychology departments teach SPSS or Matlab.
  • Thomas Frank criticizes universities, undergraduate education in particular. “An educational publisher wrote to me [asking] to reprint an essay of mine [that is freely available]. . . . The low, low price that students were to pay for this textbook: $75.95.”

What is College For?

David Brooks, the New York Times columnist, tries to answer this question:

Are universities [he means undergraduate education] mostly sorting devices to separate smart and hard-working high school students from their less-able fellows so that employers can more easily identify them? Are universities factories for the dissemination of job skills? Are universities mostly boot camps for adulthood, where young people learn how to drink moderately, fornicate meaningfully and hand things in on time? My own stab at an answer would be that universities are places where young people acquire two sorts of knowledge, what the philosopher Michael Oakeshott called technical knowledge and practical knowledge.

My answer: Almost all college students want to figure out what job to choose. The answer will depend on what they do well and what they enjoy, and it will have a big effect on the rest of their life. The better the answer, the more successful and happy they will be. For them, that is above all what college is for.

This doesn’t even occur to Brooks as a possibility. I suppose professors like this state of affairs (a smart person — Brooks — can’t even think of this). If no one mentions it, they are that much further from having to consider it. Trying to help students reach this goal means giving up power. The more a college helps students learn what they enjoy and what they are good at, the less professors can do exactly what they want.

There is nothing terrible about college classes. I don’t say that this or that humanities course is “useless”. The trouble is lack of balance: too many normal classes, too few “classes” that explicitly help students to learn about the world of work and how they might fit into it. Only a few colleges — often low-prestige “trade schools” — do much to help students learn about possible jobs, what they enjoy, and what they are good at.

Judging by how Berkeley courses are taught — they do little to help students decide what job to do, unless they are seriously considering being a professor — most professors have little or no interest in helping students this way. I suspect, however, they don’t know what they might gain from doing so. At Berkeley I taught a class called Psychology and the Real World whose goal was exactly that: help students find their way (a particular problem for psychology majors, few of whom go to graduate school in psychology). They could do almost anything, so long as it was off-campus. It was little work for me and the students learned a lot. I enjoyed seeing them begin to find their way. This is what I think isn’t obvious to professors: the more you help students learn what they want to learn, the easier and more satisfying it is for you.

The Irrelevance of Grass-Fed Beef (Ancestral Health Symposium 2013)

Grass-fed beef is better than ordinary (grain-fed) beef because it has a better omega-3/omega-6 ratio. I’ve heard this a thousand times. It’s true. Grass has more omega-3 than grain, which is high in omega-6. But it is misleading. For practical purposes, grass-fed and grain-fed beef are the same in terms of omega-3 and omega-6.

Peter Ballerstedt made this point in his talk at the recent Ancestral Health Symposium. He showed this slide, based on research by Susan Burkett.

[Slide image: omega-3 and omega-6 content of various foods]

This shows the amount of omega-3 and omega-6 in one serving of various foods. The amounts in grass- and grain-fed beef are small relative to other foods most people eat. People who have said eat grass-fed beef, such as Michael Pollan, should have been saying eat less chicken. When I started eating grass-fed instead of grain-fed beef, I noticed no differences, which agrees with this analysis.