Kahneman Criticizes Social Psychologists For Replication Difficulties

In a letter linked to by Nature, Daniel Kahneman told social psychologists that they should worry about the repeatability of what are called “social priming effects” (for example, after you see words associated with old age, you walk more slowly). John Bargh of Yale University is the most prominent researcher in the study of these effects. Many people first heard about them in Malcolm Gladwell’s Blink.

Kahneman wrote:

Questions have been raised about the robustness of priming results. The storm of doubts is fed by several sources, including the recent exposure of fraudulent researchers [who studied priming], general concerns with replicability that affect many disciplines, multiple reported failures to replicate salient results in the priming literature, and the growing belief in the existence of a pervasive file drawer problem [= studies with inconvenient results are not published] that undermines two methodological pillars of your field: the preference for conceptual over literal replication and the use of meta-analysis.

He went on to propose a complicated scheme by which Lab B will see if a result from Lab A can be repeated, then Lab C will see if the result from Lab B can be repeated. And so on. A non-starter, too complex and too costly. What Kahneman proposes requires substantial graduate student labor and will not help the grad students involved get a job — in fact, “wasting” their time (how they will see it) makes it harder for them to get a job. I don’t think anyone believes grad students should pay for the sins of established researchers.

I completely agree there is a problem. It isn’t just social priming research. You’ve heard the saying: “1. Fast. 2. Cheap. 3. Good. Choose 2.” When it comes to psychology research, it’s “1. True. 2. Career. 3. Simple. Choose 2.” Overwhelmingly researchers choose 2 and 3. There isn’t anything wrong with choosing to have a career (= publish papers), so I put a lot of blame for the current state of affairs on journal policies, which put enormous pressure on researchers to choose “3. Simple”. Hardly any journals in psychology publish (a) negative results, (b) exact replications, or (c) complex sets of results (e.g., where Study 1 finds X and apparently identical Study 2 does not find X). The percentage of psychology papers with even one of these characteristics is about 0.0%; you could look at several thousand and not find a single instance. My proposed solution to the problem pointed out by Kahneman is new journal policies: 1. Publish negative results. 2. Publish (and encourage) exact replications. 3. Publish (and encourage) complexity.

Such papers exist. I previously blogged about a paper that emphasized the complexity of findings in “choice overload” research — the finding that too many choices can have bad effects. Basically it concluded the original result was wrong (“mean effect size of virtually zero”), except perhaps in special circumstances. Unless you read this blog — and have a good memory — you are unlikely to have heard of the revisionist paper. Yet I suspect almost everyone reading this has heard of the original result. A friend of mine, who has a Ph.D. in psychology from Stanford, told me he considered Sheena Iyengar, the researcher most associated with the original result, the greatest psychologist of his generation. Iyengar wrote a book (“The Art of Choosing”) about the result. I found nothing in it about the complexities and lack of repeatability.

Why is personal science important? Because personal scientists — people doing science to help themselves, e.g., sleep better — ignore 2. Career and 3. Simple.

Assorted Links

Thanks to Adam Clemens.

Why Self-Track? The Possibility of Hard-to-Explain Change

My personal science introduced me to a research method I have never seen used in research articles or described in discussions of scientific method. It might be called wait and see. You measure something repeatedly, day after day, with the hope that at some point it will change dramatically and you will be able to determine why. In other words: 1. Measure something repeatedly, day after day. 2. When you notice an outlier, test possible explanations. In most science, random (= unplanned) variation is bad. In an experiment, for example, it makes the effects of the treatment harder to see. Here it is good.
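To make step 2 concrete, here is a minimal sketch of the outlier check (my illustration, not a published method; the function name, threshold, and data are hypothetical). It flags any day whose measurement lies far from the running mean of earlier days, which is the moment to start testing explanations:

```python
# Hypothetical sketch of "wait and see": flag any day whose measurement
# deviates sharply from the history accumulated so far.
from statistics import mean, stdev

def flag_outliers(daily_values, threshold=3.0, min_history=14):
    """Yield (day_index, value) for measurements far from the running mean."""
    for i in range(min_history, len(daily_values)):
        history = daily_values[:i]
        m, s = mean(history), stdev(history)
        if s > 0 and abs(daily_values[i] - m) > threshold * s:
            yield i, daily_values[i]

# Example: two weeks of daily fasting blood sugar (mg/dl), then a surprise.
readings = [96, 94, 97, 95, 93, 96, 98, 95, 94, 96, 97, 95, 96, 94, 83]
for day, value in flag_outliers(readings):
    print(f"Day {day}: {value} is unusual -- ask what changed the day before")
```

The code only makes the outlier visible; the scientific work (testing possible explanations) starts after the flag is raised.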

Here are examples where wait and see paid off for me:

1. Acne and benzoyl peroxide. When I was a graduate student, I started counting the number of pimples on my face every morning. One day the count improved. It was two days after I started using benzoyl peroxide more regularly. Until then, I did not think benzoyl peroxide worked well — I started using it more regularly because I had run out of tetracycline (which turned out not to work).

2. Sleep and breakfast. I changed my breakfast from oatmeal to fruit because a student told me he had lost weight eating foods with high water content (such as fruit). I did not lose weight but my sleep suddenly got worse. I started waking up early every morning instead of half the time. From this I figured out that any breakfast, if eaten early, disturbed my sleep.

3. Sleep and standing (twice). I started to stand a lot to see if it would cause weight loss. It didn’t, but I started to sleep better. Later, I discovered by accident that standing on one leg to exhaustion made me sleep better.

4. Brain function and butter. For years I measured how fast I did arithmetic. One day I was a lot faster than usual. It turned out to be due to butter.

5. Brain function and dental amalgam. My brain function, measured by an arithmetic test, improved over several months. I eventually decided that removal of two mercury-containing fillings was the likely cause.

6. Blood sugar and walking. My fasting blood sugar used to be higher than I would like — in the 90s. (Optimal is low 80s.) Even worse, it seemed to be increasing. (Above 100 is “pre-diabetic.”) One day I discovered it was much lower than expected (in the 80s). The previous day I had walked for an hour, which was unusual. I determined it was indeed cause and effect. If I walked an hour per day, my fasting blood sugar was much better.

This method and these examples emphasize the point that different scientific methods are good at different things and we need all of them (in contrast to evidence-based medicine advocates who say some types of evidence are “better” than other types, implying a one-dimensional evaluation). Consider three things we want to do:

1. Test cause-effect ideas (X causes Y). Experiments do this well; surveys are better than nothing; this method doesn’t do it at all.

2. Assess the generality of our cause-effect ideas. Surveys do this well (it is much easier to survey a wide range of people than to do an experiment with a wide range of people); multi-person experiments are better than nothing; this method doesn’t do it at all.

3. Come up with cause-effect ideas worth testing. This method is especially good at this; surveys are better than nothing; most experiments are a poor way to do it.

The possibility of such discoveries is a good reason to self-track. Professional scientists almost never use this method. But you can.

B. F. Skinner: Brilliant Engineer, Brilliant Self-Promoter, Mediocre Scientist

I majored in psychology at Reed College. At the time, the whole major centered on Skinnerian psychology — the importance of reward in controlling behavior. The introductory course used a Skinnerian textbook (e.g., we learned the correct meaning of “negative reinforcement” — it does not mean punishment). Other courses also had a Skinnerian emphasis. They never convinced me. I always thought it was an exceedingly narrow way to study behavior.

When I was a graduate student, I visited Harvard and heard Skinner give a talk, titled “Why I am not a cognitive psychologist”. During the question period I asked if he was familiar with the work of Saul Sternberg — perhaps the most influential cognitive psychologist. No, said Skinner. I thought it was foolish to criticize an area of research you know little about.

After I became a professor, I went back to Reed to give a talk. After the talk, I went out to dinner with several psychology professors. I told them I thought Skinner was a brilliant engineer — the Skinner box is really useful — but a mediocre scientist. He was unable to discover anything, he just repeated the same result (rewarding something increases how often it is done) countless times. They had no reply.

In the last two days, strangely enough, Skinner has come up in two different conversations. In the first, a friend said that Skinner’s views about language were ridiculous. I agreed. Why write such nonsense? my friend asked/complained. I said maybe Skinner’s productivity system worked too well. It caused him to write when he had nothing to say. In the second, a different friend brought up David Freedman’s recent Atlantic article called “The Perfected Self”, which argues that Skinnerian techniques really work when you implement them as smartphone apps — techniques to lose weight, for example. “B. F. Skinner’s notorious theory of behavior modification was denounced by critics 50 years ago as a fascist, manipulative vehicle for government control,” writes Freedman (or an editor), but actually that theory is really good.

My area of academic psychology (animal learning) is the same as Skinner’s. Within this field, I have never heard anyone complain that Skinner’s work was “fascist” or “manipulative” or a “vehicle for government control.” It never became popular — it was always a minority point of view — probably because it was boring (the same thing over and over) and perhaps because it was anti-intellectual. Skinner wrote a well-known paper about why theories are unnecessary. He didn’t understand the role of theories in science and didn’t bother to find out. Sure, the psychology theories of the time (1950) were awful. Psychology theories are still mostly awful. But there are plenty of good theories in other areas of science.

For a long time, Skinnerian ideas, nearly dead in academia, lived on in the treatment of autism. The people applying these ideas called themselves “behavior analysts” and the whole field of applied Skinnerian psychology was called “behavior analysis”. What caused this persistence was that the techniques worked. Using the techniques (carefully rewarding this or that behavior) improved the lives of autistic children and their parents. Which was a real contribution. I could make a long list of famous psychologists who have done less to improve human well-being.

The success of Skinnerian ideas in improving the lives of autistic children should not be confused with figuring out what causes autism. To figure out the cause of autism is to figure out the environmental cause(s) — to which people with certain genes are more sensitive — and how autism can be avoided entirely, not meliorated. I have blogged about possible causes of autism many times, in particular the possibility that sonograms cause autism. I have no idea if behavior analysts understand the difference between melioration and figuring out the cause. Maybe Skinner would claim there is no difference — he was full of bizarre statements like that. If your child is autistic, you are in crisis. You have zero interest in questions about “cause” — you simply want help. In any form. Behavior analysts, while helping autistic children and their parents, contribute nothing that helps us find the cause of autism. Which, if you are planning on having children, you care about enormously. So you can avoid having autistic children.

So Skinner’s legacy is mixed. The Skinner box is terrific. I happily used Skinner boxes in my research for years, even though I hardly believed a single word Skinner said. As an engineer — an applier of stuff discovered by others — Skinner made a lasting contribution. As a self-promoter, he was incredibly successful — he was on the cover of Time, for example. As a scientist, he was a zero. He discovered nothing that matters. As a thinker (e.g., the book Beyond Freedom and Dignity) he was less than zero. He was a charlatan, claiming over and over that he understood puzzling things (e.g., language) that he did not understand. An unusual mix. Few great engineers are charlatans.

Prize Fight: The Race and Rivalry to be First in Science by Morton Meyers

Prize Fight: The Race and Rivalry to be First in Science (2012) by Morton Meyers (copy sent me by the publisher) is about battles/disagreements over credit, often within a lab. Jocelyn Bell noticed the first pulsar — how much credit does she deserve relative to her advisor, Antony Hewish, who built the structure within which she worked? (Not much, said Bell. “I believe it would demean Nobel Prizes if they were given to research students.”) The structure and subtitle of the book make little sense — there is a chapter about how science resembles art and a chapter about data fabrication, for instance, and nothing about races or being first. The core of the book is two stories about credit: for the discovery of streptomycin, the first drug effective against tuberculosis, and for the invention of MRI (magnetic resonance imaging). Meyers is a radiology professor and a colleague of one of the inventors of MRI.

I liked both stories. I find it hard to learn anything unless there is emotion involved. Both stories are emotional — people got angry — which made it easy to learn the science. Streptomycin was found by screening dirt. It was already known that dirt kills microbes. The graduate student who made the discovery was indeed a cog in a machine but later he was mistreated and got angry and sued. The first MRI-like machine was built by a doctor named Raymond Damadian, who was not one of the recipients of the Nobel Prize given out for its invention. He had good cause to be furious. The otherwise good science writer Horace Freeland Judson wrote an op-ed piece about it (“No Nobel Prize for Whining”) that ended “His behavior stands in stark and elegant contrast to the noisy complaining of Raymond Damadian”. To name-call (“whining”, “noisy”) in a New York Times op-ed is to suggest your case is weak.

I have had a related experience. When I was a graduate student, at Brown University, I did experiments about cross-modal use of an internal clock. Do rats use the same clock to measure the duration of sound and the duration of light? (Yes.) I got the idea from human experiments about cross-modal transfer. By the time my paper (“Cross-Modal Use of an Internal Clock”) appeared, I was an assistant professor. A few months after it was published, I went back to Brown to visit my advisor, Russell Church. On the day of my visit, he had just received a new issue of the main journal in our field (Journal of Experimental Psychology: Animal Behavior Processes — where my article appeared). It was in a brown wrapper on his desk. I opened it. The second article was “Abstraction of Temporal Attributes” by Warren Meck and Russell Church. (Meck was a graduate student with Church.) I didn’t know about it. It was based on my work. The first experiment was the same (except for trivial details) as the first experiment of my article. The introduction did not mention me. I leafed through it. Buried in the middle it said “This result replicates previous reports from our laboratory (Meck & Church, in press; Roberts, 1982).”

I was angry. Why did you do this? I asked Church. “To make it seem more important,” he said. I consoled myself by thinking how bad it looked (on Church’s record). I never visited him, and almost never spoke to him, again. Years later I was asked to speak at a conference session honoring him. I declined. What he did amounted to the rich (well-established) stealing from the poor (not established) and jeopardized my career. When my article appeared, I didn’t have tenure. It was far from certain I would get it. I hadn’t written many papers. If you read both papers (Meck and Church’s, and mine), you could easily be confused: Who copied whom? This confusion reduced the credit I got for my work and reduced my chance of getting tenure. Church surely knew this. Failure to get tenure could have ended my career.

Who Watches the Watchdogs? The Myths of Journalism

In a great essay, Edward Jay Epstein points out, at least by implication, that the Pulitzer Prize committee is not terribly interested in the truth of things:

A sustaining myth of journalism holds that every great government scandal is revealed through the work of enterprising reporters who by one means or another pierce the official veil of secrecy. . . This view of journalistic revelation is propagated by the press even in cases where journalists have had palpably little to do with the discovery of corruption. Pulitzer Prizes were thus awarded this year to the Wall Street Journal for “revealing” the scandal which forced Vice President Agnew to resign and to the Washington Star-News for “revealing” the campaign contribution that led to the indictments of former cabinet officers Maurice Stans and John N. Mitchell, although reporters at neither newspaper in actual fact had anything to do with uncovering the scandals. . . . Yet to perpetuate the myth that the members of the press were the prime movers in such great events as the conviction of a Vice President and the indictment of two former cabinet officers, the Pulitzer Prize committee simply chose the news stories nearest to these events and awarded them its honors.

The committee for the Nobel Prize in Physiology or Medicine operates the same way, except with the disadvantage that there is not one important (= useful in a big way) biology discovery per year. There are far fewer than that. So almost every year the prize goes to discoveries with little practical importance that are described as having great practical importance. The profession (in this case, biology) is credited with much more power than it actually has.

Why does this happen? One possible reason is that no one points it out. (Epstein’s essay, still relevant today, was published in 1974.) When a powerful journalistic institution does bad things, it is incredibly dangerous (to your career) to point this out. That is why the Murdoch scandal became so big: the misdeeds went unchallenged for so long. Spy magazine had a column called Review of Reviewers. It was hilarious because the misdeeds it described were so great. Unlike almost anything else in Spy, it was anonymous: brilliant writing the author did not take credit for, because it was dangerous to criticize the watchdogs. Likewise, hardly anyone except Epstein criticizes the prize committees (who resemble watchdogs), so they can be profoundly inaccurate.

Personal Science is to Professional Science as Professional Science is to Engineering

A few days ago I gave a talk at Microsoft Beijing titled “The Rise of Personal Science: Discoveries about Acne, Blood Sugar, Mood, Weight Loss, Sleep, and Brain Function.” (Thanks to Richard Sprague, who invited me.) The audience was engineers.

In response to a question, I said that the relationship between personal science and professional science resembled the relationship between professional science and engineering. Cause-effect statements (X causes Y) vary in their degree of plausibility anywhere from zero (can’t possibly be true) to one (absolute certainty). Engineers, professional scientists, and personal scientists tend to work at different places along this scale:

Engineers work with cause-effect relationships at the top of the scale, relationships that are well-established (for example, Newton’s Laws) and in which we have total confidence.

Professional scientists like to study cause-effect relationships in the middle of the scale of degree of belief, where true and false are equally plausible. When both outcomes are plausible, you can publish the results no matter what you find. If everyone already agrees that X causes Y, further evidence isn’t publishable (too obvious). If it is highly implausible that X causes Y, professional scientists cannot afford to study the question: a finding that X does cause Y would be publishable, but it is too unlikely to occur, and the likely finding, that X does not cause Y, is unpublishable (“we already knew that”).
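One way to make this concrete (my gloss, not something from the talk): if you model a definitive test as a yes/no answer, the expected information it yields is the binary entropy of the prior plausibility, which peaks exactly in the middle of the scale:

```python
# Hypothetical illustration: expected information (in bits) from a definitive
# test of a claim with prior plausibility p, via the binary entropy
# H(p) = -p*log2(p) - (1-p)*log2(1-p).
import math

def binary_entropy(p):
    """Bits learned, on average, from a definitive test of a claim with prior p."""
    if p in (0.0, 1.0):
        return 0.0  # certainty: the test can teach you nothing
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"prior plausibility {p:4.2f} -> {binary_entropy(p):.3f} bits")
```

At p = 0.5 a test yields a full bit, the most it can; near 0 or 1 it yields almost nothing. That is one way to see why professional scientists cluster in the middle of the scale, while engineers live near 1 and personal scientists, as described below, can afford to poke around near 0.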

Personal scientists can easily test ideas with low plausibility. First, because personal science is cheap. Many tests cost nothing. Second, because what other people think is irrelevant. (A professional scientist who takes seriously an idea that “everyone knows is nonsense” risks loss of reputation.) Third, because there is no pressure to produce a steady stream of publications. An example of a personal scientist testing an idea with low plausibility is when I tested the idea that standing causes weight loss. I thought it was unlikely (and, indeed, I didn’t lose weight when I stood much more than usual). But I could easily test it. It led me to discover that standing a lot improves my sleep.

Plainly we need all three (engineers, professional scientists, personal scientists). Has anyone reading this heard someone besides me make this point?

I have been shocked — I sort of continue to be shocked — by how much I have been able to discover via personal science. But a high rate of discovery makes sense if personal science supplies a necessary ingredient — the ability to test low-plausibility ideas — that has been missing.

Drug Companies Release More Data From Drug Trials

Drug companies, in a few cases, have recently started to release much more data from drug trials. Unsurprisingly, analysis of the new data by outsiders — people who have nothing to gain from positive results — has often contradicted the drug company analysis of the same data.

One example involves the flu drug Tamiflu. The new analysis suggested that “Tamiflu falls short of claims—not just that it ameliorates flu complications, but also that the drug reduces the transmission of influenza.” Another example involved Prozac. The new analysis “ended up bucking much of the published literature on antidepressants. . . . [It] found no link between Prozac and suicide risk among children and young adults . . . Prozac appeared to be more effective in youth, and antidepressants far less efficacious in the elderly, than previously thought.”

Another reason to believe in the value of this new data is the work of Lisa Bero at UCSF. She looked at the efficacy of nine drugs using unpublished FDA data. “Nineteen of the redone analyses showed a drug to be more efficacious, while 19 found a drug to be less efficacious. The one harm analysis that was reanalyzed showed more harm from the drug than had been reported.”

I hope that the FDA will eventually require that all raw data from drug trials be publicly available as a condition of approval. (The same should also be true of journal articles, as a condition of publication.) It is abundantly clear that drug company analyses are often misleading — which harms the public.

Assorted Links

  • The corruption of science by research grants. This reminds me of a BBC documentary called something like Science Under Attack. It was hosted by a Nobel Prize winner (Physiology or Medicine) named Paul Nurse. Part of it was about “climate change denialism”. If you don’t believe that humans are dangerously warming the planet, Nurse implied, you are somehow attacking science. When people who win Nobel Prizes cannot see that AGW is a crock, something curious has happened.
  • Edward Jay Epstein interviews DSK. “‘Thank you so much for your interest in this case,’ he says.”
  • Researcher discovers new treatment for her own vertigo. “A University of Colorado School of Medicine researcher who suffers from benign paroxysmal positional vertigo (BPPV) and had to ‘fix it’ before she could go to work one day was using a maneuver to treat herself [the usual treatment] that only made her sicker. ‘So I sat down and thought about it and figured out an alternate way to do it. Then I fixed myself and went in to work’ and [thereby] discovered a new treatment for this type of vertigo.”

Thanks to Melissa Francis.

Assorted Links

Thanks to Peter Spero and Hal Pashler.