Trouble in Mouse Animal-Model Land

Most drugs are first tested on animals, often on “animal models” of the human disease at which the drug is aimed. This 2008 Nature article reveals that in at least one case, the animal model is flawed in a way no one really understands:

In the case of ALS, close to a dozen different drugs have been reported to prolong lifespan in the SOD1 mouse, yet have subsequently failed to show benefit in ALS patients. In the most recent and spectacular of these failures, the antibiotic minocycline, which had seemed modestly effective in four separate ALS mouse studies since 2002, was found last year to have worsened symptoms in a clinical trial of more than 400 patients.

I think that “close to a dozen” means about 12 in a row, rather than 12 out of 500. The article is vague about this. A defender of the mouse model said this:

As for the failed clinical trial of minocycline, Friedlander suggests that the drug may have been given to patients at too high a dose — and a lower dose might well have been effective. “In my mind, that was a flawed study,” he says.

Not much of a defense.

That realization is spreading: some researchers are coming to believe that tests in mouse models of other neurodegenerative conditions such as Alzheimer’s and Huntington’s may have been performed with less than optimal rigor. The problem could in principle apply “to any mouse model study, for any disease”, says Karen Duff of Columbia University in New York, who developed a popular Alzheimer’s mouse model.

“Less than optimal rigor”? Oh no. Many scientists seem to believe that every problem is due to failure to follow some rules they read in a book somewhere. They have no actual experience testing this belief (which I’m sure is false: the world is a lot more complicated than described in their textbooks); they just feel good criticizing someone else’s work like that. In this case, the complaints include “small sample sizes, no randomization of treatment and control groups, and [no] blinded evaluations of outcomes.” Very conventional criticisms.

Here’s a possibility no one quoted in the article seems to realize: The studies were too rigorous, in the sense that the two groups (treatment and control) were too similar prior to getting the treatment. These studies always try to reduce noise. A big source of noise, for example, is genetic variability. The less variability in your study, however, the less likely your finding will generalize, that is, be true in other situations. The Heisenberg Uncertainty Principle of experimental design. Not in any textbook I’ve seen.
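To make the tradeoff concrete, here is a minimal simulation sketch in Python. Everything in it is hypothetical and chosen only for illustration: the genotype scores, the made-up rule that the treatment effect depends on genotype, and the sample sizes are my assumptions, not anything from the Nature article or the mouse studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_effect(g):
    # Hypothetical rule: the drug helps at some genotype scores and hurts at others.
    return 1.0 - 2.0 * g

def run_study(genotype_mean, genotype_sd, n=20, noise_sd=1.0):
    """One two-group study; returns the estimated treatment effect."""
    g = rng.normal(genotype_mean, genotype_sd, size=2 * n)   # subjects' genotype scores
    treated = np.arange(2 * n) < n                           # first n treated, rest control
    outcome = np.where(treated, true_effect(g), 0.0) + rng.normal(0.0, noise_sd, 2 * n)
    return outcome[treated].mean() - outcome[~treated].mean()

# Inbred strain: every mouse has nearly the same genotype (fixed near 0.8),
# so estimates are precise, but only of the effect at that genotype.
# Diverse sample: genotypes spread like the target population (mean 0, sd 1),
# so estimates are noisier but track the population-average effect (about 1.0).
for label, mean, sd in [("inbred strain", 0.8, 0.01), ("diverse sample", 0.0, 1.0)]:
    estimates = [run_study(mean, sd) for _ in range(2000)]
    print(f"{label}: mean estimate {np.mean(estimates):+.2f}, spread {np.std(estimates):.2f}")
```

Under these assumptions, the inbred-strain studies give tight estimates of a strain-specific effect that is far from the population-average effect, while the diverse-sample studies give noisier estimates that are roughly centered on it: less noise, less generalization.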

In the 1920s and 30s, a professor in the UC Berkeley psychology department named Robert Tryon tried to breed rats for intelligence. His measure of intelligence was how fast they learned a maze. After several generations of selective breeding he derived two strains of rats, Maze Bright and Maze Dull, which differed considerably in how fast they learned the maze. But the maze-learning differences between these two groups didn’t generalize to other learning tasks; whatever they were bred for appeared to be highly specific to maze learning. The measure of intelligence lacked enough variation. It was too rigorous.

When an animal model fails, self-experimentation looks better. With self-experimentation you hope to generalize from one human to other humans, rather than from one genetically narrow group of mice to humans.

Thanks to Gary Wolf.

7 thoughts on “Trouble in Mouse Animal-Model Land”

  1. But are you going to try untested drugs on yourself? It seems to me that there is at least *some* value in animal models.

  2. Thanks for your perspective on this, Seth. In response to Andy M.’s comment above: yes, that’s right, the criticism here doesn’t apply to all uses of mice or other animals in research. Howard Florey ran his crucial test of using penicillin to cure bacterial infection in living mammals with only 8 mice. All were given a big dose of streptococci; four got penicillin injections. The untreated mice were dead overnight; three of the others survived. A large increase in confidence was obtained.

    To get to Seth’s point, though, look how “poorly” this experiment was designed. No randomization, no double blinding, no repeat trials; just a crappy little home-brew test of an idea that helped significantly to get the ball rolling to save millions (well, tens of millions) of lives.

    I really think this is an important point. There isn’t a “gold standard” for scientific truth in the sense of a standard experimental design that will settle all questions. There is a step-by-step process of confidence building. Reducing statistical “noise” in an experiment carries a cost. When you are as ignorant of the underlying biology as we are about ALS (not a criticism of the scientists personally; this is a general ignorance), the real problems may not come from conventional errors in experimental design… That’s the point, at least, as I take it.

  3. Quick add-on: I don’t mean to imply this was Florey’s only test! Just that you can acquire big news from experiments that don’t meet the “gold-standard,” and then go on to answer the questions that emerge by doing more experiments… hope that was clear.

  4. Andy M, yeah, I agree, self-experimentation won’t replace testing drugs on animals. I’m saying that the conventional view that animal models are good and self-experimentation is bad isn’t supported by these particular facts.

    Andrew, the psychologists I know don’t like the two-group (control and experimental) design you and Rosenbaum describe. They prefer a design in which each subject is his own control: you compare the same subject with and without the treatment. In the within-subject design, you are not penalized for increasing subject-to-subject variation, the way you are in a between-subject design, where the control group and the treatment group consist of different subjects. Because the “ideal study” you describe has lots of between-group variation, it is less than ideal for detecting a difference between the groups.

  5. Hi Seth,

    Just wanted to thank you for inspiring me to practice self-experimentation. I am halfway through 30 days of restricting calories, trying to be as self-examining as possible. I’m dumping the basic recordings of this experiment on this blog:

    https://thirtydaysdown.blogspot.com/

    I’m averaging about 650 calories a day and have learned that the hunger sensation is easily tamed, but coping with the mental effects of low blood sugar is difficult, and I may be eating more sugar than I did before the diet started. By adding a bit of sugar to my tea and eating dried apricots through the day, I am almost 100% fine until evening mealtime. I was going to use a tablespoon of oil a day to dampen my hunger, but my body seems to be coping just fine; that said, this is more of a crash-course experiment than a longer-term weight-loss programme. The other big observation so far is that while the blanket advice is to increase consumption of fruit and vegetables, I find the former far less effective in terms of satiety and have steered away from almost all fruit over the period (except the magic dried apricots!).

    I’m enjoying your next adventure in the realm of fermented foods and look forward to the findings. Just a shame I can’t find natto so easily here in the UK. All the best, Riz.

  6. Thanks, Andrew. I agree that some designs are better than others. But I disagree that there is no Uncertainty Principle. In your example, the within-subject design, where you seem to say no price is paid for wide variation in subjects, there is indeed a price: it is likely that there is a nonzero treatment × subject interaction, which inflates the error term. The greater the subject-to-subject variation, the greater the inflation of the error term. The Uncertainty Principle also applies to the treatment. The more precisely the treatment is repeated from one subject to the next, the clearer any difference will be (because “treatment error” is minimized), but the less well you can generalize to other treatments. Herb Clark, a psych professor at Stanford, made similar points in a well-known-to-psychologists verbal-learning paper about 30 years ago. He explained the difference between a random and a fixed effect. It’s another example of what I’m talking about. (A small simulation sketch after these comments illustrates the within-subject point and the interaction inflation.)

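Here is a minimal simulation sketch in Python of the design points raised in comments 4 and 6. The numbers (baseline spread, interaction spread, group size) are hypothetical assumptions chosen only for illustration: each subject gets a baseline level and a personal treatment effect, the between-subject comparison pays for variation in baselines, and the within-subject comparison cancels the baselines but still pays for variation in the personal effects, i.e. for the treatment × subject interaction.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(subject_sd, interaction_sd, n=20, noise_sd=1.0, effect=1.0, reps=5000):
    """Return the spread (std) of the effect estimate for each design."""
    between_est, within_est = [], []
    for _ in range(reps):
        baseline = rng.normal(0.0, subject_sd, size=2 * n)               # subject main effect
        personal = effect + rng.normal(0.0, interaction_sd, size=2 * n)  # treatment x subject

        # Between-subject: first n subjects treated, last n are controls.
        treated = baseline[:n] + personal[:n] + rng.normal(0.0, noise_sd, n)
        control = baseline[n:] + rng.normal(0.0, noise_sd, n)
        between_est.append(treated.mean() - control.mean())

        # Within-subject: the same 2n subjects measured off and on the treatment.
        off = baseline + rng.normal(0.0, noise_sd, 2 * n)
        on = baseline + personal + rng.normal(0.0, noise_sd, 2 * n)
        within_est.append((on - off).mean())
    return np.std(between_est), np.std(within_est)

for subject_sd, interaction_sd in [(2.0, 0.0), (2.0, 2.0)]:
    b, w = simulate(subject_sd, interaction_sd)
    print(f"baseline sd={subject_sd}, interaction sd={interaction_sd}: "
          f"between-subject error {b:.2f}, within-subject error {w:.2f}")
```

Under these assumptions, with no interaction the within-subject design is far less noisy despite large subject-to-subject variation; as the interaction grows, its error term inflates as well, which is roughly the point made in comment 6.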