Researchers Fool Themselves: Water and Cognition

A recent paper about the effect of water on cognition illustrates a common way that researchers overstate the strength of the evidence, apparently fooling themselves. Psychology researchers at the University of East London and the University of Westminster did an experiment in which subjects had nothing to eat or drink from 9 pm until they came to the testing room the next morning. All of them were given something to eat, but only half of them were given something to drink. Each subject came in twice, in different weeks: in one week they were given water to drink; in the other week they weren't. Half of the subjects were given water in the first week, half in the second. Then the researchers gave subjects a battery of cognitive tests.
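The design above is a counterbalanced crossover: every subject is tested both with and without water, and the order is balanced so that practice effects don't masquerade as water effects. A minimal sketch of assigning the orders, assuming 20 subjects and a hypothetical function name `counterbalanced_orders` (the study's actual randomization method isn't described here):

```python
import random

def counterbalanced_orders(subject_ids, seed=None):
    """Randomly assign half the subjects to water-first, half to water-second."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {sid: ("water", "no water") if i < half else ("no water", "water")
            for i, sid in enumerate(ids)}

orders = counterbalanced_orders(range(20), seed=0)
water_first = sum(1 for order in orders.values() if order[0] == "water")
print(water_first)  # 10
```

Because each subject serves as their own control, the water comparison is within-person, which is more sensitive than comparing two separate groups.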

One result makes sense: subjects were faster on a simple reaction time test (press a button when you see a light) after being given water, but only if they were thirsty. Apparently thirst slows people down. Maybe it’s distracting.

The other result emphasized by the authors doesn’t make sense: water made subjects worse at a task called Intra-Extra Dimensional Set Shift. The task provided two measures (total trials and total errors), but the paper gives results only for total trials. The omission is not explained. (I asked the first author about this by email; she did not explain the omission.) On total trials, subjects given water did worse, p = 0.03. A surprising result: after people go without water for quite a while, giving them water makes them worse.

This p value is not corrected for the number of tests done. A table of results shows that 14 different measures were used. There was a main effect of water on two of them. One was the simple reaction time result; the other was the IED Stages Completed (IED = intra/extra dimensional) result. The effect of water on simple reaction time was probably a “true positive” because it was influenced by thirst. In contrast, the IED Stages Completed effect wasn’t reliably influenced by thirst. Putting the simple reaction time result aside, there are 13 p values for the main effect of water; one is weakly reliable (p = 0.03). If you do 20 independent tests when there are no true effects, it is likely that, purely by chance, at least one will have p < 0.05. Taken together, there is no good reason to believe that water had main effects aside from the simple reaction time test. The paper would be a good question for an elementary statistics class (“Question: If 13 tests are independent and there are no true effects, how likely is it that at least one will have p = 0.03 or better by chance? Answer: 1 – 0.97^13 = 0.33”).
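The classroom arithmetic can be checked directly. This sketch (function names are illustrative) computes the chance that at least one of n independent tests reaches a threshold when every null hypothesis is true, and confirms it with a small simulation, using the standard fact that under a true null a p value is uniformly distributed between 0 and 1.

```python
import random

def chance_of_false_positive(n_tests, threshold):
    """P(at least one of n independent null tests has p <= threshold)."""
    return 1 - (1 - threshold) ** n_tests

def simulate(n_tests, threshold, trials=20_000, seed=0):
    """Monte Carlo check: draw n uniform p values per trial, count trials with a hit."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() <= threshold for _ in range(n_tests))
        for _ in range(trials)
    )
    return hits / trials

print(round(chance_of_false_positive(13, 0.03), 2))  # 0.33
print(round(chance_of_false_positive(20, 0.05), 2))  # 0.64
print(simulate(13, 0.03))  # should land near 0.33
```

The formula assumes independence; correlated measures (as the next paragraph argues these are) change the exact number, which is one more reason to reduce the number of tests.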

I wrote to the first author (Caroline Edmonds) about this several days ago. My email asked two questions. She replied but failed to answer the question about the number of tests. Her answer was written in haste; maybe she will address this question later.

A better analysis would have started by assuming that the 14 measures are unlikely to be independent. It would have done (or used) a factor analysis that condensed the 14 measures into (say) three factors. Then the researchers could ask if water affected each of the three factors. Far fewer tests, far more independent tests, far harder to fool yourself or cherry-pick.
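Here is a rough sketch of the reduction step. It swaps in principal component analysis, a simpler close relative of factor analysis, and runs on made-up data: 200 hypothetical subjects whose 14 measures are driven by three hidden abilities plus noise. All names and data are assumptions for illustration. The output is three component scores per subject, so the water comparison would become three tests instead of 14.

```python
import math
import random

def standardize(data):
    """Convert each measure (column) to z-scores. data: one row per subject."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in data]

def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n, k = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]

def top_components(corr, m, iters=1000):
    """Largest m eigenvalues/eigenvectors via power iteration with deflation."""
    k = len(corr)
    a = [row[:] for row in corr]
    values, vectors = [], []
    for _ in range(m):
        v = [1.0 + 0.01 * j for j in range(k)]  # arbitrary starting vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(k) for j in range(k))
        values.append(lam)
        vectors.append(v)
        # remove this component so the next pass finds the next-largest one
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return values, vectors

def component_scores(z, vectors):
    """One score per component per subject: weighted sum of that subject's z-scores."""
    return [[sum(r[j] * v[j] for j in range(len(v))) for v in vectors] for r in z]

# Made-up data: 14 measures, each driven by one of 3 hidden abilities plus noise.
random.seed(1)
drivers = [0] * 6 + [1] * 5 + [2] * 3
data = []
for _ in range(200):
    ability = [random.gauss(0, 1) for _ in range(3)]
    data.append([ability[d] + random.gauss(0, 0.5) for d in drivers])

z = standardize(data)
values, vectors = top_components(correlation_matrix(z), 3)
scores = component_scores(z, vectors)
print([round(x, 1) for x in values])   # three eigenvalues, largest first
print(round(sum(values) / 14, 2))      # share of total variance the three capture
```

A real analysis would use a proper factor-analysis routine and decide the number of factors from the data rather than fixing it at three.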

The problem here (many tests, with no correction for them and no analysis that uses far fewer tests) is common, but the analysis I suggest is very rare in experimental psychology papers. (I’ve never seen it.) Factor analysis is taught as part of survey psychology (psychology research that uses surveys, such as personality research), not as part of experimental psychology. In the statistics textbooks I’ve seen, the problem of too many tests, and the need to correct for or reduce the number of tests, isn’t emphasized. Perhaps it is a research-methodology example of Gresham’s Law: methods that make it easier to find what you want (differences with p < 0.05) drive out better methods.
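For readers who want the correction itself, here is a sketch of two standard corrections, Bonferroni and Holm's step-down procedure. Only the p = 0.03 value comes from the paper; the other twelve p values in the example are placeholders, assumed here to be unremarkable.

```python
def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)

def holm(pvalues, alpha=0.05):
    """Holm step-down procedure; returns a reject flag for each p value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # compare the k-th smallest p value (k = rank + 1) with alpha / (m - k + 1)
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p values are kept too
    return reject

print(round(bonferroni(0.03, 13), 2))   # 0.39
placeholder = [0.03] + [0.5] * 12       # the 0.5 values are hypothetical stand-ins
print(any(holm(placeholder)))           # False
```

Either way, p = 0.03 among 13 tests does not survive correction.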

Thanks to Allan Jackson.

Are Drug Companies Becoming Less Law-Abiding?

Alex Chernavsky drew my attention to a report of the giant fines assessed against drug companies for fraudulent marketing. For example,

Merck agreed to pay a fine of $950 million related to the illegal promotion of the painkiller Vioxx, which was withdrawn from the market in 2004 after studies found the drug increased the risk of heart attacks. The company pled guilty to having promoted Vioxx as a treatment for rheumatoid arthritis before it had been approved for that use. The settlement also resolved allegations that Merck made false or misleading statements about the drug’s heart safety to increase sales.

Fines, of course, are supposed to reduce bad behavior. Here are the fines by year:

  • 2009: 2 fines
  • 2010: 1 fine
  • 2011: 1 fine
  • 2012: 5 fines

This pattern does not suggest the fines are working. Drug companies, of course, are very big. I would like to see cross-industry comparisons: which industries pay the most in fines per dollar of revenue?

More Magic Dots

A New Jersey patent attorney named Jim D writes:

I’ve been using the magic dots as you described, marking a dot or line every six minutes. I use an online timer with an audible tone every six minutes. A portion of my work requires focus, as I have to review, compare and contrast technical documents. I’ve historically had limited ability to focus for extended periods of time. I’ve used an online bar graph countdown timer, but even with the visual feedback of the bar graph counting down, the longest I could go without a short break was 20 minutes. I’ve also tried online Pomodoro timers, with alternating work and break periods, but again, the longest I could go without a break was 20 minutes.

In contrast, by using the magic dots method, I can easily focus for 60 minutes. I’ve been working for 60 minutes until the box is completed, and then taking a short break before starting another 60-minute box. After a few more weeks, I will see if I can extend the focus length for a longer period of time. (As an aside, I wonder if completing an entire “box” is psychologically important, and, if so, would a 90-minute “box” shape work better than continuing with consecutive 60-minute boxes?)

I don’t think finishing a box matters. Sometimes I do, sometimes I don’t, it doesn’t seem to make a difference. A friend used a much different counting system; it also worked. After years of using six-minute intervals I have started to use five-minute intervals; they don’t interfere too much and shorter intervals are likely to be more powerful. I would like to compare different interval lengths but it is a difficult experiment to do.
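The interval timing Jim D describes can be sketched as a simple schedule. This assumes one mark per tone and a box that ends when the block ends (60 minutes at six-minute intervals gives ten marks); the function names are hypothetical, and the `sleep` argument exists only so the loop can be tried without waiting an hour.

```python
import time

def tone_times(block_minutes=60, interval_minutes=6):
    """Minutes from the start of a box at which a tone sounds and a mark is made."""
    if block_minutes % interval_minutes:
        raise ValueError("block length must be a multiple of the interval")
    return list(range(interval_minutes, block_minutes + 1, interval_minutes))

def run_box(block_minutes=60, interval_minutes=6, on_tone=print, sleep=time.sleep):
    """Wait through one box, calling on_tone at the end of each interval."""
    for minute in tone_times(block_minutes, interval_minutes):
        sleep(interval_minutes * 60)
        on_tone(f"{minute} min: make a mark")

print(len(tone_times(60, 6)))  # 10
print(len(tone_times(60, 5)))  # 12
```

Comparing interval lengths, as suggested above, amounts to changing `interval_minutes`; the hard part is the experiment, not the timer.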

Assorted Links

  • Open Source Malaria
  • Criticism of Malcolm Gladwell by The Korean, Gladwell’s persuasive rebuttal, more from The Korean, more from Gladwell. I thought the work under discussion (“ethnic theory of plane crashes”) was the best part of Outliers. Gladwell summarizes it: “That chapter in Outliers is about a series of extraordinary steps taken by Korean Air, in which an institution on the brink of collapse and disgrace turned themselves into one of the best airlines in the world. They did so by bravely confronting the fact that a legacy of their cultural heritage was frustrating open communication in the cockpit. That is not a slight on Korean culture, or any other high-power distance culture for that matter.”
  • More praise for the new TV show Naked and Afraid on the Discovery Channel. It really is riveting.
  • Ziploc omelette. Poor man’s sous vide.

Thanks to Nicole Harkin.

Butter and Coffee

In Perfect Coffee at Home, authors Michael Haft and Harrison Suarez, who have started a digital publishing company, say that the Buttermind experiment influenced them to start adding butter to their coffee. In this excerpt, they say that their cholesterol went down 25 points during the period they drank it (their lives changed in many other ways at the same time). At the end they say:

Later, we would learn that Ethiopian warriors had drunk buttered coffee to energize before battle as far back as 600 CE. But that was after we had stopped regularly drinking it. When we transitioned out of the Marine Corps and our days became less frenetic, it just didn’t seem as necessary.

The mention of Ethiopian warriors reminds me of how, after discovering that pork fat (from pork belly) improves my sleep, I learned that Mao Tse-Tung praised a certain pork-belly dish (红烧肉), calling it “brain food”. I don’t drink coffee but I have tried tea with butter. It tasted good, but I didn’t like the residue it left on the tea cup. Cream doesn’t leave a residue. I haven’t noticed that butter gives me energy. The benefits I believe in are better brain function and better sleep. Maybe more calmness.

How to Detect Dementia

Dementia is common. You might think that doctors and neuropsychologists would have a good understanding of how to detect it. Judging from a recent New York Times article, they don’t. The article is based on a study that found that people who report memory problems not detected by a standard test are more likely to end up with dementia (measured by a standard test) than people who don’t report such problems. This isn’t surprising; what’s more revealing is how people who report memory problems have been treated in the past: their complaints have been dismissed. For example:

Patients like this have long been called “the worried well,” said Creighton Phelps, acting chief of the dementias of aging branch of the National Institute on Aging. “People would complain, and we didn’t really think it was very valid to take that into account.”

Doctors had no idea whether these complaints were valid, but rather than admit this ignorance they . . . confabulated. They claimed, based on nothing, that the complaints were not valid. It reminds me of a surgeon telling me that research supported her claim that I needed surgery (for a hard-to-notice hernia). No such research existed. When I asked her, “What research?” she said she would find it. In other words, she was bluffing. That’s just one doctor making up evidence. Here it has been a whole group of doctors.

The problem isn’t just confabulation. Apparently doctors in this area fail to understand basic principles of measurement. When Patient Y visits Doctor X and complains of memory problems, Doctor X gives Patient Y a series of memory tests. Only if Patient Y scores below the normal range does Doctor X consider Patient Y’s complaint “real”. For example:

The man complained of memory problems but seemed perfectly normal. No specialist he visited detected any decline. “He insisted that things were changing, but he aced all of our tests,” said Rebecca Amariglio, a neuropsychologist at Brigham and Women’s Hospital in Boston.

Amariglio apparently fails to understand that a series of measurements on one person (which is what the man’s complaint was based on: comparing himself now to himself in the past) is going to be vastly more sensitive to change than a comparison of one person to other people. A reasonable response to a complaint of memory loss would be: This is hard to detect with one visit. Let’s give you a sensitive test and have you come back in six months to see if you decline more than normal. Judging from the Times article, doctors still haven’t figured this out.
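A toy calculation shows why comparing a person with his own past is more sensitive than comparing him with other people. All numbers here are hypothetical (a test with population mean 100 and SD 15, and test-retest noise of 3 points for one person); they are not from the study or the article. A man who started well above average can lose a lot and still “ace” a one-visit test.

```python
import math

# Hypothetical numbers for illustration only.
POP_MEAN, POP_SD = 100.0, 15.0   # population norms for some memory test
RETEST_SD = 3.0                  # one person's test-retest measurement noise

def below_normal_range(score, cutoff_sd=1.5):
    """One-visit rule: flag only if the score is far below other people's scores."""
    return (score - POP_MEAN) / POP_SD < -cutoff_sd

def change_z(baseline, followup, retest_sd=RETEST_SD):
    """Within-person rule: size of the change relative to retest noise.
    The difference of two independent noisy scores has SD retest_sd * sqrt(2)."""
    return (followup - baseline) / (retest_sd * math.sqrt(2))

baseline, followup = 125.0, 112.0              # a high scorer who lost 13 points
print(below_normal_range(followup))            # False
print(round(change_z(baseline, followup), 2))  # -3.06
```

The one-visit rule sees nothing; the same 13-point drop is about three noise-SDs when measured against the man’s own earlier score.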

Speaking of memory decline, Posit Science still hasn’t sent me the data they promised to send me.

Thanks to Alex Chernavsky.

Assorted Links

Thanks to Alex Chernavsky.

Anesthesia Dolorosa Mirror Cure Update

I recently posted about using a mirror to cure anesthesia dolorosa, a painful skin condition, similar to phantom limb pain, that is always caused by surgery. Beth Taylor-Schott, the inventor of the technique, told me what’s happened since she last blogged about it:

Since then, David is still pretty much pain free and off Neurontin. Just a twinge now and then that he takes care of with Lidocaine, if anything. Have not had to re-do the therapy or anything like that.

I have been contacted by two researchers in the UK, and as I understand it, they’re doing research based on the blog, though I have not heard what their results are. Also, someone in Australia wanted to fly us out to do a demonstration and be at a conference, but I couldn’t get enough time off work to make it worth it.

I have heard from people who have read the blog and who want to do the therapy and who have questions, which I always answer, but only a very few of them ever come back to tell me that it’s worked, and no one has ever come back to say it hasn’t, so I’m not sure what to do with that. I don’t really see this as my crusade. I put the blog out there, and I figure the people who are meant to find it will find it. We ourselves only discovered this through a weird series of coincidences, after all.

She explained what she meant by “weird series of coincidences”:

It was by no means certain that I would read it. I think it had been sitting in my in-box for three or four months when I went back and read it. That is one coincidence. Another is that I came across it because, given that David had just flatlined twice in the hospital, I was in no shape to do anything BUT clean out my inbox, not something I did at all frequently at the time (like MAYBE once a year or every two years). And then too, I read it at a moment when his cardiologist had just told us that probably David needed to not only go off the stimulants he’d been taking to counteract the Neurontin, but also the Neurontin itself. His pain was being kept under control in the hospital with injections of Toradol, and there was no way they would let me give him that much when he was at home and not being monitored. So what would we have done if the mirror therapy hadn’t worked? He would not only have been non-functional, but in constant, excruciating pain. And yet I did not go into the inbox looking for answers, I went into it to distract myself from the fact that I seemingly had no answers. So the whole thing had a very deus ex machina quality to it.

And then there’s the fact that I happen to be the kind of person who is resilient enough to actually try something like this despite years of frustration with the condition and our treatment at the hands of the experts. What are the chances that I’d be in a situation like this, especially given the rarity of David’s condition? (Last I heard, M, the woman in the piece by Gawande, did not pursue the therapy, even after it was suggested to her.)

In a list of things that made the discovery less likely (e.g., rarely cleans out her inbox) she includes something that made the discovery much more likely, namely “what would we have done if the mirror therapy hadn’t worked?” She and her husband were incredibly motivated to make it work. More motivated than professional scientists ever are. This is an enormous advantage of personal science over professional science: the much greater motivation of the personal scientist.

Assorted Links

Thanks to Nicole Larkin and Tim Beneke.

My Heart Watch: Bay Area Health Measurements

For many years I have used the services of Heart Watch to measure my cholesterol and other health-related things, such as HbA1c. The couple that runs Heart Watch, Sandy and Glen, travels up and down California, so I was able to get tested only every three months. Feeling, as I did, that this was too infrequent, a man named Karl Corbett recently started a business called My Heart Watch that allows much more frequent tests in the Bay Area at a similar price. My Heart Watch uses the same portable testing devices as Heart Watch.

The Berkeley location is almost across the street from Whole Foods. I signed up online (I was the first person to use their online sign-up service), which was very convenient.

Corbett told me that he greatly improved his cholesterol numbers by changing to a Caldwell Esselstyn “plant-based diet” that included lots of vegetables, some fruit, no oils, and no animal-based products. (Since the usual oils, such as olive and soybean oil, are plant-based, this is a curious feature. Esselstyn seems to ignore bad effects of cholesterol lowering.) The more often you can test yourself, the more easily you can determine what controls what you’re measuring. When you can test yourself often enough to be sure whether a dietary (or other) change has made a difference, you can begin to ignore large clinical trials and their many limitations, which include poor choice of control group, poor statistics, incomplete reporting, biased reporting, publication bias, confounding, investigator fraud, and on and on. They are the fool’s-gold standard. If I can determine whether alternate-day fasting improves my HbA1c, I can ignore what clinical trials say about it.
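One way to judge whether a change “made a difference” from frequent self-measurements is a permutation test on readings taken before and after the change. This is an illustration, not My Heart Watch’s or anyone’s actual method; the readings are invented HbA1c-like numbers and the function name is hypothetical. With few readings it can enumerate every way of splitting them into two groups. It treats readings as interchangeable, which ignores slow trends over time.

```python
from itertools import combinations
from statistics import mean

def permutation_test(before, after):
    """Exact two-sided permutation test for a difference in means.

    Returns (observed difference after - before, p value)."""
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    n = len(pooled)
    extreme = total = 0
    for chosen in combinations(range(n), len(after)):
        chosen = set(chosen)
        a = [pooled[i] for i in range(n) if i in chosen]
        b = [pooled[i] for i in range(n) if i not in chosen]
        # small tolerance so float rounding doesn't drop ties with the observed split
        if abs(mean(a) - mean(b)) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total

before = [5.9, 6.0, 6.1, 6.0, 5.8]   # invented readings, old diet
after = [5.5, 5.4, 5.6, 5.5, 5.3]    # invented readings, new diet
diff, p = permutation_test(before, after)
print(round(diff, 2), round(p, 4))   # -0.5 0.0079
```

With five readings in each phase and complete separation, the smallest possible two-sided p value is 2/252; more readings make smaller effects detectable.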

Before writing this post I spoke to Corbett about getting discounted testing in return for publicizing My Heart Watch.