Andrew Gelman on Web Trials and the Shangri-La Diet

Andrew Gelman is a professor of statistics at Columbia University. Years ago we co-taught a seminar about left-handedness. His blog. This interview took place via instant messaging in February, 2007 and has been edited slightly.

SR I want to ask your opinion of web trials. People go to a website where they choose or are randomly assigned a treatment. Then they come back and report the results.

AG Then the records of their choices and outcomes are made publicly available.

SR Yes. And there would probably be some summary of the results prepared by experts. It wouldn’t be just raw data.

COMPARING TO CURRENT STATE OF THE ART IN MEDICAL RESEARCH

AG We could compare to the current state of the art in medical research, which I think is to have some moderately large randomized clinical trials, each of which is published in a journal, followed by a meta-analysis of these trials. A difficulty with the current state of the art is that sample sizes in clinical trials seem to be simultaneously too small and too large. Too small in that results tend to be just barely statistically significant (and often not significant for subgroups), so that you can’t really put your faith in one study, hence the need for meta-analysis. Too large in that each study is unwieldy, takes a huge amount of effort and doesn’t allow for much learning and experimentation during the study.

SR A famous epidemiologist [Richard Doll] once said that if the effect is strong, you don’t need a big study.

AG In some way the high cost is a good barrier in that people have to think seriously and justify what they want to do. On the other hand, within any particular research plan, it would seem to limit the possibility for innovation.

Speaking generally, a challenge is to integrate clinical judgment (including ideas of experimentation and trying different things with different patients) with scientific goals such as replicability.

Also, there are well-known cognitive illusions in clinical judgment, which is what motivates the evidence-based-medicine movement (for randomized trials, public records of data, etc.) in the first place.

SR How do web trials fit into the picture you have drawn?

AG Ideally, web trials are intermediate between controlled randomized trials on one hand, and full recording of observational data on the other. If people are really volunteering to be randomized, and they then follow the protocol, this is a clean randomized expt (albeit not blinded, an issue I’d like to raise with you). In practice there will be lots of selection, dropout, measurement error, etc., which moves it toward an observational study. The dispersed nature of the data collection is similar to (in fact, more dispersed than) the idea of individual clinicians recording their experiences and outcomes into a centralized database. That is, the data collection is dispersed, the database is centralized.

SR A web trial would have more regularity — less variation — across subjects than observations collected from individual doctors. Because everyone would get the same instructions. Whereas different doctors are obviously going to give different instructions (for the same nominal treatment).

AG Yes. That’s why I said the web trial is in between.

DIFFICULTIES WITH BLINDING

SR In the area of blinding I think a web trial would be better than the conventional double-blind clinical trial. If the goal is to guide practice. In practice patients are not blinded. Blinding is a tool to equate expectations. Better to equate expectations by comparing different treatments both believed to be effective.

AG One of the difficulties with your self-experimentation is that there’s no blinding at all. Similarly with these trials. Some of it is the nature of your treatments, but perhaps with some effort you could come up with blinded versions.

SR In my self-experimentation the expectations are equal in the different conditions, in many cases.

AG For example, consider the recent self-experiment that you describe on your blog, where you try different oils and measure your balance. I’d believe these results a lot more if you blinded the treatments.

SR Sure, blinding would help in that case, I agree. I plan to do something like that. But blinding is not necessary to equate expectations. For example, I tried many ways of losing weight. In every case I expected it to work. Some ways worked much better than others. It is this comparison of the effects of different treatments that is interesting. In general expectations cannot be very powerful or there would be no problems left to solve. Expectations are powerful in a few areas and seem to have no effect in many areas. I don’t mean we should ignore them; but to emphasize them as a big deal is not what the evidence suggests. In any case, in web trials the participants would only be randomized to (or choose) treatments they thought might work.

AG There’s some work by Rubin and other statisticians on “broken randomized trials” which can more generally be thought of as experiments that have partial randomization.

SR I think of web trials as giving “entrants” (or subjects) a choice: to be or not to be randomized. Then when it’s all over you compare the two groups.

AG That makes sense. You’ll still have some problems: 1. People not following protocol. 2. Non-blindness of treatments. 3. Other problems, I’m sure, which I can’t think of offhand.

SR Well, these are equal for all conditions so they shouldn’t distort anything.

AG In a controlled trial you can deal with some of these things: 1. In a controlled trial you can have more interactions with the experimental subjects, thus maybe more likely they’ll follow protocol. 2. In a controlled trial you can (sometimes) ensure blindness. In general, I don’t think you can get away with assuming that biases cancel out.

ANALYZING DATA FROM WEB TRIALS

AG Your web trials should give us a big juicy source of data that can be thrown at a stat Ph.D. student as a thesis project, perhaps! My intuition as an amateur sociologist of applied statistics is that an exemplary applied analysis is a good way to kick-start the study of a statistical problem.

SR What’s an example of such a kick-start? That’s an interesting point.

AG I’m thinking of the hierarchical models that were fit by Lindley, Novick, Rubin, and others in the late 1960s thru early 1980s to educational data. These provided examples for people to follow–templates–as well as demonstrations that these methods really worked. There were various interesting discussions of these models in the stat literature; in particular I’m thinking of a paper by Rubin on law school validity studies in J. Amer. Stat. Assoc. from 1980 that had several discussants.

SR Yes, it is true that the data from web trials would be complex and interesting in new ways and accessible to everyone.

AG Yes, having available data is another plus–that’s really a new feature which should help. Now back to the warnings. A very well known example is the Nurses’ Health Study, an observational study that found that taking post-menopausal drugs was associated with lower heart-attack risks (and lower death rates). But when a big randomized expt was done, no association was found. Actually, taking the drugs slightly increased cancer risk, I believe. See here.

I talked with various people about this, and there are different potential explanations for the discrepancies. One story is that the women who took the drugs were otherwise healthier, more health conscious, etc.–even after controlling for whatever pre-treatment variables they controlled for. Another story is that the populations of the 2 studies were different (in particular, in their average ages), and perhaps the drugs are beneficial for some ages but not others. (Incidentally, the drugs were not originally intended to reduce heart-attack risk. This was an unexpected effect (or non-effect), I believe.)

Anyway, the people I trust on these matters (notably John Carlin) believe that the difference is because of “selection”, i.e., the drugs don’t really reduce heart attack risk. But the observational study led people to recommend the drugs. So this is a big example where the obs study was misleading.

SR Did the randomized study conclusively rule out the effect size seen in the correlational study? Or did it simply find no effect?

AG I’m not sure. My impression is that the expt actually contradicted the obs study–a stat signif negative effect for one, and a stat signif positive effect for the other–not just that there was significance for the expt and no signif for the obs study–but I never really looked into it.

SR I’d like to return to the issue of blind vs don’t blind. You believe any experiment where subjects are not blind to the treatment has a problem?

AG Yes, if knowledge of the treatment could affect the outcome (for example, through motivation). I worry about it for your diet and depression studies.

SR Well, in much research the first question is whether there is a useful effect. Later experiments deal with mechanism. I was under the impression that what matters is to equate expectations across conditions and that blinding is just one way to do this.

AG Maybe you’re right, I’m not actually up on this literature. I know that Paul Rosenbaum has written about it.

MORE ON BLINDNESS: CONSIDERING THE SHANGRI-LA DIET

AG My knowledge of it is not particularly sophisticated. For your diet and depression studies, there are obvious stories based on motivation.

I wouldn’t go so far as some people and simply dismiss your results. But the concerns are natural, I think. It’s a little different than the problem with the Nurses study. Here I’m worried about motivation, there the issue was selection.

Although there’s a possible selection problem in your study too, in that the people (including you) doing the Shangri-La Diet might be those who are ready to try something new and lose weight.

SR There are a lot of people who are always ready to try something new and lose weight.

AG Again, this could be tested with a blinded study. For example, half the people get the oil apart from a meal, half get the oil with the meal. Not that this would solve all problems of interpretation. . . .

For example, Caroline thinks that your diet works, but that the reason why it works is that it stops people from snacking for a 2-hour period (before and after the oil) and also focuses people on their snacking.

SR If anyone thinks that — and it is a perfectly reasonable thing to think if you are just starting to learn about it — then they can replace the oil with water and see if they continue to lose.

AG To answer your comment (“there are a lot of people who are always ready to try something new and lose weight”): yes, I remember you saying this before, and this is a big reason I wouldn’t dismiss your results immediately. But, still, people willing to try this wacky new thing might be special (on average). To put it another way, I expect there were similar successes with people trying Scarsdale, Atkins, etc.

SR I’m sure that people who try my diet are unusual early adopter types. I think Atkins has some truth to it — some reasons it would actually work. I don’t know enough about Scarsdale to comment. My theory says that merely changing what you eat (to foods with unfamiliar or at least less familiar flavors) should lower your set point.

AG Sure, but you had another point which was that these were people for whom nothing worked before. I was just using these diets as examples of other things that worked when nothing worked before. It relates to the historical perspective of new diets as things that will work for a few years before burning out. Possibly because the new diets can motivate people.

SR I tend to think they burn out because the new food becomes familiar.

AG I’m not saying that this is necessarily true of your diet–yours might be different–I’m just giving a historical control to give insight as to how there could really be motivational issues.

SR That’s true, research to distinguish my explanation of the burn out and a motivational one could be done but of course hasn’t been.

AG Your story, “they burn out because the new food becomes familiar”, is plausible. It’s also plausible that it’s easier to motivate yourself with a plan that’s new and different.

SR I hope there will be studies of whether the theory behind my diet is correct. These would essentially be studies that test the prediction that familiarity matters. This is a prediction that other theories do not make.

AG Yeah, based on reading the appendix to your book, there’s still some research synthesis that needs to be done (presumably with the help of animal studies).

SR I agree.

BACK TO WEB TRIALS

SR Web trials are relatively early in the research chain and they are relatively practical. In these cases you don’t worry a lot about mechanism, you worry much more about efficacy — is there an effect?

AG Regarding the analysis of web trials, it would be interesting to look at other examples of partially randomized experiments. Rubin and Hill and others worked on a study of school choice where they looked into some of these issues. It was a study that randomized some aspects of which kids went to which schools, but parents had some choices too.

In medicine and also in economics/public-policy, there has been a lot of interest in recent years in trying to get inside this sort of study rather than just relying on the “intent to treat” or explicit randomization.

SR “get inside this sort of study”–what do you mean?

AG I mean, look at what treatments are actually chosen by the individuals in the study, not just at what treatments they were assigned to.
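The distinction AG draws here can be made concrete with a small sketch. It compares the same outcomes grouped two ways: by the treatment each person was assigned ("intent to treat") and by the treatment each person actually took. The records, the treatment labels "A" and "B", and the helper `mean_difference` are all hypothetical and invented for illustration; they are not data from any study mentioned in the interview.

```python
from statistics import mean

# Hypothetical records: (assigned treatment, treatment actually taken, outcome).
# The numbers are made up for illustration only.
records = [
    ("A", "A", 2.0),
    ("A", "A", 3.0),
    ("A", "B", 1.0),  # assigned A, but switched to B
    ("B", "B", 1.0),
    ("B", "B", 0.0),
    ("B", "A", 2.0),  # assigned B, but switched to A
]


def mean_difference(rows, key_index):
    """Mean outcome of group A minus group B, grouping by the given column."""
    a = [row[2] for row in rows if row[key_index] == "A"]
    b = [row[2] for row in rows if row[key_index] == "B"]
    return mean(a) - mean(b)


# Group by assignment (column 0): the intent-to-treat comparison.
itt = mean_difference(records, 0)

# Group by what people actually did (column 1): the as-treated comparison.
as_treated = mean_difference(records, 1)

print(f"intent to treat: {itt:.2f}")
print(f"as treated:      {as_treated:.2f}")
```

The two numbers differ whenever people switch treatments. The as-treated comparison uses more of what actually happened, but it no longer has randomization behind it, so the selection worries raised earlier in the interview apply to it.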

SR Could you sum up why you like the idea of web trials?

AG 1. Lots of data. 2. Motivates people to randomize, to apply the treatment, and to record results. 3. More generally, gets people involved in the project as participants, not just “subjects”.

SR Those are good points, thanks.

AG Thank you for giving me the opportunity to think about these things. I’m still struggling with the question, “Are medical experiments too small or too big (in number of subjects)?”. As discussed here.

A New Way to Quit Smoking?

A few days ago on the Dean Edell radio show, I’m told, Dean Edell told his listeners that nicotine patches don’t cause any addiction problems; people just don’t get addicted to them. To anyone who has read The Shangri-La Diet this will sound eerily familiar: Dr. William Jacobs, a professor of psychiatry and addiction researcher at the University of Florida, told me that no one gets addicted to unflavored sugar water, although lots of people get addicted to Coke, Pepsi, and other forms of flavored sugar water.

These examples suggest that it isn’t the drug (sugar, nicotine) that causes addiction; it’s the signal of the drug: the conditioned stimulus (CS), to use animal-learning jargon. No signal, no addiction. In the case of sugar water, it’s very clear: Digestion of calories provides little or no pleasure. Ingestion of sweet-tasting things provides just a little pleasure. Ingestion of a flavor that has been paired with calories many times, such as the flavor of Coke, provides a lot of pleasure. The pattern with nicotine may be similar: Nicotine itself provides little or no pleasure. It is learned signals of nicotine (events repeatedly followed by nicotine) that can be very pleasant.

The practical application is that you may not need nicotine patches to quit smoking. It may be enough to hold your nose while you smoke. (The nose-clipping that SLD forum readers are familiar with.) When you smoke, the smell may become the CS. With this way of smoking you could have cigarettes whenever you wanted. You’d just come to want them less and less.

Likewise, it may be possible to get rid of an addiction to coffee by holding your nose while you drink it.

Thanks to Carl Willat.

Science in Action: Omega-3 (sleep data)

When I started taking omega-3 the rationale was not crystal clear. Many Shangri-La dieters reported better sleep; the diet involves drinking fat; omega-3, a fat, may affect the brain; sleep is controlled by the brain. I had not noticed any change in my sleep when I switched from sugar water to ELOO. Maybe this was because ELOO was low in omega-3, I thought, and this is what prompted my interest in omega-3. Later, a fly in the ointment: a poll of SLDers found that ELOO was as likely to produce better sleep as other oils. Implying that it is not omega-3 that is producing better sleep. I was puzzled, but continued my omega-3 investigations, which by then were motivated by an unmistakable improvement in my balance. My sleep did seem to improve somewhat when I started taking flaxseed oil capsules (a good source of omega-3).

Now I think I understand. I recently changed the time of day that I take 3 tablespoons of flaxseed oil. I had been taking it around 10 pm every evening; I switched to 10 am every morning. I wondered if the change would affect my balance, which I test around 7 am every morning.

To my surprise the change affected my sleep: I started waking up earlier. That is, I slept fewer hours before I woke up. This was not good — in general, the longer I sleep in one continuous stretch at night, the better. I was waking earlier and less rested. My impression was that my sleep was reverting to an earlier, lower-quality state.

To confirm this, I entered a lot of my sleep data into my computer and made a graph of how the length of my sleep (my “1st” sleep, to distinguish it from sleep when I fall back asleep a few hours after waking up) varied over the last two years. Here is the graph:

[Graph: length of 1st sleep over time]

T = tablespoon. The labels give the daily dose — e.g. “3 T flax” means 3 tablespoons/day of flaxseed oil. Each point is a mean. The error bars are standard errors. This graph shows that in recent months I had been sleeping longer. I had noticed this change: it was especially clear when I switched from 1 T/day flaxseed oil to 2 T/day. I thought the improvement was due to omega-3 — ignoring the fact that a switch to sesame oil (low in omega-3) didn’t eliminate it.
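For readers who want to make a similar graph from their own records, here is a minimal sketch of how each point (a mean) and its error bar (a standard error) could be computed. The post does not say how the standard errors were calculated; this sketch assumes the usual formula, sample standard deviation divided by the square root of the number of nights, which treats nights as independent. The sleep durations and the helper `mean_and_se` are hypothetical, not my data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    Assumes the observations are independent; needs at least two values.
    """
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical hours of 1st sleep per night, grouped by daily dose.
sleep_hours = {
    "1 T flax": [5.0, 5.5, 6.0, 5.5],
    "2 T flax": [6.0, 6.5, 7.0, 6.5],
}

summary = {label: mean_and_se(hours) for label, hours in sleep_hours.items()}

for label, (avg, se) in summary.items():
    print(f"{label}: mean = {avg:.2f} h, SE = {se:.2f} h")
```

If sleep length on one night depends on the night before, the independence assumption fails and these standard errors will tend to be too small.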

Now, with a third fact contradicting my original idea (the first two were the poll and the sesame oil results), I have finally managed to change my mind. It is fat in the evening that causes longer sleep. Not only omega-3 fat — perhaps any fat has this effect. Now all sorts of things make sense.

  • When I started drinking ELOO my sleep didn’t improve because I drank most of the ELOO during the day.
  • When I started the flaxseed oil capsules they had only a little effect on my sleep because I swallowed them throughout the day as well as in the evening.
  • When I switched from 10 flaxseed oil capsules per day to 1 Tablespoon of flaxseed oil per day my sleep got longer because I always drank the tablespoon in one shot — around 10 pm. When I switched to 2 Tablespoons/day, I continued to drink it all at one time, around 10 pm. I attributed the improvement to the increase in omega-3; it was really due to the increased evening intake of fat.
  • ELOO and other fats helped many SLDers sleep better because they drank them in the evening.
  • If you want to try this, note that the effect was bigger with 2 tablespoons at 10 pm than with 1 tablespoon at 10 pm.

    To be continued.

    Jane Jacobs on the Food Industry

    According to Paul Goldberger in the NY Sun,

    [Jane Jacobs] regretted the construction of more and bigger buildings, and the enormous power held by the real estate industry, Mr. Goldberger said. “But she was also a realist,” he said. “She was not Utopian, and I think that was the thing that distinguished her from many other intellectual and urban thinkers. She believed that the world we had was actually pretty good, if only we would learn to understand it, appreciate it, and handle it right.”

    Exactly. That is what I was saying in my comments on Michael Pollan (here and here). Our food world — which is mainly a processed food world, very little food is unprocessed — is actually pretty good. Some food processing is done according to wrong theories — the wrong theory that fat per se is fattening, for example. The newest food processing gets the most attention because it is still noteworthy (e.g., low-fat foods) but it is new theories that are most likely to be wrong. This is why “processed food” gets a bad rap. Most food processing, which is no longer advertised and we no longer notice because it is so common, is done according to correct theories — the main examples being cooking, refrigeration, freezing, and other forms of germ reduction. The germ theory of disease is correct. The poor health of many Americans reveals plenty of room for better understanding; I think the theory behind the Shangri-La diet is an example of better understanding. That theory suggests new types of food processing, as I explain in the last chapter of the book.

    The Hidden Relevance of Experimental Psychology

    I used to teach introductory psychology. As I skimmed introductory psych texts, I could sense the disinterest that almost all the authors of these books had for my field — experimental psychology. Pavlov, memory — that was boring. What did that stuff have to do with everyday life? the authors seemed to be saying.

    The Shangri-La Diet was built on thousands of experiments about Pavlovian learning. Empirical generalizations from that data helped me make the mental jump from experiments by Israel Ramirez to a new theory of weight control. A conceptual understanding of Pavlovian learning (what makes an association weak or strong) allowed me to use the new theory to find new ways of losing weight. Suddenly that boring stuff was relevant.

    My omega-3 findings (such as this), if they hold up, would do the same thing for two other areas of experimental psychology. The experimental designs I use, such as ABA, are straight from Skinnerian psychology. Although I am now measuring my balance — not part of experimental psychology — my guess is that most of the measurements will eventually be more “mental.” I assume that omega-3s improve my whole brain, not just the balance-related part. Experimental psychologists have spent 100 years developing simple and effective measures of many mental functions; all that measurement work should help us figure out how much omega-3 and omega-6 we should consume. Too little omega-3 and too much omega-6 appear to cause a vast range of health problems, including the most serious. The problem is that it is extremely hard to measure the functioning of our immune system or our circulatory system or most other parts of our body. It is even hard to measure how well our mood-regulating system is working. (Too little omega-3 appears to increase the risk of bipolar disorder.) It is much easier to measure memory.

    Experimental psychology can be divided into two parts — human (Part A) and animal (Part B). Part B can be subdivided into B1 (Skinnerian) and B2 (associative learning). Part B2 can be subdivided into B21 (Pavlovian learning) and B22 (instrumental learning). If you know the field you know these are the natural divisions. All my mainstream work has been in B22. I have managed (or hope to manage) to show the relevance of every area of experimental psychology except my own. Curious.

    Eating Less

    Emily Yoffe has a fascinating piece in Slate about going on a “CRON” (calorie-restricted optimal nutrition) plan. She eats 1500 calories/day. I was struck by three things: 1. Roy Walford, apparently the first person to try something like this for a long time, did not live to be unusually old. He was 79 when he died. This is very helpful self-experimentation: CRON didn’t work, at least for life expectancy. One data point is much better than none. 2. Hunger is a huge problem. 3. In spite of the hunger, Yoffe is continuing the plan after the allotted 2 months have finished. Her sleep is still poor, etc., but she likes being thinner.

    Yoffe mentions the UpDayDownDay regime studied by NIH researcher Mark Mattson. There is now a website for an associated book and diet. (My earlier comments.) A few weeks ago I asked Donald Laub, a Stanford professor of medicine who is doing this regime, if he was still taking olive oil to make the low-calorie days easier to endure. He said he was.

    How Often Should I Weigh Myself?

    I dislike weighing myself. But the recent Fancy Food Show left me with a fabulous collection of beautiful rare chocolates and I have gained weight. This essay by Bill McKibben about the value of knowing your gas mileage and this great piece by Atul Gawande on the value of a birth-outcome score (the Apgar score) have made me realize:

    1. Weighing yourself is an act of courage.

    2. Weighing yourself is always beneficial. No matter what the scale says.

    Paperback SLD

    I have just finished correcting the proofs of the paperback edition of The Shangri-La Diet, due out in May. The paperback edition has much less about drinking sugar water, and more about omega-3s, nose-clipping, and lessons learned from the SLD forums. The first three interludes (case studies) are different.

    All of the changes are due to user feedback. In This Film is Not Yet Rated, Fred Von Lohmann, an attorney for the Electronic Frontier Foundation, says

    Everyone always forgets . . . that Sony thought the VCR would be primarily used for time-shifting. We all know that’s not what it’s good for; it’s good for going to Blockbuster and renting movies, right? It took some time in the hands of consumers for that device to sort of find its highest and best use.

    Perhaps someday everyone will forget that SLD was originally based on drinking sugar water.

    Web Trials Update

    At the Shangri-La Diet forums, SLDers — more than a hundred of them — have been posting their weight for many months, thanks to Rey Arbolay. No similar data is available for any other weight-loss method, as far as I know.

    The main weakness of the SLD data is lack of comparison. This led me to propose web trials — a hybrid of the SLD data collection and a clinical trial, where there is always a comparison (at least two treatments, or treatment and control). After I interviewed Robin Hanson about them, a British student programmer named Andrew Sidwell contacted me and offered to set up a website to allow web trials to be done.

    How exciting! A website that does web trials will allow cheap, easy testing of many solutions to many problems. Although clinical trials usually involve medical problems, web trials can be used to study anything, as Robin pointed out. Andrew and I plan to start with procrastination.
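The core enrollment step of a web trial is simple enough to sketch. Each entrant either asks to be randomized or picks a treatment, and the record keeps track of which happened so the two groups can be compared later. This is only an illustration under stated assumptions: the treatment names, the `enroll` function, and the record layout are hypothetical and say nothing about how the actual website will work.

```python
import random

# Hypothetical treatment names; a real trial would define its own.
TREATMENTS = ["treatment A", "treatment B"]


def enroll(wants_randomization, chosen=None, rng=random):
    """Return an enrollment record for one entrant.

    Entrants who agree to randomization get a treatment picked uniformly
    at random; the others must name one of the available treatments.
    """
    if wants_randomization:
        return {"treatment": rng.choice(TREATMENTS), "randomized": True}
    if chosen not in TREATMENTS:
        raise ValueError(f"unknown treatment: {chosen!r}")
    return {"treatment": chosen, "randomized": False}


# A seeded generator makes the example repeatable.
rng = random.Random(0)

# Four entrants agree to be randomized; a fifth chooses for themselves.
entrants = [enroll(True, rng=rng) for _ in range(4)]
entrants.append(enroll(False, chosen="treatment B"))

for record in entrants:
    print(record)
```

Storing the `randomized` flag with every record is the design choice that matters: it lets the results from randomized entrants be analyzed separately from, and compared with, the results from entrants who chose.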

    Michael Pollan vs. Processed Food

    The problem with Michael Pollan’s latest food piece in the New York Times is that it isn’t very . . . nutritious. It doesn’t contain a story with new and interesting facts — like the story of Joel Salatin, a brilliant Virginia farmer, well told by Pollan in The Omnivore’s Dilemma. Instead it contains many broad generalizations, the evidence for which is never given in any detail. Long ago we ate food (i.e., unprocessed food), says Pollan, and it was better for us than the processed food products we eat today. Long ago we listened to stories, say I, and it was better for us than the expert statements on which much of modern journalism is based. If I taught journalism (as Pollan does), I would tell my students the best thing is a story of success (e.g., Salatin) because we can always learn from it. Next best is a story of failure because we can always learn from that, too. Worst is to quote experts (e.g., Pollan quotes Marion Nestle). For two reasons: 1. Experts are often wrong. When they are, it is worse than learning nothing — we are actively misled. 2. Experts — at least in standard journalism — never say the facts on which their claims are based. Even if they are correct, what the reader learns from quoting them is shallow.

    Misled by experts, apparently, Pollan repeats Marion Nestle’s recommendation to “eat less” (to reduce obesity). Why it is helpful to repeat failed advice that the rest of us have heard a thousand times is not explained. Nor is it made clear what ancient foodway — Pollan is basically saying we should return to long-ago ways of eating — we would be following if we tried to “eat less.” As far as I know, the answer is none of them.

    Several big important stories contradict Pollan’s conclusions. One is the story of B vitamin supplementation of flour and other processed food, which greatly reduced neural birth defects. I heard a dean of a public health school tell a room full of new students that this one advance, which averted so much suffering, fully justified all the money spent on schools of public health. I agree. Processing food is not always bad. Sometimes it can be very good. When you process food based on a correct theory, that often happens. Food sterilization, refrigeration, and preservation via additives — all based on a correct theory, the germ theory of disease — have had many benefits. It’s when you process food based on a wrong theory — such as the theory that fat causes obesity — that you can easily do more harm than good.

    There is no turning back. We can’t avoid processed food. To move forward, we need better theories to guide the processing. Anyone who reads this blog regularly knows I think ancient foodways are a good source of evidence with which to build theories (e.g., Weston Price) but of course there are many other good sources of evidence.
