Science in Action: Omega-3 (what’s the best dose?)

With a better understanding of how to measure balance, I looked again at my data about the effects of flaxseed oil. Here is a new, improved comparison of 2 tablespoons/day and 3 tablespoons/day:

2 vs 3 tablespoons/day

Very clear difference: one-tailed p = .004.

Here is a messy comparison between 3 and 4 tablespoons/day:

3 vs 4 tablespoons/day

I compared 3 tablespoons/day at 2 different times with 4 tablespoons/day divided between those 2 times. I didn’t want to take 4 tablespoons at one time and I wanted to have at least 2 tablespoons in the evening because of the sleep benefits. The graph shows that 4 tablespoons/day has about the same effect as 3.

The big picture: Earlier data convinced me there is probably an effect. Before doing more subtle, convincing, publishable experiments, I have been trying to make the effect as large as possible. For two reasons: 1. To make the effect as clear as possible. 2. To have the most beneficial possible baseline (a baseline to which I will return many times). I foresee doing an experimental design like this: baseline (n days), something else 1 (n days), baseline (n days), something else 2 (n days), baseline (n days), something else 3, and so on. During those many baseline days I want the effect to be as strong as possible.

Science in Action: Omega-3 (measurement improvement)

I’ve learned a few things. As some of you may know, I’ve been measuring my balance by standing on a board that is balanced on a tiny platform (a pipe plug) — pictures here. Now and then the board would slip off the platform. I supposed this was a failure of balance but I wasn’t sure, especially if it happened as soon as I stood on it. So I got another board into which my brother-in-law kindly drilled the perfect-size hole so that the plug will never slip:

New board (with hole for plug)

To see if this made a difference I did an experiment with a design I have never used before but that I really like: ABABABAB… (one day per condition). In other words, Monday I tested my balance with the old board, Tuesday with the new board, Wednesday with the old board, Thursday with the new board, etc. Simple, efficient, well-balanced. Here are the results:

new board vs. old board

The red line is fit to the red points, the blue line to the blue points. The two lines are constrained to have the same slope.

Well, that’s clear. I expected my balance to be better with the new board, actually.
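For readers who want to try the same kind of fit, here is a minimal sketch of two lines constrained to share one slope, using ordinary least squares. It is not the code behind the graph above: the function name `fit_parallel_lines`, the variable names, and the noise-free numbers are all made up for illustration.

```python
import numpy as np

def fit_parallel_lines(day, score, is_new_board):
    """Least-squares fit of two lines that share one slope.

    Model: score = intercept_old + board_effect * is_new_board + slope * day
    """
    day = np.asarray(day, dtype=float)
    score = np.asarray(score, dtype=float)
    group = np.asarray(is_new_board, dtype=float)
    # Columns: constant, board indicator (0 = old, 1 = new), day number.
    design = np.column_stack([np.ones_like(day), group, day])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    intercept_old, board_effect, slope = (float(c) for c in coef)
    return {
        "intercept_old": intercept_old,
        "intercept_new": intercept_old + board_effect,
        "slope": slope,
        "board_effect": board_effect,  # vertical gap between the two lines
    }

# Hypothetical ABAB data: old board on odd days, new board on even days.
days = np.arange(1, 9)
is_new = days % 2 == 0
scores = 10.0 + 0.5 * days - 2.0 * is_new  # invented, noise-free values

fit = fit_parallel_lines(days, scores, is_new)
print(fit)
```

Because the slope is shared, the difference between boards reduces to a single number (`board_effect`), which is what makes an ABAB design easy to read even when performance drifts over days.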

Speaking of the unexpected, I made another measurement improvement that truly surprises me — the surprise is that I never did it before. When I looked at my early balance data (the first 10 or so days of data) I saw that my balance improved for the first 5 trials and was roughly constant after that. Each session was 20 trials so I dropped (excluded) the first 5 trials from my analyses — considering them “warm-up” trials. I took the mean of the last 15 trials. That seemed very reasonable and I thought nothing of it.

Recently I asked again how performance changes over a session. The answer was a bit different: I found that performance improved for the first 10 trials. Now there are 30 trials in a session, so dropping the first 10 of them seemed okay. And that’s what I did.

But then I looked at how variability changed over a session. I expected the earliest trials to be more variable than the rest but the data didn’t show that. Variability was pretty constant from the first trials to the last. Hmm. Maybe I am losing valuable information by not including those early trials in my averages. It occurred to me: why not allow for the warmup effect by modelling it, rather than by excluding it? (Modelling it meaning estimating it and then subtracting it.) I did that, and then I looked at the size of the standard errors of the means (standard errors based on the residuals from the fit) for the most recent 40 days — essentially, the error in measurement. Here is what I found. Median standard errors:

First 10 trials (out of 30) excluded: 0.073
First 5 trials excluded: 0.064
First trial excluded: 0.061
No trials excluded: 0.059

My eyes opened wide when I saw these numbers. Oh my god! I was throwing away so much! A reduction in error from 0.073 to 0.059 — that’s 20% better.
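Here is a rough sketch of the comparison described above: dropping early trials versus estimating the warm-up effect and subtracting it. The exact model I used is not spelled out in this post, so the sketch assumes a simple additive one (day level plus an effect of trial position). The function names and the simulated data are invented for illustration; only the last two lines use numbers from the post.

```python
import numpy as np

def se_excluding(scores, n_drop):
    """Per-day standard error of the mean after dropping the first n_drop trials."""
    kept = scores[:, n_drop:]
    return kept.std(axis=1, ddof=1) / np.sqrt(kept.shape[1])

def se_modelled(scores):
    """Per-day standard error after estimating and subtracting the warm-up effect."""
    day_mean = scores.mean(axis=1, keepdims=True)
    # Warm-up effect: average deviation from the day's mean at each trial position.
    warmup = (scores - day_mean).mean(axis=0, keepdims=True)
    resid = scores - day_mean - warmup
    # Simplification: ddof=1 ignores the degrees of freedom used by the warm-up estimate.
    return resid.std(axis=1, ddof=1) / np.sqrt(scores.shape[1])

# Simulated sessions (assumed values, not the real balance data):
# 40 days of 30 trials, with a warm-up that fades over the first trials.
rng = np.random.default_rng(0)
n_days, n_trials = 40, 30
trial = np.arange(n_trials)
warmup_curve = -1.0 * np.exp(-trial / 3.0)
day_level = rng.normal(5.0, 0.5, size=(n_days, 1))
scores = day_level + warmup_curve + rng.normal(0.0, 0.3, size=(n_days, n_trials))

median_se_drop10 = float(np.median(se_excluding(scores, 10)))
median_se_model = float(np.median(se_modelled(scores)))
print(median_se_drop10, median_se_model)

# The improvement quoted in the post, from the reported medians:
reduction = (0.073 - 0.059) / 0.073  # about 0.19, i.e. roughly 20%
```

The gain comes from using all 30 trials per day instead of 20. That only works if, as in my data, the early trials are no noisier than the later ones once the warm-up trend is removed.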

The Berkeley School Lunch Program: Correction

After I mentioned that the Berkeley lunch program was in poor shape, Ann Cooper, the chef in charge, invited me to visit — to set the record straight. It was quite an opportunity; the Berkeley lunch program, some hope, will become a model for the whole country. This is why there was a long New Yorker article recently about what Cooper is doing.

Chef Ann Cooper

Spending about $1/day more per student, Cooper has shifted the lunch menu far away from the heavily-processed and factory-made food of most school lunches. Far more of the food is cooked in the district kitchens, albeit days in advance in some cases. I took Cooper’s word for it that the students actually eat the new food. This is a great improvement, in my opinion. The big questions are whether these changes are sustainable and what effect they will have.

The single best thing you can do for your health is to eat healthy food (the exact nature of which has yet to be determined, but you get the idea). Obesity, diabetes, heart disease, cancer, stroke — all the big American health problems are made much worse by the crummy diet of most Americans. Will Cooper’s improved lunches cause her lucky diners to eat better as adults? If so, $1/day is a great bargain compared to health care costs. (She estimated these changes will cost $2/day across the country.) Will Cooper’s improvements reduce obesity and diabetes? That is obviously the hope.

I wouldn’t say the Berkeley school lunch program is in trouble or in poor shape; I would say it is in limbo. Four things are big question marks:

1. Cooper seemed to be working very hard and not quite enjoying it. Even after a year on the job. This is not a good sign. Her salary is being paid by the Chez Panisse Foundation — not a good sign. She struck me as incredibly dedicated but how much failure and frustration can she and the Chez Panisse Foundation bear? This sort of thing is often much harder than anyone imagines in the beginning.

2. Obesity is a big big issue. Whether the new food will help is unknown. Cooper seems to take it on faith that her food will be less fattening. I am less sure. As anyone who has read The Shangri-La Diet knows, I believe that American food became really fattening not because it was processed or “unhealthy” but because of the increasing popularity of foods that tasted exactly the same each time (e.g., microwave entrees). If she cooks the same recipes again and again, the hoped-for weight loss may not happen. If it doesn’t, will the program continue? Or will $1/day be seen as better spent on something that hasn’t yet failed, such as more physical ed?

3. The effects of Cooper’s changes are going to be measured by UC Berkeley School of Public Health researchers. As far as I could make out, the comparison will be between Berkeley students and students in another school district. You have n = 1 (1 school district) in the experimental group and n = 1 (1 school district) in the control group. This is better than nothing but, given the importance of the question — can better school lunches improve health for the rest of a student’s life — and our great ignorance as to its answer, it is scary bad. It will be so easy to reach the wrong answer. Researchers with this sort of design often act as if they have hundreds of subjects in each group — each student is treated as a different and randomly-assigned subject. This isn’t just false, it’s misleading.

4. While I am sure the researchers can measure obesity, I am less sure they will do a decent job of measuring changes in attitudes toward food. It is not a typical public health question.
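To put a number on point 3, here is a small sketch of the standard design-effect formula for clustered samples, DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. The enrollment figure and the ICC value are assumptions chosen only for illustration. With one district per group, the ICC cannot even be estimated from the study's own data.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent subjects that would give the same precision."""
    return n_total / design_effect(cluster_size, icc)

# Hypothetical: 3000 students in a single district, ICC of 0.02 (both assumed).
students = 3000
assumed_icc = 0.02

deff = design_effect(students, assumed_icc)
ess = effective_sample_size(students, students, assumed_icc)
print(deff, ess)
```

Under these assumed numbers, thousands of students carry about as much information as a few dozen independent subjects. That is why treating each student as a separately assigned subject overstates the certainty of the result.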

Chef Ann Cooper at work

I am very optimistic about the future of food — and therefore health — in America, but it’s because of (a) the Food Network, (b) the growth of farmers’ markets, and (c) the success of Whole Foods and similar stores. Not to mention Rachael Ray. Americans are becoming food connoisseurs, starting to catch up with a large chunk of the rest of the world, such as China. The American increase in connoisseurship is trickle-down — from rich people to everyone else. Like cell phones, like TVs, like literacy, like many things. Whereas Ann Cooper is working in a school district that has lots of poor people. Not a good place to start this sort of revolution.

Addendum: This article in New York magazine reminded me that Ann Cooper’s previous job was at an expensive private school. So maybe it is another case of trickle-down after all.

Is Sugar Fattening?

In 1987, Dr. Israel Ramirez, a researcher at the Monell Chemical Senses Center, whose research led to the theory behind the Shangri-La Diet, questioned the prevailing assumption that sugar causes obesity in humans. Rat experiments did not support such a simple idea, he pointed out.

The most recent issue of the American Journal of Clinical Nutrition has a review article that agrees with Ramirez (but, alas, does not cite him). Now there is clinical evidence that Ramirez was right. From the abstract:

Numerous clinical studies have shown that sugar-containing liquids, when consumed in place of usual meals, can lead to a significant and sustained weight loss

Maybe the Shangri-La Diet isn’t so crazy.

How To Do Experiments That Generate Ideas

A few days ago a graduate student in economics asked me what I thought of behavioral economics. On the positive side, I said, some of the phenomena are impressive. For example, the endowment effect, which is so strong I would demonstrate it in class. On the negative side, none of the researchers use experiments to generate ideas. They don’t merely not do it; they seem unaware of the possibility of doing it. The graduate student wondered how it can be done. I said there were three main ways:

1. Do something extra. Do a little more than necessary so that your experiment tells you about something that isn’t the focus of interest. For example, vary a factor that you think is not important. This is Saul Sternberg’s idea. I did this in my peak-procedure experiments: measured how long rats held down the bar. This was irrelevant to the purpose of the experiments, which was to understand how rats measured time. These measurements greatly surprised me. For years, I misunderstood them. Eventually they led to a new line of research about the control of variability.

2. Measure a function, not a point. Ask how your treatment changes a whole function, not just this or that numerical measure. This is what I did in my peak procedure experiments: The experiments generated for every condition an entire function showing response rate as a function of time. I saw how treatments changed the entire function. This talk describes some of the new ideas this led to.

3. Make your experiment easy and fast. The easier and faster it is, the more you can do it in lots of variations. Our ignorance of behavior being great, some fraction of these are likely to generate unexpected, and therefore inspiring, results. This is one reason self-experimentation is good for generating ideas: It is easy and fast.

I am not aware of any other written answers to this question, strangely enough.

How Good is the Atkins Diet?

A new study, just published in JAMA, compares several popular diets: Atkins, Zone, Ornish, and LEARN (a conventional-wisdom-type diet based on “national guidelines,” according to the paper). The Atkins diet did much better than the other three. The results were quite a bit more positive for Atkins than an earlier comparative study where compliance was poor, weight loss was minimal, and no diet was clearly better than the rest. The Atkins Company, not surprisingly, is pleased with the new study; they have put it in their research library.

Here is what the researchers concluded from their data: “A low-carbohydrate, high-protein, high-fat diet may be considered a feasible alternative recommendation for weight loss” (from the abstract — the meaning of “alternative” is not explained).

However, a graph in the paper (Figure 2 for those of you with access) makes a very important point that the researchers don’t mention: Persons on the Atkins diet weighed more after 12 months on the diet than after 6 months. After 6 months, in other words, the lost weight was coming back. The regain is not small: From Month 6 to Month 12 the Atkins dieters regained about one-quarter of the weight they had lost. At the end of the study (Month 12), they had lost about 10 pounds.

My interpretation is that the Atkins Diet works for two reasons: 1. The food is new. The flavors of the new food are not yet associated with calories. The novelty wears off, of course. This is why some of the lost weight was regained. 2. High-glycemic-index foods (such as bread and potatoes) are eliminated. This produces permanent weight loss, but not a lot. When I started to eat low-glycemic-index foods I lost 6 pounds, which I never regained. A 6-pound loss is not terribly different from the 10 pounds (average) lost by study participants.

In a newspaper article, the study’s lead author mentioned the regain:

As the study progressed, [Christopher Gardner, an assistant professor of medicine at the Stanford Prevention Research Center] said, some dieters put back on some of the weight they had lost early in the year.

That’s misleading. It wasn’t “some dieters” — it was a trend shown by the whole group. But at least he (kinda) mentioned it.

Jane Jacobs on College

Jane Jacobs, the urban and economic theorist, wrote:

Only in stagnant economies does work stay docilely within given categories. And wherever it is forced to stay within prearranged categories — whether by zoning, by economic planning, or by guilds, associations or unions — the process of adding new work to old can occur little if at all.

In the case of college, the “work” is post-high-school education. College students are not forced to join a union but the need for credentials forces them to attend college, where, as Jacobs correctly predicts, a narrow range of subjects is taught in a narrow range of ways. Take my department (psychology at UC Berkeley). As one of my students, a psychology major, asked, why isn’t there a course about relationships? That’s what’s really important, he said. Yes, why not? There has never been such a course at Berkeley nor, to my knowledge, at any other elite university. What a curious omission. And why do practically all classes involve lectures, reading assignments, and tests? Aren’t there a thousand ways to teach and learn? I think Jacobs has the answer: Work has been forced to stay within prearranged categories — categories that seem increasingly outdated. The pattern of chapters in almost all introductory psychology textbooks (which cost about $100) derives from the 1950s!

An earlier post by me about college. Other people’s comments. Jane Jacobs on the food industry and scientific method.

Andrew Gelman Interviews Me About TV and Mood

Andrew did this interview for Stay Free!, a magazine about media and consumerism, in 2000. They didn’t publish it.

AG Why don’t you start by describing your method of using TV watching to cure depression?

SR To feel better, you watch faces on TV in the morning and avoid faces (televised and real) at night. TV faces are beneficial in the morning and harmful at night only if they resemble what you would see during an ordinary conversation. The TV faces must be looking at the camera (both eyes visible) and close to life-size. (My experiments usually use a 27-inch TV.) Your eyes should be about three feet from the screen. Time of day is critical–if you see the TV faces too early or late they will have no effect. The ave contact with other people has a big effect on when we are awake; and (c) there are many connections between depression and circadian rhythms. Depression is closely connected with insomnia, for instance.

AG I generally think of TV as an evil, addictive presence in American life. Do you think there’s something dangerous about giving TV this “badge of approval” as a medical treatment?

SR It’s not quite a “badge of approval.” Seeing faces on TV at night–which of course is when most people watch–is harmful, my research suggests, if the faces are close to life-size. And they often are. Maybe TVs will be made with variable picture sizes–one size for morning, another size for night. When I watch TV at night (very rare), I stay as far away as possible.

AG I mean, if this method really worked, I could imagine the Depression Network running talk shows in the morning that are basically infomercials for Prozac or whatever. Would you worry about that?

SR No. I watch faces on TV every morning and would appreciate more choice. I suspect the morning shows would not be Prozac infomercials, however, because the people watching would not be depressed.

AG One thing that bothers people about your plan is the idea of TV as a substitute for human contact. I think that most of us–even people who spend a lot of time watching TV–find this idea upsetting. It’s like “Brave New World” and virtual reality. Are you at all bothered by recommending to depressed people that they sit inside watching TV?

SR “Substitute for human contact”? True, but why is that so bad? Reading–which TV critics, many of them writers, seem invariably to like–is also a substitute for human contact, of course. Agriculture is a substitute for hunting and foraging. Vitamin pills substitute for food. Civilization is all about substitutes–about being able to fulfill needs in many ways.

Still, I think watching faces on TV in the morning is only a partial solution to the problem of depression, just as nutritional supplements (e.g., iodized salt, folate added to flour) are only partial solutions to the problems caused by a poor diet. A fuller solution would include changing when most people work. The usual pattern is work (morning and afternoon) then socialize (evening). A better pattern would be socialize (early morning) then work (late morning to early evening)–and go to bed early. I do my little bit for the revolution by inviting friends to brunch rather than dinner. The revolution would also include picture phones with life-size faces.

AG I heard you say once that depression is ten times as common now as it was 100 years ago. Where do you get that information from?

SR Many articles have made that point. One of them is: Klerman, G. L., & Weissman, M. M. (1989). Increasing rates of depression. Journal of the American Medical Association, 261, 2229-2235.

AG If depression is a consequence of modern life, do you think there’s something strange about seeking a technological solution for it? It’s sort of like saying, people are too atomized, so let’s solve the problem with even more solitude?

SR It is one of many technological solutions to problems caused by “atomization”–people being farther apart. Telephones, air travel, and email are other examples. So it isn’t strange. If my subjects are any guide, watching TV for an hour every morning would not increase the solitude of most depressed persons. They are already alone during that time.

AG Would listening to the radio be OK?

SR No. You have to see faces.

AG Have you ever tried to get your research sponsored by TV stations or networks or, for that matter, a publication like TV Guide?

SR No, but I once put a “TV is good” ad (ABC) on my bulletin board.

Andrew Gelman on Web Trials and the Shangri-La Diet

Andrew Gelman is a professor of statistics at Columbia University. Years ago we co-taught a seminar about left-handedness. His blog. This interview took place via instant messaging in February, 2007 and has been edited slightly.

SR I want to ask your opinion of web trials. People go to a website where they choose or are randomly assigned a treatment. Then they come back and report the results.

AG Then the records of their choices and outcomes are made publicly available.

SR Yes. And there would probably be some summary of the results prepared by experts. It wouldn’t be just raw data.

COMPARING TO CURRENT STATE OF THE ART IN MEDICAL RESEARCH

AG We could compare to the current state of the art in medical research, which I think is to have some moderately large randomized clinical trials, each of which is published in a journal, followed by a meta-analysis of these trials. A difficulty with the current state of the art is that sample sizes in clinical trials seem to be simultaneously too small and too large. Too small in that results tend to be just barely statistically significant (and often not significant for subgroups), so that you can’t really put your faith in one study, hence the need for meta-analysis. Too large in that each study is unwieldy, takes a huge amount of effort and doesn’t allow for much learning and experimentation during the study.

SR A famous epidemiologist [Richard Doll] once said that if the effect is strong, you don’t need a big study.

AG In some way the high cost is a good barrier in that people have to think seriously and justify what they want to do. On the other hand, within any particular research plan, it would seem to limit the possibility for innovation.

Speaking generally, a challenge is to integrate clinical judgment (including ideas of experimentation and trying different things with different patients) with scientific goals such as replicability.

Also, there are well-known cognitive illusions in clinical judgment, which is what motivates the evidence-based-medicine movement (for randomized trials, public records of data, etc.) in the first place.

SR How do web trials fit into the picture you have drawn?

AG Ideally, web trials are intermediate between controlled randomized trials on one hand, and full recording of observational data on the other. If people are really volunteering to be randomized, and then they follow the protocol, then this is a clean randomized expt (albeit not blinded, an issue I’d like to raise with you). In practice there will be lots of selection, dropout, measurement error, etc., which moves it toward an observational study. The dispersed nature of the data collection is similar to (in fact, more dispersed than) the idea of individual clinicians recording their experiences and outcomes into a centralized database. That is, the data collection is dispersed, the database is centralized.

SR A web trial would have more regularity — less variation — across subjects than observations collected from individual doctors. Because everyone would get the same instructions. Whereas different doctors are obviously going to give different instructions (for the same nominal treatment).

AG Yes. That’s why I said the web trial is in between.

DIFFICULTIES WITH BLINDING

SR In the area of blinding I think a web trial would be better than the conventional double-blind clinical trial. If the goal is to guide practice. In practice patients are not blinded. Blinding is a tool to equate expectations. Better to equate expectations by comparing different treatments both believed to be effective.

AG One of the difficulties with your self-experimentation is that there’s no blinding at all. Similarly with these trials. Some of it is the nature of your treatments, but perhaps with some effort you could come up with blinded versions.

SR In my self-experimentation the expectations are equal in the different conditions, in many cases.

AG For example, consider the recent self-experiment that you describe on your blog, where you try different oils and measure your balance. I’d believe these results a lot more if you blinded the treatments.

SR Sure, blinding would help in that case, I agree. I plan to do something like that. But blinding is not necessary to equate expectations. For example, I tried many ways of losing weight. In every case I expected it to work. Some ways worked much better than others. It is this comparison of the effects of different treatments that is interesting. In general expectations cannot be very powerful or there would be no problems left to solve. Expectations are powerful in a few areas and seem to have no effect in many areas. I don’t mean we should ignore them; but to emphasize them as a big deal is not what the evidence suggests. In any case in web trials the participants would only be randomized to (or choose) treatments they thought might work.

AG There’s some work by Rubin and other statisticians on “broken randomized trials” which can more generally be thought of as experiments that have partial randomization.

SR I think of web trials as giving “entrants” (or subjects) a choice: to be or not to be randomized. Then when it’s all over you compare the two groups.

AG That makes sense. You’ll still have some problems: 1. People not following protocol. 2. Non-blindness of treatments. 3. Other problems, I’m sure, which I can’t think of offhand.

SR Well, these are equal for all conditions so they shouldn’t distort anything.

AG In a controlled trial you can deal with some of these things: 1. In a controlled trial you can have more interactions with the experimental subjects, thus maybe more likely they’ll follow protocol. 2. In a controlled trial you can (sometimes) ensure blindness. In general, I don’t think you can get away with assuming that biases cancel out.

ANALYZING DATA FROM WEB TRIALS

AG Your web trials should give us a big juicy source of data that can be thrown at a stat Ph.D. student as a thesis project, perhaps! My intuition as an amateur sociologist of applied statistics is that an exemplary applied analysis is a good way to kick-start the study of a statistical problem.

SR What’s an example of such a kick-start? That’s an interesting point.

AG I’m thinking of the hierarchical models that were fit by Lindley, Novick, Rubin, and others in the late 1960s thru early 1980s to educational data. These provided examples for people to follow–templates–as well as demonstrations that these methods really worked. There were various interesting discussions of these models in the stat literature, in particular I’m thinking of a paper by Rubin on law school validity studies in J. Amer. Stat. Assoc. from 1980 that had several discussants.

SR Yes, it is true that the data from web trials would be complex and interesting in new ways and accessible to everyone.

AG Yes, having available data is another plus–that’s really a new feature which should help. Now back to the warnings. A very well known example is the Nurses’ Health Study, an observational study that found that taking postmenopausal drugs was associated with lower heart-attack risks (and lower death rates). But when a big randomized expt was done, no association was found. Actually, taking the drugs slightly increased cancer risk, I believe. See here.

I talked with various people about this, and there are different potential explanations for the discrepancies. One story is that the women who took the drugs were otherwise healthier, more health conscious, etc.–even after controlling for whatever pre-treatment variables they controlled for. Another story is that the populations of the 2 studies were different (in particular, in their average ages), and perhaps the drugs are beneficial for some ages but not others. (Incidentally, the drugs were not originally intended to reduce heart-attack risk. This was an unexpected effect (or non-effect), I believe.)

Anyway, the people I trust on these matters (notably John Carlin) believe that the difference is because of “selection”, i.e., the drugs don’t really reduce heart attack risk. But the observational study led people to recommend the drugs. So this is a big example where the obs study was misleading.

SR Did the randomized study conclusively rule out the effect size seen in the correlational study? Or did it simply find no effect?

AG I’m not sure. My impression is that the expt actually contradicted the obs study–a stat signif negative effect for one, and a stat signif positive effect for the other–not just that there was significance for the expt and no signif for the obs study–but I never really looked into it.

SR I’d like to return to the issue of blind vs don’t blind. You believe any experiment where subjects are not blind to the treatment has a problem?

AG Yes, if knowledge of the treatment could affect the outcome (for example, through motivation). I worry about it for your diet and depression studies.

SR Well, in much research the first question is whether there is a useful effect. Later experiments deal with mechanism. I was under the impression that what matters is to equate expectations across conditions and that blinding is just one way to do this.

AG Maybe you’re right, I’m not actually up on this literature. I know that Paul Rosenbaum has written about it.

MORE ON BLINDNESS: CONSIDERING THE SHANGRI-LA DIET

AG My knowledge of it is not particularly sophisticated. For your diet and depression studies, there are obvious stories based on motivation.

I wouldn’t go so far as some people and simply dismiss your results. But the concerns are natural, I think. It’s a little different than the problem with the Nurses study. Here I’m worried about motivation, there the issue was selection.

Although there’s a possible selection problem in your study too, in that the people (including you) doing the Shangri-La Diet might be those who are ready to try something new and lose weight.

SR There are a lot of people who are always ready to try something new and lose weight.

AG Again, this could be tested with a blinded study. For example, half the people get the oil apart from a meal, half get the oil with the meal. Not that this would solve all problems of interpretation. . . .

For example, Caroline thinks that your diet works, but that the reason why it works is that it stops people from snacking for a 2-hour period (before and after the oil) and also focuses people on their snacking.

SR If anyone thinks that — and it is a perfectly reasonable thing to think if you are just starting to learn about it — then they can replace the oil with water and see if they continue to lose.

AG To answer your comment (“there are a lot of people who are always ready to try something new and lose weight”): yes, I remember you saying this before, and this is a big reason I wouldn’t dismiss your results immediately. But, still, people willing to try this wacky new thing might be special (on average). To put it another way, I expect there were similar successes with people trying Scarsdale, Atkins, etc.

SR I’m sure that people who try my diet are unusual early adopter types. I think Atkins has some truth to it — some reasons it would actually work. I don’t know enough about Scarsdale to comment. My theory says that merely changing what you eat (to foods with unfamiliar or at least less familiar flavors) should lower your set point.

AG Sure, but you had another point which was that these were people for whom nothing worked before. I was just using these diets as examples of other things that worked when nothing worked before. It relates to the historical perspective of new diets as things that will work for a few years before burning out. Possibly because the new diets can motivate people.

SR I tend to think they burn out because the new food becomes familiar.

AG I’m not saying that this is necessarily true of your diet–yours might be different–I’m just giving a historical control to give insight as to how there could really be motivational issues.

SR That’s true, research to distinguish my explanation of the burn out and a motivational one could be done but of course hasn’t been.

AG Your story, “they burn out because the new food becomes familiar”, is plausible. It’s also plausible that it’s easier to motivate yourself with a plan that’s new and different.

SR I hope there will be studies of whether the theory behind my diet is correct. These would essentially be studies that test the prediction that familiarity matters. This is a prediction that other theories do not make.

AG Yeah, based on reading the appendix to your book, there’s still some research synthesis that needs to be done (presumably with the help of animal studies).

SR I agree.

BACK TO WEB TRIALS

SR Web trials are relatively early in the research chain and they are relatively practical. In these cases you don’t worry a lot about mechanism, you worry much more about efficacy — is there an effect?

AG Regarding the analysis of web trials, it would be interesting to look at other examples of partially randomized experiments. Rubin and Hill and others worked on a study of school choice where they looked into some of these issues. It was a study that randomized some aspects of which kids went to which schools, but parents had some choices too.

In medicine and also in economics/public-policy, there has been a lot of interest in recent years in trying to get inside this sort of study rather than just relying on the “intent to treat” or explicit randomization.

SR “get inside this sort of study”–what do you mean?

AG I mean, look at what treatments are actually chosen by the individuals in the study, not just looking at what treatments they were assigned to.
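To make the distinction concrete, here is a minimal sketch in Python of the two comparisons AG describes. The data are invented for illustration, and the names (`records`, `difference_in_means`) are hypothetical, not taken from any study mentioned here. The "intent to treat" estimate groups people by what they were assigned; the "as treated" estimate groups them by what they actually did.

```python
from statistics import mean

# Hypothetical records, made up for illustration:
# (assigned_to_treatment, actually_took_treatment, outcome)
records = [
    (True, True, 5.0),
    (True, True, 4.0),
    (True, False, 1.0),   # assigned to treatment but did not take it
    (True, False, 0.0),
    (False, False, 1.0),
    (False, False, 0.0),
    (False, True, 4.0),   # assigned to control but chose the treatment
    (False, False, 1.0),
]

def difference_in_means(records, group_index):
    """Mean outcome where the flag at group_index is True, minus the mean where it is False."""
    treated = [r[2] for r in records if r[group_index]]
    control = [r[2] for r in records if not r[group_index]]
    return mean(treated) - mean(control)

# Intent to treat: compare by assignment (index 0).
itt_estimate = difference_in_means(records, 0)

# As treated: compare by what people actually chose (index 1).
as_treated_estimate = difference_in_means(records, 1)

print("intent to treat:", itt_estimate)
print("as treated:", as_treated_estimate)
```

The as-treated comparison uses more of what happened, but it is no longer protected by the randomization: people who choose the treatment may differ from those who do not. That is the difficulty with getting inside a partially randomized study.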

SR Could you sum up why you like the idea of web trials?

AG 1. Lots of data. 2. Motivates people to randomize, to apply the treatment, and to record results. 3. More generally, gets people involved in the project as participants, not just “subjects”.

SR Those are good points, thanks.

AG Thank you for giving me the opportunity to think about these things. I’m still struggling with the question, “Are medical experiments too small or too big (in number of subjects)?”. As discussed here.

The Trouble With College

Yesterday I heard something — a very ordinary bit of info — that neatly summed up the trouble with college. Someone told me about a friend of hers who was a graduate student in English at Berkeley. Her friend taught a small class of freshmen and sophomores. He was enthusiastic about what he was teaching, but his students were not. He couldn’t make them enthusiastic, even a little. They just sat there. When I started teaching at Berkeley, I had a similar experience. My first class was introductory psychology. Over the first few months, I came to see that my students, almost all of them, had different interests than me. I thought X and Y were fascinating; they didn’t.

No one is at fault here, of course. It’s perfectly okay that the grad student enthused about something that leaves his students cold. It is perfectly okay that I liked Research X and Y but Research X and Y bored my students. Nothing wrong with any of this — in fact, we need diversity of thought and knowledge, which grows from diversity of interests. We need diversity of thought and knowledge because we have many different problems to solve.

At fault is a system (Berkeley and similar colleges) that fails to value that diversity. (In fact, it doesn’t even notice the diversity, except in a one-dimensional way: how much students resemble their professor.) Even worse, the system tries to reduce diversity of thought because it tries to make students think like their professors. Why should the 20 (or 800) students in one class be forced to learn the same material? The students vary greatly. Forcing all of them to learn exactly the same stuff is like forcing all of them to wear exactly the same clothes. It can be done, especially if rewards and punishments (i.e., grades) are used, but it’s unwise. Just as feeding children a poor diet stunts physical growth, forcing college students to imitate their professors, instead of letting them (or even better, helping them) grow in all directions, stunts intellectual growth.

I wrote about these issues here and gave a related talk about human evolution. Aaron Swartz and I have ideas about a better way, and how to get there, which I will blog about. I will tell a 10-minute story about this as part of the Porchlight story-telling series on March 26 (Monday), 8:00 pm, Cafe du Nord, 2170 Market Street, San Francisco ($12 admission).