Web Trials Update

At the Shangri-La Diet forums, SLDers — more than a hundred of them — have been posting their weight for many months, thanks to Rey Arbolay. No similar data is available for any other weight-loss method, as far as I know.

The main weakness of the SLD data is lack of comparison. This led me to propose web trials — a hybrid of the SLD data collection and a clinical trial, where there is always a comparison (at least two treatments, or treatment and control). After I interviewed Robin Hanson about them, a British student programmer named Andrew Sidwell contacted me and offered to set up a website to allow web trials to be done.

How exciting! A website that does web trials will allow cheap, easy testing of many solutions to many problems. Although clinical trials usually involve medical problems, web trials can be used to study anything, as Robin pointed out. Andrew and I plan to start with procrastination.

Saul Sternberg on Research Design

No one has had more effect on how human experimental psychology is done than Saul Sternberg, a retired professor of psychology at the University of Pennsylvania. In the 1960s he introduced a memory-scanning task in which the subject indicates as quickly as possible whether a probe item (usually a digit) is on a short memorized list. The main measure was reaction time (RT), the time between when the subject saw the item and his response (“yes” or “no”). The linearity of the RT-vs-set-size function and the equality of the “yes” and “no” slopes suggested that search of the memorized list was serial and exhaustive (exhaustive meaning that the whole list was searched even when the target was found before the end). Before this heavily-cited work (published in Science and American Scientist), RT experiments were rare; after it, they were common.

SR: Where did your first ideas about research design come from?

SS: While I was a graduate student [(at Harvard in the 1950s)] I read William James’s Principles of Psychology, which increased the curiosity I had developed about mental processes from my introspections. Although I wasn’t working on those questions when I was a grad student, I became interested in them. I wrote down ideas for experiments: experiments on short-term memory, for example. Experiments to answer questions about things I observed about my mental processes while engaged in writing and reading and other everyday activities. At the time I was doing theoretical work on learning models that apply to both animals and people. [Stochastic Models for Learning by Bush, Sternberg’s advisor, and Mosteller was published in 1955.] I didn’t collect much data as a graduate student, after my interests turned to learning models. Earlier I had collected data in social psychology. I recall putting a lot of effort into an experiment on small group interactions. What was novel was that we were recording interaction events in real time by punching IBM cards.

I learned how to be an experimental psychologist during my first teaching appointment that started in 1960 at Penn, when I co-taught a laboratory course on experimental psychology. I taught it with Bob Teghtsoonian and Jack Nachmias, who became my teachers. In those days, many students took a lab course after Psych. 1. The course required us to develop experiments for undergraduates to do. Questions that arose in those experiments went beyond available knowledge. And these questions led to some actual research. In 1962 Bob Teghtsoonian and I gave a paper at EPA on all-or-none versus gradual learning of response components, in which we reported tests of two models. And in 1963 Jack Nachmias and I gave a paper at the Psychonomic Society in which we reported our application of signal detection theory to data on recognition memory that we had collected.

SR: You started your memory-scanning experiments after that course?

SS: That’s true. I started them during my second year at Penn. I was still working on learning models. I was also supervising a graduate student whose research had to do with short-term memory. Not RT experiments, however. What got me interested in RT experiments was work by Ulric Neisser measuring search times. He measured visual search times as a function of number of targets for which you search. [Neisser’s subjects searched a visual display — e.g., of digits or letters — for the presence of one or more digits or letters. The main measure was how long it took to search the whole display.] It’s like a crude RT experiment. You’re not measuring how long it takes to make one decision, you’re measuring the time to make many decisions. I was skeptical about his conclusions so I thought it would be worth measuring how long it took to make a decision about one visual item. That led to the memory-scanning experiments. I did several of them, to help choose among alternative interpretations, before I reported the results. I gave a paper on those experiments at the meeting of the Psychonomic Society during the summer of 1963.

Previous post in this series: Brian Wansink on research design.

Science in Action: Omega-3 (background)

The omega-3 story began with the circulatory system. In the 1960s, two Danish scientists wondered why Eskimos rarely die of heart disease. Could the answer explain the sharp decrease in heart disease mortality in Norway during World War II? In spite of this promising beginning, the heart and mortality benefits are still not clear. A 2006 meta-analysis of heart disease studies concluded that “omega 3 fats do not have a clear effect on total mortality, combined cardiovascular events, or cancer.”

You can find lots of recommendations to consume omega-3 fats in various forms: fish, supplements, and so on. On the other side, Marion Nestle, the author of What To Eat, seems to believe the advantages claimed for omega-3 are “hype.” Most researchers are less certain. From a recent New York Times article about Martek, a company that makes an omega-3 food supplement:

“A lot of the claims made for DHA [a form of omega-3] are in the realm of hypotheses,” said David Schardt, senior nutritionist at the Center for Science in the Public Interest, an advocacy organization based in Washington. “They are certainly worth pursuing, but there’s not yet enough proof to warrant telling people to go out of their way to take DHA.”

The exceptions, Mr. Schardt said, are people with a history of heart disease and premature infants, who need an extra boost of DHA for proper brain and eye development to compensate for their early exit from the womb.

Martek’s scientists, when pressed, generally agreed with Mr. Schardt. The data showing any health benefits of DHA beyond those related to the heart or premature infants, while encouraging, is not quite conclusive, they say.

The typical experimental study of omega-3 takes two groups of people with a pre-existing problem, gives one group omega-3 and the other group a placebo, and measures outcomes several months later. A 2005 study in Pediatrics, for example, compared two groups of children (n = about 60/group) with Developmental Coordination Disorder. Most of them had ADHD. One group was given an omega-3 supplement; the other group was given a placebo. The children were tested before treatment and after three months of treatment. (The reading, spelling, and behavior scores of children in the supplement group improved more than the scores of children in the placebo group.) Studies like this are hard to do.

In summary, there is considerable uncertainty about the effects of omega-3; and the methods used to reduce that uncertainty are slow and difficult. This is why self-experimentation might help.

My recent data. The Queen of Fats (2006) by Susan Allport, a science writer, is an excellent introduction to the subject.

Science in Action: Omega-3 (balance results)

Because many SLD dieters reported better sleep, I wondered if omega-3 improved sleep. I increased my omega-3 intake by switching from olive oil, which has little omega-3, to walnut oil and flaxseed oil, which have much more — especially flaxseed oil. The amount of oil stayed roughly the same. The night after the change, my sleep got better. To my surprise, so did my balance. The next morning, I found I could more easily put on my shoes while standing up. I had been putting on my shoes standing up for 2-3 years and it had never been this easy. (I put on my shoes standing up because I thought it might improve my balance.)

I devised a simple measure of balancing ability. I stood on one foot on a platform balanced on a small metal cylinder (a pipe plug). (I will post pictures.) The parts were easy to find. I tried cylinders of different sizes until the balancing was neither too easy nor too hard. The measure was how long I could stand on one foot on the platform, which I measured with a stopwatch. I made these measurements in blocks of 20 (the first 5 were warmup, leaving 15).

My early attempts had two problems: (1) The dose was too low. I had been taking the flaxseed oil as capsules (10 1000-mg capsules/day). I started taking 1 T/day in liquid form (much faster). Then I increased the amount of flaxseed oil/day from 1 T to 2 T. My sleep improved: I woke up more rested. Because the sleep effect was now perfectly clear, I thought measuring the effect on my balance would be a good idea. (2) Practice effects were too large. How well I could balance depended on how often I measured my balance. To avoid practice effects, I measured my balance no more than once/day.

I did a baseline period of several days; then I replaced the walnut oil and flaxseed oil with the same volume of sesame oil, which is low in omega-3. I continued this period until the effects seemed beyond doubt. Then I did another baseline period with the original amounts of walnut and flaxseed oil.
[Figure: Effect of Type of Fat on My Balance]
Here are the balance results. Each point is a geometric mean over 15 trials. The bars are standard errors. After one day, my balance got worse with sesame oil. When I returned to the high-omega-3 oils, my balance returned to its baseline level. To measure the clarity of the effect, I compared the 17 baseline days with the last 4 sesame-oil days. This gave t (19) = 4.1. A very clear effect.
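
For readers who want to try the same kind of analysis, here is a minimal Python sketch of the two calculations described above: a geometric mean over the 15 non-warmup trials in a block, and a two-sample t statistic comparing baseline days with sesame-oil days. It assumes a pooled-variance t-test, which is consistent with 17 + 4 - 2 = 19 degrees of freedom, but the exact procedure behind the graph is not stated here. The function names (daily_score, pooled_t) and all the numbers are hypothetical and are not the measurements shown in the graph.

```python
import math
import statistics


def daily_score(times_s, warmup=5):
    """Geometric mean of one block of balance times (seconds), skipping warmup trials."""
    return statistics.geometric_mean(times_s[warmup:])


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance. Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


# Invented daily scores, for illustration only (NOT the measurements in the graph).
baseline_days = [10.2, 9.8, 11.0, 10.5, 9.9, 10.7, 10.1, 10.4, 9.6,
                 10.9, 10.3, 10.0, 10.6, 9.7, 10.8, 10.2, 10.5]  # 17 days
sesame_days = [8.1, 7.9, 8.4, 8.0]                                # 4 days

t, df = pooled_t(baseline_days, sesame_days)
print(f"t({df}) = {t:.1f}")
```

With 17 days in one group and 4 in the other, the degrees of freedom come out to 19. The geometric mean is a reasonable choice for durations because one unusually long balance does not dominate the day's score the way it would with an arithmetic mean.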

I made this graph in a cafe. The person sitting next to me asked what I was working on. I showed her the graph. I explained that I measured my balance as a way of measuring how well my brain was working. The results suggested that the type of fat in my diet affected how well my brain worked. She said the results were very interesting because most people will have diets closer to sesame oil than walnut oil and flaxseed oil. Many people will be interested in these results, she said. I hope so, I said.

I will post later on the background of these results, the questions they raise, and procedural details. If you can’t wait, read the posts in the omega-3 category. If you are interested in doing a similar experiment, please let me know.

The Benefits of Theory: Crazy Spicing and B. F. Skinner

Someone has written me that she is doing well with the Shangri-La Diet by doing only crazy-spicing — adding random spices to everything. She’s not doing anything else — no oil, no sugar water, etc. My reaction is: Take that, B. F. Skinner!

In 1950, Skinner published a paper called “Are Theories of Learning Necessary?” which revealed that he did not understand the value of theories. In 1977, he wrote a similar paper called “Why I am not a Cognitive Psychologist,” which showed he still did not understand their value. In the later paper he wrote:

I am equally concerned with practical consequences. The appeal to cognitive states and processes is a diversion which could well be responsible for much of our failure to solve our problems.

The value of crazy spicing would never have been discovered without a theory. Without a theory, you’d never try it. It would never be discovered by accident.

Robin Hanson on Web Trials (my comments)

Yesterday I posted Robin Hanson’s comments on web trials. My comments on his comments:

1. I think Robin is right that it would be hard to get most people to allow themselves to be randomized. But I also think it doesn’t matter much. The important thing is to improve on existing methods of evaluation. Randomization of subjects to treatments isn’t an end in itself, of course. The goal is to reach the right answer: Learn which treatment works best. I think if you have what might be called a “level playing field” or a “fair comparison” (the various treatment alternatives are presented “equally” — e.g., as equally likely to work, equally attractive, equally high on a list) it will be hard to imagine how the results will be on average worse than nothing. The site can record data about each subject (age, sex, etc.) and the results can be analyzed using those factors — another way to equate subjects across treatments and to help each person decide what would be best for him or her.

2. Excellent point that web trials could be used for evaluation of any advice. Maybe it would be better to start with a non-health problem. Something where the effects are quick and easy to measure.

3. I like the Wikipedia comparison. All-to-all institutions — institutions that help connect everyone to everyone — are ancient and have been very important. Markets and money may have been the first. If I pay Sam $5 for X, and then Sam pays Peter $5 for Y, Peter and Sam have traded X and Y. Money has made this much easier. Democratic institutions allow everyone to govern everyone. Banks allow everyone to loan money to everyone. Books allow everyone to teach everyone. Wikipedia makes all-to-all teaching much easier. Web trials allow everyone to help everyone solve any problem where data would help. As Robin says, Wikipedia suggests that people will participate in all-to-all institutions when there is no obvious reward for doing so.

Robin Hanson on Web Trials

I recently asked Robin Hanson, a professor at George Mason University, what he thought of web trials. Web trials are a way to learn how to solve difficult health problems (e.g., acne, obesity). By web trial I mean a web-based collection of data that compares different ways of solving a problem. People with the problem would go to a website, sign up for one of the treatments, follow the directions, and report the results in a standardized format. For example, a site might compare three acne treatments (treatments that anyone can try, such as dietary changes or over-the-counter medicines). The cumulated results would gradually show which treatment works best — a thousand times more efficiently (sooner, cheaper, more easily) than a clinical trial (which no one would finance because there is no profit to be made). Web trials are halfway between clinical trials and the data collection now going on at the Shangri-La Diet (SLD) forums, where people post their progress on SLD.
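
To make the idea concrete, here is a rough Python sketch of the bookkeeping a web trial site might do: each participant submits a report in a standardized format, and the site tallies results by treatment. Everything in it is an assumption made for illustration, including the Report fields, the summarize function, and the notion of a single numeric "improvement" score. No such site exists yet, and a real one would also have to deal with dropouts and selection effects.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Report:
    """One participant's result, in a standardized format (hypothetical fields)."""
    participant: str
    treatment: str
    improvement: float  # higher means the problem got better


def summarize(reports):
    """Group reports by treatment; return {treatment: (number of reports, mean improvement)}."""
    by_treatment = defaultdict(list)
    for report in reports:
        by_treatment[report.treatment].append(report.improvement)
    return {name: (len(scores), mean(scores)) for name, scores in by_treatment.items()}


# Made-up reports for an imaginary acne trial with two treatments.
reports = [
    Report("p1", "dietary change", 2.0),
    Report("p2", "dietary change", 4.0),
    Report("p3", "over-the-counter cream", 1.0),
]
print(summarize(reports))
```

As reports accumulate, the per-treatment counts and means are what would gradually show which treatment works best.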

I asked Robin because he has pioneered a similar improvement: Prediction markets are often far better than what they replace. And his core political affiliation is “I don’t know.”

Here is a summary of what he said.

1. A selection effect is a big concern. Do people wait to report back until after it works? There is always going to be the issue of sampling, selection bias for people who stay with it.

2. How could you get people to allow you (the website) to choose for them which treatment to do? That would be the hard thing. Perhaps the website could say: “would you like to see what our advice for you is?” At most you could get randomization for your advice.

3. It doesn’t have to be restricted to health problems. It could be used to test all sorts of advice. You could just get data about what happens when people do or don’t follow some advice — romantic advice, for example. Very rarely do we have randomization in choices. When we do, we call them natural experiments. In medicine, researchers have used practice variation (variation from one doctor to the next) to look at effectiveness.

4. Perhaps you could get people to commit to this the way they do to Wikipedia. The goal would be: Let’s understand humanity — a noble cause. Let’s be part of a grand project to do this.

Robin blogs at Overcoming Bias. Tomorrow I will comment on Robin’s comments.

Methodological Lessons from Self-Experimentation (part 4 of 4)

6. Curiosity helps — because it provides a wide range of knowledge. Pasteur made a similar point when he said luck favors “the prepared mind” by which he meant the well-stocked mind. To come up with my theory of weight control you needed to know both obesity research and animal learning because the theory is based on basic facts about weight control and basic facts about Pavlovian conditioning. I knew the weight control facts because I had taught introductory psychology and lectured on weight control. I knew the basic facts about Pavlovian conditioning because my graduate training was in animal learning. It was unusual to know both sets of facts. Few obesity researchers knew much about animal learning; few animal-learning researchers knew much about weight control. The same thing happened with my mood research: Facts that I had learned from teaching introductory psychology showed me that my findings made sense and were important. I had taught introductory psychology because I was curious about psychology.

These two examples (weight control, mood) surprised me. I may have heard this point made a few times but I didn’t know any examples. Since then, however, I have come across examples not involving me that make the same point. Luca Turin is a biophysicist who has come up with a far better explanation of how the nose works than any previous theory. His recent book The Secret of Scent tells the story. “In order to solve the structure/odor problem,” he wrote, “you need to know at least three things: (a) biology, (b) structure and (c) odor. Each of these three things taken individually is not difficult” (p. 166). The problem had gone unsolved because no one before Turin knew all three.

7. Publish in open-access journals. Because my long self-experimentation paper was published in an open-access journal, anybody could read it within minutes. My friend Andrew Gelman blogged about it, which caused Alex Tabarrok at Marginal Revolution to mention it. This brought it to the attention of Stephen Dubner, who with Steven Levitt wrote about it in their Freakonomics column in the New York Times. That led to a contract to write two books — one about weight loss, the other about self-experimentation in general. That anyone could download my paper made it spread much faster. In the old days, with photocopies and libraries and mailed reprints . . . no talk tonight.

A summing-up, if you want to figure something out via data collection: 1. Do something. Don’t give up before starting. 2. Keep doing something. Science is more drudgery than scientists usually say. 3. Be minimal. 4. Use scientific tools (e.g., graphs), but don’t listen to scientists who say don’t do X or Y. 5. Post your results.

Read Part 1, Part 2, and Part 3. You no longer need to register to comment. My talk Tuesday night (tomorrow Jan 9) 7:30 pm at PARC (Palo Alto) is open to the public.

Methodological Lessons from Self-Experimentation (part 3 of 4)

4. There are serious defects in the way science is usually done. I found a new and powerful way of losing weight, yet I’m an outsider to that area. Although obesity is a huge problem, and hundreds of millions of dollars go into obesity research every year, I was completely outside that group of people and resources. If science is being done properly, there should be a relation between input and output: the more input, the more output. That failed here. Professional obesity researchers, given vast input, failed to discover this; whereas I, given zero input, managed to do so. You might say this was a weird fluke except the same thing happened again with mood: I discovered a powerful way of changing mood, even though I was an outsider to the study of mood. Depression is a huge problem, and vast resources go into trying to do something about it.

What the serious defects are has no simple answer. After the next lesson learned I’ll try to explain what I think is wrong.

5. There are serious strengths in the way science is usually done. I relied heavily on conventional science and could never have gotten where I did without it. Ramirez and Cabanac did brilliant research. I say there are serious strengths in conventional science because I used conventional scientific methods and conclusions to find a new solution to a serious problem — obesity is a serious problem. I didn’t just use conventional scientific tools; I also used self-experimentation, which is unconventional. But self-experimentation alone wouldn’t have gotten very far, I’m sure. The turning point in my weight control research was reading a paper by Ramirez about rat experiments. Not only did I use a vast number of conclusions from conventional science, I also used conventional experimental designs and standard, common tools for data analysis, such as programs for plotting data.

To say that science is glorified common sense has a lot of truth to it. To say that science is a collection of methods to help us understand and control the world also has a lot of truth to it. But science is far more than a collection of tools; it is a whole community and culture, with beliefs as well as tools. Like any culture, many of its beliefs are based on faith.

Here is a story to illustrate what happens. It’s pure human nature. Suppose someone gave you a power saw. Your first thought is: Wow, I have a power saw. There are many things I can now do that I couldn’t do before. It seems like a pure benefit. No negatives. You learn how to use the power saw and you become better and better at using it. Eventually you start to make a living using the power saw — other people, who don’t have a power saw, pay you to saw stuff for them. You become a power-saw professional and, along with other professionals, you establish rules about how to use power saws. To save the public from bad power saw usage, you establish a licensing test to become a power-saw professional. Your view of yourself is: I know how to use a power saw. And if there is a problem to be solved, you try to solve it with your power saw — that’s what you know how to do best. All this makes perfect sense to you. Hundreds of professions have followed this path. What is hard for you to notice is that in certain ways you have become weaker — if a problem doesn’t call for power-saw usage, you are less likely to find the solution. Because you are too busy making a living using your power saw.

I hope my point is obvious. Budding scientists go to graduate school where they learn a bundle of specialized research methods that varies from one research area to the next. That is their power saw. After graduate school, they make a living using the techniques that they have learned. After graduate school, they are in better shape to make a living; but they are in worse shape to solve problems for which the techniques that they have learned are not appropriate. Conventional scientific methods could go part of the way toward finding the Shangri-La Diet; but they could not go all the way. Other techniques were needed — very simple ones, pre-power-saw. So conventional science never found it.

In Dark Age Ahead, Jane Jacobs gives another example of this. During a recent heat wave in Chicago, two nearby neighborhoods, similar in many ways, had very different death rates. A good explanation of the difference was provided by a graduate student in sociology, who used very simple, very low-cost methods. In contrast, a task force of scientists from the Centers for Disease Control, with vast resources and great methodological sophistication, failed to explain the difference. They were blinded by their expertise. They failed to see that their methods weren’t working.

Read Part 1 and Part 2.

Methodological Lessons from Self-Experimentation (part 2 of 4)

3. Be minimal. In other words, do the easiest, simplest thing that will tell you something important and new, that will provide significant progress. This I learned by failure. In the early days of my self-experimentation (and of my rat research, too) I constantly tried to do experiments that broke this rule, and again and again they failed to work. (Slow learner.) The more complex the design, the more untested assumptions it makes. And untested assumptions, at least mine, are often wrong. I’ve been good about following this rule in my own research in recent years, so to give an example of how it is broken I will describe someone else’s research. I sat in on a planning session for an experiment about asthma at a highly-ranked school of public health. The experiment was expensive: the grant to pay for it was many hundreds of thousands of dollars. There were to be about 50 families in the treatment group and 50 families in the control group. They had done some pilot work involving three families. They proposed to begin the full experiment. I suggested that they do a larger pilot experiment, maybe four families in each group. There were several professors and several more people with Ph.D.’s at the meeting. No one agreed with me. Several people explicitly disagreed: “I don’t think we need to do any more pilot work.” As it turned out, I was right. They began the full experiment and it failed miserably because recruitment turned out to be far more difficult than expected.

Almost all proposed research I hear about breaks this rule, which is fascinating in a train-wreck kind of way. I have never seen a book about research design that makes this point. As a result, I suspect that books about research design are often counter-productive: The student would have been better off if he or she hadn’t read them. The textbook teaches this or that complication to people who can barely do basic stuff. The poor student wastes time using complex designs that fail in cases where a simpler design would have succeeded.

Part 1 is here. Part 3 is here. You no longer need to register to comment.