Effect of Graphical Feedback on Productivity

After talking to Matthew Cornell a few months ago, I decided to try to measure how much time I worked. Measuring it might help me control it. I’d done this before but hadn’t gotten anywhere. Maybe this time . . .

I used R. It was easy to record when I worked. I work a while (e.g., 60 minutes), take a break (e.g., 30 minutes), go back to work, take another break, go back to work, take another break, and so on. The R programs I wrote recorded when each bout of work started and stopped. A typical day might have six bouts of work, interspersed with breaks. It was harder to write a program to display the data, so I collected data for about eight weeks before I looked at it.

The display program I eventually wrote showed “efficiency” (total time spent working that day/available time that day) as a function of time of day. Each bout of work generated two points on the graph: one when it started, one when it ended. For each point, the efficiency of the whole day up to that point was computed. For example, if a bout of work started at 10 am, the efficiency for that time was how much work I had done before 10 am divided by how much time I had available before 10 am. Time available was computed from 3 am or when I woke up, whichever was later — as amusing/horrifying as that might sound. Suppose I woke up at 5 am. At 10 am, then, I had had 5 hours available to work. Suppose I had only worked between 8 am and 9 am. Then total work up to that point = 1 hour and efficiency = 20% (= 1/5). So I plot a point at (10 am, 20%). Suppose I work for an hour. End point: 11 am. Total work up to that point: 2 hours. Efficiency: 33% (= 2/6). That’s a point at (11 am, 33%).
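The efficiency calculation is easy to get wrong at the start of the day, so here is a minimal Python sketch of it. The original programs were written in R and are not shown; the function name `efficiency_points` and the input format (a wake time plus a list of `(start, end)` datetimes on the same calendar day) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def efficiency_points(wake_time, bouts):
    """Return (time, efficiency) pairs: one at the start and one at the end of each bout.

    Efficiency = work done so far today / time available so far today.
    Available time starts at 3 am or at wake_time, whichever is later.
    Assumes bouts are (start, end) datetimes on the same calendar day as
    wake_time, sorted and non-overlapping.
    """
    three_am = wake_time.replace(hour=3, minute=0, second=0, microsecond=0)
    day_start = max(wake_time, three_am)
    points = []
    worked = timedelta(0)
    for start, end in bouts:
        for t, done in ((start, worked), (end, worked + (end - start))):
            available = t - day_start
            # Guard against division by zero if a bout starts exactly at day_start
            eff = done / available if available > timedelta(0) else 0.0
            points.append((t, eff))
        worked += end - start
    return points

# The worked example: wake at 5 am, work 8 to 9 am and 10 to 11 am
day = datetime(2010, 1, 4)
pts = efficiency_points(
    day.replace(hour=5),
    [(day.replace(hour=8), day.replace(hour=9)),
     (day.replace(hour=10), day.replace(hour=11))],
)
for t, eff in pts:
    print(t.strftime("%H:%M"), f"{eff:.0%}")
# prints 08:00 0%, 09:00 25%, 10:00 20%, 11:00 33% (one per line)
```

The last two points, (10 am, 20%) and (11 am, 33%), match the example in the text. Dividing one `timedelta` by another gives a plain float, which keeps the arithmetic free of unit conversions.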

Although I had collected the data to test an idea, I also thought it would be interesting to see how the current day compares to previous days. Was I doing better than usual? Worse than usual? To make this comparison I plotted the data from the current day as a line rather than as points, to make it stand out. I also made it a different color. I often ran the display program while working. It showed the results up to that moment.

All this had a surprising result: I became considerably more efficient. Here is an example of the graphs I looked at many times per day:

The brown line is the current day. The line goes up when I work, down during a break, up again when I resume working. Blue and green points are previous days. Blue points are from the days before I started looking at graphs like this, green points from the days after I started looking at graphs like this. In other words, the difference between the green and blue points shows the effect of looking at graphs like this. The red and black points are the final points of the day — red from the days before feedback, black from the days after feedback began. They summarize the day. The higher they are, the more efficient I was.

The green points are mostly above the blue points — and, especially, the black points are above the red points. This suggests that the graphical feedback made me more efficient. Before it began, I was about 25% efficient throughout the day. After this feedback began, I was about 40% efficient. The only change was the addition of this feedback.
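The before/after comparison rests on each day's final point (red before feedback, black after). A hedged Python sketch of that summary follows; the function name, the layout (a dict mapping each day to its list of `(time, efficiency)` points), and the `feedback_start` parameter are hypothetical, not taken from the original R programs.

```python
from statistics import mean

def final_efficiency_by_period(days, feedback_start):
    """Average each day's final efficiency, split into before/after feedback.

    days: dict mapping a day (anything orderable, e.g. a date) to that day's
          list of (time, efficiency) points (hypothetical format)
    feedback_start: first day on which graphical feedback was shown
    Returns (mean_before, mean_after); a group with no days gives None.
    """
    before, after = [], []
    for day, points in days.items():
        if not points:
            continue
        # The day's last point summarizes the whole day
        final = max(points, key=lambda p: p[0])[1]
        (after if day >= feedback_start else before).append(final)
    return (mean(before) if before else None,
            mean(after) if after else None)
```

Only the final point of each day is used, so a day with many short bouts counts the same as a day with a few long ones.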

I was shocked by these results — the improvement was sudden and large. Had I an inkling that such a thing was possible, I would have tried it long ago. The comparison isn’t feedback vs. no feedback. Before the graphical feedback started I got printed feedback (“120 minutes [work] so far”) as often as I wanted and whenever I started or stopped work. And I’ve kept records of how much I work in other ways for a long time. My professional research area is animal learning — not far from studying the effect of feedback.

If the improvement persists, I will try to explain it. I once spoke to an engineering professor who started measuring his calorie intake, hoping to lose weight. As soon as he started keeping track, his once-a-week binges of eating a whole carton of ice cream in a sitting stopped. That’s the closest result I can think of, and it isn’t that close.

Methodological Lessons From My One-Legged-Standing Experiment

A few days ago I described an experiment that found standing on one leg improved my sleep. Four/day (= right leg twice, left leg twice) was better than three/day or two/day. I didn’t know that. For a long time I’d done two/day.

I think the results also contain more subtle lessons. At the level of raw methodology, I found that context didn’t matter. The effect of four/day was nearly the same when (a) I measured it on the four/day days of a randomized design (where the dose for each day is randomly chosen from two, three, and four) and when (b) I measured it using a dose of four day after day. Suppose I want to compare three and four. Which design should I use: (a) 3333344444, (b) 3434343434, or (c) 4433343434 (randomized)? The results suggest it doesn’t matter.
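The three candidate designs can be generated with a few lines of Python. This is an illustrative sketch; the function names are hypothetical. `randomized` draws each day's dose independently, so unlike example (c), which happens to contain five of each, it need not produce equal numbers of each dose.

```python
import random

def blocked(a, b, n):
    """n days of dose a followed by n days of dose b, e.g. 3333344444."""
    return [a] * n + [b] * n

def alternating(a, b, n):
    """a and b on alternate days, e.g. 3434343434."""
    return [a, b] * n

def randomized(doses, n_days, rng=random):
    """Each day's dose drawn independently and uniformly from `doses`."""
    return [rng.choice(doses) for _ in range(n_days)]

print("".join(map(str, blocked(3, 4, 5))))      # 3333344444
print("".join(map(str, alternating(3, 4, 5))))  # 3434343434
print("".join(map(str, randomized([3, 4], 10))))
```

If equal numbers of each dose matter, shuffling a blocked list with `random.shuffle` gives a balanced random order instead.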

The experiment didn’t take long (a few months) but it took me a long time to begin. I noticed the effect behind it (one-legged standing improves sleep) two years ago. Why did I wait so long to do an experiment about details?

I was already collecting the data (on paper) — writing down how long I slept, rating how rested I felt, etc. But I wasn’t entering that data in my laptop. To transfer months of data into my laptop required motivation. Most of my self-experimentation has been motivated by the possibility of big improvements — much less acne, much better mood, and so on. That wasn’t possible here. I slept well, night after night.

What broke the equilibrium of doing nothing? A growing sense of loss. I knew I was throwing away something by not doing experiments (= doing roughly the same thing day after day). The longer I did nothing, the more I lost. To say this in an extreme way: I had discovered a way to improve sleep that was unconnected to previous work — sleep experts haven’t heard of anything like it. It was real progress. To fail to figure out details was like finding a whole new place and not looking around. Moreover, the experiments wouldn’t even be difficult. The treatment takes less than a day and you measure its effect the next morning. This is much easier than lots of research. Suppose you know that radioactivity is bad and you discover something radioactive in your house. A sane person would move that radioactive thing as far away as possible — minimizing the harm it does. I had discovered something beneficial yet wasn’t trying to maximize the benefits. Crazy!

An early lesson I learned about experimentation is to run each condition much longer than might seem necessary. If you think a condition should last a week, do it for a month. Things will turn out to be more complicated than you think, and having more data will help you deal with the additional complexity that turns up. This time, though, it was clear I had gone too far in the direction of passivity: I did the experiment, it was helpful, and I could have done it a year ago.

Effect of One-Legged Standing on Sleep

In 1996, I accidentally discovered that if I stood a lot I slept better. If I stood 9 hours or more, I woke up feeling incredibly rested. Yet to get any improvement I had to stand at least 8 hours. That wasn’t easy, and after about 9 hours of standing my feet would start to hurt. I stopped standing that much. It was fascinating but not practical.

In 2008, I accidentally discovered that one-legged standing could produce the same effect. If I stood on one leg “to exhaustion” — until it hurt too much to continue — a few times, I woke up feeling more rested, just as had happened when I stood eight hours or more. At first I stood with my leg straight, but after a while my legs got so strong that it took too long. When I started standing on one bent leg, I could get exhausted in a reasonable length of time (say, 8 minutes), even after many days of doing it.

This was practical. I’ve been doing it ever since I discovered it. A few months ago I decided to try to learn more about the details. I was doing it every day — why not vary what I did and learn more?

One thing I wanted to learn was: how much was best? I would usually do two (one left leg, one right leg) or four (two left leg, two right leg). Was four better than two? What about three?

I decided to do something relatively sophisticated (for me): a randomized experiment. Every morning I would do two stands (one left, one right). In the evening I would randomly choose between zero, one, and two additional one-legged stands. Sometimes I forgot to choose. Here are the results for three sets of days: (a) “baseline” days (baseline(2), baseline(3), baseline(4)), both before the randomized experiment and during it when I forgot to randomize; (b) the “random” days (random 2, random 3, random 4), when I randomly chose; and (c) a later set of days (“baseline 4”), when I did four one-legged stands every day.

Each morning, when I woke up I rated how rested I felt on a scale where 0 = not rested at all (as tired as when I went to sleep), and 100 = completely rested, not tired at all.

The graph shows means and standard errors. The number of days in each condition is shown on the right.
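The summary in the graph (mean and standard error of the 0 to 100 restedness ratings, per condition) can be computed as below. This is a minimal sketch; the dict layout is an assumption, and any ratings passed in are placeholders rather than the actual data. The standard error here is the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_se(ratings_by_condition):
    """Return {condition: (mean, standard error, n)} for restedness ratings.

    ratings_by_condition: dict such as {"random 4": [...], "baseline 4": [...]}
    The standard error needs at least two ratings; otherwise it is NaN.
    """
    summary = {}
    for condition, ratings in ratings_by_condition.items():
        n = len(ratings)
        se = stdev(ratings) / sqrt(n) if n >= 2 else float("nan")
        summary[condition] = (mean(ratings), se, n)
    return summary
```

Reporting n alongside each mean, as the graph does, shows how much each standard error can be trusted.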

The main results are that three was better than two and four was better than three. The three/four difference was large enough, relative to the two/four difference, to suggest that the benefit had not leveled off and that five might be better than four. The similarity between random 4 and baseline 4 means that the amount of one-legged standing on previous days doesn’t matter much. For example, on Monday night it doesn’t matter how much I stood on Sunday.

These differences were not reflected in how long I slept. Below are the results for “first” sleep duration, meaning the time from when I went to sleep to when I woke up for the first time — which is when I measured how rested I was (the graph above). On a small fraction of days, I went back to sleep a few hours later.

Since I felt more rested without sleeping longer, these results mean that one-legged standing increased how deeply I slept, what you could call sleep “efficiency”.

I also computed “total” sleep duration, which included first sleep duration, second sleep duration, and nap time the previous day (e.g., nap time on Monday plus sleep Monday night). If I took a long nap, I slept less that evening. Here are the results for total sleep duration.

The results also support the idea that one-legged standing made me sleep more deeply.
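Because bedtime and first waking usually fall on different calendar days, it is simplest to store full datetimes and subtract. A hypothetical Python sketch of the first-sleep and total-sleep calculations described above; the function names and argument layout are assumptions.

```python
from datetime import datetime

def hours_between(start, end):
    """Duration in hours; full datetimes handle sleep that crosses midnight."""
    return (end - start).total_seconds() / 3600

def total_sleep_hours(fell_asleep, first_wake, second_sleep=None, nap_hours=0.0):
    """First sleep + optional second sleep + nap time the previous day.

    second_sleep: optional (start, end) datetimes for going back to sleep
    nap_hours: nap on the day before the night (e.g. Monday's nap for Monday night)
    """
    total = hours_between(fell_asleep, first_wake) + nap_hours
    if second_sleep is not None:
        total += hours_between(*second_sleep)
    return total

monday_night = datetime(2010, 3, 1, 23, 0)
tuesday_wake = datetime(2010, 3, 2, 5, 30)
print(hours_between(monday_night, tuesday_wake))                     # 6.5
print(total_sleep_hours(monday_night, tuesday_wake, nap_hours=0.5))  # 7.0
```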

The randomized experiment had pluses and minuses compared to a simpler design (such as an ABA design, where you do each treatment for several days in a row). The two big pluses were that the conditions being compared were more equal and that you could simply continue until the answer was clear. The two big minuses were that I often forgot to do the randomization and that the design lacked realism. If I decided that four was the best choice, I’d do four every day, not in the midst of twos and threes.
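The post does not say how the evening choice was randomized. As a hedged sketch, assuming a pseudo-random generator stands in for whatever method was used, the daily assignment and the labeling of forgotten days as baseline days might look like this (`evening_dose` and `condition_label` are hypothetical names):

```python
import random

MORNING_STANDS = 2  # one left-leg and one right-leg stand every morning

def evening_dose(rng=random):
    """Randomly add 0, 1, or 2 evening stands; returns the day's total (2, 3, or 4)."""
    return MORNING_STANDS + rng.choice([0, 1, 2])

def condition_label(total_stands, randomized):
    """Label a day for analysis; days when the randomization was forgotten count as baseline."""
    return f"random {total_stands}" if randomized else f"baseline({total_stands})"
```

Having a script make the choice, rather than relying on memory, is one way to reduce the forgotten-randomization problem, though it does nothing about the lack of realism.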

Overall, it was clear beyond any doubt that four was better than two, and clear enough that four was better than three (one-tailed p = 0.02). The results suggest trying larger doses, such as five and six. I’ve only done six once: before a flight from Beijing to San Francisco. It was one of the few long flights where I slept most of the way.
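The post reports one-tailed p = 0.02 for four versus three without naming the test. As an illustration only, here is a permutation (randomization) test, a swapped-in technique that suits a randomized design; it is not necessarily the test that was used, and on the real ratings it need not give exactly 0.02.

```python
import random
from statistics import mean

def one_tailed_permutation_p(higher, lower, n_perm=10_000, rng=None):
    """Approximate one-tailed p-value for H1: mean(higher) > mean(lower).

    Repeatedly shuffles the pooled ratings into two groups of the original
    sizes and counts how often the shuffled difference in means is at least
    as large as the observed one.
    """
    rng = rng or random.Random()
    observed = mean(higher) - mean(lower)
    pooled = list(higher) + list(lower)
    k = len(higher)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed split itself, so p is never exactly 0
    return (at_least_as_large + 1) / (n_perm + 1)
```

Here `higher` would hold the ratings from four-stand days and `lower` those from three-stand days.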

If you try this and you do more than one right and one left, leave plenty of time (two hours?) before the second pair, to allow the signaling molecules to be regenerated.

Growth of Quantified Self (more)

At the Quantified Self blog, Alexandra Carmichael has posted several graphs showing how much the Quantified Self movement has grown during the past year. The number of QS meetup members has grown by a factor of 3; the number of groups has grown by a factor of 6.

Measuring yourself is a step toward controlling yourself, especially controlling your health and well-being. Almost everyone wants more control of these things. The Quantified Self movement encourages the idea that ordinary people can do useful science. I believe this idea is a shift with implications on the order of the shift from religion (the Sun revolves around the Earth) to science (the Earth revolves around the Sun). When ordinary people begin to do science, I predict we will learn a lot more about how to control our bodies.

Before science became powerful, people knew lots of correct, useful stuff (e.g., metallurgy). But religion placed limits on what could be learned (e.g., Galileo was imprisoned). Now religion is much less powerful, but most people believe that science can only be done by certain people (e.g., professors). This too places serious limits on what can be learned. For control of the outside world (e.g., material science, physics), I don’t think these limits matter (although the case of Starlight suggests that even here amateurs can make important discoveries). But for control of the inner world (our bodies), the message of my work is that these limits matter a lot. By studying myself I managed to learn a bunch of useful things that professional scientists could learn only with great difficulty. For example, I could learn from accidents how to sleep better; I could easily test ideas about how to sleep better. Few if any professional sleep researchers measure sleep night after night for long periods of time; nor do they do cheap fast experiments.

My Talk at EG

Last year I gave a 20-minute talk at EG (EG is short for Entertainment Gathering) titled “You Had Me at Bacon” about my self-experimentation. I described some of the things I’ve discovered by self-experimentation. Then I tried to say why it had been successful — why I had managed to discover such useful stuff. My conclusion is that my success came from the combination of four things:

1. Self-experimentation. Much faster, more flexible than ordinary research.

2. The Stone Age = good idea. I used the idea that our bodies were shaped to work well under Stone-Age conditions to choose what experiments to do.

3. Subject-matter knowledge. My knowledge of psychology, experimental design, and data analysis helped a lot. My weight-control theory, for example, was based on ideas from animal learning.

4. Freedom. I could do and say what I wanted. Most scientists cannot. They fear career damage.

The combination of these four things is why my work was effective.

After my talk, a few people asked: Were you serious? No doubt you’ve heard Arthur Clarke’s maxim that “any sufficiently advanced technology is indistinguishable from magic.” Let me propose a related idea: Any sufficiently advanced science is indistinguishable from a joke.

Miso Bar

At a hotel buffet restaurant near Tsinghua I had fermented food in a form new to me: a miso-soup “bar”. You serve yourself from a tureen of miso soup and have a wide choice of add-ons: carrot, turnip, tofu, pickled ginger, green onion, Japanese pickle. Adding color, visual diversity, crunch, and DIY to the soup makes it taste much better — and it already tastes really good.

If I made a scatterplot of all the foods I can make, with difficulty on one axis and deliciousness on the other, this would be a bivariate outlier: very easy and very delicious.

Terrific Essay by Cory Doctorow

I highly recommend this editorial by Cory Doctorow about the dangers of allowing a small number of people — such as big companies — to control how everyone’s computer, smart phone, etc., operates. I especially like his conclusion, modeled on Isaac Asimov’s Three Laws of Robotics:

But we’ll only arrive at those solutions once we stop reflexively demanding limits on the general functionality of a PC and a network — and the sooner we do, the sooner we’ll legitimize a technology world whose first rule is “Obey your owner” and whose second rule is “Protect your owner’s interests”.

In case it isn’t obvious, self-experimentation and personal science increase your control of your body, just as Doctorow wants each person to control the technology they own. Without self-experimentation and personal science — and their ability to solve health problems in a way best for you — you give control over your body to doctors, drug companies, medical school professors, nutritionists, alternative-medicine advocates, and many others whose interests differ from yours. Often the difference is large — drug companies prefer expensive dangerous solutions to cheap safe ones.

Is Medical Research a Veblen Good?

Felix Salmon argues that fancy restaurants often manage to make their food a Veblen good — something that becomes more desirable when the price goes up. Restaurant food is a way to show off your wealth, in other words.

Veblen and I differ on the long-term value of Veblen goods. Veblen saw them as sort of ridiculous — which is why he coined the amusing term conspicuous waste. Whereas I see them as a way of promoting innovation: Long ago, desire for luxury goods, goods with “wasteful” features, helped the most skilled artisans make a living. These artisans were the best source of innovation within a society.

Unfortunately everyone likes to show off, not just fancy-restaurant-goers. Throughout the medical research community, there is an obvious preference for expensive research over cheaper research. (I’m not saying experimental psychologists such as me are any better: We’re not.) Few medical researchers understand that expensive studies are a last resort and the larger your sample size, the less you understand what you are studying. (Experimental psychologists do understand this.) When people doing research related to health are too concerned with showing off (e.g., doing studies that require expensive equipment) to do effective research, the benefit-cost ratio of Veblenian behavior goes below one. Desire to show off gets in the way of solving health problems. This is why personal science — using science to solve your own problems — is so important: The personal scientist will do whatever works, regardless of how impressive it is.

Does Blood Pressure Medicine Always Work?

Apparently not:

I was a very naughty patient and, after taking Atacand for 135/75 blood pressure (benign essential hypertension was the description) for a number of years on my doctor’s prescription, decided to do a little experiment. That is, I cut back on it gradually, monitoring my BP every day. No change.

I eventually got to no Atacand at all and have been there for the past four years, during which time the BP has remained the same as when taking the drug. Now, whether the BP is going to kill me is perhaps a separate question (I seem to be in excellent health at 65) but the Atacand doesn’t appear to have made much difference at all — except for the $600/year it cost me, even after insurance had picked up on some of the expense.

I began to grasp how helpful self-experimentation could be when I discovered that tetracycline, an antibiotic that my dermatologist had prescribed, did not reduce my acne. When I told my dermatologist about the research that revealed this, he said, “Why did you do that?”

Had this person’s doctor told him that Atacand might not work? Clearly not. Did the doctor even know that Atacand might not work? Apparently not, since there was no doctor-guided attempt to find out. Perhaps the doctor who prescribed Atacand would defend himself by saying, lamely, that all he knew is what the drug company told him. I wonder what the drug company knew.

How much money could be saved by stopping the prescription of drugs that turn out not to work? Should all drugs come with a label that says the fraction of patients for whom this drug doesn’t work? It is a warning that is truly needed.

Thanks to Rajiv Mehta.

“My Body, My Laboratory” (TIME article)

Last week Time published an article about self-experimentation called “My Body, My Laboratory” by Eben Harrell that is now fully available on-line. I am quoted a few times.

I distinguish between two kinds of self-experimentation — part of your job (the usual kind) or self-help (what I do) — and it’s easy to put each of the examples in the article into one pile or the other. However, I think that if you go far enough into the future and look back, you will see three varieties:

1. Professional. Self-experimentation done as part of your job (e.g., doctor). A dentist testing a new anesthetic, for example. All famous examples are in this category.

2. Self-help. Self-experimentation done to improve your own life. Done by non-professionals. I call this personal science.

3. Combination of the two. A professional combines job skills and self-help. This is what I did. My job (experimental psychologist) gave my self-experimentation (about weight loss, sleep, mood, and health, all common self-help topics) a considerable boost.

Professionals (Category 1) have skills and resources. The self-helpers, the non-professionals (Category 2), have freedom and (greater) motivation. People in Category 3 have all four. To summarize this paper in three words: that really helps. Please imagine the Venn diagram: one circle (“Professional”), another circle (“Self-Help”), and an area of overlap (“Me”).