John Chambers, a retired Bell Labs statistician and one of the people most responsible for R, the free open-source data analysis package I use, told me an interesting story yesterday. AT&T used to make microchips. The “yield” of chips — the percent of chips that were defect-free — was very important. Chambers and other Bell Labs statisticians were asked to help the chip makers improve their manufacturing process by increasing the yield. At the chip factory, the people Chambers and his colleagues spoke to were chemists and engineers. They wanted to do experiments that varied voltage, temperature, and similar variables. Chambers and his colleagues had a hunch that the operator — the person running the fabrication machines — was important, and this turned out to be true.
I like this story because it has a wisdom-of-crowds-but-not-exactly twist: the supposed experts at one thing (data analysis) turned out to have useful (and unpredictable) knowledge about something else. We don’t think of statisticians as experts in human behavior, but in this case they were at least more expert than the chemists and engineers. I mean: who were the experts here? And when we deal with someone, which is more likely: that we overestimate how much they can help us with our problem, or that we underestimate it (as in this story, where the chip makers underestimated the statisticians)? And if we have no idea which it is, how might we find out?
I told Chambers that statisticians were hurt by the name of their department: statistics. It puts them in too small a box. John Tukey’s term data analysis (in place of statistics) was an improvement, yes, but only a bit; it would be a lot better if they were called how-to-do-research departments. Yes, Chambers said, that would be an improvement.
I am fascinated by the similarity between three things:
1. Data analysis. Much of data analysis consists of putting data together in a way that allows you to extract a little bit of information from each datum. These little pieces of information, added together, can be quite informative. A scatterplot, for example: each point tells you only a little, but together the points reveal a pattern.
2. Wisdom-of-crowds phenomena. For example, many people guess the weight of a cow. The average of their guesses is remarkably accurate, even though the variation in guesses is large (see the sketch after this list).
3. Self-experimentation. The new and interesting feature of my self-experimentation was that it involved my everyday life. From activities I was going to do anyway (such as eat and sleep), I managed to extract useful information.
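The first two ideas are easy to see in a few lines of R. Here is a minimal sketch with made-up numbers (the slope, the noise levels, and the cow’s “true” weight of 1,200 pounds are all assumptions for illustration):

```r
set.seed(1)

## 1. Scatterplot: each point carries only a little information;
##    together the points reveal a relationship no single datum shows.
x <- runif(100, 0, 10)
y <- 2 * x + rnorm(100, sd = 5)   # weak signal buried in noise
plot(x, y)                        # the trend emerges from the cloud

## 2. Wisdom of crowds: 500 noisy guesses of a cow's weight.
##    The true weight (1200 lb) and noise level are invented.
guesses <- rnorm(500, mean = 1200, sd = 200)
range(guesses)   # individual guesses vary widely
mean(guesses)    # yet the average lands near 1200
```

Run it a few times with different seeds: individual guesses stay wildly off, but the average barely moves.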
In each case it’s like extracting gold from seawater: You get something of value from what seemed useless. Are there other examples? How can we find new examples? Chambers’s story suggests one direction: Making some small change so that you learn from your co-workers about stuff you wouldn’t think they could teach you.