Self-experimentation is an example of the more general idea that non-experts can do valuable research. Another example is that two New York teenagers have shown that fish sold in New York City is often mislabeled. They gathered samples from 4 sushi restaurants and 10 grocery stores and sent them to a lab to be identified using a methodology and database called Barcode of Life. They found that “one-fourth of the fish samples with identifiable DNA were mislabeled . . . [and concluded] that 2 of the 4 restaurants and 6 of the 10 grocery stores had sold mislabeled fish.”
The article, by John Schwartz, appeared in the Science section, which makes the following sentence highly unfortunate:
The sample size is too small to serve as an indictment of all New York fishmongers and restaurateurs, but the results are unlikely to be a mere statistical fluke.
This is a Samantha-Power-sized blunder. It could hardly be more wrong. How much you can generalize from a sample to a population depends on how the sample was chosen; sample size has very little to do with it. (John Tukey had the same complaint about the Kinsey Report: stop boasting about your sample size, he told Kinsey. Your sampling methods were terrible.) To know to what population we can reasonably generalize these results, we'd need to know how the two teenagers decided which grocery stores and restaurants to sample from, which the article does not say. If the 14 fish sellers had been randomly sampled from the entire New York City population of grocery stores and restaurants, it would be perfectly reasonable to draw broad conclusions.
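To make the point concrete, here is a minimal simulation sketch. The population size, mislabeling rates, and the "neighborhood bias" below are invented for illustration, not taken from the article. A random sample of even 14 sellers gives an unbiased, if noisy, estimate of the citywide rate; a convenience sample from an unrepresentative neighborhood stays wrong no matter how large it grows:

```python
import random

random.seed(0)

# Hypothetical city: 10,000 fish sellers, 25% of whom mislabel fish.
population = [True] * 2500 + [False] * 7500
random.shuffle(population)

def mislabel_rate(sample):
    return sum(sample) / len(sample)

# Random sampling: even 14 sellers gives an unbiased (if noisy) estimate,
# and the estimate converges on 25% as n grows.
for n in (14, 140, 1400):
    sample = random.sample(population, n)
    print(f"random sample of {n:5d}: {mislabel_rate(sample):.2f}")

# Convenience sampling: suppose the samplers only visit their own
# neighborhood, where (hypothetically) mislabeling runs at 50%.
neighborhood = [random.random() < 0.5 for _ in range(2000)]
for n in (14, 140, 1400):
    sample = random.sample(neighborhood, n)
    print(f"biased sample of {n:5d}: {mislabel_rate(sample):.2f}")
```

Growing the biased sample from 14 to 1,400 just makes it more precisely wrong, which is exactly why sampling method, not sample size, governs how far the results generalize.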
I have no idea what it could mean that the results are “a mere statistical fluke”.
The effect of these errors is that Mr. Schwartz places too low a value on this research. It’s impressive not only for its basic conclusion that there’s lots of mislabeling but also for showing what non-experts can do.
The end of the article did see the big picture:
In a way, Dr. Ausubel said, their experiment is a return to an earlier era of scientific inquiry. “Three hundred years ago, science was less professionalized,” he said, and contributions were made by interested amateurs. “Perhaps the wheel is turning again where more people can participate.”
Good point.
BTW, on the warrior diet, I learned some more:
https://www.t-nation.com/free_online_article/sex_news_sports_funny/the_warrior_diet_an_interview_with_penthouse_editor_ori_hofmekler
Hi Seth,
As someone who works in the market research industry, I like the Samantha Power story and think the article makes a good point. I am more torn on the questions you raise in your post. As you know from my post about the under-use of anecdotal evidence, I am a big believer in not letting valuable evidence leak away.
I was slightly confused by your statement:
I have no idea what it could mean that the results are “a mere statistical fluke”
since it seemed from the article that the author was saying it was unlikely to be a fluke. So in that respect he seems to be on the same page as us.
It is just the sample size issue where there is divergence, and in this respect what I am wondering is this: he has obviously applied some formula to determine whether the sample size can be taken as significant, whereas we perhaps prefer to apply a common-sense interpretation of the results and the sampling methodology. But there must be a point at which sample size does matter even for that approach. For example, could we have viewed a sample of 3 as having the potential to be regarded as a statistical fluke?
Methuselah
Pay Now Live Later
The term “statistical fluke” is usually applied to differences between groups. Group A and Group B differ by some amount — does this mean the populations from which they were sampled actually differ on that dimension or could the results be due to sampling variability? If they are due to sampling variability, that means the observed difference is a “statistical fluke.” In the fish case, there are not two groups. So I have no idea what he’s talking about.
You can’t have sample sizes be “[statistically] significant” or not; it is differences between groups or differences from zero that are statistically significant or not.
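For what it's worth, here is a minimal sketch of what checking for sampling variability looks like when testing a difference from a baseline. The 5% "honest error" baseline and the sample of 60 fish are assumptions for illustration, not figures from the article:

```python
import random

random.seed(0)

# Hypothetical question: if the true mislabeling rate were only 5%
# (honest error), how often would a sample of 60 fish show 25% or
# more mislabeled purely through sampling variability?
TRUE_RATE = 0.05   # assumed baseline, not from the article
N_FISH = 60        # assumed sample size, not from the article
OBSERVED = 0.25    # roughly the rate the teenagers reported

flukes = 0
trials = 100_000
for _ in range(trials):
    mislabeled = sum(random.random() < TRUE_RATE for _ in range(N_FISH))
    if mislabeled / N_FISH >= OBSERVED:
        flukes += 1

# A tiny fraction here means the observed rate is very unlikely to be
# "a mere statistical fluke" under the assumed baseline.
print(f"P(rate >= {OBSERVED:.0%} by chance) ~ {flukes / trials:.5f}")
```

Under these assumptions the chance of seeing 25% mislabeling by luck is tiny, which is presumably what the reporter was groping toward. Setting N_FISH to 3 in the same sketch shows chance alone producing high rates fairly often, which is the sense in which a sample of 3, as in Methuselah's question, really could be a fluke.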
Got it – thanks for the explanation.
Thanks for the post. This revealed the importance of checking the source and accuracy of information before making a judgment.
The risk of stereotyping is often overlooked, just as in this case. We could perform lots of scientific experiments and draw conclusions based on “facts”. However, it is imperative to reveal the procedures and ensure that all “facts” presented have been critically examined. When I was at school doing chemistry experiments, I often found that the results differed from the theory quite a lot. Why? Many factors can affect the accuracy of an experiment. Again, it is unfair to all parties to judge by the results alone, without understanding what lies behind them. This applies not only to scientific “experiments” but also to social observations.
If you were the students’ teacher, what would you have advised them?
John