From the abstract of a 2006 study done in Japan:
An epidemiological study was carried out on [134] first-year junior high school students in Wakayama Prefecture. Analyses were performed to investigate the relationships among eating habits of fermented milk or fermented soybean foods and the presence of atopic diseases. Serum levels of total IgE values, specific IgE to house dust mite and Japanese cedar pollen in these subjects were evaluated to clarify atopic status. . . . RESULTS: Serum total IgE levels were found to be significantly lower in those subjects habitually eating yogurt and/or fermented milk drinking, in comparison with those who do not habitually eat such fermented milk foods. Subjects with habitual intake of these fermented milk foods were significantly lower in having various allergy diseases compared with those without such an eating habit. However, no difference was found on the total IgE titers and having allergy diseases between subjects with or without habitual intake of Natto, a fermented soybean food.
Note the small sample size. Contrary to what some experts say, it's a good sign: it means the differences were strong enough to reach significance in a relatively small sample. Here is a review article about allergies and fermented foods.
Last January (2008) I got home from Japan and started eating miso soup so often I forgot what I used to eat. This January (2009) I went to the Fancy Food Show and became so interested in fermented foods I’m having trouble remembering what I used to blog about.
Thanks to Peter Spero.
My thought (and intuition) is that different types of fermented foods generate different types of antibodies, and that a variety of fermented foods is more likely to create a broad spectrum of immunity. I suppose that would also be true of probiotics.
Peter, I agree, the difference between natto and yogurt is a puzzle. It might be a dose difference: in yogurt the bacteria are everywhere, whereas in natto they grow only on the surface. Perhaps different fermented foods do generate different antibodies, but there is an almost infinite number of possible antibodies, so whether you eat 2 or 10 isn't going to make a difference in terms of antigen-space coverage.
The benefit of small samples assumes that all studies get reported (or are equally likely to be reported). But the "file drawer" problem is particularly acute for small studies, given the relatively small resources involved. If you (or the community) do 20 small studies and report only the one that's significant, then the assumption that a large effect size means a robust difference no longer holds.
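The 20-studies scenario is easy to sketch with a quick simulation (the sample size, true effect size, and normal-approximation cutoff below are illustrative assumptions, not taken from any of the studies discussed): when only significant small studies get reported, the "published" effect estimates run far above the true effect.

```python
import random
import statistics

random.seed(0)

def run_study(n=15, true_effect=0.2):
    """Simulate one small two-group study.

    Returns (effect_estimate, significant), using a rough normal
    cutoff (z > 1.96, one-sided) in place of a proper t-test.
    """
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(true_effect, 1) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = ((statistics.variance(control) + statistics.variance(treated)) / n) ** 0.5
    return diff, diff / se > 1.96

all_effects = []   # every study that was run
published = []     # only the studies that reached significance
for _ in range(2000):
    diff, significant = run_study()
    all_effects.append(diff)
    if significant:
        published.append(diff)

print(f"true effect:              0.20")
print(f"mean across all studies:  {statistics.mean(all_effects):.2f}")
print(f"mean of 'published' ones: {statistics.mean(published):.2f} "
      f"({len(published)} of 2000 studies)")
```

With these numbers, the average over all 2000 studies sits near the true effect, while the mean of the significant-only subset is several times larger, which is exactly the bias Kevin is describing.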
Other than that, I agree that a plethora of small studies is better than one big one, but I do see the file drawer problem as a big issue that you’re not taking into account here.
Kevin, a “big” issue? If there is any evidence that the file drawer problem has ever mattered, I’d love to know about it. Then I might agree with you. Without evidence, this idea resembles the complaint about graphing data that was made when exploratory data analysis, which emphasizes graphing, first became popular: The more graphs you make the more likely you will be misled by random patterns. Professors of statistics actually said this!
Good point. I found this article: https://www.scientificexploration.org/journal/jse_14_1_scargle.pdf that presents a model (but with no actual evidence) suggesting that it doesn’t take much of a file-drawer effect to bias the literature. I remembered this article — https://www3.interscience.wiley.com/journal/119263052/abstract?CRETRY=1&SRETRY=0 — which argues that some of the “effects” of early day care on attachment suffer from this problem. That is exactly the kind of situation where you might expect this to be a problem — relatively small effects within the range of variation across countries and a lot of within-country variation, some probably due to temperament, some of the rest due to parenting, and room for other factors.
I’d also heard the Thompson and McConnell planaria/learning/cannibalism results attributed to a version of this (that they kept running more worms until it worked, then stopped), although I couldn’t find a citation to that claim and it may be folklore.
What I would love to see as a paradigm is one where advocates of different views get together to design and run experiments that they would accept as being dispositive. In this case you’d have two biases (confirmation and what I’ll call “I saw it myself”) working against each other.
But clearly small-scale experiments are more informative than large-scale ones.
Thanks for the examples, Kevin. I suspect a variant of the file-drawer problem exists in epidemiology, where the same data are analyzed many different ways until a significant difference is found. Here the evidence is that surveys find significant effects that, when studied experimentally, fail to produce the expected results.
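The "analyze the same data many ways" variant can be sketched the same way (the numbers here, 100 subjects and 20 unrelated outcomes per survey, are illustrative assumptions): even when no outcome is truly related to the exposure, most surveys will turn up at least one "significant" difference.

```python
import random

random.seed(1)

def one_null_survey(n=100, n_outcomes=20):
    """Simulate a survey where no outcome is really related to the exposure.

    Tests n_outcomes independent outcomes against one exposure and
    returns True if any comparison clears a rough |z| > 1.96 cutoff.
    """
    exposure = [random.random() < 0.5 for _ in range(n)]
    for _ in range(n_outcomes):
        outcome = [random.gauss(0, 1) for _ in range(n)]
        exposed = [y for x, y in zip(exposure, outcome) if x]
        unexposed = [y for x, y in zip(exposure, outcome) if not x]
        diff = sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)
        se = (1 / len(exposed) + 1 / len(unexposed)) ** 0.5  # sd known to be 1
        if abs(diff / se) > 1.96:
            return True
    return False

surveys = 500
false_alarms = sum(one_null_survey() for _ in range(surveys))
print(f"{false_alarms}/{surveys} null surveys found at least one significant result")
```

With 20 comparisons at the 0.05 level, the chance of at least one false positive is about 1 - 0.95^20, roughly 64%, so a majority of these purely null surveys report a "finding."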
In all my experimentation, with rats, myself, and other humans, I usually do the experiment more than once to make sure the effect is repeatable. Nothing ever rests on a single significant difference. Moreover, the effects are always far stronger than p < 0.05. This is another benefit of small experiments: it is easier to check a result by running another experiment.