Here is an example of the negative evaluation bias I mentioned earlier. The passage below is Larry Sanger criticizing a comparison of Wikipedia with the Encyclopedia Britannica:
Some might point to Nature’s December 2005 investigative report—often billed as a scientific study, though it was not peer-reviewed—that purported to show, of a set of 42 articles, that whereas the Britannica articles averaged around three errors or omissions, Wikipedia averaged around four. Wikipedia did remarkably well. But the article proved very little, as Britannica staff pointed out a few months later. There were many problems: the tiny sample size, the poor way the comparison articles were chosen and constructed, and the failure to quantify the degree of errors or the quality of writing. But the most significant problem, as I see it, was that the comparison articles were all chosen from scientific topics. Wikipedia can be expected to excel in scientific and technical topics, simply because there is relatively little disagreement about the facts in these disciplines. (Also because contributors to wikis tend to be technically-minded, but this probably matters less than that it’s hard to get scientific facts wrong when you’re simply copying them out of a book.) Other studies have appeared, but they provide nothing remotely resembling statistical confirmation that Wikipedia has anything like Britannica-levels of quality. One has to wonder what the results would have been if Nature had chosen 1,000 Britannica articles randomly, and then matched Wikipedia articles up with those.
“Tiny sample size”? Hmm. How often have you heard “the sample size was too large”?
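As a rough illustration of why 42 articles is not automatically "tiny": taking the reported averages (about three errors per Britannica article versus about four per Wikipedia article) and treating per-article error counts as independent, roughly Poisson-distributed quantities — an assumption of mine, not anything the Nature report claims — a back-of-the-envelope sketch suggests a one-error gap is resolvable even at that sample size.

```python
import math

# Back-of-the-envelope check under illustrative assumptions (NOT the Nature
# methodology): treat each encyclopedia's per-article error count as roughly
# Poisson, so the variance of the sample mean over n articles is ~ mean / n.
n = 42                  # articles compared in the Nature piece
mean_britannica = 3.0   # ~3 errors/omissions per article, as reported
mean_wikipedia = 4.0    # ~4 errors/omissions per article, as reported

# Standard error of the difference in means, assuming independent samples.
se_diff = math.sqrt(mean_britannica / n + mean_wikipedia / n)
z = (mean_wikipedia - mean_britannica) / se_diff

print(f"difference ≈ {mean_wikipedia - mean_britannica:.1f} errors/article")
print(f"standard error of the difference ≈ {se_diff:.2f}")
print(f"z ≈ {z:.1f}")  # roughly 2.4 under these assumptions
```

This is a crude model — errors are not equally serious, and the articles were not randomly sampled, as Sanger himself notes — but it shows the sample-size objection is not obviously fatal on its own.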
Here is another example of a one-sided critique: her advisor’s reaction to her work (“My advisor started out tearing apart the things I had done”).