A Statistics Package in the News

I use R, the open-source version of S, several times a day, more often than I use Word. It works far better than S (fewer bugs, and R is free), and S worked a lot better than what it replaced (STATGRAPHICS). I was pleased to see a NY Times article about it:

R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.

“Easy to use”? Haha! In my experience, non-statisticians and non-engineers don’t find it easy to use, though it’s true that I did. “R has a steep learning curve,” some people say, twisting the meaning of “steep learning curve”: on a plot of skill against time spent, a steep curve means fast learning, not slow.

The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software. SAS, with more than $2 billion in annual revenue, has been the preferred tool of scholars and corporate managers. . . . SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks. “I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Ah, “freeware.” You may remember when “Made in Japan” was derogatory. Most psychology departments, including Berkeley’s, use SPSS (Statistical Package for the Social Sciences). Like SAS and its ten feet of manuals, it is horrible. One of my students wanted to make a scatterplot of her data. She went to the psych department’s statistics consultant (a psych grad student who had taken courses in the statistics department). The statistics consultant didn’t know how to do this! A scatterplot! It’s like Vladimir Nabokov’s observation, at Cornell and other schools, of language professors who couldn’t speak the language they taught. Nothing But the Best describes a Juilliard composition teacher who couldn’t read music. To be a scientist and not be able to analyze your own data is pretty much the same thing. With R, making a scatterplot is easy.
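To give a sense of how little is involved, here is a minimal sketch. It assumes the data are two numeric columns, and it uses R's built-in cars dataset as a stand-in, since the student's actual data are not shown here. The name scatter_data is just an illustrative placeholder.

```r
# Stand-in data: R's built-in `cars` dataset (car speed and stopping distance).
# In practice this would be your own data frame with two numeric columns.
scatter_data <- cars

# A single call to base R's plot() draws the scatterplot.
# The axis labels are optional; they match the units documented for `cars`.
plot(scatter_data$speed, scatter_data$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```

Typed at the R prompt, the plot() call opens a graphics window with the points. Everything beyond the two column names is optional.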

To me, the value of R is that it makes high-quality data analysis available to everyone — something very new in the history of mankind. R makes self-experimentation easier because it makes data analysis easier and allows you to learn more from the data you have collected (e.g., make better graphs). I also use it for data collection — measuring how well my brain is working.

Via Andrew Gelman.

7 thoughts on “A Statistics Package in the News”

  1. There’s a difference between “freeware”, “open source software”, and “free software”. The SAS marketing drone chose the derogatory “freeware” to describe something that is really open source software (and might even be free-as-in-freedom software). “Freeware” usually describes free binary programs for which the source is NOT available and which often install spyware on your computer. Free software is something completely different. See:

    https://www.gnu.org/philosophy/categories.html
    https://oreilly.com/openbook/freedom/

  2. As you well know from our collaboration, I have found R fairly difficult to learn, but then I’ve always had trouble learning to write code (the learning curve tends to be so shallow that I give up early). I find visually-driven interfaces much more intuitive and I pick them up very quickly (i.e., with a steep learning curve). JMP (pronounced ‘jump’) is a statistical and graphical program, used by engineers and computer scientists, that has a visually-driven interface (like Windows) but allows for some programming of functions. I find it highly useful for data exploration. Perhaps ideally I would use R, but fortunately I can get some of the same benefits (e.g., data exploration) from JMP. I’m very glad, however, that you are so proficient with R! :)

  3. Yeah, before STATGRAPHICS I used APL and wrote APL functions for STATGRAPHICS. I still think it’s weird that R doesn’t have certain APL functions.

  4. Love the debate!! I use R almost daily, well, weekly for sure. I find the help forums useful to a point, but often feel intimidated by replies (I’m not a statistician but a user of data). I think one real problem with the R movement is support and validation. There are many ways to do the same or a similar thing in R, but as the non-expert, which one do I use?!! And who do I go to to get the most appropriate reply?? I use GenStat too; this overcomes the problem. It’s free to use in teaching worldwide and free for research in the developing world. With GenStat I can trust the tools and know it’ll point me in the right direction, something very lacking in R. If you buy it, from memory it’s lots cheaper than any product beginning with S…., but not free like R. But what is free anyway? It’s expensive if I have to invest hours to do anything interesting in R, whereas with GenStat it takes minutes.

  5. I’m in SAS marketing and just want to state for the record that we are not R or open source haters. I think the Times article was generally good but perhaps gave the impression that we believe the devil resides in R. For a more complete picture check out the response from Anne Milley who was quoted in the Times article. It’s at
    https://blogs.sas.com/sascom/
