Andrew Gelman writes:
If I had to come up with one statistical tip that would be most useful to you–that is, good advice that’s easy to apply and which you might not already know–it would be to use transformations. Log, square-root, etc.–yes, all that, but more! I’m talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something [more] continuous (those “total scores” that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don’t care if the threshold is “clinically relevant” or whatever–just don’t do it. If you gotta discretize, for Christ’s sake break the variable into 3 categories.
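A minimal Python sketch of the kinds of transformation Andrew lists, using made-up numbers. It shows a log and a square-root transform, a continuous variable (age) broken into three categories coded as two 0/1 indicators, and several discrete items summed into a total score. The tertile cut points and the helper names (`tertile_cutpoints`, `three_category_indicators`, `total_score`) are illustrative assumptions and do not come from the quote.

```python
import math
import statistics

def tertile_cutpoints(values):
    """Two cut points that split `values` into three roughly equal-sized groups."""
    # Tertiles are an illustrative choice (an assumption); substantive cut points work too
    return statistics.quantiles(values, n=3)

def three_category_indicators(x, cuts):
    """Code one continuous value as (middle, high) 0/1 indicators; 'low' is the baseline."""
    low_cut, high_cut = cuts
    middle = 1 if low_cut < x <= high_cut else 0
    high = 1 if x > high_cut else 0
    return middle, high

def total_score(item_responses):
    """Combine several discrete items (e.g. 1-5 ratings) into one more-continuous score."""
    return sum(item_responses)

# Made-up data for illustration only
ages = [19, 23, 27, 31, 38, 44, 52, 58, 63, 71, 77, 84]
incomes = [12_000, 25_000, 40_000, 250_000]
counts = [0, 1, 4, 9]

log_incomes = [math.log(v) for v in incomes]  # pulls in the long right tail
sqrt_counts = [math.sqrt(c) for c in counts]  # a milder transformation, common for counts
cuts = tertile_cutpoints(ages)
age_indicators = [three_category_indicators(a, cuts) for a in ages]
score = total_score([4, 2, 5, 3])

print(cuts)
print(age_indicators)
print(score)
```

In a regression, the two indicators would replace the single age term, so the fitted effect can rise and then fall (a nonlinear pattern such as voting by age) instead of being forced into a straight line or a single jump at one threshold.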
I agree (and wrote an article about it). Transforming data is so important that intro stats texts should have a whole chapter on it, but instead they barely mention it. A good discussion of transformation would also include the use of principal components to boil down many variables into a much smaller number. (You should do this twice: once with your independent variables, once with your dependent variables.) Many researchers measure many things (e.g., a questionnaire with 50 questions, a blood test that measures 10 components) and then foolishly correlate every independent variable with every dependent variable. They end up testing dozens of likely-to-be-zero correlations for significance, which effectively throws all their data away: when you do dozens of such tests, none can be trusted.
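The principal-components step can be sketched in Python with NumPy, done twice: once on the independent variables and once on the dependent variables. The data are simulated under an assumed setup (200 subjects; 50 questionnaire items and 10 blood components that both partly reflect one hypothetical underlying trait), and keeping only the first component of each set is a simplifying assumption. The helper name `first_principal_component` is hypothetical. The last lines count what the correlate-everything approach would involve for the 50-by-10 example.

```python
import numpy as np

def first_principal_component(data):
    """Score each row (subject) on the first principal component of the columns."""
    # Standardize columns so variables on different scales count equally
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are principal directions, ordered by variance explained
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    share_explained = s[0] ** 2 / np.sum(s ** 2)
    return z @ vt[0], share_explained

rng = np.random.default_rng(0)
n_subjects = 200
trait = rng.normal(size=n_subjects)  # hypothetical latent trait (simulated)

# 50 questionnaire items (independent variables): trait plus noise
questionnaire = trait[:, None] + rng.normal(size=(n_subjects, 50))
# 10 blood components (dependent variables): weakly related to the same trait
blood = 0.5 * trait[:, None] + rng.normal(size=(n_subjects, 10))

x_score, x_share = first_principal_component(questionnaire)
y_score, y_share = first_principal_component(blood)
# The sign of a principal component is arbitrary, so report |r|
r = abs(np.corrcoef(x_score, y_score)[0, 1])

# The alternative: correlate every item with every component
n_pairwise_tests = questionnaire.shape[1] * blood.shape[1]
expected_false_positives = 0.05 * n_pairwise_tests  # if every true correlation were zero

print(f"one test on the components: |r| = {r:.2f}")
print(f"pairwise approach: {n_pairwise_tests} tests, "
      f"about {expected_false_positives:.0f} 'significant' at p < .05 by chance alone")
```

One correlation between two well-measured composite scores replaces hundreds of noisy item-by-item tests, so there is no multiple-comparisons penalty to pay.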
My explanation of why this isn’t taught differs from Andrew’s. I think it’s pure Veblen: professors dislike appearing useful and like showing off. Statistics professors, like engineering professors, do less useful research than you might expect, so they are less aware than you might expect of how useful transformations are. And because most transformations don’t involve esoteric math, writing about them doesn’t let you show off.
In my experience, not transforming your data is at least as bad as throwing half of it away, in the sense that your tests will be that much less sensitive.
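One way to see what a skipped transformation costs is a small simulation. This sketch assumes right-skewed (log-normal) data, 20 subjects per group, and a true difference of 0.5 on the log scale, and compares how often a two-sample Student t test detects the difference on the raw versus the log-transformed values. The setup and the helper names (`pooled_t`, `rejection_rates`) are hypothetical; the sketch illustrates the loss of sensitivity for one kind of data rather than establishing the "half of it away" figure.

```python
import math
import random

N_PER_GROUP = 20
T_CRIT = 2.024  # two-sided 5% critical value of t with 2*20 - 2 = 38 df

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def rejection_rates(n_sims=2000, shift=0.5, sigma=1.0, seed=1):
    """Share of simulated experiments reaching p < .05 on raw vs. log data."""
    rng = random.Random(seed)
    hits_raw = hits_log = 0
    for _ in range(n_sims):
        # Assumed log-normal data; group b is shifted up on the log scale
        a = [rng.lognormvariate(0.0, sigma) for _ in range(N_PER_GROUP)]
        b = [rng.lognormvariate(shift, sigma) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT:
            hits_raw += 1
        log_a = [math.log(x) for x in a]
        log_b = [math.log(x) for x in b]
        if abs(pooled_t(log_a, log_b)) > T_CRIT:
            hits_log += 1
    return hits_raw / n_sims, hits_log / n_sims

power_raw, power_log = rejection_rates()
print(f"detected on raw data: {power_raw:.2f}   on log data: {power_log:.2f}")
```

Comparing the two detection rates, or asking how many more subjects the raw-data test would need to match the log-data rate, gives a rough sense of how much data an untransformed analysis wastes in a given setting.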
And speaking of statistics, here’s an interesting debunking of John Gottman’s research into marriage & divorce:
https://www.slate.com/id/2246732/
Thanks for the links to the articles. I’m about to run a fairly large test battery using a number of different types of measure (accuracy, RT, differences in RT) and different tests of related abilities, so I’ll be needing to think hard about both transformations and principal components in the weeks to come.
In your experience, does using transformations and PCs make reviewers skittish? I could easily imagine people wondering why you transformed the data (cf. “less aware than you might expect of how useful transformations are”), or being disinclined to believe the results of a statistical test that wasn’t significant on the raw data.
Matt,
In my experience, about 10-20% of reviewers are bothered by transformations. I simply explain to the editor how important and how widely accepted transformations are. I haven’t had a problem.
https://www.theonion.com/articles/report-14-trillion-spent-annually-on-trying-to-loo,17125/