Interview with Andy Maul about Test Development (part 2)

4. You write “There are multiple problems with the validity of existing EI tests that make them difficult to interpret, and make claims based on them highly suspect.” What are the main problems?

Many early tests designed to measure emotional intelligence, and some still in use today (including one developed in part by the journalist Daniel Goleman, who popularized the term emotional intelligence in his 1995 book), used self-report methods and treated the construct as a conglomerate of various desirable personality and motivational factors, such as optimism, conscientiousness, happiness, and friendliness. Tests of this nature may be interesting and may predict important outcomes, but emotional intelligence, defined in this way, is really just a repackaging of old ideas. These tests are so highly correlated with traditional measures of personality as to be operationally indistinct from them. Additionally, calling this construct emotional *intelligence* is suspect: personality and intelligence are generally regarded as very different things, and assessing intelligence through self-report is generally considered inadequate.

The MEIS and the MSCEIT, to which I referred earlier, are, to my knowledge, the only two currently published tests that assess emotional intelligence as an intelligence. These tests ask respondents to engage in a variety of tasks, such as looking at pictures of people’s faces and reading stories about human interactions, and then make judgments about the emotional content of those stimuli. These tests are a step in the right direction, but have their own problems.

The test developers have a rather odd way of scoring the responses people give to the stimuli on the tests. The tests were administered to a large (N=2000+) standardization sample, and the score a respondent now receives on an item is the proportion of people in that standardization sample who chose the same alternative. In other words, if you select choice “c” on an item, and 67% of people from the standardization sample also chose “c”, then you get .67 for that item. If you select “d”, and only 11% of people chose “d”, then you get only .11 for that item. Your total score is simply the sum of these weighted scores across items.
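To make the scoring rule concrete, here is a minimal sketch in Python. The standardization data and respondent answers are made up for illustration; the actual MSCEIT scoring tables are proprietary and are not reproduced here.

```python
from collections import Counter

def consensus_weights(standardization_responses):
    """For each item, map each answer choice to the proportion of the
    standardization sample that selected it."""
    weights = []
    for item_responses in standardization_responses:
        counts = Counter(item_responses)
        n = len(item_responses)
        weights.append({choice: count / n for choice, count in counts.items()})
    return weights

def score_respondent(answers, weights):
    """A respondent's total score is the sum of the consensus weights
    of the choices he or she selected."""
    return sum(weights[i].get(choice, 0.0) for i, choice in enumerate(answers))

# Toy example: two items, with a 'standardization sample' of ten people each.
standardization = [
    list("ccccccccdd"),   # 80% chose 'c', 20% chose 'd' on item 1
    list("aaabbbbbbb"),   # 30% chose 'a', 70% chose 'b' on item 2
]
weights = consensus_weights(standardization)
print(score_respondent(["c", "b"], weights))  # 0.8 + 0.7 = 1.5 (majority answers)
print(score_respondent(["d", "a"], weights))  # 0.2 + 0.3 = 0.5 (minority answers)
```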

As odd as this method of scoring may sound, it has been used in other situations where the underlying theory is not well understood (see Legree, below, for an exposition of this). However, it presents difficulties here: it defines correctness as, essentially, conformity of opinion with the standardization sample. In other words, what is actually being measured may not be “intelligence”, but rather, simply, normality or popularity of opinion: the highest-scoring respondents will simply be those who most consistently choose the responses most other people also select. Additionally, this prohibits the existence of items so difficult that most people get them wrong, such as a very subtle facial expression that only the most emotionally astute could correctly
parse: if there were any such items, the astute minority would be penalized for choosing the less-popular but more-correct alternative. This is a serious challenge to the construct validity of the test.

Additionally, the internal structure of the test itself is suspect. The test developers posit a four-factor model of emotional intelligence (the four factors being the ability to perceive emotions, the ability to allow emotions to facilitate thought, understanding emotions, and managing emotions) and have branches of the tests designed to measure all four of those factors. They have published confirmatory factor analyses that they claim support their theory; however, re-analyses of their tests, and new analyses (including one that I am conducting now) have not been able to replicate their results, calling into question the internal validity and reliability of the tests.
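As a rough illustration of how such a confirmatory factor analysis can be set up, here is a minimal sketch using the semopy package in Python. The four latent factors follow the model described above, but the data file and column names are hypothetical placeholders, not the actual MSCEIT data or the re-analysis described here.

```python
# Minimal CFA sketch with semopy (hypothetical task-score columns).
import pandas as pd
import semopy

# Four-factor model: each latent factor measured by two task scores.
model_desc = """
Perceiving    =~ faces + pictures
Facilitating  =~ sensations + facilitation
Understanding =~ blends + changes
Managing      =~ emotion_management + emotional_relations
"""

data = pd.read_csv("msceit_task_scores.csv")  # hypothetical file of task-level scores

model = semopy.Model(model_desc)
model.fit(data)

print(model.inspect())           # factor loadings and covariances
print(semopy.calc_stats(model))  # fit indices such as CFI and RMSEA
```

Whether the four-factor structure is supported then comes down to the fit indices and loadings such an analysis produces, which is exactly where the published and re-analyzed results diverge.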

In my dissertation, which I can make available early next spring, I discuss all these points in greater detail.

Part 1.

References

Legree, P. J., Psotka, J., Tremble, T., & Bourne, D. R. (2005). Using consensus based measurement to assess emotional intelligence. In R. Schulze & R. D. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 155–179). Cambridge, MA: Hogrefe & Huber.

MacCann, C., Matthews, G., Zeidner, M., & Roberts, R. D. (2003). Psychological assessment of emotional intelligence: A review of self-report and performance-based testing. International Journal of Organizational Analysis, 11, 247-274.

Mayer, J. D., Salovey, P., & Caruso, D. R. (2004). Emotional intelligence: Theory, findings, and implications. Psychological Inquiry, 15, 197-215.

Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2001). Emotional intelligence as a standard intelligence. Emotion, 1, 232-242.

McCrae, R. R. (2000). Emotional intelligence from the perspective of the five-factor model of personality. In R. Bar-On & J. D. A. Parker (Eds.), Handbook of emotional intelligence (pp. 92-117). San Francisco, CA: Jossey-Bass.

O’Sullivan, M. (2005). Trolling for trout, trawling for tuna: The methodological morass in measuring emotional intelligence. In press.

Roberts, R., Schulze, R., O’Brien, K., Reid, J., MacCann, C., & Maul, A. (2006). Exploring the validity of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) with established emotions measures. Emotion, 6(4), 663-669.

Roberts, R. D., Zeidner, M., & Matthews, G. (2001). Does emotional intelligence meet traditional standards for an intelligence? Some new data and conclusions. Emotion, 1, 196-231.

Interview with Andy Maul about Test Development (part 1)

Andy Maul, who took introductory psychology with me, is a graduate student in Educational Psychology at UC Berkeley.

1. What is your research about?

I’m taking a closer look at tests recently developed to measure the construct of emotional intelligence (EI). In particular, I’m looking at the Multifactor Emotional Intelligence Scale (MEIS) and the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), which were both developed in the past decade and evaluated using traditional methods (confirmatory factor analysis [CFA] and classical test statistics such as alpha coefficients, along with correlations with other tests and hypothesized outcomes). I’m looking at these tests again, both through the traditional lens of CFA and through the newer lens of Item Response Theory (IRT). In the end, I hope to make points both for the development of EI tests, and for psychological measurement in general, by highlighting how newer methods can improve the construct- and test-building process.
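As a small illustration of the classical test statistics mentioned above, here is a sketch in Python that computes Cronbach’s alpha coefficient from a made-up matrix of item scores (the data are invented purely to show the calculation):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up scores for 5 respondents on 4 items.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(scores), 3))
```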

2. How did you get interested in this line of research?

I became interested in emotions by working with Professor Dacher Keltner. At some point in graduate school my interests shifted to the more quantitative side of research, and I’ve since been working with Professor Mark Wilson on test theory and statistical measurement. I thought combining the two interests, by evaluating tests of emotional intelligence through a quantitative lens, would be a good idea.

3. What’s an example of research that shows the value of measuring emotional intelligence?

The MSCEIT appears to predict some life outcomes (such as grades, prosocial behavior, and self-reported life satisfaction), even controlling for IQ and personality. Other researchers have challenged these claims as being premature and based on insufficient evidence. There are multiple problems with the validity of existing EI tests that make them difficult to interpret, and make claims based on them highly suspect.

Some researchers feel that defining and measuring emotional intelligence could clarify and expand our definitions of intelligence and cognitive abilities in general, and provide information about an area of human functioning that could predict important personal and interpersonal outcomes (such as life satisfaction and the quality of one’s relationships) above and beyond traditionally measured intelligence and personality. In today’s era of high-stakes testing, with so much riding on what many feel to be tests with limited utility, a new, well-validated test of emotional intelligence could provide insight into what makes students successful in schools and in life.

References

Mayer, J., Salovey, P., & Caruso, D. (2002). Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT): User’s manual. Toronto, Canada: Multi-Health Systems.

Mayer, J., Salovey, P., Caruso, D., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3, 97-105.

O’Sullivan, M. (2005). Trolling for trout, trawling for tuna: The methodological morass in measuring emotional intelligence. In press.

Palmer, B., Gignac, G., Manocha, R., & Stough, C. (2005). A psychometric evaluation of the Mayer-Salovey-Caruso Emotional Intelligence Test Version 2.0. Intelligence, 33, 285-305.

Roberts, R. D., Schulze, R., Zeidner, M., & Matthews, G. (2005). Understanding, measuring, and applying emotional intelligence: What have we learned? What have we missed? In R. Schulze & R. D. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 311-341). Cambridge, MA: Hogrefe & Huber.

A Story About Data

While introducing Justin Wolfers as guest blogger at Marginal Revolution — which I am greatly looking forward to, since Wolfers is an excellent data analyst — Alex Tabarrok wrote:

An open secret and an open sin in economics is that many empirical studies are difficult to replicate, even when journals supposedly require authors to make their data publicly available.

Which reminds me. Several months ago, I read an article in a psychology journal about a topic I care a lot about. The conclusions of the article were the opposite of what I think is the case. Was I wrong? Possibly — but the data analysis done in the article was unquestionably “wrong” in the sense that (a) it assumed something that was unlikely to be true and (b) it was possible to do a data analysis that didn’t make that unlikely assumption. I don’t think my opinion here is controversial; I think a blunt but fair summing up of the situation is that the authors made a big mistake.

I was in New Orleans a few weeks after the article appeared. Someone in an art gallery told me the conclusion of the paper! Which is only to say it is a really interesting conclusion. Anyway, I wrote to the first author of the paper (a graduate student) to explain my concern about their conclusions and to ask for the data, so I could do a better analysis. Two weeks went by, no answer. I sent a reminder email, and got this answer:

We typically do not give out our original data, but when I get a chance, I will run the analyses in HLM and get the results back to you. Thanks for your interest in the study,

Wow! It is the policy of the journal in which the paper was published that the data be made available. A month passed. When do you expect to run these analyses? I wrote. A month passed with no answer. I wrote to the faculty member who was a co-author on the paper. Finally I got an answer from the student:

I have been meaning to respond to your email & I apologize for not getting back to you sooner. I am a graduate student and am traveling for the summer. I understand the difficulty with the [blank] situation and am assuming that HLM would be a good way to work through that. However, I am not familiar with the procedure, so it will not be until late August/ early September when I can get a statistician here at [blank] to teach me the procedure. If you have specific suggestions about the analyses, please let me know and I will keep that in mind when I get a chance to work with it. We should have some follow-up data coming in as well so it will be good to learn the procedures for future research. Thanks for your interest in the study.

The story so far is uncomfortably close to what happened when Saul Sternberg and I questioned Ranjit Chandra‘s data. Similarity 1: He never provided the data. Similarity 2: It took a remarkably long time and several emails to get any response. Similarity 3: The response, when it finally came, was only vaguely reassuring. However, in this case, I predict the better analysis will actually be done. Which is good — I would rather someone else do it.

Interesting Idea about Addiction

From addiction and self-experimentation:

I am coming to believe that [my] addiction may be caused by a specific kind of autism-related syndrome. I don’t crave order in everything that I do but I do crave order and structure in order for me to relate to others. I need to figure out some ways to get that structured social interaction that my brain requires. . . A 12-step meeting could be [seen] as just a highly structured social event.

A friend of mine became an Orthodox Jew in college; his parents were not very religious. Now and then I went to his house for Shabbat. As I got to know him better — outside the religious rituals — I was astonished at the difficulty he had carrying on a conversation. The many structures (rituals) of Orthodox Judaism made it much easier for him to spend time with other people.

The Wikipedia Wars

Speaking of Wikipedia, the LA Times has an interesting article today about what happened when Jimmy Wales — the founder — posted a one-sentence article about a butcher shop on the outskirts of Cape Town. It was deleted quickly — not important enough — but then a big debate ensued. The Times piece turned to the bigger issue:

Perhaps the granddaddy of all the Wikipedia debates is the question of which information deserves to be included, and which doesn’t. So-called Inclusionists believe that because Wikipedia is not bound by the same physical limits as a paper encyclopedia, it shouldn’t have the same conceptual limits either. If there’s room for an article on unreleased Kylie Minogue singles — and a group of people who might find it useful — why not include it? Deletionists, meanwhile, believe that because not all articles are created equal, judicious pruning increases the overall quality of Wikipedia’s information and strengthens its reputation. An encyclopedia, they say, is not just a dumping ground for facts.

While the people who run craigslist try hard to figure out what users want and how to give it to them — starting with the assumption that they themselves do not know — the people who run Wikipedia play God, at least by comparison. In this debate, both sides are playing God. As Aaron Swartz said, it isn’t wise. Jane Jacobs tells a story about a Pennsylvania Girl Scout troop. They were snobs; they made it hard for new members to join (the Wikipedian attitude that Aaron criticized). The girls who couldn’t get in formed their own troop. Several years later the new troop was thriving; the old troop was dying.

Aaron Swartz on What’s Wrong with Wikipedia

I recently asked Aaron Swartz, who has written about Wikipedia and run for its board of directors, what he thought was wrong with it. Three big things, he said:

1. Failure to value new contributors. A small number of insiders are dismissive of newcomers who contribute and treat them poorly. For example, their contributions are deleted without explanation. The insiders see the newcomers as a source of trouble rather than strength.

2. Disorganized and underfunded. It took someone Aaron knows two years to make a deal with Wikipedia. The finances are in bad shape.

3. Lack of vision. Wikipedia could be improved in many ways but actual improvements are rare.

He used to see Wikipedia as just a wonderful thing, he said; now he sees it as a wonderful thing that is falling way short of what it could be.

You seem to be saying someone could come along and start a better open-source encyclopedia, I said. That’s unlikely, he said; Wikipedia is so big.

Who does it better? A similar but vastly better-run website is craigslist, he said. A chart of page view rank and number of employees shows Yahoo at #1 with 10,000 employees, TimeWarner at #2 with 90,000, Google at #3 with 10,000, and so on. Craigslist is #7 with 23 employees.

Addendum: Wikipedia, with very few employees, would of course also rank very high on such a chart; this is the magic of both Wikipedia and craigslist and why it makes sense to compare them. The craigslist link I gave, to a Wall Street Journal article, suggests that craigslist values contributors much more than Wikipedia. Here is what happened at a Wikipedia board of directors meeting that Aaron attended a few years ago:

One presentation was by a usability expert who told us about a study done on how hard people found it to add a photo to a Wikipedia page. The discussion after the presentation turned into a debate over whether Wikipedia should be easy to use. Some suggested that confused users should just add their contributions in the wrong way and more experienced users would come along to clean their contributions up. Others questioned whether confused users should be allowed to edit the site at all — were their contributions even valuable?

How Much Water Should You Drink?

According to this persuasive non-embeddable video — from a BBC series called The Truth About Food — the answer is don’t worry about it.

They compare two twins. One drinks 2 liters of water/day, the other doesn’t drink any water. Not self-experimentation, but close.

I did an experiment in which I drank 5 liters of water/day. I lost a few pounds, not nearly worth the trouble. There was one surprise: Flavors intensified. Every strawberry was the best-tasting strawberry I’d ever had.

Can Professors Say the Truth? (letter from Willow Arune)

Willow Arune, a retired lawyer who has been one of Michael Bailey’s supporters, sent me this email:

Hi Seth,

I have found your exchanges with Deirdre McCloskey rather amusing.

I am one of those transsexual women who supported Bailey. I did so publicly and as a result was subjected to the lies, half-truths and innuendo of Andrea James and Lynn Conway. Even now, the slander is still on both of their web pages. Along the way, I noted that Dr. McCloskey had announced that she would sue Bailey if he dared to suggest that she was “one of those”, so I did it for him. Frankly, her autobiography does that to her as well, although she does not use the term.

I invited Dr. McCloskey to sue me. I even wrote her lawyer providing an address for service. For one week, I wrote her daily asking her to please, please sue me. Years later, she has still not done so. A shame really as I had lined up a wonderful cast of potential witnesses to provide expert testimony. Truth is always a defence to such silly actions.

Dr. McCloskey has hidden behind the more overt actions of Andrea James and Lynn Conway. Yet she was one with them, an equal participant in the vile and ugly attacks made not only on Bailey but also on other transsexual women who dared to support him. As Dr. Dreger points out, many would not allow their names to be used for fear of attracting attacks from McCloskey’s crew. I also received many letters of support from transsexual women who agreed with Bailey or, at the least, thought the actions of Conway, James and McCloskey were repugnant. None would dare have their comments public for fear of being subjected to the same attacks that had been made against other transsexual women and myself.

Let me give you one example of those actions that McCloskey supported, those actions she says do not cause her shame.

Firstly, I am a rape survivor. Andrea James was well aware of this as we had continued a “back channel” correspondence well after Bailey’s book was published. During 2003, on a public newsgroup, an anonymous writer posted a vile accusation that I was a “registered sex offender”. Not true then or now. Then, on December 24th of that year, I received a post from Andrea James asking me to “confirm or deny” that I was a registered sex offender. In the same post, she threatened to send out “investigators” to look into my past. She justified this action by the broad premise that her end justified any means and that those of us who supported Bailey must have ugly reasons to do so in our past. She would discover those and expose us.

Her web page on me followed, as did another screed from Lynn Conway. Lies, half-truths and innuendo.

This tactic – the no-name post to an e-group or newsgroup – was repeated in the case of the Transkids. It started with a further anonymous post, this one to an e-group on Calpernia Addams’s web site. I first heard of it on an e-group for UK transsexuals and complained to the moderator. In time, thanks to confirmation from other transgendered people, Christine Burns issued a formal apology for spreading the lies, the day after she was awarded an MBE. She had, she stated, relied upon a “usually reliable source” (Andrea James).

The tactic is straight from the McCarthy days. Spread an unfounded accusation and repeat it often so that some will believe it is true. As the writer Patricia Cornwell has recently shown, even one with many financial resources cannot control a slander on the Web.

Ms. James attacked several transsexual women who dared to either support Bailey or Blanchard, or even those who simply wanted to turn down the heat. Each was (and remains) subjected to a web page on Andrea’s site. No wonder few were willing to step out of the trenches. Most transsexual women simply wish to get on with life. They do not wish to be vilified – and outed – on a web page available to anyone with a computer.

If Dr. McCloskey is not ashamed of this type of tactic, as she states, she should be. Instead, she continues to attack you and anyone who dares to express even the slightest question about Blanchard’s theory or the means used by Conway and James to attack Bailey and anyone else who crosses their sights.

It is part of this nasty group to ignore the theory and go personal. In the years since Bailey’s book has been published, I have had few conversations or exchanges about “the theory”. The hate mail that arrives in my mailbox always quotes some of the accusations made by James or Conway about me as a person. James’ screed is copied and posted to some newsgroups on a regular basis by her supporters. Nor, after all this time, have I met any transsexual who has directly suffered as a result of Bailey’s book – and I have asked repeatedly for one to come forward.

The book was published several years ago; I have certainly moved on. I stood up for Dr. Bailey’s right to publish and against the vile and arrogant tactics of Conway, James and McCloskey. I am glad that Dr. Dreger had the courage to expose the facts concerning this matter. As both James and Conway see fit to retain their personal attacks on me on their respective web pages, I can now point to Dr. Dreger’s article as some vindication, certainly as an explanation. In these days when potential employers even check e-groups and such regarding potential employees, slander of the type employed by Conway, James and McCloskey against other transsexual women can have dramatic effect. A dispute over a theory is not a reason to slander a person in the manner employed by these zealots.

Willow Arune

Her blog.

Janet Malcolm on Email

Janet Malcolm is the most divisive (within me) writer I have encountered. I loved The Journalist and the Murderer. A journalistic masterpiece (except for the opening sentence about all journalists being con artists). I wrote her a fan letter about it. I hated In the Freud Archives, her hit piece about Jeffrey Masson. This review of a book about how to write email is not very good, alas. Too obvious. How far the gifted have fallen.

Jeffrey Masson used to live in Berkeley. I visited him while writing an article for Spy about his lawsuit against Malcolm (it never ran). While I was there, he got a phone call from Joe McGinniss, the “journalist” of The Journalist and the Murderer.