Assorted Links

Thanks to Greg Pomerantz and Casey Manion.

“Brain Games are Bogus”: More Trouble for Posit Science

A post on the New Yorker website called “Brain Games are Bogus” provides considerable evidence for that conclusion. The evidence concerns the use of brain games to raise the IQ of children and young adults, whereas Posit Science’s training program — which I raised questions about — is aimed at older people. However, it would be surprising if brain games had no effect until a certain age. More plausible is that they never provide substantial benefits — at least, benefits broad enough, strong enough, and long-lasting enough to be worth the training time (one hour/day for many weeks).

I read a Posit Science paper, with older subjects, that seemed to me to show that its training had little lasting benefit. The stated conclusions of the paper were more positive. Too bad the head of Posit Science didn’t answer most of my questions.

Thanks to Alex Chernavsky.

 

Posit Science: Does It Work? (Continued)

In an earlier post I asked 15 questions about Zelinski et al. (2011) (“Improvement in memory with plasticity-based adaptive cognitive training: results of the 3-month follow-up”), a study done to measure the efficacy of the brain training sold by Posit Science. The study asked if the effects of training were detectable three months after it stopped. Henry Mahncke, the head of Posit Science, recently sent me answers to a few of my questions.

Most of my questions he declined to answer, saying they contained “innuendo”. My questions were ordinary tough (“critical”) questions; their negative slant was not at all hidden (in contrast to innuendo). For the questions he didn’t answer, he substituted less critical questions. I give a few examples below. Unwillingness to answer tough questions about a study raises doubts about the study.

His answers raised more doubts. From his answer to Question 7, I learned that although the investigators gave their subjects the whole RBANS, (a) they failed to report the results from the visual subtests and (b) these unreported results did not support their conclusions. Mahncke says this result was not reported “due to lack of publication space.” The original paper did not say that some results were omitted due to lack of space. I assume all favorable results were reported. To report all favorable results but omit some unfavorable results is misleading.

To further explain the omission, Mahncke says

We used the auditory measures as the primary outcome measure because we hypothesized that cognitive domains [by “cognitive domains” he means the cognitive gains due to training — Seth] would be restricted to the trained sensory domain, in this case the auditory system. [emphasis added]

He doesn’t say he believed the gains would be greater with auditory stimuli; he says he believed they would be restricted to auditory stimuli. The Posit Science website says their training increases “memory”, “intelligence”, “focus” and “thinking speed”. None of these are restricted to the auditory system — far from it. Unless I am misunderstanding something, the head of Posit Science doesn’t believe the main claims of the Posit Science website.

Why Mahncke fails to see a difference between methods (Question 13) and results (Question 14), fails to see a difference between methods (Question 11) and discussion (Question 15), and gives a one-word answer (“yes”) to Question 12, I cannot say. In each case, however, he errs on the side of not answering.

My overall conclusion is that this study does not support Posit Science claims. The main measure (RBANS auditory subtests) didn’t show significant retention. A closely related set of measures (RBANS visual subtests) didn’t show significant retention. A third set of measures (“secondary composite measure”) did show retention, but the p value was not corrected for multiple tests. When the p value is corrected for multiple tests, the secondary composite measure may not show significant retention. Because of the large number of subjects (more than 500), repeated failure to find significant retention under presumably near-optimal conditions (e.g., 1 hour/day of training) suggests that the training effect, after three months without training, is small or zero.
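To make the multiple-comparisons point concrete, here is a minimal sketch in Python. The p values are hypothetical placeholders, not numbers from the paper (which reports no corrected values); the point is only that a result significant at the 0.05 level can stop being significant once a Holm or Bonferroni correction is applied.

def holm_correction(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p value to alpha / (m - k)."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m = len(pvals)
    significant = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # once one test fails, every larger p value fails too
    return significant

# Hypothetical p values: primary measure, visual composite, secondary composite.
pvals = [0.09, 0.20, 0.02]
print([p <= 0.05 for p in pvals])               # uncorrected: only the 0.02 passes
print(holm_correction(pvals))                   # Holm: nothing passes (0.02 > 0.05/3)
print([p <= 0.05 / len(pvals) for p in pvals])  # Bonferroni reaches the same conclusion

If the secondary composite measure’s actual p value were anywhere near 0.05 before correction, either correction would leave it comfortably non-significant.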

I assume that Posit Science sponsored this study because they believed it was unrealistic for subjects to spend 1 hour/day on the training for the rest of their lives. One hour/day was realistic for a while, yes, but not forever. So subjects will stop. The question was: will the gains last? Apparently the answer is no.

If Mahncke has any response to this, I will post it.

This is another illustration of why personal science (science done for your own benefit, rather than as a job) is important. Professional scientists are under pressure to get certain results. This study is an example. Mahncke was a co-author. Someone employed by Posit Science is under pressure to get results that benefit Posit Science. (I am not saying Mahncke was affected by this pressure.) A personal scientist is not under pressure to get certain results. For example, if I study the effect of tetracycline (an antibiotic) on my acne, I simply want to know if it helps. Both possible answers (yes and no) are equally acceptable. We may need personal scientists to get unbiased answers.

Here are my original questions along with Mahncke’s answer or lack of answer.

1. Isn’t it correct that after three months there was no longer reliable improvement due to training according to the main measure that was chosen by you (the investigators) in advance? If so, shouldn’t that have been the main conclusion (e.g., in the abstract and final paragraph)?

Not answered.

[Seth: Here is Mahncke’s substitute question: “Why do you conclude that ‘Training effects were maintained but waned over the 3-month no-contact period’ given that the ‘previously significant improvements became non-significant at the 3-month follow-up for the primary outcome’?”]

2. The training is barely described. The entire description is this: “a brain plasticity-based computer program designed to improve the speed and accuracy of auditory information processing and to engage neuromodulatory systems.” To learn more, readers are referred to a paper that is not easily available — in particular, I could not find it on the Posit Science website. Because the training is so briefly described, I was unable to judge how much the outcome tests differ from the training tasks. This made it impossible for me to judge how much the training generalizes to other tasks — which is the whole point. Why wasn’t the training better described?

Not answered.

[Seth: Here is Mahncke’s substitute question: “Could you describe the training program in more depth, to help judge the similarity between the training exercises and the cognitive outcome measures?”]

3. What was the “ET [experimental treatment] processing speed exercise”?

The processing speed exercise is a time order judgment task in which two brief auditory frequency modulated sweeps are presented, either of which may sweep up or down in frequency. The subject must identify each sweep in the correct order (i.e., up/up, down/down, up/down, down/up). The inter-stimulus interval is adaptively manipulated to determine a threshold for reliable task performance. Note that this is not a reaction time task. The characteristics of the sweeps are chosen to match the frequency modulated sweeps common in stop consonant sounds (like /ba/ or /da/). Older listeners generally show strong correlations between processing speed, speech reception accuracy, and memory; which led us to the hypothesis that improving core processing speed in this way would contribute to improving memory. This approach is discussed extensively in “Brain plasticity and functional losses in the aged: scientific bases for a novel intervention” available at https://www.ncbi.nlm.nih.gov/pubmed/17046669

3. (continued) It sounds like a reaction-time task. People will get faster at any reaction-time task if given extensive practice on that task. How is such improvement relevant to daily life? If it is irrelevant, why is it given considerable attention (one of the paper’s four graphs)?

Not answered.
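[Seth: For readers unfamiliar with adaptive threshold procedures like the one described in the answer above, here is a minimal sketch in Python. Posit Science has not published its exact adaptive rule, so this assumes a standard 2-down/1-up staircase, and simulate_trial is a hypothetical stand-in for the real time-order task.]

import random

# Minimal sketch of a 2-down/1-up adaptive staircase, a standard way to adjust
# an inter-stimulus interval (ISI) until performance settles near a threshold
# (about 70.7% correct for this rule). The rule and simulate_trial() are
# assumptions for illustration; the actual Posit Science procedure is unpublished.

def simulate_trial(isi_ms, true_threshold_ms=60.0):
    """Pretend subject: more likely to answer correctly when the ISI is long."""
    p = 0.5 + 0.5 * min(1.0, max(0.0, (isi_ms - true_threshold_ms) / true_threshold_ms + 0.5))
    return random.random() < p

def run_staircase(start_isi=500.0, step=0.8, n_trials=80):
    isi, streak, reversals, last_direction = start_isi, 0, [], None
    for _ in range(n_trials):
        if simulate_trial(isi):
            streak += 1
            if streak < 2:
                continue               # need two correct in a row before making it harder
            streak, direction = 0, "down"
            isi *= step                # shorter ISI = harder
        else:
            streak, direction = 0, "up"
            isi /= step                # longer ISI = easier
        if last_direction and direction != last_direction:
            reversals.append(isi)      # direction changes are used to estimate threshold
        last_direction = direction
    tail = reversals[-6:] or [isi]     # average the last few reversals
    return sum(tail) / len(tail)

print(f"estimated ISI threshold: {run_staircase():.0f} ms")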

4. According to Table 2, the CSRQ (Cognitive Self-Report Questionnaire) questions showed no significant improvement in trainees’ perceptions of their own daily cognitive functioning, although the p value was close to 0.05. Given the large sample size (~500), this failure to find significant improvement suggests the self-report improvements were small or zero. Why wasn’t this discussed? Is the amount of improvement suggested by Posit Science’s marketing consistent with these results?

Not answered.
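[Seth: To see why failing to reach significance with ~500 subjects is informative, here is a back-of-the-envelope sketch of the smallest standardized effect such a study could reliably detect. The ~250-per-group split, alpha = 0.05, and 80% power are my assumptions; the study’s exact split may differ.]

from math import sqrt
from statistics import NormalDist

# Back-of-the-envelope: the smallest standardized effect (Cohen's d) a two-group
# comparison can reliably detect, via d ~ (z_alpha/2 + z_power) * sqrt(2/n).
# The ~250-per-group split is an assumption, not a number from the paper.

def minimal_detectable_d(n_per_group, alpha=0.05, power=0.80):
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

print(f"d = {minimal_detectable_d(250):.2f}")  # about 0.25, conventionally a small effect

In other words, a null result at this sample size suggests any true self-report improvement was smaller than about a quarter of a standard deviation.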

5. Is it possible that the improvement subjects experienced was due to the acquisition of strategies for dealing with rapidly presented auditory material, and especially for focusing on the literal words (rather than on their meaning, as may be the usual approach taken in daily life)? If so, is it possible that the skills being improved have little value in daily life, explaining the lack of effect on the CSRQ?

Not answered.

6. In the Methods section, you write “In the a priori data analysis plan for the IMPACT Study, it was hypothesized that the tests constituting the secondary outcome measure would be more sensitive than the RBANS given their larger raw score ranges and sensitivity to cognitive aging effects.” Do the initial post-training tests (measurements of the training effect soon after training ended) support this hypothesis? Why aren’t the initial post-training results described so that readers can see for themselves if this hypothesis is plausible? If you thought the “secondary outcome measure would be more sensitive than the RBANS” why wasn’t the secondary outcome measure the primary measure?

In a large-scale clinical trial such as IMPACT, it is considered best practice to pick as the primary outcome measure a measure that has been employed in earlier studies. We had used the RBANS in two previous studies (references 8 and 17 in the paper). While we had seen significant results in both studies, it was also clear from those studies that the RBANS had ceiling effects in cognitively intact populations that would limit the statistical sensitivity of the measure. For example, the RBANS list recall measure had 10 words, and a reasonable portion of participants get all 10 correct at baseline, leaving no room for improvement regardless of the efficacy of the intervention. Given that observation, we added measures to the IMPACT study that we hypothesized would be more sensitive. For example, the RAVLT has 15 words, leaving more room for improvement and fewer ceiling effects. [It is unclear that more words = more sensitivity. It depends on the words — Seth] However, since we had not used those measures in previous studies, we decided to define these new measures as secondary outcome measures in the data analysis plan. This issue is discussed in depth in the methods section of the main training effect paper (reference 6), and of course that’s where all of the initial post-training results you mention are described. This improved sensitivity of the secondary outcome measures was quite evident in the post-training data; however for reasons of publication length we did not discuss it in that paper. The comparative data would make an interesting publication, and one that might be helpful to other researchers in this field.

7. The primary outcome measure was some of the RBANS (Repeatable Battery for the Assessment of Neuropsychological Status). Did subjects take the whole RBANS or only part of it? If they took the whole RBANS, what were the results with the rest of the RBANS (the subtests not included in the primary outcome measure)?

Participants took the entire RBANS. We used the auditory measures as the primary outcome measure because we hypothesized that cognitive domains [by “domains” he means “gains” — Seth] would be restricted to the trained sensory domain, in this case the auditory system. Interestingly, there was a significant effect on the overall RBANS measure, however there was no significant effect on a composite of the RBANS visual measures. This interesting result was not included in our papers for reasons of publication length.

[Seth: As I said earlier, a surprising answer.]

8. The data analysis refers to a “secondary composite measure”. Why that particular composite and not any of the many other possible composite measures? Were other secondary composite measures considered? If so, were p values corrected for this?

The measures used were the Rey Auditory Verbal Learning Test total score (sum of trials 1–5) and word list delayed recall, Rivermead Behavioral Memory Test immediate and delayed recall, and Wechsler Memory Scale letter-number sequencing and digit span backwards tests. These measures were chosen a priori as more sensitive than their RBANS cognate measures, and a priori we conservatively chose to integrate all 6 into a single composite measure. Individual test scores are all shown in table 2. This issue is discussed in depth in the methods section of the main training effect paper (reference 6). It’s straightforward to evaluate what the effects shown on other potential composites would be simply from inspecting the individual test data in table 2. In the methods section of the main training effect paper (reference 6), we discuss our approach to multiple comparisons, where we state “A single primary outcome measure (RBANS Memory/ Attention) was predefined to conserve an overall alpha level of 0.05. No corrections for multiple comparisons were made on the secondary measures.” I can see that it would have been helpful to re-iterate that statement in the 2011 paper, and my apologies for the oversight.

[Seth: He doesn’t answer my question “were other secondary measures considered?”]

9. If Test A resembles training more closely than Test B, Test A should show more effect of training (at any retention interval) than Test B. In this case Test A = the RBANS auditory subtests and Test B = the secondary composite measure. In contrast to this prediction, you found that Test B showed a clearer training effect (in terms of p value) than Test A. Why wasn’t this anomaly discussed (beyond what was said in the Methods section)?

Not answered.

10. Were any tests given the subjects not described in this report? If there were other tests, why were their results not described?

All outcome measures performed in the study are reported in the publication.

[Seth: I have no idea how this answer is consistent with (a) the subjects took the visual subtests of the RBANS and (b) the paper fails to report the results of those tests (see answer to Question 7). The paper does not say that the subjects took the visual subtests of the RBANS.]

11. The secondary composite measure is composed of several memory tests and called “Overall Memory”. The Posit Science website says their training will not only help you “remember more” but also “think faster” and “focus better”. Why weren’t tests of thinking speed (different from the training tasks) and focus included in the assessment?

Not answered.

12. Do the results support the idea that the training causes trainees to “focus better”?

Yes.

[Seth: That’s his whole answer.]

13. The Posit Science homepage suggests that their training increases “intelligence”. Was intelligence measured in this study?

At the time we designed IMPACT, we were focused on establishing the effect of the training on memory, as the most common complaint of people with general cognitive difficulties. As IMPACT was in progress, Jaeggi et. al published their very interesting paper on the effect of N-back training on measures of intelligence, where they stated that improving working memory was likely to improve measures of intelligence. It would be quite interesting to repeat the IMPACT study with those or other measures of intelligence, given the improvements in working memory documented in IMPACT. The statement on the Posit Science web page relates to the Jaeggi et. al. paper, given that the Posit training program (BrainHQ) includes N-back training.

13 (continued). If not, why not?

Not answered.

[Seth: In Question 12, Mahncke failed to explain his answer about focus (“yes”) apparently because I left out “if yes, please explain how”. In this question, he dislikes my inclusion of “if not, why not?”]

14. Do the results support the idea that the training causes trainees to become more intelligent?

This question appears to be redundant with 13.

[Seth: Question 13 asked: Was intelligence measured? (A methods question.) This question asked: What about the results? Do they support claims about intelligence? (A results question.)]

15. The only test of thinking speed included in the assessment appears to be a reaction-time task that was part of the training. Are you saying that getting faster on one reaction-time task after lots of practice with that task shows that your training causes trainees to “think faster”?

This question appears to be redundant with 11.

[Seth: Question 11 was a methods question. This is a question about what the results mean — a discussion question. I still have no idea why Posit Science says their training causes trainees to “think faster” or why I should care that their subjects get faster on a laboratory task after lots of practice.]

Consistent- versus Inconsistent-Handed Predicts Better than Right- versus Left-Handed

At Berkeley, Andrew Gelman and I taught a freshman seminar about left-handedness. Half the students were left-handed. We did two fascinating studies with them, both of which found that left-handers tend to have left-handed friends. I kick myself for not publishing those results, which I bring up in conversation again and again.

After the class ended I got a call from a journalist who was writing an article about ridiculous classes. I told him the left-handedness class had value as a way of introducing methodological issues but all I cared about was that his article be accurate. He decided not to include our class in his examples.

Stephen Christman, who got his Ph.D. from Berkeley (and did quirky, interesting stuff even as a graduate student), and two colleagues have now published a paper that is a considerable step forward in the understanding of handedness. They argue that what really matters is not the direction of handedness but its consistency. The terms left-handed and right-handed hide a confound. Right-handers almost all have very consistent handedness (they do everything with the right hand). In contrast, left-handers much more often have inconsistent handedness: they do some things with the left hand, some with the right. I am a good example. I write with my right hand, bat and throw left-handed, play tennis left-handed, and play ping-pong right-handed. In fact, I am right-wristed and left-armed. When something involves wrist movement (writing, ping-pong), I use my right hand. When something involves arm movement (batting, throwing a ball, tennis), I use my left hand. Right-handers are much more similar to each other than left-handers are.

Christman and his co-authors point to two things: 1. When you can get enough subjects to unconfound the two variables, it turns out that consistency of handedness is what makes the difference. Consistent left-handers resemble consistent right-handers. 2. Consistency of handedness predicts many things. Inconsistent-handers are less authoritarian than consistent-handers. They show more of a placebo effect. They have better memory for paragraphs. And on and on — about 20 differences. It isn’t easy to say what all these differences have in common but maybe inconsistent-handers are more flexible in their beliefs. (Which would explain the friendship findings in our handedness class.)

I think about these differences as another example of how every economy needs diversity and our brains have been shaped to provide it, one idea underlying my theory of human evolution. Presidents of the United States are left-handed much more often than the general population. For example, Obama is left-handed. The difference between Presidents and everyone else is overwhelming and must mean something. Yet left-handers die younger. I would say that in any group of people you need a certain fraction, not necessarily large, to be open-minded and realistic. That describes inconsistent-handers (who are usually left-handed). These people make good leaders because they will respond to changing conditions. People who are not open-minded make good followers. Just as important as realism is cooperation, the ability to work together toward a common goal.

 

Posit Science: More Questions

Posit Science is a San Francisco company, started by Michael Merzenich (UCSF) and others, that sells access to brain-training exercises aimed at older adults. Their training program, they say, will make you “remember more”, “focus better”, and “think faster”. A friend recently sent me a 2011 paper (“Improvement in memory with plasticity-based adaptive cognitive training: results of the 3-month follow-up” by Elizabeth Zelinski and others, published in the Journal of the American Geriatrics Society) that describes a study about Posit Science training. The study asked if the improvements due to training are detectable three months after training stops. The training takes long enough (1 hour/day in the study) that you wouldn’t want to do it forever. The study appears to have been entirely funded by Posit Science.

I found the paper puzzling in several ways. I sent the corresponding author and the head of Posit Science a list of questions:

1. Isn’t it correct that after three months there was no longer reliable improvement due to training according to the main measure that was chosen by you (the investigators) in advance? If so, shouldn’t that have been the main conclusion (e.g., in the abstract and final paragraph)?

2. The training is barely described. The entire description is this: “a brain plasticity-based computer program designed to improve the speed and accuracy of auditory information processing and to engage neuromodulatory systems.” To learn more, readers are referred to a paper that is not easily available — in particular, I could not find it on the Posit Science website. Because the training is so briefly described, I was unable to judge how much the outcome tests differ from the training tasks. This made it impossible for me to judge how much the training generalizes to other tasks — which is the whole point. Why wasn’t the training better described?

3. What was the “ET [experimental treatment] processing speed exercise”? It sounds like a reaction-time task. People will get faster at any reaction-time task if given extensive practice on that task. How is such improvement relevant to daily life? If it is irrelevant, why is it given considerable attention (one of the paper’s four graphs)?

4. According to Table 2, the CSRQ (Cognitive Self-Report Questionnaire) questions showed no significant improvement in trainees’ perceptions of their own daily cognitive functioning, although the p value was close to 0.05. Given the large sample size (~500), this failure to find significant improvement suggests the self-report improvements were small or zero. Why wasn’t this discussed? Is the amount of improvement suggested by Posit Science’s marketing consistent with these results?

5. Is it possible that the improvement subjects experienced was due to the acquisition of strategies for dealing with rapidly presented auditory material, and especially for focusing on the literal words (rather than on their meaning, as may be the usual approach taken in daily life)? If so, is it possible that the skills being improved have little value in daily life, explaining the lack of effect on the CSRQ?

6. In the Methods section, you write “In the a priori data analysis plan for the IMPACT Study, it was hypothesized that the tests constituting the secondary outcome measure would be more sensitive than the RBANS given their larger raw score ranges and sensitivity to cognitive aging effects.” Do the initial post-training tests (measurements of the training effect soon after training ended) support this hypothesis? Why aren’t the initial post-training results described so that readers can see for themselves if this hypothesis is plausible? If you thought the “secondary outcome measure would be more sensitive than the RBANS” why wasn’t the secondary outcome measure the primary measure?

7. The primary outcome measure was some of the RBANS (Repeatable Battery for the Assessment of Neuropsychological Status). Did subjects take the whole RBANS or only part of it? If they took the whole RBANS, what were the results with the rest of the RBANS (the subtests not included in the primary outcome measure)?

8. The data analysis refers to a “secondary composite measure”. Why that particular composite and not any of the many other possible composite measures? Were other secondary composite measures considered? If so, were p values corrected for this?

9. If Test A resembles training more closely than Test B, Test A should show more effect of training (at any retention interval) than Test B. In this case Test A = the RBANS auditory subtests and Test B = the secondary composite measure. In contrast to this prediction, you found that Test B showed a clearer training effect (in terms of p value) than Test A. Why wasn’t this anomaly discussed (beyond what was said in the Methods section)?

10. Were any tests given the subjects not described in this report? If there were other tests, why were their results not described?

11. The secondary composite measure is composed of several memory tests and called “Overall Memory”. The Posit Science website says their training will not only help you “remember more” but also “think faster” and “focus better”. Why weren’t tests of thinking speed (different from the training tasks) and focus included in the assessment?

12. Do the results support the idea that the training causes trainees to “focus better”?

13. The Posit Science homepage suggests that their training increases “intelligence”. Was intelligence measured in this study? If not, why not?

14. Do the results support the idea that the training causes trainees to become more intelligent?

15. The only test of thinking speed included in the assessment appears to be a reaction-time task that was part of the training. Are you saying that getting faster on one reaction-time task after lots of practice with that task shows that your training causes trainees to “think faster”?

Update: Henry Mahncke, the head of Posit Science, said that he would be happy to answer these questions by phone. I replied that I was sure many people were curious about the answers and written answers would be much easier to share.

Further update: Mahncke replied that he would prefer a phone call and that some of the questions seemed to him hard to answer in writing. He said nothing about the sharing problem. I repeated my belief that many people are interested in the answers and that a phone call would be hard to share. I offered to rewrite any questions that seemed hard to answer in writing.

Earlier questions for Posit Science.

 

Assorted Links

Thanks to Paul Nash, Grace Liu and Anne Weiss.

Positive Psychology Talk by Martin Seligman at Tsinghua University

Here at Tsinghua University, the Second Annual Chinese International Conference on Positive Psychology has just begun. The first speaker was Martin Seligman, a professor at the University of Pennsylvania and former president of the American Psychological Association (the main professional group of American psychologists). Seligman is more responsible for the Positive Psychology movement than anyone else. Here are some things I liked and disliked about his talk.

Likes:

1. Countries, such as England, have started to measure well-being in big frequent surveys (e.g., 2000 people every month) and some politicians, such as David Cameron, have vowed to increase well-being as measured by these surveys. This is a vast improvement over trying to increase how much money people make. The more common and popular and publicized this assessment becomes — this went unsaid — the more powerful psychologists will become, at the expense of economists. Seligman showed a measure of well-being for several European countries. Denmark was highest, Portugal lowest. His next slide showed the overall result of the same survey for China: 11.83%. However, by then I had forgotten the numerical scores on the preceding graph so I couldn’t say where this score put China.

2. Work by Angela Duckworth, another Penn professor, shows that “GRIT” (which means something like perseverance) is a much better predictor of school success than IQ. This work was mentioned in only one slide so I can’t elaborate. I had already heard about this work from Paul Tough in a talk about his new book.

3. Teaching school children something about positive psychology (it was unclear what) raised their grades a bit.

Dislikes:

1. Three years ago, Seligman got $125 million from the US Army to reduce suicides, depression, etc. (At the birth of the positive psychology movement, Seligman proclaimed that psychologists spent too much time studying suicide, depression, etc.) I don’t mind the grant. What bothered me was a slide used to illustrate the results of an experiment. I couldn’t understand it. The experiment seems to have had two groups. The results from each group appeared to be on different graphs (making comparison difficult, of course).

2. Why does a measure of well-being not include health? This wasn’t explained.

3. Seligman said that a person’s level of happiness was “genetically determined” and therefore difficult or impossible to change. (He put his own happiness in “the bottom 50%”.) Good grief. I’ve blogged several times about how the fact that something is “genetically determined” doesn’t mean it cannot be profoundly changed by the environment. Quite a misunderstanding by an APA president and Penn professor.

4. He mentioned a few studies showing that optimism (or lack of it) is a risk factor for heart disease after you adjust for the traditional risk factors (smoking, exercise, etc.). There is a whole school of “social epidemiology”, at least 30 years old, that has shown the importance for heart disease of things like where you stand in the social hierarchy. Seligman appeared unaware of this. If you’re going to talk about heart disease epidemiology and claim to find new risk factors, at least know the basics.

5. Seligman said that China had “a good safety net.” People in China save a large fraction of their income at least partly because they are afraid of catastrophic medical costs. Poor people in China, when they get seriously sick, come to Beijing or Shanghai for treatment, perhaps because they don’t trust their local doctor (or the local doctor’s treatment failed). In Beijing or Shanghai, they are forced to pay enormous sums (e.g., half their life’s savings) for treatment. That’s the opposite of a good safety net.

6. Given the attention and resources and age of the Positive Psychology movement, the talk seemed short on new ways to make people better off. There was an experiment with school children where the main point appeared to be their grades improved a bit. A measure of how they treat each other also improved a bit. (Marilyn Watson, the wife of a Berkeley psychology professor, was doing a study about getting school kids to treat each other better long before the Positive Psychology movement.) There was an experiment with the U.S. Army I couldn’t understand. That’s it, in a 90-minute talk. At the beginning of his talk Seligman said he was going to tell us things “your grandmother didn’t know.” I can’t say he did that.

 

Posit Science: Does It Help?

Tim Lundeen pointed me to the website of Posit Science, which sells ($10/month) access to a bunch of exercises that supposedly improve various brain functions, such as memory, attention, and navigation. I first encountered Posit Science at a booth at a convention for psychologists about five years ago. They had reprints available. I looked at a study published in the Proceedings of the National Academy of Sciences. I was surprised at how weak the evidence was that their exercises helped.

Maybe the evidence has improved. Under the heading “world class science” the Posit Science website emphasizes a few of the 20-odd published studies. First on their list of “peer-reviewed research” is “the IMPACT study”, which has its own web page.

With 524 participants, the IMPACT study is the largest clinical trial ever to examine whether a specially designed, widely available cognitive training program significantly improves cognitive abilities in adults. Led by distinguished scientists from Mayo Clinic and the University of Southern California, the IMPACT study proves that people can make statistically significant gains in memory and processing speed if they do the right kind of scientifically designed cognitive exercises.

The study compared a few hundred people who got the Posit Science exercises with a few hundred people who got an “active control” treatment that is poorly described. It is called “computer-based learning”. I couldn’t care less that people who spend an enormous amount of time doing laboratory brain tests (1 hour/day, 5 days/week, 8-10 weeks) thereby do better on other laboratory brain tests. I wanted to know if the laboratory training produced improvement in everyday life. This is what most people want to know, I’m sure. The study designers seem to agree. The procedure description says “to be of real value to users, improvement on a training program must generalize to improvement on real-world activities”.

On the all-important question of real-world improvement, the results page said very little. I looked for the published paper. I couldn’t find it on the website. Odd. I found it on Scribd.

Effect of the training on real-world activities was measured like this:

The CSRQ-25 consists of 25 statements about cognition and mood in everyday life over the past 2 weeks, answered using a 5-point Likert scale.

Mood? Why was that included? In any case, the training group started with an average score of 2.23 on the CSRQ-25. After training, they improved by 0.07 (significantly more than the control group). Not only is that a tiny improvement (percentage-wise), it is unclear what it means. The measurement scale is not well described. Was the range of possible answers 1 to 5? Or 0 to 4? What does 2 mean? What does 3 mean? It is clear, however, that on a scale where the greatest possible improvement was either 1.23 (assuming 1 was the best possible score) or 2.23 (assuming 0 was the best possible score), the actual improvement was 0.07. Not much for 50-odd hours of practice. Although the website seems proud of the large sample size (“largest clinical trial ever”), it is now clear why it was so large: with a smaller sample, the tiny real-world improvement would have been undetectable. Because the website treats this as the best evidence, I assume the other evidence is even less impressive. The questions about mood are irrelevant to the website claims, which are all about cognition. Why weren’t the mood questions removed from the analysis? It is entirely possible that, had the mood questions been removed, the training would have produced no improvement.
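Here is the arithmetic spelled out, as a minimal Python sketch. Only the 2.23 baseline and the 0.07 gain come from the paper; the scale anchors are assumptions, since the paper does not define them.

# The observed CSRQ-25 gain as a share of the largest gain the scale allows.
# Assumes lower scores are better; the anchors (1 or 0 as the best possible
# score) are guesses, not from the paper.
baseline, gain = 2.23, 0.07

for best_score, label in [(1.0, "if the scale runs 1-5"), (0.0, "if the scale runs 0-4")]:
    max_possible_gain = baseline - best_score
    print(f"{label}: {gain / max_possible_gain:.1%} of the possible improvement")

# Prints 5.7% under the 1-5 assumption and 3.1% under the 0-4 assumption:
# either way, a tiny fraction of the available range.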

The first author of the IMPACT study is Glenn Smith, who works at the Mayo Clinic. I emailed him to ask (a) why the assessment of real-world effects included questions about mood and (b) what happens if the mood questions are removed. I predict he won’t answer. A friend predicts he will.

More questions for Posit Science

Kahneman Criticizes Social Psychologists For Replication Difficulties

In a letter linked to by Nature, Daniel Kahneman told social psychologists that they should worry about the repeatability of what are called “social priming effects”. For example, after you see words associated with old age, you walk more slowly. John Bargh of Yale University is the most prominent researcher in the study of these effects. Many people first heard about them in Malcolm Gladwell’s Blink.

Kahneman wrote:

Questions have been raised about the robustness of priming results. The storm of doubts is fed by several sources, including the recent exposure of fraudulent researchers [who studied priming], general concerns with replicability that affect many disciplines, multiple reported failures to replicate salient results in the priming literature, and the growing belief in the existence of a pervasive file drawer problem [= studies with inconvenient results are not published] that undermines two methodological pillars of your field: the preference for conceptual over literal replication and the use of meta-analysis.

He went on to propose a complicated scheme by which Lab B will see if a result from Lab A can be repeated, then Lab C will see if the result from Lab B can be repeated. And so on. A non-starter, too complex and too costly. What Kahneman proposes requires substantial graduate student labor and will not help the grad students involved get a job — in fact, “wasting” their time (how they will see it) makes it harder for them to get a job. I don’t think anyone believes grad students should pay for the sins of established researchers.

I completely agree there is a problem. It isn’t just social priming research. You’ve heard the saying: “1. Fast. 2. Cheap. 3. Good. Choose 2.” When it comes to psychology research: “1. True. 2. Career. 3. Simple. Choose 2.” Overwhelmingly, researchers choose 2 and 3. There isn’t anything wrong with choosing to have a career (= publish papers), so I put a lot of blame for the current state of affairs on journal policies, which put enormous pressure on researchers to choose “3. Simple”. Hardly any journals in psychology publish (a) negative results, (b) exact replications, or (c) complex sets of results (e.g., where Study 1 finds X and apparently identical Study 2 does not find X). The percentage of psychology papers with even one of these characteristics is about 0.0%. You could look at several thousand and not find a single instance. My proposed solution to the problem pointed out by Kahneman is new journal policies: 1. Publish negative results. 2. Publish (and encourage) exact replications. 3. Publish (and encourage) complexity.

Such papers exist. I previously blogged about a paper that emphasized the complexity of findings in “choice overload” research — the finding that too many choices can have bad effects. Basically it concluded the original result was wrong (“mean effect size of virtually zero”), except perhaps in special circumstances. Unless you read this blog — and have a good memory — you are unlikely to have heard of the revisionist paper. Yet I suspect almost everyone reading this has heard of the original result. A friend of mine, who has a Ph.D. in psychology from Stanford, told me he considered Sheena Iyengar, the researcher most associated with the original result, the greatest psychologist of his generation. Iyengar wrote a book (“The Art of Choosing”) about the result. I found nothing in it about the complexities and lack of repeatability.

Why is personal science important? Because personal scientists — people doing science to help themselves, e.g., sleep better — ignore 2. Career and 3. Simple.

Our Niche in Life

A Chinese teacher in Los Angeles named Yang Yang, whom you can see in this video, wrote this on her website:

I believe that we all have our own niche – something so unique and innate to us that we enjoy every second of it and can naturally do better than others. Teaching Chinese is my niche.

I think this is the beginning of wisdom about human diversity — a big improvement over judging people by how “smart” they are, as so often happens. (To a college professor, smart = able to imitate a college professor.) My theory of human evolution emphasizes the need for diversity of occupations. In ancient times, occupational diversity arose because different people enjoyed doing different things.

But I also think Yang Yang is wrong in two ways. First, I don’t think your niche is innate. I think it can be changed. I think we can come to enjoy and excel at many jobs that we do not enjoy at first. This is the other side of procrastination. Just as we dislike doing things simply because we haven’t done them in a long time, we like doing things simply because we did them yesterday. Habits are pleasant.

I also think that where you fall on a pro-status-quo/anti-status-quo (conformist/rebel) dimension is not innate. I think it has a lot to do with your birth order (first-borns are more pro-status-quo), as Frank Sulloway says in Born to Rebel. I didn’t read Amy Chua’s Battle Hymn of the Tiger Mother expecting to think about birth order and rebelliousness but that’s what I ended up thinking about.