What Happens If I Stop Grading?

I believe two things about teaching:

1. The best way to learn is to do. The idea comes from an article by Paul Halmos about teaching math. I began self-experimentation to learn how to do experiments.

2. Everyone’s different. My theory of human evolution says we changed in many ways to facilitate trading. (For example, language began as advertising.) The more diverse the expertise within a group, the more members of the group can benefit from trade. Following this logic, mechanisms evolved to increase diversity of expertise among people living in the same place with the same genes. (One example is a mechanism that causes procrastination.) The theory implies that something inside every student pushes them toward expertise: they want to learn. But they are pushed in many different directions: what they want to learn varies greatly. If you accommodate the latter (diversity in what students want to learn), you can take advantage of the former (an inner drive to learn).

What is new is #2, or rather the idea that #2 is relevant to teaching. Human nature: people who are the same want to be different. Formal education: people who are different should be the same. At Berkeley, most professors appeared to have little idea how diverse their students were. (At least I didn’t, until I gave assignments that revealed it.) Almost all classes treated every student the same: same lectures, same assignments, same tests, same grading scheme. I heard dozens of talks about how to teach. Supporting or encouraging individuality never came up. Now and then I told other professors these ideas, at a party, for example. “Everyone’s different, but our classes treat everyone the same,” I’d say. No one agreed. It was a new and apparently distasteful idea. One response was that it would be too much work.

I believed my theory of human evolution partly because it explained what I saw with my students (Berkeley psychology majors in undergraduate seminars): the more freedom I gave them, the more they learned. I gave them great freedom with their term project (except that I required them to do it off campus). That worked fine. One student had an intense fear of public speaking. Her project: give a talk to a high school class. She succeeded. “What did I learn? I learned that if I have to, I can conquer my fears,” she wrote. I wrote an article about it. I taught a whole class where the students (all 10 of them) were given great freedom to do something off campus. That worked, too. But the class was too niche and the term project too small. It wasn’t obvious whether the ideas would work in an ordinary class.

The more freedom I gave my students, the more difficult it became to grade them. At Tsinghua I teach a required class for freshman psychology majors called Frontiers of Psychology. There are 20-30 students. It covers recent research. For the first few years, I had students write comments on the reading. “Write something only you could write,” I said. The students struggled to figure out what that meant. I struggled to grade their answers.

Before last semester began, I had an idea: no grading. Maybe other sources of motivation would be enough.

Last semester, my Frontiers class had two parts:

1. Reading. During this section, they read a variety of things: recent experimental papers (e.g., from Psychological Science), book excerpts (e.g., from The Man Who Would Be Queen) where I said “read any 60 pages you want”, and my long self-experimentation paper (“read any third you want”). This taught them how to do research, not just subject-matter content. A typical assignment included a class presentation. For example, each student read a different experimental paper (they chose) and gave a presentation about it. Another assignment involved an in-class debate. I discussed the readings — for example, the controversy around The Man Who Would Be Queen — and gave feedback on presentations but rarely lectured. The main lecture I gave was at the beginning to explain the course. This part of the course resembled a traditional course, except (a) no grades, no tests, (b) many class presentations (public speaking is an important skill), and (c) lots of choice in what they read.

2. Doing. This section had two parts: (a) a short (2 week) experiment where they tested the effect of whatever they wanted (chocolate, piano music, exercise, and naps of different lengths were some choices) on brain function measured by a reaction-time test written in R. They gave presentations about their results (I regret not requiring written reports). (b) a long project (4-5 weeks) where they could study whatever psychological topic they wanted. It might or might not involve data collection. The topics they chose to study included dreams, procrastination, the perception of psychologists, fujoshi, the relative femininity of different sports, the accuracy of first impressions, different ways of teaching English, comparison of Tsinghua students and Peking University students (the top two universities in China, with stereotypically different students), cognition in native versus non-native language, reading screens versus reading books, and positive psychology. They could work in groups or by themselves. They had to get my approval for what they did so that they wouldn’t try to do too much or too little. At the end they wrote a report and gave a class presentation. I met with each student or group of students individually to discuss their work, usually for 30-60 minutes. During these discussions they provided evidence (e.g., photographs, recordings) that they had done what they said.

I did give grades (I was required to), but they were minimal. The final grade was based entirely on the final project. I divided each project into parts (e.g., background research, data collection, class presentation) and gave each part a point value such that the points added up to 96 (= A). A student who finished a part got its points. (Everyone completed all parts.) If a student did really well, I gave slightly more points (e.g., 97). If a student failed in some serious way, I gave slightly fewer (e.g., 94). So grading was close to binary: yes or no. You could get a good grade simply by doing what you said you would do.
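The scheme is simple enough to write down. Below is a minimal Python sketch. It assumes four parts: the three named above plus the written report. Each part is given a hypothetical 24 points. The post does not give the actual weights, only that they sum to 96. The function name `project_grade` is likewise made up for illustration.

```python
# Hypothetical reconstruction of the near-binary grading scheme described above.
# The part names come from the post; the 24-point weights are an assumption.
# The post only says that the parts add up to 96 (= A).
PARTS = {
    "background research": 24,
    "data collection": 24,
    "written report": 24,
    "class presentation": 24,
}


def project_grade(completed, adjustment=0):
    """Return a project score from the set of completed parts.

    completed: names of the parts the student finished.
    adjustment: a small teacher-chosen bonus or penalty, e.g. +1 for
    especially good work or -2 for a serious failure.
    """
    base = sum(points for part, points in PARTS.items() if part in completed)
    return base + adjustment


# Finishing every part earns the full 96 points (= A).
assert sum(PARTS.values()) == 96
```

Here are three examples:

* `project_grade(set(PARTS))` returns 96.
* `project_grade(set(PARTS), adjustment=1)` returns 97.
* `project_grade(set(PARTS), adjustment=-2)` returns 94.

Completion decides almost everything, and the adjustment stays small. That is what keeps the scheme close to yes or no.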

It was the most pleasant teaching experience of my life. It was also the easiest by far, contrary to my Berkeley colleagues’ claim that my ideas led to “too much work.” The hours I had spent every week grading homework in previous versions of the course (the part of the course I liked least) were gone. At the end of the class, I spent many hours discussing the student projects, but I enjoyed these discussions. They didn’t feel like work. The students had chosen topics they wanted to study and seemed happy to talk about what they had done. Unlike in an oral exam, almost nothing rode on what they told me, and they could be proud of what they were talking about, since it was almost entirely their idea.

The students’ work was the highest quality I have ever seen. Two of their final projects might be publishable. (And these are first-semester freshmen.) It’s not my field, so I can’t be sure, but they have great inherent interest and no obvious flaws. The students seemed to like the class, too. On the final day, which happened to be Christmas, they gave me a Christmas card signed by everyone in the class. One student gave me a card separately. “Thank you,” I said. “Why did you give me this?” Among other things, she said I had high standards. Given the absence of grades, that was interesting. Maybe it came from the fact that after every presentation, I would point out something I liked and something I thought could be better. I tried to do that with all of my feedback. Another student told me, after the final class, that what I had said about “the best way to learn is to do” was, in her case, very true. She said she had learned more in my class than in all her other classes put together.

There were about 25 students and 12 assignments, so about 300 (25 x 12) assignments in total. There were about 4 instances where a student did not do an assignment. In other words, the students did the assignments 99% of the time, although there was no obvious penalty for skipping one. Had I given grades, I might have gotten 100% compliance rather than 99%. Using a grading scheme that is costly (in time and student anxiety) to get a 1% improvement in compliance is absurd. Yet that may be what most professors are doing. At least, my experience suggests they could get very high compliance without expensive grading.
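Here is the compliance figure spelled out with the approximate counts above:

$$
25 \times 12 = 300, \qquad \frac{300 - 4}{300} = \frac{296}{300} \approx 98.7\% \approx 99\%
$$

Going from about 98.7% to 100% compliance is a gain of roughly 1.3 percentage points. That is the "1% improvement" that grading might buy.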

I think this class worked well for both my students and me because it contained several elements: 1. A “core curriculum” (recent psychological research) taught in several different ways. 2. Good-quality materials. For example, The Man Who Would Be Queen is much better than what psychology students typically read. One student told me she read the whole book even though only a third of it was assigned. 3. Plenty of doing. A class presentation counts as doing. 4. Plenty of student choice. 5. Absence of grading, which has bad side effects.

I think several things caused students to learn a lot: 1. The material was interesting. 2. To some extent — far more than in other classes — they could choose what they wanted to learn, especially during the second half of the class. 3. Peer pressure. They wanted to look good in front of their peers. It would have been embarrassing to not be able to do a presentation when called upon. 4. The instinct of workmanship. Thorstein Veblen wrote a book called The Instinct of Workmanship. People inherently want to do a good job, said Veblen. I agree. 5. Doing is fun.

Would this work with other students? My students were (and are) very smart, yes. Tsinghua is extremely hard to get into, and admission is based mostly on a standardized test. My students, in other words, did very well under the usual system of teaching. This can be interpreted two ways: (a) they like the usual way of teaching because it fits them (they succeeded because of the usual methods), or (b) like everyone else, they dislike the usual way of teaching but, unlike everyone else, figured out how to learn on their own. The first interpretation suggests that my students would benefit less than other students from the novelty of my approach. The second suggests they would benefit more. What is clear is that Tsinghua students are known for studying very hard, yet my class required no studying beyond reading and understanding.

What did I learn? I learned that I can stop grading and things get much better, not worse. I learned that motivations other than grades are plenty powerful.

24 thoughts on “What Happens If I Stop Grading?”

  1. John Holt eventually became a leader of the home schooling movement because he noticed that third and fourth graders were so distracted by grading that they were just inventing arithmetic answers instead of thinking.
    An Alexander Technique teacher told me that he graded his college courses on attendance because he found that people couldn’t let themselves move easily if they were thinking about grades.
    The Alexander Technique was invented by F.M. Alexander about a century ago because he was an actor who’d lost the ability to speak. After much self-observation in three-way mirrors, he found that he was pulling his head down and back before he started to speak. After more self-experimentation, he found a way to not pull his head down and back, and found that his general functioning improved, including clearing up some breathing problems he’d had since childhood.

    Seth: I taught a class at Berkeley with a woman who had benefited greatly from Alexander Technique. She told me about his emphasis on self-observation. I haven’t heard of grading college courses on attendance. I didn’t take attendance. Sometimes students are sick. It seems unfair to reduce their grade because of this. In any case I had no problems with attendance.
  2. Yay!
    An independent discovery of progressive teaching!
    Seth: I looked it up: progressive education. Yes, two of the 14 elements describing progressive education (“Emphasis on learning by doing” and “Highly personalized education”) are exactly what I was aiming for.
  3. IMO the resistance to individualization is not coincidental. As Terry Gilliam said, “Usually you spot how societies work by what they glorify: it’s usually the thing they’re deficient in.”
  4. I know from past posts that Seth is not so much a fan of Popper… I am a huge fan of Popper. Popper claimed that we only learn new things via trial and error, and I very much agree with this. Because of this, I think that grades can only hurt true learning (how can one learn from making errors if every error is counted against you?). Our daughter attends a Quaker school where they give no grades and take no standardized tests (all the way through high school), and they seem to teach mostly via trial and error. I love this.
    Seth: I was under the impression that a lot of learning is imitation. If someone else has learned X (can be anything, e.g., how to drive a car, what is the capital of Texas), you can learn X by imitating that person. If no one has learned X, then you will have to resort to trial and error.

    I did not know that there are Quaker schools that teach without grades. Can you give a link to more information about this?
  5. You missed the other, correct possibility.
    Very intelligent and conscientious people are selected for by the Tsinghua admissions system. These people are intellectually curious, and will produce higher quality work when given freedom. The same is not true of other people, who will slack.
    Seth: That’s what I meant with my second possibility. Your comment is very interesting because you propose some connection between very intelligent, conscientious, and intellectually curious. Maybe you have something there. Tsinghua admissions on the face of it does select for people who are very intelligent and conscientious, as you say. Why such people should be unusually “intellectually” curious is not obvious. I’ve heard it said that high curiosity leads to high intelligence, which makes some sense. Nothing was said about conscientiousness. The saying “genius is an infinite capacity for taking pains” echoes your connection of conscientiousness and intelligence, without mentioning curiosity.
  6. > Very intelligent and conscientious people are selected for by the Tsinghua admissions system. These people are intellectually curious, and will produce higher quality work when given freedom.
    IQ and (Big Five) Conscientiousness are very weakly positively, or actually negatively, correlated. Big Five Openness, a pretty much exact match to ‘intellectually curious’, has only a weak correlation with IQ (IIRC 0.2).
    Given the extreme selection from the Chinese population to produce the Tsinghua student body, using tests that test pretty much only IQ & Conscientiousness from every description I’ve heard of them, there’s no reason to expect an extreme level of Openness in the student body.
    So perhaps Roberts is merely seeing what a little freedom looks like when applied to the most elite & capable. I’d question whether they performed as well in that respect as comparable students from Harvard or Oxford, except I think Tsinghua is more selective than either…
  7. Hi Seth!
    I just happened upon your blog through an internet search… and I wasn’t sure how to contact you about a post you had written that you had closed comments on, so I apologize.
    Let me tell you just a wee bit about myself. I am a mom of 6 who returned to school as a biology major with the inclination to teach secondary education. I was diagnosed celiac a year ago and have had digestive issues most of my life. I have found myself leaning more toward paleo eating and am intrigued by fermented foods to heal the gut. So here is my question… in one of your posts you discuss why you believe these bacteria are good for our guts and why the human race tends toward these flavours… but why does our stomach acid not kill these bacteria? Why are they not all killed before they reach the intestines?
    Feel free to email me directly, should you wish. I am in the beginning stages of food fermentation, beginning with kombucha and on my way to sauerkraut and traditional Polish borscht.
    Thanks in advance!!
    Alena
    Seth: Stomach acid kills only what is on the surface. We do not atomize our food in our mouths. It goes into our stomach in lumps. What’s inside the lumps is safe from the acid.
  8. When students are self-motivated, it is possible to do away with many of the “standard” resources available to teachers, including grades and exams; even formal teaching/lecturing can go.
    When students are not self-motivated, the responsible teacher cannot dismiss any resource out of hand.
    And, let’s be honest, some students are motivated by grades. As a test, give lowish grades to your students at Tsinghua and see how they perform afterwards when they see their efforts ignored and unrewarded in their transcripts.
  9. Do you see any element of this I could apply in my 530-student Data Structures class (now in progress at Berkeley)?
    Seth: Maybe next time I’m in Berkeley we can talk about this over coffee. My short answer is: 1. Give students a substantial length of time (such as 4 weeks) for a final project, which can be anything they like involving the course material. Have teaching assistants vet each project for an appropriate degree of difficulty. The final project will get an A so long as the student completes each of the promised elements. Each student gives a progress report to the class each week in section. No lectures during this period; lecture time is devoted to the best progress reports. 2. Make the final project involve improving the data structure for someone on campus (student, professor, staff). I don’t know if this is feasible. 3. Make the final project involve improving the data structure for someone off campus (e.g., parents, friends of parents). Again, I don’t know if this is feasible. The general idea is to introduce freedom (option 1) and realism (options 2 and 3) and reduce worry about grades (option 1).
  10. Of course it’s true that simple facts (e.g., the capital of Texas) can be learned through imitation or memorization. However, I would submit that one cannot learn _to do_ anything (e.g., drive a car) except through trial and error (hopefully, fairly minor errors). You can learn that the ignition does X, the accelerator does Y, and brakes do Z through imitation or memorization, but you can’t learn not to over-steer or over-brake except by doing it and correcting your (hopefully small) errors.
    Anyway, here’s a Quaker school link; note in particular the section on evaluation: https://www.cfsnc.org/page.cfm?p=421
    You might also be interested in the overall philosophy, as it’s very close to some of what you’ve written here: https://www.cfsnc.org/page.cfm?p=362
    Seth: Thanks for the links. The part about evaluation says in part: “Student evaluation at CFS is constant and thorough and is not reduced to letter grades.” This is not a good description of my approach because I was not “constant and thorough”. At the very end, I gave a lot of feedback about term projects. Before that, I gave considerably less feedback. After a class presentation, I would point out one thing I thought was good and one thing where there was room for improvement. I did not give each presentation a thorough critique. That would have taken too long and been overwhelming.
  11. The teacher believed he had to grade on something (for administrative reasons, I think), and since there were no other Alexander teachers in town and he believed that anyone who showed up would learn, he graded on attendance. He mentioned another Alexander teacher who graded on class notes.
    I don’t know why (I thought the class was very good), but it was down to four students by the end.
    The reason I brought up the story is that a skilled person concluded that students weren’t at their best if they were being graded.
    Any suggestions for other systems developed from self-observation and experimentation? Offhand, I can think of Bob Cooley’s resistance stretching, Gerda Alexander’s Eutonics (no relation to F.M. Alexander), and Kenny Werner’s Effortless Mastery (cultivating efficiency and intensity for playing music).
    Txomin, you’re arguing that students can be demotivated by lowish grades, which is not the same thing as saying that they will be motivated by a chance at high grades.

    Seth: Thanks for clarifying that. To answer your question (“any suggestions…”), lots of weight-loss diets have been developed through self-observation and experimentation. South Beach, for example.
  12. I’ve been using a variation of this in some of my Computer Science courses at Bowdoin for years. The longer I’m here, the more I’m convinced that the specific material in upper-level courses is not particularly important; all that matters is student engagement. When students are engaged, they are learning machines, and mostly my job is to stay out of their way.
    Seth: I’d like to hear more about what you do.
  13. Paragraph six of the Halmos article says:
    “Having stated this extreme position, I’ll rescind it immediately. I know that it is extreme, and I don’t really mean it—but I wanted to be very emphatic about not going along with the view that learning means going to lectures and reading books. If we had longer lives, and bigger brains, and enough dedicated expert teachers to have a student/teacher ratio of 1/1, I’d stick with the extreme views—but we don’t.”
    So Halmos is talking about an ideal. Jonathan Shewchuk’s question (above) gets to the heart of the issue.
    Seth: Halmos’s “extreme position” is no use of books and lectures. He recognizes that is impossible. I don’t agree with Halmos’s extreme position. I had few lectures but plenty of reading. I gave in-class public feedback on student presentations (all students could hear what I said about Student X’s presentation), which vaguely resembles lecturing. The essence of what I’m saying doesn’t have much to do with Halmos. It is that a desire to individualize education (= make it different for each student) led me to stop grading. When I did so, things were fine. My students, it turned out, had plenty of motivation from other sources. Grading is so time-consuming and irksome for teachers that this is an interesting outcome.
  14. “IQ and (Big Five) Conscientiousness are very weakly positively, or actually negatively correlated. Big Five Openness, a pretty much exact match to ‘intellectually curious’, has only a weak correlation with IQ (IIRC 0.2).”
    So do horoscopes. Who cares? Self-reporting is BS to make work for psych profs.
    Seth: “So do horoscopes”? Your point is unclear. Are you saying that intellectual curiosity (= openness) as measured by self-report personality tests, doesn’t correlate with anything important? If so, that’s an interesting and surprising claim, which I haven’t heard before. What do you base it on?
  15. Seth, I think more than 2 out of 14 things match between your insight and progressive education. I am pointing this out not to be a stickler but because I sense a kinship of approach – you might learn from them and them from you.
    Also – I have taught a few classes – unfortunately lecture style – and have to say that it’s not strictly correct that one assumes that all students learn the same way. I always tried to explain things many different ways to cover different ways of seeing things and I always was open to questions and thinking things through with students. That being said I agree your approach is better. I didn’t have the authority to make those decisions though.
    Seth: What are the other similarities between what I do and progressive education? I looked again at that list of 14 characteristics of progressive education and failed to find more similarities, unless you mean an emphasis on projects and “varied learning resources”.
  16. “Are you saying that intellectual curiosity (= openness) as measured by self-report personality tests, doesn’t correlate with anything important?”
    Yes. And I am saying that all 5 Big 5 self-report dimensions are hopelessly flawed. Taking people’s self-descriptions at face value is not an intelligent way to practice psychology, even if it does permit large scale data collection. It is like studying cars by color. There will be some semi-interesting correlations, but nothing actually reliable.
    The problem is that there are radically different types of people who describe themselves in similar ways, or in ways opposite to or orthogonal to what they actually are compared to the general population. The “publishable results” system has led to an explosion in worthless statistically-significant results. As a result, psychology has moved further and further away from anything resembling an intelligent grasp of human nature.
    Psychologists can’t afford to go under the hoods of cars one at a time. So instead they count car colors from a highway overpass. Useless.
    Seth: This cites many studies that disagree with what you say about openness (lack of correlation). For example, https://psycnet.apa.org/?&fa=main.doiLanding&doi=10.1037/0033-2909.120.3.323. What evidence supports your view (“hopelessly flawed”)?
  17. I didn’t say “lack of correlation”. I said “semi-interesting correlations”.
    I think you and I have very different ideas about how useful psychology should be. You’re impressed by useless vague correlations of the Big 5. I’m not. What’s it good for?
    If I am blind screening applicants for a job, I will a million times take an IQ test over a Big 5 test.
    One of the concepts of PUA is “value elicitation.” And one of the ironies is that often the values are the opposite of what the person is really like.
    I have studied the life histories of over 50 applicants to my Neander Hall, in the process of typing them. I can never simply take what they say about themselves or their values at face value. It all must be placed into life context, and compared to what other people like them and different from them said and did.
    Real psychology will be founded on biological axes, such as testosterone and facial shape. It must account for things like social status, charisma, and actual extro/introversion. Naive interpretation of self-description will never accurately reveal these traits. Even the most benighted psychologist should realize that a guy’s self-reported notch count can be quite misleading, and his subjective rating of the women’s attractiveness even more so.
    Self description is nice for publishing papers but doesn’t give us any better grasp on human nature. It is only one of three crucial elements – the other two being biology and biography.
    Seth: I’m sorry you have not seen fit to provide evidence for your view that personality psychology — at least, the part based on self-report – is worthless.
  18. I presume you mean a study, which is a small subset of the word “evidence”. You linked to Wikipedia, which is untrustworthy on this topic, and a paper behind a paywall. Link to the best paper I can access and let’s see whether Big 5 and MBTI are good for anything practical, or are just for academic promotion and horoscope-style self-discovery.
    And keep in mind, this argument started because someone said something about how openness was more important than high IQ for self-directed coursework. The very idea of naively comparing Chinese vs American students’ self descriptions of openness is ridiculous, and even more so for Tsinghua students. Yet if you give those kids an inch they zoom off into independent genius mode. Hmmm….
    Seth: I’ve found some relevant evidence. Is it possible that it might be your turn to do so?
  19. Interesting. You might be interested in this report, ‘Teaching Boys and Girls How to Study’ (1919) by Peter Jeremiah Zimmers, Superintendent of City Schools, Manitowoc, Wisconsin (scanned online, Google Books), about the very good results the school system had when it shifted away from the teacher-dominated lecture system to the problem method of teaching. Sadly, such innovations never took widespread hold, and we became stuck with lecture-style instruction that has long been known to induce “school helplessness” in students. The first reference I’ve found to the damage school does to creative and innovative thinking was a minor discussion in a book published in 1886.
    “In spite of the fact that schools exist for the sake of education, there is many a school whose pupils show a peculiar “school helplessness”; that is, they are capable of less initiative in connection with their school tasks than they commonly exhibit in the accomplishment of other tasks.”
  20. From a comment above (and totally irrelevant to the posted article):
    “Seth: Stomach acid kills only what is on the surface. We do not atomize our food in our mouths. It goes into our stomach in lumps. What’s inside the lumps is safe from the acid.”
    However, by the time food is ready to leave the stomach it is far away from being “lumps.” It is a chalky, watery mix called chyme, having been totally massaged and mixed with acidic, watery, “mucus-y” secretions. Many/most (?) bacteria survive the acid of the stomach, but not because they are inside the lumps (boluses) of food that came down the esophagus. These “lumps” are totally liquefied before they enter the small intestine, and have been thoroughly exposed to gastric acids.
  21. “I’ve found some relevant evidence. Is it possible that it might be your turn to do so?”
    IIRC, you’ve admitted that psychology is mostly useless. That’s because the science is in its infancy. Here are the green shoots of real, hard psychology that I’ve described above. It will correlate biomarkers with actual, not just self-reported, psychological traits.
    Seth: Thanks for the links. I don’t know about the brain damage study. The correlation between facial features and behavior is very weak. Nothing here suggests to me that this work is better (or will ever be better) than the self-report psychology you disparage. Why do we listen to what other people say? Because there is usually some truth to it. Why do you think speech evolved? Because it was useful. Self-report personality psychology is just psychology based on what people say. Many many things in the world are based on what people say and work fine. Sure, roughly all psychology research is useless. Same with all academic research. An engineering graduate student at Berkeley told me that 95% of the research in her department was useless.

  22. You are correct that the face reading studies are not very good yet. I happen to know that they will get much better, but you can’t know that yet. I know why they are not very good – because they haven’t found the right things to measure. Human facial inference is still better than academic methods.
    Introversion / extroversion is by far the least polluted and most accurate of the big 5 self report dimensions. There are certainly accurate self-report dimensions – for example male partner count is accurate.
    The page you linked does not specify that the assessment was self-report. The paper has not yet been published. However, I agree with the pro-ambivert result. But this result is not very interesting, by itself, because it does not cover the biological or biographical angles.
    Seth: May I ask how you know that face reading studies will get much better? I haven’t heard that before.
  23. Because I’m doing it – matching facial patterns to hardwired psychological dimensions, and seeing how the different ways they work out in biographical patterns.
    The most important facial dimensions and corresponding psychological dimensions have not been measured yet in a study. For the most part, psychologists do not even pay attention to the hardwired psychological dimensions I have uncovered. This is because they are mostly not independent axes on self-reporting.
    Extro/introversion most closely corresponds to the recession of the eyeball from the upper brow bone, aka socket depth.
