This psychological experiment asked participants to judge the following actions.
(1) Stealing a towel from a hotel
(2) Keeping a dime you find on the ground
(3) Poisoning a barking dog
They had to give each action a mark out of 10 depending on how immoral the action was, on a scale where 1 is not particularly bad or wrong and 10 is extremely evil.
A second group were asked to do the same, but they were given the following three actions to judge.
(1”) Testifying falsely for pay
(2”) Using guns on striking workers
(3”) Poisoning a barking dog
I am sure you can guess the point of this. Item (3) and item (3”) are identical, and yet the two groups consistently differ in their ratings of these items. The second group judge the action to be less evil than the first group do. The reason is not hard to see: when you are thinking in terms of stealing towels and keeping dimes, poisoning a barking dog seems heinous, but in the context of perjury and using guns on striking workers, it seems less so. The same principle has been observed in many other fields, and has led many psychologists to conclude that human judgment is essentially comparative, not absolute.
There is a fantastic game on the No More Marking website which demonstrates exactly the same point. I recommend you play it right now, as it will illustrate this point better than any words can.
In brief, the game asks you to look at 8 different shades of blue individually and rank them from light to dark. It then asks you to make a series of comparisons between shades of blue. Everyone is much better at the latter than at the former. The No More Marking website also includes a bibliography about this issue.
Hopefully, you can also see the applications of this to assessment. This is one of the reasons why we need to define absolute criteria with comparative examples.
The interesting thing to note about the ‘evil’ and ‘blue’ examples is that the criteria are not that complex. One does not need special training or qualifications to be able to tell light blue from dark blue. The final judgment is one that everyone would agree with. Similarly, whilst judging evil is morally complex, it is not technically complex – everyone knows what it means. And yet, even in cases where the criteria are so clear, and so well understood, we still struggle. Imagine how much more we will struggle when the criteria are technically complex, as they are in exams. If we aren’t very accurate when asked to judge evil or blue in absolute terms, what will we be like when asked to judge how well pupils have analysed a text?
The other thing this suggests is that learning more about a particular topic, and learning more about how pupils respond to it, will not of itself make you a better marker. You could have great expertise and experience in teaching and reading essays on a certain topic, but if you continue to mark them in this absolute way, you will still struggle. Expertise in a topic, and experience in marking essays on that topic, are necessary but not sufficient conditions of being an accurate marker. However expert we are in a topic, we need comparative examples to guide us.
Unfortunately, over the last few years, the idea that we can judge work absolutely has become very popular. Pupils’ essays are ticked off against APP grids or mark schemes, and if they tick enough of the statements, they are deemed to have achieved a certain grade. But as we have seen, this approach leaves a great deal open to interpretation. Our brains are just not equipped to make judgments in this way. I also worry that such an approach has a negative impact on teaching, as teachers teach towards the mark scheme and pupils produce essays which start to sound like the mark scheme itself.
Instead, what we need to do is to take a series of essays and compare them against each other. This is at the heart of the No More Marking approach, which also has the benefit of making marking less time-consuming. If you aren’t ready to adopt No More Marking, you can still get some of the benefits of this approach by changing the way you mark and think about marking. Instead of judging essays against criteria, compare them to other essays. Comparisons are at the heart of human judgment.
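For readers who like to see the mechanics, here is a minimal sketch of how a set of "which of these two essays is better?" decisions could be turned into a rank order. It is only an illustration: it fits the Bradley-Terry model, a standard statistical model for paired comparisons, and I am not claiming it is the method No More Marking actually uses. The function name `rank_by_comparisons`, the essay labels and the judgments themselves are all made up for the example.

```python
from collections import defaultdict


def rank_by_comparisons(judgments, iterations=200):
    """Rank essays from pairwise (winner, loser) judgments.

    Illustrative sketch: fits a Bradley-Terry model with the usual
    iterative update. Not any particular product's algorithm.
    """
    wins = defaultdict(int)       # how many comparisons each essay won
    meetings = defaultdict(int)   # how often each pair of essays was compared
    essays = set()
    for winner, loser in judgments:
        if winner == loser:
            continue  # an essay cannot be compared with itself
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1
        essays.update((winner, loser))

    # Start every essay on the same score, then repeatedly re-estimate.
    scores = {essay: 1.0 for essay in essays}
    for _ in range(iterations):
        updated = {}
        for essay in essays:
            denominator = 0.0
            for pair, count in meetings.items():
                if essay in pair:
                    (other,) = pair - {essay}
                    denominator += count / (scores[essay] + scores[other])
            updated[essay] = wins[essay] / denominator
        # Rescale so the scores always average 1 and stay comparable.
        total = sum(updated.values())
        scores = {e: s * len(essays) / total for e, s in updated.items()}

    ranking = sorted(essays, key=lambda e: scores[e], reverse=True)
    return ranking, scores


# Hypothetical data: four essays, every pair judged twice.
judgments = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("A", "D"),
    ("B", "C"), ("B", "C"),
    ("B", "D"), ("D", "B"),
    ("C", "D"), ("C", "D"),
]
ranking, scores = rank_by_comparisons(judgments)
print(ranking)
```

Notice that no judge ever has to say what an essay is "worth" in absolute terms. Each decision is a simple comparison, and the scale emerges from the pattern of wins and losses.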
I am grateful to Chris Wheadon from No More Marking for talking me through how his approach works. No More Marking are running a trial with FFT and Education Datalab which will explore how their approach can be used to measure progress across secondary. See here for more detail.
As an interesting aside, one of the seminal works in the research on the limitations of human judgment is George Miller’s paper ‘The Magical Number Seven’. I knew of this paper in a different context: it is also a seminal work in the field of working memory, and the limitations of working memory. Miller also wrote the excellent, and very practical, article ‘How Children Learn Words’, which shows how looking words up in dictionaries and other reference sources may not be the best strategy for learning new vocabulary. I’ve written a bit about this here.
My last few posts have all been about the flaws in using criteria, and alternatives to using them. In my last post, I mentioned that there were two pieces of theory which had influenced my thinking on this: research on human judgment, and on tacit knowledge. This blog post has looked at the research on human judgment, and in my next, I will consider how the idea of tacit knowledge also casts doubt on the use of criteria.