The first issue is the slippery nature of standards.
Even a well-crafted statement of what you need to get an A grade can be loaded with subjectivity – even in subjects such as science. It’s genuinely hard to know how difficult a specific exam is.
It’s also hard to set exams based on criteria, and hard to mark exams using criteria alone. Most exam criteria for essay-based questions suffer from what I call the ‘adverb problem’. Often, the difference between the criteria for the top few bands comes down to a single adverb: getting a band 3 may require you to write ‘well’; getting a band 4 requires you to write ‘fluently’, and so on. Here’s an extract from the WJEC exam board criteria for English.
Band 3: there is some use of devices to achieve particular effects
Band 4: devices to achieve particular effects are used consciously and effectively
Band 3: plot and characterisation are convincingly sustained
Band 4: plot and characterisation are effectively constructed and sustained
This last one is particularly interesting, as I would think that ‘convincingly’ is better than ‘effectively’, but that’s by the by. The point is that these adverbs are extremely vague and their meaning lies in the eye of the beholder. The only way we can give these criteria any concrete and specific meaning is by referring to actual samples of pupil work – which is, of course, what we all do at internal and external moderation days. Criteria are only given meaning by reference to individual pupil performance. That is, criteria are given meaning by norms.
Let me give another example of this. When I was first marking a set of creative writing coursework tasks, I came across one that I felt deserved the top band. One of the criteria for the top band was something like ‘appropriate and controlled variation of sentence structure’. This pupil’s sentence structure was fairly (there’s another adverb for you) well controlled and appropriate, but there were moments where it wasn’t completely secure. Did it deserve the top band? I asked my mentor, and she passed on what an exam moderator had said to her: when awarding the top band, you have to take into account what it is realistic for a 15-year-old to achieve. I have no idea whether this really is official exam board advice, but it seemed reasonable enough, and on that basis I gave the piece the top band. Yes, it was not flawless, but judged against what is realistic for 15-year-olds to achieve, it deserved the top band. (Of course, this raises the question of how a teacher is supposed to know what is realistic for 15-year-olds to achieve. In this case, I had my mentor, who had experience of marking hundreds of exam scripts, backing up my judgment. More generally, this is why it is important for all teachers to have a good idea of the whole attainment range, and why exam moderation meetings often involve looking at sample scripts from across that range. Tom Sherrington makes a good case for why and how we should do more of this here.)
The point is that here, we were giving criteria meaning by reference to norms. Suppose, for the sake of argument, that my norm in this case was not what is possible for 15-year-olds to achieve but what is possible for Booker Prize winners to achieve. Clearly, I would also expect them to vary their sentences in an appropriate and controlled way. That is, the same criterion applies to them. But if Booker Prize winners were the norm, I’d interpret that criterion in a completely different way.
I could also apply the same criterion to KS2 pupils, and again I would interpret it differently.
When judging performance on complex tasks, we don’t just want to see whether the pupil can do it or not – we want to make a judgment about how well they can do it. You can judge whether a pupil is capable of doing something independently of the performance of others. You can judge a pupil very clearly on the criterion ‘is able to write a creative story’: either they can or they can’t, and you can make that judgment without reference to other pupils. But we want exams to tell us more than that. For more complex tasks, performance isn’t pass/fail; it’s on a continuum. Most pupils can write a creative story, so we want the exam to give us some idea of how well they do it. And that judgment is inevitably bound up with the performance of others. Criteria can help guide that judgment, but they can’t do the whole job on their own. The key point here is not the difference between adverbs, but the comparative and superlative forms of the same adverb: not effectively, convincingly, fluently, but well, better, best.