How do bad ideas about assessment lead to workload problems?

This is part 7 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

Bad ideas can cause workload problems. If you have a flawed understanding of how a system works, the temptation is to work harder to try to make the system work, rather than to look at the deeper reasons why it isn’t working.

The DfE run a regular teacher survey diary. In the survey from 2010, primary teachers recorded spending 5 hours per week on assessment. By 2013, they were spending 10 hours per week on assessment. Confusion and misperceptions around assessment are creating a lot of extra work – but there is no evidence they are providing any real benefits.

So what are the bad assessment ideas which are creating workload but not generating any improvements? Here are a few ideas.

Over-reliance on prose descriptors when grading work
Like a lot of teachers, I used to really dislike marking. But when I stopped to think about it, I realised that I actually really liked reading pupils’ work. It was the process of sitting there with the mark scheme, trying to work out a grade and provide feedback from it, that I disliked. And it turns out there is a good reason for that: the human mind is not good at making this kind of absolute judgement. The result is miserable teachers and not very accurate grades. There is a better way (comparative judgement).

Over-reliance on prose descriptors when giving feedback
Prose descriptors are equally unhelpful for giving feedback. A lot of the guidance that comes with descriptors recommends using the language of the descriptors with pupils, or at least using ‘pupil friendly’ variations of the descriptor. The result is that teachers end up writing out whole paragraphs at the end of a pupil’s piece of work: ‘Well done: you’ve displayed an emerging knowledge of the past, but in order to improve, you need to develop your knowledge of the past.’

This kind of comment is not very useful as feedback because, whilst it may be accurate, it gives the pupil nothing to act on. How is a pupil supposed to respond to such feedback? As Dylan Wiliam says, feedback like this is like telling an unsuccessful comedian that they need to be funnier.

I like the approach being pioneered by a few schools which involves reading a class’s responses, identifying the aspects they all struggled with, and reteaching those in the next lesson. If this response is recorded on a simple proforma, that can hopefully suffice for accountability purposes too.

Mistrust of short answer questions and MCQs
Short answer questions and multiple-choice questions (MCQs) can’t assess everything, clearly. But they can do some things really well and they also have the bonus of being very easy to mark. A good multiple-choice question is not easy to write, to be fair. But once you have written it, you can use it again and again with limited effort, and you can use MCQs that have been created by others too. Unlike feedback based on prose descriptors, if you use MCQs to give feedback then pupils can actively do something helpful in response to your feedback.

How can we measure progress in lessons?

This is part 6 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

With national curriculum levels, it was possible to use the same system of measurement in exams as in individual lessons.

For example, national curriculum tests at the end of years 2 and 6 were measured using national curriculum levels. But you could also use NC levels to measure progress in individual lessons and at the end of terms. For example, you could have a task at the end of a lesson, and then you could tell pupils that in order to be a level 4a, they would need to perform in a certain way on the task; to be a 5c, they would need to reach a certain standard, and so on.

You can see the attraction of this approach: it is coherent, because you are always using and talking about the same grades. It’s also great for accountability. When Ofsted call, you can offer them ‘real-time’ assessment data based on performance from the most recent lesson.

However, in practice this system led to confusion. Pupils might be able to perform at a certain level at the end of a particular lesson. But when they came to sit their test at the end of the unit or the end of the year, they might not be at that level. As Rob Coe says here, levels started to take on very different meanings depending on how they were being used. Far from providing a coherent and unified system, levels were providing the illusion of a coherent system: everyone was talking about the same thing, but meaning something very different.

So what is the answer? I don’t think exam grades can be used to measure progress in lessons. What happens in the lesson is an ‘input’, if you like, and what happens in the exam is an ‘output’. It makes no sense to try to measure both on the same scale. Here is an analogy: we know that if you eat more, you put on weight. But we don’t measure food intake and weight output with the same scale, even though we know there is a link between them. We measure food with calories, and weight with kilograms. Similarly, we have to record what happens in lessons in a different way to what happens in the final assessment.

If you do try to measure activities in an individual lesson with the same scale as the final exam grade, then I think one of two things can happen. One is that you use activities in the classroom which are most suited for learning and for formative assessment: for example, in English you might use a spelling test. Activities like spelling tests are not very well suited for getting a grade, so the grade you get from them is very inaccurate, and causes a lot of confusion. The second option is to start to alter all of the activities you do in class so that they more closely resemble exam tasks. So you get rid of the spelling test, and get pupils to do a piece of extended writing instead. This makes it more likely (although not certain) that the grades you get in class will be accurate. But it means that you are now hugely restricted in the types of activities you can do in class. You have effectively turned every lesson into a summative assessment.

We should record in-lesson progress in the way that is most suitable for the tasks we want to use. And lots of very useful activities are not capable of being recorded as a grade or a fraction of a grade.

How can we close the knowing-doing gap?

This is part 4 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

One frequent criticism of memorisation is that it doesn’t lead to understanding. For example, a pupil can memorise a rule of grammar, or a definition of a word, but still have no idea how to use the rule or the word in practice. This is a real problem. I would say almost every pupil I have ever taught knew that a sentence began with a capital letter. They ‘knew’ this in the sense that if I asked them ‘what does a sentence begin with?’ they would instantly respond ‘a capital letter’. Yet far fewer of them would reliably begin every sentence they wrote with a capital letter. This is a classic example of a knowing-doing gap, where a pupil knows something, but doesn’t do it.

Frequently, I see people using examples like this one to prove that explicit instruction and teaching knowledge do not work, and to argue that we should use more authentic and real-world teaching practices instead. For example, if we just ask pupils to read lots of well-written books and articles, and do lots of writing, they will implicitly understand that sentences begin with a capital letter, and use capital letters in the same way in their own work too. Unfortunately, this kind of unstructured, discovery approach overloads working memory. If only it were possible to pick up the rules of the apostrophe simply by reading a lot of books – the world would be a lovelier place, but it wouldn’t be the world.

So what is the answer? The approach with the best record of closing the knowing-doing gap is direct instruction. I will discuss one specific direct instruction programme here, Expressive Writing, as it is the one I know best. Expressive Writing aims to teach the basic rules of writing. Whilst it does introduce some rules and definitions, this is a small part of the programme. The bulk of the programme is made up of pupils doing a lot of writing, but the writing they do is very structured and carefully sequenced. The programme begins with the absolute basics: pupils are given a list of verbs in the present tense, and have to convert them to the past tense. Then they are given sentences that use those same verbs in the present tense, and they have to cross out the words and replace them with the past tense verb. Then pupils are given sentences with a blank which they have to fill in with either a past or present tense verb. As the programme continues, pupils are expected to do bigger and longer pieces of writing. Each activity is carefully sequenced so that pupils are never expected to learn too many new things at once, or take on board too many new ideas, and there is frequent repetition so that important rules are not just things that pupils ‘know’ – they are things that become habits.

To sum up, there is such a thing as the knowing-doing gap. Pupils can memorise a definition and not know what a word means, and they can memorise a rule and not know how to apply it. But this does not mean either that memorisation is redundant, or that discovery learning is better at producing true understanding. The way to close the knowing-doing gap is through memorisation and practice, but through memorisation and practice of the right things, broken down in the right ways. Expressive Writing offers one way of doing this for writing, and I think Isabel Beck’s work does something similar for vocabulary. In the next post, we’ll look at how this approach can be applied to the creation of formative and summative assessments.

Is all practice good?

This is part 3 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

I can remember having a conversation with a friend a few years ago about the value of memorisation and practice. I said how important it was for pupils to remember things and to practise using them. She disagreed: she was sick to death of reading cookie-cutter, spoon-fed coursework essays that all sounded exactly the same, and all sounded as though they had regurgitated the words she had said in class in the lesson. For her, practice and memorisation were killing the life of her subject.

I completely recognised the truth of what she was saying. I had marked my fair share of coursework essays and felt exactly the same thing. So why, if I agree that this kind of practice is so damaging, am I still defending practice and memorisation?

For me, it is all about what we ask pupils to remember and to practise. If we ask pupils to write down their teacher’s thoughts on the opening of Great Expectations, memorise what they have written, and reproduce it in the exam, then we are not helping them to understand Great Expectations or come up with their own views on it.

But what if we ask pupils to remember the meanings of Tier Two vocab words? What if we ask them to memorise key quotations from Great Expectations? What if we ask them to practise and practise using the apostrophe? Is that just as damaging? I would argue that on the contrary, this kind of practice is really useful.

Something similar is true of maths. If a teacher asked a pupil to memorise the answers to problems like 372 * 487 and 657 * 432, it would seem very odd. But if you ask pupils to memorise the times tables, that’s much more understandable.

Why? Why is some content worth memorising, and why is some content not worth memorising?

Dan Willingham puts forward some really useful guidelines here. For him, there are three types of content particularly worth remembering and practising.

  • The core skills and knowledge that will be used again and again.
  • The type of knowledge that students need to know well in the short term to enable long-term retention of key concepts. In this case, short-term overlearning is merited.
  • The type of knowledge we believe is important enough that students should remember it later in life.

I would add one thing to the first bullet point. The type of core skills and knowledge that will be used again and again are often the fundamental building blocks of a subject. They are things that have been broken down and decontextualized, like times tables or the sounds that different letters make. Memorising these can seem inauthentic. It feels more meaningful to plunge straight into real maths problems or real books. But the building blocks give you flexibility. If a pupil memorises a model essay on Great Expectations, that is of no use unless they are asked that one question. If they memorise 700 Tier Two words, those words have use in understanding Great Expectations, and in thousands of other contexts too.

One criticism of memorisation is that it drives out meaning. The next blog post will look at how we can make sure the things our pupils remember do have meaning.

Herbert Simon and evidence-based education

Who is Herbert Simon?

Herbert Simon was one of the great scholars of the twentieth century, whose discoveries and inventions ranged from political science (where he began his career) to economics (in which he won a Nobel Prize) to computer science (in which he was a pioneer) and to psychology.

Simon was one of the towering intellectual figures of the twentieth century. He wrote a classic on decision making in organizations while still in his twenties, and among many other achievements he went on to be one of the founders of the field of artificial intelligence, a leader in cognitive science, an influential student of the process of scientific discovery, a forerunner of behavioral economics and, almost incidentally, a Nobel laureate in economics.

Those quotations are both taken from Daniel Kahneman’s Thinking, Fast and Slow. Kahneman is himself a Nobel Laureate for his work on decision making. Kahneman goes on to say of Simon that he is

perhaps the only scholar who is recognized and admired as a hero and founding figure by all the competing clans and tribes in the study of decision making.

As well as the fields which Kahneman lists, Simon also made some contributions to education. Of particular significance for primary and secondary educators is this paper, which Simon wrote with John Anderson and Lynne Reder in 2000, shortly before he died in 2001. It is about mathematics education, but it has applications for all subjects. It is strikingly critical of some very popular educational practices and recommends other practices which frequently get a bad name. For example:

He criticises authentic, real-world learning tasks.

Contrary to the contention that knowledge can always be communicated best in complex learning situations, the evidence shows that: A learner who is having difficulty with components can easily be overwhelmed by the processing demands of a complex task. Further, to the extent that many components are well mastered, the student wastes much time repeating these mastered components to get an opportunity to practice the few components that need additional effort. There are reasons sometimes to practice skills in their complex setting. Some of the reasons are motivational and some reflect the skills that are unique to the complex situation. While it seems important both to motivation and to learning to practice skills from time to time in full context, this is not a reason to make this the principal mechanism of learning.

He defends drill, in the face of criticisms that it drives out understanding.

This criticism of practice (called “drill and kill,” as if this phrase constituted empirical evaluation) is prominent in constructivist writings. Nothing flies more in the face of the last 20 years of research than the assertion that practice is bad. All evidence, from the laboratory and from extensive case studies of professionals, indicates that real competence only comes with extensive practice.

He rejects discovery learning, and praises teacher instruction.

When, for whatever reason, students cannot construct the knowledge for themselves, they need some instruction. The argument that knowledge must be constructed is very similar to the earlier arguments that discovery learning is superior to direct instruction. In point of fact, there is very little positive evidence for discovery learning and it is often inferior (e.g., Charney, Reder & Kusbit, 1990). Discovery learning, even when successful in acquiring the desired construct, may take a great deal of valuable time that could have been spent practicing this construct if it had been instructed. Because most of the learning in discovery learning only takes place after the construct has been found, when the search is lengthy or unsuccessful, motivation commonly flags.

Simon is also critical of the state of education research.

New “theories” of education are introduced into schools every day (without labeling them as experiments) on the basis of their philosophical or common-sense plausibility but without genuine empirical support.

We see that influential schools have arisen, claiming a basis in cognitive psychology… but which have almost no grounding in cognitive theory and at least as little grounding in empirical fact. This is particularly grievous because we think information-processing psychology has a lot to offer to mathematics education.

So, for instance, in the 1993 draft of the NCTM assessment standard for school mathematics, we find condemnation of the “essentialist view of mathematical knowledge” which assumes “mathematics consists of an accumulation of mathematical concepts and skills” (p.12). We can only say we find frightening the prospect of mathematics education based on such a misconceived rejection of componential analysis.

He is also optimistic that the findings of cognitive psychology can offer a basis for a better understanding of teaching and learning.

Human beings have been learning, and have been teaching their offspring, since the dawn of our species. We have a reasonably powerful “folk medicine,” based on lecturing and reading and apprenticeship and tutoring, aided by such technology as paper and the blackboard–a folk medicine that does not demand much knowledge about what goes on in the human head during learning and that has not changed radically since schools first emerged. To go beyond these traditional techniques, we must follow the example of medicine and build (as we have been doing for the past thirty or forty years) a theory of the information processes that underlie skilled performance and skill acquisition: that is to say, we must have a theory of the ways in which knowledge is represented internally, and the ways in which such internal representations are acquired. In fact, cognitive psychology has now progressed a long way toward such a theory, and, as we have seen, a great deal is already known that can be applied, and is beginning to be applied, to improve learning processes.

Anyone working in the field of evidence-based education needs to consider Simon’s work and this article very seriously.

Comparative judgment: practical tips for in-school use


I have blogged before about comparative judgment and how it could help make marking more efficient and more reliable, and help to free the teaching of writing from tick box approaches. I think CJ has the potential to be used for national assessments – that’s why I’m working with Dr Chris Wheadon and the No More Marking team on a national experiment to moderate Key Stage 2 writing assessments using comparative judgement. However, whatever happens nationally there are ways you can use CJ within your own school. The No More Marking website allows you to use comparative judgment for free. The website is very easy to use and it is definitely worth a go if you are interested. Here are some suggestions based on what we have done at Ark, plus some practical tips.

Primary writing assessments

There are no interim frameworks outside Y2 & Y6, so using CJ for writing in Y1, 3, 4, & 5 feels as good a solution as any. If you want to try and measure progress across these years, you could get pupils to do a task now and judge it now. Then keep the tasks and use them again in a judging session at Christmas or this time next year and see how they compare to the same pupils’ work at that point in time. We haven’t done this yet but it feels like a really powerful way of showing pupils the progress they are making. I know lots of schools already keep portfolios of pupils’ work across time, so you could use these to start the process.

KS3 English assessments

I’ve personally found it easier to judge writing assessments than to judge literature essays. Others have said the same. We recently got some very high reliability scores when judging a set of allegories that had been written by our Y8 pupils.

KS4 English exams

We haven’t used CJ for KS4 tasks yet, but it would certainly be possible. You could try and judge entire exams, or just pick out individual questions. I feel that individual questions would be easier to judge, and that you would get more accurate results for them. I think you would also get more interesting discussions and feedback afterwards when sharing the results.

KS3 history assessments

CJ has worked just as well for us in history, although again, I found judging history essays to be harder and slightly more time consuming than judging writing tasks. We did some judging on essays on the Battle of Hastings. This is a classic Y7 task and it was interesting for me to see the different ways different teachers had approached it.

And here are some general practical tips:

  • Go to No More Marking, set up an account for free, and then, on the dashboard, create a new task.
  • You will need to upload all the scripts from your pupils. You can upload them as a jpg or pdf. If the pupil work is on paper, you’ll need to scan it – if you have a copier with this facility that shouldn’t be too difficult. The slightly fiddly bit is making sure every separate pdf or jpg has a pupil name or identifier in the title. This means that when you get the results, you will be able to easily see which pupil has which mark. Alternatively, you can use the QR coded answer sheets that No More Marking provides. With these sheets, the site automatically recognises which pupil is which from the code on the scan and matches them to their results.
  • How many judgments do you want per script? If reliability is very important, you will want 10 judgments per script. If it is less important, you can get away with 5. It can feel nice to aim high and try and get a reliability score of 0.85 or more, but there are a couple of things to consider. First, what type of reliability are you getting at the moment without comparative judgment? You probably don’t even know or have a way of finding out. So if you can get a reliability score of 0.75 from doing 5 judgments, that’s more than likely to be an improvement on what you are doing currently. You might be able to get up to 0.9 by doubling the judgments, but you will need to consider whether it is worth doubling the amount of time. It will depend on what you are using the results for. I am starting to think that in some cases, doing 5 judgments per script as a quick sift and then meeting as a team to discuss the results and set standards might be the best way forward.
  • Do you want to do the judging together as a group, or send out the links to people? To begin with, I’ve found it quite powerful to have people in the same room doing the judging. Whatever you choose to do, I also think it is worth having a group follow-up session where you discuss the scripts and think about why certain scripts were better than others, and what the teaching and learning implications are. As I have said before, the two immediate benefits of CJ are that it saves time and it is more reliable. But the longer term benefit is freeing teaching from the tick box. If you don’t meet afterwards to discuss the results and implications, then you are making it harder to achieve that.
  • Do you want to include exemplars or not? These can make it easier to apply the standards or grades once you have the results. But I would only use them if you are sure they are at a certain standard. Also, be careful – if you are putting in a script that you feel sure is a C grade, do you think it is a top or bottom C? The C grade covers a wide range of performance, so you need to be sure.
  • I would recommend trying to get scripts from more than one class to begin with (if possible even more than one school). One of the nice things about the CJ tasks we have done is how they make it quick and easy for teachers to see how other teachers and pupils have attempted similar tasks.
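To make the trade-off between 5 and 10 judgments per script a bit more concrete, here is a rough sketch of the arithmetic in Python. It is only an illustration under some assumptions of mine: that each judgment compares two scripts and so counts towards both of them, that the judging is shared equally between judges, and that a judgment takes about 30 seconds. The function name, the 120 scripts, the 6 judges and the 30 seconds are all hypothetical; none of this comes from No More Marking.

```python
import math


def judging_workload(num_scripts, judgments_per_script, num_judges,
                     seconds_per_judgment=30.0):
    """Estimate the judging effort for one comparative judgment task.

    Assumption: every judgment compares two scripts, so it counts once
    towards each script in the pair. The time per judgment is a guess.
    """
    # Total pairwise judgments needed across the whole task.
    total_judgments = math.ceil(num_scripts * judgments_per_script / 2)
    # Share the judgments equally between the judges, rounding up.
    per_judge = math.ceil(total_judgments / num_judges)
    # Convert each judge's share into minutes.
    minutes_per_judge = per_judge * seconds_per_judgment / 60
    return total_judgments, per_judge, minutes_per_judge


# Compare a quick sift (5 per script) with a more reliable run (10 per script).
for k in (5, 10):
    total, per_judge, minutes = judging_workload(
        num_scripts=120, judgments_per_script=k, num_judges=6
    )
    print(f"{k} judgments per script: {total} judgments in total, "
          f"{per_judge} per judge, about {minutes:.0f} minutes each")
```

On these made-up numbers, doubling the judgments per script doubles each teacher's judging time, which is the cost you would be weighing against any gain in reliability.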

Ouroboros by Greg Ashman

I’m a bit late to this, but I just wanted to write about how much I enjoyed Ouroboros by Greg Ashman. It’s a very elegantly and sparely written account of Greg’s experiences of teaching in England and Australia, and of the education research which is relevant to his experiences. The central organising metaphor is the ouroboros, ‘an ancient symbol of a snake or dragon that is consuming its own tail.’ Ouroboros can be ‘a vicious metaphor to represent the antithesis of progress – we cannot move forward if we are going round and round. Moreover, Ouroboros adds something to the cycle. It represents the reinvention of old ideas as new ideas. Again and again.’

I found this metaphor very helpful when thinking about modern education. It is so demoralising to see the number of fads that get warmed over and served up as new. And the great fear is not only that bad ideas persist. Even worse, the constant recycling of bad ideas prevents the adoption of new ones, and makes teachers understandably cynical and mistrusting of innovation in general, even though real innovation is what we desperately need to break out of this cycle.

But ouroboros can be a more positive metaphor. ‘We can also view Ouroboros as a virtuous metaphor; a feed-back loop with information flowing from the effect back to the cause. When we teach, we do not speak into the void.’

Greg thinks that this kind of feedback loop is at the heart of good teaching. However, he also notes that attempts to promote the use of feedback over the last decade or so, under the name of Assessment for Learning, have led to disillusion. ‘In U.K. schools, formative assessment followed an unfortunate trajectory that hollowed-out much of the original purpose and has therefore left many teachers quite jaded.’ However, as he notes, ‘the basic principle is sound.’ And there is much good advice in this book about how to rescue the sound principles of formative assessment from the ‘bureaucratic barnacles’ that have grown up around it.

Highly recommended.