How do bad ideas about assessment lead to workload problems?

This is part 7 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

Bad ideas can cause workload problems. If you have a flawed understanding of how a system works, the temptation is to work harder to try and make the system work, rather than to look at the deeper reasons why it isn’t working.

The DfE run a regular teacher diary survey. In the 2010 survey, primary teachers recorded spending 5 hours per week on assessment. By 2013, they were spending 10 hours per week on it. Confusion and misperceptions around assessment are creating a lot of extra work – but there is no evidence that the extra hours are providing any real benefit.

So what are the bad assessment ideas which are creating workload but not generating any improvements? Here are a few.

Over-reliance on prose descriptors when grading work
Like a lot of teachers, I used to really dislike marking. But whenever I stopped to think about it, I realised that I actually really liked reading pupils’ work. It was the process of sitting there with the mark scheme, trying to work out a grade and construct feedback from it, that I disliked. And it turns out there is a good reason for that: the human mind is not good at making these kinds of absolute judgement. The result is miserable teachers and not very accurate grades. There is a better way: comparative judgement.
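Comparative judgement replaces the absolute judgement (‘what level is this piece of work?’) with a relative one (‘which of these two pieces of work is better?’), which the human mind is far better at, and then uses a statistical model to turn many such decisions into a measurement scale. As a rough illustration only – the scripts and judge decisions below are invented, and this isn’t code from the book or from any particular comparative judgement tool – here is a minimal Python sketch that fits a simple Bradley-Terry model to a handful of pairwise judgements:

```python
# Minimal comparative judgement sketch: invented judgements, simple
# Bradley-Terry fit. Each judgement records (winner, loser) from one
# "which is better?" comparison between two pupil scripts.
from collections import defaultdict

judgements = [
    ("script_A", "script_B"), ("script_A", "script_C"),
    ("script_B", "script_C"), ("script_C", "script_D"),
    ("script_B", "script_D"), ("script_A", "script_D"),
]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(int)         # comparisons won by each script
pair_counts = defaultdict(int)  # comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pair_counts[frozenset((winner, loser))] += 1

# Iteratively estimate a 'quality' value for each script: the higher the
# value, the more often the script wins, especially against strong rivals.
quality = {s: 1.0 for s in scripts}
for _ in range(100):
    updated = {}
    for s in scripts:
        denom = sum(
            pair_counts[frozenset((s, t))] / (quality[s] + quality[t])
            for t in scripts
            if t != s and frozenset((s, t)) in pair_counts
        )
        updated[s] = wins[s] / denom if denom else quality[s]
    total = sum(updated.values())
    quality = {s: v / total for s, v in updated.items()}

# Rank the scripts from best to worst on the fitted scale.
for s in sorted(scripts, key=quality.get, reverse=True):
    print(f"{s}: {quality[s]:.3f}")
```

The detail of the model matters less than the shape of the task: the judge only ever makes a quick relative decision, and the statistics, rather than the judge, produce the final rank order.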

Over-reliance on prose descriptors when giving feedback
Prose descriptors are equally unhelpful for giving feedback. A lot of the guidance that comes with descriptors recommends using the language of the descriptors with pupils, or at least using ‘pupil-friendly’ variations of the descriptor. The result is that teachers end up writing out whole paragraphs at the end of a pupil’s piece of work: ‘Well done: you’ve displayed an emerging knowledge of the past, but in order to improve, you need to develop your knowledge of the past.’

These kinds of comments are not very useful as feedback: they may be accurate, but they are not helpful. How is a pupil supposed to respond to them? As Dylan Wiliam says, feedback like this is like telling an unsuccessful comedian that they need to be funnier.

I like the approach being pioneered by a few schools which involves reading a class’s responses, identifying the aspects they all struggled with, and reteaching those in the next lesson. If this response is recorded on a simple proforma, that can hopefully suffice for accountability purposes too.

Mistrust of short-answer questions and MCQs
Short-answer questions and multiple-choice questions (MCQs) can’t assess everything, clearly. But they can do some things really well, and they also have the bonus of being very easy to mark. A good multiple-choice question is not easy to write, to be fair. But once you have written it, you can use it again and again with limited effort, and you can use MCQs that have been created by others too. Unlike feedback based on prose descriptors, if you use MCQs to give feedback then pupils can actively do something helpful in response.
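To make the ‘very easy to mark’ point concrete, here is a small sketch in which the quiz, answer key and pupil responses are all invented: it marks a set of MCQ responses automatically and then uses the same data formatively, flagging the questions most of the class got wrong as candidates for reteaching next lesson (the approach described under the previous heading).

```python
# Hypothetical four-question quiz: the answer key and responses are invented.
answer_key = {"Q1": "B", "Q2": "D", "Q3": "A", "Q4": "C"}

responses = {
    "Aisha": {"Q1": "B", "Q2": "D", "Q3": "C", "Q4": "C"},
    "Ben":   {"Q1": "B", "Q2": "A", "Q3": "C", "Q4": "C"},
    "Chloe": {"Q1": "A", "Q2": "D", "Q3": "C", "Q4": "C"},
}

# Marking each pupil takes one line per pupil.
for pupil, answers in responses.items():
    score = sum(answers.get(q) == correct for q, correct in answer_key.items())
    print(f"{pupil}: {score}/{len(answer_key)}")

# Item analysis: what proportion of the class got each question right?
# Questions below an (arbitrary) threshold go on the reteaching list.
for q, correct in answer_key.items():
    facility = sum(r.get(q) == correct for r in responses.values()) / len(responses)
    if facility < 0.5:
        print(f"Reteach {q}: only {facility:.0%} of the class answered it correctly")
```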

How can we measure progress in lessons?

This is part 6 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

With national curriculum levels, it was possible to use the same system of measurement in exams as in individual lessons.

For example, national curriculum tests at the end of years 2 and 6 were measured using national curriculum levels. But you could also use NC levels to measure progress in individual lessons and at the end of terms. So you could have a task at the end of a lesson, and then tell pupils that in order to be a level 4a, they would need to perform in a certain way on it; to be a 5c, they would need to reach a certain standard; and so on.

You can see the attraction of this approach: it is coherent, because you are always using and talking about the same grades. It’s also great for accountability. When Ofsted call, you can offer them ‘real-time’ assessment data based on performance from the most recent lesson.

However, in practice this system led to confusion. Pupils might be able to perform at a certain level at the end of a particular lesson. But when they came to sit their test at the end of the unit or the end of the year, they might not be at that level. As Rob Coe says here, levels started to take on very different meanings depending on how they were being used. Far from providing a coherent and unified system, levels were providing the illusion of a coherent system: everyone was talking about the same thing, but meaning something very different.

So what is the answer? I don’t think exam grades can be used to measure progress in lessons. What happens in the lesson is an ‘input’, if you like, and what happens in the exam is an ‘output’. It makes no sense to try to measure both on the same scale. Here is an analogy: we know that if you eat more, you put on weight. But we don’t measure food intake and weight output with the same scale, even though we know there is a link between them. We measure food with calories, and weight with kilograms. Similarly, we have to record what happens in lessons in a different way to what happens in the final assessment.

If you do try to measure activities in an individual lesson with the same scale as the final exam grade, then I think one of two things can happen. The first is that you use the classroom activities which are most suited to learning and formative assessment: for example, in English you might use a spelling test. Activities like spelling tests are not well suited to generating a grade, so the grade you get from them is very inaccurate, and causes a lot of confusion. The second is that you start to alter all of the activities you do in class so that they more closely resemble exam tasks. So you get rid of the spelling test, and get pupils to do a piece of extended writing instead. This makes it more likely (although not certain) that the grades you get in class will be accurate. But it means that you are now hugely restricted in the types of activities you can do in class. You have effectively turned every lesson into a summative assessment.

We should record in-lesson progress in the way that is most suitable for the tasks we want to use. And lots of very useful activities simply cannot be recorded as a grade or a fraction of a grade.

What makes a good formative assessment?

This is part 5 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

In the last two blog posts – here and here – I’ve spoken about the importance of breaking down complex skills into smaller pieces. This has huge implications for formative assessments, where the aim is to improve a pupil’s performance, not just to measure it.

Although we typically speak of ‘formative assessment’ and ‘summative assessment’, actually, the same assessment can be used for both formative and summative purposes. What matters is how the information from an assessment is used. A test can be designed to give a pupil a grade, but a teacher can use the information from individual questions on the test paper to diagnose a pupil’s weaknesses and decide what work to give them next. In this case, the teacher is taking an assessment that has been designed for summative purposes, but using it formatively.
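As a concrete illustration of reusing a summative test formatively – with entirely invented question tags and marks – the sketch below regroups the same marks that produce the overall grade by the topic each question assesses, to suggest what each pupil should work on next.

```python
# Hypothetical summative paper: each question tagged with a topic and a maximum mark.
question_topics = {"Q1": "fractions", "Q2": "fractions", "Q3": "ratio", "Q4": "algebra"}
max_marks = {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3}

# Invented marks per pupil per question (the same data used to award the grade).
pupil_marks = {
    "Dan":  {"Q1": 2, "Q2": 1, "Q3": 4, "Q4": 0},
    "Ella": {"Q1": 1, "Q2": 3, "Q3": 2, "Q4": 3},
}

# Regroup the marks by topic and flag each pupil's weakest area.
for pupil, marks in pupil_marks.items():
    gained, possible = {}, {}
    for q, topic in question_topics.items():
        gained[topic] = gained.get(topic, 0) + marks.get(q, 0)
        possible[topic] = possible.get(topic, 0) + max_marks[q]
    weakest = min(gained, key=lambda t: gained[t] / possible[t])
    summary = ", ".join(f"{t} {gained[t]}/{possible[t]}" for t in gained)
    print(f"{pupil}: {summary} -> work on {weakest} next")
```

The grade comes from the total; the formative use comes from the breakdown.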

Whilst it is possible to reuse assessments in this way, it is also true that some types of assessment are simply better suited for formative purposes than others. Because complex skills can be broken down into smaller pieces, there is great value in designing assessments which try to capture progress against these smaller units.

However, too often, formative assessments are simply mini-summative assessments – tasks that are really similar in style and substance to the final summative task, the only difference being that they have been slightly reduced in size. So, for example, if the final assessment is a full essay on the causes of the First World War, the formative assessment is one paragraph on how the assassination of Franz Ferdinand contributed to the start of the war. If the final summative assessment is an essay analysing the character of Bill Sikes, the formative assessment is an essay analysing Fagin. The idea is that the comments and improvements a teacher gives pupils on the formative essay will help them improve for the summative essay.

But I would argue that in order to improve at a complex task, sometimes we need to practise other types of task. Here is Dylan Wiliam commenting on this, in the context of baseball.

The coach has to design a series of activities that will move athletes from their current state to the goal state. Often coaches will take a complex activity, such as the double play in baseball, and break it down into a series of components, each of which needs to be practised until fluency is reached, and then the components are assembled together. Not only does the coach have a clear notion of quality (the well-executed double play), he also understands the anatomy of quality; he is able to see the high-quality performance as being composed of a series of elements that can be broken down into a developmental sequence for the athlete. (Embedded Formative Assessment, p.122)

Wiliam calls this series of activities ‘a model of progression’. When you break a complex activity down into a series of components, what you end up with often doesn’t look like the final activity. When you break down the skill of writing an essay into its constituent parts, what you end up with doesn’t look like an essay. I wrote about this five years or so ago, setting out some of the activities that I felt could help pupils become good writers.

Once we’ve established a model of progression in a subject, then we can think about how to measure progress – and measuring progress is what the next post will be about.

How can we close the knowing-doing gap?

This is part 4 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

One frequent criticism of memorisation is that it doesn’t lead to understanding. For example, a pupil can memorise a rule of grammar, or a definition of a word, but still have no idea how to use the rule or the word in practice. This is a real problem. I would say almost every pupil I have ever taught knew that a sentence began with a capital letter. They ‘knew’ this in the sense that if I asked them ‘what does a sentence begin with?’ they would instantly respond ‘a capital letter’. Yet many, many fewer of them would reliably begin every sentence they wrote with a capital letter. This is a classic example of a knowing-doing gap: a pupil knows something, but doesn’t do it.

Frequently, I see people using examples like this one to prove that explicit instruction and teaching knowledge do not work, and to argue that we should use more authentic, real-world teaching practices instead. The argument goes that if we just ask pupils to read lots of well-written books and articles, and do lots of writing, they will implicitly understand that sentences begin with a capital letter, and use capital letters in the same way in their own work too. Unfortunately, this kind of unstructured, discovery approach overloads working memory. If only it were possible to pick up the rules of the apostrophe simply by reading a lot of books – the world would be a lovelier place, but it wouldn’t be the world we live in.

So what is the answer? The approach with the best record of closing the knowing-doing gap is direct instruction. I will discuss one specific direct instruction programme here, Expressive Writing, as it is the one I know best. Expressive Writing aims to teach the basic rules of writing. Whilst it does introduce some rules and definitions, these are a small part of the programme. The bulk of the programme is made up of pupils doing a lot of writing, but the writing they do is very structured and carefully sequenced. The programme begins with the absolute basics: pupils are given a list of verbs in the present tense and have to convert them to the past tense. Then they are given sentences that use those same verbs in the present tense, and they have to cross out the verbs and replace them with the past-tense forms. Then pupils are given sentences with a blank, which they have to fill in with either a past- or present-tense verb. As the programme continues, pupils are expected to produce longer pieces of writing. Each activity is carefully sequenced so that pupils are never expected to take on too many new ideas at once, and there is frequent repetition so that important rules are not just things that pupils ‘know’ – they are things that become habits.

To sum up, there is such a thing as the knowing-doing gap. Pupils can memorise a definition and not know what a word means, and they can memorise a rule and not know how to apply it. But this does not mean either that memorisation is redundant, or that discovery learning is better at producing true understanding. The way to close the knowing-doing gap is through memorisation and practice, but through memorisation and practice of the right things, broken down in the right ways. Expressive Writing offers one way of doing this for writing, and I think Isabel Beck’s work does something similar for vocabulary. In the next post, we’ll look at how this approach can be applied to the creation of formative and summative assessments.

Is all practice good?

This is part 3 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

I can remember having a conversation with a friend a few years ago about the value of memorisation and practice. I said how important it was for pupils to remember things and to practise using them. She disagreed: she was sick to death of reading cookie-cutter, spoon-fed coursework essays that all sounded exactly the same, as though the pupils had simply regurgitated the words she had said in class. For her, practice and memorisation were killing the life of her subject.

I completely recognised the truth of what she was saying. I had marked my fair share of coursework essays and felt exactly the same. So why, if I agree that this kind of practice is so damaging, am I still defending practice and memorisation?

For me, it is all about what we ask pupils to remember and to practise. If we ask pupils to write down their teacher’s thoughts on the opening of Great Expectations, memorise what they have written, and reproduce it in the exam, then we are not helping them to understand Great Expectations or come up with their own views on it.

But what if we ask pupils to remember the meanings of Tier Two vocab words? What if we ask them to memorise key quotations from Great Expectations? What if we ask them to practise and practise using the apostrophe? Is that just as damaging? I would argue that on the contrary, this kind of practice is really useful.

Something similar is true of maths. If a teacher asked a pupil to memorise the answers to problems like 372 × 487 and 657 × 432, it would seem very odd. But asking pupils to memorise their times tables is much more understandable.

Why? Why is some content worth memorising, and why is some content not worth memorising?

Dan Willingham puts forward some really useful guidelines here. For him, there are three types of content particularly worth remembering and practising.

  • The core skills and knowledge that will be used again and again.
  • The type of knowledge that students need to know well in the short term to enable long-term retention of key concepts. In this case, short-term overlearning is merited.
  • The type of knowledge we believe is important enough that students should remember it later in life.

I would add one thing to the first bullet point. The core skills and knowledge that will be used again and again are often the fundamental building blocks of a subject. They are things that have been broken down and decontextualised, like times tables or the sounds that different letters make. Memorising these can seem inauthentic. It feels more meaningful to plunge straight into real maths problems or real books. But the building blocks give you flexibility. If a pupil memorises a model essay on Great Expectations, it is of no use unless they are asked that one question. If they memorise 700 Tier Two words, those words are useful for understanding Great Expectations, and in thousands of other contexts too.

One criticism of memorisation is that it drives out meaning. The next blog post will look at how we can make sure the things our pupils remember do have meaning.

Teaching knowledge or teaching to the test?

This is part 2 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

For many people, teaching knowledge, teaching to the test and direct, teacher-led instruction are one and the same thing. Here is Fran Abrams from BBC Radio 4’s Analysis programme making this argument.

In fact, there’s been an increasing focus on knowledge, as English schools have become ever more exam driven.

And also Tom Sherrington, who writes the Teacher Head blog.

If anything, we have a strong orientation towards exam preparation; exams are not as content free as some people suggest.

Teaching knowledge and teaching to the test are seen as similar things – but what I want to argue is that they’re actually very different.

I think teaching knowledge and direct teacher instruction are good things – but that teaching to the test is a really bad idea. I also think, perhaps slightly counter-intuitively, that teaching to the test is more likely to happen when you don’t focus on teaching knowledge. It’s when you try and teach generic skills that you end up teaching to the test.

First of all, what is teaching to the test and why is it bad? I’ve written at length about this here, but briefly, teaching to the test is bad because no test in the world can directly measure everything we want pupils to know and be able to do. Instead, tests select a smaller sample of material and use that to make an inference about everything else. If we focus teaching on the small sample, two bad things happen. One, the results a pupil gets are no longer a valid guide to their attainment in the subject. Two, we stop teaching important things that aren’t on the test, and start teaching peripheral things that are on the test. My favourite example of this comes from history. A popular exam textbook on inter-war Germany doesn’t mention Bismarck, and barely mentions Kaiser Wilhelm II. It does have lengthy sections on how to answer the 4-mark and 8-mark questions. That’s teaching to the test.
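The sampling point can be made concrete with a toy simulation (the numbers are invented and have nothing to do with any real exam). Imagine a subject made up of 100 topics and a test that samples 10 of them. One pupil is taught 60 topics spread across the whole subject; another is taught only the 10 topics that happen to be on the test.

```python
# Toy simulation of why teaching to the test breaks the test's inference.
import random

random.seed(0)
domain = list(range(100))                # 100 topics in the whole subject
test_topics = random.sample(domain, 10)  # the test samples 10 of them

pupil_broad = set(random.sample(domain, 60))  # taught 60 topics across the subject
pupil_ttt = set(test_topics)                  # taught only what is on the test

def test_score(known):
    """Proportion of the sampled test topics the pupil has been taught."""
    return sum(t in known for t in test_topics) / len(test_topics)

def domain_mastery(known):
    """Proportion of the whole subject the pupil has been taught."""
    return len(known) / len(domain)

for name, known in [("broad teaching", pupil_broad), ("taught to the test", pupil_ttt)]:
    print(f"{name}: test score {test_score(known):.0%}, "
          f"true subject coverage {domain_mastery(known):.0%}")
```

The second pupil’s test score looks excellent, but it no longer supports any inference about the other 90 topics – which is exactly the validity problem described above.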

Direct instruction and teaching knowledge are very different from this. Direct instruction is about breaking a skill down into its smallest components, and getting pupils to practise them. Teaching knowledge is about identifying the really important knowledge pupils need to understand the world they live in, and teaching that.

A knowledge-based approach to teaching inter-war Germany would teach lots of key dates, facts and figures, not just about inter-war Germany but also about, for example, the growth of nationalism in 19th-century Europe.

One possible difficulty with the knowledge-based, direct instruction approach is identifying what knowledge you should teach, and how you should break down complex skills. For example, I’ve said that to understand inter-war Germany, you should teach 19th-century Europe and Bismarck – but am I right? How do you decide what content you need? And given that we presumably expect pupils to be able to write historical essays, surely some direct instruction in the 4-mark question, say, is valuable? This question – what should we expect pupils to memorise? – is the subject of the next post.

Why didn’t Assessment for Learning transform our schools?

This is part 1 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

Giving feedback works. There is an enormous amount of evidence that shows this, much of it summarised in Black and Wiliam’s Inside the Black Box. The importance of giving feedback was the rationale behind the government-sponsored initiative of Assessment for Learning, or AfL. Yet, nearly twenty years after the publication of Inside the Black Box, and despite English teachers saying they give more feedback to pupils than teachers in nearly every comparable country, most metrics show that English education has not improved much over the same period. Dylan Wiliam himself has said that ‘there are very few schools where all the principles of AfL, as I understand them, are being implemented effectively’.

How has this happened?

My argument is that what matters is not just the act of giving feedback, but the type and quality of the feedback. You can give all the feedback you like, but if it doesn’t help pupils to improve, it doesn’t matter. And over the past twenty years or so, the feedback teachers were encouraged to give was based on a faulty idea of how pupils learn: the idea that pupils can learn generic skills.

National curriculum levels, the Assessing Pupil Progress grids, the interim frameworks and various ‘level ladders’ were all based on the assumption that there are generic skills of analysis, problem-solving, inference, mathematical awareness, scientific thinking and so on that can be taught and improved on. In these systems, all the feedback pupils get is generic. Teachers were encouraged to use the language of the level descriptors to give feedback, meaning that pupils got abstract and generic comments like: ‘you need to develop explanation of inferred meanings drawing on evidence across the text’ or ‘you need to identify more features of the writer’s use of language’.

Unfortunately, we know that skill is not something that can be taught in the abstract. We all know people who are good readers, but their ability to read and infer is not an abstract skill: it is dependent on knowledge of vocabulary and background information about the text.

What this means is that whilst statements like ‘you need to identify more features of the writer’s use of language’ might be an accurate description of a pupil’s performance, these statements are not actually going to help them improve. What if the pupil didn’t know any features to begin with? What if the features they knew weren’t present in this text?

Generic feedback is descriptive, not analytic. It’s accurate, but it isn’t helpful. It tells pupils how they are doing, but it does not tell them how to get better. For that, they need something much more specific and curriculum-linked. In fact, in order to give pupils more helpful feedback, they need to do more helpful, specific and diagnostic tasks. If you try to teach generic skills, and only give generic feedback, you will end up always having to use assessments that have been designed for summative purposes. That is, you will end up over-testing and teaching to the test.

Teaching to the test, and the vexed question of whether it is a good or a bad thing, will be the subject of the next post.