Teaching knowledge or teaching to the test?

This is part 2 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

For many people, teaching knowledge, teaching to the test and direct, teacher-led instruction are one and the same thing. Here is Fran Abrams from BBC Radio 4’s Analysis programme making this argument.

In fact, there’s been an increasing focus on knowledge, as English schools have become ever more exam driven.

And also Tom Sherrington, head teacher of Highbury Grove school.

If anything, we have a strong orientation towards exam preparation; exams are not as content free as some people suggest.

Teaching knowledge and teaching to the test are seen as similar things – but what I want to argue is that they’re actually very different.

I think teaching knowledge and direct teacher instruction are good things – but that teaching to the test is a really bad idea. I also think, perhaps slightly counter-intuitively, that teaching to the test is more likely to happen when you don’t focus on teaching knowledge. It’s when you try and teach generic skills that you end up teaching to the test.

First of all, what is teaching to the test and why is it bad? I’ve written at length about this here, but briefly, teaching to the test is bad because no test in the world can directly measure everything we want pupils to know and be able to do. Instead, tests select a smaller sample of material and use that to make an inference about everything else. If we focus teaching on the small sample, two bad things happen. One, the results a pupil gets are no longer a valid guide to their attainment in that subject. Two, we stop teaching important things that aren’t on the test, and start teaching peripheral things that are on the test. My favourite example of this is a history one. A popular exam textbook on interwar Germany doesn’t mention Bismarck, and barely mentions Kaiser Wilhelm II. It does have lengthy sections on how to answer the 4-mark and 8-mark questions. That’s teaching to the test.

Direct instruction and teaching knowledge are very different from this. Direct instruction is about breaking a skill down into its smallest components, and getting pupils to practise them. Teaching knowledge is about identifying the really important knowledge pupils need to understand the world they live in, and teaching that.

A knowledge-based approach to teaching inter-war Germany would teach lots of key dates, facts and figures not just about inter-war Germany, but also about, for example, the growth of nationalism in 19th century Europe.

One possible difficulty with the knowledge-based, direct instruction approach is identifying what knowledge you should teach, and in what way you should break down complex skills. For example, I’ve said that to understand inter-war Germany, you should teach 19th century Europe and Bismarck – but am I right? How do you decide what content you need? And given that we presumably expect pupils to be able to write historical essays, surely some direct instruction in the 4-mark question, say, is valuable? This question – what should we expect pupils to memorise – is the subject of the next post.

Why didn’t Assessment for Learning transform our schools?

This is part 1 of a series of blogs on my new book, Making Good Progress?: The future of Assessment for Learning. Click here to read the introduction to the series.

Giving feedback works. There is an enormous amount of evidence that shows this, much of it summarised in Black and Wiliam’s Inside the Black Box. The importance of giving feedback was the rationale behind the government-sponsored initiative of Assessment for Learning, or AfL. Yet, nearly twenty years after the publication of Inside the Black Box, and despite English teachers saying they give more feedback to pupils than teachers in nearly every comparable country, most metrics show that English education has not improved much over the same period. Dylan Wiliam himself has said that ‘there are very few schools where all the principles of AfL, as I understand them, are being implemented effectively’.

How has this happened?

My argument is that what matters is not just the act of giving feedback, but the type and quality of the feedback. You can give all the feedback you like, but if it doesn’t help pupils to improve, it doesn’t matter. And over the past twenty years or so, the feedback teachers were encouraged to give was based on a faulty idea of how pupils learn: the idea that pupils can learn generic skills.

National curriculum levels, the assessing pupil progress grids, the interim frameworks and various ‘level ladders’ are all based on the assumption that there are generic skills of analysis, problem-solving, inference, mathematical awareness and scientific thinking, etc., that can be taught and improved on. In these systems, all the feedback pupils get is generic. Teachers were encouraged to use the language of the level descriptors to give feedback, meaning that pupils got abstract and generic comments like: ‘you need to develop explanation of inferred meanings drawing on evidence across the text’ or ‘you need to identify more features of the writer’s use of language’.

Unfortunately, we know that skill is not something that can be taught in the abstract. We all know people who are good readers, but their ability to read and infer is not an abstract skill: it is dependent on knowledge of vocabulary and background information about the text.

What this means is that whilst statements like ‘you need to identify more features of the writer’s use of language’ might be an accurate description of a pupil’s performance, these statements are not actually going to help them improve. What if the pupil didn’t know any features to begin with? What if the features they knew weren’t present in this text?

Generic feedback is descriptive, not analytic. It’s accurate, but it isn’t helpful. It tells pupils how they are doing, but it does not tell them how to get better. For that, they need something much more specific and curriculum-linked. In fact, if pupils are to get more helpful feedback, they need to do more helpful, specific and diagnostic tasks. If you try to teach generic skills, and only give generic feedback, you will end up always having to use assessments that have been designed for summative purposes. That is, you will end up over-testing and teaching to the test.

Teaching to the test, and the vexed question of whether it is a good or a bad thing, will be the subject of the next post.

Making Good Progress?: The future of Assessment for Learning

In February, my second book is going to be published by Oxford University Press. It’s called Making Good Progress?: The future of Assessment for Learning. 

It is the assessment follow-up to my first book, Seven Myths about Education, which was about education more generally. In Seven Myths about Education, I argued that a set of flawed ideas had become dominant in education even though there was little evidence to back them up. Broadly speaking, I argued that knowledge and teacher-led instruction had been given an undeserved bad reputation, and that the research evidence showed that knowledge, practice and direct instruction were more likely to lead to success than discovery and project-based learning.

The hardest questions I had to answer about the book were from people who really liked these ideas, and wanted to know how they could create an assessment system which supported them.  Certain kinds of activities, lessons and assessment tasks simply didn’t work with national curriculum levels. For example, discrete grammar lessons, vocabulary quizzes, multiple choice questions, and historical narratives were hard, if not impossible, to assess using national curriculum levels. Many schools required every lesson, or every few lessons, to end with an activity which gave pupils a level: e.g., at the end of this lesson, to be a level 4a, you need to have done x, to be a 5c, you need to have done y, to be a 5b, you need to have done z. This type of lesson structure had become so dominant as to feel completely natural and inevitable. But actually, it was the product of a specific set of questionable beliefs about assessment, and it imposed huge restrictions on what you could teach. In short, the assessment system was exerting a damaging influence on the curriculum, and that influence was all the more damaging for being practically invisible.

Over the last four years, in my work at Ark Schools, I have been lucky enough to have the time to think about these issues in depth, and to work on them with some great colleagues. Making Good Progress is a summary of what I have learnt in that time. It isn’t a manual about one particular assessment system. But it does contain all the research and ideas I wish I had known about when I first started thinking about this. In the next seven blog posts, I will outline a few brief summaries of some of the ideas it contains. Here they are.

  1. Why didn’t AfL transform our schools?
  2. Teaching knowledge or teaching to the test?
  3. What makes a good formative assessment?
  4. Is all drill good?
  5. How can we bridge the knowing-doing gap?
  6. How can we measure progress in individual lessons?
  7. How do bad ideas about assessment lead to workload problems?

Herbert Simon and evidence-based education

Who is Herbert Simon?

Herbert Simon was one of the great scholars of the twentieth century, whose discoveries and inventions ranged from political science (where he began his career) to economics (in which he won a Nobel Prize) to computer science (in which he was a pioneer) and to psychology.

Simon was one of the towering intellectual figures of the twentieth century. He wrote a classic on decision making in organizations while still in his twenties, and among many other achievements he went on to be one of the founders of the field of artificial intelligence, a leader in cognitive science, an influential student of the process of scientific discovery, a forerunner of behavioral economics and, almost incidentally, a Nobel laureate in economics.

Those quotations are both taken from Daniel Kahneman’s Thinking, Fast and Slow. Kahneman is himself a Nobel Laureate for his work on decision making. Kahneman goes on to say of Simon that he is

perhaps the only scholar who is recognized and admired as a hero and founding figure by all the competing clans and tribes in the study of decision making.

As well as the fields which Kahneman lists, Simon also made some contributions to education. Of particular significance for primary and secondary educators is this paper, which Simon wrote with John Anderson and Lynne Reder in 2000, shortly before he died in 2001. It is about mathematics education, but it has applications for all subjects. It is strikingly critical of some very popular educational practices and recommends other practices which frequently get a bad name. For example:

He criticises authentic, real-world learning tasks.

Contrary to the contention that knowledge can always be communicated best in complex learning situations, the evidence shows that: A learner who is having difficulty with components can easily be overwhelmed by the processing demands of a complex task. Further, to the extent that many components are well mastered, the student wastes much time repeating these mastered components to get an opportunity to practice the few components that need additional effort. There are reasons sometimes to practice skills in their complex setting. Some of the reasons are motivational and some reflect the skills that are unique to the complex situation. While it seems important both to motivation and to learning to practice skills from time to time in full context, this is not a reason to make this the principal mechanism of learning.

He defends drill, in the face of criticisms that it drives out understanding.

This criticism of practice (called “drill and kill,” as if this phrase constituted empirical evaluation) is prominent in constructivist writings. Nothing flies more in the face of the last 20 years of research than the assertion that practice is bad. All evidence, from the laboratory and from extensive case studies of professionals, indicates that real competence only comes with extensive practice.

He rejects discovery learning, and praises teacher instruction.

When, for whatever reason, students cannot construct the knowledge for themselves, they need some instruction. The argument that knowledge must be constructed is very similar to the earlier arguments that discovery learning is superior to direct instruction. In point of fact, there is very little positive evidence for discovery learning and it is often inferior (e.g., Charney, Reder & Kusbit, 1990). Discovery learning, even when successful in acquiring the desired construct, may take a great deal of valuable time that could have been spent practicing this construct if it had been instructed. Because most of the learning in discovery learning only takes place after the construct has been found, when the search is lengthy or unsuccessful, motivation commonly flags.

Simon is also critical of the state of education research.

New “theories” of education are introduced into schools every day (without labeling them as experiments) on the basis of their philosophical or common-sense plausibility but without genuine empirical support.

We see that influential schools have arisen, claiming a basis in cognitive psychology… but which have almost no grounding in cognitive theory and at least as little grounding in empirical fact. This is particularly grievous because we think information-processing psychology has a lot to offer to mathematics education.

So, for instance, in the 1993 draft of the NCTM assessment standard for school mathematics, we find condemnation of the “essentialist view of mathematical knowledge” which assumes “mathematics consists of an accumulation of mathematical concepts and skills” (p.12). We can only say we find frightening the prospect of mathematics education based on such a misconceived rejection of componential analysis.

He is also optimistic that the findings of cognitive psychology can offer a basis for a better understanding of teaching and learning.

Human beings have been learning, and have been teaching their offspring, since the dawn of our species. We have a reasonably powerful “folk medicine,” based on lecturing and reading and apprenticeship and tutoring, aided by such technology as paper and the blackboard–a folk medicine that does not demand much knowledge about what goes on in the human head during learning and that has not changed radically since schools first emerged. To go beyond these traditional techniques, we must follow the example of medicine and build (as we have been doing for the past thirty or forty years) a theory of the information processes that underlie skilled performance and skill acquisition: that is to say, we must have a theory of the ways in which knowledge is represented internally, and the ways in which such internal representations are acquired. In fact, cognitive psychology has now progressed a long way toward such a theory, and, as we have seen, a great deal is already known that can be applied, and is beginning to be applied, to improve learning processes.

Anyone working in the field of evidence-based education needs to consider Simon’s work and this article very seriously.

Research Ed 2016: evidence-fuelled optimism

One of the great things about the Research Ed conferences is that whilst their aim is to promote a sceptical, dispassionate and evidence-based approach to education, at the end of them I always end up feeling irrationally excited and optimistic. The conferences bring together so many great people and ideas that it’s easy to think educational nirvana is just around the corner. Of course, I also know from the many Research Ed sessions on statistics that this is a sampling error: the 750+ people at Capital City Academy yesterday are entirely unrepresentative of just about anything, and educational change is a slow and hard slog, not a skip into the sunlit uplands. Still, I am pretty sure there must be some research that says if you can’t feel optimistic at the start of September, you will never make it through November and February.

And there was some evidence that the community of people brought together by Research Ed really are making a difference, not just in England but in other parts of the world too. One of my favourite sessions of the day was the last one by Ben Riley of the US organisation Deans for Impact, who produced the brilliant The Science of Learning report. Ben thinks that English teachers are in the vanguard of the evidence-based education movement, and that we are way ahead of the US on this score.   One small piece of evidence for this is that a quarter of the downloads of The Science of Learning are from the UK. There clearly is a big appetite for this kind of stuff here. In the next few years, I am really hopeful that we will start to see more and more of the results and the impact of these new approaches.

Here’s a quick summary of my session yesterday, plus two others I attended.

My session

For the first time, I actually presented some original research at Research Ed, rather than talking about other people’s work. Over the last few months, I have been working with Dr Chris Wheadon of No More Marking on a pilot of comparative judgment of KS2 writing. We found that the current method of moderation using the interim frameworks has some significant flaws, and that comparative judgment delivers more reliable results with fewer distortions of teaching and learning. I will blog in more depth about this soon: it was only a small pilot, but it shows CJ has a lot of promise!

Heather Fearn

Heather (blog here) presented some research she has been working on about historical aptitude. What kinds of skills and knowledge do pupils need to be able to analyse historical texts they have never seen before, or comment on historical eras they have never studied? The Oxford Historical Aptitude Test (HAT) asks pupils to do just that, and I have blogged about it here before. In short, I think it is a great test with some bad advice, because it constantly tells pupils that they don’t have to know anything about history to be able to answer questions on the paper. Heather’s research proved how misleading this advice was. She got some of her pupils to answer questions on the HAT, and then analysed their answers and looked at the other historical eras they had referred to in order to make sense of the new ones they encountered on the HAT. Pupils were much better at analysing eras, like Mao’s China, where comparisons to Nazi Germany were appropriate or helpful. When asked to analyse eras like 16th century Germany, they fell back on to anachronisms such as talking about ‘the inner city’, because they didn’t really have a frame of reference for such eras.

This is a very, very brief summary of some complex research, but I took two implications from it, one for history teachers, and one for everyone. First, the more historical knowledge pupils have, the more sophisticated analysis they can make and the more easily they are able to understand new eras of history. Second, there are profound and worrying consequences of the relentless focus in history lessons on the Nazis. Heather noted that her pupils were great at talking about dictatorships and fascism in their work, but when they had to talk about democracy, they struggled because they just didn’t understand it – even though it was the political system they had grown up with. This seems to me to offer a potential explanation of Godwin’s Law: we understand new things by comparing them to old things; if we don’t know many ‘old things’ we will always be forcing the ‘new things’ into inappropriate boxes; if all we are taught is the Nazis, we will therefore end up comparing everything to them. I think this kind of research shows we need to teach the historical roots of democracy more explicitly – perhaps by focussing more on eras such as the ancient Greeks, and the neglected Anglo-Saxons.

Ben Riley

Ben is the founder of Deans for Impact, a US teacher training organisation.  The Science of Learning, referenced above, is a report by them which focusses on the key scientific knowledge teachers need to understand how pupils learn. In this session, Ben presented some of their current thinking, which is more about how teachers learn. Their big idea is that ‘deliberate practice’ is just as valuable for teachers as it is for pupils. However, deliberate practice is a tricky concept, and one that requires a clear understanding of goals and methods. We might have a clear idea of how pupils make progress in mathematics. We have less of an idea of how they make progress in history (as Heather’s research above shows). And we probably have even less of a clear idea of how teachers make progress. Can we use deliberate practice in the absence of such understanding? Deans for Impact have been working with K Anders Ericsson, the world expert on expertise, to try and answer this question. I’ve been reading and writing a lot about deliberate practice over the last few months as part of the research for my new book, Making Good Progress?, which will be out in January. In this book, I focus on using it with pupils. I haven’t thought as much about its application to teacher education, but there is no doubt that deliberate practice is an enormously powerful technique which can lead to dramatic improvements in performance – so if we can make it work for teachers, we should.

Comparative judgment: practical tips for in-school use


I have blogged a bit before about comparative judgment and how it could help make marking more efficient and more reliable, and help to free the teaching of writing from tick box approaches. I think CJ has the potential to be used for national assessments – that’s why I’m working with Dr Chris Wheadon and the No More Marking team on a national experiment to moderate Key Stage 2 writing assessments using comparative judgment. However, whatever happens nationally, there are ways you can use CJ within your own school. The No More Marking website allows you to use comparative judgment for free. The website is very easy to use and it is definitely worth a go if you are interested. Here are some suggestions based on what we have done at Ark, plus some practical tips.

Primary writing assessments

There are no interim frameworks outside Y2 & Y6, so using CJ for writing in Y1, 3, 4, & 5 feels as good a solution as any. If you want to try and measure progress across these years, you could get pupils to do a task now and judge it now. Then keep the tasks and use them again in a judging session at Christmas or this time next year and see how they compare to the same pupils’ work at that point in time. We haven’t done this yet but it feels like a really powerful way of showing pupils the progress they are making. I know lots of schools already keep portfolios of pupils’ work across time, so you could use these to start the process.

KS3 English assessments

I’ve personally found it easier to judge writing assessments than to judge literature essays. Others have said the same. We recently got some very high reliability scores when judging a set of allegories that had been written by our Y8 pupils.

KS4 English exams

We haven’t used CJ for KS4 tasks yet, but it would certainly be possible. You could try and judge entire exams, or just pick out individual questions. I feel that individual questions would be easier to judge, and that you would get more accurate results for them. I think you would also get more interesting discussions and feedback afterwards when sharing the results.

KS3 history assessments

CJ has worked just as well for us in history, although again, I found judging history essays to be harder and slightly more time consuming than judging writing tasks. We did some judging on essays on the Battle of Hastings. This is a classic Y7 task and it was interesting for me to see the different ways different teachers had approached it.

And here are some general practical tips

  • Go to No More Marking, set up an account for free, and then, on the dashboard, create a new task.
  • You will need to upload all the scripts from your pupils. You can upload them as a jpg or pdf. If the pupil work is on paper, you’ll need to scan it – if you have a copier with this facility that shouldn’t be too difficult. The slightly fiddly bit is making sure every separate pdf or jpg has a pupil name or identifier in the title. This means that when you get the results, you will be able to easily see which pupil has which mark. Alternatively, you can use the QR coded answer sheets that nomoremarking.com provides. The bar coded sheets automatically recognise which pupil is which from the bar code on the scan and match them to their results.
  • How many judgments do you want per script? If reliability is very important, you will want 10 judgments per script. If it is less important, you can get away with 5. It can feel nice to aim high and try and get a reliability score of 0.85 or more, but there are a couple of things to consider. First, what type of reliability are you getting at the moment without comparative judgment? You probably don’t even know or have a way of finding out. So if you can get a reliability score of 0.75 from doing 5 judgments, that’s more than likely to be an improvement on what you are doing currently. You might be able to get up to 0.9 by doubling the judgments, but you will need to consider whether it is worth doubling the amount of time. It will depend on what you are using the results for. I am starting to think that in some cases, doing 5 judgments per script as a quick sift and then meeting as a team to discuss the results and set standards might be the best way forward.
  • Do you want to do the judging together as a group, or send out the links to people? To begin with, I’ve found it quite powerful to have people in the same room doing the judging. Whatever you choose to do, I also think it is worth having a group follow-up session where you discuss the scripts and think about why certain scripts were better than others, and what the teaching and learning implications are. As I have said before, the two immediate benefits of CJ are that it saves time and it is more reliable. But the longer term benefit is freeing teaching from the tick box. If you don’t meet afterwards to discuss the results and implications, then you are making it harder to achieve that.
  • Do you want to include exemplars or not? These can make it easier to apply the standards or grades once you have the results. But I would only use this if you are sure they are at a certain standard. Also, be careful – if you are putting in a script that you feel sure is a C grade, do you think it is a top or bottom C? The C-grade is a large grade so you need to be sure.
  • I would recommend trying to get scripts from more than one class to begin with (if possible even more than one school). One of the nice things about the CJ tasks we have done is how they make it quick and easy for teachers to see how other teachers and pupils have attempted similar tasks.
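
If you are weighing up 5 versus 10 judgments per script, it can help to put rough numbers on the time involved. The sketch below is only a back-of-envelope calculation, not anything taken from the No More Marking site. It assumes that each judgment compares two scripts (so it counts towards both scripts’ totals), and the function name, the 120 scripts, the 6 judges and the 30 seconds per judgment are all made-up figures for illustration; swap in your own.

```python
import math


def judging_workload(num_scripts, judgments_per_script, num_judges,
                     seconds_per_judgment=30):
    """Rough estimate of the judging time for one comparative judgment task.

    Assumption: every judgment compares two scripts, so it counts towards
    the judgment total of both scripts. The default of 30 seconds per
    judgment is a hypothetical figure, not a measured one.
    """
    total_judgments = math.ceil(num_scripts * judgments_per_script / 2)
    per_judge = math.ceil(total_judgments / num_judges)
    minutes_per_judge = per_judge * seconds_per_judgment / 60
    return total_judgments, per_judge, minutes_per_judge


# Compare a quick sift (5 judgments per script) with a fuller session (10).
for target in (5, 10):
    total, per_judge, minutes = judging_workload(
        num_scripts=120, judgments_per_script=target, num_judges=6)
    print(f"{target} per script: {total} judgments in total, "
          f"{per_judge} per judge, about {minutes:.0f} minutes each")
```

Under these assumptions, doubling the judgments per script doubles each judge’s time, which is the trade-off to set against whatever gain in reliability you expect.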

A new blog you need to follow

A good friend of mine, Maria Egan, has just set up a new education blog. It’s called the Razor Blade in the Candy Floss.

Maria has been an enormous influence on my thinking and writing so I am really pleased she has set up this blog, although it does mean I won’t be able to pass off her ideas as my own any more. I first met Maria when she was a teacher at my school in my final year at 6th form. I actually gave her the nickname ‘Razor Blade in the Candy Floss’ because of the habit she had of saying things that seemed completely innocuous but turned out to contain a bit of a sting in the tail – often a sting you only realised a couple of hours later while you were trying to avoid doing your homework.

Maria is also a bit of a dark horse because it turns out she has been running this blog on her school’s intranet for a while but I never knew about it, even though we have long discussions about education (this may have something to do with the fact that a lot of our discussions degenerate into monologues by me). The most recent post is a review of David Brooks’s The Road to Character. I recommended this book to Maria because I really liked it and thought it was quite insightful. However, as ever, Maria has a way of pointing out some of the flaws in it in a way that makes me feel slightly credulous.

And although I agree with Brooks that the ‘me me me’ culture is deplorable, I think that the underlying premise of the book is a bit pessimistic.  Are all the world’s greatest people really dead?  Is there an ever dwindling number of people with personality traits we could aspire to emulate?  Am I really going to start attending funerals where the nicest thing anyone can think to say about the deceased is that she had a growth mindset?  Life is different from before; people are different from before, but I’m not despairing about the current generation or future ones.  I couldn’t teach if I were.

So, in short: follow that blog!