
Problems with performance descriptors

A primary teacher friend recently told me of some games she and her colleagues used to play with national curriculum levels. They would take a Michael Morpurgo novel and mark it using an APP grid, or they would take a pupil’s work and see how many different levels they could justify it receiving. These are jokes, but they reveal serious flaws. National curriculum levels are, and always have been, vague and unhelpful.

For example, compare:

Pupils’ writing is confident and shows appropriate and imaginative choices of style in a range of forms.

Pupils’ writing in a range of forms is lively and thoughtful.

The first is a description of performance at level 7, the second at level 4. That’s what I mean about vague and unhelpful, and that’s why my friend was able to justify the same piece of work receiving several different levels.

However, what is frustrating is that many of the replacements for national curriculum levels rely on precisely the same kind of vague performance descriptions. In fact, in many conversations I have with people, they cannot even begin to imagine an assessment system that doesn’t use some form of descriptor. For many people, descriptors simply are assessment, and if a school is to create its own assessment system, then the first – and possibly last – step must surely involve the creation of a new set of descriptors.  Unfortunately, the truth is very different: as I’ve written here, descriptors do not give us a common language but the illusion of a common language. They can’t be relied on to deliver accuracy or precision about how pupils are doing. In this post, I will recap the problems with descriptors; in the next, I will suggest some alternatives.

First, Tim Oates shows here that creating accurate prose descriptions of performance, even in subjects like maths and science, is fiendishly difficult.

Even a well-crafted statement of what you need to get an A grade can be loaded with subjectivity – even in subjects such as science. It’s genuinely hard to know how difficult a specific exam is.

Second, Dylan Wiliam shows here in Principled Assessment Design that even very precise descriptors can be interpreted in completely different ways.

Even in subjects like mathematics, criteria have a degree of plasticity. For example, a statement like ‘Can compare two fractions to identify which is larger’ sounds precise, but whether students can do this or not depends on which fractions are selected. The Concepts in Secondary Mathematics and Science (CSMS) project investigated the achievement of a nationally representative group of secondary school students, and found out that when the fractions concerned were 3/7  and 5/7  then around 90% of 14-year-olds answered correctly, but when more typical fractions, such as 3/4 and 4/5  were used, then 75% answered correctly. However, where the fractions concerned were 5/7 and 5/9 then only around 15% answered correctly (Hart, 1981).
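To make Wiliam's point concrete, here is a minimal sketch in Python that runs the three CSMS fraction pairs quoted above through one and the same comparison. The function name `larger` is my own, purely for illustration. A computer treats all three items identically; the pupils in the study clearly did not, which is exactly why the descriptor on its own tells us so little.

```python
from fractions import Fraction

# The three pairs quoted from the CSMS study (Hart, 1981).
pairs = [
    (Fraction(3, 7), Fraction(5, 7)),  # same denominator
    (Fraction(3, 4), Fraction(4, 5)),  # different numerators and denominators
    (Fraction(5, 7), Fraction(5, 9)),  # same numerator
]


def larger(a, b):
    """Return the larger of two fractions (hypothetical helper)."""
    return max(a, b)


for a, b in pairs:
    print(f"{a} vs {b}: the larger is {larger(a, b)}")
```

All three items satisfy the statement ‘can compare two fractions to identify which is larger’, yet the reported success rates ranged from around 90% down to around 15%.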

Finally, Paul Bambrick-Santoyo makes a very similar point in Driven by Data. I’ve abridged the below extract.

 To illustrate this, take a basic standard taken from middle school math:

Understand and use ratios, proportions and percents in a variety of situations.

To understand why a standard like this one creates difficulties, consider the following premise. Six different teachers could each define one of the following six questions as a valid attempt to assess the standard of percent of a number. Each could argue that the chosen assessment question is aligned to the state standard and is an adequate measure of student mastery:

Identify 50% of 20.

Identify 67% of 81

Shawn got 7 correct answers out of 10 possible answers on his science test. What percent of questions did he get correct?

J.J Redick was on pace to set an NCAA record in career free throw percentage. Leading into the NCAA tournament in 2004, he made 97 of 104 free throw attempts. What percentage of free throws did he make?

Bambrick-Santoyo goes on to give examples of two even more difficult questions. As with the Dylan Wiliam example, we can see that whilst 90-95% of pupils might get the first question right, many fewer would get the last one right.
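As a rough illustration, the four questions quoted above can all be answered with two tiny calculations. This is only a sketch, and the helper names `percent_of` and `as_percent` are mine, not Bambrick-Santoyo's. The arithmetic is the same each time; what changes is how hard the numbers and the context make it for a pupil.

```python
def percent_of(percent, number):
    """Find a given percentage of a number (hypothetical helper)."""
    return percent * number / 100


def as_percent(part, whole):
    """Express part out of whole as a percentage (hypothetical helper)."""
    return 100 * part / whole


print(percent_of(50, 20))             # Identify 50% of 20
print(percent_of(67, 81))             # Identify 67% of 81
print(as_percent(7, 10))              # Shawn's science test
print(round(as_percent(97, 104), 1))  # Redick's free throws, to 1 d.p.
```

Every one of these could be claimed as assessing ‘percent of a number’, which is why the standard alone cannot tell you how demanding the assessment is.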

The vagueness and inaccuracy of descriptors is not just a problem with national curriculum levels. It is a problem with all prose descriptors of performance. It is not a minor technical problem that can be solved by better descriptor drafting, or more creative and thoughtful use of a thesaurus. It is a fundamental flaw. I worry when I see people poring over dictionaries trying to find the precise word that denotes performance in between ‘effective’ and ‘original’. You might find the word, but it won’t deliver the precision you want from it. Similarly, the words ‘emerging’, ‘expected’ and ‘exceeding’ might seem like they offer clear and precise definitions, but in practice, they won’t.

So if the solution is not better descriptors, what is the answer? Very briefly, the answer is for the performance standards to be given meaning through a) questions and b) pupil work. I will expand on this in a later post.

What do exams and opinion polls have in common?

A lot.

Daniel Koretz, Professor of Education at Harvard University, uses polls as an analogy to explain to people how exams actually work. Opinion polls sample the views of a small number of people in order to try and work out the views of a much larger population. Exams are analogous, in that they feature a small sample of questions from a much wider ‘domain’ of knowledge and skill. In Measuring Up, Koretz says this:

The full range of skills or knowledge about which the test provides an estimate – analogous to the votes of the entire population of voters in [an opinion poll] – is generally called the domain by those in the trade. Just as it is not feasible for the pollster to obtain information from the entire population, it is not feasible for a test to measure an entire domain exhaustively, because the domains are generally too large. Instead we create an achievement test, which covers a small sample from the domain, just as the pollster selects a small sample from the population.

Since first reading Koretz’s book (see my review here) I’ve used the analogy quite a lot. I used it this week to explain something to a colleague. She stopped and looked at me like I was crazy. ‘Daisy’, she said, ‘I think you need to get a new analogy.’

I know what she means. After a week where opinion polls have been torn to shreds over their failure to predict the result of the 2015 UK general election, it seems perverse for me to keep using them as an analogy. But actually, the failure of the recent opinion polls makes the analogy all the more useful, because just as opinion polls can and do get things wrong, we need to acknowledge that the similar structure of exams means that they can and do get things wrong too. Exams are susceptible to the same kinds of errors that opinion polls are. In fact, in one way, exams are even more susceptible to error than opinion polls. In the case of opinion polls, we can check the validity of the poll result because eventually the domain is measured, in the form of the election. In the case of exams, there is no final equivalent measure of the domain.  Imagine if there never was an election, and all we ever had were opinion polls of differing types. That’s what exams are like.
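The sample-and-domain idea can be sketched as a small simulation. Everything in it is an assumption chosen for illustration: a domain of 1,000 items, a pupil who has mastered 60% of them, and tests of 40 items drawn at random. None of these figures comes from Koretz.

```python
import random


def simulate_test_scores(domain_size=1000, mastery=0.6,
                         test_length=40, n_tests=5, seed=0):
    """Score one pupil on several tests sampled from the same domain.

    All default values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    n_known = round(domain_size * mastery)
    # True marks an item the pupil can answer, False one they cannot.
    domain = [True] * n_known + [False] * (domain_size - n_known)
    scores = []
    for _ in range(n_tests):
        test_items = rng.sample(domain, test_length)  # the 'poll'
        scores.append(sum(test_items) / test_length)
    return scores


print(simulate_test_scores())
```

The pupil's true mastery of the domain never changes, but the score moves around from one test to the next simply because a different sample of items was drawn. Unlike a pollster, we never get an ‘election’ that reveals the true figure.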

Plenty of reasons have been put forward for the failure of the polls in this general election. One of the most popular is the idea that, for whatever reason, people did not tell the pollsters who they were really planning to vote for. The analogy with tests would be where a pupil is, for whatever reason, not interested in answering the items on the test to the best of their ability. In Koretz’s words,

Just as the accuracy of a poll depends on respondents’ willingness to answer frankly, the accuracy of a test score depends on the attitudes of the test-takers – for example, their motivation to perform well.

Whilst this may have been the problem with the UK opinion polls, I don’t think it is a major problem with UK tests. Enough important outcomes depend on the tests for pupils to be motivated to do well on them. Of course, we should never forget that variation in performance on the day is always one of the major factors in exam-score unreliability, and pupils who feel under too much pressure to perform may fail to produce their best work. But by and large, I don’t think this is the major problem with tests at the moment. However, there are two aspects of this sample and domain structure which I think do cause serious problems.

Here is Koretz again:

In the same way that the accuracy of a poll depends on seemingly arcane details about the wording of survey questions, the accuracy of a test score depends on a host of often arcane details about the wording of items, the wording of ‘distractors’ (wrong answers to multiple choice items), the difficulty of the items, the rubric (criteria and rules) used to score students’ work, and so on… If there are problems with any of these aspects of testing, the results from the small sample of behaviour that constitutes the test will provide misleading estimates of students’ mastery of the larger domain. We will walk away believing that Dewey will beat Truman after all. Or, to be precise, we will believe that Dewey did beat Truman already.

Or, if I can update the analogy, we will believe that Ed Miliband is currently in negotiations with Nicola Sturgeon and Nick Clegg about forming the next government, and David Cameron is sunbathing in Ibiza. A great blog post here by Dan Hodges (written before the election) explains some of the ways in which some of the arcane details of opinion polls can be manipulated to get a certain result. Similarly, changes in arcane details of exam structure can change the value of the results we get from them. For me, there are two particular problems: poor test design, and teaching to the test. I’ve written about the problems of poor test design and teaching to the test on my blog before, here. I also have an article being published soon in Changing Schools where I discuss this at greater length. Here, I will add just one more point, about coursework. Coursework and controlled assessments are a perfect example of the problems with poor test design and teaching to the test.

The essential problem with coursework and controlled assessment is that they allow the teacher and pupils to know what the test sample is in advance. When I taught Great Expectations as a coursework text, I knew what the final ‘sample’ from the novel was – the final essay would be on the first chapter of the novel. To extend the opinion poll analogy, it’s as if the final result of the election depended not on a vote of the population, nor on an anonymous sample of the population, but on a sample of 1,000 voters whose names were known in advance to all of the political parties. Even if the sample were well chosen, this would clearly be problematic. There would be an obvious incentive to neglect the views of anyone not in that sample. Some political parties might not want to behave so dishonourably, but they would almost certainly be forced to, because if they didn’t, another party would do so and therefore win the ‘election’. I think the analogies with coursework and teaching are clear. A teacher might not want to focus their instruction on the first chapter of Great Expectations, as they realise that doing so will not help pupils in the long run, particularly those pupils who want to study English Literature at A-level. But if they think that every other teacher is doing so, they may feel that they have no choice, as those pupils who do get the targeted instruction will probably get better grades. Thus, poor test design, in the shape of coursework and controlled assessment, encourages teaching to the test and distorts the validity of the final test score. Obviously, the whole problem is exacerbated by high-stakes testing, which places great weight on those test scores. But I think the poor test design is a major feature here too, and hopefully the analogy with opinion polls makes it clearer why this is such a problem.
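Here is a deliberately extreme sketch of what a known sample does to a score. The numbers are assumptions for illustration only: a domain of 1,000 items, a test of 40 items announced in advance, and a pupil who has been taught those 40 items and nothing else.

```python
def score(known_items, test_items):
    """Fraction of the test items that the pupil knows (hypothetical helper)."""
    return len(known_items & test_items) / len(test_items)


domain = set(range(1000))         # the whole domain (assumed size)
announced_sample = set(range(40)) # the test items, known in advance
taught_to_test = announced_sample # pupil learns only the announced items

test_score = score(taught_to_test, announced_sample)
domain_mastery = score(taught_to_test, domain)

print(f"Score on the announced test: {test_score:.0%}")
print(f"Mastery of the whole domain: {domain_mastery:.0%}")
```

The test score is perfect while mastery of the domain is tiny, so the score no longer tells us anything about the domain it was supposed to estimate.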

Research Ed, Riverdale School, New York

Every single research ED conference I’ve been to has been amazing, but this one, for me, was the best yet. Mainly that’s because I got to hear new voices – either people who were completely new to me, or people I’ve read and heard a lot about, but never met before. I love the research ED format of having lots of short speeches: it forces the speakers to distil their message, and means that you are able to pack an incredible amount in. I learnt so much from the day and I have taken away a long reading list I need to follow up with. Here’s my summary of the day.

Session 1: John Mighton
John Mighton is founder of JUMP, a maths programme. He gave an overview of the programme and the research behind it, and some of the success it’s had. He also quoted one of my favourite papers – Herbert Simon’s Applications and Misapplications of Cognitive Psychology – and referred to another one I hadn’t heard of by Louis Alfieri: Does Discovery-Based Instruction Enhance Learning?  JUMP has mainly been used in Canadian schools, but I was surprised to hear it had been trialled in my home borough of Lambeth, in London. I need to find out more about that too!

Session 2: Me
I was speaking in this session (about my book), and annoyingly had been scheduled against two edu legends: Tom Bennett, founder of research ED, and Valarie Lewis, former principal of the incredible Core Knowledge School, PS 124. I visited Valarie’s school two years ago, when she was still principal. It is amazing, as is she – fortunately I was able to talk to her afterwards and get a potted summary of her session.

Session 3: Gary Jones & Betsy and Diana Sissons
I did want to see Betsy and Diana Sissons in this session, but unfortunately the room was full up by the time I got there! However, I saw some tantalising tweets from their session, which was about teaching vocabulary. I definitely need to read more about their work. Gary Jones is a fellow Brit, and spoke about evidence-informed research, with reference to some of Chris Brown’s work on this.

Session 4: Ben Riley
This was probably my favourite session of the day. I had corresponded with Ben before the conference about the work he is doing with Deans for Impact. Ben has brought together a great team of people, including Dan Willingham and Paul Bruno, to design a teacher training curriculum based on the principles of cognitive science. As well as introducing some of the elements of the curriculum to us, Ben also talked a bit about the history of other professions and the ways they have changed and adopted new ideas. This is something I have been thinking about a lot lately, and which has relevance to a lot of discussions we’re having in the UK at the moment, particularly those around a Royal College of Teaching. Ben also referenced a book about the history of medicine by Kenneth Ludmerer, and one about the history of management by Mie Augier and James March.

Session 5: Angela Logan-Smith
Angela is another amazing Core Knowledge head teacher, of Goldie Maple school in Queens. I was also lucky enough to visit her school two years ago, and again, it was great to hear about all the success her school has had. I was particularly interested to hear that the teachers in her K-8 school now specialised according to subject, instead of grade level. That may be one of the implications of a knowledge-rich primary curriculum.

Session 6: Karin Chenoweth
This was an unexpected bonus of a session. I hadn’t heard of Karin before and wasn’t really sure what the session would be about, but it was fantastic and I found myself nodding along at every point. Karin had done lots of interviews with successful leaders in schools in challenging circumstances (including Angela at Goldie Maple), and tried to distil what makes them successful. For example, successful heads keep a ‘laser-like focus on what pupils need to learn’, and for them, ‘evidence trumps opinion.’ Her presentation included videos of her interviews with staff and children at some of these schools, including one lovely video of a fifth grade pupil talking about how much she loved learning new things. I found out later that Karin had actually written a nice article about my book on the HuffPo website last year.

Session 7: Dan Willingham
This was the session we had all been waiting for, and despite that, it still managed to exceed those expectations. What was so amazing about Dan’s presentation is that it felt like we were hearing something really new – it wasn’t on the subject of any of his three books, but was instead called ‘the challenge of persuading believers’. The first half was a review of the research on the cognitive biases we are all subject to, and the second was made up of Dan’s own advice and tips on trying to change people’s minds, given the nature of these biases. It was genuinely fascinating, and I didn’t want it to end. On the first half, the subject of the cognitive biases we are all prey to, I am always reminded of the TS Eliot line: Humankind cannot bear very much reality. On the second half, I have been mulling over Dan’s advice ever since. He recommends picking your battles – sometimes it’s better to be at peace than to be right, and also that persuasion takes time – often, all you can do in one conversation is to plant a seed.

Session 8: Robert Janke
This was another unexpected bonus. Robert spoke about some of the common errors, biases and flaws in statistics and research. He also gave us all a helpful handout with some of the most common examples. He is the author of Errors in Evidence-Based Decision Making: Improving and Applying Research Literacy.

As well as all of these sessions, there were other highlights. I got to meet Paul Bruno for the first time, and Robert Craigen for the second. Eric Kalenze gave me his book, which I’m really looking forward to reading. I didn’t hear Lucy Crehan speak, as I was trying hard not to go to the sessions of people I can easily hear at home! But her session sounded great, and her project is a fantastic one. I missed Mary Whitehouse’s session for the same reason, but was delighted to be able to talk to her more than I ever have done in England. Dominic Randolph, the head of Riverdale school, was a wonderful host, and Riverdale school the most beautiful place. All in all it was phenomenal, and has certainly planted a lot of seeds in my mind.

Will big data transform education?

I think technology has great potential to transform education, but I am frustrated by how ineffective so much educational technology really is. For more on this, see my Guardian article here. Recently, I read a fascinating book about how big data could transform education, which described a lot of what I think are the more effective uses of education technology. It’s called Learning with Big Data, by Kenneth Cukier and Viktor Mayer-Schönberger, and it gives some really good examples of how data analysis of in-class formative assessment could improve teaching.

The best part of the book is where it follows the work of Professor Ng, a computer scientist at Stanford who is a co-founder of Coursera.

By tracking homework and tests done on a computer or tablet, he can identify specific areas where a student needs extra help. He can parse the data across the entire class to see how the whole cohort is learning, and adjust his lessons accordingly. He can even compare that information with other classes from other years, to determine what is most effective…For example, in tracking the sequence of video lessons that students see, a puzzling anomaly surfaced. A large fraction of students would progress in order, but after a few weeks of class, around lesson 7, they’d return to lesson 3. Why?

He investigated a bit further and saw that lesson 7 asked students to write a formula in linear algebra. Lesson 3 was a refresher class on math. Clearly a lot of students weren’t confident in their math skills. So Professor Ng knew to modify his class so it could offer more math review at precisely those points when students tend to get discouraged— points that the data alerted him to.

This, I think, is very powerful stuff. It really does have the potential to dramatically improve teaching and learning and to help identify what the most effective teaching methods are. There is more on this type of activity in the book, as well as interviews with the founders of Duolingo and Khan Academy, two of my favourite educational apps.

What I liked less about that book is that at times, it fell into lazy and entirely erroneous clichés of the ‘Shift Happens’ sort. The worst example was a throwaway comment in an otherwise excellent discussion of how Professor Ng uses quizzes.

He interlaces the video classes with pop quizzes. It’s not to see if his charges are paying attention; such archaic forms of classroom discipline don’t concern him. Instead, he wants to see if they’re comprehending the material— and if they’re getting stuck, exactly where, for each person individually.

So, according to this, checking to see if pupils are paying attention is an archaic form of classroom discipline which great educators should not concern themselves with. Really? Not only does this just feel wrong, it also contradicts lots of the modern (and very un-archaic) research on the topic. Paying attention to things is how we remember them, and remembering things is how we learn. If we don’t pay attention, we don’t learn. In fact, research on the importance of paying attention is probably the most practically useful research for teachers. Dan Willingham goes so far as to say that ‘the most general and useful idea that cognitive psychology can offer teachers’ is to ‘review each lesson plan in terms of what the student is likely to think about’. Far from being an archaic form of classroom discipline, making sure that students pay attention is absolutely vital. One of the main reasons why the pop quizzes described above are so powerful is precisely because they help focus attention on the right things.

To be fair to this book, it is relatively free of this kind of error – certainly much freer than most of the books and articles I read about ed tech, where mere possession of an iPad will transform your intellectual capacities. This book does, for example, include the following important and salutary warning:

Learning will continue to require concentration, dedication, and energy.

But still, whilst the error about paying attention may be a brief one, it is there and it is important. Before we even start to think about how to use technology in the classroom, we need a clear understanding of what causes learning to happen. Only then can we start to think about how technology can enhance or improve that process. If we start with magical thinking about education, then we will end up applying technology in unhelpful ways, and the technology itself will get a bad name amongst many educators. We can see this happening at the moment – because so many uses of technology are based on misapprehensions and fail badly, many teachers become sceptical about all uses of technology. So, whilst I remain a fan of edtech, books like this actually convince me that whilst it’s important, it’s a second-tier issue: the most important issue is to establish clearly what causes learning.

Maths facts other than times tables

Nicky Morgan’s comments today have started a debate over whether pupils really do need to learn their times tables by the end of primary. I think they should and I’m not going to rehearse the arguments here.

What I do want to do is to ask what other maths facts it’s useful for pupils to know by heart? The new national curriculum specifically says that pupils should memorise the number bonds up to 20 and the times tables up to 12, but are there other facts it is worthwhile memorising?

I’m going to start by saying fraction / decimal equivalences. I’m not just talking about ones like 0.5, 0.25, etc, which are obviously helpful and which most pupils will just know (I hope!). I also think that memorising the decimal equivalences of less common fractions is useful: in particular, fractions with denominators of 6, 7, 8, 12 and 15. Very often newspaper articles and statistics you come across in everyday life are reported as fractions in these terms – for example, one in seven adults has a subscription to Netflix, or one in every 12 pounds is spent at Tesco (I made those up by the way). Being able to instantly flick back and forward from that to the percentage is really useful. The reverse is also useful. A lot of data are reported as precise percentages, and being able to easily mentally flick from this to a fraction often helps with understanding. If someone tells you that Andy Carroll wins 84% of aerial duels, it can help to think instead that that means he loses about one in every six of them. (Also a made-up stat).

It’s also a classic example of why it isn’t enough to know how to work it out. You might know how to convert a fraction to a decimal, but by the time you’d worked it out, you’d have forgotten what the context of the statistic was.  The person who did know that 1/12 is 8.3% can move on to considering whether Tesco’s dominant market share is a cause for concern, estimating the share of other big chains and wondering what that might look like as an absolute sum of money. As ever, knowing stuff off by heart enables critical thinking rather than stifling it.
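For anyone who wants the list to memorise, here is a quick sketch that prints the unit fractions for the denominators I mentioned as percentages, and then goes the other way for the made-up 84% statistic. The function name `unit_fraction_percent` is my own, and rounding to one decimal place is my choice.

```python
from fractions import Fraction


def unit_fraction_percent(denominator):
    """Return 1/denominator as a percentage, rounded to 1 decimal place."""
    return round(float(Fraction(1, denominator)) * 100, 1)


for d in (6, 7, 8, 12, 15):
    print(f"1/{d} is about {unit_fraction_percent(d)}%")

# The reverse: winning 84% means losing 16%. Which simple fraction is closest?
lost = Fraction(16, 100)
print(lost.limit_denominator(6))  # closest fraction with denominator <= 6
```

Of course, the whole point is that you shouldn't need to run this: the person who already knows that 1/12 is 8.3% has their attention free for the actual question.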

Some other suggestions: the 75 times tables. The person who suggested this one did so as a technique for winning the numbers game on Countdown. I wouldn’t recommend that we reorganise education around winning TV quiz shows (god forbid) but since I took this advice and learnt my 75 times tables, I have found them useful in more ways than expected. I suspect this is the case with a lot of these things – it’s only once you learn them that you fully appreciate how useful they are. A bit of the Dunning-Kruger effect, perhaps.

Any maths teachers out there, please leave your suggestions in the comments. Square numbers? Other times tables?

Prosperity or democracy – why does education matter?

In modern discussions of education, the value of education is quite often defined in economic terms. We saw this very recently with Nicky Morgan’s comment that we could link subjects to later earnings to determine their ‘true worth’. The idea that public spending on education is justified by its impact on GDP is shared by many across the political spectrum.

There are huge problems with this consensus. First, as Alison Wolf showed in her brilliant book Does Education Matter?, the link between more years of education and greater GDP is not as clear-cut as it might seem. At a basic level, universal literacy and numeracy are important for the economy, and at the elite level, expertise in science and technology drives innovation. But beyond that, it is far more complex than some glib statements suggest. In Wolf’s words:

‘We know that basic literacy and numeracy matter a great deal, and that the labour market rewards mathematical skills. We also know that technical progress depends on the best scientific and technological research; but that there is no evidence that education spills over to raise productivity in a general, economy-wide way.’

Not only that, but as Wolf remarks at the very end of her book, the idea that education must always be justified and defended on purely economic terms is a very recent one.

‘Our preoccupation with education as an engine of growth has not only narrowed the way we think about social policy. It has also narrowed – dismally and progressively – our vision of education itself. This book reflects that narrowing…The contribution of education to economic life is an important subject, and an interesting subject, and it can actually be investigated empirically. But it is only one aspect of education, not the entirety, and it does not deserve the overwhelming emphasis which it now enjoys. Reading modern political speeches and official reports and then setting them alongside those of twenty-five, let alone fifty or a hundred, years ago is a revelation. Contemporary writers may pay a sentence or two of lip-service to the other objectives of education before passing on to their real concern with economic growth. Our recent forebears, living in significantly poorer times, were occupied above all with the cultural, moral and intellectual purposes of education. We impoverish ourselves by our indifference to these…The history of public education in any modern democratic state concerns issues of identity and citizenship quite as much as the instilling of more or less utilitarian skills…The role that schools play in creating citizens, and in passing on to new generations both an understanding of their own history and society and particular moral, intellectual or religious values, should concern any modern state with a public education system.’

Modern liberal-democratic societies depend on a well-informed and well-educated citizenry. They depend on knowledge: on people knowing the contours of contemporary debates, the functions of government, the history of civilisation, the difference between the supernatural and the natural, the language and literature of their society, and much more.

As Thomas Jefferson said:

If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be.

And R.H. Tawney:

No one can be fully at home in the world unless, through some acquaintance with literature and art, the history of society and the revelations of science, he has seen enough of the triumphs and tragedies of mankind to realize the heights to which human nature can rise and the depths to which it can sink.

I would also urge you to read this brilliant blog post, where @HeatherBellaF shows the impact of ignorance with reference to North and South, Henry Mayhew and Chinese history.

Education may be important for prosperity, but it is vital for democracy.

New report by the Sutton Trust: What Makes Great Teaching

Today the Sutton Trust and the University of Durham have published a fascinating new report called What Makes Great Teaching? It sets out to answer that title question, as well as looking at ways we can measure great teaching, and how that could be used to promote better learning. Here is my short summary of some key points from the report.

1. What is effective teaching? This report is very honest about the fact that we don’t have as clear an idea of what good teaching is as we think we do. I think this is an important point to make. Too often, reports like this one start from the point of assuming that everyone knows what good teaching is, and that the challenge is finding the time/money/will/methodology to implement changes. This report is saying that actually, there are a lot of misconceptions about what good teaching is, and as such, reform efforts could end up doing more harm than good. We need to think more clearly and critically about what good teaching is – and this report does that. As well as listing what effective teaching practices are, it also lists what ineffective practices are. This list has already received some media attention (including a Guardian article with a bit from me), as it says that some popular practices such as learning styles and discovery learning are not backed up by evidence. The report draws its evidence from a wide range of sources, including knowledge from cognitive psychology. It cites Dan Willingham quite a lot, and quotes his wonderful line that memory is the residue of thought. As regular readers will know, I think cognitive psychology has a lot to offer education, so it is great to see it getting so much publicity in this report.

2. How can we measure good teaching? According to this report, the focus should always be on student outcomes (not necessarily just academic ones). This can also be a bit of a hard truth. If a group of teachers work really hard at mastering a particular technique or teaching approach, and they do master it and use it in all their lessons, it can be tempting to define this as success. But this report says – no. The focus has to be on student outcomes. Although we can devise proxy measures which can stand in for student outcomes, we always need to be regularly checking back to the student outcomes to see if those assumptions are still holding true. The report is also honest about the fact that a lot of the current ways we measure teaching are flawed. That’s why we need to use more than one measure, to always be checking them against each other, and to be very careful about the purposes we put these measurements to. The report suggests that our current measures are probably only suitable for low-stakes purposes, and that they certainly can’t be used for both formative and summative measures at the same time (or ‘fixing’ and ‘firing’ as they call it).

3. How can we improve measurement? Although the report is very cautious about the current state of measurement tools, it offers some useful thoughts about how we could improve this state of affairs. First, school leaders need to be able to understand the strengths and limitations of all these various data sources. According to the report, there is ‘the need for a high level of assessment and data skills among school leaders. The ability to identify and source ‘high-quality’ assessments, to integrate multiple sources of information, applying appropriate weight and caution to each, and to interpret the various measures validly, is a non-trivial demand.’ Also, student assessment needs to be improved. If we always want to be checking the effect of our practices on student outcomes, we need a better way of measuring those outcomes. The report gives this tantalising suggestion: that the profession could create ‘a system of crowd-sourced assessments, peer-reviewed by teachers, calibrated and quality assured using psychometric models, and using a range of item formats’. It would be great to hear more details about this proposal, and perhaps about how CEM or the Sutton Trust could provide the infrastructure and/or training to get such a system off the ground.

One of the authors of the paper is Rob Coe, and I think this report builds on his 2013 Durham Lecture, Improving Education: A Triumph of Hope over Experience. This lecture was also sceptical about a lot of recent attempts to measure and define good teaching, as can be seen in the following two slides from the lecture.

[Slides from Improving Education: Fig 6, ‘Mistaking School Improvement’, and Fig 8, ‘Poor Proxies’]

I recommended this lecture to a friend who said something along the lines of ‘yes, this is great – but it’s so depressing! All it says is that we have got everything wrong for the last 20 years and that education research is really hard. Where are the solutions?’ I think this paper offers some of those solutions, and I would recommend it to anyone interested in improving their practice or their school.

