Assessment is difficult, but it is not mysterious

This is a follow-up to my blog from last week about performance descriptors.

In that blog, I made three basic points: 1) that we have conflated assessment and prose performance descriptors, with the result that people assume the latter basically is the former; 2) that prose performance descriptors are very unhelpful because they can be interpreted in so many different ways and 3) that there are other ways of assessing.

In response, David Didau wrote this post, in which he agreed with a lot of the things I said. I was pleased by this, because I really admire David’s work and think he has done great things in bringing important research to a wider audience. However, I was completely baffled by the end of the post, and I am going to explain why – if I’m a bit harsh, I’m sorry, and none of this changes the fact that I think most of David’s work is fantastic.

After agreeing with me about the vagueness of prose performance descriptors, he then suggested that as a replacement for prose performance descriptors, schools should use…wait for it…prose performance descriptors! Here is his example.

[Image: English assessment grid]


I am really astonished by this. The above grid has all the flaws of national curriculum levels, and offers no improvement. It reproduces all the errors I discuss in my previous post. To take just one example, what does a challenging assumption about the cultural value of a text look like? Come to think of it, what does a text look like? A pupil might be able to make a challenging assumption about the value of a shampoo advert, or a limerick, or a piece of graffiti, but struggle to make one about the value of War and Peace, a newspaper article, or an unseen poem. One teacher might interpret this criterion in the context of a short poem pupils have studied before, in which case many of their pupils might achieve it, whilst another might interpret it in the context of a lengthy unseen poem, in which case many of their pupils will not achieve it. The parts on inference are particularly baffling. We know that inference is not a formal skill: we know that pupils (and indeed adults) can make great inferences about a text about baseball, and poor ones about sentences like ‘I believed him when he said he had a lake house, until he said it was forty feet from the shore at high tide.’ (Both those examples from Dan Willingham – see here for more from him about inference and reading). In short, the above grid will result in teachers collecting ‘junk data’ of the type Bodil Isaksen discusses here.

In the rest of the post, David appears to be suggesting that such an approach is OK just as long as we accept that it has flaws and can never be precise. In his words, ‘You can never hope for precision from performance descriptors, but then precision will always be impossible to achieve.’ In making this argument, David has basically proven my first point, which is that we have become so used to prose performance descriptors that we have come to assume that they are assessment and few alternatives are possible. Of course, if performance descriptors were the only way we could assess, then perhaps we would just have to accept the imprecision. But they aren’t! There are other ways, ways which offer far greater precision and accuracy. One example of a method that is far more precise is standardised tests. It’s true that they may not be perfectly precise, but there is still a world of difference between them and performance descriptors. Let me give a very concrete example of this: back in 1995, Peter Pumfrey gave a group of 7-year-old pupils who had been assessed as level 2 a standardised reading test. On this latter test, their reading ages ranged from 5.7 to 12.9.

And this, in short, is why I care so much about this, and why I think it is so important. There are pupils out there who are really struggling with basic skills. Flawed assessment procedures are hiding that fact from them and their teachers, and therefore stopping them getting the help they need to improve. Worse, the kinds of assessment processes which would help to identify these problems are being completely ignored.

It’s as though you saw someone try to measure the length of a room by taking a guess, and someone else try to measure the same distance using a measuring tape marked with centimetres. Then, because neither method can give you a measure to three decimal places, you conclude that ‘all measurement is imprecise and fundamentally mysterious’, so you’ll just use your best guess. Well, OK, both methods may be imprecise, but the latter method is far less so than the former, and you will be much better advised to buy a carpet based on it.

Complexity is not the same as mystery. I worry that by saying that assessment is mysterious and that it is very difficult to get a handle on how pupils are doing, we legitimise a woolly approach where anything goes because we can’t really measure anything anyway. We can do a lot better than we are doing at the moment, and one of the first things we can do is to stop depending so much on generic prose descriptors.

I realise that this leaves open the question David posed at the start of his article – ‘OK smart arse, what should we do?’ In my last post, and in others from the past, I’ve repeatedly argued that the better approach is to define criteria in terms of a) actual questions / test papers and b) actual pupil work. For example, back in December 2013 I wrote here that ‘we don’t get a shared language through abstract criteria. We get a shared language through teaching shared content, doing shared tasks, and sharing pupil work with colleagues.’ I realise I need to expand on these points further, and I will do so in my next blog post.


23 thoughts on “Assessment is difficult, but it is not mysterious”

  1. David Didau

    Arggh! I thought you were going to give me an answer! You’re tough to pin down. My frustration is born of the fact that I really agree with all the points you make and yet haven’t seen a workable solution.

    In my defence, I did say of the grid, “It has been designed with a particular curriculum in mind and so should not be taken as something able to stand alone, but even so, it should be seen as not so much a map as a travel guide – pointing out interesting sights and potential places of interest along the way.” So to take it out of that context and present it as if I’m suggesting it could be used by anyone else to make a meaningful assessment is a bit unfair. Otherwise meaningless statements make more sense in light of the curriculum it’s intended to chart. So a ‘text’ will be specified as something other than a shampoo ad. But I take your point that using children’s actual work and actual test papers (but which?) would offer a more precise solution. Didn’t I say as much in my discussion of Trinity’s curriculum and what they’re doing at Michaela? Do you think they are adequate solutions?

    Thanks, David

    1. The Wing to Heaven Post author

      Thanks for the comment. I promise I will blog more about concrete solutions! I haven’t seen Trinity’s system so can’t comment. I agree completely with what you said about Michaela and I do think theirs is a workable solution. As well as that, the work David Thomas has done at Westminster Academy is also brilliant (it won some of the assessment fund money too). Obviously I also think that what we are doing at Ark is great too! So there are solutions out there, and they have managed to move away from APP, but most of the level replacements I see swirling around (and I get sent a LOT!) aren’t that workable and aren’t really moving on from levels. Instead of aiming for a decisive break and trying to do something different, they are basically just rewriting the APP grid, which is a real missed opportunity, not to mention a lot of work for very little impact.

  2. robmarchetto

    I think these descriptive progressions that David uses are helpful when used as assessment when learning. This formative use can enable students, peers and teachers alike to shade where they are on a continuum, and alert the teacher about the student’s zone of proximal development. I do, however, agree with you, Daisy, on the need for students to have questions and the corresponding work sample proof of meeting the learning.

  3. Crispin Weston

    Having just commented on David’s post, I agree with the drift of Daisy’s logic here. But you are leaving us all hanging in suspense and so, being impatient, I am going to try and guess what you are going to say.

    The learning objective (as that is a relative term, I would argue that “capability representation” would be a more accurate term – but let’s stick to learning objective) – the learning objective is described not by textual descriptors but by exemplification. I published a rather complicated UML taxonomy. I don’t expect anyone to absorb the whole thing but the critical piece is the bit at the top left which says that:
    * capability representation
    * is exemplified by
    * assessment method.

    You could argue that the assessment method then needs in turn to be exemplified by sample assessments – that piece is not shown in the diagram.

    While we are on my taxonomy diagram, I want to make two further points.

    First, the “capability representation” is not the same thing as the “capability concept”. It is an attempt to represent the concept. By this, I mean that OCR’s representation of what it means to be good at Maths is not the final and authoritative word – it is one version which it will be instructive to compare to other people’s versions, using the sort of analytics systems that we hope are on their way.

    Second, the meaning of the competency representation is *also* provided by its relationship with other capability representations within the same framework (e.g. published by OCR or whoever). If OCR says that “carpentry” can be disaggregated into component capabilities, such as “fixing”, “shaping”, “finishing”, and “theory” – then if you are good at these component capabilities, then you are by definition good at “carpentry” as represented by the OCR framework. I call these “modelling relationships”.

    But there is another sort of relationship which I call a mapping relationship. This might say that OCR’s representation of carpentry maps very consistently to Mr Jones the carpentry teacher’s intuitive assessment of the carpentry skill of his students. This is not a definitional relationship but an empirically testable one.

    So I am suggesting that, while the meaning of the atomic capability representation (aka learning objective) is best provided for by exemplification, its meaning will also depend on achieving consistent meanings for the other representations with which it has modelling relationships (especially containment and orthogonal relationships – relationships with those things that it is not).

    Assuming that this is the direction in which Daisy is going with this, I want to ask another question which I think will need to be answered. At ResearchEd 2014, Daisy, you spoke on the dangers of teaching to the test, citing Tim Oates. But Tim Oates, in arguing against teaching to the test, suggests that teachers should be teaching towards things that he calls “constructs”. I don’t think he really defines what these are, but the strong suggestion is that they are things defined by descriptive rubrics. My taxonomy above is drawn from my critique of Oates’ position on this.

    If you are going to urge exemplification as a means of defining learning objectives, then I think you need to explain what the difference is between this and teaching to the test. In my view, the answer will require a more narrow statement of exactly what the problem is with teaching to the test – more narrow, at least, than I think Tim Oates provides.

  4. Mrs Jennings

    Have you looked at how exams are marked and how grades are awarded from a number? I know exam boards have descriptors too but they also use ways of marking that give points for essays – it’s not just a decision at the end of reading a long answer about which descriptor the whole piece fits into (I’ve marked IGCSE and AS English exams). Is this what you mean by standardised tests? I think school assessment systems could learn a lot from the way that exam boards mark – especially essays and longer pieces of writing.

  5. Dylan Wiliam (@dylanwiliam)

    I think much of the confusion concerning assessment stems from a lack of clarity about what an assessment is. At its heart, an assessment is nothing more, or less, than a procedure for making inferences. We get students to engage in certain activities, from which we collect evidence, and the evidence is used to make inferences about the student. Performance descriptions can help if they make it more likely that students generate evidence about particular capabilities. For example, if a performance description says that the highest marks in a mathematical investigation are only given if students express a generalization in the form of an algebraic expression, then that is likely to cue students who can do so to express their generalizations in the form of algebraic expressions. As a result, the validity of the inferences is improved, because the performance descriptions have made it more likely that students who could generalize using algebra, do, in fact, do so. In other words, “if they know it, they show it.”

    Performance descriptions can also reduce the variability of the interpretations made by different assessors, thus reducing the random variability in the interpretations of the evidence, again improving the validity of the inferences.

    Performance descriptions therefore have their uses, but the problem is that performance descriptions themselves are imperfect attempts to exemplify what psychologists call constructs. Despite what Crispin Weston says in the extended post to which he refers in his post above, constructs are nothing to do with postmodernism, nor are they something that an individual constructs in her or his head. To a psychologist, a construct is something that is hypothesized to account for the things we observe. So, for example, the construct of “intelligence” was proposed to account for the fact that students who score higher on mathematics tests also, on average, score higher on tests of English, history, and even music, drama and art. The simplest model that accounts for what we see is that a student’s performance on a mathematics test depends both on their specific abilities in mathematics, and on their level of general intelligence. Constructs allow us to connect evidence to inferences.

    Most arguments about assessment, which often appear to be about the adequacy of one assessment compared to another, are in fact arguments about constructs. For example, can history be adequately assessed with multiple-choice tests? Some people say yes, and others no, so they argue about the relative merits of multiple-choice tests versus constructed-response tests, but they are, in fact, likely to be arguing past each other. If you believe that history is all about facts and dates, then you think multiple-choice tests are pretty nifty, because they allow you to test a lot of facts and dates relatively swiftly. Moreover, you will regard essay-type questions as unfair, because as well as measuring history, you are also measuring handwriting and linguistic ability. Assessment experts would say, in this situation, that the scores on constructed-response tests suffer from construct-irrelevant variance because some of the variation in scores is due to the construct of interest (historical knowledge) but some of the variation in the scores is due to things that are not relevant, such as handwriting and linguistic skill. On the other hand, if you believe that history is all about historical argument, then you would say that a multiple-choice test under-represents the construct of history, because it only assesses part of what historical thinking is about (assessment experts call this construct under-representation).

    The vast majority of educational assessments are attempts to elicit evidence that can be used to support inferences regarding constructs, and most arguments about assessments are, in fact, arguments about differences in underlying constructs. This is why it is so important to define constructs clearly. Once this is done, assessment becomes a technical, rather than a value-driven, enterprise. The only question is how well the assessment allows one to draw inferences about the construct in question.

    Performance descriptions have a role to play in construct definition, but they are not the only way (try making up performance descriptions for different levels of intelligence). However where performance descriptions can provide guides to increasing levels of performance on a construct, they have a role to play.

    To sum up, assessments are procedures for making inferences. Most of the time these inferences relate to constructs, such as “historical ability” and performance descriptions have a role to play in both defining, and exemplifying those constructs.

  6. Richard Galloway

    The problem we all face is that assessment is no longer, nor has it been for some time, just about the student’s performance but that of the teacher / school. Until this changes then it will always be in everyone’s (except the students’) best interests to have vague descriptors.

    For instance, we just completed something called ‘a school wide write.’ It’s not a formal assessment insofar as it’s not used in any statistical analysis but it is what we use in the school as a snapshot of where the students are. When the work was finished we then used a rubric to mark the work, which was full of vague statements. But it meant that many of the students (G3/4) were at grade level. I was somewhat flabbergasted as the standard of work I was seeing wasn’t what I would have expected to see from nine and ten year olds, but according to the criteria that’s where they were and so we could give ourselves a pat on the back.

    It’s totally understandable that you wouldn’t want a more robust assessment system when so much rides on results. You want to be able to report that as many as possible of your students are where they should be, whether this is true or not. Until education becomes about the students again and not some political football I don’t see any chance of meaningful change. Of course what we do as teachers in the classroom is another matter.

    Great post and discussion – lots to think about. Thank you.

  7. Crispin Weston

    Dear Dylan,


    Thank you for the clarification over the use of the term “construct” in psychology. I am still not convinced that this is a helpful term – regardless of whether its use is widely recognised in Psychology.

    The term “construct” certainly captures the idea that, like a hypothesis, this is a human creation. But I think the term is unsatisfactory for two main reasons:
    * it fails to say what this truth claim is *about* (in this case, the existence of a disposition to produce a certain sort of performance);
    * it fails to capture the fact that a hypothesis is a truth claim, not just an invention – we may not be able to see how the neurons are lining up, but we do believe that this disposition really exists.

    A third problem is that the term has no currency among teachers. If we were communicating with psychologists, we might reasonably say “what the heck” and go with the flow. But given that we are trying to communicate with teachers, most of whom have never heard of constructs except in the post-modernist sense, and given that this is not a very good term in the first place, for the reasons I give above, I suggest that we would do much better to create a new term.

    Many are talking about “competency”, though I think this is unsatisfactory on the grounds that competence is commonly used to refer to a binary characteristic and no-one seems to know if there is any difference between “competence” and “competency”. I have proposed “capability” – or “capability representation” if one wants to capture the sense that this is a hypothesis. I think “representation” is better than “construct” because the question “is it an accurate representation?” is a more useful question than “is it well constructed”.

    *Importance of defining the learning objective (construct / hypothesis / capability)*

    I strongly agree with the importance of maintaining a division between construct definition and assessment – and the effect that this has on separating the value judgments that are embodied in our learning objectives and the technical business of assessing and teaching to those objectives. Emphasizing the importance of this distinction has been one of the main themes of my blog and I believe that it is an essential prerequisite for the development of useful ed-tech.

    But I think the profession really struggles to perceive this division between the ends of education and the means of education. Andrew Old has suggested that this is because many come into teaching because they see it as a way of expressing their own values, and so are loath to delegate this part of their role to the curriculum. And that is one reason why clear terminology on this issue – a terminology that can really achieve recognition in the profession – matters.

    *A strategy for resolving ideological disagreements about constructs*

    You refer to the intractability of many arguments about assessment which turn out to be “in fact arguments about constructs”. For example, people disagree about what History really is – a matter of factual recall or a matter of intellectual skill.

    I agree with your diagnosis of the problem. But the way the problem is expressed presupposes that we recognise one thing called Historical skill/capability/competency and then fight over what it means. I suggest that there is a better way than this to address the problem.

    Given that Historical ability is a construct or representation, then Historical understanding can in theory be whatever we say it is. We can define any number of constructs: some for different bodies of factual knowledge and others for skills of argument. We could define a hierarchy of parental constructs leading up to a summit construct called “being good at History”. This top level construct would depend on all of the above – but if it is to have meaning, it would also have to be exemplified (as I argue) by reference to performance descriptions.

    Having created our different representations of capability in History, we can then start to correlate achievements as measured against different constructs. How well does the ability to write what we judge to be a good essay correlate to the mastery of an associated body of knowledge, and how well does it correlate to the mastery of a set of transferable skills of argumentation? How well does what we judge to be a good History essay correlate to what top universities judge to be a good History essay? By measuring many different learning objectives, and other outcomes like university admissions and success in different careers, we will fairly quickly start to see patterns in the data. I suspect that those patterns will start to point fairly convincingly in certain directions. As a strategy, this means moving away from single constructs over which we fight ideological battles, to a multiplicity of constructs, over which we can fight statistical battles. I suggest that this will be a shorter and more mobile war.

    *Describing capability/constructs in terms of performance*

    I do not agree with you when you suggest that performance descriptions are not the only way to describe constructs. I have argued that relationships between constructs also matter – but these relationships point to other constructs which are themselves ultimately defined in terms of performance.

    You justify your assertion with a challenge: “try making up performance descriptions for different levels of intelligence”. Well, if we create the construct “intelligence” to account, as you say, for the fact that some students tend to get high scores in many subjects, then presumably “getting high scores in many subjects” is a vital indicator of intelligence. Is “getting high scores in many subjects” not a description of performance? And if we wanted to know exactly what we meant by “getting high scores in many subjects”, I would like to go further than merely describing this in a short rubric: I would like to see some examples that showed the difficulty of the tests, the range of tests and the variability of the tests that we were talking about.

    But if performance is not the only way to define what capability means, perhaps you can say what other ways exist?

    *Explicit performance descriptions in assessments*

    In your first paragraph, you defend explicit performance descriptions on the grounds that, having seen the performance description, “if they know it, they show it”. But in many complex situations, what you are assessing is the understanding that algebra is relevant and useful to the current problem, not the ability to do the algebra itself (although that can also be assessed, perfectly easily, without dressing it up as a problem).

    In one of my favourite films, Apollo 13, the rookie pilot crashes the lunar lander module simulator when the control team throws an instrument malfunction. “Well” says the more experienced captain lugubriously, “we were obviously into an SG65 there”. The skill was in spotting that you are into an SG65. Anyone can implement the SG65 procedure once it has helpfully been pointed out that that is what is required.

    I don’t think this is just about “authentic assessment” – it is more fundamental than that. Just trotting out your algebra is a sort of procedural rote learning: perceiving how and why algebra is required to solve this problem is what shows true understanding. As Eric Mazur puts it, “I think the true hallmark of education is being able to take what you’ve learnt and apply it in a new context”.

    Do not explicit descriptions of how performance will be marked short-circuit this vital aspect of assessment?

    Thank you for your responses,

  8. Paul Robson

    Did a return to teaching thing a few years ago.

    One of the exercises we had to do (in pairs) was to grade a piece of work according to the levels. I thought L2, my partner, another teacher of advanced age, a L1. We argued (politely) back and forth about whether we thought said pupil had really grasped the things specified in L2. It was very borderline.

    Apparently it was a L6 because of the words used. Even though it was obvious said pupil didn’t know what most of them meant. Farce.

    1. Crispin Weston

      This presumably reflects the doctrine of positive marking? Knows the word: tick. Doesn’t know what it means: not relevant.

      To my mind, this attitude reflects the idea that Dylan refers to, suggesting that the purpose of reliable assessment ought to be to give credit to what the student knows in the sense of having it in the mental locker, as it were. When what I am suggesting in my Apollo 13 analogy is that what matters is not what you’ve got, but what you can do with it.

      This also illustrates a further benefit of concrete exemplars of performance rather than abstract descriptors, which is that they offer a sort of common-sense check, useful in a pre-release pilot / moderation process. No-one really understands what the descriptor rubric really means in practice until it is applied to a concrete performance – so it is not until it is too late that anyone can say, “hang on, that’s absurd”.

  9. Pingback: Education: what needs to be done? | Clio et cetera

  10. Pingback: May on The Learning Spy | David Didau: The Learning Spy

  11. Rob Pritchard

    I am constantly impressed with the level of debate here and the intellect on show. So much so I just read and don’t comment because you are all too clever for me.

    I am a humble headteacher working with students, parents and teachers on a daily basis. The ideas you put forward here are a world away from our daily work – I am trying to imagine explaining ‘constructs’ in the context of assessment as we embark on yet another look at assessment. The title of the blog sums it up: “Assessment is difficult but not mysterious”. Don’t we just:
    check what students know
    teach them
    find out what they have learnt
    then fill in the gaps.

    You see, not mysterious.

    One of the best quotes from a blog I read recently came from Joan McVittie
    “I delve into Twitter to see what ideas are floating out there and sometimes it provides useful material. However, at other times it feels like the tourists who don’t look at the amazing sights but at the cameras on the end of their selfie sticks instead, living the experience through their photographs.”

    1. Crispin Weston

      Hi Rob,

      If I am one of the people who you think are looking in the wrong direction and making things too complicated, maybe Daisy won’t mind if I reply. I hope you won’t mind if my reply (as per normal) is a long one because I think your challenge is an important one.

      I taught for 10 years. I do not teach now – so your tourist jibe has some truth, though I do know about the realities of the classroom. I agree with you very much that we need to make it as simple as possible and that your 4-point process diagram is a good starting point.

      But is it really quite as simple and easy as you make out? You may have heard the quote from Prof Diana Laurillard: “teaching isn’t rocket science – it’s much more complicated than that”. Was she wrong? Her point is that teaching is a highly process- and transaction-driven business. Many of those transactions are informal and implicit in the sort of keep-it-simple-stupid paradigm that you urge. And that’s fine so long as the informal transactions involved in assignment, feedback and progression are handled consistently and everyone feels comfortable.

      But they’re not handled consistently. Research frequently quoted by Dylan shows that students with a good teacher typically make the same amount of progress in six months as is made by students with a bad teacher in two years. If that sort of inconsistency of outcome were shown in NHS death rates, there would be a public outcry. And there is an endemic recruitment crisis, and has been for fifty years, so there are not enough good teachers to go around.

      Nor does everyone feel comfortable – teachers feel overstressed, overworked and under-appreciated, while the government continues, quite justifiably, to demand higher standards and more consistency.

      You think we have our heads in the clouds because we are talking about the way the system should be organised. You are talking about what it is like at the chalk-face. We are looking at two different ends of the spectrum. I would argue that by formalizing processes and transactions, you actually make things *simpler* for people at the operational level. They know what they have to do, they know how their colleagues are doing, and people can build the software systems that will make the whole organisation work more smoothly.

      I completely agree with your 4-point process as a starting point – but I think it has some holes in it. We all work on the basis of hidden assumptions. It is the job of the systems analyst to come along and make those assumptions explicit.

      1. You say “fill in the gaps”. But gaps between what? What are the objectives? Does everyone have the same idea of what the objectives are? There is a huge amount of evidence to suggest that they do not. So that is what this discussion of “constructs” is all about.

      2. You imply that doing the filling of these gaps is easy. But as my data about inconsistent outcomes suggests, it isn’t easy at all. Most research shows that what really matters is good quality practice and the right sort of feedback. But evidence suggests that most teachers do not value practice (see my Tim Oates link), that most practice in this country is really dull and that teachers generally give little feedback, often of poor quality. I’m not trying to knock teachers – I’m just saying that this is really difficult to do on the scale and with the consistency that we need to do it.

      My view is that the theory really matters. If we were engineers, we would have a whole body of theory to draw on. And that too makes the work of front-line operators easier. But in education, the theory is deeply problematic. Part of the reason for that is that too many academics (present company excepted) do not contest their views enough. The 1998 Tooley Report, based on a fairly thorough analysis of the top academic journals, suggested that only about 10% of academic research was about teaching and learning, only about 15% used robust, quantitative evidence, and only about 36% was free from major methodological flaws. If you regard those three variables as independent, then that would suggest that about 0.5% of academic research into education is really useful. The variables are probably not independent, but the figure is still extremely low. And that is before you consider the lack of repetition in trials and the lack of effective debate.
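
      As a quick check of that figure, treating the three proportions quoted above as independent (an assumption, and probably not a true one), the arithmetic works out roughly as follows:

      ```latex
      % Share of research that is about teaching and learning (10%),
      % uses robust quantitative evidence (15%),
      % and is free from major methodological flaws (36%),
      % multiplied together under the independence assumption.
      \[
        0.10 \times 0.15 \times 0.36 = 0.0054 \approx 0.5\%
      \]
      ```

      If the three properties tend to occur together, the true overlap would be larger than this product, though it could not exceed the smallest of the three shares (10%).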

      That is why I think this sort of discussion, going on in the blogosphere, is so incredibly important. And it’s not about making things more complicated for front-line teachers. I feel passionately that it must be about making things simpler for front-line teachers.

      If you waded through this far, I hope you won’t mind me dropping in a quote I think you might like from L C Taylor, Director of Resources at the Nuffield Institute, in 1971.

      ***The present gap between research and daily application is such that teachers generally turn for help to those in the same boat with them, all awash in a vast sea. Professors who grandly philosophize the aims of education flash over in aircraft; those who evaluate its practices nose beneath in submarines…but our main anxiety is to stay afloat and make some sort of progress, guided, if must be, by ancient stars. We wonder how others in similar straits are getting on. “Have you tried to row facing forwards?” we shout above the racket of the elements; “No”, comes the answer, “but paddling with your feet over the stern helps”…and we suspect that the academics, secure from the daily fret of wind and wave, have forgotten what it is like to feel a little seasick all the time.***

      It’s not all rosy in the lifeboat, and what *I* argue the shipwrecked mariners need are the tools of the trade (by which the Nuffield Institute meant well-designed programmes of study and by which I mean digital versions of the same idea) – a cuddy to sleep in if you like, an outboard motor, and a compass. Usage of textbooks in this country is extremely low (under 10% is a figure I saw). That might be because the textbooks aren’t much good. But in higher-performing jurisdictions like Finland and Singapore, textbook usage seems to be above 90%.

      1. Andrew Lowery (@AndrewDLowery)

        Just a note to say I ‘waded this far’, as a classroom teacher. Fascinating and fundamentally important interrogation of the roots of the system.

        The standard of UK textbooks is woeful. And regarding the “software systems” which will “make the whole system run smoother”, well, every year I still end up using the defunct BBC Bitesize KS2 SAT questions as exam revision. When will a decent, large-scale, national multiple-choice question bank appear?! A pot of gold awaits those who build it.

  12. Pingback: The Elements of Language – Lessons learned | must do better…

  13. Dick Schutz

    Whether assessment is difficult or easy, mysterious or obvious is situational. “Assessment” floats in a semantic soup of terms, all of which are as muddy as it is: testing, evaluation, measurement, marking, exemplification, progressions, formative, summative, performance, leveling, scaling, standards, inferencing, hypotheses, competencies, capabilities, ability, describing, objectives, and so on.

    The thing is, none of these terms helps in untangling the matter at hand. At this point, I’m not sure what that matter is, but as I recall, it’s how to record and report the results of SATs now that the UK Government has discarded “Levels.” From what I’ve read, the Govt is headed to using statistical scales based on Item Response Theory, with cut scores and descriptors–a protocol that has all of the limitations and deficiencies of Levels, if not more so.

    My view is that the “problem” is with Item Response Theory and the SATs rather than with the recording and reporting of the results. I agree with Daisy:

    There are pupils out there who are really struggling with basic skills. Flawed assessment procedures are hiding that fact from them and their teachers, and therefore stopping them getting the help they need to improve. Worse, the kinds of assessment processes which would help to identify these problems are being completely ignored.

    Daisy says the better approach is to define criteria in terms of a) actual questions / test papers and b) actual pupil work.

    Possibly. But which questions/test papers and which pupil work? Next blog is needed.

  14. howardat58

    Many years ago I was one of six teachers giving the same course in basic statistics for business students. They all took the same examination at the end of the course, and each teacher marked his own group. ***The exam had a very detailed marking scheme***. My colleague decided to investigate the consistency of the marking, both between teachers and between a first marking and a second marking of a sample of scripts by each teacher. The variation between teachers was high, but the variation between successive markings of the same scripts was even higher. He decided not to publish the results! Some experimental work really needs to be done on all your schemes, but beware the results.

    1. Crispin Weston

      Ay, there’s the rub. We should be using analytics routinely to verify the accuracy of our assessments – but we dare not for fear of what can of worms we will open. So we tie ourselves to one-shot, summative sampling exercises, the results of which we dress in a little brief authority. The answer to all these problems lies in continuous sampling combined with good analytics. In such an approach lies the answer to all three inter-related problems: validity and reliability of assessment and definition of objectives.

  15. Pingback: ‘5 weeks….5 books’ Part 3: Measuring up! | From the Sandpit....

  16. Pingback: Heads I’m right, tails I’m not wrong | David Didau: The Learning Spy

  17. Pingback: My five favourite blogs of 2015 | David Didau: The Learning Spy
