Daniel Koretz, Professor of Education at Harvard University, uses polls as an analogy to explain how exams actually work. Opinion polls sample the views of a small number of people in order to work out the views of a much larger population. Exams are analogous, in that they feature a small sample of questions from a much wider ‘domain’ of knowledge and skill. In Measuring Up, Koretz says this:
The full range of skills or knowledge about which the test provides an estimate – analogous to the votes of the entire population of voters in [an opinion poll] – is generally called the domain by those in the trade. Just as it is not feasible for the pollster to obtain information from the entire population, it is not feasible for a test to measure an entire domain exhaustively, because the domains are generally too large. Instead we create an achievement test, which covers a small sample from the domain, just as the pollster selects a small sample from the population.
Since first reading Koretz’s book (see my review here) I’ve used the analogy quite a lot. I used it this week to explain something to a colleague. She stopped and looked at me like I was crazy. ‘Daisy’, she said, ‘I think you need to get a new analogy.’
I know what she means. After a week in which opinion polls have been torn to shreds over their failure to predict the result of the 2015 UK general election, it seems perverse for me to keep using them as an analogy. But actually, the failure of the recent opinion polls makes the analogy all the more useful. Just as opinion polls can and do get things wrong, we need to acknowledge that the similar structure of exams means that they can and do get things wrong too. Exams are susceptible to the same kinds of errors as opinion polls. In fact, in one way, exams are even more susceptible to error than opinion polls. In the case of opinion polls, we can check the validity of the poll result because eventually the domain is measured, in the form of the election. In the case of exams, there is no final equivalent measure of the domain. Imagine if there were never an election, and all we ever had were opinion polls of differing types. That’s what exams are like.
Plenty of reasons have been put forward for the failure of the polls in this general election. One of the most popular is the idea that, for whatever reason, people did not tell the pollsters who they were really planning to vote for. The analogy with tests would be where a pupil is, for whatever reason, not interested in answering the items on the test to the best of their ability. In Koretz’s words,
Just as the accuracy of a poll depends on respondents’ willingness to answer frankly, the accuracy of a test score depends on the attitudes of the test-takers – for example, their motivation to perform well.
Whilst this may have been the problem with the UK opinion polls, I don’t think it is a major problem with UK tests. Enough important outcomes depend on the tests for pupils to be motivated to do well on them. Of course, we should never forget that variation in performance on the day is always one of the major factors in exam-score unreliability, and pupils who feel under too much pressure to perform may fail to produce their best work. But by and large, I don’t think this is the major problem with tests at the moment. However, there are two aspects of this sample and domain structure which I think do cause serious problems.
Here is Koretz again:
In the same way that the accuracy of a poll depends on seemingly arcane details about the wording of survey questions, the accuracy of a test score depends on a host of often arcane details about the wording of items, the wording of ‘distractors’ (wrong answers to multiple choice items), the difficulty of the items, the rubric (criteria and rules) used to score students’ work, and so on… If there are problems with any of these aspects of testing, the results from the small sample of behaviour that constitutes the test will provide misleading estimates of students’ mastery of the larger domain. We will walk away believing that Dewey will beat Truman after all. Or, to be precise, we will believe that Dewey did beat Truman already.
Or, if I can update the analogy, we will believe that Ed Miliband is currently in negotiations with Nicola Sturgeon and Nick Clegg about forming the next government, and David Cameron is sunbathing in Ibiza. A great blog post here by Dan Hodges (written before the election) explains how some of the arcane details of opinion polls can be manipulated to get a certain result. Similarly, changes in arcane details of exam structure can change the value of the results we get from them. For me, there are two particular problems: poor test design, and teaching to the test. I’ve written about both on my blog before, here. I also have an article being published soon in Changing Schools where I discuss this at greater length. Here, I will add just one more point, about coursework. Coursework and controlled assessments are a perfect example of the problems with poor test design and teaching to the test.
The essential problem with coursework and controlled assessment is that they allow the teacher and pupils to know what the test sample is in advance. When I taught Great Expectations as a coursework text, I knew what the final ‘sample’ from the novel was: the final essay would be on the first chapter of the novel. To extend the opinion poll analogy, it’s as if the final result of the election depended not on a vote of the population, nor on an anonymous sample of the population, but on a sample of 1,000 voters whose names were known in advance to all of the political parties. Even if the sample were well chosen, this would clearly be problematic. There would be an obvious incentive to neglect the views of anyone not in that sample. Some political parties might not want to behave so dishonourably, but they would almost certainly be forced to, because if they didn’t, another party would do so and therefore win the ‘election’. I think the analogies with coursework and teaching are clear. A teacher might not want to focus their instruction on the first chapter of Great Expectations, as they realise that doing so will not help pupils in the long run, particularly those pupils who want to study English Literature at A-level. But if they think that every other teacher is doing so, they may feel that they have no choice, as those pupils who do get the targeted instruction will probably get better grades. Thus, poor test design, in the shape of coursework and controlled assessment, encourages teaching to the test and distorts the validity of the final test score. Obviously, the whole problem is exacerbated by high-stakes testing, which places great weight on those test scores. But I think the poor test design is a major feature here too, and hopefully the analogy with opinion polls makes it clearer why this is such a problem.
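For readers who like to see the arithmetic, here is a minimal Python sketch of the sample-and-domain point. Every number in it is invented for illustration (a domain of 1,000 items, a 50-item test, a pupil with time to learn 300 items); none of it comes from Koretz or from real exam data, and the names `estimate_mastery`, `broad_knowledge` and `narrow_knowledge` are hypothetical. It compares a pupil who studies broadly without knowing the sample with a pupil who is taught only the items known to be on the test.

```python
import random

# All figures below are illustrative assumptions, not real exam data.
DOMAIN_SIZE = 1000    # hypothetical number of items in the whole domain
TEST_SIZE = 50        # hypothetical number of items sampled for the test
STUDY_BUDGET = 300    # hypothetical number of items a pupil has time to learn


def estimate_mastery(known_items, test_items):
    """Return the share of the sampled test items that the pupil knows."""
    return sum(item in known_items for item in test_items) / len(test_items)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
domain = list(range(DOMAIN_SIZE))
test_items = rng.sample(domain, TEST_SIZE)

# Pupil A does not know the sample and learns a random 300 items.
broad_knowledge = set(rng.sample(domain, STUDY_BUDGET))

# Pupil B knows the sample in advance and learns only those 50 items.
narrow_knowledge = set(test_items)

# True mastery is the share of the whole domain each pupil knows.
true_broad = len(broad_knowledge) / DOMAIN_SIZE
true_narrow = len(narrow_knowledge) / DOMAIN_SIZE

# The test score is the estimate of mastery that the test reports.
score_broad = estimate_mastery(broad_knowledge, test_items)
score_narrow = estimate_mastery(narrow_knowledge, test_items)

print(f"Broad study:  true mastery {true_broad:.2f}, test score {score_broad:.2f}")
print(f"Narrow study: true mastery {true_narrow:.2f}, test score {score_narrow:.2f}")
```

Under these assumptions, the narrowly taught pupil scores full marks while knowing only 5% of the domain, whereas the broadly taught pupil's score is a noisy but honest estimate of the 30% they actually know. The sketch is deliberately crude: real domains are not lists of equally weighted items, and real pupils do not learn at random.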