Recently, as a result of the ongoing COVID-19 crisis, the UK Department for Education, which is responsible for education in England, decided to cancel all of its scheduled GCSE, AS and A Level exams and replace them with a form of teacher assessment. Australian students typically sit their exams at the end of the calendar year and so no such decisions about these have yet been made. However, the standardised assessment programme for literacy and numeracy that Years 3, 5, 7 and 9 students complete every May has been cancelled this year.
Most educators seem to have responded to this news in practical terms. What are the alternatives? How do we make these alternatives work? How do we ensure they are fair? There is a grim determination to make the best of a bad hand. However, some have taken a different approach. Sensing opportunity in a crisis, they are calling for the abandonment of exams altogether in the future, post-coronavirus world.
An example of this call is an article in The Conversation. Kaili Rimfield, Margherita Malanchini and Robert Plomin argue that teacher assessments are just as ‘reliable and stable’ as exam scores. For example, they cite their own research that shows teacher assessment at the age of 11 is almost as accurate as exams at age 11 for predicting scores on GCSE exams at age 16.
There’s something odd about this.
Firstly, they make the classic mistake that is most often made in the field of genetics. If a gene raises an individual’s risk of a particular disease by a few percent then that is pretty useless at an individual level. What do you do with that information? However, it does lead to quite powerful predictions at the population level. Exams are not about making predictions at a population level; they are about individual performance.
To put it another way, when I worked in London, we used to assign students a predicted score at GCSE based upon cognitive ability tests taken at the age of 11. I blindly accepted this until, one day, I opened up the testing manual and realised that a child predicted a B may have only a 30% chance of gaining a B, a 20% chance of gaining a C, a 25% chance of gaining an A and so on. At the individual level, these predictions were basically meaningless, but aggregated, they did tell us something about what we could expect from the cohort and what a particularly good or bad set of GCSE results might look like.
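To make the gap between individual and cohort-level prediction concrete, here is a minimal Python sketch. It uses the 30%, 20% and 25% figures quoted above; how the remaining 25% is split between the other grades is my own assumption for illustration, as are the function names and the cohort size. It is not the testing manual's actual model.

```python
import random
from collections import Counter

# Grade probabilities for a child "predicted a B".
# The B (30%), C (20%) and A (25%) figures come from the text above.
# ASSUMPTION: the remaining 25% is split between A* and D-or-below
# purely for illustration; the source does not give these numbers.
GRADE_PROBS = {"A*": 0.10, "A": 0.25, "B": 0.30, "C": 0.20, "D or below": 0.15}


def simulate_cohort(n_students, seed=0):
    """Draw one simulated GCSE grade per student, all predicted a B."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    grades = list(GRADE_PROBS)
    weights = list(GRADE_PROBS.values())
    return rng.choices(grades, weights=weights, k=n_students)


def summarise(results):
    """Return the share of the cohort that achieved each grade."""
    counts = Counter(results)
    n = len(results)
    return {grade: counts[grade] / n for grade in GRADE_PROBS}


if __name__ == "__main__":
    shares = summarise(simulate_cohort(10_000))
    for grade, share in shares.items():
        print(f"{grade:>10}: {share:.1%}")
    # The share gaining a B is also how often the individual prediction held.
    print(f"Prediction 'B' was correct for {shares['B']:.1%} of students")
```

Under these assumptions, the prediction is wrong for roughly seven students in ten, yet the spread of grades across the whole cohort comes out close to the stated percentages every time. That is the sense in which the figures are informative about a year group and nearly useless for any one child.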
Expanding on Rimfield et al.’s argument, should we abandon assessing the years between the age of 11 and GCSEs because we could just predict the GCSEs? That sounds absurd, and the reason it sounds absurd is that we intuitively understand variation at the individual level – that some students will work harder than others or suddenly grasp a topic they had previously disliked.
And we have no reason to suspect that any teacher assessments at the age of 16 would generally be as valid as exams once they become high stakes.
We need to bear in mind Goodhart’s Law which is commonly phrased as, “When a measure becomes a target, it ceases to be a good measure.” Teacher assessments are currently low stakes and do not count for anything. How do teachers come up with them? In my experience, it is by extrapolating from the results of assessments sat in school that are often drawn from similar questions to the ones currently on the exams.
Yet imagine that teacher assessment now routinely determined future career and university opportunities. The pressure would become intense. The point of an exam is that if you sit a student in a room in silence and break off contact with the outside world, you can be pretty sure that any work they produce is their own. Yes, the wealthy can advantage their children through tutoring or by getting them into better schools, but the students still ultimately have to do it for themselves in the exam.
Once you build a fuzzier system, the opportunity for advantage becomes far greater. Coursework may be partly written by a tutor. Perhaps we decide to create a more holistic system than one with a narrow focus on exams and start factoring in aspects such as community service? That sounds like a marvellous idea until we realise that the impoverished kid who has to work at weekends has less opportunity for community service than the rich one. Ultimately, the only way we can introduce any fairness into a teacher assessment system is to base it on fairly formal assessments that look a lot like exams. We end up replacing nationally standardised exams with a patchwork of idiosyncratic ones, hoping that they are somehow measuring the same things. Why would we do that?
Exams exist for a reason. They are the worst possible way of performing high stakes assessment, apart from all the other ones.
Teachers will navigate these difficult times with professionalism and humanity. They will do the best for their students – I am certain of that. But let us not pretend that crisis measures are better than the rigorous systems we have developed over time.