Are teacher assessments as good as exams?

Recently, as a result of the ongoing COVID-19 crisis, the UK Department for Education, which is responsible for education in England, decided to cancel all of its scheduled GCSE, AS and A Level exams and replace them with a form of teacher assessment. Australian students typically sit their exams at the end of the calendar year and so no such decisions about these have yet been made. However, the standardised assessment programme for literacy and numeracy that students in Years 3, 5, 7 and 9 complete every May has been cancelled this year.

Most educators seem to have responded to this news in practical terms. What are the alternatives? How do we make these alternatives work? How do we ensure they are fair? There is a grim determination to make the best of a bad hand. However, some have taken a different approach. Sensing opportunity in a crisis, they are calling for the abandonment of exams altogether in the future, post-coronavirus world.

An example of this call is an article in The Conversation. Kaili Rimfeld, Margherita Malanchini and Robert Plomin argue that teacher assessments are just as ‘reliable and stable’ as exam scores. For example, they cite their own research showing that teacher assessment at age 11 predicts scores on GCSE exams at age 16 almost as accurately as exams taken at age 11 do.

There’s something odd about this.

Firstly, they make a classic mistake, one most often seen in the field of genetics. If a gene raises an individual’s risk of a particular disease by a few percent, that information is pretty useless at the individual level. What do you do with it? However, it does support quite powerful predictions at the population level. Exams are not about making predictions at a population level; they are about individual performance.

To put it another way, when I worked in London, we used to assign students a predicted score at GCSE based upon cognitive ability tests taken at the age of 11. I blindly accepted this until, one day, I opened up the testing manual and realised that a child predicted a B may have only a 30% chance of gaining a B, a 20% chance of gaining a C, a 25% chance of gaining an A and so on. At the individual level, these predictions were basically meaningless but, aggregated, they did tell us something about what we could expect from the cohort and what a particularly good or bad set of GCSE results might look like.
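A quick simulation illustrates the point. The sketch below uses the figures above (30% B, 20% C, 25% A) and, purely as an assumption for the sake of the example, assigns the remaining 25% to a D; the grade-to-points mapping is likewise illustrative rather than any official scale. The modal prediction for any one student is right less than a third of the time, yet the cohort’s average points score is pinned down very tightly.

```python
import random

random.seed(0)

# Hypothetical grade-probability profile for a student "predicted a B",
# loosely matching the figures above; the leftover 25% goes to a D.
grades = ["A", "B", "C", "D"]
probs = [0.25, 0.30, 0.20, 0.25]
points = {"A": 7, "B": 6, "C": 5, "D": 4}  # illustrative points per grade

def simulate_cohort(n):
    """Draw an actual grade for each of n students with the same profile."""
    return random.choices(grades, weights=probs, k=n)

outcomes = simulate_cohort(100_000)

# Individual level: the modal prediction ("B") is right only ~30% of the time.
hit_rate = outcomes.count("B") / len(outcomes)

# Cohort level: the mean points score closely matches the expected value.
mean_points = sum(points[g] for g in outcomes) / len(outcomes)
expected = sum(p * points[g] for g, p in zip(grades, probs))

print(f"individual hit rate: {hit_rate:.2f}")  # ~0.30
print(f"cohort mean points:  {mean_points:.2f} (expected {expected:.2f})")
```

The same distribution that makes an individual prediction close to a coin flip makes the aggregate forecast very precise, which is exactly the gap between population-level and individual-level prediction.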

Expanding on Rimfeld et al.’s argument, should we abandon assessing the years between the age of 11 and GCSEs because we could just predict the GCSEs? That sounds absurd, and the reason it sounds absurd is because we intuitively understand variation at the individual level – that some students will work harder than others or suddenly grasp a topic they had previously disliked.

And we have no reason to suspect that any teacher assessments at the age of 16 would generally be as valid as exams once they become high stakes.

We need to bear in mind Goodhart’s Law, which is commonly phrased as, “When a measure becomes a target, it ceases to be a good measure.” Teacher assessments are currently low stakes and do not count for anything. How do teachers come up with them? In my experience, it is by extrapolating from the results of assessments sat in school that are often drawn from questions similar to the ones currently on the exams.

Yet imagine that teacher assessment now routinely determined future career and university opportunities. The pressure would become intense. The point of an exam is that if you sit a student in a room in silence and break off contact with the outside world, you can be pretty sure that any work they produce is their own. Yes, the wealthy can advantage their children through tutoring or by getting them into better schools, but the students still ultimately have to do it for themselves in the exam.

Once you build a fuzzier system, the opportunity for advantage becomes far greater. Coursework may be partly written by a tutor. Perhaps we decide to create a more holistic system than one with a narrow focus on exams and start factoring in aspects such as community service? That sounds like a marvellous idea until we realise that the impoverished kid who has to work at weekends has less opportunity for community service than the rich one. Ultimately, the only way we can introduce any fairness into a teacher assessment system is to base it on fairly formal assessments that look a lot like exams. We end up replacing nationally standardised exams with a patchwork of idiosyncratic ones, hoping that they are somehow measuring the same things. Why would we do that?

Exams exist for a reason. They are the worst possible way of performing high stakes assessment, apart from all the other ones.

Teachers will navigate these difficult times with professionalism and humanity. They will do the best for their students – I am certain of that. But let us not pretend that crisis measures are better than the rigorous systems we have developed over time.


3 thoughts on “Are teacher assessments as good as exams?”

  1. Stan Blakey says:

    Reading the abstract of the research paper it seems they are quite worried about the stress of high stakes exams.
    Their solution of making teacher assessments high stakes seems to ignore both the issue you raise, that it invites gaming of teacher assessments, and also that, once the teacher assessment is high stakes, anyone anxious about getting good results is going to be anxious in all their dealings with the teacher.

    I am not going to pay to read their paper. Did you? And if so did you make sense of the claim in the abstract that “Teacher and test scores correlate strongly phenotypically (r ~ .70) and genetically”.

    I am wondering what sort of phenotypes they used and why it was important to do this study with twins.

    It looks like they went back over some existing data to see if they could find correlations rather than doing a study aimed at testing their hypothesis.

  2. Chester Draws says:

    “The reason that past test scores were slightly more accurate than teacher assessment might be that these factors come into play in exams, and aren’t present in teacher assessment.”

    They’re just making it up. The null hypothesis for why test scores are more accurate than teacher assessments is that they measure the intended criteria better. That personal bias makes things worse is another null hypothesis, as seen from the other side. But to hand-wave the problem away with a “might” is just bad science.

    I don’t buy the anxiety argument at all, and the research I’ve read doesn’t give much support to anxiety producing poor results — my anxious students tend to do better than average because they put more work in, because they are worried. The students who do worst in exams tend to be the ones who are over-confident, because their very confidence tends to make them study less.

    “From a teaching standpoint, exams make education less about learning the curriculum, and more about how to do well in exams.”

    This is true.

    However, they “forget” that, from a teaching standpoint, internal assessment makes education less about learning the curriculum and more about how to please the teacher.

  3. Pingback: On ditching exams | Filling the pail
