A debate has been raging in my Twitter notifications between Dylan Wiliam and, from my perspective at least, everyone else. Wiliam’s position, as I understand it, is that Reading Recovery is an effective intervention for struggling readers and, perhaps more importantly, one that has been shown to work at scale. Although he thinks it is based on flawed science, he argues it is better that struggling readers are given access to an intervention that is available now than given nothing at all. Systematic synthetic phonics may be a promising approach that aligns with cognitive science, but there is no reason to think it will be quick or easy either to change the way reading is taught in schools or to develop and deliver systematic synthetic phonics interventions to remedy poor reading instruction. Why let the best be the enemy of the good?
This seems like a reasonable, pragmatic argument. So why has it attracted criticism from so many people?
What if Reading Recovery really does work?
If Reading Recovery is an effective intervention then we face a moral dilemma.
One of the interesting aspects of education research is its indifference to mechanisms of action. For instance, I have been a critic of the Education Endowment Foundation’s pursuit of Philosophy for Children. I have criticised the way the initial study was analysed, but there is a more fundamental question: How does having debates about whether it is OK to hit a teddy bear improve reading performance, as is claimed? When I ask this question, I do not get an answer, not because I have somehow stumped the researchers but because, to them, it is not even a question. Mechanisms don’t matter.
You see this indifference all across the domain of education research where interventions are tested. It is unscientific for two reasons. First, scientists should want to understand how the world works, and indifference to mechanisms is the opposite of this. Second, it closes down scrutiny by other researchers. If researchers specify a sequence of cause and effect then they give other researchers the option of testing links in that chain in different ways. If they do not, they are free to dismiss any research that does not exactly replicate the original study.
In the case of Reading Recovery, I think mechanisms matter a great deal. I don’t think that using multi-cuing strategies is likely to help struggling readers – if anything, it is likely to cause harm by fostering a reliance on word-guessing that breaks down on more complex texts – and so I would be surprised if the Reading Recovery techniques themselves have an effect. Instead, I suspect any effect of Reading Recovery is due to a combination of a placebo effect, additional reading practice and one-to-one tuition. If so, we could test each of these components separately. If one-to-one tuition is the key, then we can ditch the expensive Reading Recovery training and simply implement one-to-one sessions. This would have the advantage of avoiding issues such as reinforcing multi-cuing, and it is something that could be done pretty much straight away.
There is also the concern that the ‘good’ in this case is the enemy of the ‘best’. It seems to be the experience of many of us that teachers trained in Reading Recovery tend to become the literacy experts in their schools. This then has a backwash effect where strategies such as multi-cuing become the bedrock of initial reading instruction, making it harder for systematic synthetic phonics to make inroads.
What if Reading Recovery does not work?
Although I find it plausible that Reading Recovery has some sort of effect, I am less certain of this than I used to be. New Zealand has used Reading Recovery for longer than anywhere else, yet there is little evidence of a positive effect. I recently looked at a UK study that purported to show significant effects for Reading Recovery, but it appeared to have methodological flaws. The initial ‘I3’ trial in the US seemed to have similar problems. These flaws often revolve around the way students are included in the programme. Anecdotally, some schools appear to ignore the requirement to place the students who are struggling the most into the programme, on the basis that they think it will be ineffective for those students. Similarly, some students don’t complete the programme, for whatever reason, and these students are often excluded from the data analysis.
A later I3 trial in the US seems to have avoided some of these problems by selecting students using scores on initial assessments, pairing them with similar students and then assigning one member of each pair to Reading Recovery and the other to a control group. However, when you look at missing data, there is a large disparity: 4136 students in the Reading Recovery group have data and 756 have missing data, whereas 3719 in the control group have data and 1173 have missing data (note that both members of a pair were excluded from the analysis if either had missing data). That works out to an attrition rate of roughly 15.5% in the Reading Recovery group against roughly 24.0% in the control group. I understand that the appropriate test of statistical significance for paired data like this is McNemar’s test, but I don’t think I have all the data I need to run it. Nevertheless, if attrition is caused by students moving schools then we should expect roughly equal rates of attrition from both groups, so this seems like an odd difference and makes me wonder if something funny has happened here.
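To get a sense of how odd this disparity is, here is a minimal sketch in Python using only the counts reported above. Since the pair-level data needed for McNemar’s test isn’t available, it falls back on an ordinary chi-square test of independence, treating the two groups as unpaired samples; that is a simplifying assumption, so treat the output as indicative rather than definitive.

```python
# Rough check on the attrition disparity, using the counts reported above.
# McNemar's test would be the right tool for paired data, but it needs
# pair-level discordance counts that weren't published, so we fall back on
# a chi-square test of independence that treats the groups as unpaired.
from scipy.stats import chi2_contingency

#             has data, missing
rr      = [4136, 756]    # Reading Recovery group
control = [3719, 1173]   # control group

print(f"RR attrition:      {rr[1] / sum(rr):.1%}")            # ~15.5%
print(f"Control attrition: {control[1] / sum(control):.1%}")  # ~24.0%

chi2, p, dof, expected = chi2_contingency([rr, control])
print(f"chi-square = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
```

If the pair-level counts were ever published, swapping in a proper McNemar’s test (for example, via the `mcnemar` function in statsmodels) would be straightforward.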
So perhaps there is only really one good trial that shows the effectiveness of Reading Recovery, and that trial has strange attrition rates. Weigh that against the epidemiological failure of the programme in New Zealand and New South Wales and it is reasonable to hold the view that the effectiveness of Reading Recovery has not been demonstrated.
How could a programme involving an obvious placebo effect, one-to-one tuition and extra practice result in no overall effect? Perhaps the negative effects of strategies such as multi-cuing outweigh all of these potentially positive ones.
In that case, there is no moral dilemma at all.
In my next post, I will take a closer look at the problem of implementation.