How Reading Recovery probably works

I have written before about trials of Reading Recovery, particularly the recent I3 study from the U.S. Since then, I have become aware of two papers that I think are key to understanding the way that Reading Recovery works.

To say that it ‘works’ is actually quite controversial. Objectively, it does. Placing students in a Reading Recovery intervention seems to improve their reading more than if you don’t do anything. The question remains as to why this is the case. For instance, is it due to the specialist training that Reading Recovery teachers receive?

It is important to note that Reading Recovery is a one-to-one intervention of up to 60 half-hour sessions. This is hugely resource intensive. It also represents Benjamin Bloom’s ideal of a maximal form of teaching. He reviewed various interventions – specifically conventional teaching, mastery learning and tutoring – and found an effect size of d=2.0 for one-to-one tutoring. So the form of Reading Recovery likely contributes some proportion of its effect.

We could possibly gauge this by comparing Reading Recovery directly with another one-to-one reading intervention of the same duration and randomising students between the two treatments. Surprisingly, there seem to be few such direct comparisons. So perhaps we should compare effect sizes from Reading Recovery versus a control with effect sizes from rival one-to-one programs versus a control. This is more fraught because conditions will necessarily vary, but it might be indicative.

This is where the second paper comes in. In 2011, Robert Slavin and colleagues reviewed a number of studies on reading interventions. They were quite picky about the studies that they included. When it came to Reading Recovery, they avoided outcome measures that were intrinsic to the method itself in favour of more objective measures:

“First, most Reading Recovery studies use as posttests measures from Clay’s (1985) Diagnostic Observation Survey. Given particular emphasis is a measure called Text Reading Level, in which children are asked to read aloud from leveled readers, while testers (usually other Reading Recovery teachers) record accuracy using a running record. Unfortunately, this and other Diagnostic Observation Survey measures are closely aligned to skills taught in Reading Recovery and are considered inherent to the treatment; empirically, effect sizes on these measures are typically much greater than those on treatment-independent measures.” [my emphasis]

At this point I will remind you of my first principle of educational psychology: students tend to learn the things you teach them and don’t tend to learn the things you don’t teach them.

Slavin et al. also ruled out studies based only upon those students who had successfully completed Reading Recovery. Such studies prove little. I am sure that many teachers would prefer to be judged only on the results of those students who have been successful.

Once they had whittled down the research in this way, Slavin et al. were able to note that:

“The outcomes for Reading Recovery were positive, but less so than might have been expected…  

Across all studies of one-to-one tutoring by teachers, there were 20 qualifying studies (including 5 randomized and 3 randomized quasi-experiments). The overall weighted mean effect size was +0.39. Eight of these, with a weighted mean effect size of +0.23, evaluated Reading Recovery. Twelve studies evaluated a variety of other one-to-one approaches, and found a weighted mean effect size of +0.56… 

Across all categories of programs, almost all successful programs have a strong emphasis on phonics. As noted earlier, one-to-one tutoring programs in which teachers were the tutors had a much more positive weighted mean effect size if they had a strong phonetic emphasis (mean ES = +0.62 in 10 studies). One-to-one tutoring programs with less of an emphasis on phonics, specifically Reading Recovery and TEACH, had a weighted mean effect size of +0.23. Within-study comparisons support the same conclusion. Averaging across five within-study comparisons, the mean difference was +0.18 favoring approaches with a phonics emphasis.”
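The arithmetic behind these weighted means can be checked with a short sketch. It uses only the rounded figures quoted above and assumes the overall figure is a weighted combination of the two subgroup means; the quoted passage does not spell out the weighting scheme, so the function names and the "implied weight" reading are illustrative assumptions rather than Slavin et al.'s actual procedure.

```python
def weighted_mean(effects, weights):
    """Weighted mean of effect sizes (weights might be sample sizes)."""
    total = sum(weights)
    return sum(e * w for e, w in zip(effects, weights)) / total


def implied_weight_share(overall, group_a, group_b):
    """Share of the total weight group A must carry so that the two
    subgroup means combine into the reported overall mean."""
    return (group_b - overall) / (group_b - group_a)


# Rounded figures taken from the quoted passage.
rr_es, other_es, overall_es = 0.23, 0.56, 0.39
rr_studies, other_studies = 8, 12

# Weighting each study equally does NOT reproduce the reported +0.39.
by_count = weighted_mean([rr_es, other_es], [rr_studies, other_studies])

# Weight share Reading Recovery would need for the figures to agree.
rr_share = implied_weight_share(overall_es, rr_es, other_es)

print(f"Mean if every study counted equally: {by_count:+.3f}")
print(f"Implied Reading Recovery share of total weight: {rr_share:.0%}")
```

On these rounded figures, counting every study equally gives a mean above the reported +0.39, which suggests the Reading Recovery studies carried more than their 8-in-20 share of the weight, as would happen if they had larger samples. That is an inference from rounded numbers, not something the review states in the passage quoted.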

I think it is important that policymakers are aware of these findings.


  1. Pat Stone says:

    RR can involve up to 100 lessons in UK, 12 – 20 weeks.

    You criticise RR for using its own methods to assess its own teaching, but when you say the children taught 1:1 by teachers using phonics do better, what measures are used? Who are the children being taught this way? You don’t say? I have seen ‘results’ from Sounds Write who teach children – all children in a cohort, not just those needing intervention – who are taught spelling and their spelling reading age is measured after 2 years (that’s 70 weeks, how cost effective is that?) – using spelling tests to assess spelling. When these children’s book reading comes out as weaker, SoundsWrite actually boast that they are glad about this because ‘measuring spelling is a better measure of ‘literacy’ than reading books.’ The ‘reason’ for this is that in reading books, children might guess and so actual reading is not really a measure of actual reading. There is no evidence of any of this. There is plenty of rhetoric, I’ll grant you.

    There is plenty of evidence available to show how well RR children do in UK national tests, SATs in KS1 and KS2 – remarkable for children who start off in the lowest achieving 20%. Shall I send you some?

  2. Just a small point. You state that children in the Reading Recovery programme receive up to 60 one-to-one sessions. This is incorrect. Children receive between 12 and 20 weeks of one-to-one tuition depending on their rate of progress.
    In our socially disadvantaged boys’ school the average number of Reading Recovery lessons a child receives is closer to 100 than to 60.

  3. So up to one hundred sessions not fifty?

  4. I don’t want to comment on Pat Stone’s defence of RR. What I would like to correct is her apparent inability to separate the teaching of beginning reading and spelling to young children and teaching children who, for one reason or another, have fallen behind. So, all this talk about measuring RA after two years is indicative that she hasn’t read our report carefully. As for her claim that ‘children’s book reading comes out as weaker’, I have no idea what she is talking about and won’t waste my time trying to respond.
    What I will say is that students who are several years behind their CA in reading and spelling and who are taught using a linguistic phonics approach make very rapid progress, unless they have a serious speech and language impairment.
    What is very rapid progress? An example I can readily think of, which I would consider fairly representative of the many pupils I have taught, is that of a fifteen-year-old boy. When he came to me via the Milton Keynes Psychology Service, he came out as having an RA of 8:3. [The Service were the ones who tested him.] After eleven hours of one-to-one intensive teaching he was tested again [by the Service] and had an RA of 11:9. This was an improvement of three years and six months! Interestingly but not surprisingly, his spelling scores barely shifted, although there was one significant change: many of the spellings he was using on the test were phonically plausible but still incorrect. Clearly, such a short intervention hadn’t left a sufficient trace and what the student needed was more exposure and practice.
    Other practitioners I know and who use either Sounds-Write or Sound Reading System, which is very similar in orientation to Sounds-Write, get the same or very similar kinds of results.

    • Pat Stone says:

      You don’t know what I’m talking about? Is this not yours? See bottom of page 25

      You are not (ever) comparing like with like. RR does not teach 15 year old boys. I am thrilled for that boy that you managed to get him somewhere. Which of us does not have similar stories?

      RR is for year one (in UK) children who have fallen behind after at least three terms in school. RR is for the lowest achieving 20% in year 1. They could be in the lowest 20% for all sorts of reasons, including EAL, SaLT, missed school, behaviour management difficulties, trauma, not enough teaching… all sorts.
      I have no problem whatsoever with you and your friends getting great progress with certain children. I do take issue with phonatics continually trying to denigrate RR.

      What Ashman and various others (who don’t teach beginner readers at all) do, repeatedly, is the equivalent of comparing Italian cooks’ ability to cook Italian food with British cooks’ ability to cook British food. You / they don’t compare like with like. The results depend on what sort of food one likes. You seem to look for a reading age based on a test of words in a list? RR will look for book reading accuracy and comprehension plus spelling plus word list reading plus word list reading age.

      Another thing is the latest exhortation to teach the 5 components of reading since so many people have been concerned at the lack of some of them since phonics took such a high profile – comprehension, fluency, vocabulary, phonics, phonemic awareness. RR incorporates these 5 into every lesson, every day, and has been doing so since the 80s.

      I also note that in SoundsWrite schools, and schools that use other well-known schemes, there is wonderful work going on with Talk for Writing, rich literature environment and talk for learning development from nursery onwards. Again, RR brings these elements into the lessons if and when necessary. Yes, SP/SSP is useful, but it is a fraction of learning to read and write. There are other fractions.

      • gregashman says:

        The post is an attempt to compare like with like. As I state, I would prefer to see a three-way test of no intervention versus Reading Recovery versus one-to-one tuition of the same amount. Such tests don’t seem to take place. So, instead, we need to look at effect sizes for different interventions. It is well known that effect sizes tend to be much larger if tests are used that are designed by the originators of the program. This is why Slavin et al. excluded the Clay tests as an outcome measure and used standardised reading tests instead. They then found that phonics-based one-to-one tuition was superior to Reading Recovery. I have no idea what Sounds Write has to do with this issue. You introduced it to the discussion but then you rightly note that you can’t really compare it to Reading Recovery. As I understand it, Sounds Write is a form of initial instruction and not an intervention.

      • Pat Stone says:

        ‘Sounds Write is a form of initial instruction and not an intervention’
        It is, you are right. It was designed and is managed / owned / promulgated by John Walker, of TheLiteracyblog. Its results (which cannot be separated from other great teaching in the schools where it is used) cannot be compared with RR, which is an intervention. I mentioned his schemes and ‘evidence’ because this is the only statistical ‘evidence’ I can find to do with phonics teaching.
        I mentioned it at all because I know from previous that you are a proponent of SP / SSP and a critic of RR. The 2 are not comparable in any way shape or form. I don’t accept that you can discredit RR by citing SP, or base your criticisms of RR on rhetoric from SP / SSP proponents.

        When people criticise RR but say things that show me they don’t really know what RR is (about numbers of lessons, for example), I am suspicious and wont to try and put the record straight.
        Thank you for the opportunity.

      • gregashman says:

        I think you should read the Slavin et al review. That’s the evidence I’m citing and I have linked to the full pdf in my post. As far as I’m aware, it has nothing to do with Sounds Write. The Slavin et al review is also where the 60 sessions figure comes from.

      • Pat Stone says:

        If Slavin says 60, I say he doesn’t know RR and so I would be wary of his criticisms. I teach the stuff. I know the stuff.
        I think you should read this, if we are telling each other what to read:

      • gregashman says:

        I am not telling you what to read. You have chosen to respond to this blog post, which is largely about the Slavin et al. meta-analysis. This analysis provides statistical evidence that phonics-based tuition is superior to Reading Recovery. And yet you are claiming that Sounds Write is the only statistical evidence that you can find on phonics. If you read the report then you will find some more. But that’s up to you.

  5. Dick Schutz says:

    How Reading Recovery PROBABLY Works is an apt title. In actuality it “works” differently depending on the instructor, setting, and student; see the uncertainty about the number of sessions, for example. Comparing statistical effect sizes is one way to untangle the variability, but it’s not a very convincing way. Slavin’s meta-analysis is now five years old and it draws on studies going back decades earlier. None of this research has had a shred of actual effect.

    You say, “We could possibly gauge this by comparing Reading Recovery directly with another one-to-one reading intervention of the same duration and randomising students between the two treatments. Surprisingly, there seem to be few such direct comparisons.”

    Spot on! But scholars have also been promoting this Randomized-Control Trial methodology for decades, without any success. When will we stop being surprised that it doesn’t “work” in schooling?

    There is an easier way that follows directly from “First Principle: Students tend to learn the things you teach them and don’t tend to learn the things you don’t teach them.”
    Applied to RR and other “Interventions:” The UK Alphabetic Code [Phonics] Screening Check provides a valid and reliable common metric for gauging “Interventions.” You’ve blogged about the PSC elsewhere so I won’t belabor the point here.

    All we gotta do is look at how RR fares on the PSC, and do the same calibration with other programmes. With this information about reliability-of-accomplishment in hand we can dismiss the “tis-taint” squabbles and go on to cost and time considerations that are typically now ignored.

    This kind of Natural Experiment methodology is commonplace in Epidemiology and Engineering, but historically education inquiry has tried to borrow from Clinical Medicine and Agriculture. Researchers too “learn what they’ve been taught,” so an “intervention” is needed to change matters.

  6. Dan Meyer says:

    “At this point I will remind you of my first principle of educational psychology: students tend to learn the things you teach them and don’t tend to learn the things you don’t teach them.”

    Are you defining “teaching” here as “explicit instruction?”

