Is Reading Recovery like Stone Soup?

Researchers from the Universities of Delaware and Pennsylvania have written a paper describing a large, multi-site, randomised controlled trial of Reading Recovery. The effect size is impressive: 0.69 when compared to a control group of eligible students. This is above Hattie’s effect size threshold of 0.40 and so suggests that we should pay attention. As I am a proponent of evidence-based education, you may think it perverse of me to question such a result.

It’s not.

Reading Recovery involves taking students out of normal lessons and giving them a series of 30-minute one-to-one reading lessons with a Reading Recovery trained teacher over a period of 12 to 20 weeks. So the intervention packages together a number of different factors including:

– the specific Reading Recovery techniques

– additional reading instructional time on top of standard classroom reading instruction

– one-to-one tuition

Each of these factors could plausibly impact on a child’s reading progress. For instance, we might expect a series of 30-minute one-to-one reading sessions with an educated adult volunteer to also improve students’ reading performance.

However, the implicit claim is that it is the specific Reading Recovery techniques that are responsible for any effect. Otherwise, why would we spend considerable amounts of money training and hiring Reading Recovery teachers? Indeed, the abstract suggests that, “the intensive training provided to new RR teachers was viewed as critical to successful implementation.”

It would be very easy to test the effect of the actual strategies. A good model is a study carried out by Kroesbergen, Van Luit and Maas on mathematics interventions with struggling maths students. They created three randomised groups. The first were given a ‘constructivist’ maths intervention, the second were given an ‘explicit’ maths intervention and a third control group were given no intervention (at least, during the study). Both interventions were beneficial when compared with the control. This is to be expected – any reasonable intervention is likely to be more effective than no intervention at all. However, the explicit intervention was found to be superior to the constructivist one and so we may assign some of the effect to the different strategies used in the two interventions.

Following this model, a good test of Reading Recovery might be to compare it with the kind of tuition from an educated volunteer that I described above or maybe to compare it with a different one-to-one intervention program. Of course, all programs would need the same amount of instructional time.
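The allocation step of such a three-arm comparison can be sketched in a few lines of Python. This is only an illustration: the arm names, the student labels and the `allocate` helper are hypothetical and are not taken from any Reading Recovery study. It shows nothing more than random assignment into equal-sized groups, each of which would then receive the same amount of instructional time.

```python
import random


def allocate(students, arms, seed=0):
    """Shuffle the students, then deal them round-robin into the arms.

    Group sizes differ by at most one. The fixed seed is only there to
    make this illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in arms}
    for i, student in enumerate(shuffled):
        groups[arms[i % len(arms)]].append(student)
    return groups


# Hypothetical arms: the full package, one-to-one tuition from an
# educated volunteer, and a control group with no extra tuition.
ARMS = ["reading_recovery", "volunteer_tuition", "no_intervention"]
students = [f"student_{i:02d}" for i in range(30)]
groups = allocate(students, ARMS)

for arm, members in groups.items():
    print(arm, len(members))
```

Because the only planned difference between the first two arms is the specific technique and training, any gap between them could more reasonably be assigned to Reading Recovery itself.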

However, this is not what seems to happen in Reading Recovery research. Reading Recovery is proprietary and so the consent of the organisation is required in order to use its copyrighted materials in trials. The only trials that seem to take place are those that compare Reading Recovery with no intervention at all, as in the Delaware/Pennsylvania study (I am happy to be proved wrong on this – if you know of any different types of trials then please link in the comments).

This is problematic. The first rule of scientific research is to control variables. Admittedly, some variables are highly unlikely to affect the result and so we might not worry too much about them. However, in this case, multiple variables are changed at once, each of which could plausibly produce an improvement in reading performance.

Hey Google, what is a fair test?

Imagine a trial of a new medicine. It is unlikely that such a trial would be run against no intervention. At the very least, it would be compared with a placebo because of the well-known placebo effect. A more pertinent example might be if a study was done to test a regime of diet, exercise and a patented vitamin pill against no intervention at all and found that the former led to considerably more weight loss. What would we learn from this?

All that we can conclude from the Delaware/Pennsylvania study is that the entire Reading Recovery package – which is expensive to implement – is more effective than standard classroom teaching alone. We don’t know what causes this effect and whether we could gain the same effect without the same expense. Moreover, I would suggest that the principles of Reading Recovery, seemingly validated by such research, have a tendency to wash back into classroom teaching, potentially at the expense of evidence-based approaches. Researchers at Massey University in New Zealand have even claimed that the ‘failure’ of New Zealand’s literacy strategy has largely been as a result of the widespread adoption of Reading Recovery principles.

It reminds me of the folktale of the weary traveller who makes soup out of a stone. He knocks on the door of an old woman and asks for some hot water. She asks him what it’s for. He explains that he intends to make soup out of a stone and that she can have some. After a while, he tastes the soup, “It’s good,” he says, “but it could do with a little bacon.” The old woman gets some. A short time later, he tastes it again, “Mmmm,” he says, “some turnip would just improve it a little.” And so it continues, with the woman fetching one new ingredient after another. Eventually, the traveller serves the soup.

“Delicious,” says the old woman, “who would have thought that you could make such tasty soup out of a stone?”

By Qù F Meltingcardford (Own work) [CC BY-SA 3.0], via Wikimedia Commons

Update: Since writing this, I have become aware that the control group for the I3 study was more complex than ‘no intervention’. Instead, Reading Recovery was compared with a school’s usual intervention for poor readers. This was a mix of things from no intervention at all to small group interventions and so on. However, we are still not comparing like with like and so the original criticism in this post still stands.


31 thoughts on “Is Reading Recovery like Stone Soup?”

  1. Julia Douetil says:

    One of the very first studies of Reading Recovery was exactly what is described here – students randomly assigned to one of four groups, 1) Reading Recovery , 2) ‘Reading Recovery – like’ intervention but teachers had less training, 3) equivalent amount of one to one support in a skills based program 4) control with normal school provision for struggling readers.
    Result – the children who received Reading Recovery as designed made significantly greater progress on all measures. See Schmitt et al, 2005, Changing Futures: the influence of Reading Recovery in the United States, pub RRCNA, page 146

  2. Pity the poor policymaker who embraces evidence-influenced policy, for she shall choke on a soup of stone.

    A primary purpose of the US federal “Investing in Innovation and What Works” or i3 program was to investigate whether promising, evidence-backed education interventions could be successfully “scaled up,” i.e., deliver positive outcomes for a large number of students. Reading Recovery survived a rather grueling screening process to receive one of only four major i3 grants from the US government designed explicitly to test whether this intervention could successfully scale. Given that many “pilot” programs fall apart at the scaling stage, this seems a reasonable and even prudent course of action for a government to take.

    A few years later, the early results of the large randomized-control trial designed to test Reading Recovery’s expansion are in, and they appear quite promising insofar as they delivered a large effect size (.69) that easily clears the “Hattie Line of Relevance” of .40. Given that the entire point of effect-size analysis is to compare the relative impact of different interventions, you might forgive the poor policymaker for concluding that Reading Recovery might be worthy of additional public investment given the apparent returns. If you like evidence-based education policy (or evidence-influenced policy), you would seem to like the story thus far.

    Not so, says Chef Evidence-Based Ashman! Because we don’t know precisely which aspect of Reading Recovery is responsible for the positive effects on student learning — was it the teacher training? The 1:1 tutoring time? The specific Reading Recovery techniques? — we cannot “fairly” conclude that Reading Recovery itself is a good program. The only way to “really know” whether Reading Recovery is meritorious would be to conduct another large-scale randomized study (or more likely, studies), that isolates each major variable embedded in the Reading Recovery intervention and evaluates which if any is responsible for the improvement in student learning. Absent that, says Chef Ashman, our poor policymaker may be sipping a delicious broth but mistaking bacon for stone.

    In reality, of course, we never have perfect knowledge about anything. Policymakers come to grips with this fact far more readily than academics do, because they must make choices about what policies to pursue based on highly limited inputs, including research-backed evidence. If proponents of research-supported policy marginalize the sort of research that was conducted here on Reading Recovery, policymakers will simply conclude — as they so often do — that “you can use evidence to support any position you want” — and thus make their decisions based on other inputs, such as which idea sounds better in a press release.

    As Vincent Vega once said, “Bacon tastes good. Pork chops taste goooooood.” If Reading Recovery’s “stone soup” tastes goooood, let’s eat.

    • Hmmm… You seem to suggest that I am obsessing over the rather arcane matter of ‘precisely’ how RR works. However, for the policy maker, this is quite important.

      Training in RR is expensive. So is employing RR trained teachers. If my hunch is right and you could get the same effect by using adult volunteers then that would save the policy maker a lot of money. The notion of a fair test is not an obscure idea, pursued only by pedants; it is central to good science. I am surprised that you are so happy to let it slip.

      Do you think we can abandon science in favour of effect sizes? I am increasingly skeptical about this. They are not the measure people think they are. As a quotient of two values – difference in means over standard deviation – there are lots of ways this can go wrong. For instance, if you test a small section of the ability range, rather than the entire range, you will reduce the standard deviation and increase the effect size. We see this effect in the study reported where the effect size for comparison with struggling readers is larger than the effect size compared to the same aged population. Similarly, if we use a testing instrument or choose an age where the rate of progress is greater than at other points then we will inflate the effect size. I am afraid that it is no substitute for science.
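The range-restriction point can be illustrated with a toy calculation. This is a sketch with made-up scores, not data from the study: `cohens_d` is a hypothetical helper that uses the usual pooled standard deviation, and the same 5-point raw gain is given to every student.

```python
from statistics import mean, stdev


def cohens_d(treated, control):
    """Effect size: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treated) - mean(control)) / pooled


GAIN = 5  # assumed raw gain in test points, identical for every student

# Made-up scores covering the whole ability range (0 to 100).
control_full = list(range(0, 101, 5))
treated_full = [score + GAIN for score in control_full]

# The same made-up scores, restricted to the weakest readers (30 or below).
control_low = [score for score in control_full if score <= 30]
treated_low = [score + GAIN for score in control_low]

d_full = cohens_d(treated_full, control_full)
d_low = cohens_d(treated_low, control_low)

print(f"whole range:      d = {d_full:.2f}")
print(f"restricted range: d = {d_low:.2f}")
```

With these invented numbers the identical 5-point gain gives an effect size of roughly 0.16 across the whole range and roughly 0.46 for the restricted group, purely because the standard deviation in the denominator is smaller.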

      It is strange that we will abandon science in this way for educational research. Would you buy the vitamin pills mentioned in my post? I don’t think you would.

  3. Well, I think you want the research to answer a different question than the one it addressed, or maybe you want to be sure that the right conclusion is drawn from the research. Most interventions are tested as a package, so you’re left w/o knowing why it works (given that it works). All you know is that it’s better than the comparison group (which is usually “business as usual,” rather than an active control). So you don’t know the “active ingredient.”
    But given the difficulty of moving the needle w/ struggling readers and given the strong assoc of early reading problems w/ later academic achievement, I think it’s noteworthy.

    • You are right that I want it to answer a different question. This is because I would like the study to be useful in some way. What would you predict to be the effect of one-to-one tuition with an educated adult? Nothing? 0.40? We just don’t know and yet this would cost a lot less than RR.

      As someone who works in a school, I can point out that the ‘effectiveness’ of RR that is ‘proved’ by such studies is used as an argument to influence initial standard reading instruction. For instance, RR uses multi-cuing (guessing from context, pictures etc) and so this is proposed for initial reading instruction.

  4. Tami Reis-Frankfort says:

    Reblogged this and commented:
    Why is there no research that compares Reading Recovery to other 1:1 intervention programmes?

  5. Interesting post. However, surely given the weight of evidence that exists for phonics, and the conclusions of the meta-analysis cited below that “phonics instruction is not only the most frequently investigated treatment approach, but also the only approach whose efficacy on reading and spelling performance in children and adolescents with reading disabilities is statistically confirmed”, I would argue that the only really useful test of RR would be against a one-to-one intervention using a linguistic or synthetic phonics approach.

  6. If anything, what we have here is Stone Soup at the “best chef” level–not the peasant level.

    –The link goes to a paper in the American Educational Research Journal–the flagship journal of the flagship American Educational Research Association. However, the abstract is cost-walled and the “news” is stale. That is, it reports the results of the first year of a study, when the results of the second year of the study are openly accessible.

    –The “Scale-Up” grant was awarded in 2010. The results of the third year (2013-14) will eventually be reported, but by then the grant financing will have ended and there will be nothing realized from the “Investment in Innovation.”

    –The effects of the innovation are gauged in terms of “effect size”–the current popular gauge. The .4-.6 effect reported is “large”, but it represents a difference of three items on the achievement test and both the experimental and the control groups are still performing below the national mean on the test–which is still low as a “recovery” indicator.

    The report linked above is well-worth reading. (Just skip the fancy statistical sections–they’re obligatory these days, but uninformative even when understood.) The researchers are competent and they have given some thought to what they are doing. They are saving some matters for their third and “Final Report,” which is something we can look forward to later in the year.

    Is there an alternative methodology/orientation? Yes, there is. It’s currently in play in a natural experiment comparing the modus operandi for reading instruction in the US and in the UK. The UK “treatment group” is using an unobtrusive Alphabetic Code-based screening check administered at the end of Year 1 to identify children who still need further formal instruction in reading per se. The Check is administered at the end of Year 2 to children who were flagged in Yr 1. The Brit initiative has not been free of “stumbles and struggles,” but compared to the US “treatment group” the difference is ocularly significant–it hits you between the eyes.

  7. We definitely need something more than effect sizes and RCTs. I like this from Steven Weinberg:

    “Medical research deals with problems that are so urgent and difficult that proposals of new cures often must be based on medical statistics without understanding how the cure works, but even if a new cure were suggested by experience with many patients, it would probably be met with scepticism if one could not see how it could possibly be explained reductively, in terms of sciences like biochemistry and cell biology. Suppose that a medical journal carried two articles reporting two different cures for scrofula: one by ingestion of chicken soup and the other by a king’s touch. Even if the statistical evidence presented for these two cures had equal weight, I think the medical community (and everyone else) would have very different reactions to the two articles. Regarding chicken soup, I think that most people would keep an open mind, reserving judgment until the cure could be confirmed by independent tests. Chicken soup is a complicated mixture of good things, and who knows what effect its contents might have on the mycobacteria that cause scrofula? On the other hand, whatever statistical evidence were offered to show that a king’s touch helps to cure scrofula, readers would tend to be very sceptical because they would see no way that such a cure could ever be explained reductively…How could it matter to a mycobacterium whether the person touching its host was properly crowned and anointed or the eldest son of the previous monarch?”

  8. The point you make on knowing which elements of an intervention work is a very important one in evaluation, and one of the things too often ignored in current use of RCT’s in education. Often the designs are not sufficiently sophisticated for us to answer this question. An additional element we typically forget to study is cost effectiveness, a pretty crucial issue in a world of constrained resources. Also, your point on backwash refers to unintended consequences, which again are often not taken into account in evaluations of educational interventions. All this points to the fact that while RCT’s are essential, they are not in themselves sufficient as a basis for decision-making.

  9. Pingback: No stone unturned | Horatio Speaks

  10. Pingback: The truth about teaching methods | Filling the pail

  11. Pingback: Evidence in education | Filling the pail

  12. Pingback: The new learning styles | Filling the pail

  13. Pingback: Ability grouping | Filling the pail

  14. Pingback: New evidence suggests Reading Recovery doesn’t work | Filling the pail

  15. Pingback: How to spend money in education | Filling the pail

  16. Pingback: How Reading Recovery probably works | Filling the pail

  17. Pingback: The best way to teach | Filling the pail

  18. Pingback: No, Reading Recovery doesn’t work in America | Filling the pail

  19. Pingback: Another flawed Reading Recovery study to add to the pack – Filling the pail
