Is Reading Recovery like Stone Soup?

Researchers from the Universities of Delaware and Pennsylvania have written a paper describing a large, multi-site, randomised controlled trial of Reading Recovery. The effect size is impressive: 0.69 when compared to a control group of eligible students. This is above Hattie’s effect size threshold of 0.40 and so suggests that we should pay attention. Given that I am a proponent of evidence-based education, you may think it perverse of me to question such a result.

It’s not.

Reading Recovery involves taking students out of normal lessons and giving them a series of 30-minute one-to-one reading lessons with a Reading Recovery trained teacher over a period of 12 to 20 weeks. So the intervention packages together a number of different factors including:

– the specific Reading Recovery techniques

– additional reading instructional time on top of standard classroom reading instruction

– one-to-one tuition

Each of these factors could plausibly impact on a child’s reading progress. For instance, we might expect a series of 30-minute one-to-one reading sessions with an educated adult volunteer to also improve students’ reading performance.

However, the implicit claim is that it is the specific Reading Recovery techniques that are responsible for any effect. Otherwise, why would we spend considerable amounts of money training and hiring Reading Recovery teachers? Indeed, the abstract suggests that, “the intensive training provided to new RR teachers was viewed as critical to successful implementation.”

It would be very easy to test the effect of the actual strategies. A good model is a study carried out by Kroesbergen, Van Luit and Maas on mathematics interventions with struggling maths students. They created three randomised groups. The first were given a ‘constructivist’ maths intervention, the second were given an ‘explicit’ maths intervention and a third control group were given no intervention (at least, during the study). Both interventions were beneficial when compared with the control. This is to be expected – any reasonable intervention is likely to be more effective than no intervention at all. However, the explicit intervention was found to be superior to the constructivist one and so we may assign some of the effect to the different strategies used in the two interventions.

Following this model, a good test of Reading Recovery might be to compare it with the kind of tuition from an educated volunteer that I described above or maybe to compare it with a different one-to-one intervention program. Of course, all programs would need the same amount of instructional time.
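To make the logic of such a design concrete, here is a rough sketch in Python. The scores are invented purely for illustration and are not data from any trial; the function name `cohens_d` and the three group labels are assumptions of mine. The sketch computes a standardised mean difference (Cohen's d with a pooled standard deviation) for two different comparisons.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical post-test reading scores, invented for illustration only.
scores = {
    "reading_recovery": [52, 55, 58, 61, 64],
    "volunteer_tutor": [50, 53, 56, 59, 62],
    "no_intervention": [44, 47, 50, 53, 56],
}

# Whole package versus nothing: techniques, extra time and one-to-one format
# are all bundled together in this number.
package_effect = cohens_d(scores["reading_recovery"], scores["no_intervention"])

# Same time, same one-to-one format, different techniques: this comparison
# isolates what the specific techniques add.
technique_effect = cohens_d(scores["reading_recovery"], scores["volunteer_tutor"])

print(round(package_effect, 2), round(technique_effect, 2))  # 1.69 0.42
```

With these made-up numbers, a two-arm trial would report only the larger figure and could not say how much of it belongs to the techniques. Only the second comparison, which holds instructional time and format constant, speaks to that question.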

However, this is not what seems to happen in Reading Recovery research. Reading Recovery is proprietary and so the consent of the organisation is required in order to use its copyrighted materials in trials. The only trials that seem to take place are those that compare Reading Recovery with no intervention at all, like in the Delaware/Pennsylvania study (I am happy to be proved wrong on this – if you know of any different types of trials then please link in the comments).

This is problematic. The first rule of scientific research is to control variables. Admittedly, some variables are highly unlikely to affect the result and so we might not worry too much about them. However, in this case, multiple variables are changed at once, each of which could plausibly produce an improvement in reading performance.

Hey Google, what is a fair test?

Imagine a trial of a new medicine. It is unlikely that such a trial would be run against no intervention. At the very least, it would be compared with a placebo because of the well-known placebo effect. A more pertinent example might be if a study was done to test a regime of diet, exercise and a patented vitamin pill against no intervention at all and found that the former led to considerably more weight loss. What would we learn from this?

All that we can conclude from the Delaware/Pennsylvania study is that the entire Reading Recovery package – which is expensive to implement – is more effective than standard classroom teaching alone. We don’t know what causes this effect and whether we could gain the same effect without the same expense. Moreover, I would suggest that the principles of Reading Recovery, seemingly validated by such research, have a tendency to wash back into classroom teaching, potentially at the expense of evidence-based approaches. Researchers at Massey University in New Zealand have even claimed that the ‘failure’ of New Zealand’s literacy strategy has largely been as a result of the widespread adoption of Reading Recovery principles.

It reminds me of the folktale of the weary traveller who makes soup out of a stone. He knocks on the door of an old woman and asks for some hot water. She asks him what it’s for. He explains that he intends to make soup out of a stone and that she can have some. After a while, he tastes the soup, “It’s good,” he says, “but it could do with a little bacon.” The old woman gets some. A short time later, he tastes it again, “Mmmm,” he says, “some turnip would just improve it a little.” And so it continues, with the woman fetching one new ingredient after another. Eventually, the traveller serves the soup.

“Delicious,” says the old woman, “who would have thought that you could make such tasty soup out of a stone?”

By Qù F Meltingcardford (Own work) [CC BY-SA 3.0], via Wikimedia Commons

Update: Since writing this, I have become aware that the control group for the I3 study was more complex than ‘no intervention’. Instead, Reading Recovery was compared with a school’s usual intervention for poor readers. This was a mix of things from no intervention at all to small group interventions and so on. However, we are still not comparing like with like and so the original criticism in this post still stands.


No, Reading Recovery doesn’t work in America

A couple of years ago, I reported on a large randomised controlled trial in the U.S. of Reading Recovery. I pointed out that, as with other studies of Reading Recovery, it was impossible to tell whether the instructional procedures used were responsible for any effect. Instead, any gains may have been due to the one-to-one tuition format of the intervention. After all, one-to-one tuition has been held up as an ideal form of instruction by none other than Benjamin Bloom.

Since my original post, I have also pointed out that, when compared on similar outcome measures, Reading Recovery tends to generate smaller effects than programmes based on systematic synthetic phonics (SSP). I am cautious about comparing effect sizes but such an approach has the greatest validity when comparing children of the same age learning the same content, as in this case. The greater effectiveness of SSP is hardly surprising given the probable status of SSP as the educational intervention with the greatest amount of empirical evidence to support it.

Reading Recovery, on the other hand, seems to have evolved from a whole language approach to one that now incorporates phonics, although not in the same systematic way as SSP. It also seems to influence ‘balanced literacy’ teaching methods and so its impact stretches much further than the realm of intervention. I think that people are drawn to the narrative at the heart of Reading Recovery and start seeing early reading from this perspective.

So it was with interest that I read a new peer-reviewed paper published in the official journal of the Learning Disabilities Association of America. It returns to the trial that I commented upon in 2015. The authors note that the long-term effect of the intervention was ‘not significant’ and that there was evidence that some of the lowest achieving groups of students were systematically excluded from the Reading Recovery condition.

This is very worrying if policymakers, swayed by the original study, have decided to invest money in Reading Recovery as a strategy to tackle reading difficulties.

Perhaps, as this new evidence comes to light, the U.S. will make the move away from Reading Recovery that has already been initiated in Australia following reviews of the evidence here.

The Ministry of Silly Hats

I noticed some discussion recently about Edward De Bono’s silly thinking hats. So I thought it might be worth re-posting this old piece that first appeared on the websofsubstance blog back in September 2013. Apologies for the fact that I’ve since reused the voice projection anecdote on this blog.

When I was a young pup, during my first year of teaching, I had to attend a special training session each week with the other young pups and our professional tutor. One week, this session was led by a drama teacher and the subject was the proper projection of the voice. We were all stood in a line and asked to say the word, “Now,” in turn. Apparently, we were to do this from our stomachs – which oddly seemed to be located in our intestines – and not from our throats. I failed. So, the instructor asked me to jog on the spot and say, “Now.” Begrudgingly, I did this but it seems that this exertion was in vain because I was still utilising my throat in the process.

Not to be discouraged, the instructor had another idea. I should run from one end of the room to another saying, “Now,” repeatedly as I did so.

“No,” I said throatily, “I won’t be doing that,” and I sat down.

My professional tutor was embarrassed and there was something of a flap before we all agreed that it was perfectly fine for me to sit out the rest of the activity.

This highlights the importance of seeing things from multiple perspectives. What the instructor perceived to be playful and constructive, I perceived to be pointless and demeaning.

Imagine, therefore, that someone were to ask me – literally or metaphorically – to don Edward de Bono’s red thinking hat and declare my emotional reaction to a proposal to alter the end-of-term reporting criteria. On a good day, I may confect something trite in order to move the discussion on to the next step. On a bad day, I might just refuse to play.

Further, imagine it is 2006 and the boss of a big bank is conducting a thinking hats session around the proposal to take over a profitable sub-prime mortgage provider. Imagine an underling is given the job of performing some ‘black hat’ thinking in a meeting and divining the potential problems. Which of the following scenarios do you think would be most likely?

1. The underling plays “It’s the end of the world,” by REM on the boardroom sound-system whilst swaying rhythmically and issuing dire prognostications about the death of the bank, a global financial crisis and huge sovereign debts accrued in bailing-out a banking system deemed too big to fail.

2. The underling notes some branding differences between the two banks that will need to be overcome.

One of the largest risks we face is hubris. Just in the last decade, we have had the Iraq war and the banking collapse. Whatever you think about the moral case for the Iraq war, there is no denying that it was badly thought through, largely due to hubris. The banking crisis is a monument to hubris. Could it have been avoided with thinking hats? Probably not. What is worse, such strategies have the potential to provide a veneer of proper analysis where no such analysis exists. They replicate the form of different types of thinking without necessarily replicating their substance. The confusion of form with substance, the idea that by adopting a form you can short-cut the need to engage in the substance, is a significant error of reason.

Simply donning a white hat does not give you the knowledge – known as ‘information’ in the thinking hats schema – that you need to make a good decision. Yet, this is where the majority of the work is to be done in the majority of cases; the collation, evaluation and comprehension of sufficient domain knowledge.

I first came across thinking hats when I picked up de Bono’s book as a Penguin Classic a few years ago. It was a cheap, impulse buy. I assumed that it would contain psychological insights based upon, well, the science of psychology. What I found was a sequence of assertions and a description of a method, plus lots of testimonials. I soon tired of this, declaring the whole thing ‘silly’ and not paying it further attention.

My next encounter was quite recently, in the book ‘How Mumbo-Jumbo Conquered the World’ by Francis Wheen. To my astonishment, I found that the Blair government had actually been a big fan of de Bono and his thinking hats. Wheen explains:

“When Blair entered Downing Street, several executives from Andersen – and McKinseys, the other leading management consultancy – were seconded to Whitehall with a brief to practise ‘blue skies thinking’. Soon afterwards, in perhaps the most remarkable manifestation of New Labour’s guru-worship, they were joined by Dr Edward de Bono, whose task was ‘to develop bright ideas on schools and jobs.’

In the autumn of 1998 more than 200 officials from the Department of Education were treated to a lecture from de Bono on his ‘Six Thinking Hats system’ of decision making… ‘Without wishing to boast,’ he added, ‘this is the first new way of thinking to be developed for 2,400 years since the days of Plato, Socrates and Aristotle.’”


Francis Wheen’s book was a real eye-opener and I recommend it. He goes on to explain that the warning signs around de Bono’s judgement were already there for the Blair government to see:

“In his 1985 book… Edward de Bono offered the lessons that might be learned from a number of people… The millionaires he extolled included US hotelier Harry Helmsley, later convicted of massive tax evasion, and Robert Maxwell, subsequently exposed as one of the most outrageous fraudsters in British history.”

So I knew a little about Edward de Bono and his thinking hats but I hadn’t been aware that this approach had made it into schools until I read Tom Bennett’s excellent book, ‘Teacher Proof’ – another recommended read. It seems that some teachers are using the six thinking hats in class to develop thinking amongst their students.

I sometimes offend people when I criticise forms of pedagogy. Let me be clear; it is perfectly valid to criticise or even mock a teaching approach. This is not a personal attack. However, some people choose to see it as such: I am attacking something that they do and so they see it as an attack on them personally. It’s as if claiming that the England team’s tactics are unsound is a personal attack on the integrity of the manager. It is not. Such claims are fair in a free society. But this convenient line of reasoning is often effective at shutting down legitimate debate in education.

So here are my reservations about the thinking hats:

1. As I have mentioned, adopting the form of a certain type of thinking is not a short-cut to the substance. Many responses are likely to be lazy, platitudinous and uninformed. Pretending to be a wizard doesn’t make you a wizard.

2. The role of knowledge is diminished. The white hat (information) is just one of a total of five active hats who are shepherded by the blue managerial hat. In real decisions, knowledge plays a much more central role and is critical to any success or failure.

3. It relies on a proposition; something open-ended to be discussed. This is not necessarily bad in and of itself, but open-endedness is fetishised in some quarters in education at the expense of the transmission of knowledge. Such strategies fit this agenda.

4. It is silly.

Does this mean that you should never touch the hats? Actually, no, it does not. I don’t care for them but I can see that they could break up a lesson in an interesting way. They could represent a fun way of having a classroom discussion. Even if we discovered the most efficient, optimal form of teaching then you wouldn’t want to do it all of the time; students would become tired because thinking is hard and then the strategy would be suboptimal. There is something to be said for mixing things up a bit. I just don’t think that thinking hats should be taken too seriously.

There’s another reason why I wouldn’t ban the hats. I find Debra Kidd’s defence of thinking hats to be lucid, detailed and convincing, although not convincing enough to change my mind just yet. I believe that if and when Debra uses this approach then she and her students find it to be effective. This may be because of a placebo effect. It may be because Debra integrates a lot of her experience and wisdom into its application – like the man who made soup from a stone. Or, it may well be that I am completely wrong. I’m not sure that there is enough evidence to decide it one way or the other.

What I would be dead against is a whole school ‘thinking hats’ policy where begrudging, rueful teachers are forced to apply thinking hats in a tokenistic way. I’ve been there with Building Learning Power and it’s a bad place.

Can you imagine; all those forlorn faces sitting underneath those brightly coloured hats…

The best way to teach

I am actually slightly more interested in what to teach than how to teach. However, teaching methods are more amenable to experiment than curriculum content and so I find myself discussing them more often.

The reason why the effect of our choice of content is not easy to test highlights an important flaw in many experimental designs. Think about it: what will you test students on at the end of your experiment? If this content was taught in one condition but not in the other then I can tell you the outcome already. So any fair test of content has to involve a transfer of understanding from one context to another. This is hard to achieve and relies on an element of chance.

So, setting the question of content aside, what are the best teaching methods?

Teacher-led is better

In the words of Jeanne Chall:

“The methods with the highest positive effects on learning are those for which the teacher assumes direction, for example, letting students know what is to be learned and explaining how to learn it, concentrating on tasks, correcting errors, and rewarding of activities – characteristics found in traditional, teacher-centered education… Quite consistently, when results were analysed by socioeconomic status, it was the more traditional education that produced the better academic achievement among children from low-income families.”

There is no great mystery here. If you want a child to learn something then it is more effective to teach it to them than to try to create the conditions through which the child will come to understand that something for themselves. Any teacher who is well versed in formative assessment routines will be aware of just how difficult it is to convey the subtleties of an academic subject while avoiding key misconceptions, even with constant, minute-by-minute attention. The idea that students receiving less teacher input will somehow do better is quite far-fetched.

For instance, what would you predict to be more effective: teaching children how to write or just asking them to do lots and lots of writing? The evidence is clear that explicit writing instruction is superior.

So why is there experimental evidence for alternatives to teacher-led instruction?

The reason why sensible people stray from this fairly obvious position is perhaps related to the way much education research is conducted. If you want to show that your pet approach works then there are plenty of ways to go about this. Firstly, you can try manipulating content. For example, imagine an experiment where one group receives teacher-led instruction about the rate of chemical reactions and the other group conducts experiments. You then give students a test that is all about conducting experiments, the group that learnt through experiments does better and so you conclude that this is more effective than teacher-led instruction.

You could also run your well-resourced and heavily hyped intervention against a poor-quality version of the alternative or perhaps against no alternative at all. There are plenty of experiments where doing something is compared to doing nothing. The Education Endowment Foundation (EEF) seem keen to fund such studies and it is a major reason why I have argued for more ABC designs where two competing interventions are compared against each other and a control.

I suppose the EEF studies do offer us something: If you can’t get your intervention to work under such favourable conditions then it really is a dead duck. The EEF trials of Project-Based Learning and Let’s Think Secondary Science would seem to fit this bill.

This leaves us with a landscape where, as Professor John Hattie is famous for saying, “everything works”. Hattie’s solution is to corral similar studies together using the tool of meta-analysis and then only look for interventions that have an ‘effect size’ that is greater than a certain value (0.40 standard deviations). I am no longer convinced about this solution – it seems arbitrary and takes no account of the quality of the studies that have been fed into the meta-analysis sausage machine.
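A small sketch may help to show why a single cut-off applied to a pooled average can mislead. Everything here is hypothetical: the effect sizes are invented, and the `active_control` flag is an assumed way of marking studies whose comparison group received an alternative intervention rather than nothing.

```python
THRESHOLD = 0.40  # the cut-off discussed above

# Hypothetical studies of a single intervention, invented for illustration only.
# "active_control" is an assumed flag: True when the comparison group received
# an alternative intervention, False when it received nothing.
studies = [
    {"effect_size": 0.70, "active_control": False},
    {"effect_size": 0.65, "active_control": False},
    {"effect_size": 0.55, "active_control": False},
    {"effect_size": 0.15, "active_control": True},
    {"effect_size": 0.10, "active_control": True},
]


def mean_effect(rows):
    """Unweighted mean effect size across the given studies."""
    return sum(row["effect_size"] for row in rows) / len(rows)


pooled = mean_effect(studies)
stricter = mean_effect([row for row in studies if row["active_control"]])

passes_pooled = pooled > THRESHOLD      # all studies thrown in together
passes_stricter = stricter > THRESHOLD  # only the better-controlled comparisons
```

In this invented example the pooled average clears the bar while the better-controlled subset falls well short of it. The threshold itself cannot tell the two situations apart.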

Well-designed experiments with good controls do tend to consistently show evidence in favour of explicit, teacher-led instruction and so do natural experiments or correlations (see the links here or Rosenshine’s article). The superiority of teacher-led approaches jumps out of the two most recent rounds of PISA data. Yes, these are only correlations but they are highly suggestive and suffer far less from potential experimenter bias. They also tell us about what happens in real-world classrooms.

All explicit, all the time?

If you are going to argue that alternatives to explicit instruction are more effective then I will disagree. Similarly, if you want to argue that they are more motivating, I will still disagree. One major component in long-term motivation is the feeling of getting better at something – explicit instruction can deliver this feeling because it is effective.

However, this does not mean – in the words of one critic who dubbed me an ‘extremist’ – that I favour, ‘all explicit, all the time’. All models of explicit instruction include the gradual release of responsibility to the student. Once students have a good grounding in a topic then it is possible for them to do more open-ended and investigative work. For students who have reached a certain level of expertise, this will be more effective than redundantly listening to explanations of concepts that they already understand.

There is also an argument for variety. I don’t think explicit instruction is demotivating but I do think that doing the same thing all the time could definitely be demotivating. We might decide to trade efficiency for variety. A research project may result in less learning overall for the time invested but we might decide that we want to give students that experience. I’m fine with that provided that we do it with our eyes open.

Nevertheless, the evidence is clear. The best way to teach academic content is with explicit instruction.

How Reading Recovery probably works

I have written before about trials of Reading Recovery, particularly the recent I3 study from the U.S. Since then, I have become aware of two papers that I think are key to understanding the way that Reading Recovery works.

To say that it ‘works’ is actually quite controversial. Objectively, it does. Placing students in a Reading Recovery intervention seems to improve their reading more than if you don’t do anything. The question remains as to why this is the case. For instance, is it due to the specialist training that Reading Recovery teachers receive?

It is important to note that Reading Recovery is a one-to-one intervention of up to 60 half-hour sessions. This is hugely resource intensive. It also represents Benjamin Bloom’s ideal of a maximal form of teaching. He reviewed various interventions – specifically conventional teaching, mastery learning and tutoring – and found an effect size of d=2.0 for one-to-one tutoring. So the form of Reading Recovery likely contributes some proportion of its effect.

We could possibly gauge this by comparing Reading Recovery directly with another one-to-one reading intervention of the same duration and randomising students between the two treatments. Surprisingly, there seem to be few such direct comparisons. So perhaps we should look at comparing effect sizes from Reading Recovery versus a control with effect sizes from rival one-to-one programs versus a control. This is more fraught because conditions will necessarily vary but it might be indicative.

This is where the second paper comes in. In 2011, Robert Slavin and colleagues reviewed a number of studies on reading interventions. They were quite picky about the studies that they included. When it came to Reading Recovery, they avoided outcome measures that were intrinsic to the method itself in favour of more objective measures:

“First, most Reading Recovery studies use as posttests measures from Clay’s (1985) Diagnostic Observation Survey. Given particular emphasis is a measure called Text Reading Level, in which children are asked to read aloud from leveled readers, while testers (usually other Reading Recovery teachers) record accuracy using a running record. Unfortunately, this and other Diagnostic Observation Survey measures are closely aligned to skills taught in Reading Recovery and are considered inherent to the treatment; empirically, effect sizes on these measures are typically much greater than those on treatment-independent measures.” [my emphasis]

At this point I will remind you of my first principle of educational psychology: students tend to learn the things you teach them and don’t tend to learn the things you don’t teach them.

Slavin et al. also ruled out studies based only upon those students who had successfully completed Reading Recovery. Such studies prove little. I am sure that many teachers would prefer to be judged only on the results of those students who have been successful.

Once they had whittled down the research in this way, Slavin et al. were able to note that:

“The outcomes for Reading Recovery were positive, but less so than might have been expected…  

Across all studies of one-to-one tutoring by teachers, there were 20 qualifying studies (including 5 randomized and 3 randomized quasi-experiments). The overall weighted mean effect size was +0.39. Eight of these, with a weighted mean effect size of +0.23, evaluated Reading Recovery. Twelve studies evaluated a variety of other one-to-one approaches, and found a weighted mean effect size of +0.56… 

Across all categories of programs, almost all successful programs have a strong emphasis on phonics. As noted earlier, one-to-one tutoring programs in which teachers were the tutors had a much more positive weighted mean effect size if they had a strong phonetic emphasis (mean ES = +0.62 in 10 studies). One-to-one tutoring programs with less of an emphasis on phonics, specifically Reading Recovery and TEACH, had a weighted mean effect size of +0.23. Within-study comparisons support the same conclusion. Averaging across five within-study comparisons, the mean difference was +0.18 favoring approaches with a phonics emphasis.”
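A ‘weighted mean effect size’ is not a simple average across studies. The sketch below shows the arithmetic, assuming that the weights are study sample sizes (an assumption on my part; the quoted passage does not say how the weighting was done). The three individual studies are hypothetical and invented to show the mechanics. Only the two subgroup means (+0.23 and +0.56) and the study counts (8 and 12) come from the quote.

```python
def weighted_mean(effect_sizes, weights):
    """Mean of effect sizes, each weighted (for example) by study sample size."""
    total_weight = sum(weights)
    return sum(es * w for es, w in zip(effect_sizes, weights)) / total_weight


# Hypothetical studies as (effect size, number of students). Invented values.
studies = [(0.60, 40), (0.50, 60), (0.20, 400)]
effects = [es for es, _ in studies]
sizes = [n for _, n in studies]

simple_average = sum(effects) / len(effects)   # every study counts equally
size_weighted = weighted_mean(effects, sizes)  # the large study dominates

# Combining the two quoted subgroup means by number of studies alone:
count_weighted = weighted_mean([0.23, 0.56], [8, 12])

print(round(simple_average, 3), round(size_weighted, 3), round(count_weighted, 3))
# 0.433 0.268 0.428
```

Weighting the quoted subgroup means by study count alone gives roughly +0.43 rather than the reported overall figure of +0.39. That gap is what you would expect if the weights were something other than one per study, such as sample size, although this is an inference from the arithmetic and not something stated in the quote.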

I think it is important that policymakers are aware of these findings.

How to spend money in education

I will always be in favour of spending more money on education. I believe that a better educated population will not only make us materially richer, it will make us culturally richer, with the one buttressing the other. Yet I will admit that it can be a frustrating lever for politicians to pull. Not only does it take 13 years to educate a child from the start of primary to the end of high school, a lot of the interventions that we could put our money into don’t actually work. Far from being a neoliberal conspiracy, I believe that the current focus on standardised tests is the fault of an education system that has consistently failed to deliver. In this context, politicians have looked for better ways to hold us to account. They are no longer content to trust and wait. And you can understand their point.

With one of the clear dividing lines in the forthcoming Australian election being over education funding, I thought it would be a good time to give my own idiosyncratic overview of the options. These might have broader relevance but I am specifically commenting upon Australia.

Good money after bad

There are lots of sexy and flawed initiatives that you can throw money at. They probably don’t cost all that much in the larger context of the entire school system, but they do divert energy and erode trust. Anything that is based more in a philosophy than empirical results, or that draws on science that is two or three times removed from the actual proposals (e.g. neuroscience shows parts of the brain lighting up therefore we should teach writing in a particular way) should be looked upon with suspicion.

The most obvious initiatives of this kind are exhortations to more inquiry learning in order to somehow improve uptake of Science, Technology, Engineering and Maths (STEM) subjects. This is clearly a latter day appearance of John Dewey’s ideas about experiential learning. There’s no real evidence that this works and efforts should probably focus more on the quality of STEM teachers (more later). I think it represents a misunderstanding of cause and effect in academic motivation and there is some evidence to support my view.

Yet there are more obscure examples. I was intrigued to read about a Queensland initiative known as “productive pedagogies” that stemmed from the Queensland School Reform Longitudinal Study (QSRLS). The simple-minded politician might think that this approach derived from successful practices uncovered by the QSRLS research but it seems like the research team already had a model in mind – based on Fred Newmann’s work on ‘authentic’ pedagogies – which they then went looking for. When you examine these ‘productive pedagogies’, they have a familiar philosophical stance, replete with ‘higher order thinking’ and ‘connection to the world beyond the classroom.’

What is a politician to do when empirical research elides into philosophy so quickly that it’s hard to notice?

Then we have technology. It is of enormous appeal to politicians because it is tangible. I remember when interactive whiteboards were introduced in the UK. They were essentially projectors that cost 10x as much as they should have done. And then we have the roll-outs of one-to-one iPads or whatever. We’ve tried the throw-a-lot-of-tech-at-it approach since at least when I was at school and there is no compelling evidence that it improves outcomes in any way.

In terms of expense, the biggest failure of education spending is arguably the attempt to reduce class size. John Hattie made much of this in his 2009 book, ‘Visible Learning,’ although there have been those who dispute the findings. I am not sure that reducing class sizes really is pointless if all else stays the same. But there’s the rub. In order to keep everything else the same you need to recruit more teachers of the same quality as, or better than, the ones you already have. This is an unlikely prospect, especially if you try to do it quickly. Most of the time, you’re probably better off with one large class with an effective teacher than two classes, one of which gets the less effective, inexperienced teacher.

Splash the cash

So where should we place our money? For a start, I don’t think you can do much in a school with severe behaviour problems, no matter what model of teaching you use or how many engaging bits of tech you purchase. An ad hoc investment in school counselors might be helpful but what if it was part of the introduction of an evidence-based approach such as the snappily titled, Schoolwide Positive Behavioural Interventions and Support (SWPBIS)? Much of the SWPBIS program is relatively inexpensive to implement and is centred around a consistent approach across a whole school.

Phonics training for primary teachers would also be a good investment. We know that systematic synthetic phonics is effective so government could fund training, costs of covering absent staff and a nice lunch. This would have some pull to it. Couple this with the right training provider and the push of the potential introduction of a phonics check and we could be on to a winner.

Reducing teacher contact hours in favour of more planning time would move us closer to the kind of practices that have seen success in places like Shanghai. You could even make this relatively cost-neutral if you simultaneously increased class sizes, but that’s going to be a tough sell for the politicians. Reduced contact would have the additional impact of making teachers less frazzled which could help with retention and possibly recruitment as the word gets out. But I admit that this is speculative.

I don’t think teachers need massive pay packets. As Dan Pink suggests, you need to pay us enough to take the issue off the table. It’s probably more of a problem in big, expensive cities and in the U.S. where it seems as if teachers are chronically underpaid. At some point, differential pay for teachers of different subjects might need to be tackled. Most people with the capability of being effective STEM teachers could probably get a job in another field that is less stressful, pays more, or both. Paying all teachers more to attract and retain a greater number of these STEM candidates does seem a little inefficient, although there are strong moral grounds for doing so.

There is also a strong case for investing in school buildings, particularly where they have fallen into decay.

What else?

There is a plethora of potential initiatives that we could spend our money on, so are there any broad principles that we can outline to pick the good from the bad?

I don’t think it’s enough to find evidence that an intervention works. As John Hattie has famously pointed out – and this is especially true of often badly controlled education trials – everything works. Even large RCTs can be misleading when they pit an intervention like Reading Recovery against potentially doing nothing at all. Wouldn’t we expect any additional time spent on reading to improve students’ results? What we really need to know is which interventions are the most effective.

Even then I am mindful of the idea that extraordinary claims need extraordinary evidence. If it is not clear how an intervention might work in theory then I think we need to hold it to a pretty high standard of evidence. 

For instance, ‘cognitive acceleration’ in middle school science has been shown by its originators to produce the most amazing results that persist over time and transfer across subjects. However, it is based in part on Piaget’s stage theories and these are not widely recognised as a good model by psychologists. This is why the results of the current UK EEF study will be of such interest. Until then, policy makers would probably do best to stick to interventions with clearly understood mechanisms of action.

Should we spend money on tech?

60% of state schools in NSW are still using Reading Recovery

I asked New South Wales’ Centre for Education Statistics and Evaluation (CESE) for some data on the proportion of government schools that are still using Reading Recovery. I had seen a figure of around 60% in the press but wondered whether this was up-to-date, especially since a number of reports have recently been released. It looks like it is spot-on:

“As at February 2016, approximately 1000 NSW government schools with primary-aged student enrolments implement Reading Recovery. This represents a percentage of approximately 60% of eligible schools (ie Primary and Central Schools). Please note: This figure does not include Catholic & Independent schools that use Reading Recovery”

Why does this matter?

Reading Recovery is an expensive intervention because it involves one-to-one tuition. It evolved out of a ‘whole language’ approach to teaching reading. We now know that whole language is flawed, with various national reports finding in favour of systematic synthetic phonics and an explicit approach to teaching letter-sound relationships. The 2006 Rose report from the UK made a specific point of criticising the idea of multi-cuing, a tactic that is employed in Reading Recovery. Rose suggested that it is potentially harmful because it discourages the proper decoding of words.

Yet the evidence seems to support Reading Recovery. A brief search will return randomised controlled trials (RCTs) that find a positive effect in favour of it and RCTs are supposed to be the gold standard of evidence. The problem with many of these trials is that they don’t test Reading Recovery against an alternative plausible intervention such as Multilit. Instead, Reading Recovery tends to be compared with no intervention at all. Small wonder that children who are given an additional one-to-one reading lesson with an adult make more progress than those who are not, whatever the quality of the intervention. And we might expect even more of a boost if the standard classroom instruction is something like whole language which we know to be less effective. The RCTs do indeed show, to a high standard of proof, that a Reading Recovery intervention is better than doing nothing.

Those who have studied Reading Recovery more closely find it wanting. James Chapman is a professor at Massey University in New Zealand, the country that is the home of Reading Recovery:

“New Zealand research shows that at best, children who make some progress as a result of Reading Recovery tend to lose the gains after a few years. At worst, our longitudinal study at Massey University showed that children who were said to be successful in Reading Recovery were still, on average, one year behind their same age peers 12 months after completing the programme.”

CESE recently completed its own study in New South Wales. The findings are summarised in the CESE newsletter:

“The results showed some evidence that RR has a modest short-term effect on reading skills among the lowest performing students. However, RR does not appear to be an effective intervention for students that begin Year 1 with more proficient literacy skills. In the longer-term, there was no evidence of any positive effects of RR on students’ reading performance in Year 3.”

So it doesn’t work in theory and it doesn’t work particularly well in practice, even in New South Wales. Rational people might conclude that Reading Recovery should be dropped in favour of cheaper, small-group, synthetic phonics instruction of the kind that the various national reports recommend.

But 1000 schools? That’s a pretty big sunk cost. Nobody will want to admit to being wrong on such a large scale and so I predict that Reading Recovery will continue for a little while yet out of sheer stubbornness if nothing else. And given the glowing results from RCTs in the U.S., we might even see a resurgence there.

New evidence suggests Reading Recovery doesn’t work

The New South Wales Centre for Education Statistics and Evaluation (CESE) has released a new study into the use of the ‘Reading Recovery’ intervention programme with Year 1 students.

You may recall the publication earlier this year of a large-scale, randomised controlled trial of Reading Recovery which seemed to provide strong evidence in its favour. I criticised this study at the time and the CESE report echoes some of these criticisms, as well as concerns about the attrition rate.

The CESE study chose to follow matched students who were either in schools that offered Reading Recovery or schools that did not. Researchers also looked at the reading performance of these students in Year 3. The official CESE newsletter summarises the findings as follows:

“The results showed some evidence that RR has a modest short-term effect on reading skills among the lowest performing students. However, RR does not appear to be an effective intervention for students that begin Year 1 with more proficient literacy skills. In the longer-term, there was no evidence of any positive effects of RR on students’ reading performance in Year 3.”

The importance of this finding should not escape those with a view of the wider context. For instance, researchers in New Zealand have argued that a reliance on Reading Recovery is responsible for that country’s ‘failed’ national literacy strategy.

It is also worth noting that the theoretical basis for Reading Recovery has attracted much criticism, particularly for the limited use of phonics and the reliance on the kind of multiple cuing strategies that were criticised in the influential Rose report.

So it doesn’t work in theory and it doesn’t work in practice either.

Ability grouping

Those folks who use flowery verbiage to raise questions of ‘ontology’ and ‘epistemology’ do, at least, have a kernel of a point. Education is a complex and messy process.

You only have to consider the large number of educational trials that are confounded in some way. I have written of gold-standard randomised controlled trials that fail to isolate the relevant factors. And this means that some questions do not have a simple answer.

When I was in primary school, the whole year group took ‘games’ at the same time. During football season we were grouped into two ability groups. The P.E. teachers never made this explicit but it was pretty obvious. I was in the lower ability group. I knew this because both teachers spent their time with the other group. My group just played a game and refereed it ourselves. It was the lads on the other half of the playing field who ended up on the school football team.

I could be wrong but I suspect that a study of the effectiveness of this approach would have shown a positive effect for some of the boys in the upper group which was outweighed by a negative effect for those in my group. After all, nobody actually taught us how to play football. We were experiential learners.

When educators talk of the need for professional autonomy so that they can get on with the job of teaching without outside interference, I tend to think of this model. After all, my teachers obviously believed that it was fine.

Compare the ability grouping of my football lessons with the sort that’s used in Engelmann’s Direct Instruction. In this case, children are given placement tests to determine their starting level and group. A great deal of planning goes into structuring the instruction to meet this level – Engelmann has written a famously dense book about how this is done. I don’t know, but I suspect that different group levels are generally assigned to similarly experienced and committed teachers.

The point is that ability grouping is not a single thing with a single effect. And this is why a meta-analysis of the effects of ability grouping is not likely to tell us a great deal.

The new learning styles

Learning styles are a curious phenomenon for those of us interested in the education debate. Learning styles theories suggest that each student has a preferred style of learning; taxonomies vary, but a popular one distinguishes between visual, auditory and kinaesthetic learners. It implies that kinaesthetic students benefit from learning through physical activity whereas visual learners learn best by seeing pictures and so on. There is evidence that students will express a preference when given a learning styles survey but there is negligible evidence that being taught accordingly leads to greater learning. Dan Willingham logically argues that this is because it is far more important to match a teaching approach to the content.

Many teachers who are active on social media would now agree that learning styles are something of a myth, whatever their stance on other teaching practices. So, the debate about constructivism might remain open, for instance, but it has largely been resolved on this particular front. It does not mean, of course, that there are no schools or consultants who still promote the idea. It is frightening how these practices still survive.

However, I would now like to ask, ‘What’s next?’ I don’t mean chronologically; I am not looking for the new fad to arrive after learning styles. Instead, I am looking for a practice that has similar features to learning styles theories in that it has been conjured into being by theorists, is widely utilised without much debate, has flawed logic and has virtually no empirical evidence to back it up.

A good candidate is the three-cuing system for reading. I have mentioned this before but I would like to examine the practice in a little more detail with learning styles in mind. It is worth noting that the three-cuing system is also known as ‘multi-cuing’ or ‘searchlights’.

The basic idea is that children need to read ‘real’ books as opposed to carefully sequenced readers that systematically introduce new words. Accordingly, they will need a way to decode words in these books that they have never seen before. One approach would be to use phonics. However, good phonics programs are structured and so a child might not yet have been introduced to all of the letter-sound correspondences needed to read all the words in one of these ‘real’ books.

Proponents of three-cuing suggest that phonics has a small, if any, part to play in this process. When a child gets to an unfamiliar word, she should try and work out what that word might be from the context or perhaps from a related picture. Phonics may be used in a limited way to analyse the first sound in the word and use this to narrow the options.

It seems to be based on notions of how expert readers read. For instance, in the sentence, ‘David was asked to wind the handle,’ context is used to decide how to pronounce ‘wind’ which can be pronounced two different ways with different meanings.

Crucially, however, this decision is made after decoding. A skilled reader has already narrowed the word down to just two options using a phonics approach before applying the context. A skilled reader is not going to substitute ‘turn’ for ‘wind’ just because it also makes sense in this context.

Similarly, imagine that a sentence read, ‘the knight placed his sword back into its scabbard’. A skilled reader who has never before met the word ‘scabbard’ will be able to sound it out with ease using knowledge of phonics. The context may then help this reader to learn the new word. In fact, this is one way that reading enables us to gain new knowledge (although I am not suggesting that vocabulary is usually built by single exposures like this). Yet this is the reverse of the process proposed in the three-cuing system.

And so the whole idea of the three-cuing system is quite illogical.

Instead, children should be taught by a systematic synthetic phonics programme that gradually introduces letter-sound correspondences. The requirement for ‘real’ books is essentially an ideological position which should be abandoned where it does not support effective ways of learning reading. Children should be given appropriately sequenced books whilst having classic children’s books read aloud to them. This can be fun and engaging and does not risk the frustration that attends reading failure. In a short space of time, they will learn enough to fully decode ‘real’ books for themselves using phonics.

In his review of three-cuing in the UK (known as searchlights), Jim Rose found little evidence to support this practice and suggested ways in which it might even be harmful in the long term. Children who use these strategies can become reliant upon them, may practise phonics less and may have reading difficulties that go unnoticed. Indeed, this idea has now been removed from the UK framework. Yet it is likely that many teachers are still using it.

The theoretical basis for three-cuing is in Reading Recovery. Reading Recovery also provides the evidence for these strategies and this evidence seems compelling at first glance. Reading Recovery makes use of three-cuing and studies often demonstrate large effects in its favour.

However, with Reading Recovery, it is not clear what is generating the effect. Typically, the intervention is compared with students receiving no intervention at all. So we can’t be sure that it is the three-cuing strategies that generate the effect. It is possible, and I would suggest highly likely, that any effect derives from having up to twelve one-to-one half-hour sessions with an adult who takes an interest in the child’s reading.

So three-cuing is a creation of theorists, is widely used, is illogical and lacks evidence. It’s the new learning styles.