Back in January, Jeffrey Bowers had an article published in what we all can agree is a prestigious journal, Educational Psychology Review (after writing that, I suppose I should mention that a paper I co-wrote based upon my PhD research was published in the same journal):
Bowers’ article claims that the evidence for the effectiveness of systematic phonics in early reading instruction is not as strong as phonics advocates propose. He concludes that, “The ‘reading wars’ that pitted systematic phonics against whole language is best characterized as a draw.” And that’s a strong statement.
I have written about this paper before. Essentially, Bowers spends most of it arguing against the conclusions of various systematic reviews before focusing on what he sees as a lack of evidence from England, a country that has adopted early systematic phonics as policy. The latter argument is neither here nor there, but the former is potentially more interesting. When targeting these systematic reviews, Bowers is able to find fault with all of them, ranging from criticisms of reported effect sizes from specific studies that are too large or too small, through to a criticism of the Ehri et al. review, based upon the 2000 US National Reading Panel report, which he believes to have tested the wrong things. The Ehri et al. study compares systematic phonics programmes to programmes that don’t emphasise systematic phonics, but Bowers thinks it should have compared them to ‘nonsystematic’ phonics programmes, a presumed subset of the actual comparison group. This is all quite esoteric and we will return to why this argument matters later.
Displaying great patience, Dr. Jennifer Buckingham has highlighted various issues with Bowers’ analysis in a paper published in The Educational and Developmental Psychologist, an earlier version of which can be read here. Now, we can add to this a further critique published in the same prestigious journal as the original Bowers paper.
This new paper by Fletcher, Savage and Vaughn has its own quirks. The authors are keen to suggest that it is the explicitness of systematic phonics teaching rather than its systematic nature that may account for the positive effect. In other words, an experienced teacher who understands the field does not necessarily need a meticulously planned curriculum as long as they adhere to the underlying principles. This is an interesting point, but I don’t see any great evidence presented for it and my own experience in schools suggests a meticulously planned curriculum is quite helpful.
When it comes to Bowers’ main claims, Fletcher et al. are about as forthright as it is possible to be in the measured tone of an academic paper. Like Buckingham, they follow Bowers’ idiosyncratic road trip through the literature, pointing out where they believe Bowers has overstated his case. Curiously, there is a table at the end of the paper summarising points of disagreement and potential points of agreement. I cannot help wondering whether this was at the suggestion of a reviewer because the authors take direct aim at Bowers’ central claim of a ‘draw’ between systematic phonics and whole language:
“…we think this conclusion is tantamount to acceptance of the null hypothesis and is not helpful to educators or their students. Not only is this statement not supported by the evidence from which Bowers claims to derive his judgments, it unnecessarily arouses controversy in a field that needs to focus on the best practices available… Evidence is consistently positive and replicable for effects of explicit phonics.”
Education research is messy and complex. Tying down the various factors is a little like tying down helium balloons in a strong wind. And we can all argue about methods and approaches, as I will do. However, the fact that so many different groups of researchers have investigated this question seriously and systematically, and have found positive evidence for systematic phonics according to their own predetermined metrics, means that the idea of a draw between phonics and whole language, if not wholly inconceivable, is a profoundly eccentric position to take.
Which edges me slowly towards my final point.
I do not care for all the discussion of effect sizes that takes place within these reviews, criticisms of reviews and criticisms of criticisms of reviews. Although I accept that effect size has some validity, once you start mushing together effect sizes from studies with very different designs in order to produce an overall effect size, I start to feel uneasy. At least these are all studies of early literacy, unlike some of the strange attempts at meta-meta-analysis we have seen. Nevertheless, we know study methodology can change effect sizes and so I would prefer a systematic narrative review, encompassing all studies that meet certain selection criteria but without the need to produce an overall metric. If I had the time and the relevant expertise, I could conduct a systematic review along these lines.
When Torgerson et al. examined the existing literature, they spotted a different, although related, problem to mine. They noted that many of the studies included in analyses like Ehri et al.’s were not randomised controlled trials. And so, given their view that only randomised trials should be used*, they did the right thing – they conducted their own systematic review based on randomised controlled trials alone.
When Bowers decided that he did not like the comparison group in Ehri et al., he should have done the same thing. He should have decided upon selection criteria and then conducted a systematic review of his own. That would have been far more powerful than attempting to critique the reviews of others and the reason is to do with researcher degrees of freedom.
The ideal experiment in the social sciences is preregistered. This means that the researcher sets out in advance what they will do, what measures they will take, and what constitutes a positive result. This is good practice due to the messy, statistical nature of social science research. Basically, at the conventional 0.05 significance threshold, I have a one-in-20 chance of generating what looks like a significant result even though it is not. Therefore, if I use 20 different outcome measures, report the one that comes out significant and do not mention the others, I can manufacture a pseudo-significant result. Preregistration, where I nominate in advance what I will use as my outcome measure, removes these degrees of freedom.
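The one-in-20 arithmetic can be made concrete with a small simulation (a hypothetical sketch of my own; the numbers and names are mine, not from any of the papers discussed). If a study with no real effect reports 20 independent outcome measures, the chance that at least one crosses the 0.05 threshold by luck alone turns out to be around 64 per cent:

```python
import random

random.seed(42)

ALPHA = 0.05        # conventional significance threshold ("one in 20")
N_MEASURES = 20     # hypothetical number of outcome measures
N_TRIALS = 100_000  # simulated studies, all with NO true effect

# Under the null hypothesis a p-value is uniformly distributed between
# 0 and 1, so each measure has an ALPHA chance of looking "significant".
def at_least_one_significant() -> bool:
    return any(random.random() < ALPHA for _ in range(N_MEASURES))

hits = sum(at_least_one_significant() for _ in range(N_TRIALS))
print(f"P(at least one spurious 'significant' result) ≈ {hits / N_TRIALS:.2f}")
# Analytic value: 1 - (1 - 0.05)**20 ≈ 0.64
```

In other words, a researcher free to pick the outcome measure after the fact gets 20 chances to find something; preregistering a single measure takes those chances away.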
Systematic reviews are meant to act in the same way as an experiment. At the outset, you nominate what you will use as your selection criteria. This way, if studies meet those criteria but are unhelpful to your overall hypothesis, you still have to include them and account for them. It is fine for someone else to criticise these criteria, but attempts to somehow reanalyse the results or retrospectively cast out studies are flawed.
Imagine, for instance, that Bowers did as I suggest and decided to conduct his own review based upon systematic versus ‘nonsystematic’ phonics. Once he narrowed down his selection criteria, he might find himself excluding some of the studies used by Ehri et al. However, he might also find that he has to include other studies, not included in Ehri et al., that are not helpful to his argument. By instead critiquing Ehri et al., Bowers has the freedom to re-evaluate conclusions post hoc without any of the constraints designed into the discipline of systematic review.
And that is a fundamental and fatal flaw.
*For those of you who care about these things, my own view is that we do not need to limit ourselves to randomised controlled trials. These are relatively rare and so such an approach means tossing out most of the evidence we have. In my view, the main problem arises in trying to treat different types of study in the same way and develop an overall metric. I would prefer a triangulation approach where perhaps the evidence from nonrandomised trials is presented in a separate section to that from randomised trials in the kind of narrative review I would wish to see.