In 2013, the Education Endowment Foundation in England (EEF) ran a randomised controlled trial of Philosophy for Children (P4C), a programme aimed at improving children’s thinking. A typical activity might involve a discussion about whether it is wrong to hit a teddy bear.
Before the trial, the researchers published a protocol which stated the measures they would examine in order to determine whether the trial had been a success. This is good practice for two reasons. Firstly, we know the trial is taking place. A lot of educational trials are not published if they find no effect and this leads to publication bias. However, preregistration means that future researchers will know about these trials and may still take them into account. Secondly, by specifying the measures that they will use in advance, researchers avoid the problem that is commonly – and ironically, in this case – known as ‘p-hacking’. This is the act of slicing and dicing the data once it’s been collected until you find something that supports your conclusions, then focusing on that measure in your final report.
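The multiple-comparisons problem behind p-hacking is easy to demonstrate. Here is a minimal Python sketch (all numbers invented, with a rough |t| > 2 cut-off standing in for p < 0.05): even when there is no real effect anywhere, scanning enough outcome measures will usually turn up one or two 'significant' ones by chance alone.

```python
import random
import statistics

random.seed(1)

def t_stat(a, b):
    # Welch's t statistic for two independent samples
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

n_pupils = 50    # per group, illustrative
n_measures = 40  # outcome measures examined after the fact

# No true effect: both groups are drawn from the same distribution.
false_positives = 0
for _ in range(n_measures):
    control = [random.gauss(0, 1) for _ in range(n_pupils)]
    treated = [random.gauss(0, 1) for _ in range(n_pupils)]
    if abs(t_stat(control, treated)) > 2.0:  # roughly p < 0.05
        false_positives += 1

print(f"{false_positives} of {n_measures} null measures look 'significant'")
```

With a 5% false-positive rate per test, around two of the forty measures will clear the cut-off on average — which is why the measures must be fixed before the data are seen.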
As might be expected, P4C lessons had no impact on the academic measures that had been preregistered by the researchers. How could they? How could discussions of teddy bears improve maths performance? So you might think that was the end of this rather pointless story.
But no. Once the data was in, the evaluators sliced and diced it anyway and found a new measure which they claimed showed an impact. By chance, the control and experimental groups had different mean performances in reading, writing and maths at entry, with the P4C group having the lower averages. This gap narrowed after the trial, although not by much. It was then suggested that this was due to the P4C intervention and a whole lot of media coverage ensued. Others have suggested that it is an example of a well-known statistical artefact known as ‘regression to the mean’, although the lead evaluator says they checked for this.
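Regression to the mean is also easy to simulate. In this illustrative Python sketch (all parameters invented), two randomly formed groups sit noisy tests with no intervention at all; looking only at the runs where one group happens to start behind, the gap narrows at retest purely because the measurement-noise component of the entry gap does not repeat.

```python
import random
import statistics

random.seed(42)

n_per_group = 100
noise_sd = 0.6  # measurement noise, relative to an ability sd of 1

entry_gaps, exit_gaps = [], []
for _ in range(2000):
    # Fixed underlying ability; two noisy test sittings, no intervention.
    ability_a = [random.gauss(0, 1) for _ in range(n_per_group)]
    ability_b = [random.gauss(0, 1) for _ in range(n_per_group)]
    entry_a = [x + random.gauss(0, noise_sd) for x in ability_a]
    entry_b = [x + random.gauss(0, noise_sd) for x in ability_b]
    exit_a = [x + random.gauss(0, noise_sd) for x in ability_a]
    exit_b = [x + random.gauss(0, noise_sd) for x in ability_b]
    gap_in = statistics.mean(entry_a) - statistics.mean(entry_b)
    gap_out = statistics.mean(exit_a) - statistics.mean(exit_b)
    # Keep only the trials where group B happened to start well behind.
    if gap_in > 0.2:
        entry_gaps.append(gap_in)
        exit_gaps.append(gap_out)

print("mean entry gap:", statistics.mean(entry_gaps))
print("mean exit gap: ", statistics.mean(exit_gaps))
```

On average the exit gap is noticeably smaller than the entry gap, with nothing having been done to either group — exactly the pattern a real intervention would be credited with.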
The evaluators did not report the standard test of statistical significance, a p-value, for this measure because they object to such tests. Given that the research involved public money endowed by British taxpayers, I don’t think this was their decision to make. Nevertheless, even a statistically significant result on a measure derived from torturing the data should be viewed with deep scepticism.
So perhaps we have now finally reached the end of the tale? No. The confirmation bias is strong with this one.
For a number of years, Kevan Collins and Jonathan Sharples of the EEF have been promoting the benefits of ‘metacognition and self-regulation’ (e.g. here and here). The EEF have produced a toolkit: a set of reports intended to help schools choose effective interventions, which it groups into ‘strands’. ‘Metacognition and self-regulation’ is one such strand and I wrote a critique of this strand for an issue of Impact magazine, the trade journal of England’s Chartered College of Teaching, that they declined to print (you can read it here and the subsequent discussions here and here).
P4C is one of the studies listed under this strand. Despite the weak 2013 results, the EEF have gone ahead with a scale-up study, costing £1.2 million, which is roughly 1% of all the money endowed to them by UK taxpayers. Presumably, this will also demonstrate no effect of the P4C lessons on academic performance.
Yesterday, the EEF released a guidance report on metacognition and self-regulation which makes interesting reading. The report is sensible enough and draws evidence from studies investigating reading comprehension strategies, the use of spaced practice, teaching children how to plan and edit their writing and teachers modelling their own thinking, a key component of explicit instruction.
My main issue with the report is that I don’t think all of these approaches should be grouped under the same heading because they are too diverse. At the very least, I would argue that evidence for self-directed learning should be placed in a separate category from evidence about the effectiveness of explicit instruction because, although often used in complementary ways, they represent fundamentally different forms of learning. This is not just my view: researchers have identified key conceptual difficulties in reconciling the research evidence in these different areas.
However, I must commend the authors on stressing the importance of relevant knowledge in the use of metacognitive strategies. For instance, it was good to see this caution about writing an essay on a Shakespeare play:
“We cannot adequately deploy metacognitive strategies for monitoring and evaluating our essay-writing if we do not first understand the components of a successful essay and have a knowledge of Shakespeare’s world.”
The authors expand on this point:
“There is little evidence of the benefit of teaching metacognitive approaches in ‘learning to learn’ or ‘thinking skills’ sessions. Pupils find it hard to transfer these generic tips to specific tasks.”
In other words, you can’t teach improved thinking in some kind of general way, as Dan Willingham has been arguing for some time.
What is notable about the report is that it does not mention or reference P4C or the cognitive acceleration programmes that have formed such a central plank of the EEF’s own research into metacognition and self-regulation. Why?
Well, for one thing, P4C looks very much like a discrete thinking skills programme. Cognitive acceleration, as tested by the EEF, is a science intervention, but it is also intended to produce a general improvement in thinking ability that transfers to other subjects. So both would seem to fall foul of the advice in this new report.
What should we make of this? On the one hand, the EEF commissions £1.2 million worth of research into P4C on the basis of a questionable pilot trial result and the notion that it is an example of a metacognition and self-regulation strategy that is worth pursuing. On the other hand, they publish a report into metacognition and self-regulation that does not even mention P4C and, if anything, would seem to caution against such an approach.