Robert Slavin, a huge figure in evidence-based education, has written a blog post claiming that John Hattie is wrong. Hattie pursues the approach of meta-meta-analysis. In other words, he brings different meta-analyses together to compute an overall effect size.
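To make the mechanics concrete, here is a minimal sketch of what a meta-meta-analysis involves, assuming the simplest case: an unweighted average of the mean effect sizes reported by each meta-analysis, which is how Hattie's averages are usually described. The numbers below are invented purely for illustration.

```python
# Minimal sketch of a meta-meta-analysis: each meta-analysis reports a
# mean effect size (Cohen's d) for some influence on learning, and the
# meta-meta-analysis averages them. All figures here are hypothetical.

meta_analyses = {
    "meta-analysis A": 0.55,
    "meta-analysis B": 0.30,
    "meta-analysis C": 0.72,
}

# An unweighted mean across meta-analyses, with no adjustment for study
# quality, sample size or what was actually measured.
overall_d = sum(meta_analyses.values()) / len(meta_analyses)
print(f"overall effect size: d = {overall_d:.2f}")  # d = 0.52
```

The crudeness is the point: everything that follows is about what gets lost in that averaging step.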
Slavin points out that many of the studies sitting underneath these meta-analyses are weak, poorly designed and often don’t seem to relate very well to the concept that is supposedly being investigated. It’s worth mentioning that Hattie accepts at least some of this criticism: it is why he sets the bar for a noteworthy effect at d=0.40 rather than zero, on the grounds that weak studies inflate effect sizes across the board. However, Slavin notes that really well-designed studies rarely generate an effect size this large. By setting the bar at d=0.40, Hattie is effectively filtering out the good stuff and drawing conclusions from what is left.
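For a sense of scale, Cohen's d converts directly into raw marks: d = (treatment mean − control mean) / pooled standard deviation. A quick worked example (the standard deviations are assumptions for illustration, not figures from Slavin or Hattie):

```python
# What a threshold of d = 0.40 demands in raw marks, for two tests with
# different spreads of scores. Both standard deviations are invented.
for test, sd in [("standardised test (SD = 15 marks)", 15),
                 ("narrow experimenter-designed test (SD = 5 marks)", 5)]:
    print(f"{test}: d = 0.40 means a {0.40 * sd:.0f}-mark gap between groups")
```

On the broad test the threshold demands a six-mark gap; on the narrow test, just two marks. The same bar is far easier to clear on some measures than others, which leads to Slavin's next point.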
One specific point Slavin makes concerns experimenter-designed tests versus standardised tests. In my view, the former are valid and it is reasonable to make use of them in basic research. However, they will be more sensitive to the concept being researched than standardised tests are, and so will tend to generate larger effect sizes. You cannot just mush together these very different measures.
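A hypothetical simulation illustrates the mechanism (all parameters are invented). The intervention produces exactly the same underlying gain in both cases; only the measure changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # simulated students per arm

# Assumed model: the intervention genuinely improves one targeted skill
# by half a standard deviation.
control_skill = rng.normal(0.0, 1.0, n)
treated_skill = rng.normal(0.5, 1.0, n)

def cohens_d(treated, control):
    """Cohen's d: mean difference over the pooled standard deviation."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# An experimenter-designed test measures the targeted skill directly.
print(f"aligned test:      d = {cohens_d(treated_skill, control_skill):.2f}")

# A standardised test samples the targeted skill alongside much other
# material, so unrelated variance dilutes the same underlying gain.
other_c = control_skill + rng.normal(0.0, 2.0, n)
other_t = treated_skill + rng.normal(0.0, 2.0, n)
print(f"standardised test: d = {cohens_d(other_t, other_c):.2f}")
```

The aligned test reports an effect size of roughly 0.5; the standardised test reports roughly 0.2 for exactly the same teaching. Averaging the two, as a meta-meta-analysis effectively does, produces a number that describes neither.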
I think Slavin makes a powerful case, but we could go further. It is not just the design of a study that can affect the effect size; the age of the students also matters, with younger students typically generating larger effect sizes due, in my view, to their relatively rapid rate of learning in quite a restricted domain of knowledge. So we should always ask about the age of the subjects when an effect size is quoted.
I also think that the concepts these effect sizes are associated with need to be much more tightly defined. There has been a small industry devoted to answering the question: what does Hattie mean by ‘feedback’? The answer is: lots of different, sometimes contradictory things.
It is worth noting that Hattie is not the only one engaged in meta-meta-analysis. It is the process behind the Education Endowment Foundation’s (EEF) Toolkit, which is licensed to Evidence for Learning (E4L) in Australia. Each strand of the Toolkit represents a broad, vague concept and draws on evidence from a range of meta-analyses as well as from the much higher quality randomised controlled trials conducted by the EEF itself. In other words, it synthesises apples and oranges.
The strand I have investigated the most is ‘meta-cognition and self-regulation’. What even is that? It’s hard to tell. Yet the territory, as usual, is full of contradictions. While the EEF released a report warning against generic thinking skills programmes, sitting at the heart of the randomised controlled trial evidence for this strand is ‘Philosophy for Children’ (P4C), a generic thinking skills programme.
You may think that the fact that P4C underwent a randomised controlled trial means that the data supporting its use is sound. Not so. The study was well designed and generated no effect on the measures specified prior to the trial: a null result. However, once they had the data in hand, the researchers reanalysed it and came up with a small effect size.
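One generic way this can happen is worth spelling out with a hypothetical simulation (this is invented data, not the P4C trial): even when there is no effect whatsoever, reanalysing the same sample enough different ways will quite often turn up something that looks like a small effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1000  # simulated pupils per arm

# No true effect at all: both arms are drawn from the same distribution.
control = rng.normal(0.0, 1.0, n)
treated = rng.normal(0.0, 1.0, n)

# The pre-specified analysis: one comparison on the whole sample.
_, p = stats.ttest_ind(treated, control)
print(f"pre-specified comparison: p = {p:.2f}")

# Post-hoc reanalysis: carve the sample into 20 arbitrary subgroups and
# test each one. At the 5% level, roughly one spurious 'effect' is
# expected by chance alone.
hits = 0
for _ in range(20):
    idx = rng.choice(n, size=100, replace=False)
    if stats.ttest_ind(treated[idx], control[idx]).pvalue < 0.05:
        hits += 1
print(f"post-hoc 'significant' subgroups: {hits} of 20")
```

The more ways you slice null data, the more likely you are to find a slice that ‘works’; that is why results on measures and analyses specified in advance carry so much more weight.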
If even true randomised controlled trials can generate misleading effect sizes like this, then what monsters wait under the bed of the meta-meta-analyses conducted by Hattie and the EEF?