I am going to ask you an important question: Do you believe in psychokinesis? If you are unfamiliar with the concept, psychokinesis is a person’s supposed ability to alter physical systems with his or her mind. You know the sort of thing from the movies – moving a pen across a desk by thinking real hard.
Believe it or not, some people investigate concepts like psychokinesis by doing experiments and these experiments then get reported in serious journals. One approach is to see if people are able to influence the operation of a quantum-based random number generator with their minds. Because the effect that would be needed to skew the output of such a number generator in this way is very small, the alleged phenomenon is known as ‘micro-psychokinesis’.
A paper by Maier et al. from March this year attempted to establish, once and for all, whether micro-psychokinesis exists. As might perhaps be expected, they found no effect at all on the main measure they were analysing. However, by re-examining the data once they had it, they claimed to have found a non-random oscillation effect that might be evidence of micro-psychokinesis.
Now, Hartmut Grote, a physics professor at Cardiff University, has reanalysed the data and found that, if there were no effect of micro-psychokinesis, the probability of obtaining the data that Maier et al. obtained (or data more extreme) would be about 33%. In other words, for every three experiments of this kind that we ran, we could expect one to give us a result like this. I can’t follow the maths; it’s all a bit advanced for me and I don’t really understand the argument about oscillations. However, I am quite prepared, on this basis, to accept that micro-psychokinesis does not exist.
Why? Well, it seems implausible to me and the data from this experiment, if Grote is correct, is entirely consistent with a world without micro-psychokinesis. I don’t have to believe in extraordinary things to account for it.
The probability that Grote calculated is known as a ‘p-value’ and these are highly controversial in some corners of psychological research. Critics claim that people misinterpret them. They suggest that, instead of viewing that 33% figure as the probability of obtaining data this extreme, or more extreme, if there really is no effect, people interpret it as the probability that there is no effect. These are not the same thing at all, and the distinction becomes very important once the prior plausibility of the effect is taken into account.
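The gap between those two probabilities can be made concrete with a toy application of Bayes’ rule. To be clear, the numbers below are entirely made up for illustration (the function name and the 60% figure are my own inventions, not anything from the Maier et al. or Grote papers): the point is that data with a 33% chance of arising under ‘no effect’ can still leave the probability of ‘no effect’ very high, if the effect was implausible to begin with.

```python
def prob_no_effect_given_data(p_data_given_null, p_data_given_effect, prior_effect):
    """Bayes' rule: P(no effect | data).

    p_data_given_null   -- chance of data this extreme if there is no effect
                           (the p-value-like quantity, simplified for illustration)
    p_data_given_effect -- chance of data this extreme if the effect is real
                           (an invented number for this sketch)
    prior_effect        -- how plausible the effect seemed before the experiment
    """
    prior_null = 1 - prior_effect
    numerator = p_data_given_null * prior_null
    denominator = numerator + p_data_given_effect * prior_effect
    return numerator / denominator

# Illustrative numbers only: data with a 33% chance under 'no effect',
# a 60% chance if micro-psychokinesis were real, and a 1-in-1,000 prior.
print(prob_no_effect_given_data(0.33, 0.60, 0.001))  # ≈ 0.998
```

So under these (made-up) assumptions, a 33% p-value sits alongside a 99.8% probability of no effect. The two numbers answer different questions.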
Interestingly, another argument against p-values is that of ‘p-hacking’. This is the process of reanalysing data until you find something, anything, that gives you a suitably small p-value (the conventional threshold is 5% in psychology). This is flawed because, in any 20 such analyses we do, we would be likely to find at least one p-value of less than 5% by chance alone. So reanalysing in this way is frowned upon. Ideally, researchers should specify at the outset what measure they are going to use to decide whether there has been an effect.
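A quick simulation shows why this matters (a hypothetical sketch of my own, not anything from the trials discussed here). Under the null hypothesis, p-values are uniformly distributed between 0 and 1, so running 20 independent tests on pure noise yields at least one ‘significant’ result with probability 1 − 0.95²⁰, roughly 64%.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def chance_of_false_positive(n_tests=20, alpha=0.05, n_sims=100_000):
    """Estimate the chance that at least one of n_tests independent
    null tests comes out 'significant' at level alpha, by drawing
    p-values uniformly (their distribution when there is no effect)."""
    hits = 0
    for _ in range(n_sims):
        if any(random.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

print(chance_of_false_positive())  # close to 1 - 0.95**20, i.e. about 0.64
```

In other words, a researcher who keeps slicing the data until something clears the 5% bar is more likely than not to ‘find’ an effect that isn’t there.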
In the Maier et al. case, however, it is they who reanalysed the data (though they never reported any p-values), and it is Grote’s p-value that casts doubt on their conclusions. Ironic, perhaps. But let’s think about why Grote’s figure is important. I am inclined to think that micro-psychokinesis is unlikely. It is with this in mind that I look at the p-value and draw the conclusion that it is consistent with a world without micro-psychokinesis. If, instead, micro-psychokinesis were a well-established fact, the probability of obtaining such data in a world where it did not exist would be irrelevant.
What does this have to do with education research?
Let’s examine the UK Education Endowment Foundation’s trial of “Philosophy for Children”. It is claimed that a series of lessons on such topics as whether it is okay to hit a teddy bear led to improvements in maths and reading (but not writing, for some reason). Have a think about that for a moment. How plausible does that seem? It strikes me as slightly more plausible than micro-psychokinesis, but not much, so a p-value would be really handy here. If the p-value were low, then we might have to reconsider our views.
However, in a parallel with the Maier et al. trial, the researchers on the Philosophy for Children trial did not compute p-values. And in another parallel, they found no effect at all of Philosophy for Children on the measures specified at the outset of the trial. Instead, the supposed effect was unearthed by a post-hoc analysis after the data was in.
More broadly, p-values seem uniquely suited to analysing the results of EEF trials because the EEF has a habit of testing implausible things. In its “Word and World Reading” trial, for instance, researchers tested the hypothesis that gaining knowledge of Subject A would improve reading comprehension on texts about Subject B. I am not aware of anyone who subscribes to such a hypothesis. Unsurprisingly, the researchers found no effect but, if they had, I would want to see a p-value.
Most recently of all, EEF researchers found that an intervention targeted at improving students’ behaviour did not generally improve their behaviour. This is a reasonable thing to investigate. However, they also found that it did not improve students’ reading ability. Who, exactly, was under the impression that it would? I would suggest that if you want to improve students’ ability to read, then the most viable option would be to teach them to read. No?