I am growing increasingly concerned about the way that some randomised controlled trials (RCTs) are being used to give legitimacy to otherwise weak ideas.
I am in favour of conducting RCTs because they have the greatest potential to tease out whether one thing causes another. But this doesn’t mean that we must dismiss all other kinds of evidence. Sometimes RCTs are difficult to conduct or are unethical. We should not forget that we can potentially draw inferences from correlations provided we are suitably cautious. Basic cognitive science can also inform our theory of how different approaches might work.
Furthermore, some RCTs can be pretty badly designed or analysed. The new RCT factories such as the Education Endowment Foundation in England and I3 in the U.S. tend to find positive effects that don’t always stand up to closer scrutiny.
This became apparent in an article about inquiry-based science that Steven Cooke (@SteveTeachPhys) drew my attention to on Twitter. It was written by Wynne Harlen for the October edition of ‘Science Teacher Education’, a magazine published by the U.K.’s Association for Science Education. This is a “publication for all concerned with the pre-service education, induction and professional development of science teachers.” Bear that in mind as you read what follows.
The article is fascinating on a number of levels, not least for perpetuating the ‘constructivist teaching fallacy’, i.e. the idea that constructivist learning theory implies a specific set of teaching practices. For instance, the way that ‘understanding’ is defined excludes the possibility of a teacher explaining a concept to a child so that the child understands it:
“Current views of learning lead us to conclude that understanding is created by learners themselves through their mental (and physical) activity. It is not something that can be received ready-made from others; it involves generation rather than acquisition of knowledge.”
We are told that the only alternative to constructivism is ‘behaviourism’ which is characterised by rewards, punishments and rote learning. Harlen favours socio-cultural constructivism where students work in groups (which I can’t reconcile with the notion that understanding cannot be received from others). This naturally implies inquiry-based science teaching:
“To identify what this means in practice, consider what pupils will be doing when learning in this way. Their activities will include: working in groups; exploring and manipulating physical materials; building on their prior experiences and ideas; raising questions; communicating their ideas; listening to the ideas of others; reasoning; and arguing from evidence.”
This is where the RCT makes an appearance.
It’s actually pretty easy to design an RCT that will show a positive effect for inquiry learning. Here’s my recipe:
- Randomly assign your subjects to one of two groups
- Give the experimental group a set of activities to complete involving marbles rolling on ramps
- Give the control group standard explicit instruction on Newton’s laws of motion
- Conduct a post-test where students are assessed on their ability to answer questions about experiments with marbles on ramps
You see a lot of this kind of thing in the literature. These studies succeed due to my first principle of educational psychology: students tend to learn the things you teach them and don’t tend to learn the things you don’t teach them.
To their credit, I3 did not do this. They instead used a standardised assessment known as ‘PASS’. PASS has three elements:
- Selected Response or Multiple Choice Items (MC): Items assess students’ understanding of important scientific facts, concepts, principles, laws, and theories.
- Constructed Response Investigations and Open‐Ended Questions (OE): Students analyze a problem, think critically, conduct a secondary analysis, and apply learning. They construct explanations using evidence.
- Hands‐on Performance Tasks (PT): Investigations identifying a problem to solve. Students use equipment to perform investigations; make observations; generate, organize, and analyze data; communicate understandings; and apply learning.
Note that it is mainly the first element – MC – that tests whether students know and understand science.
I3 randomised schools into one of two conditions. The experimental group received a package known as ‘LASER’, which is an inquiry-based science programme consisting of curriculum materials and support. The control group did not get LASER. I would have expected some sort of Hawthorne effect, where schools that knew they were part of an intensive intervention would perform better on the PASS test. But they did not. There was no statistically significant difference between the two groups of schools. So you might think that this was the end of it.
However, I3 decided to slice and dice their data. This is a risky practice. Let’s set aside the debate about p-values for a moment and assume that we are happy to use them as our test of whether something is significant. A significance threshold of p=0.05 means that, if there really is no effect, we could expect one false positive result for every twenty analyses we do. So if we slice and dice our data 20 ways then we should expect to find something that passes the test of statistical significance purely by chance.
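To put a rough number on that risk, here is a minimal sketch in Python. It assumes that the analyses are independent of one another and that there is truly no effect in any of them, which subgroup analyses on the same pupils will not satisfy exactly. The function names are my own.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of false positives across n_tests analyses
    when there is no real effect and each uses threshold alpha."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests
    independent analyses when there is no real effect."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(
            f"{n:>2} analyses: "
            f"expected false positives = {expected_false_positives(0.05, n):.2f}, "
            f"chance of at least one = {familywise_error_rate(0.05, n):.2f}"
        )
```

Under those assumptions, twenty looks at the data give about a 64% chance of at least one spurious ‘significant’ result, even though each individual test only has a 5% chance of misleading us.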
This doesn’t necessarily prohibit the slicing and dicing of data but it does mean that you have to apply a much stricter test that takes account of the number of ways that you have chopped it up.
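As an illustration of what a stricter test can look like, the sketch below implements two standard corrections, Bonferroni and Holm. This is not necessarily the method used in the I3 report, and the p-values are invented purely for the example; they are not taken from the report.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag a result as significant only if its p-value is at or below
    alpha divided by the total number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: test the smallest p-value against
    alpha/m, the next against alpha/(m-1), and so on, stopping at the
    first failure. Never less powerful than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # every larger p-value also fails
    return significant


if __name__ == "__main__":
    # Hypothetical p-values: 20 subgroup analyses, three below 0.05.
    made_up = [0.01, 0.03, 0.04] + [0.5] * 17
    print("Uncorrected:", sum(p <= 0.05 for p in made_up), "significant")
    print("Bonferroni: ", sum(bonferroni(made_up)), "significant")
    print("Holm:       ", sum(holm(made_up)), "significant")
```

With these made-up numbers, three results pass the ordinary test but none survive either correction, because the smallest p-value (0.01) is above the corrected threshold of 0.05/20 = 0.0025.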
When I3 sliced the data they found three outcomes that were statistically significant using an ordinary test of significance. These were English Language Learners who performed better than the control on the OE and PT tasks and students with a disability who outperformed the control on the PT task only. However, once they applied the more stringent tests that take into account the slicing and dicing, the significance of these results disappeared.
Nevertheless, I3 claim this as an important finding. They cite the What Works Clearinghouse guidelines to support an argument that the more stringent kinds of analyses are not required for an ‘exploratory’ study, which seems like a sleight of hand to me. At the very least, when quoting this data people need to make clear the exploratory – and thus provisional – nature of the findings.
With a little searching, I found a separate executive summary document that goes much further than the final report, chopping the data up into separate states and making strong claims about performance in non-science subjects such as maths and reading. This seems to be the document that Harlen quotes from:
“In 2010 the U.S. Department of Education awarded the SSEC a five-year Investing in Innovation (i3) validation grant to evaluate the LASER model’s efficacy in systemically transforming science education. “LASER i3” refers to the resulting longitudinal study of the LASER model, which unequivocally demonstrates that inquiry-based science improves student achievement not only in science but also in reading and math. LASER plays a critical role in bolstering student learning, especially among underserved populations including children who are economically disadvantaged, require special education, or are English language learners.” [my emphasis]
I haven’t analysed the separate state level data but given what we have seen from the overall data, we need to treat it with great caution. I don’t see how the claim that this RCT ‘unequivocally demonstrates’ anything can be justified by looking at the overall data. We certainly should not be using it to support possibly fallacious claims about constructivist teaching practices.
I now think I understand where some of the problems in teacher education originate.