The case of CASE

I am reposting this old websofsubstance piece because of the new EEF results. The original post is from 2013. Please see this post for my current thinking on statistical tests.

Hey, you there! Are you interested in a proven strategy for improving English Language results for sixteen-year-olds? Yes?

What if I told you that the strategy involved teaching thinking in science lessons when the students are eleven or twelve? Is that what you would have expected? It’s certainly not what I expected.

Welcome to the world of CASE – Cognitive Acceleration through Science Education.

I have read quite a lot recently on domain knowledge and the possibility of generic skills. What can we teach children that will transfer across different domains of knowledge? The easiest summary of the research is this piece by Dan Willingham. In short, such skills do exist but are limited in scope. We would generally do better to focus on the systematic building of subject knowledge.

However, you may not be convinced by this paper. You may see Willingham as partisan, sitting on the E. D. Hirsch side of the knowledge-versus-skills debate, given that he is involved with the Core Knowledge Foundation. Yet consider this quote from a paper by David Perkins: “it’s obvious that knowledge counts for a lot. Without considerable experience, the most gifted individual cannot play chess, repair a car, play the violin, or prove theorems. Indeed, recent research on g argues that it wields its influence on performance by way of knowledge: People with high g tend to perform well because they have a rich knowledge base.” If you read the whole paper, you will see that Perkins is no fan of Hirsch. The paper was written in 1989, in the noisy aftermath of the publication of Hirsch’s book, Cultural Literacy, and by a long-standing advocate of the teaching of thinking.

You would therefore expect a science intervention with children aged eleven or twelve to have, at best, a marginal effect on their performance in English at age 16, as the knowledge domains involved are completely different. You would expect this, but you would be wrong (see this paper by Adey and Shayer – abstract only).

In 1985, 12 experimental classes started to receive a Thinking Science lesson in place of a regular science lesson about once every two weeks. One class dropped out and another school couldn’t implement the lessons correctly, so the final cohort consisted of ten classes. The teachers involved received specific training in these lessons and were visited by the researchers at their schools.

The lessons were based upon principles that the researchers derived from the writings of Jean Piaget and Lev Vygotsky. From Piaget came the concept of developmental levels, hence the name and aim of the project: to accelerate children’s development through these levels. From Vygotsky came an emphasis on social construction and group work. Lesson materials were designed to introduce cognitive conflict, considered necessary to move students’ conceptions forward. Teachers were trained to orchestrate the group work and to use what is effectively Socratic questioning. One example lesson uses a notched stick to represent a wheelbarrow; the stick is then investigated experimentally as a lever system. A graph is plotted and students have to reason about the effect of loads other than the ones they tried out. Some of these cannot be determined by extrapolation from the graph because the axes are not large enough, and so students are facilitated in introducing a form of proportional reasoning.
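To make that proportional reasoning concrete, here is the standard moments relation for a lever, with purely illustrative numbers; this is the generic physics such an activity trades on, not the actual Thinking Science lesson materials.

```latex
% Balance of moments for a simple lever (illustrative; not taken from the
% CASE materials). A load F_load at distance d_load from the pivot is
% balanced by an effort F_effort at distance d_effort:
\[
  F_{\text{effort}} \, d_{\text{effort}} = F_{\text{load}} \, d_{\text{load}}
  \quad\Longrightarrow\quad
  F_{\text{effort}} = \frac{d_{\text{load}}}{d_{\text{effort}}} \, F_{\text{load}}
\]
% If, say, d_load / d_effort = 0.3 and the plotted graph only covers loads
% up to 10 N, the proportionality still predicts the effort for a 25 N load:
\[
  F_{\text{effort}} = 0.3 \times 25\,\text{N} = 7.5\,\text{N}
\]
```

The point of the activity, as described, is that the ratio rather than the plotted line carries the prediction beyond the edge of the graph.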

Common activities included drawing up tables and plotting graphs. Key concepts included the science mainstay of controlling variables and the linking of cause and effect. There were other ‘pillars’ to the strategy which I won’t describe but which all had a basis in Piaget and Vygotsky.

The researchers performed a series of pre-tests and post-tests on students in the experimental groups and on controls within the same schools. These tests were based upon determining Piagetian developmental levels. I do not dismiss these measures but, for brevity, I will omit a discussion of them. What is more interesting to the general reader is the fact that the Thinking Science programme had an effect on performance in GCSE exams at age 16, i.e. an effect lying outside the world of Piaget and his levels, and one retained for several years after the intervention.

The effects are striking. The students are categorised by gender and by whether they started the programme at age 11 (11+) or 12 (12+). The results are below. The effect size is measured in units of the control group’s standard deviation (Hattie would argue that anything above 0.40 is an effect worth having) and N is the number of students in each sample.
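For readers unfamiliar with the metric, the description above corresponds to a standardised mean difference with the control group’s standard deviation as the unit; whether Adey and Shayer computed it in exactly this form is my assumption, not something stated in the abstract.

```latex
% Effect size in units of the control group's standard deviation
% (an assumed form, often called Glass's delta; not quoted from the paper):
\[
  d = \frac{\bar{x}_{\text{CASE}} - \bar{x}_{\text{control}}}{s_{\text{control}}}
\]
```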

Science GCSE

Boys 11+: N=35, not statistically significant

Boys 12+: N=56, effect size = 0.96

Girls 11+: N=29, effect size = 0.67

Girls 12+: N=54, not statistically significant

Mathematics GCSE

Boys 11+: N=36, not statistically significant

Boys 12+: N=56, effect size = 0.50

Girls 11+: N=27, effect size = 0.72

Girls 12+: N=57, not statistically significant

English GCSE

Boys 11+: N=36, not statistically significant

Boys 12+: N=56, effect size = 0.32

Girls 11+: N=27, effect size = 0.69

Girls 12+: N=57, effect size = 0.44

So there you have it: a significant effect of the Thinking Science programme on performance in GCSE English several years later.

But, wait a minute; there’s something a bit funny about this. Why do we get huge effect sizes for 12+ boys in science but not 11+ boys? What explains the gender differences in English?

Clearly, the sample sizes are small and so this might be a factor; a rough sketch below shows just how wide the resulting uncertainty is. Adey and Shayer suggest that the differences might be due to developmental levels at different ages and between genders. However, this seems amazingly specific. Does something happen to all boys’ brains in unison as they tick past their twelfth birthday?
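Here is that back-of-the-envelope calculation, for the girls’ 11+ science effect. Two loud assumptions: the reported N appears to cover the experimental group only, so the equal-sized control arm is my guess, and the variance formula is the standard large-sample approximation for a standardised mean difference, not anything taken from the paper.

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% confidence interval for a standardised mean
    difference, using the usual large-sample variance approximation."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Girls 11+ science: d = 0.67 with N = 29 in the experimental group.
# Assume (my guess, not stated in the abstract) an equal-sized control group:
low, high = d_confidence_interval(0.67, 29, 29)
print(f"95% CI: {low:.2f} to {high:.2f}")  # roughly 0.14 to 1.20
```

An interval running from near zero to above one is consistent with anything from a marginal effect to an enormous one, which makes the odd subgroup pattern rather less surprising.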

Peter Preece questions Adey and Shayer’s analysis of the data in ways that I don’t quite follow. Much of the discussion is about a potential bimodality and Shayer retracts this analysis in a later paper.

The small sample sizes encouraged further studies into the 1990s. The evidence from the first experiment convinced the researchers that it was no longer ethical to run controlled trials in which some students in a school would participate in the programme whilst others were denied access. Therefore, participating schools were compared against control schools that were not involved. The results showed that schools involved in the project achieved better Science, Mathematics and English GCSE scores than schools that were not involved but that showed similar levels of development on intake (on the experimenters’ Piaget-inspired tests).

Of course, this is a correlational study and it is always possible that other factors influenced both a school’s involvement in the project and its GCSE scores.

Marion Jones and Richard Gott were both involved in this second wave of investigation and wrote a paper (abstract) on their “alternative perspectives.” This was rebutted by Shayer but it is still worth examining some of their argument.

Firstly, Jones and Gott do not consider the case for CASE to be closed; it requires more evidence. Their observations of the schools participating in the Sunderland study are also revealing. Their vignettes of these schools indicate a mixed approach to implementation with varying levels of commitment. Often, those involved in delivering the programme were described as ‘keen’, and so we cannot be sure whether this contributed to the effect. In addition, despite the ethical reservations of Adey and Shayer, not all students in the schools were involved. In some instances, CASE participation was based upon perceived ability, and there was certainly a view amongst teachers that it was more effective for more able students. Again, Shayer rebuts this from the collated data.

Most significantly, Jones and Gott note that the CASE approach is a package of measures. What, they ask, is causing these effects?

Is it, for instance, the Piagetian and Vygotskyan elements, such as cognitive conflict and social construction? They analyse the actual content of the course, e.g. controlling variables, proportionality and types of relationship. They note that these are significant items of content in any science course – the contemporary version of the science national curriculum was split into four areas of knowledge; the latter three broadly aligned with biology, chemistry and physics, but the first – scientific inquiry – had similar strands to those addressed by CASE. There is potential for confusion here: just because you may not advocate teaching science through scientific inquiry – the Kirschner, Sweller and Clark position – it does not follow that scientific inquiry concepts should not be taught as part of the curriculum.

Jones and Gott therefore suggest research that tries to separate these variables to determine whether it is the content or the teaching style that is the main cause.

There have been other investigations of CASE since these 1990s studies, including a maths-based variant known as CAME. They are generally correlational studies and produce results similar to those of the CASE studies. Results have also been replicated to some extent in other countries, such as the US and Malawi. Interestingly, the Malawi study found similar results with a much older cohort of students, further compounding the question of development and age.

Whatever the case, CASE provides the strongest evidence for far transfer that I am aware of. There is definitely something going on. However, I can’t help suggesting that extraordinary claims need extraordinary evidence.


8 thoughts on “The case of CASE”

  1. I taught for a year at a school using CASE about 15 years ago. I missed the training but got a bit of briefing and made use of the materials. I don’t know very much about the differences between this and the LTSS programme that the EEF evaluation looks at, though.

    The way I did it, we did have some discussion, but on more of a think-pair-share basis than extended collaborative work; I would have described it pretty much as whole-class interactive teaching.

    Directly relevant to the article Dylan Wiliam linked to recently, the lesson plans were detailed and I pretty much picked them up and delivered them. Not only were they at least as good as most of my own lesson plans, but using them left a bit more time to plan my other lessons better too.

    I have distinct memories of lessons on CVS, on the difference between mass and volume, and on density in relation to sinking and floating. They were very good lessons.

    I have a suspicion that CASE, as I experienced it, might have had as much in common with Engelmann’s DI lessons (very prescriptive, carefully sequenced, whole-class interactive teaching) as with teaching more obviously associated with constructivist models.

    Those are my own experiences. As far as the EEF evaluation goes, the original KCL evaluations only showed long-term effects and the EEF have not looked at those. Like you, I’m suspicious of the potential bias of the original evaluation (just as I am of the evaluation of Mindset Works by Dweck and her own colleagues) but it’s a shame the EEF haven’t got a follow-up to look at GCSEs. This ought to be done as the children in the dataset are obviously going to produce data on GCSE grades anyway.

