This is the homepage of Greg Ashman, a teacher, blogger and PhD candidate living and working in Australia. Everything that I write reflects my own personal opinion and does not necessarily represent the views of my employer or any other organisation.

I have a book out for new teachers (which some experienced teachers have also enjoyed):

The Truth about Teaching: An evidence informed guide for new teachers

Watch my researchED talks here and here

I have written for The Australian about inquiry learning (paywalled):

Inquiry-learning fashion has us running in wheel

This is my take on the “Gonski 2.0” review of Australian education for Quillette:

The Tragedy of Australian Education

Here is a piece I wrote for The Age, a Melbourne newspaper:

Fads aside, the traditional VCE subjects remain the most valuable

Read a couple of articles I have written for The Spectator here:

A teacher tweets

School makes you smarter

Read my articles for the Conversation here:

Ignore the fads

Why students make silly mistakes

My most popular blog post is about Cognitive Load Theory:

Four ways cognitive load theory has changed my teaching

To commission an article, click here

Structured Word Inquiry fails a key test

Following yesterday’s post on Structured Word Inquiry (SWI), Kathy Rastle highlighted in the comments that The Nuffield Foundation have conducted a randomised controlled trial comparing SWI with something called ‘motivated reading’ as interventions for struggling readers.

An intervention for struggling readers is not the same as initial reading instruction, but interventions are often the subject of research because they tend to be more contained and manipulable. In this case, there appears to be no published paper on the study, but the research team have released a set of slides describing the study and its findings.

Critically, this was a developer-led study. In other words, SWI developers were involved at all stages, unlike when a program is evaluated out in the wild.

Interestingly, the SWI researchers refer to it as the ‘MORPH’ project, illustrate the slides with images of a cartoon character called ‘Morph’ and refer to SWI as a form of ‘morphological instruction’. I have been called-out by SWI advocates on Twitter for suggesting that SWI focuses on teaching morphology from the start and I am now beginning to wonder why.

That phrase ‘from the start’ is important. The slides contain examples of what are described as ‘word matrices’ which show how a root word such as ‘sign’ can be altered by various prefixes and suffixes. I have seen Mandy Nayton of DSF demonstrate something very similar, but that was in the context of children in late primary school who had already mastered the earlier stages of a systematic phonics program. Morphology of this kind, coupled with etymology, can be particularly helpful for English spelling, given our use of schwas (unstressed vowel sounds that we tend to pronounce as a short ‘uh’, making it hard to determine which vowels to use to represent them). I cannot imagine having a discussion of word matrices with early or struggling readers, but then I’m not a reading teacher.

The control condition, motivated reading, is interesting. Children selected a book that was read aloud to them. They then reread it in a group while applying reading comprehension strategies. In addition, one lesson per week focused on vocabulary instruction. It does not appear that any phonics was involved.

Both the motivated reading condition and the SWI condition were delivered by the same set of teaching assistants. The SWI developers decided to script these sessions and they also provided four days of training and fortnightly school visits by the developers. The interventions consisted of three 20-minute sessions per week for 24 weeks.

The results showed that the effects of motivated reading and SWI were basically the same, both on broad measures such as reading comprehension and on measures you might expect to favour SWI such as a ‘morphological spelling task’. The researchers conclude that SWI is no more effective an intervention than motivated reading.

The researchers suggest possible reasons for the failure of SWI, including that it was too high level and that the teaching assistants’ lack of knowledge and confidence in SWI may have reduced its effectiveness (even though fidelity scores for the two conditions were not significantly different). This is supported by interviews with the teaching assistants, who felt that SWI was more challenging for children to learn.

I don’t find this particularly surprising.

No doubt, these findings will not convince the committed. They may find fault with the design of the study. But it is worth pointing out that these are ideal conditions, designed by developers with an interest in SWI who maintained a commitment throughout the duration of the study. If it doesn’t work as an intervention here then where will it work? What hope is there for rolling SWI out at scale as an initial form of reading instruction?

I would make two more observations. Firstly, motivated reading sounds a lot like whole language to me. Can we conclude, therefore, that SWI is no better than whole language?

Secondly, motivated reading did not appear to contain any phonics. Yet even those who are sceptical of the evidence for phonics will admit that systematic phonics programs are superior to no phonics at all. If so, we may expect a systematic phonics program to beat motivated reading and, from the results of this study, we may therefore expect it to beat SWI.

If you think these observations are a bit of a stretch then fine. The best way forward would probably be to run SWI against systematic phonics in a randomised controlled trial. In fact, I’m unsure why they didn’t do that and why they instead chose to run it against a phonics-free condition we would all expect to be inferior. Go figure.

The sophisticated world of Structured Word Inquiry

Since writing my post about structured word inquiry (SWI), I have encountered a Twitter community of SWI advocates. This has led to two insights. Firstly, I now think I understand why it is so important to this group that systematic phonics should be shown to be no more effective than whole language as a method of early reading instruction. I also think this community exposes a fundamental flaw in the whole SWI project, if that project is about displacing other forms of initial reading instruction with SWI.

Those in the SWI community often point out that they are linguists, and what I had not been aware of was the level of contempt in which they hold systematic phonics. To them, phonics is a simplistic mapping of letters and sounds that takes no account of etymology, morphology and so on.

For instance, they may highlight that ‘make’ and ‘making’ have the same initial vowel sound but phonics insists that this is represented as a different grapheme in each of these words, despite the obvious common origin. I have some sympathy for this argument. I guess a systematic phonics advocate would also have some sympathy but would argue that this kind of observation should come later in teaching rather than being something to tackle with five-year-olds. I’m not sure.

As an illustration of the antipathy, one SWI proponent, Gina Cooke, went so far as to describe a list of graphemes from a systematic phonics programme as a ‘lie’, which I understand to be an intentionally false statement.

Cooke also tends to describe advocates of systematic phonics as ‘phombies’, a portmanteau of ‘phonics’ and ‘zombies’ which strikes me as somewhat pejorative.

I suppose that if you hold a teaching method in such contempt, it is hard to concede any merit in it, even relative to another method you hold in contempt. Perhaps this drives the need to find whole language and systematic phonics equally (in)effective.

Cooke has also taken issue with my writing. I apparently mischaracterised SWI in my previous post. I said that the method focused on teaching morphology from the start. This is apparently a common mistake made by the ignorant and misinformed. Although SWI does teach morphology from the start, it does so in interrelation with lots of other things.

I didn’t think I had claimed that SWI focused only on morphology. In fact, in my original post, I described at some length how Peter Bowers claims SWI also teaches GPCs ‘from the start’. However, I am happy to be corrected by someone who knows more about SWI than I do.

But here’s the thing. There is an awful lot of nuance and fine distinction going on here, in much the same way that many of the SWI community constantly make nuanced and fine distinctions about the nature of the English language itself. For instance, another advocate has the claim that ‘*tion is not a suffix’ listed in her pinned tweet.

I could not possibly comment on the validity of such a claim. I am not a linguist and so I am happy to defer. I would only point out that, in this case, all the dictionaries appear to be wrong.

And here is the rub. For SWI to become a primary means of initial reading instruction, all primary school teachers in the English-speaking world would need to deliver it. Moreover, as far as I can tell, there is no scripted version to follow, or even a highly structured one. And so they would have to know all the stuff that these expert linguists know in order to make all of the correct decisions in real-time while planning and teaching. For instance, they would need the knowledge and confidence to ignore incorrect claims made in dictionaries (if they are indeed incorrect and I am not missing an even more nuanced point that reconciles the linguists and the dictionaries).

My practical experience of working with teachers to deliver improvements in schools – including with the far more simplistic systematic phonics – suggests to me that this seems like a tall order. Primary school teachers will certainly not leave initial teacher education with anything approaching the required knowledge to teach SWI and would perhaps need something like a masters degree in linguistics. Maybe advocates of SWI need to focus on how they can turn it into an instructional strategy that can work at scale. Otherwise, it looks a lot like a game of one-upmanship.

Is Jeffrey Bowers right that there is no evidence to support phonics teaching?

I think it is uncontroversial to claim that the general consensus among reading researchers is that phonics teaching is a critical component of early reading instruction. I have been reading a paper by Jeffrey Bowers and I think he would agree that this is the consensus while disputing the evidence in support of it.

Bowers’ argument is detailed and, at times, arcane. I am not a reading researcher. Nevertheless, I will need to dig into some of the weeds to try to both understand and evaluate Bowers’ argument. This overlong blogpost will therefore not be for everyone.

Where are we coming from?

I am a researcher working in the field of cognitive load theory. I am also a teacher, a parent and a blogger with a lot of experience of ideological resistance to phonics teaching and some experience of how reading is taught in the wild. All of these incline me towards the systematic teaching of phonics. I am aware that Bowers’ paper will be used by phonics sceptics to bolster their argument and that predisposes me to find fault in it. Bear that in mind.

Bowers is a professor of psychology at the University of Bristol in the UK. He has written about neuroscience and its potential to be applied to education. He also has an interest in a method of reading instruction known as ‘structured word inquiry‘ which is promoted by his brother, Pete Bowers. This is not a secret and Jeffrey Bowers has been quite upfront about this link. I think it is fair to claim that Bowers’ contention is that the evidence that phonics is superior to other forms of reading instruction disappears under close examination and that we should therefore look to see if other methods, such as structured word inquiry, are a better bet. I am happy to be corrected on that.

What is phonics?

To determine whether phonics teaching is effective, it is necessary to understand what we mean by this. Unfortunately, this sends us into one of those rather tedious discussions of definitions that I generally try to avoid.

In the Bowers paper, he identifies three broad approaches: no phonics, unsystematic phonics and systematic phonics. A critical point of the paper is that, in his view, there is no evidence for the superiority of systematic phonics over unsystematic phonics, and that default classroom teaching, as well as approaches such as ‘whole language’, can be characterised as involving unsystematic phonics.

Based upon my experience and a reading of the paper, I think we need to be more explicit than this. To my mind, phonics is a body of knowledge about the relationships between letters or groups of letters (graphemes) and the sounds they represent (phonemes). In the literature these are known as grapheme-phoneme-correspondences or GPCs. For instance, the grapheme ‘ch’ can be used to represent a number of different sounds such as in the words, ‘cheese,’ ‘chivalry’ and ‘chiropractor’.

Phonics teaching is therefore the teaching of these GPCs. Systematic phonics teaching is a planned and sequenced approach to teaching these GPCs that follows some kind of logical development.

Some like to point out that, due to the way that many different languages have influenced English, there is a wide range of GPCs which are often overlapping or redundant. In other words, English has a ‘complex orthography’. However, far from pointing to the futility of teaching GPCs, this leads to two conclusions. Firstly, if a reader can narrow down the sound in a word to one of, say, three possibilities, then he or she can try out all three and see which fits a word known from oral vocabulary. So that’s still helpful. Secondly, a sophisticated approach to phonics may also teach the general rules that govern when and where different graphemes correspond to particular phonemes by, for instance, considering the source language or morphology. Morphology is the study of morphemes, the smallest units of meaning within a word such as ‘ing’ or ‘ed’, and it can be particularly useful when selecting which graphemes to use when spelling words.

Structured word inquiry apparently focuses on teaching morphology from the start of reading instruction.

A key issue in this discussion is the nature of unsystematic phonics instruction. For example, imagine a teacher who asked students first to predict a word from context or maybe guess it from a picture cue. As a last resort, the teacher might ask the student to consider the first letter of the word and what sound this may represent. Is this unsystematic phonics? Teaching GPCs this way would obviously be a very long process because, as a method of last resort, the rate at which students will encounter GPCs will be low. It is also likely that many GPCs will be omitted, either because they are part of sight words, because they don’t tend to occur at the start of words (e.g. ‘ck’) or because the position of a particular grapheme in a word affects the related phoneme. In this instance, can we claim that phonics is being taught unsystematically? I would suggest a better description would be that a partial coverage of phonics is being taught unsystematically.

[Incidentally, the fact that the position of GPCs in a word affects the sounds they represent is a reason why the whole language trope that you could spell ‘fish’ as ‘ghoti’ is false. The ‘gh’ grapheme would never represent the ‘f’ sound in ‘fish’ when placed at the start of a word.]

From the start

For Bowers, part of the issue with the scientific consensus on phonics is its focus on systematically teaching phonics (which I interpret to mean teaching GPCs) ‘from the start’:

“There is a widespread consensus in the research community that early reading instruction in English should emphasize systematic phonics. That is, initial reading instruction should explicitly and systematically teach letter (grapheme) to sound (phoneme) correspondences. This contrasts with the main alternative method called whole language in which children are encouraged to focus on the meanings of words embedded in meaningful text, and where letter-sound correspondences are only taught incidentally when needed (Moats 2000).”

Despite the fact that Bowers can supply ‘countless quotes’ agreeing with this consensus position, he finds no evidence to support it. This is an odd position which I am going to critique with what initially may appear to be an unfair argument based upon structured word inquiry, but that I hope will illustrate a key issue.

Peter Bowers is keen to stress that structured word inquiry also teaches GPCs ‘from the start’ and he has produced a video to demonstrate this. Why is structured word inquiry, favoured by both Bowers brothers, teaching GPCs from the start given Bowers’ claim about the evidence in support of this? Well, as ever, the debate seems to depend on putting an awful lot of weight on the difference between ‘systematic’ and ‘unsystematic’.

Indeed, the Peter Bowers video does hint at an unstructured way of teaching GPCs, but I am dubious that this can be described as ‘from the start’.

The example he gives is the sentence, “Mom says she wants a cat.” Apparently, the child can read all the words apart from ‘cat’. How can this be, if this is teaching reading ‘from the start’? Either the child has already been taught all of the relevant GPCs or they have memorised sight words (my eldest daughter was given a set of ‘golden words’ to work on memorising each night) or they are perhaps using a predictable book where every page is a variant on, “Mom says she wants a…”. If the child has learnt sight words then I would challenge whether GPCs are being taught ‘from the start’. If the child is responding to a predictable book then I would challenge whether they can actually ‘read’ the words ‘Mom says she wants a’. What exactly is in dispute here? That there is no reason to teach GPCs in a logical order? What is the reason not to?

The logic of systematically teaching GPCs – planning which ones to teach and in which order – is that you can start with the ones that give you the biggest bang for your buck and get the child reading meaningfully as quickly as possible. Which leads to the next point.

Reading for meaning

Bowers perpetuates the idea that alternatives to systematic phonics are ‘meaning-based’ as if phonics somehow is not and instead children are learning how to read aloud statements of the first and second laws of thermodynamics without any idea of what they mean.

Many phonics advocates subscribe to the ‘simple view of reading‘. Again, we could debate definitions, but briefly stated, this contends that reading ability depends on two factors, oral comprehension and decoding. Oral comprehension is our ability to comprehend spoken text. Decoding is the ability to turn squiggles on the page into something equivalent to spoken text (I have been slightly ambiguous here because there are debates about exactly how this happens). If you can decode the words on a page, you could read them aloud accurately, if required, although you may not necessarily know what they mean. Overall reading comprehension is therefore the product of these two abilities. If you have zero decoding ability or zero oral comprehension, you are not going anywhere (more on this later).
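The multiplicative logic of the simple view can be made concrete with a toy sketch. To be clear, this is my own hypothetical illustration, not a validated psychometric model: the function name and the 0-to-1 scale are inventions purely to show why a zero in either component zeroes out the product.

```python
def reading_comprehension(decoding: float, oral_comprehension: float) -> float:
    """Toy model of the simple view of reading: overall reading
    comprehension as the product of decoding ability and oral
    comprehension, each scored here on an arbitrary 0.0-1.0 scale."""
    return decoding * oral_comprehension

# A child with moderate decoding and strong oral language gets some way:
print(reading_comprehension(0.5, 0.8))  # 0.4
# But zero decoding (or zero oral comprehension) means zero reading
# comprehension, however strong the other component is:
print(reading_comprehension(0.0, 0.9))  # 0.0
```

The point of the product, as opposed to a sum, is exactly the "not going anywhere" claim above: no amount of oral language compensates for an inability to decode, and vice versa.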

Early phonics teaching does not have to focus on teaching the meaning of words because most of the words used will be deliberately chosen to be within the child’s oral comprehension ability. Instead, children need to gain decoding ability in order to unlock these words. Few early readers will need to have the meaning of, “Pip snapped a stick,” explained to them. Maybe some will have very low oral comprehension or might not have come across a word like ‘snapped’. Maybe some won’t realise that the unfamiliar word ‘Pip’ is a name. But, by and large, instruction in meaning is unnecessary. This does not suggest that meaning is unimportant. It is a critical consideration – zero oral comprehension means zero reading comprehension.

On a personal level, I would add that, in my experience, the ‘aha’ moment that goes with unlocking meaning from text in this way is highly motivating for young children.


Bowers surveys the meta-analyses of reading research relied upon by various government panels and researchers to support their claim that systematic phonics is effective. After reanalysing them, he takes the view that although there is evidence that phonics is better than no phonics, there is no evidence that systematic phonics is superior to unsystematic phonics. He also claims that the effects of phonics interventions wash out over time.

I have some sympathy for Bowers’ complaints about the various meta-analyses. Both the meta-analyses and Bowers’ critique rely on comparing effect sizes for different experimental versus control groups, but I am not convinced that an effect size is as stable a metric as many assume. Yes, at least in this case they all relate to the same thing – reading instruction – and not a whole menagerie of different outcomes as in, for example, the Education Endowment Foundation’s meta-meta-analysis of ‘metacognition and self-regulated learning‘, but they still apply to interventions of different durations, sometimes with whole cohorts and sometimes as interventions with specific cohorts, sometimes as initial instruction and sometimes as later catch-up interventions, sometimes with non-native speakers and so on.
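To see why an effect size is not as stable as it may appear, consider the standard pooled-standard-deviation form of Cohen’s d. The scores below are entirely hypothetical, chosen only to show that the same raw difference in mean scores can yield very different effect sizes depending on how spread out each study’s sample happens to be:

```python
from statistics import mean, stdev

def cohens_d(experimental, control):
    """Cohen's d with a pooled sample standard deviation: the
    difference in group means expressed in standard-deviation units."""
    n1, n2 = len(experimental), len(control)
    s1, s2 = stdev(experimental), stdev(control)
    pooled = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(experimental) - mean(control)) / pooled

# Identical 5-point gain in mean score, but very different effect sizes,
# because the second pair of (made-up) samples is much more spread out.
tight = cohens_d([60, 62, 64, 66, 68], [55, 57, 59, 61, 63])  # large d
wide = cohens_d([50, 57, 64, 71, 78], [45, 52, 59, 66, 73])   # small d
print(round(tight, 2), round(wide, 2))
```

A more heterogeneous sample – a whole cohort rather than a targeted intervention group, say – tends to have a larger spread of scores and therefore a smaller effect size for the same underlying gain, which is one reason comparing effect sizes across very different studies is fraught.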

As the actual studies are embedded in individual papers which are then nested in the Bowers paper, I cannot easily see what the ‘unsystematic phonics’ control conditions are. This is critical to the argument because I doubt they resemble whole language instruction, as Bowers claims. I am sure whole language can be taught with great attention to GPCs and perhaps this is even the intention, but the rhetoric about ‘barking at print’ and so on tends to point in the other direction.

For instance, Brian Cambourne is a notable Australian proponent of whole language instruction and he advocates teaching GPCs only through writing. Is this what we mean by unsystematic phonics? It certainly seems far less focused on GPCs than, say, the structured word inquiry approach. Moreover, Cambourne incorrectly claims that phonics advocates believe ‘reading is decoding’ – phonics advocates do not believe this if they adopt the simple view of reading – and describes this as ‘read-i-cide’. It is clear what message teachers are supposed to take from this.

As I have not accessed all of the original papers behind these meta-analyses, I cannot accurately assess Bowers’ claims about them. He may be right. However, as I read his paper, there does seem to be an element of convenience about what he chooses to exclude as he reanalyses the data. Sometimes a large effect size is excluded. At other times, there is a lengthy description of the relative outcomes for different groups of students such as non-native English speakers and so on.

What we have to bear in mind is that this is all post-hoc. Bowers already knows all of the findings and is reanalysing them with a specific hypothesis in mind. That is something very different to starting out from scratch and conducting a study or a new meta-analysis. This is why we have seen a wider trend towards preregistered trials with pre-defined outcome measures.

For those of us who cannot analyse the data ourselves, we are left with weighing the conclusions of all of the teams of researchers who conducted these original analyses prospectively against the post-hoc reanalysis of Bowers. Even if we accept his reinterpretation as correct, we are accepting that phonics is superior to no phonics and are left with this claim that unsystematic phonics – whatever that is – is as good as systematic phonics. It is as if research has discovered that kids with desks in their rooms do better on exams than those without, but there is no statistically significant difference between those who have messy or tidy desks.

Bowers also claims that the effects of phonics wash out over time. This is hardly surprising and is a result of basic logic. Presumably, post-intervention, most kids will go back to getting the default reading instruction diet. Clearly, the effect of X versus Y is always going to be greater than the effect of XYYYYY versus YYYYYY or XZZZZZ versus YZZZZZ. Think of The Princess and The Pea.

Public Policy in England

Finally, Bowers examines public policy in England. Phonics was mandated in 2007 following the 2006 Rose report and a phonics check was introduced in 2012 to ensure all schools were following this mandate. Surely, Bowers suggests, we should see a signature of that in subsequent standardised reading tests?

No, I am not convinced that we should. Standardised reading tests assess reading comprehension. This is the product of oral comprehension and decoding. Phonics only acts on decoding. If we do not also improve students’ oral comprehension – their knowledge ‘of words and the world‘, as E. D. Hirsch puts it – we should not necessarily see an improvement in reading comprehension (which is why I advocate for a knowledge-rich curriculum).

Bowers makes a number of similar arguments and so I shall focus on just two because I think they illustrate the no-win position that he places phonics in.

The Key Stage Two standardised assessments changed in 2016 in order to make them more rigorous. The first cohort of students who completed the phonics check in 2012 would have sat these assessments in 2017. Bowers points to similar scores in the 2016, 2017 and 2018 assessments and suggests this is evidence that phonics had no effect.

However, phonics has been mandated in England since 2007. Yes, 2012 added another accountability layer but it was well heralded and so a gradual move towards more phonics teaching seems more likely than a sudden step-change in 2012. Anecdotally, I understand that some schools reacted to the check by asking students to memorise nonsense words and that some schools are still, in 2020, using the three-cuing system criticised by Sir Jim Rose in 2006. So I find it hard to assume we went from zero to systematic phonics in 2012. I am also unsure as to how the UK government standardise the new Key Stage Two assessment and whether it would accurately pick up improvements in reading.

A related question then arises about the effect of mandation in 2007. Following Bowers’ previous logic, this should have affected performance on the old version of the Key Stage Two assessments. The first cohort sitting these assessments would have been in 2012 and the scores do indeed increase. However, this reversal of the previous finding is also dismissed by Bowers because maths and science scores rose at the same time and so this could be grade inflation.

This is, of course, all plausible, particularly to someone who is not expecting to see massive effects on standardised assessments, but it does again seem like we are starting with the conclusion and working back from there. Flat results provide evidence for the hypothesis but rising results also provide evidence for the hypothesis.

Where to from here?

I would like the Education Endowment Foundation (EEF) to set up a three-armed trial of a high quality systematic phonics programme, such as Sounds-Write, versus the best available unsystematic (or even ‘analytic’) phonics programme and a business-as-usual control. Structured word inquiry may be able to fulfil the role of the unsystematic phonics programme, although, given the discussion above, I am unsure if it is ready for teaching beginning reading. Such a trial may then allow us to pick some of this apart.

As for Bowers’ argument, it is hard to judge. What he desperately needs is something prospective, i.e. a prediction made before a study is conducted that is then supported by the subsequent evidence. Given the interest in structured word inquiry, there is obviously an advantage for Bowers if something like the EEF study I propose above could be run that involved structured word inquiry.

Those with the means may want to dig back through the papers that sit behind the meta-analyses that Bowers critiques. Setting aside the various technical criticisms Bowers makes, the crucial issue would be to determine the exact nature of these unsystematic phonics programmes that Bowers suggests are as effective as systematic phonics under his analysis. Even if Bowers’ analysis does not hold and the more conventional view prevails, it would still be a useful exercise.

In the meantime, I see no reason to ditch systematic phonics. Even if Bowers turns out to be correct in all of his critique, phonics clearly works and I don’t think anyone in this debate is arguing that making it systematic is a cause of harm or that there are advantages to being unsystematic. If nothing else, a systematic approach makes sense from a planning perspective. It also aligns well with other sources of evidence from cognitive science.

Something decidedly odd

The horse drawn carriage halted outside the imposing building. Sir James Lancefield hurried himself out. Barely pausing at the reception of The Learned Society to remove his coat and hat, he then took the marble staircase two stairs at a time. As he entered the meeting room with his bundle of papers, the gentlemen were already assembled.

“This evening, gentlemen, I am going to examine germ theory,” Sir James announced.

A number of the assembled members smiled wryly and rolled their eyes. Others appeared more expectant.

“Germ theory proposes that infections and, er, lots of other things,” Sir James smiled and drew his hands wide, “are caused by tiny little ‘germs’ that we cannot see.”

“It is an appealing idea that is quite the fashion among certain physicians but tonight I want to expose it to proper, scientific scrutiny.”

“Hear, hear!” came an affirmation from the back of the room.

“Firstly, what are these germs? Mr Pasteur, the Frenchman, is not clear. The claim is that disease is caused by these diabolical agents but we cannot even count them! A germ theory without germs appears somewhat misnamed!”

The assembled gentlemen laughed.

Mr Callow was not laughing. A slim and youthful physician, he possessed an eager smile. He raised his hand. “I understand that the provision of microscopes and their employment is promising in this regard, sir.”

Sir James paused and surveyed the room. He smiled, “Well, we are always encouraged by promise, Mr Callow.”

The gentlemen laughed.

Sir James continued, “Another issue for the field is the small, circular, insular group of gentlemen who correspond on the matter of germ theory. The theory is not widely discussed in England and so it is the creature of but a few voices, lacking the benefit of the experience of the majority. Indeed, it lacks the insights that could be provided by the gentlemen in this room.”

The gentlemen murmured their agreement that this was a serious and substantive flaw.

“Germ theory claims to be a unique and distinct theory. That is why we are supposed to be drawn to it. And yet similar ideas have been around since at least the Middle Ages, to little heed. And is it so unique really? Other theories of malady stress the need for cleanliness and sanitation. This is no undisputed claim staked by the germ theorists.”

Mr Callow again raised his hand. “Are you talking about miasma?” he asked, eagerly, “I know you have previously corresponded to the society on this matter.”

“Mr Callow,” admonished Sir James, “my correspondences are irrelevant to the matter at hand. I am motivated purely by a disinterested regard for furthering scholarship. I see my role as a public duty – often performed at my own expense I may add – to pursue the greater good.”

Mr Henderson, the stout, middle-aged secretary of the society then intervened. “I think this is a critical juncture. I, for one, am not interested in simplistic discourse about whether germ theory is right or wrong. As a theory it strikes me as rather inelegant. And all the late speculation, inspired by Mr Darwin, about how germs may evolve seems useless to those who work with real patients. I am far more interested in the effects. What are the effects? Are they of any use?”

“Well,” replied Sir James, “the theory does make a large number of predictions about the transmission of disease, ways to prevent this and so on. Many of these are not unique to germ theory and some, for example vaccination, appear to be contradictory to others. The main issue that confronts one who studies germ theory is that you need to create something of a zoology to categorise all of the ventured effects!”

The gentlemen laughed. Mr Henderson continued, “And this is the point. Advocates of germ theory really need to decide upon its scope. Is it just a theory about surgery or does it encompass diseases such as cholera? What about other afflictions? The germ theorists have been most inconsistent on this matter.”

“Inconsistency serves its purpose, Mr Henderson,” Sir James explained conspiratorially as he mentally rehearsed his final flourish. “Let me give two examples. A surgeon of noble bearing does not wash his hands, completes an amputation and the patient remains clear of gangrene. Germ theory can explain this: there must not have been sufficient germs on the surgeon’s hands. Now consider that the patient’s limb became gangrenous. Germ theory can explain this too: there were sufficient germs on the surgeon’s hands in this case. In germ theory, all results may be explained! Why? Because ‘sufficient’ remains helpfully undefined.” Sir James turned to Mr Callow, “Sir, precisely how many germs would you say are needed to cause gangrene?”

“I could not say, sir,” replied Mr Callow.

“So from the point of view of a surgeon, what does this germ theory really have to offer?”

The question hung in the air.

After a short while, Mr Henderson rose. “Well, I think you will all agree with me that Sir James has gone above and beyond his duty in preparing for this night’s presentation. I cannot think of anyone more informed on this fashionable germ theory nor anyone more appropriate to explain it to the learned gentlemen present.”

The gentlemen applauded.

As Mr Callow clapped, he could not help feeling that something decidedly odd had just happened.

Ridiculing Cognitive Load Theory

According to sources, Casper Hulshof gave a keynote speech at researchED Netherlands over the weekend in which he held cognitive load theory up to ridicule.

Ridicule is a powerful tool and I am not against its deployment. There are too many people on Twitter who dismiss jokes as ‘sneering’ and I really don’t want to join their humourless company. Moreover, humour is a powerful way to deflate the pompous and those with an undeserved sense of their own importance.

However, I would suggest that humour is most powerful when it uncovers an unspoken truth, revealed to all. Like any argument, it rests on a substantive point which cannot easily be dismissed.

I have no idea what Hulshof said about cognitive load theory. However, he is apparently writing a critical paper with Christian Bokhove in which, for some reason, they map the development of the theory over time as if that somehow matters:

Cognitive load theory is not some kind of revealed truth, it is a scientific work-in-progress. It is clearly not finished and the final state and extent of the theory has not yet been determined. I discuss some of the common criticisms of it in this thread:

Of all these, I think the issue with germane load is the most interesting. Clearly, some part of working memory has to be involved in learning – it is ‘germane’ – but the theory has not yet worked out a way of dealing with this that is falsifiable and therefore scientific.

Some people become quite animated about the way that cognitive load theory draws upon David Geary’s theory of evolutionary educational psychology. Again, this centres on falsifiability. However, to the extent that this is foundational to cognitive load theory, it generates predictions within the theory that can be tested.

Crucially, cognitive load theory as it stands is falsifiable, as I discuss in this thread:

Perhaps all of this is really funny. Perhaps those who research the theory, like me, are very silly people who are deluding themselves. I don’t know.

Presumably, when the Hulshof / Bokhove paper comes out they will be able to point to empirical evidence that casts doubt on the predictions of cognitive load theory. I assume that must be the basis of the ridicule. Anything less would be something of a disappointment.

Update: Erik Meester, not one of the sources mentioned above, attended the talk and has written about it in a Twitter thread:

Another big fail for inquiry learning


The UK’s Education Endowment Foundation (EEF) has released a report about a randomised controlled trial they conducted to test the efficacy of CREST, an inquiry-based learning programme. Specifically, they offered CREST to science students and measured the impact on science performance. There wasn’t any. There was also no impact on self-efficacy in science (a key component of motivation) or on the proportion of students aspiring to a scientific career, although small positive impacts were estimated for confidence and attitude to school.

These results will come as no great shock to readers of this blog who will be aware that there is no history of inquiry-based learning proving effective in randomised controlled trials and that there are reasons to think inquiry-based learning is at odds with the science of learning (see e.g. here). This finding also aligns with correlational evidence from PISA that the more students engage in inquiry-based science, the worse their PISA science score.

However, this study has a few interesting details it is worth mentioning.

Firstly, it was an ‘intention to treat’ study that focused only on students in Year 9 (Grade 8 in the U.S. and Aus) who were prepared to opt in to the CREST initiative. This meant that in the control condition, students were still asked whether they wanted to participate in CREST but, after the randomisation process, were then told it was not available and were given a high street voucher to spend instead. The progress of these students was then tracked against the students who completed CREST, specifically on a standardised science assessment known as ‘GL’s Progress Test in Science’.

These are highly favourable conditions because the participants have self-selected into the CREST initiative and therefore presumably see some value in it, making the trial prone to expectation effects (like the placebo effect).

The trial was also run by the programme developers, as are all EEF trials at the first stage of development. Again, this leads to the most favourable possible conditions.

The actual content of CREST saw students working on science projects – there was a minimum commitment of 30 hours. So the trial was also a test of a form of project-based learning. Schools could choose how to deliver it. Some used time in science lessons and others ran it as an after school club.

There was a high attrition rate at both school and participant level and this reduces the security of the findings, particularly because it disproportionately hit the intervention arm. In addition, the proportion of students in the intervention who finally submitted projects was low and the trial report suggests this may have been due to a backwash of pressure from the GCSE exams that students in England sit in Year 11.

There is a hint that the mode of delivery mattered. Students who did CREST as an after school club may have made slight gains relative to the control and students who did it in class may have performed slightly worse than the control, presumably because this displaced the actual learning of science in favour of play-acting at being a professional scientist. However, the trial was not designed to measure such differences and so these results must be considered tentative at best.

Experience suggests that proponents of inquiry-based learning are unlikely to revise their view based on the evidence from this trial. They may point out that this trial does not prove that inquiry-based learning never works in any context. That’s true – no trial could – but where is the evidence it does work in these other contexts? They may point to aspects of the CREST programme and its implementation they don’t like. Fine. They are also likely to point to the high attrition rate.

However, I am starting to view attrition as something of an outcome measure in these trials. A while back, the EEF ran a different randomised controlled trial on project-based learning. Similarly, they found no effect and high attrition. If schools and students are dropping out of inquiry-based learning or project-based learning when the conditions are as favourable as they could possibly be, then this is not a good sign for implementing these approaches effectively in schools that are not part of a project and that do not have access to programme developers. There will be a reason for the attrition and this is likely to be related to the efficacy of the intervention.

I would also note that CREST is almost the archetypal STEM initiative of the kind we have come to hear so much about in recent years and that are somehow meant to deliver a new generation of scientists and engineers. Back to the drawing board on that. Here’s my plan: Let’s instead focus on building a rigorous science curriculum and then explicitly teaching it. That may just work.

Finally, to the EEF – why not start testing things with a clear mechanism of action that is consistent with cognitive science? Such programmes may be a better bet. Just a thought. In the meantime, it is still quite useful to accumulate lots of evidence for interventions that fail.

Differentiation still does not work

There are two sides to the debate around differentiated instruction. The first holds that differentiation, at least as it is commonly understood and practised, is dead because the evidence in support of this long-promoted approach appears chronically lacking. The second view holds that differentiation is not dead and merely pining for the fjords. It is this second view that still seems to hold sway in teacher education institutions, for writers of teaching standards, and, no doubt, in many schools.

An interesting new study from Flanders in Belgium provides yet another largely null result for differentiation. It comprises two randomised controlled trials involving a total of 2407 students in 200 classes in 65 schools.

Interestingly, the students were in Grades 8 and 9 and were learning about financial literacy. This extends an evidence base that is most frequently drawn from early literacy and numeracy teaching. The method of instruction is also worth noting. Students were involved in a ‘serious game’ where learning materials were supplied to them via computers and where they had to complete their work in workbooks with minimal intervention from the teacher.

The researchers seem to be under the impression that this teaching method has been experimentally verified to be more effective than a traditional approach. Regardless, the relevance for Australia and our post-Gonski 2.0 turn towards technology-facilitated personalised learning is striking.

There were four conditions involved in the trials. In the control condition, schools did not receive the materials at all. I’m not clear as to the extent to which they then taught any financial literacy. In the second condition, schools were given the materials to work through and students were randomly paired up. The third condition was the same as the second condition except that students were matched with other students of similar maths ability (which apparently correlates closely with financial literacy in PISA data). Finally, in the fourth condition, students were paired as in the third condition but the different ability pairs were given different levels of instruction. In this case, the higher ability pairs were given the original materials, with the medium and lower ability pairs given materials with progressively more additional guidance. Comparing conditions three and four therefore measures the effect of differentiation (and comparing conditions two and three also does, if you consider ability matching a form of differentiation).

Overall, there were no significant effects for ability matching pairs or for differentiated materials. There appeared to be some advantage of the differentiated materials for non-native Dutch speakers. There also appeared to be some positive effects on long-term retention. However, this was measured by a second, delayed post-test in the form of a homework task. The authors suggest this finding should be treated with caution due to the nature of the task and the ‘selectivity and size of the sample’. I must be missing something because I cannot find the data this is referring to.

So there we have it – another largely null result. Of course, the true believers will not be convinced. They will find some reason to argue that this is not true differentiation or that it has not been enacted properly. They may have a point. However, how many times do we have to fail to find Bigfoot before we are convinced that Bigfoot does not exist?

The ball is firmly in the court of those promoting differentiation. It is they who need to demonstrate that the right kind of differentiation can be effective. Until then, why should any of us assume this?