Latest dispatches from the reading wars

Sarah Mitchell, the New South Wales education minister, has announced the roll-out of a phonics screening check across NSW public schools in a robust article in the Sydney Morning Herald under the headline, “The reading wars are over – and phonics has won.” Great news. The phonics check is no panacea – and nobody is suggesting that it is – but we have found it very useful at my place, and the findings of the pilot conducted in NSW this year are encouraging.

Mitchell makes the following point, which made me wince as I imagined a few ‘balanced literacy’ advocates opening up their morning papers:

“Vice-chancellors need to take a broom to these faculties and clear out the academics who reject evidence-based best practice. A faculty of medicine would not allow anti-vaxxers to teach medical students. Faculties of education should not allow phonics sceptics to teach primary teaching students.”

A trip to Bunnings is on the agenda then.

Anyway, this put me in such a good mood that I decided to produce another one of my flashcards.

I have disagreed with Sarah Mitchell in the past on the topic of exclusions. However, it is not uncommon to find people who are pro-phonics and anti-school-discipline – they usually work with children in a one-to-one setting. I guess some of these folks must be advising Mitchell.

It was only a few days ago that Diane Ravitch, once a profoundly sensible voice in the education debate, was objecting to the term ‘science of reading’ on her blog. It sounds, to Ravitch, much like talking about the ‘science of cooking’. There I was, thinking there was a science of cooking – an applied form of chemistry involving denatured proteins and the like. But, no, such a thing would be absurd! Good teachers are not scientists and good cooks are not scientists, so there can therefore be no science of either. Got that?

However, while bending my head around this logic, I noticed something else. Ravitch mentions the late Jeanne Chall:

“Her 1967 book, Learning to Read: The Great Debate, should have ended the reading wars, but they continued for the next half century. She understood that both sides were right, and that teachers should have a tool-kit of strategies, including phonetic instruction, that they could deploy when appropriate.” [My emphasis]

Perhaps Chall’s 1967 book should have ended the reading wars and saved Sarah Mitchell the bother of doing it 53 years later, but it did not fit with my reading of Chall that she thought ‘both sides were right’. So, I grabbed my copy of Chall’s excellent 2000 book, The Academic Achievement Challenge. In this book, Chall writes:

“Several syntheses of the research comparing the effectiveness, for learning to read, of a meaning (whole language) versus a code emphasis (phonics)… found, in general, that classic approaches to beginning reading instruction (e.g. direct, systematic instruction in phonics – a code emphasis) were more effective than the various innovative approaches with which they were compared (e.g., a meaning emphasis, non phonics, incidental phonics, phonics only as needed, or a whole-language approach). The classic approaches were found to result in higher achievement in both word recognition and reading comprehension. They were more effective for different kinds of children and particularly for children at risk – those from low-income families, those of different cultural and ethnic backgrounds, bilingual children, and those with learning disabilities.”

That reads to me as if Chall had a firm view of which side was right and it is a timely reminder of why systematic phonics programmes are an equity issue.


A manifesto for an Australian College of Teachers

A recent tweet by Dame Alison Peacock, CEO of England’s Chartered College of Teaching, reminded me of the debacle surrounding that organisation.

I’ve watched it unfold from afar, from a failed crowd-funding campaign to fistfuls of UK government cash, an odd episode with a Russian TV channel and an election that resulted in the promised ‘teacher-led’ college being run largely by non-teachers. The only way the organisation could become worse is if it gained the power to regulate the profession. Teachers in England are almost universal in their criticism, with the main split being between those who still think it’s worth joining the organisation to change it from within and those who think it is beyond redemption. It clearly is beyond redemption, and teachers must boycott this institution to deny it any credibility, making it harder for the UK government to give it any powers.

Nevertheless, it was not always obvious that the College would turn out this way. In the years prior to the crowd-funding campaign, there were some grounds for optimism. Teachers would value a professional body focused on the practice of teaching rather than pay and conditions. And there is clearly a need to give voice to teachers, the most patronised and talked-over of the professions, which is why the fact that non-teachers muscled in and took over the College caused such a deep psychic wound.

So, I began to think about what an Australian College of Teachers – one that learnt the lessons of the English experience – would look like. Firstly, do we need one? Possibly, yes. There is a patchwork of state-based unions and teaching institutes in Australia, but none of them really does this job. Our unions are far better than something like the absurd NEU in the UK with its ideological flights of fancy, but the focus of our unions is pay and conditions rather than teaching practice, as it should be. Our teaching institutes, on the other hand, are simply regulatory bodies. We hand over the cash and they give us permission to teach for another year. There is much to question in this regulatory model, but because this function would already exist, the probability of any new Australian College of Teachers being drawn into regulation would be low.

So, what would be the point? To test that out, I propose the following draft manifesto.

  • Membership of the Australian College of Teachers would be limited to those who currently teach classes in an Australian P-12 setting for at least three hours per week. This would include independent and government schools as well as specialist provision. Members who cease to meet this teaching commitment could continue as non-voting affiliates but nobody could join on that basis. School leaders, such as principals and deputy principals, and academics would not be barred from being members provided that they met the teaching load requirement. However, they would be barred from holding leadership positions within the College.
  • All leaders and officials would be drawn from the membership and explicitly not from any affiliates.
  • The proceedings would be conducted entirely virtually. There would be no physical meetings. The tyranny of distance in Australia is worse than in most countries, and so if the College was based in a city and conducted in-person meetings, it would end up being dominated by teachers from that city. By conducting all proceedings online, anyone with an internet connection could participate with no need to spend time travelling. A remote Northern Territory teacher could potentially lead the College. All meetings would be open to all members and affiliates as observers, making the College’s processes transparent.
  • The College would regularly survey its members (not affiliates) to establish the balance of membership opinion on matters such as workload, behaviour and the implications of government education policy. It would also commission a series of systematic narrative literature reviews on different aspects of teaching practice. This would improve on meta-meta-analysis by not trying to shoe-horn a complex issue into a single spurious measure. Members of the College would decide upon and pre-register the search and inclusion criteria and academics would produce the report, which would then be peer reviewed. Different kinds of evidence such as correlational studies, quasi-experiments and randomised controlled trials could all be included in different sections of each report, with the authors suggesting overall conclusions which would be considered as advisory and provisional. There would be a rolling programme to update these reviews over time. These reviews would be published open-access on the College’s website.
  • When representing the College, leaders would be informed by the results of these opinion surveys and literature reviews e.g. “70% of our membership thought… A review we commissioned found…”
  • The leadership would seek to develop connections with journalists and organisations in order to 1) represent the position of the College in the media and in government consultations and 2) suggest teachers who would be available to give interviews and sit on conference panels in a personal capacity with the idea of injecting the views of actual teachers into education discussions.
  • The College would draw up a code of conduct based on ethical behaviour alone and membership could only be withdrawn on that basis. Any future attempt at turning it into a regulator of teacher competence or accreditation would therefore have to overturn these constitutional provisions.
  • Given the current climate, the constitution would also need an explicit commitment to free speech.
  • Service to the College would be voluntary and membership fees would therefore be low.

I am interested in your thoughts. Do we need such an institution? What do you think of my draft manifesto? What would you add? What would you change? Are there dangers I have not foreseen? Can any such body be designed in a way that avoids these dangers?

Also, to those of you in the UK who are familiar with the Chartered College of Teaching, what lessons can we learn?


Restorative Justice/Practices in Schools


I am not going to link to it, but there is a video currently going around Twitter that appears to show two schoolboys kicking and punching a Sikh boy in what seems to be a racially motivated attack. The video appears to have been recorded in England because the person who posted it to Twitter tagged the Twitter handle of an English school – which is one of the many reasons I am not sharing the tweet.

When it appeared on my timeline, I thought back to a segment that was broadcast last week on The Project, an Australian current affairs TV show.  I did not see the original broadcast, only the edit that was posted on Twitter.

In this edit, an Australian education academic makes the following claim about school suspensions and exclusions:

“There are other things that we could be doing that are supportive, things like Restorative Justice. So, for example, if one child hits another child, they’re going to learn more if they are required to sit and speak and apologise to the child that they’ve hit rather than, for example, if they’re sent home and they get to play X-Box for the day.”

I find this a strange perspective. Yes, it is important to identify learning opportunities for students, but we perhaps have to balance this against more pressing concerns. You won’t see anyone making a similar argument about, say, domestic violence, and with good reason. Instead, our first thought is to protect the victim, with the rehabilitation of the offender a secondary concern. Yes, I do understand that there is a difference between adult offenders and children, but from a victim’s perspective, it’s as scary being a child attacked by another child as it is being an adult attacked by another adult. Sending the attacker home, whether he or she then plays X-Box or not, at least keeps the victim safe in the short term while emotions are still running high.

And what if the victim does not want to meet with their attacker? What if they find this prospect stressful? Do they have a say? Are there not other ways for the attacker to learn that hitting other children is wrong?

And some victims may be wary of taking part in one of these Restorative Justice meetings, sensing that the attacker may mouth an empty apology and then carry on as before when they leave the room.

I decided to ask Twitter for people’s experiences of Restorative Justice (sometimes known as Restorative Practices) in schools.

The thread continues to build, but one theme that has emerged is that the implementations of Restorative Justice that people tend to view positively still contain what some might view as a ‘punitive’ element. For instance, a student may be given a detention in which he or she is required to write about what they did wrong or a student who is temporarily excluded from class is required to have a discussion about the reasons why this happened when they return.

This suggests that framing Restorative Justice – or at least some people’s take on Restorative Justice – in opposition to suspensions and exclusions may be a false choice. Indeed, about 15 years ago when I was involved in suspensions in London, we would have a reintroduction meeting with the student and their parents where we developed a plan that included a report card, a set of targets to meet and a discussion of the supports the school needed to put in place to help the student meet those targets. So, you could describe that as restorative. However, I don’t think it would have occurred to me to involve victims in those meetings.

A number of comments in the Twitter thread suggest negative experiences of Restorative Justice and a few people have also contacted me privately to tell their stories. Here are a couple of examples that I have lightly edited to correct predictive text typos and the like:

“I have worked in a school with Restorative Justice and it was very difficult. In many cases it made no difference to persistent misbehaviour but undermined staff, who began to feel that they could not discipline students. Staff became highly stressed and disillusioned. [They were] Often told lessons weren’t well-planned or differentiated enough and that poor behaviour was their fault. At times staff felt they had no option but to apologise for reprimanding students who didn’t meet basic expectations. Students began to believe they were always in the right. I… left as a result.”

“Being a relatively new teacher I used to think restoratives were a progressive, democratic and effective method for helping kids change their behaviour. Instead, it’s turned into me losing upwards of an hour of my lunchtime throughout the week doing restoratives in which I just sit there. It’s often with the same kids, with no evidence of their behaviour changing at all. Many teachers at my school are hesitant to go through the process because it takes so much time and does not amount to much.

There have been a few cases where restoratives between victims and perpetrators have been held, and some of those victims have confided in me that they feel anxious during the meeting, and that they know the perpetrator is not sorry at all.

It’s a frustrating system that the kids can game, know that it isn’t a deterrent, and I, the other staff and kids suffer and learning is lost.”

Of course, these are just two accounts. It may be that the majority of implementations of Restorative Justice are better and that it is the motivation arising from negative experiences that has prompted these individuals to share their perspectives. That may also be true of the Twitter thread. And I have no means of independently verifying these stories. I doubt whether anyone is making stuff up, but we all emphasise the parts of a story we see as important and downplay the ones we see as less important or that perhaps run counter to the narrative we want to put forward.

Interestingly, the item on The Project suffers from exactly the same problem. Children and parents were interviewed about their negative experiences of school suspensions, with little obvious attempt made to independently verify these accounts.

Given the largely anecdotal nature of the discussion, it may be better to conduct wider, more rigorous research before adopting Restorative Justice as an alternative to suspensions and exclusions in schools. The evidence we currently have on Restorative Justice from the U.S. is not encouraging, suggesting that it may harm academic outcomes, harm the school climate as perceived by students and be difficult to implement properly, even with extensive support.


Greg Ashman with Kate Barry

In the episode, the tables are turned and Kate Barry, an English and French teacher from Ireland, interviews Greg Ashman about his new book, The Power of Explicit Teaching and Direct Instruction. Greg and Kate discuss Greg’s route into teaching, the nature and value of education research, the meaning of the terms ‘explicit teaching’ and ‘direct instruction’, the different perspectives of academics and practising teachers, the need to look for disconfirming evidence, differentiated instruction and how to avoid progressivist/traditionalist pendulum swings. Thanks to Kate for asking the questions. You can read an excerpt from The Power of Explicit Teaching and Direct Instruction here.


How to frustrate maths students with mixed ability teaching

What would you conclude if, after persuading a school district to adopt your preferred model of maths education and studying a self-selected group of teachers in a couple of schools identified by the district as exemplars, you found that it wasn’t working out the way that you imagined and students told you that they found elements of the program frustrating? Well, you should perhaps conclude that there is a problem with your model. It would be a stretch, don’t you think, to blame the state’s maths standards? And yet that is the finding of a new study by LaMar, Leshin and Boaler, which you can read open access in all its unfalsifiable glory.

Following the usual practice of one of the study’s authors, we are not told the real name of the district where the study was conducted. Instead, it is referred to as ‘Gateside’.

The U.S. has a different way of teaching mathematics to the Australian and English systems that I am familiar with. In the U.S., students typically follow a set sequence of maths courses: Algebra 1, Geometry, Algebra 2, Pre-Calculus and, if they can fit it in, AP Calculus. This sequence means that students need to study Algebra 1 in the Eighth Grade if they are going to complete AP Calculus before the end of high school. AP Calculus is effectively, if not explicitly, an entry requirement for many top American colleges.

This tends to lead to a split where more advanced students take Algebra 1 in Eighth Grade, with their less advanced peers waiting until the following year. Gateside had decided to disrupt this model by making all students wait until Ninth Grade to study Algebra 1 and then teaching them in mixed ability classes. It would only be in Eleventh Grade that students could choose to either do Algebra 2 or Algebra 2 plus Pre-Calculus. The district also decided to eschew ‘procedural teaching’ in favour of a form of group work, the Platonic ideal of which the authors refer to as ‘complex instruction’. A dash of mindset theory was also supposed to be added somewhere.

The researchers approached the district and the district provided them with a list of seven out of their fourteen high schools that had, “fully implemented Complex Instruction and the district’s core curriculum.” The researchers then chose two of these schools to study. They approached the ten Algebra 1 teachers at these schools and eight agreed to participate in the study.

Apparently, one of the features of complex instruction is that students work together on ‘groupworthy’ tasks. The idea is that these tasks should only be possible to complete as a group. Moreover, every member of the group should be able to contribute in a substantial way to the overall solution and nobody in the group would finish the task before everyone in the group finished the task.

Clearly, such an idealised maths task is hard to design. Students who are more advanced at maths are going to be able to contribute more to a maths task than students who are less advanced. So, both groups became frustrated. The more advanced students were frustrated by having to constantly explain the maths to the less advanced students – i.e. by doing the teacher’s job – and by having to wait for them before they could move on. The less advanced students felt under pressure to work quickly. The teachers charged with managing the ensuing chaos developed a system where they would stamp the work so they could keep track of who had finished what and therefore decide when each group could move on. The researchers disapproved of these stamps.

This is a predictable clash between researcher idealism and teacher pragmatism. The researchers gave the teachers an impossible task. The teachers then tried to make this practical. The researchers then disapproved of the teachers’ solution.

The authors suggest that the main problem with the district’s approach – the one that caused all the frustration – lay in the tasks the teachers were setting and the fact that these tasks were driven by the curriculum standards. For instance, asking students to solve (2x+1)(3x+4)=0 is something the authors consider to be only a ‘beginning’ level of task. Finding the ‘zeros’ of the graph of y=(2x+1)(4x+4) is only slightly better and classed as ‘developing’. Instead, what we really want students to be doing is an ‘expanding’ task such as working out the pattern in a growing sequence of squares.

To be frank, this is the sort of task I would expect to see in maybe a Fifth Grade classroom. It has some connection to algebra, but not much. Most of the time that students are engaged in a task like this, they will not be developing their understanding of algebra. The first two tasks pay forward to later study of algebra, including calculus, in a way that the squares problem really does not.
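For what it’s worth, the ‘beginning’ and ‘developing’ tasks are essentially the same piece of mathematics viewed symbolically and then graphically, which is part of why they pay forward. A minimal sketch of the link, using the first expression for both tasks purely for illustration (the paper’s two expressions differ slightly):

```python
# Zero product property solves the 'beginning' task:
# (2x+1)(3x+4) = 0 exactly when 2x+1 = 0 or 3x+4 = 0.
def f(x):
    return (2 * x + 1) * (3 * x + 4)

roots = [-1 / 2, -4 / 3]  # from 2x+1 = 0 and 3x+4 = 0

# The 'developing' task asks for the zeros of the graph y = f(x):
# the same roots are exactly where the graph meets the x-axis.
for r in roots:
    assert abs(f(r)) < 1e-9  # y = 0 at each root
```

The squares-pattern task, by contrast, exercises neither the factorised form nor this link to the graph.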

So, if the state standards are forcing the teachers into tasks like the ‘beginning’ and ‘developing’ ones, they appear to me to be correct to do so. Attacking these standards seems like an excuse.

The real lesson from this study is something quite different: if you persuade a district to adopt this model and the district then points you towards its best implementations, you will still find it does not work. If that’s the case, what are the chances of getting this model to work at scale? I suggest they are very low.

It is worth noting that the researchers cite evidence that the new approach is superior to the old one. Apparently, the algebra failure rate dropped across the district. However, it is my understanding that U.S. schools do not use common, standardised assessments to determine who passes such courses, and so this could simply be a function of applying a lower standard. We cannot check, because we don’t know the actual name of the district.

And the authors make a number of strong claims throughout the paper – claims that are potentially falsifiable, such as, “When mathematics is taught as a set of procedures to follow, many students disengage, and various studies have shown that procedural teaching is particularly damaging for girls and students of color.” However, the reference provided is often to another non-randomised study similar to the present one and involving some of the same authors.

One such claim that particularly drew my attention was that, “…procedural teaching encourages students to take a ‘memorization’ approach to mathematics, which has been shown to be associated with low achievement.” This refers to an article from Scientific American that I have examined before and that I do not believe demonstrates this finding.

Nevertheless, people will point to this paper and districts will embark upon similar reforms, thinking they are evidence-based.


Another failure for Productive Failure

A new study by Valentina Nachtigall, Katja Serova and Nikol Rummel has failed to find evidence for the productive failure hypothesis.

Briefly, advocates of productive failure suggest that a period of unsuccessful problem solving prior to explicit instruction is superior to explicit instruction from the outset. They suggest unsuccessful problem solving may activate prior knowledge, make students more aware of their knowledge gaps and prepare them for recognising deeper structure during subsequent explicit teaching.

There have been a number of studies that seem to show a result in favour of productive failure. However, in the paper I wrote with my supervisors based upon my PhD research, we note that these studies have potential limitations. In fact, in this paper, I report very similar findings to the new study.

The new study involved tenth grade students in Germany who attended classes at a university to learn – in an oddly iterative touch – about experimental design in the social sciences (Experiment 1) and causal versus correlational evidence (Experiment 2). The study follows a quasi-experimental design. Similar to my study, a productive failure group attempted problem solving prior to receiving explicit teaching, whereas a direct instruction group received explicit teaching prior to problem solving. Outcomes on a post-test were then compared.

Contrary to the expectations of the researchers, but not to mine, neither experiment found a productive failure effect. In fact, the first experiment found in favour of the direct instruction group, including on an analysis of a subset of questions that assessed ‘deep feature recognition’. In the second experiment, there were no significant differences between the two groups.


New claims about the effectiveness of Quality Teaching Rounds

Quality Teaching Rounds (QTR) has featured before on this blog. As far as I can gather, the story goes something like this: In the 1990s, Fred Newman and colleagues developed an approach in the United States known as ‘authentic pedagogy’ or ‘authentic achievement’. This approach then informed the Queensland School Reform Longitudinal Study, a correlational study that took place around the turn of the century and gave us ‘Productive Pedagogies’. Productive Pedagogies then took a road trip down to New South Wales and became known as ‘Quality Teaching Rounds’.

Up until now, QTR has been most notable for an extraordinary randomised controlled trial. I have not been trained in QTR and so do not understand the subtleties, but it revolves around a teaching framework derived from Newman’s work. The ‘rounds’ involve teachers working together in a group. On the same day, each group conducts a reading discussion, observes a member of the group teaching and then codes these observations against the framework. The QTR framework is apparently superior to other teaching frameworks:

“While there is growing advocacy for pedagogical frameworks to guide the improvement of teaching, the QT framework differs in several respects from other widely used frameworks… First, the QT framework offers a comprehensive account of teaching, addressing matters of curriculum, student engagement, and social justice, as well as pedagogical practice (Gore, 2007). In this way, it avoids reducing the complex, multi-dimensional enterprise of teaching (Jackson, 1968) to a set of teaching skills or practices. On the contrary, the QT framework is more about a conception of ‘the practice of teaching'”

The randomised controlled trial was extraordinary due to its outcome measure. Rather than do the obvious and judge the effect of QTR professional development on student outcomes, it judged the effect on teaching quality as measured on – you’ve guessed it – the QTR framework. So essentially, teachers who were trained to teach in a way that scores highly on the QTR framework scored more highly on the QTR framework than those who were not.

This unremarkable and possibly tautological finding came with an interesting spin: QTR apparently improved the ‘quality of teaching’.

The obvious next step – which should have perhaps been the first step – was to assess the effects of QTR on what students actually learn. A study has now taken place and apparently found that QTR boosts learning in maths by 25%. At last, some direct evidence for the approach!

Or is it?

QTR has a flashy new website where you can learn more about it and book a workshop. The 25% figure features repeatedly. If you trace this back to a source, you land on this document, which repeats the claim and provides a little, but not much, more detail. A diagram suggests that, in a year, the effect size for the control group was about 0.45, about 0.52 for an ‘alternate’ group and nearly 0.60 for the QTR group.

I’m not entirely sure how you get from that to 25%, whether it is statistically significant, whether there are baseline differences, what the methodology of the study was and so on – all the kinds of issues to consider when evaluating a trial. However, when you look for a reference, you get, “Gore, J., Miller, A., Fray, L., Harris, J., & Prieto, E. (under review). Improving student achievement through professional development: Results from a randomised controlled trial of Quality Teaching Rounds.”
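For what it’s worth, one arithmetic that does land on 25% – and this is my speculation, not anything the document states – is to express the extra growth as a share of the QTR group’s total growth. The figures below are read off the diagram and are approximate:

```python
# Speculative reconstruction of the '25%' claim from approximate,
# within-year effect sizes read off the diagram.
control = 0.45  # control group's effect size over the year
qtr = 0.60      # QTR group's effect size over the year

extra = qtr - control  # about 0.15 of additional effect size

print(extra / qtr)      # about 0.25, i.e. 25% relative to QTR's growth
print(extra / control)  # about 0.33, a larger figure relative to control
```

If the denominator were the control group’s growth instead, the headline would be 33%, which illustrates how sensitive such percentage claims are to the choice of baseline – another reason to wait for the full paper.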

I cannot find this paper by Googling it, which is not surprising if it is still under review. Nevertheless, it strikes me as quite wrong to be making claims of this kind and launching flashy websites when this claim has not yet been peer-reviewed. This wouldn’t be as much of an issue if we could examine a pre-print version of the paper ourselves, but this information appears to be unavailable. I can only hope this is an oversight and that the paper is due to be released very soon.

If you do try Googling the trial name, the main search result is a protocol for conducting a randomised controlled trial of QTR that includes three of the same authors, i.e. a plan for a study to be conducted in the future. This may be the same trial that we now have results for and that is under review. If so, the protocol includes three main outcome measures in Mathematics, Reading and Science, as well as a student questionnaire. What happened to those outcomes?

All I can do is urge caution to any school that is impressed by the 25% claim and is thinking of jumping in at this point. My advice is to wait at least until the full paper is published.


Advice for virtual bystanders

Whenever I tire of online debate, I remind myself who it is all for – the silent bystanders. Although there are some admirable individuals, such as David Didau, who have changed their views on education due to social media debate, the vast majority do not. Instead, the main purpose of debate is to test ideas so those who are following the discussion and who have not committed publicly to a position may develop an informed view.

In this post, I want to point to some useful sources for the bystander. What should you look for? If you are following a debate on Twitter, how can you tell the good points from the tactical ploys?

One of the best sources to turn to is How to Disagree by Paul Graham. Graham classifies levels of disagreement from the least valid to the best, and so his guide is a handy one to apply to a discussion you may be observing. At the bottom is simple name-calling and at the top is refuting the central point. I think we can all easily recognise both of these when we see them, although a variant on name-calling – calling people far-right adjacent, for instance – does seem to slip under the radar of some. However, the intermediate levels are worth knowing and can be difficult to spot. Far too many comments I receive, for instance, are related to tone. This is what Graham has to say:

“It matters much more whether the author is wrong or right than what his tone is. Especially since tone is so hard to judge. Someone who has a chip on their shoulder about some topic might be offended by a tone that to other readers seemed neutral.”

Quite. I have a Twitter troll who is one of the most impolite people I have interacted with on the medium and yet who often calls for a better standard of debate. I suspect they feel justified in doing so.

I would suggest the vast majority of one-off criticisms I attract on Twitter are either a response to my tone, a general call for more nuance, as if nuance is always a good thing, or a personal attack (ad hominem). Often, one or more are combined as in this example:

It’s actually quite rare for people to provide counterarguments and especially rare to provide contrary evidence.

Personal attacks are particularly pernicious because we take them personally and I worry whether they stop some people from airing their views. I have developed a pretty thick skin and so I am now immune to the frequent claims that I lack sufficient expertise to comment on this or that issue or the regular references to my PhD studies. Presumably, if I did lack expertise, it would cause me to make errors that my critics could gleefully highlight and that would be far more devastating to my argument than any personal slight. The fact that my critics rarely do this suggests they cannot find such errors and that this is the best they’ve got. Watch out for it.

Graham doesn’t outline all of the possible fallacies you may encounter, so another good source is this list of the most common logical fallacies. I would highlight two of these – ambiguity and burden of proof. The examples given there are not specific to education, but you see these fallacies a great deal in the education debate.

Ambiguity or equivocation tends to take the form of questioning the definitions of words or suggesting an alternative interpretation of something. On Twitter, this is a strategy people often deploy to avoid admitting they are wrong, and its ready availability, together with the face-saving people think it buys them, is one of the reasons why so few people admit their errors.

Burden of proof means that someone who claims a thing is a thing is the one who is required to provide evidence, not the one who doubts it. It doesn’t matter how the conversation starts. If someone suggests, “This thing is not a thing,” then it is not their duty to show why.

Recently, I have come across another, older source – Arthur Schopenhauer’s (1788-1860) The Art of Controversy. This is written ironically and takes the perspective of giving advice on how to win arguments without consideration of the actual truth of the matter. Some of its stratagems are spookily prescient of social media debate and therefore demonstrate that human nature is more constant than we may think.

Take this example:

“If you observe that your opponent has taken up a line of argument which will end in your defeat, you must not allow him to carry it to its conclusion, but interrupt the course of the dispute in time, or break it off altogether, or lead him away from the subject, and bring him to others. In short, you must effect the trick which will be noticed later on, the mutatio controversiae.”

It’s well worth a read.


More criticism of Jeffrey Bowers’ phonics paper

Back in January, Jeffrey Bowers had an article published in what we all can agree is a prestigious journal, Educational Psychology Review (After writing that, I suppose I should mention that a paper I co-wrote based upon my PhD research was published in the same journal):

Bowers’ article claims that the evidence for the effectiveness of systematic phonics in early reading instruction is not as strong as phonics advocates propose. He concludes that, “The ‘reading wars’ that pitted systematic phonics against whole language is best characterized as a draw.” And that’s a strong statement.

I have written about this paper before. Essentially, Bowers spends most of it arguing against the conclusions of various systematic reviews before focusing on what he sees as a lack of evidence from England, a country that has adopted early systematic phonics as policy. The latter argument is neither here nor there, but the former is potentially more interesting. When targeting these systematic reviews, Bowers is able to find fault with all of them, ranging from criticisms of reported effect sizes from specific studies that are too large or too small, through to a criticism of the Ehri et al. review, based upon the 2000 US National Reading Panel report, which he believes to have tested the wrong things. The Ehri et al. study compares systematic phonics programmes to programmes that don’t emphasise systematic phonics, but Bowers thinks it should have compared them to ‘nonsystematic’ phonics programmes, a presumed subset of the actual comparison group. This is all quite esoteric and we will return to why this argument matters later.

Displaying great patience, Dr. Jennifer Buckingham has highlighted various issues with Bowers’ analysis in a paper published in The Educational and Developmental Psychologist, an earlier version of which can be read here. Now, we can add to this a further critique published in the same prestigious journal as the original Bowers paper.

This new paper by Fletcher, Savage and Vaughn has its own quirks. The authors are keen to suggest that it is the explicitness of systematic phonics teaching rather than its systematic nature that may account for the positive effect. In other words, an experienced teacher who understands the field does not necessarily need a meticulously planned curriculum as long as they adhere to the underlying principles. This is an interesting point, but I don’t see any great evidence presented for it and my own experience in schools suggests a meticulously planned curriculum is quite helpful.

When it comes to Bowers’ main claims, Fletcher et al. are about as forthright as it is possible to be in the measured tone of an academic paper. Like Buckingham, they follow Bowers’ idiosyncratic road trip through the literature, pointing out where they believe Bowers has overstated his case. Curiously, there is a table at the end of the paper summarising points of disagreement and potential points of agreement. I cannot help wondering whether this was at the suggestion of a reviewer because the authors take direct aim at Bowers’ central claim of a ‘draw’ between systematic phonics and whole language:

“…we think this conclusion is tantamount to acceptance of the null hypothesis and is not helpful to educators or their students. Not only is this statement not supported by the evidence from which Bowers claims to derive his judgments, it unnecessarily arouses controversy in a field that needs to focus on the best practices available… Evidence is consistently positive and replicable for effects of explicit phonics.”

Education research is messy and complex. Tying down the various factors is a little like tying down helium balloons in a strong wind. And we can all argue about methods and approaches, as I will do. However, the fact that so many different groups of researchers have investigated this question seriously and systematically and have found positive evidence for systematic phonics according to their own predetermined metrics means that the idea of a draw between phonics and whole language, if not wholly inconceivable, is a deeply eccentric position to take.

Which edges me slowly towards my final point.

I do not care for all the discussion of effect sizes that takes place within these reviews, criticisms of reviews and criticisms of criticisms of reviews. Although I accept that effect size has some validity, once you start mushing together effect sizes from studies with very different designs in order to produce an overall effect size, I start to feel uneasy. At least these are all studies of early literacy, unlike some of the strange attempts at meta-meta-analysis we have seen. Nevertheless, we know study methodology can change effect sizes and so I would prefer a systematic narrative review, encompassing all studies that meet certain selection criteria but without the need to produce an overall metric. If I had the time and the relevant expertise, I could conduct a systematic review along these lines.
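To see why pooling makes me uneasy, consider how an effect size is usually computed. Cohen’s d divides the raw difference between groups by the pooled standard deviation, so two studies finding the same raw gain can report quite different effect sizes simply because their samples differ in spread. A minimal sketch, with purely illustrative numbers that come from no study discussed here:

```python
# Cohen's d = (treatment mean - control mean) / pooled standard deviation.
# The same raw gain yields different effect sizes depending on how
# variable the study sample happens to be -- illustrative numbers only.

def cohens_d(mean_difference: float, pooled_sd: float) -> float:
    """Standardised mean difference between two groups."""
    return mean_difference / pooled_sd

# A broad sample with a wide spread of reading scores (SD = 15):
broad = cohens_d(5.0, 15.0)

# A restricted sample, e.g. struggling readers only (SD = 8):
narrow = cohens_d(5.0, 8.0)

print(f"broad sample: d = {broad:.2f}")    # d = 0.33
print(f"narrow sample: d = {narrow:.2f}")  # d = 0.62
```

The same five-point gain looks ‘small’ in one design and ‘medium’ in the other, which is exactly the kind of methodological artefact that an overall pooled metric quietly averages away.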

When Torgerson et al. examined the existing literature, they spotted a different, although related, problem to mine. They noted that many of the studies included in analyses like Ehri et al.’s were not randomised controlled trials. And so, given their view that only randomised trials should be used*, they did the right thing – they conducted their own systematic review based on randomised controlled trials alone.

When Bowers decided that he did not like the comparison group in Ehri et al., he should have done the same thing. He should have decided upon selection criteria and then conducted a systematic review of his own. That would have been far more powerful than attempting to critique the reviews of others and the reason is to do with researcher degrees of freedom.

The ideal experiment in the social sciences is preregistered. This means that the researcher sets out in advance what they will do, what measurements they will take, and what constitutes a positive result. This is good practice due to the messily statistical nature of social science research. Basically, at the conventional 0.05 significance threshold, I have a one-in-20 chance of generating what looks like a significant result even though it is not. Therefore, if I use 20 different outcome measures, report one that is significant but do not mention the others, I can manufacture a pseudo-significant result. Preregistration, where I nominate in advance what I will use as my outcome measure, removes these degrees of freedom.
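The arithmetic behind that one-in-20 figure is worth making concrete. Assuming 20 independent outcome measures and the conventional 0.05 threshold, the chance that at least one comes up ‘significant’ by luck alone is 1 − 0.95²⁰, roughly 64%. A minimal simulation, illustrative only and not drawn from any of the papers discussed:

```python
import random

ALPHA = 0.05      # conventional significance threshold
N_MEASURES = 20   # outcome measures collected in the hypothetical study

# Analytic familywise false-positive rate for independent tests:
analytic = 1 - (1 - ALPHA) ** N_MEASURES
print(f"chance of at least one spurious result: {analytic:.2f}")  # ~0.64

# Monte Carlo check: under the null hypothesis, each p-value is
# uniformly distributed on [0, 1], so P(p < ALPHA) = ALPHA per measure.
random.seed(0)
trials = 100_000
hits = sum(
    any(random.random() < ALPHA for _ in range(N_MEASURES))
    for _ in range(trials)
)
print(f"simulated rate: {hits / trials:.2f}")  # close to the analytic value
```

In other words, a researcher free to pick the best of 20 measures after the fact will ‘find’ an effect most of the time, which is precisely the freedom preregistration removes.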

Systematic reviews are meant to act in the same way as an experiment. At the outset, you nominate what you will use as your selection criteria. This way, if studies meet those criteria but are unhelpful to your overall hypothesis, you still have to include them and account for them. It is fine for someone else to criticise these criteria, but attempts to somehow reanalyse the results or retrospectively cast out studies are flawed.

Imagine, for instance, that Bowers did as I suggest and decided to conduct his own review based upon systematic versus ‘nonsystematic’ phonics. Once he narrowed down his selection criteria, he may find himself excluding some of the studies used by Ehri et al. However, he may also find that he has to include some other studies not included in Ehri et al. that are not helpful to his argument. By instead critiquing Ehri et al., Bowers has the freedom to post-hoc re-evaluate conclusions without any of the constraints designed into the discipline of systematic review.

And that is a fundamental and fatal flaw.

*For those of you who care about these things, my own view is that we do not need to limit ourselves to randomised controlled trials. These are relatively rare and so such an approach means tossing out most of the evidence we have. In my view, the main problem arises in trying to treat different types of study in the same way and develop an overall metric. I would prefer a triangulation approach where perhaps the evidence from nonrandomised trials is presented in a separate section to that from randomised trials in the kind of narrative review I would wish to see.