# Welcome

This is the homepage of Greg Ashman, a teacher, blogger and PhD candidate living and working in Australia. Everything that I write reflects my own personal opinion and does not necessarily represent the views of my employer or any other organisation.

My podcast lives here

I have a book out for new teachers (which some experienced teachers have also enjoyed):

The Truth about Teaching: An evidence informed guide for new teachers

Watch my researchED talks here and here

I have written a couple of pieces on Australian education for Quillette:

The Tragedy of Australian Education

Australia’s PISA shock

Here is a piece I wrote for The Age, a Melbourne newspaper:

And here I am in the Australian Financial Review:

Ideology crushes teachers’ ability to control classes

Read my articles for the Conversation here:

Why students make silly mistakes

I have also written lots of other things, some of which I have forgotten about.

My most popular blog post is about Explicit Teaching:

What is Explicit Instruction?

This is my LinkedIn page and Filling The Pail has a Facebook page here.


# How to frustrate maths students with mixed ability teaching

What would you conclude if, after persuading a school district to adopt your preferred model of maths education and studying a self-selecting sample of teachers in a couple of schools identified by the district as exemplars, you found that it wasn’t working out the way you imagined and students told you that they found elements of the program frustrating? Well, you should perhaps conclude that there is a problem with your model. It would be a stretch, don’t you think, to blame the state’s maths standards? And yet that is the finding of a new study by LaMar, Leshin and Boaler, which you can read open access in all its unfalsifiable glory.

Following the usual practice of one of the study’s authors, we are not told the real name of the district where the study was conducted. Instead, it is referred to as ‘Gateside’.

The U.S. teaches mathematics differently from the Australian and English systems that I am familiar with. In the U.S., students typically follow a set sequence of maths courses: Algebra 1, Geometry, Algebra 2, Pre-Calculus and, if they can fit it in, AP Calculus. This sequence means that students need to study Algebra 1 in the Eighth Grade if they are going to complete AP Calculus before the end of high school. AP Calculus is effectively, if not explicitly, an entry requirement for many top American colleges.

This tends to lead to a split where more advanced students take Algebra 1 in Eighth Grade, with their less advanced peers waiting until the following year. Gateside had decided to disrupt this model by making all students wait until Ninth Grade to study Algebra 1 and then teaching them in mixed ability classes. It would only be in Eleventh Grade that students could choose to either do Algebra 2 or Algebra 2 plus Pre-Calculus. The district also decided to eschew ‘procedural teaching’ in favour of a form of group work, the Platonic ideal of which the authors refer to as ‘complex instruction’. A dash of mindset theory was also supposed to be added somewhere.

The researchers approached the district and the district provided them with a list of seven out of their fourteen high schools that had, “fully implemented Complex Instruction and the district’s core curriculum.” The researchers then chose two of these schools to study. They approached the ten Algebra 1 teachers at these schools and eight of them agreed to participate in the study.

Apparently, one of the features of complex instruction is that students work together on ‘groupworthy’ tasks. The idea is that these tasks should only be possible to complete as a group. Moreover, every member of the group should be able to contribute substantially to the overall solution, and nobody should finish the task before everyone else in the group has.

Clearly, such an idealised maths task is hard to design. Students who are more advanced at maths are going to be able to contribute more to a maths task than students who are less advanced. So, both groups became frustrated. The more advanced students were frustrated by having to constantly explain the maths to the less advanced students – i.e. by doing the teacher’s job – and by having to wait for them before they could move on. The less advanced students felt under pressure to work quickly. The teachers charged with managing the ensuing chaos developed a system where they would stamp the work so they could keep track of who had finished what and therefore decide when each group could move on. The researchers disapproved of these stamps.

This is a predictable clash between researcher idealism and teacher pragmatism. The researchers gave the teachers an impossible task. The teachers then tried to make this practical. The researchers then disapproved of the teachers’ solution.

The authors suggest that the main problem with the district’s approach – the one that caused all the frustration – lay in the tasks the teachers were setting and the fact that these tasks were driven by the curriculum standards. For instance, asking students to solve $(2x+1)(3x+4)=0$ is something the authors consider to be only a ‘beginning’ level of task. Finding the ‘zeros’ of the graph of $y=(2x+1)(4x+4)$ is only slightly better and classed as ‘developing’. Instead, what we really want students to be doing is an ‘expanding’ task such as working out the pattern in a growing sequence of squares.

To be frank, this is the sort of task I would expect to see in maybe a Fifth Grade classroom. It has some connection to algebra, but not much. Most of the time that students are engaged in a task like this, they will not be developing their understanding of algebra. The first two tasks pay forward to later study of algebra, including calculus, in a way that the squares problem really does not.
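For concreteness, the ‘beginning’ task rests on the null factor law and takes a couple of lines:

```latex
\begin{align*}
(2x+1)(3x+4) &= 0\\
\Rightarrow\quad 2x+1 = 0 \;\text{ or }\; 3x+4 &= 0\\
\Rightarrow\quad x = -\tfrac{1}{2} \;\text{ or }\; x &= -\tfrac{4}{3}
\end{align*}
```

The ‘developing’ task asks for essentially the same reasoning, with the solutions read off as the $x$-intercepts of a graph – which is exactly the kind of knowledge that later algebra and calculus build upon.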

So, if the state standards are forcing the teachers into tasks like the ‘beginning’ and ‘developing’ ones, they appear to me to be correct to do so. Attacking these standards seems like an excuse.

The real lesson from this study is something quite different: if you persuade a district to adopt this model, the district points you towards its best implementations and you still find that it does not work, then what are the chances of getting this model to work at scale? I suggest they are very low.

It is worth noting that the researchers cite evidence that the new approach is superior to the old one. Apparently, the algebra failure rate dropped across the district. However, it is my understanding that U.S. schools do not use common, standardised assessments to determine who passes such courses and so this could simply be a function of applying a lower standard. We cannot check, because we don’t know the actual name of the district.

And the authors make a number of strong claims throughout the paper – claims that are potentially falsifiable such as, “When mathematics is taught as a set of procedures to follow, many students disengage, and various studies have shown that procedural teaching is particularly damaging for girls and students of color.” However, the reference provided is often to another non-randomised study similar to the present one and involving some of the same authors.

One such claim that particularly drew my attention was that, “…procedural teaching encourages students to take a ‘memorization’ approach to mathematics, which has been shown to be associated with low achievement.” This refers to a paper from Scientific American that I have examined before and that I do not believe demonstrates this finding.

Nevertheless, people will point to this paper and districts will embark upon similar reforms, thinking they are evidence-based.


# Another failure for Productive Failure

A new study by Valentina Nachtigall, Katja Serova and Nikol Rummel has failed to find evidence for the productive failure hypothesis.

Briefly, advocates of productive failure suggest that a period of unsuccessful problem solving prior to explicit instruction is superior to explicit instruction from the outset. They suggest unsuccessful problem solving may activate prior knowledge, make students more aware of their knowledge gaps and prepare them for recognising deeper structure during subsequent explicit teaching.

There have been a number of studies that seem to show a result in favour of productive failure. However, in the paper I wrote with my supervisors based upon my PhD research, we note that these studies have potential limitations. In fact, in this paper, I report very similar findings to the new study.

The new study involved Tenth Grade students in Germany who attended classes at a university to learn – in an oddly iterative touch – about experimental design in the social sciences (Experiment 1) and causal versus correlational evidence (Experiment 2). The study follows a quasi-experimental design. Similar to my study, a productive failure group attempted problem solving prior to receiving explicit teaching, whereas a direct instruction group received explicit teaching prior to problem solving. Outcomes on a post-test were then compared.

Contrary to the expectations of the researchers, but not to mine, neither experiment found a productive failure effect. In fact, the first experiment found in favour of the direct instruction group, including on an analysis of a subset of questions that assessed ‘deep feature recognition’. In the second experiment, there were no significant differences between the two groups.


# New claims about the effectiveness of Quality Teaching Rounds

Quality Teaching Rounds (QTR) has featured before on this blog. As far as I can gather, the story goes something like this: In the 1990s, Fred Newmann and colleagues developed an approach in the United States known as ‘authentic pedagogy’ or ‘authentic achievement’. This approach then informed The Queensland School Reform Longitudinal Study, a correlational study that took place around the turn of the century and gave rise to ‘Productive Pedagogies’. Productive Pedagogies then took a road trip down to New South Wales and became known as ‘Quality Teaching Rounds’.

Up until now, QTR has been most notable for an extraordinary randomised controlled trial. I have not been trained in QTR and so do not understand the subtleties, but it revolves around a teaching framework derived from Newmann’s work. The ‘rounds’ involve teachers working together in a group. On the same day, each group conducts a reading discussion, observes a member of the group teaching and then codes these observations against the framework. The QTR framework is apparently superior to other teaching frameworks:

“While there is growing advocacy for pedagogical frameworks to guide the improvement of teaching, the QT framework differs in several respects from other widely used frameworks… First, the QT framework offers a comprehensive account of teaching, addressing matters of curriculum, student engagement, and social justice, as well as pedagogical practice (Gore, 2007). In this way, it avoids reducing the complex, multi-dimensional enterprise of teaching (Jackson, 1968) to a set of teaching skills or practices. On the contrary, the QT framework is more about a conception of ‘the practice of teaching'”

The randomised controlled trial was extraordinary due to its outcome measure. Rather than do the obvious and judge the effect of QTR professional development on student outcomes, it judged the effect on teaching quality as measured on – you’ve guessed it – the QTR framework. So essentially, teachers who were trained to teach in a way that scores highly on the QTR framework scored more highly on the QTR framework than those who were not.

This unremarkable and possibly tautological finding came with an interesting spin: QTR apparently improved the ‘quality of teaching’.

The obvious next step – which should have perhaps been the first step – was to assess the effects of QTR on what students actually learn. A study has now taken place and apparently found that QTR boosts learning in maths by 25%. At last, some direct evidence for the approach!

Or is it?

QTR has a flashy new website where you can learn more about it and book a workshop. The 25% figure features repeatedly. If you trace this back to a source, you land on this document which repeats the claim and provides a little, but not much, more detail. A diagram suggests that, in a year, the effect size for the control group was about 0.45, about 0.52 for an ‘alternate’ group and nearly 0.60 for the QTR group.

I’m not entirely sure how you get from that to 25%, whether it is statistically significant, whether there are baseline differences, what the study’s methodology was and so on – all the kinds of issues to consider when evaluating a trial. However, when you look for a reference, you get, “Gore, J., Miller, A., Fray, L., Harris, J., & Prieto, E. (under review). Improving student achievement through professional development: Results from a randomised controlled trial of Quality Teaching Rounds.”
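Purely as a sketch of why the figure is ambiguous, here are two ways the approximate diagram values could plausibly be combined. The 0.45 and 0.60 are my readings of the diagram; the interpretations are my own guesses, not anything stated in the QTR materials.

```python
# Two candidate arithmetics behind a "25%" claim, using the approximate
# effect sizes read off the diagram (assumed values, not published figures).
control, qtr = 0.45, 0.60

# Gap expressed as a share of the QTR group's growth: 0.15 / 0.60
relative_to_qtr = (qtr - control) / qtr
print(f"relative to QTR group:     {relative_to_qtr:.0%}")      # 25%

# Gap expressed as a share of the control group's growth: 0.15 / 0.45
relative_to_control = (qtr - control) / control
print(f"relative to control group: {relative_to_control:.0%}")  # 33%
```

Only the first reading lands on 25%, which is one more reason we need the full paper rather than a headline figure.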

I cannot find this paper by Googling it, which is not surprising if it is still under review. Nevertheless, it strikes me as quite wrong to be making claims of this kind and launching flashy websites when this claim has not yet been peer-reviewed. This wouldn’t be as much of an issue if we could examine a pre-print version of the paper ourselves, but this information appears to be unavailable. I can only hope this is an oversight and that the paper is due to be released very soon.

If you do try Googling the trial name, the main search result is a protocol for conducting a randomised controlled trial of QTR that includes three of the same authors, i.e. a plan for a study to be conducted in the future. This may be the same trial that we now have results for and that is under review. If so, the protocol includes three main outcome measures in Mathematics, Reading and Science, as well as a student questionnaire. What happened to those outcomes?

All I can do is urge caution to any school that is impressed by the 25% claim and is thinking of jumping in at this point. My advice is to wait at least until the full paper is published.


Whenever I tire of online debate, I remind myself who it is all for – the silent bystanders. Although there are some admirable individuals, such as David Didau, who have changed their views on education due to social media debate, the vast majority do not. Instead, the main purpose of debate is to test ideas so those who are following the discussion and who have not committed publicly to a position may develop an informed view.

In this post, I want to point to some useful sources for the bystander. What should you look for? If you are following a debate on Twitter, how can you tell the good points from the tactical ploys?

One of the best sources to turn to is How to Disagree by Paul Graham. Graham classifies levels of disagreement from the least valid to the best and so his guide is a handy one to apply to a discussion you may be observing. At the bottom is simple name-calling and at the top is refuting the central point. I think we can all easily recognise both of these when we see them, although a variant on name calling – calling people far-right adjacent, for instance – does seem to slip under the radar of some. However, the intermediate levels are worth naming and can be difficult to spot. Far too many comments I receive, for instance, are related to tone. This is what Graham has to say:

“It matters much more whether the author is wrong or right than what his tone is. Especially since tone is so hard to judge. Someone who has a chip on their shoulder about some topic might be offended by a tone that to other readers seemed neutral.”

Quite. I have a Twitter troll who is one of the most impolite people I have interacted with on the medium and yet who often calls for a better standard of debate. I suspect they feel justified in doing so.

I would suggest the vast majority of one-off criticisms I attract on Twitter are either a response to my tone, a general call for more nuance, as if nuance is always a good thing, or a personal attack (ad hominem). Often, one or more of these are combined.

It’s actually quite rare for people to provide counterarguments and especially rare to provide contrary evidence.

Personal attacks are particularly pernicious because we take them personally and I worry whether they stop some people from airing their views. I have developed a pretty thick skin and so I am now immune to the frequent claims that I lack sufficient expertise to comment on this or that issue or the regular references to my PhD studies. Presumably, if I did lack expertise, it would cause me to make errors that my critics could gleefully highlight and that would be far more devastating to my argument than any personal slight. The fact that my critics rarely do this suggests they cannot find such errors and that this is the best they’ve got. Watch out for it.

Graham doesn’t outline all of the possible fallacies you may encounter and so another good source is yourlogicalfallacyis.com. This lists all the most common logical fallacies. I would highlight two of these – ambiguity and burden of proof. The examples given on yourlogicalfallacyis.com are not specific to education, but you see these a great deal in the education debate.

Ambiguity or equivocation tends to take the form of questioning the definitions of words or suggesting an alternative interpretation of something. On Twitter, this is a strategy people often deploy to avoid admitting they are wrong and its ready availability, and the face-saving that people think they achieve by deploying it, is one of the reasons why so few people admit their errors.

Burden of proof means that the person who claims a thing is a thing is the one required to provide evidence, not the one who doubts it. It doesn’t matter how the conversation starts. If someone suggests, “This thing is not a thing,” then it is not their duty to show why.

Recently, I have come across another, older source – Arthur Schopenhauer’s (1788-1860) The Art of Controversy. This is written ironically and takes the perspective of giving advice on how to win arguments without consideration of the actual truth of the matter. Some of these strategies are spookily prescient of social media debate and therefore demonstrate that human nature is more constant than we may think.

Take this example:

“If you observe that your opponent has taken up a line of argument which will end in your defeat, you must not allow him to carry it to its conclusion, but interrupt the course of the dispute in time, or break it off altogether, or lead him away from the subject, and bring him to others. In short, you must effect the trick which will be noticed later on, the mutatio controversiae.”


# More criticism of Jeffrey Bowers’ phonics paper

Back in January, Jeffrey Bowers had an article published in what we can all agree is a prestigious journal, Educational Psychology Review (after writing that, I suppose I should mention that a paper I co-wrote based upon my PhD research was published in the same journal).

Bowers’ article claims that the evidence for the effectiveness of systematic phonics in early reading instruction is not as strong as phonics advocates propose. He concludes that, “The ‘reading wars’ that pitted systematic phonics against whole language is best characterized as a draw.” And that’s a strong statement.

I have written about this paper before. Essentially, Bowers spends most of it arguing against the conclusions of various systematic reviews before focusing on what he sees as a lack of evidence from England, a country that has adopted early systematic phonics as policy. The latter argument is neither here nor there, but the former is potentially more interesting. When targeting these systematic reviews, Bowers is able to find fault with all of them, ranging from claims that the effect sizes reported for specific studies are too large or too small, through to a criticism of the Ehri et al. review, based upon the 2000 US National Reading Panel report, which he believes to have tested the wrong things. The Ehri et al. study compares systematic phonics programmes to programmes that don’t emphasise systematic phonics, but Bowers thinks it should have compared them to ‘nonsystematic’ phonics programmes, a presumed subset of the actual comparison group. This is all quite esoteric and we will return to why this argument matters later.

Displaying great patience, Dr. Jennifer Buckingham has highlighted various issues with Bowers’ analysis in a paper published in The Educational and Developmental Psychologist, an earlier version of which can be read here. Now, we can add to this a further critique published in the same prestigious journal as the original Bowers paper.

This new paper by Fletcher, Savage and Vaughn has its own quirks. The authors are keen to suggest that it is the explicitness of systematic phonics teaching rather than its systematic nature that may account for the positive effect. In other words, an experienced teacher who understands the field does not necessarily need a meticulously planned curriculum as long as they adhere to the underlying principles. This is an interesting point, but I don’t see any great evidence presented for it and my own experience in schools suggests a meticulously planned curriculum is quite helpful.

When it comes to Bowers’ main claims, Fletcher et al. are about as forthright as it is possible to be in the measured tone of an academic paper. Like Buckingham, they follow Bowers’ idiosyncratic road trip through the literature, pointing out where they believe Bowers has overstated his case. Curiously, there is a table at the end of the paper summarising points of disagreement and potential points of agreement. I cannot help wondering whether this was at the suggestion of a reviewer because the authors take direct aim at Bowers’ central claim of a ‘draw’ between systematic phonics and whole language:

“…we think this conclusion is tantamount to acceptance of the null hypothesis and is not helpful to educators or their students. Not only is this statement not supported by the evidence from which Bowers claims to derive his judgments, it unnecessarily arouses controversy in a field that needs to focus on the best practices available… Evidence is consistently positive and replicable for effects of explicit phonics.”

Education research is messy and complex. Tying down the various factors is a little like tying down helium balloons in a strong wind. And we can all argue about methods and approaches, as I will do. However, the fact that so many different groups of researchers have investigated this question seriously and systematically and have found positive evidence for systematic phonics according to their own predetermined metrics, means that the idea of a draw between phonics and whole language, if not wholly and entirely inconceivable, is a deeply and profoundly eccentric position to take.

Which edges me slowly towards my final point.

I do not care for all the discussion of effect sizes that takes place within these reviews, criticisms of reviews and criticisms of criticisms of reviews. Although I accept that effect size has some validity, once you start mushing together effect sizes from studies with very different designs in order to produce an overall effect size, I start to feel uneasy. At least these are all studies of early literacy, unlike some of the strange attempts at meta-meta-analysis we have seen. Nevertheless, we know study methodology can change effect sizes and so I would prefer a systematic narrative review, encompassing all studies that meet certain selection criteria but without the need to produce an overall metric. If I had the time and the relevant expertise, I could conduct a systematic review along these lines.
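To sketch why design matters here: Cohen’s d divides a raw difference by a pooled standard deviation, so two hypothetical studies finding the identical raw gain can report very different effect sizes if one of them samples a restricted range of students. The numbers below are invented purely for illustration.

```python
import math
import statistics

def cohens_d(treated, control):
    # Simple pooled-SD form of Cohen's d (equal group sizes assumed)
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.pvariance(treated) + statistics.pvariance(control)) / 2
    )
    return diff / pooled_sd

# Hypothetical scores: a broad sample versus a restricted-range sample,
# each showing exactly the same 5-point gain.
broad_control = [40, 50, 60, 70, 80]
broad_treated = [x + 5 for x in broad_control]
narrow_control = [58, 59, 60, 61, 62]
narrow_treated = [x + 5 for x in narrow_control]

print(round(cohens_d(broad_treated, broad_control), 2))    # 0.35
print(round(cohens_d(narrow_treated, narrow_control), 2))  # 3.54
```

Same intervention, same raw gain, wildly different effect sizes – which is exactly the kind of artefact that gets averaged away when effect sizes from disparate designs are pooled into one metric.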

When Torgerson et al. examined the existing literature, they spotted a different, although related, problem to mine. They noted that many of the studies included in analyses like Ehri et al.’s were not randomised controlled trials. And so, given their view that only randomised trials should be used*, they did the right thing – they conducted their own systematic review based on randomised controlled trials alone.

When Bowers decided that he did not like the comparison group in Ehri et al., he should have done the same thing. He should have decided upon selection criteria and then conducted a systematic review of his own. That would have been far more powerful than attempting to critique the reviews of others and the reason is to do with researcher degrees of freedom.

The ideal experiment in the social sciences is preregistered. This means that the researcher sets out in advance what they will do, what measures they will make, and what constitutes a positive result. This is good practice due to the messily statistical nature of social science research. Basically, I have a one-in-20 chance of generating what looks like a significant result even though it is not. Therefore, if I use 20 different outcome measures, report one that is significant but do not mention the others, I can manufacture a pseudo-significant result. Preregistration, where I nominate what I will use as my outcome measure, removes these degrees of freedom.
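The one-in-20 arithmetic above compounds quickly. As a sketch, with 20 independent outcome measures and a 5% false-positive rate per test, the chance of at least one spurious ‘significant’ result is roughly two in three:

```python
import random

# Multiple-comparisons sketch: probability of at least one false positive
# across 20 independent outcome measures at alpha = 0.05.
ALPHA = 0.05
MEASURES = 20

# Exact probability: 1 - (1 - alpha)^20
exact = 1 - (1 - ALPHA) ** MEASURES
print(f"P(at least one false positive) = {exact:.2f}")  # 0.64

# Quick simulation as a sanity check
random.seed(1)
trials = 100_000
hits = sum(
    any(random.random() < ALPHA for _ in range(MEASURES))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.2f}")
```

This is precisely the researcher degree of freedom that preregistration removes: with the outcome measure nominated in advance, the false-positive rate stays at the nominal 5%.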

Systematic reviews are meant to act in the same way as an experiment. At the outset, you nominate what you will use as your selection criteria. This way, if studies meet those criteria but are unhelpful to your overall hypothesis, you still have to include them and account for them. It is fine for someone else to criticise these criteria, but attempts to somehow reanalyse the results or retrospectively cast out studies are flawed.

Imagine, for instance, that Bowers did as I suggest and decided to conduct his own review based upon systematic versus ‘nonsystematic’ phonics. Once he narrowed down his selection criteria, he may find himself excluding some of the studies used by Ehri et al. However, he may also find that he has to include some other studies not included in Ehri et al. that are not helpful to his argument. By instead critiquing Ehri et al., Bowers has the freedom to post-hoc re-evaluate conclusions without any of the constraints designed into the discipline of systematic review.

And that is a fundamental and fatal flaw.

*For those of you who care about these things, my own view is that we do not need to limit ourselves to randomised controlled trials. These are relatively rare and so such an approach means tossing out most of the evidence we have. In my view, the main problem arises in trying to treat different types of study in the same way and develop an overall metric. I would prefer a triangulation approach where perhaps the evidence from nonrandomised trials is presented in a separate section to that from randomised trials in the kind of narrative review I would wish to see.


# Should education professors have recent experience in the classroom?

I wondered what my Twitter followers might think of the idea of requiring those educating trainee teachers to have recent and relevant classroom experience themselves:

They overwhelmingly supported it. Some, in the comments, noted that Australian teacher training graduates still seem to be taught about learning styles, despite this concept being a myth.

This chimes with my own experience of interviewing graduates who all seem to have been told things that are demonstrably wrong and skewed toward the constructivist / progressivist agenda. We don’t hold it against them, but we do then have to work to dismantle some misconceptions.

Some suggested my idea would be impractical and would lead to a discontinuity of teaching for the students involved. And yet we manage school placements for trainees just fine, so I don’t see the problem. Others pointed out that there are people in Australian universities who primarily teach trainee teachers and who have never held teacher accreditation. My plan would present these folks with a problem, I suppose.

Clearly, teachers can benefit from hearing from non-teacher experts in areas such as psychology and speech pathology. It would be perverse to insist these folks gain a teaching qualification and that’s why I restricted my proposition to only those who primarily teach trainee teachers.

Even so, is my idea still just an elaborate ad hominem argument? Surely, it doesn’t matter whether the people educating our new cohort of teachers have recent teaching experience, what matters is whether they are right or wrong or effective or not. Right?

At one level, yes. But this is an argument about the probability of teacher educators coming to an informed view. Despite both groups being overwhelmingly left-of-centre politically, there are areas of palpable disagreement between the bulk of teachers and the bulk of teacher educators, at least if responses on social media are anything to go by. Teachers are wary of teacher educators trying to build coalitions with activists to oppose school exclusions, for instance, and they tend to respond positively to my arguments about differentiation, whereas teacher educators do not.

I put this down to a difference in disconfirmation. The ideas about teaching that are espoused by teacher educators do not have to butt up against reality. Teacher trainers can hold forth on approaches that are quite misconceived without receiving that feedback. Teachers, when they try these ideas out, find they don’t work. The result is a mix of self-blame and pragmatic compromise that is absent from the ideological safe space of university education faculties.

So, the purpose of teaching placements for teacher educators would be to provide a bit of a reality check.

Nevertheless, there are other possible solutions to the problem. As some mentioned on the thread, there is a strong argument for requiring teachers to have a degree in a relevant subject but then providing all or almost all of their teacher education on a paid school placement. Such schemes already exist and could replace the traditional system. I am open to that.

What’s clear is that we are failing many of our graduate teachers at present and we need to do something.


# Education research – the evidence

I posted a flashcard I had made about education research on Twitter. My intention was to provoke a discussion with some challenge to the points I had made. I wasn’t quite prepared for just how much engagement the flashcard would receive. I wouldn’t exactly call it viral, but it’s far more than I am used to.

As I expected, a number of people asked for the evidence to support these points and so this is the blogpost in which I intend to set this out. On reflection, there is a mix of issues requiring different levels of evidence. For instance, it is really up to those asserting that learning styles exist to provide supporting evidence and not up to me to prove they don’t. In contrast, I do hold the burden of proof for assertions such as that achievement boosts motivation. Nevertheless, I will do the best I can to outline the evidence as it stands for each point.

You do not learn something better if you figure it out for yourself

It is the nature of something like a flashcard that statements are pithy. And it is the nature of Twitter to argue about definitions and multiple interpretations. So let me be clear, I do not contend that being taught something really badly is as good as figuring something out for yourself. However, I don’t expect that anyone honestly interpreted it that way. Instead, take it as a statement predicated on all other things being equal, i.e. the best teaching versus the best discovery learning.

The evidence for the effectiveness of discovery learning for learning something new is extremely thin and, in my experience, positive results usually come from comparing it to something that everyone agrees is not the most effective form of instruction, such as completing worksheets or listening to a non-interactive lecture. If you want a better survey of the whole field then this paper by Richard Mayer lays out the case against ‘pure’ discovery while also making an interesting point about constructivism. There is also, of course, the seminal 2006 paper by Kirschner, Sweller and Clark that prompted three rebuttals, a reply from the authors, a conference and then a book. The Kirschner et al. paper discusses the ineffectiveness of ‘minimal guidance’. There has been much Twitter-style equivocation about that term and so it is also worth reading a version of this argument that the authors make in American Educator which clarifies that they are focusing on the superiority of fully guided instruction.

However, this evidence is not enough. It could be the case that discovery learning is less efficient than explicit teaching or that fewer students learn through this method, but that those who do manage to learn in this way learn the concepts at some deeper level. One study attempted to test this in the context of primary school children learning about controlling variables in science. As expected, fewer successfully learnt through discovery. However, those who did manage to learn through discovery performed no better on a task that asked them to then evaluate science fair posters to see whether variables had been controlled.

We should perhaps not be surprised. The largest education experiment of all time, Project Follow Through, found that Direct Instruction, which researchers labelled a ‘basic skills’ approach, was more effective than a whole range of models at improving outcomes for early elementary education. Importantly, models that emphasised self-esteem, or constructivist models that involved elements of discovery, did not perform better than Direct Instruction on measures of problem-solving ability, reading comprehension or, indeed, self-esteem.

I am also sceptical about how the supposed superiority of discovery learning fits with theory. The messy process of figuring something out and potentially attaching new items to the wrong schema would imply a degraded form of learning compared to a structured approach. It is therefore the burden of advocates of discovery learning to start supplying some solid evidence of its benefits.

Learning styles are a myth

Do people express preferences for how they learn? Yes. However, when we try to improve teaching by catering to these learning styles, we find no effect. So that’s the myth. The concept of learning styles seems to be yet another manifestation of the WEIRD cult of individualism and we may be better off teaching to what students have in common, especially given that we only have the resources to teach them in batches of 25-30.

If you give students choice over how they learn, they often choose the least effective method

This is a key finding of Richard Clark when analysing the results of aptitude-treatment interaction (ATI) studies. Low knowledge students who would benefit from a highly structured approach tended to prefer less structured activities where they could keep a ‘low profile’. In contrast, high knowledge students who were ready to practise independent application tended to want to hold on to highly structured explicit teaching. Similarly, Foster, Rawson and Dunlosky found students were reluctant to choose to study worked examples when they would have been beneficial and Singer and Alexander found that students preferred digital to print texts, predicted they would comprehend digital texts better but actually showed a slight superiority in comprehension when using the print texts.

Again, this chimes with teaching experience. We all know students whose main revision strategy is to rewrite or even just reread their notes, when quizzing is more effective (more on this below), and we have all seen the group poster task where the lowest knowledge student is allocated the task of drawing the title in bubble writing. So, teachers really need to shoulder the responsibility they are paid for and take charge.

Testing is a highly effective way to boost learning

The act of retrieving something from long-term memory appears to boost that retrieval pathway. This is probably the most solid finding in all of educational psychology. Unfortunately, many people mix up frequent, in-class, low-stakes testing with high-stakes tests and examinations which they (wrongly) perceive to be a bad thing.

Karpicke outlines the argument here. Dunlosky makes a teacher-friendly version of it here.

The evidence for the effectiveness of whole-class, interactive explicit teaching comes from a wide range of studies

Probably the largest body of evidence supporting whole-class explicit teaching comes from studies that largely took place in the 1950s-1970s and are known as ‘process-product’ research. Briefly, researchers would observe classes and log various teacher behaviours. They would then look at the gains students made in assessments of their learning. Brophy and Good summarised this research in the 1980s and described the most effective models as ‘active teaching’:

“Students achieve more in classes where they spend most of their time being taught or supervised by their teachers rather than working on their own (or not working at all). These classes include frequent lessons (whole class or small group, depending on grade level and subject matter) in which the teacher presents information and develops concepts through lecture and demonstration, elaborates this information in the feedback given following responses to recitation or discussion questions, prepares the students for follow up seatwork activities by giving instructions and going through practice examples, monitors progress on assignments after releasing the students to work independently, and follows up with appropriate feedback and reteaching when necessary. The teacher carries the content to the students personally rather than depending on the curriculum materials to do so, but conveys information mostly in brief presentations followed by recitation or application opportunities. There is a great deal of teacher talk, but most of it is academic rather than procedural or managerial, and much of it involves asking questions and giving feedback rather than extended lecturing.”

This research is correlational rather than experimental and so we cannot be certain that the behaviours of these teachers cause the increased achievement of students. However, Rosenshine points out that subsequent to these studies, a range of different experimental studies involving a diverse selection of learning objectives have confirmed these findings. After practice testing, this is probably the next most solid finding we have in education.

Reading comprehension depends on knowledge

Hirsch makes the argument here. Willingham does so here. I won’t rehearse them. It’s worth pointing out that the experimentally validated simple view of reading posits that reading comprehension is the product of decoding ability – turning the squiggles on the page into words – and oral language competence – understanding what the words mean. The latter is not just about vocabulary, it’s about being able to form a mental model of what is being discussed and that requires sufficient knowledge.

You cannot just Google facts when you need them

This is linked to the previous point. Partly, this is a common sense argument. If you lack knowledge, how will you know what to look for? How will you know what it is that you don’t know? If you find something and lack the world knowledge to comprehend what you find then this is of little use. This is confirmed by research that shows that if you ask children to use words they have looked up in a dictionary, you get statements like, “Mrs. Morrow stimulated the soup,” because one of the dictionary definitions of stimulate is ‘stir up’.

But there is perhaps a more profound level to this argument. Cognitive load theory, which is based on a large number of empirical studies, posits a simplified model of the mind that consists of working memory – our conscious thoughts – and long-term memory. Working memory is severely constrained and we can only process about four items at a time. However, we can get around these constraints by drawing on schemas – webs of related knowledge – held in long-term memory. We can effectively process an entire schema without being subject to working memory’s constraints. So, knowledge in long-term memory is something you think with. It boosts your brain power. You cannot do that with knowledge held on the internet. Knowing stuff still matters. A lot.

Achievement boosts motivation

Many people seem to think that we need to motivate students to make them learn. This is often couched in terms of ‘engagement’. However, this idea may have cause-and-effect the wrong way around, or at least may neglect an important pathway. One Canadian study found that, for primary school children, maths achievement predicted later motivation but maths motivation did not predict later achievement. Pekrun and colleagues found more of a two-way relationship, but they still found a clear achievement-to-motivation pathway. This makes sense. Getting better at something is motivating.

Conversely, the classic approach of trying to excite students about, say, science by doing a funky demonstration, may not be as effective as we think. We may generate so-called ‘situational’ interest but this is unlikely to feed forward into a long-term interest in science on its own. Instead, we should probably investigate the most effective ways of teaching science – such as explicit teaching – to ensure students learn more, gain a sense of achievement and become motivated as a result.

Final thoughts

We often run up against a falsifiability issue in discussions of this kind. People have a tendency toward blanket rejections of whole swathes of evidence because it is based on test scores and they perceive these to be an invalid measure. However, if you are going to assert something to be superior, such as discovery learning, you should also be open to questions about what measures are capable of confirming your view or proving you wrong. If you cannot think of any, then you do not have an evidence-based position, you have a belief. That’s fine, but nobody else has to accept your belief or even pay it much attention.

And finally, some people on Twitter have suggested that tweeting the flashcard was just a transparent attempt at getting people to buy my book. This is not right. I actually have two books available and I encourage you to buy a copy of each. Or maybe a few copies. Or perhaps one for everyone in your organisation. That sounds like a good idea to me.


# Testing is not the problem, it is part of the solution

Imagine an expert in public health gave a media conference and said, “We have to stop this obsession with COVID-19 testing. It is causing too much stress. Instead of focusing on testing, we should focus on preventing the spread of COVID-19. Testing does not prevent the disease and it does not cure it.”

You may think the expert had lost their mind.

Clearly, this is a false choice. Nobody has ever claimed that testing a person for COVID-19 would cure or prevent the disease in that person. Testing instead gives information that tells officials how widespread the problem is or how effective their attempts to suppress the virus have been. You cannot use a thermometer to heat a room. This is a category error. Only someone operating perpendicular to reality would propose that a solution to rising COVID-19 infections would be to perform fewer tests.

Nevertheless, people make such claims about educational testing all the time. The latest example is in a piece in The Conversation that is ostensibly about the problem of convincing more students to study maths to a high level at school.

There are a number of problems with this article. Firstly, it takes a naïve approach to international assessments, assuming that the position a state occupies in a league table of results gives us information about the quality of that state’s education system rather than, say, a whole battery of other demographic and cultural factors that can affect this ranking. As a result of this, the authors suggest looking to Singapore and Estonia (the new Finland) for answers.

The authors then make questionable claims about these education systems. Singapore apparently eschews rote learning in favour of supposedly deep learning. This claim seems to originate in the fact that many years ago, the Singaporean ministry of education drew on the work of the psychologist Jerome Bruner, an advocate of discovery learning, when developing its maths curriculum, leading to the famous bar-model approach that has since been adopted elsewhere. However, this does not validate the entirety of Bruner’s views.

When you examine the detail of the Singapore mathematics syllabus, it includes statements like, “use strategies such as ‘count on’, ‘count back’, ‘make ten’ and ‘subtract from 10’ for addition and subtraction within 20 (before committing the number facts to memory) and thereafter, within 100,” for students in the first year of primary school and, “achieve mastery of multiplication and division facts,” for students in the third year. So memorisation is clearly a critical feature of the Singaporean approach.

The authors also discuss high-performing Estonia and suggest it has, “almost no high-stakes tests for school children.” But, as Mike Salter pointed out on Twitter, education in Estonia does not appear to lack assessment.

But perhaps the key is in the term ‘high-stakes for school children’. As a modifier, this may be doing a lot of work. In Australia, we only really have one assessment that is ‘high-stakes for school children’ and this is at the end of Year 12. Elsewhere the authors refer to Australian NAPLAN assessments in Years 3, 5, 7 and 9. These can be high-stakes for schools, with school results published on the MySchool website, and it is possible that schools and perhaps some parents put pressure on students to perform well in NAPLAN, but NAPLAN assessments are not intrinsically high-stakes for the students who sit them.

NAPLAN has its flaws. I have written about the changes I would make to improve this suite of assessments (e.g. here and here). But I am clear that I would rather they exist in their current form than not at all. Yes, they can distort the curriculum, particularly for reading and writing where schools attempt to directly teach students how to answer assessment questions rather than teach reading and writing more broadly – a strategy that is frankly not very successful (see e.g. here). But this is not an argument for removing assessment. It is an argument for better professional development, better teacher knowledge of the available evidence and more and better forms of assessment targeting a wider range of knowledge and skills.

In fact, an Australian state government that was serious about improving outcomes could draw on assessment as a lever. Curriculum documents are often vague, abstract and aspirational. Assessments define the curriculum in more concrete terms, but until the final year of schooling, we only really have such assessments in maths and a misleadingly decontextualised form of literacy. A state education department could develop a suite of assessments in English, maths, history, science and maybe a few more key academic subjects and then offer them to schools on an optional basis, in a similar way to the voluntary phonics check. Schools who opt in would then be given comparative data, i.e. a full analysis of how their school results compare to other schools who have opted into the assessment.

You can imagine, for instance, assessments at Year 10 related to the physics of motion or a classic work of literature. At the other end of the scale, there could be assessments of basic general knowledge or sentence construction at the primary school level. Such assessments would encourage the adoption of a broader approach, less focused on NAPLAN. Coupled with a plan for school improvement, these assessments could help schools gradually become more effective as they target evidence-informed approaches on areas of weakness. By avoiding imposition, the schools that took part would be those who are most interested in learning from this evidence and so most likely to integrate it into a wider plan. Over time, other schools may become convinced of the value of opting in. The public accountability of NAPLAN would still exist but it would be supplemented by a more fine-grained layer that would be of greater use to schools.

Setting assessments in opposition to learning is fallacious. In fact, low-stakes quizzing has been shown to enhance learning and so these processes are not as distinct as the medical analogy at the start of this post implies. Nevertheless, measurement through NAPLAN is not enough to improve outcomes, even if it is capable of raising the alarm when things are going wrong. Rather than making large-scale, global and probably invalid comparisons and attempting to emulate what we imagine countries such as Singapore and Estonia are doing, I suspect a far more promising approach is to develop assessments that enable us to engage with small details and local comparisons.

After all, if you add together a lot of small improvements…


# Satisfying nobody in its entirety

From this vantage point, it looks like electors in the United States have delivered a result that satisfies nobody in its entirety. Trump is defeated but Biden will probably not be given a free run because he will not control the Senate. Instead, he will be called upon to exercise his famed deal-making skills. The pollsters, Twitter and much of the media got this wrong, but I am not surprised. After 2016 and its Trump and Brexit shocks, I added some right-of-centre sources to my media consumption so that I would not get fooled again.

American electors are perhaps wise. Democracy is a glittering prize but sheer familiarity and a lack of cultural self-confidence among western liberals have tarnished it. But this is what democracy is for – to curb power.

What does the verdict mean? Americans are telling us some simple messages that I think we all probably know if we can climb down out of our mottes long enough to contemplate them.

The left must build broad coalitions if it wants to win power. Chopping the electorate up into identity groups, exalting and patronising some while disdaining others, is not an election-winning strategy, nor will it ever be.

On the other hand, Trumpism with Trump extracted is essentially a complaint against unrestricted global capitalism. Shod and saddled, we can ride capitalism to better living standards, more leisure, better health and a more sustainable future. But let it run wild and all we get is the formation of multinational monopolies and a system rigged so that its gains accrue to an ever smaller elite, while the average wage stagnates against a backdrop of rising healthcare, energy, transport and housing costs.

What we have seen is the great disconfirmation – ideology, be it neoliberal or social justice, tested against reality. In our own circles of confirmation bias on social media or with friends of our own class and level of education, fantasies and conspiracies fly and half-baked philosophies are competitively accelerated to their (il)logical conclusions. But in a democracy, we have a check against that. And that is good.

More prosaically, it is this check that I want to see replicated in the school curriculum. At present, fearful politicians largely avoid the stinging nettles of exactly what history or science we should teach, which works of literature children should read, or exactly how we should educate children about sex and relationships. It’s far more pleasant to stroll the broad meadows of abstract concepts like ‘literacy’ or ‘numeracy’ or ‘critical thinking’. Nobody is going to complain if you say, ‘literacy is important’ because you are saying nothing.

If we don’t harness public sentiment through elected representatives to decide the content of the curriculum, these decisions are still made: by unaccountable bodies that set exams, by faceless state bureaucracies and even by individual schools and individual teachers. Many of these decisions will be wise, but not all. We risk the circle of confirmation bias. We risk selecting sources that provide a skewed view of reality. Occasionally, we risk flare-ups as members of the community react to something we have taught.

If we are to provide students with a balanced education and the potential to make up their own minds then we need some disharmony and contrast. Like I did after 2016, we deliberately need to seek perspectives that we disagree with and we need to ensure they are present in our schools. We must fight against the gravitational pull of teaching ideologies as truth and instead we must teach about these ideologies and then navigate the tensions with humility and compromise.

We need a curriculum that satisfies nobody in its entirety.


# Rabbiting on in the New South Wales parliament

This morning, I appeared as a witness at the New South Wales (NSW) parliamentary inquiry into the recent NSW Curriculum Review conducted by Professor Geoff Masters. This was prompted by my submission to this inquiry. I appeared alongside Dr. Fiona Mueller whose submission was far more comprehensive than mine.

As part of this process, I was able to submit further documents. So, I sent my chapter on differentiation from The researchED Guide to Education Myths which expanded on points I made in my submission and I wrote a piece explicitly dealing with the concept of Learning Progressions.

I hope I did myself justice. I found myself rabbiting on a little, in contrast to the carefully weighed and measured comments of Dr Mueller. The fact is that there is so much I want to say on this topic and careening around the finer points of international comparisons was probably not needed. At one point, I lost the thread and at another, I failed to explain my alternative to learning progressions. Thankfully, Dr Mueller made this point – we need a set of rigorous assessments.
