How AITSL judges teaching
Posted: December 2, 2016
The Australian Institute for Teaching and School Leadership (AITSL) recently asked teachers to take part in a survey. I clicked the link and immediately noticed something a little odd. I was asked to answer multiple-choice questions about my teaching, and the survey explained that, “The items in each question [the possible answers] are hierarchical with regard to expertise.” So choosing the first response to a question indicates the lowest level of expertise. This is an odd structure: people don’t like to think of themselves as lacking expertise, so it might bias the survey results.
I then started the survey. The exact set of questions you get depends upon what you enter as your birth month, so this will vary for different teachers. I found that I couldn’t select any of the answers for some questions. For others, I began to wonder what evidence was being used to decide that some teacher behaviours were characteristic of a higher degree of expertise than others.
For instance, question 22 was:
22. Engage in substantive conversation
○ I pose questions to the whole class and respond to individual student answers
○ I encourage interaction between students and between teacher and students about the ideas of the topic
○ I structure conversation to enable student talk to predominate over teacher talk
Huh. So the first response shows the lowest level of expertise and the third response the highest?
Teacher effectiveness research actually suggests that whole-class questioning is a strategy used by the most effective teachers, who also ensure that students maximise their academic learning time. I am not aware of any research showing that more effective teachers ‘enable student talk to predominate over teacher talk’. This seems likely to reduce academic learning time.
The survey continued in this vein, so I stopped taking it seriously and simply started recording the questions. The questions and statements in the survey are based upon AITSL’s classroom practice continuum, the most striking feature of which is that it looks like a lesson observation rubric. And we all know that lesson observation is not really a valid way of assessing teacher performance, right? Perhaps not.
So I decided to contact AITSL about the continuum through the contact page on its website. I asked whether they could send me information on the evidence used to produce the Classroom Practice Continuum. Specifically, I asked for the evidence they had drawn upon to support the claim that the following statements are characteristic of teachers with greater expertise:
– The teacher makes students responsible for establishing deliberate practice routines
– They provide students with a choice of learning activities that apply discipline specific knowledge
– The teacher facilitates processes for the students to select activities based on the agreed learning goals
– The teacher supports the students to generate their own questions that lead to further inquiry
– They negotiate assessment strategies with students
Sue Buckley of AITSL responded. She was really helpful and seems very nice. She wasn’t able to provide evidence for the specific points above, and I wasn’t surprised by this, given that I suspect there isn’t any. But she was able to provide information on the evidence base more generally.
Which is intriguing.
Firstly, Sue pointed me to section 2 of the ‘Looking at Classroom Practice’ document. This explains that an expert teacher group was convened in order to assist AITSL with developing a classroom practice continuum that aligned with the AITSL Standards. This was, “guided and informed by Professor [Patrick] Griffin’s methodology that is based on the learning theories of Rasch, Glaser, Vygotsky and Bruner.” In the validation process, the development of quality criteria was informed by additional learning theories that all use developmental models of learning, including the theories of Piaget, Bruner, Griffin and Callingham, Anderson and Krathwohl, Gagne, and Dreyfus and Dreyfus.
This seems a little odd. Not only is the continuum based upon theory rather than teacher effectiveness research, but some of these theories are demonstrably flawed. Stage theories such as those of Piaget and Dreyfus and Dreyfus, and Bruner’s ideas on discovery learning, have largely been debunked (e.g. here and here). Piaget and Vygotsky are often considered the fathers of modern constructivism, and yet, in 2011, John Hattie stated that, “We have a whole rhetoric about discovery learning, constructivism, about learning styles that has got zero evidence for them anywhere.”
I am inclined to agree with John Hattie’s frank assessment, but he is now the Chair of AITSL. So at least some of the theories that AITSL has used to construct this continuum have been debunked by its own Chair. This strikes me as an eccentric position for an organisation to be in.
AITSL then managed a feat that seems nothing short of a miracle. I have to admit that I am not familiar with Rasch analysis but I think I am going to read more about it because of what it was able to achieve. In order to validate the newly minted criteria, the folks at AITSL wrote them into a set of survey questions and were able to get 2561 teachers to respond to the survey (it seems like the survey I attempted was a repeat of this process). They then used, “Rasch analysis to identify both teacher ability and the relative difficulty of the criteria,” thus validating the criteria statements. Yes, you read that right. They were able to identify teacher ability via a survey. This is astonishing. We have no more need for lesson observation. We can forget the tortuous attempts to determine teacher effectiveness via value-added analysis. All we have to do in order to work out who the best teachers are is give them a survey and do Rasch analysis.
Unless they used the teachers’ survey responses to the criteria statements to work out their ability. But that wouldn’t make any sense because they were trying to validate those very same statements. The logic would be circular:
- Statement X is a good measure of teacher expertise.
- How do we know?
- Because the more expert teachers tend to select it.
- How do we know that these teachers are more expert?
- Because they selected Statement X.
Perhaps another proxy was used, such as level of experience? But experience is only loosely coupled with teaching ability, and using it might just demonstrate that more experienced teachers are better able to say the right things. I’m just not clear on this point, and I am not sure that the process provides any evidence for the validity of these criteria.
Apart from a discussion of some of Hattie’s own research – research that does not seem to be clearly reflected in the continuum – the only other empirical evidence is a comparison with a similar continuum developed in the U.S.
In her email to me, Sue Buckley mentioned that a literature review has now been completed that compared the practices within the continuum to the five lesson observation instruments used in the Measures of Effective Teaching (MET) project. This is interesting because it confirms that the intended purpose of the continuum is as a lesson observation tool. And yet it is evidence from the MET project that led people like Rob Coe (linked above) to question the validity of lesson observation as it is usually conducted.
In order to gain any kind of reliability, MET project teachers were observed teaching multiple lessons by multiple raters. Not only that, the raters viewed videos of the lessons rather than viewing them live and the teachers did not know the criteria on which they were being judged. This is important because it eliminates the effect of teachers trying to demonstrate what they believed the observers wished to see.
Even with all of these safeguards in place – ones that could not be practically replicated in schools – the resulting lesson observation scores were less accurate at predicting the future test score gains for any given teacher than were the prior test score gains of that teacher. In the end, the researchers settled on a measure that combined classroom observation scores with past test score gains. This was worse at predicting future standardised test score gains than prior test scores alone but was slightly better for predicting performance on teacher developed tests.
AITSL’s review found that ‘almost all of the elements in the MET scales can be found in the Standards and the Continuum’. This does not make a convincing case for the continuum. We don’t even know whether there are things in the continuum – such as the statements I highlighted above – that are not in the MET scales.
All Australian teachers should be concerned about this issue. As Britain emphatically moves away from judgments based upon lesson observations, the Australian government is indicating that it is going to use the AITSL standards to determine performance-related pay. If that is the case, we need a robust system built on quality empirical evidence and not something based on a menagerie of educational theories, many of which are known to be false.
Note: This is a lengthy post, so I have avoided an additional explanation of why I think many of the statements in the continuum are not only wrong but possibly quite harmful. For a flavour of this evidence, you could take a look at Richard E. Clark’s work from the 1980s, which shows that students tend to enjoy the instructional activities that are least suited to them. Yet student choice of activities is encouraged in the continuum.