Experiments aren’t everything

In my own PhD research, I run randomised controlled trials (RCTs). These involve setting up two or more experimental conditions, varying only one factor between them and then randomly assigning subjects – in this case, students – to each of the conditions. RCTs are considered the gold standard for working out if one thing causes another because you manipulate just that one thing and nothing else. By randomly assigning students, we know that there are no other systematic differences between the members of the groups that could account for any difference in outcomes.

You may therefore expect me to be an evangelist for experiments. You might expect me to take a dim view of other ways of trying to establish cause and effect. But that’s not quite right.

I am also impressed by correlations, and the evidence they provide adds to what we already know in education. It is true that correlation does not equal causation, but this is the starting point for a discussion rather than the end point. I mentioned correlational evidence on Twitter recently and Dylan Wiliam responded with a link to this paper by Austin Bradford Hill, written in 1965 and addressing correlations in medicine, a field commonly known as ‘epidemiology’. It makes for an interesting read.

It’s worth outlining the key problem with correlations. We might find that as one thing changes, another thing also changes; it rises or falls. However, this does not necessarily mean that the first thing caused the change in the second thing. Three possibilities are worth illustrating:

  1. There is no relationship and it’s just chance that the two things correlate. I may, for instance, note a pattern where shorter teachers tend to have more pens in their pockets than taller teachers, but this may just be due to the particular teachers I have sampled. If we repeated the observation with a different group of teachers, we might find no pattern or a reversal of the pattern.
  2. Changes in both things are actually caused by a third factor. An example of this might be the discovery that living in Florida correlates with an increased risk of dementia when compared to the U.S. average. In this case, living in Florida does not cause dementia. Instead, the likely cause of Florida’s increased rate of dementia is that Florida has a larger proportion of senior citizens than other states, and senior citizens are more likely to get dementia.
  3. The arrow of causation may point in a different direction to the one we assume. For instance, we might see a correlation between students’ motivation for mathematics and their maths ability. Perhaps motivation causes students to work harder and this increases their ability. Alternatively, being more able at mathematics might cause students to be more motivated. Both sound plausible and there may even be a causal arrow pointing in both directions; a virtuous circle.
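
The second possibility can be made concrete with a small simulation. This is only a sketch under invented assumptions: the share of seniors in each place and the dementia risks are made-up numbers, and the names `simulate` and `rate` are hypothetical. Dementia risk in the code depends on age alone, yet the overall rate still comes out higher in Florida; the gap largely disappears once we compare like with like.

```python
import random

def simulate(n_per_state=100_000, seed=0):
    """Simulate residents whose dementia risk depends only on age, never on state."""
    rng = random.Random(seed)
    # Invented numbers, for illustration only.
    senior_share = {"Florida": 0.30, "Elsewhere": 0.15}
    dementia_risk = {True: 0.10, False: 0.01}  # keyed by "is a senior"
    counts = {}  # (state, senior) -> (cases, total)
    for state, share in senior_share.items():
        for _ in range(n_per_state):
            senior = rng.random() < share
            dementia = rng.random() < dementia_risk[senior]
            cases, total = counts.get((state, senior), (0, 0))
            counts[(state, senior)] = (cases + dementia, total + 1)
    return counts

def rate(counts, state, senior=None):
    """Dementia rate for a state, optionally restricted to one age group."""
    keys = [k for k in counts
            if k[0] == state and (senior is None or k[1] == senior)]
    cases = sum(counts[k][0] for k in keys)
    total = sum(counts[k][1] for k in keys)
    return cases / total

counts = simulate()
for state in ("Florida", "Elsewhere"):
    print(state,
          "overall:", round(rate(counts, state), 4),
          "seniors:", round(rate(counts, state, True), 4),
          "non-seniors:", round(rate(counts, state, False), 4))
```

Splitting by age group is the simplest way of adjusting for a third factor; it only works, of course, if we have thought to measure that factor in the first place.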

Given such issues, why not toss out correlational research altogether and simply conduct experiments?

The answer is that experiments are hard to do. Big, long experiments are particularly hard to do, so if we want to know the effect of a policy change in a state education system then running a true experiment is virtually impossible. This would not be such a problem if experiments were perfectly scalable; if small, short experiments just generated little versions of the results we get with bigger, longer ones. But there is much reason to doubt this. Lab-based findings rarely have a smooth path towards implementation as a long-term policy change.

In contrast to the difficulty of running large experiments, it’s pretty easy to amass correlational data and it’s getting easier all the time in this data-rich age. Correlations can also circumvent some of the ethical issues with experiments, such as when one group of students has to miss out on a promising intervention in order to act as a control. You can also often have a sort-of control group for correlational data; a ‘quasi-experimental’ design. For instance, regression discontinuity is a technique that exploits a threshold where a small difference causes an individual to flip from one category to another. Imagine two children, the first of whom is born on the 31st of December and the second of whom is born on the 1st of January. If the cut-off date for school entry is the 1st of January then, although the two children have very similar ages, the January child will have a whole year more of schooling. A different kind of quasi-experiment might involve two neighbouring districts adopting the same policy change at different times, with the late adopter then acting as a control. These two examples are drawn from a paper by Stuart Ritchie and Elliot Tucker-Drob that analyses the effect of education on general intelligence.
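
The logic of regression discontinuity can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the data are simulated, the `true_effect` of three points is invented, and `simulate_children`, `fit_line` and `rd_estimate` are hypothetical names rather than anything from the paper. The sketch fits a straight line to test scores on each side of the cut-off and reads off the jump between the two lines at the cut-off itself.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def simulate_children(n=40_000, true_effect=3.0, seed=1):
    """Simulated (day, score) pairs; `day` is birth date relative to the cut-off."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        day = rng.randint(-180, 179)
        # Following the example in the text: children on or after the
        # cut-off are assumed to have had one extra year of schooling.
        extra_year = day >= 0
        # A smooth trend in birth date plus the (invented) schooling effect.
        score = 100 - 0.01 * day + true_effect * extra_year + rng.gauss(0, 5)
        rows.append((day, score))
    return rows

def rd_estimate(rows, bandwidth=60):
    """Jump in the fitted score at the cut-off, using children near it only."""
    left = [(d, s) for d, s in rows if -bandwidth <= d < 0]
    right = [(d, s) for d, s in rows if 0 <= d < bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

rows = simulate_children()
print("estimated effect of the extra year:", round(rd_estimate(rows), 2))
```

Restricting the comparison to children born close to the cut-off is what makes the two groups plausibly similar in everything except schooling; a narrower `bandwidth` makes them more alike but leaves fewer children to compare.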

Correlations also have the advantage of testing real-world examples. In education, we are plagued by bad experiments where a gold-plated version of the favoured intervention is tested against a do-nothing or bog-standard control. It is probable that an inferior teaching method, delivered with lots of thought and plenty of commitment, will fare better than a mediocre enactment of a technically superior teaching method. Correlations can tell us something about everyday, ordinary examples of the two approaches under investigation e.g. this study of science teaching methods.

However, we are still left with the cause and effect problem. Bradford Hill offers some useful suggestions for evaluating correlational data but some of this is clearly most relevant to medicine and public health. I would like to focus on just a few things that I would look for when assessing the validity of inferring cause and effect from a correlation.

Key is what Bradford Hill refers to as ‘consistency’ and what we might also term ‘replication’. If we see this correlation in a range of different situations then we can probably rule out the idea that it’s a chance finding. For instance, if three quite different states adopt the same education policy at different times and, subsequently, maths scores rise in each of these states then that would seem to be telling us something. It is particularly convincing if we can take a correlation and replicate it in an experiment.
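
A rough simulation shows why consistency matters. It is a sketch under stated assumptions: teacher height and pen counts are generated independently, so any correlation is pure chance, and the sample size of ten, the threshold of 0.4 and the function names are arbitrary choices made for illustration. It compares how often a single small study throws up a sizeable correlation with how often three independent studies all do so in the same direction.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def chance_r(n, rng):
    """Correlation between two variables that are unrelated by construction."""
    heights = [rng.gauss(170, 10) for _ in range(n)]
    pens = [rng.gauss(3, 1) for _ in range(n)]
    return pearson(heights, pens)

def share_all_strong(n, studies, trials=2000, threshold=0.4, seed=2):
    """Share of trials in which every one of `studies` samples has r > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(chance_r(n, rng) > threshold for _ in range(studies)):
            hits += 1
    return hits / trials

print("one study of 10 teachers:", share_all_strong(10, 1))
print("three studies of 10 teachers, all agreeing:", share_all_strong(10, 3))
print("one study of 1000 teachers:", share_all_strong(1000, 1, trials=200))
```

Both replication and larger samples push the chance explanation towards the margins, although neither says anything about third factors or the direction of the causal arrow.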

An example of this would be the process-product research of the 1960s and 1970s that sought to correlate various teacher behaviours with test score rises. A number of behaviours were identified that we might broadly term ‘explicit teaching’. However, these could just have been proxies; a particular teacher personality type, for instance, might have caused teachers to teach in a particular way and also have caused the test score gains. To try to figure this out, we could and should ask whether it is plausible that teacher behaviours cause student learning and of course it is – a plausibility test. However, that still doesn’t rule out a third factor.

Which is why a number of researchers set up experiments (e.g. here) where they taught teachers these behaviours and then looked to see if these teachers’ students performed better than a control group. We still have a problem if these experiments are badly designed but if we have a large number of correlations and reasonably well-designed experiments all pointing the same way then I think it is reasonable to infer a cause and effect relationship.

Ultimately, our inferences should depend upon triangulation. It is about more than exactly replicating an experimental finding. To be reasonably sure of a cause and effect relationship, we need to see similar effects in a range of different correlations of different designs, sizes and duration, ideally supported by experimental evidence. It’s a lot to ask for but I think we have the tools at our disposal to amass such evidence in a way that is relevant to common debates about teaching approaches.


12 thoughts on “Experiments aren’t everything”

  1. I know I’ve mentioned this before on your blog, but one thing that always makes me cautious about RCTs for educational interventions is…how do you ensure proper double-blinding? A teacher’s attitude could, I would imagine, make a considerable (i.e. statistically significant) difference to the data.

      • But who supervises/runs the interventions? And how can you be sure that the affect/attitude of those who do is “equal”? And if it’s the same person (I suppose in big trials it wouldn’t be), can you rule out conscious or unconscious bias in favour of one intervention over the other?

  2. Correlations provide the only evidence we ever have for causation. Nor do we have any reason to think that causation is anything *other* than a particularly secure correlation. Read David Hume on the subject. Or my discussion of Hume at https://edtechnow.net/2015/09/10/red15/#section_2_4.

    Your first objection to correlations, that they are due to chance, really applies to coincidences, which are frequently confused with correlations by those who do not understand what a correlation *is*. The chance of coincidence in a correlation based on a large random dataset is quantifiable, very remote, and becomes even more remote as the experiment is repeated.

    • You’re right about the second part, but Hume is hardly the last word on causation. Try Judea Pearl. One of his examples of the distinction is telling a robot “If it rains, my roof will get wet” and “My neighbor’s roof gets wet whenever mine does”, whereupon the robot concludes that if I hose down my roof, my neighbor’s roof will get wet. Knowledge of causal relationships can guide interventions in systems; knowledge of correlations cannot (unless you’re implicitly using them to infer causes).

    • Hi Jane,

      I do not cite David Hume as an authority but as someone who expresses the argument very persuasively and with great elegance. It is not that I think “Ah, David Hume says x, therefore I must believe x”, but that having read and understood David Hume on x, I myself do not find it possible to believe anything other than x.

      On the same basis, I am not going to spend time reading Judea Pearl until you can mount a sufficiently interesting challenge to my current beliefs that I think it might be worth my while. I am afraid I do not find the examples you give mount any sort of a challenge at all. It merely suggests to me that you do not understand the point that I make in my blog (indeed, I find it hard to believe that you read it) or the argument that Hume made.

      In the example of the roofs getting wet, A (my roof getting wet) and B (my neighbour’s roof getting wet) can both be caused by C (it raining), in which case there is a strong correlation between A and B, explained by their common causality; or A can be caused by D (my hosing it down) in which case there is no correlation between A and B. Aggregating the two cases (which I have to do because in merely observing the wetness of the roof, I don’t know what has caused it), the correlation between A and B will be significant but imperfect until a distinction is made between A1 “my roof being rained on” and A2 “my roof being hosed down”, when the correlation between A1 and B1 will improve to near perfect, giving very strong evidence through correlation of the cause of both A1 and B1 – the fact that it is raining.

      You might object that to say “my roof being rained on” is caused by “it raining” is self evident (i.e. a tautology) – but I don’t think that is necessarily the case if I described A e.g. in terms of “my roof getting wet not as a result of human intervention”.

      The point here is that correlations need to be triangulated and interpreted. The fact that A and B are correlated cannot be taken to imply that A causes B, but it does imply either a) that A causes B, b) B causes A, or c) C causes both A and B – and all three of those cases involve some sort of causal relationship. There is no other explanation for the correlation, except for chance, which we agree becomes very remote indeed as the experiment is repeated.

      Your example does not refute the propositions:
      1. that correlation implies causation,
      2. that correlation is the only evidence that we ever have for causation (for that, you would need to show a different sort of evidence)
      3. that causation is nothing other than a particular sort of strong correlation (for that, you would need to take on the elegant argument made by Hume himself).

      Finally, the fact that Judea Pearl is a more recent writer than Hume I count of no significance at all. For me, he will stay on the shelf until you can persuade me that he has said something interesting!


  3. mc says:

    What’s important is finding a useful model. Once it’s been found and shown to be more closely correlated to success than its alternatives, the job is done. Rigorous experiments are a useful tool in doing this but, as you say, not the only way. Everything we do can be treated as an experiment, and as providing a source of evidence.

    • You make the case that formal experiments are not the only sort of experiment. You do not make the case that there are other ways of showing the predictive reliability of a model, other than experiments.

      Perhaps Greg could insert “Formal” before “experiments” in order to make his position at least defensible?

      But in making a distinction between formal and informal experiments, I think we should make two distinctions. The first is about whether data are recorded and shared. Personal experience is a type of empirical experiment in which this does not happen – and as a result, it is highly unreliable.

      The other distinction is to do with the careful selection of samples, controls etc. I think that what might loosely be called “big data principles” shows that the craft of the traditional, academic researcher can to a certain degree be blown out of the water by sheer scale of data. This is particularly important for education in which there are ethical and practical problems in mounting RCTs.

      We could apply big data principles to education if and only if the trouble of collecting big-ish amounts of data was justified by the business of teaching itself and not merely for the purposes of research. That is why the answer both to how to implement good pedagogy and to how to research good pedagogy is the same thing: edtech.

      Until we start taking the application of digital technology to education seriously, we are not taking education seriously.

