Are there times when practice testing is ineffective?

If there is one solid finding about learning from cognitive science, it is that testing or ‘retrieval practice’ enhances learning. In a typical study, subjects are randomly assigned to one of two groups. Both groups are initially presented with target material, such as a list of names or a passage of text. One group is then given the same material to restudy whereas the other is tested on it. In a delayed follow-up test, the subjects in the testing group outperform those in the restudy group.

This is probably the most replicated result in the field and would be slide one in any PowerPoint presentation on the science of learning. However, the testing effect is a little more slippery than it first seems. You may not be aware, for instance, that well-controlled studies have been conducted where no testing effect was found (e.g. here). What is going on? Should we start redesigning those PowerPoint slides?

Academic controversy

The study in the previous link is from a 2015 special edition of Educational Psychology Review. There are two key papers in this special edition and, fortunately, neither is behind a paywall. The first is by Tamara van Gog and John Sweller and the second is a response by Jeffrey Karpicke and William Aue.

The essence of van Gog & Sweller’s argument is that the testing effect disappears when the complexity of the material used in either the learning or the recall task is high. van Gog & Sweller define complexity in terms of ‘element interactivity’, that is, the degree to which different elements of a task relate to and depend upon one another: the connectedness of the ideas. They review the testing effect literature and, while they do not claim that their review is systematic, they suggest that it shows the testing effect disappearing for tasks where element interactivity is high. van Gog & Sweller also claim that this is something of a forgotten finding, with studies as far back as 1917 hinting that testing works well for learning randomly organised material or nonsense words but fades in effectiveness when the target material is more structured.

Karpicke & Aue hit back with a scathing rebuke. They suggest that the concept of element interactivity could be well defined, but that van Gog & Sweller have not defined it. Instead, Karpicke & Aue claim, van Gog & Sweller have deployed the concept subjectively and sometimes in a nonsensical way. They also claim that a number of studies support the assertion that the testing effect can be found for complex subject matter. Finally, they point out that no experiments have been conducted in which element interactivity was varied within a single experiment, which would be the best way to establish its effect.

How does the testing effect work?

One interesting aspect of the van Gog & Sweller paper is its discussion of how the testing effect might work. One possibility is that the process of testing strengthens the connections between the material to be recalled, related material and cues. In fact, in the Kuhn paper that they refer to, subjects who were asked to memorise nonsense syllables were observed grouping them, as if imposing order and connectedness on them. This is the kind of tactic mnemonists use to remember the sequence of cards in a deck or to perform some other party trick. If material is already highly connected, there would be no need to create connections through testing.

It also seems clear that, for testing to work, you have to have something to retrieve. van Gog & Sweller discuss results in which items that are not recalled on the initial test are not recalled on the later test either, unless feedback is provided after the initial test. This makes sense, but it raises an important question: if material is complex and has not yet been fully grasped by students, will testing be beneficial? In this instance, students may benefit from restudying the material. In a practical classroom situation, they may benefit from reteaching or recapping the material.

One paper, two interpretations

The argument over the imprecision of the term ‘element interactivity’ leads to an interesting contradiction between Karpicke & Aue and van Gog & Sweller on how to interpret the findings of an experiment conducted by Roediger and Karpicke. This experiment is typical of the more recent testing effect literature, both in its design and in the fact that it explores the value of the testing effect with ‘educationally relevant’ materials as opposed to, say, nonsense syllables.

In the Roediger & Karpicke experiment, university undergraduates were asked to read a passage on either “The Sun” or “Sea Otters”. Some students then restudied the text, while others were tested on it by being asked to write down everything they could remember from the passage. The same recall task was then repeated in a follow-up test, and students were scored on how many ideas they recalled. The study demonstrated a testing effect.

To van Gog & Sweller, the test has low element interactivity because the subjects simply had to recall facts in any order, so this experiment supports their contention. To Karpicke & Aue, the passage has high element interactivity because of the links between the ideas within it (links that, apparently, can be objectively analysed and quantified), so it supports their contention that a testing effect can be found with complex materials.

I don’t think this question will be easily resolved, because cognitive load theory’s conception of element interactivity is really about what is happening in the learner’s mind rather than in a text that can be objectively analysed. Moreover, cognitive load theory suggests that element interactivity decreases as learner expertise increases, because more information can be processed directly from long-term memory, so fewer connections need to be tracked in working memory.

What would this look like in my maths class?

I have to wonder about the educational relevance of a task like the one in the Roediger and Karpicke paper. It would be like asking students to study a mathematics worked example and then, rather than testing them on solving similar problems, asking them to recall as many features of the example as they could. Perhaps teachers need to be on hand to help design some of these studies so that we can evaluate the testing effect in the context of maths problems, essay writing or solving chemical equations. On the other hand, perhaps recalling the steps in a worked example is a useful intermediate stage that teachers currently neglect and that would lead to better application later.

More and better experiments

Despite the difficulty of resolving differences over the construct of element interactivity, we should still be able to make progress on the broader question by designing suitable experiments. If it can reliably be shown that a testing effect occurs for a wide range of educationally relevant tasks, including things we can all agree, by whatever definition, are pretty complex, then Karpicke & Aue win the debate.

Since 2015, researchers have started to address this question. Some have taken up the challenge of varying element interactivity (or complexity, or whatever you prefer to call it) within a single study. This paper, for instance, shows a testing effect for recalling a formula but an advantage for restudying among novices learning a mathematical problem-solving approach. Sweller is one of the co-authors, so it would be good to see advocates of the testing effect conduct similar experiments.


5 thoughts on “Are there times when practice testing is ineffective?”

  1. Tom Burkard says:

    Seeing that “As levels of expertise increased in Experiment 2, thus reducing effective complexity, this interaction was replaced by a generation effect for all materials”, this would indicate that the problem-solving approach is less effective than achieving automaticity in subskills, thus arguing for testing at each stage of learning a complex skill.

  2. Stan says:

    You make this statement: “If material is already highly connected, there would be no need to create connections through testing.”

    But there are two sets of connections here. The ones that an individual can recall and the ones that exist irrespective of any individual’s awareness of them.
    The testing would aim to reinforce connections an individual can recall.

    An interesting test case would be the times tables. (I know I am a broken record on this one.) You can see it as 100 unconnected items or two sets about half the size connected by commutativity, or more smaller sets connected by features such as replacing 9 cases for 9x with one rule and so on.

    Optimal study habits for memorization would seem to be one of the most useful general skills you could learn. I would guess that if a survey of university students were done, this would be high on the list of skills they would love to have developed well before university.

