The Problem With PISA's Problem Solving Results: What The Scores Really Mean

By Daniel Willingham

RCEd Commentary

When the results for a new international problem solving test were released last week, news coverage -- whether American, British, Israeli or Malaysian -- predictably centered on how well local students fared, rather than on what should have been the main focus: what the test actually measures.

In 2012, the Programme for International Student Assessment (PISA) measured Creative Problem Solving for the first time. PISA, administered by the Paris-based Organisation for Economic Co-operation and Development (OECD), has tested 15-year-olds on reading, math and science across 44 countries and economies since 2000.

How do we know that a test measures what it purports to measure? There are a few ways to approach this problem.

One is when the content of the test seems, on the face of it, to represent what you're trying to test. For example, math tests should require the solution of mathematical problems. History tests should require that test-takers display knowledge of history and the ability to use that knowledge as historians do.

Things get trickier when you're trying to measure a more abstract cognitive ability like intelligence. In contrast to math, where we can at least hope to specify what constitutes the body of knowledge and skills of the field, intelligence is not domain-specific. So we must devise other ways to validate the test. For example, we might say that people who score well on our test show their intelligence in other commonly accepted ways, like doing well in school and on the job.

Another strategy is to define what the construct means -- "here's my definition of intelligence" -- and then make a case for why your test items measure that construct as you've defined it.

So what approach does PISA take to problem solving? It uses a combined strategy that ought to prompt serious reflection in education policymakers.

There is no attempt to tie performance on the test to everyday measures of problem solving. (At least, none has been offered so far, but more detail on the construction of the test is to come in an as-yet-unpublished technical report.)

From the scores report, it appears that the problem solving test was motivated by a combination of the other two methods.

First, the OECD describes a conception of problem solving -- what they think the mental processes look like. That includes the following processes:

  • Exploring and understanding 
  • Representing and formulating
  • Planning and executing
  • Monitoring and reflecting

So we are to trust that the test measures problem solving ability because these are the constituent processes of problem solving, and we are to take it that the test authors could devise test items that tap these cognitive processes.

Now, this candidate taxonomy of processes that go into problem solving seems reasonable at a glance, but I wouldn't say that scientists are certain it's right, or even that it's the consensus best guess. Other researchers have suggested that different dimensions of problem solving are important -- for example, well-defined problems versus ill-defined problems. So pinning the validity of the PISA test on this particular taxonomy reflects a particular view of problem solving.

But the OECD uses a second argument as well. They take an abstract cognitive process -- problem solving -- and vastly restrict its sweep by essentially saying, "Sure, it's broad, but there is a limited way that we really care about how it's implemented. So we just test those."

That's the strategy adopted by the National Assessment of Adult Literacy. Reading comprehension, like problem solving, is a cognitive process, and, like problem solving, it is intimately intertwined with domain knowledge. We're better at reading about topics we already know something about. Likewise, we're better at solving problems in domains we know something about. So in addition to (as best they could) requiring very little background knowledge for the test items, the designers of the NAAL wrote questions that they could argue reflect the kind of reading people must do for basic citizenship: things like reading a government-issued pamphlet about how to vote, reading a bus schedule, and reading the instructions on prescription medicine.

The PISA problem solving test does something similar. The authors sought to present problems that students might really encounter, like figuring out how to work a new MP3 player, finding the quickest route on a map, or figuring out how to buy a subway ticket from an automated kiosk.

So with this justification, we don't need to make a strong case that we really understand problem solving at a psychological level at all. We just say "this is the kind of problem solving that people do, so we measured how well students do it."

This justification makes me nervous because the universe of possible activities we might agree represent "problem solving" seems so broad, much broader than what we would call activities for "citizenship reading." A "problem" is usually defined as a situation in which you have a goal and you lack a ready process in memory that you've used before to solve the problem or one similar to it. That covers a lot of territory. So how do we know that the test fairly represents this territory?

The taxonomy is supposed to help with that problem. "Here's the type of stuff that goes into problem solving, and look, we've got some problems for each type of stuff." But I've already said that psychologists don't have a firm enough grasp of problem solving to advance a taxonomy with much confidence.

So the PISA 2012 is surely measuring something, and what it's measuring is probably close to something I'd comfortably call "problem solving." But beyond that, I'm not sure what to say about it.

I probably shouldn't get overwrought just yet -- as I've mentioned, there is a technical report yet to come that will, I hope, leave all of us with a better idea of just what a score on this test means. Gaining that better idea will entail some hard work for education policymakers. The authors of the test have adopted a particular view of problem solving -- that's the taxonomy -- and they have adopted a particular type of assessment: novel problems couched in everyday experiences. Education policymakers in each country must determine whether that view of problem solving matches theirs, and whether the type of assessment is suitable for their educational goals.

The way that people conceive of the other PISA subjects (math, science and reading) is almost surely more uniform than the way they conceive of problem solving. Likewise, the goals for assessing those subjects are also more uniform. Thus, the problem of interpreting the problem solving PISA scores is formidable compared to interpreting other scores. So no one should despair or rejoice over their country's performance just yet.


Daniel Willingham is a columnist for RealClearEducation and professor of psychology at the University of Virginia. He also writes the Daniel Willingham science and education blog.

Copyright RealClearEducation 2014. All rights reserved.

