The What Works Clearinghouse Isn't Working
The U.S. Education Department’s What Works Clearinghouse (WWC) is an institutional legacy of the George W. Bush administration. Its purpose is to survey the education research literature to find studies relevant to a particular topic, evaluate the methodological quality of those studies and report a summary conclusion about “what works.”
I confess that I doubted the utility of the enterprise from the beginning. In my humble opinion, the most common threat to the validity of education research is bias, not poor methodology.
Even within the grand domain of research methodology, however, the WWC evaluation process is astonishingly limited. WWC personnel check to see if a study has properly incorporated random assignment and, well, that’s pretty much it. If a study did not randomly assign students to control and experimental groups, it counts for nothing at all.
So it was, admittedly, with some skepticism that I read the WWC report on a topic within my domain of expertise, ACT/SAT Test Preparation and Coaching Programs:
Test preparation programs—sometimes referred to as test coaching programs—have been implemented with the goal of increasing student scores on college entrance tests. They generally (a) familiarize students with the format of the test; (b) introduce general test-taking strategies (e.g., get a good night’s sleep); (c) introduce specific test-taking strategies (e.g., whether the test penalizes incorrect answers, and what this means for whether or not one should guess an answer if it is not known); and (d) [provide] specific drills.
According to the What Works Clearinghouse, does test coaching for college admission exams work? Apparently, it does.
The What Works Clearinghouse (WWC) identified six studies of ACT/SAT Test Preparation and Coaching Programs that … meet WWC group design standards. Three studies meet WWC group design standards without reservations, and three studies meet WWC group design standards with reservations. Together, these studies included 65,603 high school students across the United States.
ACT/SAT Test Preparation and Coaching Programs were found to have positive effects on general academic achievement (high school) for high school students, with a medium to large extent of evidence.
Several aspects of the summary struck me as odd. First, it claims a “medium to large extent of evidence,” yet it qualified only six of the forty studies reviewed as valid. Moreover, perusing the reference list of those forty studies, I do not find some of the most important studies conducted on the topic, such as the meta-analyses conducted by Becker (1990) and Kulik, Kulik, and Bangert (1984), or the 23 and 40 studies, respectively, that they summarized.
Why not? It would seem that the WWC believes that education research began in 1994, or about the time the World Wide Web was born. Granted, some older studies did not incorporate randomized designs, but some did. Contrary to what some young researchers these days seem to believe, randomized controlled trials have been implemented for over a century. In the old days, they were simply called “experiments.”
A second aspect of the test prep research summary seems odd: an indifference to scale. One study with positive findings comprised 17 students in the experimental group and 7 in the control group. By contrast, one of the “not statistically significant” studies included 64,567 students. Indeed, over 98 percent of the students the WWC claims for the six studies come from this one study, which found no significant effect.
Weighting the six qualifying studies by student count produces this distribution: 99.6 percent of students in studies with no significant difference and 0.4 percent in studies with a significantly positive difference.
Result | # Studies | # Students | Proportion of students
Significantly positive | 3 | 290 | 0.4%
Indeterminate effect | 3 | 65,313 | 99.6%
Yet, the WWC finds the preponderance of evidence tipping toward the significantly positive result. Moreover, one of the three positive studies randomly assigned students within a single high school, making crossover effects probable: control-group students could pick up coached strategies from treated classmates.
Recently, I attended the researchED Conference in Brooklyn. Among the speakers was a representative of the WWC. I asked the helpful and knowledgeable fellow about the weighting issue. He replied that the WWC had discussed weighting studies by size, and might well adopt the practice in the near future, abandoning the current method of counting all studies equally.
That solution seemed to me just as rigid as the current method of study aggregation. In the case of the WWC test prep review, weighting the studies by size would flip the results. So, not weighting suggests a single definitive answer now (i.e., test prep works), but future weighting would suggest the opposite single definitive answer (i.e., test prep doesn’t work).
I suggested that they keep both and perhaps add others (e.g., weighting by log N). For some WWC-summarized interventions, the end results, and the suggestions for “what works,” would become more ambiguous. I, for one, would consider that an improvement. Any single WWC summarizing method, the current one included, implies more certainty than is warranted.
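The difference among the aggregation schemes is easy to see numerically. Below is a minimal sketch, not WWC code: only the totals (three positive studies with 290 students, three indeterminate studies with 65,313, one of them the 64,567-student study, and one positive study of 17 + 7 = 24 students) come from the report; the per-study sizes for the other four studies are illustrative assumptions.

```python
import math

# (number of students, found a significantly positive effect?)
# Sizes marked "assumed" are illustrative, not from the WWC report.
studies = [
    (24, True),      # the 17-treatment / 7-control study
    (120, True),     # assumed
    (146, True),     # assumed; positive studies total 290
    (64567, False),  # the one very large null study
    (400, False),    # assumed
    (346, False),    # assumed; indeterminate studies total 65,313
]

def positive_share(weight):
    """Share of the total weight held by the positive studies."""
    total = sum(weight(n) for n, _ in studies)
    positive = sum(weight(n) for n, found_positive in studies if found_positive)
    return positive / total

equal = positive_share(lambda n: 1)              # current WWC-style vote count
by_n = positive_share(lambda n: n)               # weight by sample size
by_logn = positive_share(lambda n: math.log(n))  # compromise weighting

print(f"equal weights:   {equal:.1%} positive")
print(f"weight by N:     {by_n:.1%} positive")
print(f"weight by log N: {by_logn:.1%} positive")
```

Under these assumptions the three schemes tell three different stories: an even split by study count, a 0.4 percent positive share when weighting by N, and an intermediate figure under log N. Reporting several such aggregates side by side would make the residual uncertainty visible instead of hiding it behind one number.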
Which brings us back to the idealization of random assignment as the only worthwhile research method. As it currently operates, the WWC makes no use whatsoever of other studies—the great majority of which either do not incorporate random assignment, were conducted before the World Wide Web was introduced, or were overlooked by WWC staff for some other reason.
That wastes possibly useful information, and that’s a shame, in my humble opinion.
Richard Phelps is the author of four books on testing policy and the founder of the Nonpartisan Education Review.