Randomized trials are so 1930s

Jim Manzi, the CEO of Applied Predictive Technologies (a randomized trial software firm), reminds us that we’ve been subjecting public policy to experimental methods for quite some time:

In fact, Peirce and others in the social sciences invented the RFT decades before the technique was widely used for therapeutics. By the 1930s, dozens of American universities offered courses in experimental sociology, and the English-speaking world soon saw a flowering of large-scale randomized social experiments and the widely expressed confidence that these experiments would resolve public policy debates. RFTs from the late 1960s through the early 1980s often attempted to evaluate entirely new programs or large-scale changes to existing ones, considering such topics as the negative income tax, employment programs, housing allowances, and health insurance.

So the randomistas aren’t so much a “new wave” as the “next wave.” More interesting, though, are Manzi’s thoughts on external validity:

By about a quarter-century ago, however, it had become obvious to sophisticated experimentalists that the idea that we could settle a given policy debate with a sufficiently robust experiment was naive. The reason had to do with generalization, which is the Achilles’ heel of any experiment, whether randomized or not. In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial’s results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?

One example he discusses is the frequent experimentation used in crime prevention, and how the (very few) subsequent replication attempts fared:

Criminologists at the University of Cambridge have done the yeoman’s work of cataloging all 122 known criminology RFTs with at least 100 test subjects executed between 1957 and 2004. By my count, about 20 percent of these demonstrated positive results—that is, a statistically significant reduction in crime for the test group versus the control group. That may sound reasonably encouraging at first. But only four of the programs that showed encouraging results in the initial RFT were then formally replicated by independent research groups. All failed to show consistent positive results.
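
A quick simulation (my own sketch, not from Manzi’s piece; the effect sizes and sample sizes are made-up assumptions) shows why a batch of “statistically significant” trials can fail to replicate: when most tested programs do little or nothing, a fraction of trials will cross the significance threshold by chance, and even genuinely effective programs often fail a rerun if the trials are underpowered.

```python
import random
import statistics

random.seed(0)

def run_trial(true_effect, n=100):
    """Simulate one RCT: a crime-rate outcome for control and treated
    groups of n units each. Returns a z-style test statistic for the
    reduction in the treated group."""
    control = [random.gauss(10, 2) for _ in range(n)]
    treated = [random.gauss(10 - true_effect, 2) for _ in range(n)]
    diff = statistics.mean(control) - statistics.mean(treated)
    se = (statistics.stdev(control) ** 2 / n
          + statistics.stdev(treated) ** 2 / n) ** 0.5
    return diff / se

# Hypothetical world: 100 programs with zero true effect,
# 22 with a modest real effect (half a standard-deviation quarter).
effects = [0.0] * 100 + [0.5] * 22

# Initial round of trials: keep the "winners" (z > 1.96, i.e. a
# statistically significant crime reduction at roughly p < 0.05).
winners = [e for e in effects if run_trial(e) > 1.96]

# Independent replication of only the winners, same design.
replicated = sum(1 for e in winners if run_trial(e) > 1.96)

print(f"{len(winners)} of {len(effects)} trials significant")
print(f"{replicated} of {len(winners)} winners replicated")
```

Run repeatedly with different seeds, some of the initial “winners” are pure false positives and most of the rest are underpowered for a second pass, so the replication rate sits well below the initial hit rate, which is the pattern Manzi describes.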

My biggest fear about the current trend in social science RCT work is not just the failure to confirm positive results, but the failure to confirm negative results. While there is a small but real incentive to repeat a ‘proven’ randomized study in a new setting, little is being done to confirm that a negligible treatment effect stays negligible elsewhere. While big RCT research groups do care about external validity, it is the initial findings that get seared into the minds of policymakers. Flashy graphs that generalize without concern don’t help.

Here’s part of the closing to Manzi’s piece, which is a must-read if you’re interested or involved in this type of work:

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.

4 thoughts on “Randomized trials are so 1930s”

  1. Lee Crawfurd

    August 9, 2010 at 2:20pm

    I see your Jim Manzi and raise you a Tim Harford.


    “This is … simply adding the essential ingredient of randomisation to a standard pilot project that would have happened anyway

    … politicians experiment on us all the time with their latest policy wheezes. We learn little or nothing because the experiments are badly designed.”

  2. Matt

    August 9, 2010 at 2:26pm

    Lee – I think Manzi and Harford (and I!) would agree completely – it’s worth having RCTs for public policy. The point that Manzi is making is that we need to be careful how much we read into the evidence from the RCTs, and how we apply it to other contexts. Not enough is being done to properly validate these experiments, and the evidence is thus being blown way out of proportion.

  3. Ryan

    August 9, 2010 at 9:07pm

    I wrote something about the biological inspiration for RCTs a while ago (http://ryancbriggs.net/post/142667211/randomized-controlled-trials). RCTs often work in medicine because we can usually get away with assuming that most people are biologically alike and will remain that way. We usually can’t make that strong assumption between countries (or cities, or other social units), so while RCTs can tell us if a policy intervention worked in a given time and place, they have a really hard time saying what works in general.

    In development, RCTs produce reliable, but highly context-specific, knowledge, and they will stay this way so long as the cost (in time and money) of running them remains high enough to stop us from running thousands of experiments testing the same question in different places and time periods. The biggest problem I see with RCTs is the perception that they can answer big general questions. They very likely can’t, and that is okay.

  4. Kartik Akileswaran

    August 10, 2010 at 5:45pm

    @Ryan: you are exactly right about the difference in RCT usage between medicine and social science–external validity is much easier to prove in the former than the latter. And you’re right in saying that that’s okay.

    @Matt: your point about confirmation of negative results is an important one, and I think it doesn’t get the attention it deserves. The problem in the academic world, of course, is publication bias.
