Randomized trials are so 1930s

Jim Manzi, the CEO of Applied Predictive Technologies (a randomized trial software firm), reminds us that we’ve been subjecting public policy to experimental methods for quite some time:

In fact, Peirce and others in the social sciences invented the RFT decades before the technique was widely used for therapeutics. By the 1930s, dozens of American universities offered courses in experimental sociology, and the English-speaking world soon saw a flowering of large-scale randomized social experiments and the widely expressed confidence that these experiments would resolve public policy debates. RFTs from the late 1960s through the early 1980s often attempted to evaluate entirely new programs or large-scale changes to existing ones, considering such topics as the negative income tax, employment programs, housing allowances, and health insurance.

So the randomistas aren’t so much a “new wave” as the “next wave.” More interesting, though, are Manzi’s thoughts on external validity:

By about a quarter-century ago, however, it had become obvious to sophisticated experimentalists that the idea that we could settle a given policy debate with a sufficiently robust experiment was naive. The reason had to do with generalization, which is the Achilles’ heel of any experiment, whether randomized or not. In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial’s results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?

One example he discusses is the frequent experimentation used in crime prevention, and how the (very few) subsequent replication attempts fared:

Criminologists at the University of Cambridge have done the yeoman’s work of cataloging all 122 known criminology RFTs with at least 100 test subjects executed between 1957 and 2004. By my count, about 20 percent of these demonstrated positive results—that is, a statistically significant reduction in crime for the test group versus the control group. That may sound reasonably encouraging at first. But only four of the programs that showed encouraging results in the initial RFT were then formally replicated by independent research groups. All failed to show consistent positive results.

My biggest fear about the current trend in social science RCT work is not only the failure to confirm positive results, but the failure to confirm negative results. While there is a small but real incentive to repeat a ‘proven’ randomized study in a new setting, little is being done to confirm that a treatment with a negligible effect in one place wouldn’t work elsewhere. While big RCT research groups do care about external validity, it is the initial findings that get seared into the minds of policymakers. Flashy graphs which generalize without concern don’t help.
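The replication failures Manzi describes are consistent with simple false-positive arithmetic: if many trials test interventions with little or no true effect, a fair share will clear the p < 0.05 bar by chance alone, and those chance findings won’t replicate. A toy simulation can illustrate the mechanism (the effect sizes, sample sizes, and 90/10 mix below are hypothetical, not the Cambridge criminology data):

```python
import math
import random

random.seed(0)

def significant(n, effect):
    """Simulate one trial: n subjects per arm, outcomes ~ N(effect, 1) vs N(0, 1).
    Return True if the difference clears p < 0.05 (two-sided z-test)."""
    treat = sum(random.gauss(effect, 1) for _ in range(n)) / n
    ctrl = sum(random.gauss(0, 1) for _ in range(n)) / n
    z = (treat - ctrl) / math.sqrt(2 / n)
    return abs(z) > 1.96

# Hypothetical mix: most interventions do nothing, a few have a small real effect.
trials = [0.0] * 90 + [0.3] * 10
positives = [eff for eff in trials if significant(100, eff)]

# Chance alone makes some null interventions "work" once;
# only replication separates them from the genuine effects.
print(f"{len(positives)} of {len(trials)} trials significant")
```

With these numbers, a noticeable fraction of the “significant” trials are pure noise, which is exactly why an initial positive result, on its own, is weak evidence.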

Here’s part of the closing to Manzi’s piece, which is a must-read if you’re interested or involved in this type of work:

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.

The tricky ethics of education information, J-PAL edition

The Roving Bandit discovered this graph produced by J-PAL on cost-effective interventions for education.

That red bar is a result from an RCT in Madagascar which provided families with information on the “returns to education,” resulting in a reasonable increase in attendance (3.5%).

What’s the catch? The study wasn’t actually giving people an accurate measure of the returns to education in Madagascar; it was giving them the average association between education and outcomes such as income and job availability.

Why the distinction? Most economists believe that educational attainment is highly endogenous – this means that brighter people and those from more advantaged backgrounds are more likely to attend and do well in school. This muddies the waters, as high-ability people are also more likely to earn higher wages.

There have been many, many attempts to control for the sort of characteristics that might bias the effect of education on earnings, but the study cited by J-PAL doesn’t really do this. This means they aren’t presenting people with an accurate estimate of their own gains from further education, but instead giving them a picture of the sort of lives educated people live.
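The ability-bias problem can be made concrete with a toy simulation (all numbers below are hypothetical): when unobserved ability raises both schooling and wages, a naive regression of wages on schooling recovers something well above the true causal return.

```python
import random

random.seed(1)

N = 100_000
TRUE_RETURN = 0.05   # hypothetical causal return per year of schooling

ability = [random.gauss(0, 1) for _ in range(N)]
# Higher-ability people get more schooling...
schooling = [8 + 2 * a + random.gauss(0, 1) for a in ability]
# ...and earn more even at the same schooling level.
log_wage = [TRUE_RETURN * s + 0.2 * a + random.gauss(0, 0.1)
            for s, a in zip(schooling, ability)]

# Naive OLS slope of log wage on schooling: cov(s, w) / var(s).
ms = sum(schooling) / N
mw = sum(log_wage) / N
cov = sum((s - ms) * (w - mw) for s, w in zip(schooling, log_wage)) / N
var = sum((s - ms) ** 2 for s in schooling) / N
naive = cov / var

print(f"true return: {TRUE_RETURN:.3f}, naive estimate: {naive:.3f}")
```

In this setup the naive estimate lands around 0.13 against a true return of 0.05 – a raw education–income association tells people what educated lives look like, not what an extra year of school would do for them.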

Shouldn’t we be luring people into school anyway, though? Isn’t more education, in general, a good thing? J-PAL’s cost-effectiveness chart assumes this (notice that it assumes education is an end in itself, even though the intervention uses education as a means to further income).

Possibly, but I am dubious about the ethics of boosting people’s underlying demand for education. Providing accurate information on returns would allow them to make informed decisions (i.e. does this really make sense given my situation?), but instead providing them with a rosy picture may be leading them to make decisions that aren’t actually in their own best interest.

Supply-side interventions that lower the cost of attending school – pretty much everything to the right of the red bar – are more likely to improve outcomes without the implicit deception.

What lies beneath

Tim Harford endorses testing (some) public policy ideas using randomised trials. One part leapt out at me:

Realising that inconvenient – or just plain boring – trial results are less likely to appear in print, medical journals now refuse to publish trials that were not logged before they started in a register of trials. Such registers ensure embarrassing results cannot be made to disappear. This is vital in medicine, and just as important in social policy.

Imagine if we had some way of tracking every regression someone ran. Unless the issue is particularly contentious, journals tend to favour articles that show some sort of relationship (although authors are getting better and better at reporting all their specifications). Ideally, we’d like to know about every statistical study that wasn’t published, but unfortunately running a loop in Stata is much more private than running a large RCT.
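The “loop in Stata” worry is easy to demonstrate: test a truly null treatment against enough candidate outcomes or specifications and one of them will almost always cross p < 0.05. A minimal sketch, with all data simulated and the number of specifications (100) chosen purely for illustration:

```python
import math
import random

random.seed(2)

n = 50
# A treatment with no effect at all, plus many candidate outcome measures.
treatment = [random.random() < 0.5 for _ in range(n)]
outcomes = [[random.gauss(0, 1) for _ in range(n)] for _ in range(100)]

def p_value(outcome):
    """Two-sided z-test of treated vs. control means (unit variance by construction)."""
    t = [y for y, d in zip(outcome, treatment) if d]
    c = [y for y, d in zip(outcome, treatment) if not d]
    diff = sum(t) / len(t) - sum(c) / len(c)
    z = diff / math.sqrt(1 / len(t) + 1 / len(c))
    # p = 2 * (1 - Phi(|z|)), using erf for the normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

pvals = [p_value(y) for y in outcomes]
# Report only the "best" specification and the null effect vanishes in print.
print(f"smallest of {len(pvals)} p-values: {min(pvals):.3f}")
```

A trial register forces the analysis to be named before the data arrive, which is precisely what cuts off this kind of specification search.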

Luckily, RCTs in the social sciences are convincing enough that studies that show no effect are still worthy of publication. As they become more and more popular and competition stiffens, will this still be the case?

Much as academics squabble over minute assumptions in econometric specifications, will we soon be arguing over differences in intervention design? When it comes to translating research into policy prescriptions, I wonder how much better off we’ll be.

Even if the social sciences move closer to the “full disclosure” practices we see in the medical community, there are other filters which will still lead to bias. The most potent: the media. Positive results tend to get reported first, and tend to stick in the public mind longer, even after they’ve been contested or discredited.