Note: this is an expanded version of a post published at CGD’s Views from the Center Blog
Randomized controlled trials (RCTs) in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement a conventional RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed varieties to a sample of farmers. Effort responses can be quantitatively important—for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got; however, people who knew they had received the traditional seeds did much worse. Importantly, we also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.
So it appears that most of the treatment effect in this study is driven by changes in the behaviour of farmers who knew they had received modern seed varieties. Toro touts this as some sort of massive blow to the randomista movement:
This gap between the results of the open and the double-blind RCTs raises deeply troubling questions for the whole field. If, as Bulte et al. surmise, virtually the entire performance boost arises from knowing you’re participating in a trial, believing you may be using a better input, and working harder as a result, then all kinds of RCT results we’ve taken as valid come to look very shaky indeed.
Still, the study is an instant landmark: a gauntlet thrown down in front of the large and growing RCT-Industrial Complex. At the very least, it casts serious doubt on the automatic presumption of internal validity that has long attached to open RCTs. And without that presumption, what’s left, really?
Even if you take the results of this paper at face value (and there are some good reasons we shouldn't), it's hard to see why these results should be so troubling.
The reason medical researchers use double-blind protocols in clinical trials is to pin down the exact physiological impact of a medicine, independent of any conscious or subconscious behavioural response. Placebo effects are fairly well established, so figuring out that medicine X has an effect above and beyond the health effects created by taking a sugar pill is important. One very important thing to note, however, is that unless there is a no-placebo control group, researchers using a double-blind protocol will be unable to identify the total average treatment effect of a medicine: we will know what the impact is compared to someone else given a pill, but if we randomly selected someone in the population (with the same characteristics as the study group) to receive the treatment, we can't say much about what the overall effect would be. Also, fairly critically, while double-blind studies allow us to assume that placebo effects are similar across treatment and control groups, they tell us nothing about how those effects would compare to an explicitly non-blind clinical trial (i.e. placebo effects might be quite different when the treated know they are treated).
Most development randomistas are answering substantially different questions than medical scientists. It is fairly easy to establish the efficacy of a set of agricultural inputs in a controlled setting: we know fertilizer 'works' in that it improves yields. We know vaccines work in saving lives, and that increasing educational inputs, to some extent, improves educational outcomes. This was Jeffrey Sachs's reasoning when he sold much of the world on the Millennium Village Project: we know what works, we just need to implement it. But most of us running RCTs aren't interested in the direct impact of an intervention, holding behaviour constant, because it is precisely this behaviour that matters the most. If our question is "do improved seeds work in a controlled setting?" then a double-blind RCT is all well and good, but if our question is "do improved seeds work when you distribute them openly, as you would in pretty much any standard intervention?" then you need transparent protocols to get at the average treatment effect you are interested in.
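The distinction can be made concrete with a stylized simulation (the numbers here are purely illustrative assumptions, not estimates from the paper): suppose the seed itself adds nothing to yields, but farmers who know they hold the modern variety work harder. An open trial then shows a large "treatment effect" while a blinded comparison shows almost none, which is the kind of gap at issue here:

```python
import random

random.seed(0)

# Illustrative parameters (assumed, not taken from Bulte et al.):
BASE = 100           # baseline yield
SEED_EFFECT = 0      # direct effect of the modern seed itself
EFFORT_EFFECT = 20   # extra yield from working harder

def harvest(modern_seed, knows_modern):
    """Yield = base + seed effect + effort response + noise."""
    y = BASE + random.gauss(0, 5)
    if modern_seed:
        y += SEED_EFFECT
    if knows_modern:
        y += EFFORT_EFFECT  # behavioural response, not the seed
    return y

N = 10_000

# Open RCT: everyone knows their assignment.
open_treat = sum(harvest(True, True) for _ in range(N)) / N
open_ctrl = sum(harvest(False, False) for _ in range(N)) / N

# Double-blind: nobody knows, so no differential effort.
blind_treat = sum(harvest(True, False) for _ in range(N)) / N
blind_ctrl = sum(harvest(False, False) for _ in range(N)) / N

print(f"Open-trial effect: {open_treat - open_ctrl:.1f}")   # seed + effort
print(f"Blinded effect:    {blind_treat - blind_ctrl:.1f}") # seed only
```

Both numbers are "right"; they just answer different questions. If the policy on the table distributes seeds openly, the open-trial effect, effort response and all, is the relevant one.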
Many economists are interested in mechanisms – in picking apart the behavioural responses to a given treatment. In this respect, the Bulte et al. paper is very interesting: here we have an intervention which works primarily through behavioural response rather than through a change in household resources, etc. This is intriguing and worth picking apart to get a better sense of why interventions like these work. However, from the perspective of a policy wonk, we might care less: if you give people improved seeds, then yields go up. If you de-worm children, then schooling goes up. These are answers worth knowing even if that's all we know.
For those of us interested in behavioural responses, we don't necessarily need to run double-blind RCTs to get a handle on them. Consider this excellent paper by Jishnu Das and others on the effect of anticipated versus unanticipated school grants: when parents knew that their child's school would be receiving more money, they reduced their own spending on school inputs enough to completely offset the gains from the grants. In a world in which we could have run the grant programme as a blinded RCT, it would have looked like grants were successful in raising test scores – but it would have told us precious little about how grants operate in the real world.
There's another issue here: imposing blinding in many development RCTs creates substantial ethical problems. Imagine, for instance, that you could fool a Kenyan farmer into not knowing whether she received high-quality fertilizer or a bag of dirt. The average farmer might behave as if she had received nothing; she might behave as if she had received a perfectly good bag of fertilizer; or she might hedge and use some of it, realizing that it may not be useful. Some of these decisions may be sub-optimal: if the farmer knew she was in the control group, she might have opted for a different planting method, one which would have resulted in a higher yield. In this particular example, obscuring the treatment from our study group actually runs the risk of doing them harm, especially if they believe they are treated and take complementary actions which are in fact wasteful if they are not actually in the treatment group.
The thing you should take away from the Bulte et al. study isn't "all RCTs are biased because we aren't measuring placebo effects" but rather "behavioural response matters for evaluating real-world policies." The latter statement actually reinforces the need for transparent RCTs, rather than for attempts to mimic the double-blind nature of clinical trials.