On the ethical approval of RCTs

From Nicolas A. Christakis:

Incidentally, another thing that’s fascinating to me is that there’s a very funny saying when it comes to the ethical review of science, or an anecdote, which is that if a doctor wakes up in the morning and decides that, for the next 100 patients with cancer that he or she sees that have this condition, he’s going to treat them all with this new drug because he thinks that drug works, he can do that. He doesn’t need to get anyone’s permission. He can use any drug “off-label” he wants when, in his judgment, it is helpful to the patient. He’ll talk to the patient. He needs to get the patient’s consent. He can’t administer the drug without the patient knowing. But he can say to the patient, “I recommend that you do this,” and he can make this recommendation to every one of the next 100 patients he sees.

If, on the other hand, the doctor is more humble, and more judicious, and says “you know, I’m not sure that this drug works, I’m going to only give it to half of the next 100 patients I see,” then he needs to get IRB approval, because that’s research. So even though he’s giving it to fewer patients, now there’s more review.

It would be interesting to think of the off-label analogues in development. You could argue that a lot of new government policy is essentially off-label.

Hat tip to Marginal Revolution.

Almost as awesome as the title suggests


Charles Remington *is* the treatment

A new working paper, titled “Household Vulnerability to Wild Animal Attacks in Developing Countries: Experimental Evidence from Rural Pakistan.” Alas, this does not involve crazy academics running around unleashing wild animals on unsuspecting villages. The abstract:

Based on a three-year panel dataset of households collected in rural Pakistan, we first quantify the extent to which farmers are vulnerable to attacks by wild boars; we then examine the impact of an intervention on households’ capacity to reduce related income losses. A local nongovernmental organization implemented the intervention as a randomized controlled trial at the beginning of the second survey year. This experimental design enabled us to cleanly identify the impact of the intervention. We find that the intervention was highly effective in eliminating the crop-income loss of treated households in the second year, but that effects were not discernible in the third year. The finding from the third year could be due to the high implicit cost incurred by the households in implementing the treatment. Regarding the impact of the intervention on a number of consumption measures, the difference-in-difference estimate for the impact on consumption was insignificant in the second year, but highly positive in the third year when estimated without other controls. A part of this consumption increase was because of changes in remittance inflows. The overall results indicate the possibility that treatment in the absence of subsidies was costly for households due to hidden costs, and hence, the income gain owing to the initial treatment was transient.

So instead of randomising boar attacks, they randomised what I will dub a boar counter-insurgency strategy:

With the help of the district’s agriculture and livestock departments, PHKN designed a pilot version of the Anti-WBA Program (AWBAP). The main objective of this program was to prevent WBAs and subsequent crop-income losses. The program comprises HRD training that focuses on the awareness and prevention of WBAs. The prevention component of the program imparts information on basic techniques for scaring or trapping animals and for curtailing boar-population growth. Moreover, under the program, some basic equipment and animal drugs were provided free of charge to the treated households, upon the successful completion of training.

Drugs? From the footnote:

Drugs are used in the long term to control the boar population. It is claimed that female boars lose their fertility after consuming the drugs; however, the efficacy of the drugs has not yet been established.

So, using The Ghost and the Darkness as an analytical framework (which, frankly, I do for most things in life), they aren’t randomising the lions, they’re randomising Michael Douglas.
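As an aside, the “difference-in-difference estimate” mentioned in the abstract boils down to a double subtraction: the change over time for treated households, minus the change over the same period for control households. A toy sketch (the numbers below are invented purely for illustration, not taken from the paper):

```python
# Toy difference-in-differences calculation with made-up numbers.
# Mean crop income before and after the intervention, by group:
treated_before, treated_after = 100.0, 115.0
control_before, control_after = 100.0, 105.0

treated_change = treated_after - treated_before   # 15: effect + time trend
control_change = control_after - control_before   # 5: time trend alone

# DiD nets out the common time trend captured by the control group.
did_estimate = treated_change - control_change
print(did_estimate)  # 10.0
```

The control group’s change stands in for what would have happened to the treated households anyway, which is exactly what the randomisation is meant to make credible.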

Hat tip to Ranil for finding this one.

Random thoughts left lying around

There has been much talk of economists starting up a trial registry for randomised interventions, or at least promoting the use of pre-analysis plans. One of the chief reasons for doing this is to curb data mining – if researchers make it clear up front which hypotheses they plan to test, this will reduce the incentive to report new results discovered only after the researchers have had time to dig around.

While I think trial registries are worth a try, I have already discussed my worries about their effects on the quantity of viable research (even if quality increases). These concerns aside, my question here is: why are trial registries primarily associated with randomised trials? Shouldn’t we also be moving to an equilibrium where all empirical research begins with a published pre-analysis plan?

I suppose the main hurdle here is honesty – for any dataset which already exists, it’s easy for me to download it, mine the data, then base my pre-analysis plan on empirical results I already know to exist. Furthermore, for any given dataset, the number of potential hypotheses (and thus the number of pre-analysis plans which could be written by different researchers) is very large. This suggests that there is something special about writing a pre-analysis plan before the data is even collected, rather than just before someone opens up Stata.
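The data-mining worry is easy to see in a simulation. The sketch below is my own illustration, not drawn from any of the papers discussed: it tests hundreds of outcome variables that are pure noise, and with a conventional 5% significance threshold a handful of “findings” still emerge.

```python
import random
import statistics

random.seed(1)

def z_stat(sample):
    """Approximate z-statistic for the sample mean against a true mean of zero."""
    n = len(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return statistics.mean(sample) / se

# 400 candidate hypotheses, none of which is true: every outcome is noise.
n_obs, n_hypotheses = 100, 400
significant = sum(
    1
    for _ in range(n_hypotheses)
    if abs(z_stat([random.gauss(0, 1) for _ in range(n_obs)])) > 1.96
)

print(f"Spurious 'findings': {significant} out of {n_hypotheses}")
```

With nothing but noise, roughly one hypothesis in twenty clears the bar – exactly the kind of post-hoc “result” a pre-analysis plan is meant to guard against.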

When an RCT would have been really handy

The BBC reports on a study by two psychologists, purporting to show that staying hydrated can improve grades:

Students who bring water into the examination hall may improve their grades, a study of 447 people found.

Controlling for ability from previous coursework results, researchers found those with water scored an average of 5% higher than those without.

The study, from the universities of East London and Westminster, also noted that older students were more likely to bring in water to exam halls.

I don’t believe an RCT is needed to answer every question out there, but it is a little silly in instances like this where a simple intervention could test the same hypothesis: just hand out water bottles to a random group of students before an exam, and see who performs better.

Surely, even controlling for ability (lagged dependent variable, anyone?), students who choose to bring water into exams might differ in some unobservable way. Of course, this doesn’t stop the researchers from making policy recommendations.
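To see why the observational comparison can mislead, here is a minimal simulation of my own (all numbers invented) in which water has zero true effect, but an unobserved trait – call it diligence – drives both bringing water and exam performance:

```python
import math
import random

random.seed(0)

def exam_score(diligence, water_effect, has_water):
    # Scores are driven by an unobserved trait; water may contribute nothing.
    return 60 + 10 * diligence + water_effect * has_water + random.gauss(0, 5)

def mean(xs):
    return sum(xs) / len(xs)

n = 10_000
true_water_effect = 0.0  # assumed for this sketch: water does nothing

# Observational world: diligent students are more likely to bring water.
obs_with, obs_without = [], []
for _ in range(n):
    d = random.gauss(0, 1)
    brings_water = random.random() < 1 / (1 + math.exp(-d))
    (obs_with if brings_water else obs_without).append(
        exam_score(d, true_water_effect, brings_water)
    )

# Experimental world: water bottles are handed out at random.
exp_with, exp_without = [], []
for _ in range(n):
    d = random.gauss(0, 1)
    gets_water = random.random() < 0.5
    (exp_with if gets_water else exp_without).append(
        exam_score(d, true_water_effect, gets_water)
    )

naive_gap = mean(obs_with) - mean(obs_without)
rct_gap = mean(exp_with) - mean(exp_without)
print(f"observational gap: {naive_gap:.2f}, randomized gap: {rct_gap:.2f}")
```

The observational comparison produces a sizeable “water effect” that is entirely selection, while the randomised comparison correctly hovers around zero.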

White men can’t run experimental games

"Don't mind me, I'm just here as a passive observer."

The Roving Bandit tipped me off about a (preliminary, so results may change) paper by Jacobus Cilliers, Oeindrila Dube and Bilal Siddiqi which finds that replacing a passive Sierra Leonean supervisor with a white foreigner causes experimental subjects to behave more generously in dictator games:

Can the presence of white foreigners influence measured behavior in developing countries? We experimentally vary foreigner presence across behavioral games conducted in 60 communities in Sierra Leone, and assess its impact on standard measures of generosity. We find that foreigner presence substantially increases player contributions in dictator games, by as much as 23 percent.

This is the first time I’ve seen an explanatory variable labeled “white-man.” It suddenly makes me wonder about every single interview I’ve ever sat in on.

Jacobus sits behind me in the economics department at Oxford – I can’t say for sure if his being there has made me a more giving person or not.


Some thoughts on More Than Good Intentions

It is more than a little embarrassing that it has taken me this long to write up my thoughts on More Than Good Intentions, given that I received and read the book back in April (note to PR people: please don’t let this stop you from sending us review copies of interesting books). I doubt readers need an introduction – Karlan and Appel’s book has been reviewed by pretty much every other blogger and development wonk out there, to general acclaim.

Perhaps having a bit more time to reflect and see how the book is received by others is a good thing – while my initial review would have been more comprehensive, it would also have had a fair bit of needless nitpicking. Instead of the review I would have written several months ago, I offer just a few thoughts on what the book gets right, and where it falls short:

Firstly, the acclaim is well-deserved – the book is a nice window into some of the more recent experimental evidence from developing countries. While it is obviously meant to be accessible to those with little working knowledge of development interventions or experimental methods, it can still be enjoyed by the more experienced researcher. However, those who are already quite familiar with Dean Karlan’s work, or that of Innovations for Poverty Action in general, might find much of the book redundant.

In terms of what it lends big development debates, More Than Good Intentions is more a marginal than a seminal contribution – it will be most successful at helping its audience update their priors on conventional development interventions, rational behaviour and the like, rather than significantly shifting the terms of the debate. It is refreshing to see a book a little more humble in scope and (mostly) less certain that we’ve got all the answers we need (there is some language at the end about proven interventions, but it isn’t too strong). This is the randomistas at their best: concentrating on slowly shifting the body of evidence, rather than petitioning for seismic shifts in policy.

Yet MTGI occasionally feels a little too limited in scope for its own good. By relying almost exclusively on rigorous experimental work (most of which they were involved in or closely associated with), the authors have really limited the amount of evidence they can draw on. Consequently, while the book feels in full stride discussing the nuances of microcredit and microsavings (one could almost see a whole new book here), it begins to feel thin as we move on to other themes such as education and agriculture, where we get less of a sense of where the research Karlan and Appel highlight sits in the greater body of evidence. Development economists have been researching many of these issues for decades using less rigorous but still worthwhile methods, and it seems odd to discuss a few choice experiments without explaining how they connect to what we already know.

Chances are that you’ve already picked up or read this book. I’d definitely recommend it – but with the same grain of salt needed for all new works – use it to update your priors, not redefine them.

The external validity double standard

David McKenzie goes to town on those who complain about the lack of external validity in experimental methods. For one, the standard seems to be applied more often to research in developing countries:

So let’s look at the April 2011 AER. It contains among other papers (i) a lab experiment in which University of Bonn students were asked to count the number of zeros in tables that consisted of 150 randomly ordered zeros and ones; (ii) a paper on contracts as reference points using students of the University of Zurich or the Swiss Federal Institute of Technology Zurich; (iii) an eye-tracking experiment to see consumer choices done with 41 Caltech undergraduates; and (iv) a paper in which 62 students from an unnamed university were presented prospects for three sources of uncertainty with unknown probabilities; (v) a paper on backward induction among world class chess players.

And then, a swipe against those within development who argue that experimental methods aren’t externally valid:

Consider some of the most cited and well-known non-experimental empirical development research papers: Robert Townsend’s Econometrica paper on risk and insurance in India has over 1200 cites in Google Scholar, and is based on 120 households in 3 villages in rural India; Mark Rosenzweig and Oded Stark’s JPE paper on migration and marriage is based on the same Indian ICRISAT sample; Tim Conley and Chris Udry’s AER paper on technology adoption and pineapples is based on 180 households from 3 villages in southern Ghana; on a somewhat larger scale, Shankar Subramanian and Angus Deaton’s work on the demand for calories comes from 5630 households from one state in India in 1983.

From the perspective of a researcher (and one currently working on an experiment in a developing country), I completely agree with McKenzie here. Micro-empirical evidence is always useful, whether or not it is immediately generalizable – as long as we update our priors with care every time we read a new study.

From the perspective of a blogger who has taken swipes at the randomistas over external validity a few times, I think much of the push back on the external validity front has less to do with the research itself, and more to do with how the research is being trumpeted outside the academic sphere – there haven’t been any NYT articles about how eye-tracking experiments herald the end of poverty.

Some thoughts for the year

I am prepared to abandon these beliefs at the first sign of trouble.

In light of a recent shift into my late twenties and the arrival of the new year, I felt it would be reasonable to write down some of the relevant things that I have come to believe. These beliefs are not necessarily backed by hard, empirical evidence and I may be prepared to abandon many of them in the future. Still, it might be useful to clarify some of these thoughts, as many will seem quite obvious to frequent readers, while others will appear counter-intuitive.

The Proven Mean Initiative (external validity edition)

What's the sex ratio in Malawi? We're not sure, but we *know* the sex ratio in Mozambique

Welcome to the Proven Mean Initiative (PMI), where we strive to bring you only the proven, true expected values of your variables of interest.

Want to know the true mean value of something? Wondering whether or not your assumptions about household consumption are correct?  Here at the Proven Mean Initiative, we have generated evidence for these important values through rigorous, randomized sampling in key locations of the world.

You might ask yourself “what is the mean income of the poor in Zambia?” Look below at our chart of results from our completed randomized surveys:

Want to know the life expectancy where you are? Well, we’ve got the life expectancy in Turkana (43) and Kathmandu (59.3)!

Here at PMI, we ask you to stop making potentially hazardous policy decisions without knowing whether your assumptions about pertinent values have been put to the test. If no one has performed a rigorous, randomized survey on your variable of interest, then you should consider any estimates unproven, and shy away from any relevant policy decisions.

Latest news: two new studies undertaken by PMI researchers have discovered that the female adult literacy rate in Eastern Ouagadougou is higher than the infant mortality rate.

Randomized trials are so 1930s

Jim Manzi, the CEO of Applied Predictive Technologies (a randomized trial software firm), reminds us that we’ve been subjecting public policy to experimental methods for quite some time:

In fact, Peirce and others in the social sciences invented the RFT decades before the technique was widely used for therapeutics. By the 1930s, dozens of American universities offered courses in experimental sociology, and the English-speaking world soon saw a flowering of large-scale randomized social experiments and the widely expressed confidence that these experiments would resolve public policy debates. RFTs from the late 1960s through the early 1980s often attempted to evaluate entirely new programs or large-scale changes to existing ones, considering such topics as the negative income tax, employment programs, housing allowances, and health insurance.

So the randomistas aren’t so much a “new wave” as the “next wave.” More interesting, though, are Manzi’s thoughts on external validity:

By about a quarter-century ago, however, it had become obvious to sophisticated experimentalists that the idea that we could settle a given policy debate with a sufficiently robust experiment was naive. The reason had to do with generalization, which is the Achilles’ heel of any experiment, whether randomized or not. In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial’s results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?

One example he discusses is the frequent experimentation in crime prevention, and how the (very few) subsequent replication attempts fared:

Criminologists at the University of Cambridge have done the yeoman’s work of cataloging all 122 known criminology RFTs with at least 100 test subjects executed between 1957 and 2004. By my count, about 20 percent of these demonstrated positive results—that is, a statistically significant reduction in crime for the test group versus the control group. That may sound reasonably encouraging at first. But only four of the programs that showed encouraging results in the initial RFT were then formally replicated by independent research groups. All failed to show consistent positive results.

My biggest fear about the current trend in social science RCT work is not only the failure to confirm positive results, but the failure to confirm negative ones. While there is a small but real incentive to repeat a ‘proven’ randomized study in a new setting, there isn’t much being done to check whether a negligible treatment effect remains negligible elsewhere. While the big RCT research groups do care about external validity, it is the initial findings that get seared into the minds of policymakers. Flashy graphs which generalize without concern don’t help.
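Manzi’s replication numbers are at least consistent with a literature carrying many false positives. A stylized simulation of my own (deliberately extreme: it assumes every program is truly ineffective, which the real literature surely is not) shows how initial “successes” can melt away on independent replication:

```python
import random
import statistics

random.seed(42)

def trial_is_significant(effect, n=100):
    """One two-arm trial: True if the treatment-control gap exceeds ~1.96 SEs."""
    treat = [random.gauss(effect, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    gap = statistics.mean(treat) - statistics.mean(control)
    se = (statistics.variance(treat) / n + statistics.variance(control) / n) ** 0.5
    return abs(gap / se) > 1.96

# Extreme assumption for illustration: no program has any true effect.
n_programs = 400
initial_hits = sum(trial_is_significant(0.0) for _ in range(n_programs))

# Independently re-run a trial for each initially 'successful' program.
replications = sum(trial_is_significant(0.0) for _ in range(initial_hits))

print(f"{initial_hits}/{n_programs} initial trials look positive; "
      f"{replications} of those replicate")
```

Under the all-null assumption, around 5% of initial trials look positive, and almost none of those survive a second, independent trial – a crude but suggestive analogue to the Cambridge replication record.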

Here’s part of the closing to Manzi’s piece, which is a must-read if you’re interested or involved in this type of work:

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.