The Lynchian Randomization

"We're going to estimate treatment effects via an ancient rock-throwing technique."

“We’re going to estimate treatment effects via an ancient rock-throwing technique.”

We developmentistas often associate randomized impact evaluations solely with development interventions (I’m looking at you, Eva Vivalt), so it’s easy to forget that there are other researchers out there doing some really bizarre RCTs. For example, did you know that randomizing paracetamol is still a thing? Psychologists seem to think that it affects an individual’s emotions, in addition to the palliative effect it has on pain. In a recent Psychological Science article, several researchers wanted to observe whether paracetamol blunted our emotional responses to distressing events.

The first experiment they ran was, well, moderately distressing. There were two writing tasks: the control group was asked to write about a ‘placebo subject’ – something innocuous – while the treatment group was asked to write about their own death (distress! distress!). This was cross-cut with a standard double-blind randomization of paracetamol. Then the researchers recorded their outcome of interest, which was a bit… odd:

Finally, participants read a hypothetical arrest report about a prostitute and were asked to set the amount of the bail (on a scale from $0 to $999). This measure has been used in a number of other meaning-threat studies (Proulx & Heine, 2008; Proulx et al., 2010; Randles et al., 2011; Rosenblatt, Greenberg, Solomon, Pyszczynski, & Lyon, 1989). Participants are expected to increase the bond amount after experiencing a threat, because trading sex for money is both at odds with commonly held cultural views of relationships and against the law. Increasing the bond assessment provides participants an opportunity to affirm their belief that prostitution is wrong.

Um, I think we’ll probably leave that out of our next household survey, but fine. What was the result? The average bond levels set by each treatment group were similar, except for the group which received a distressing event but not paracetamol.


The researchers claim this means that acetaminophen (paracetamol) is actually blunting people’s normal response to the emotionally-distressing task (i.e. punishing prostitutes). The difference between the control placebo group and the ‘mortality salience’ placebo group was approximately $120, but there appears to be no significant difference between the treatment and control groups who were given the drug.
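
(For the statistically inclined: this is a standard 2×2 factorial design, and the claim lives in the interaction term. Here’s a minimal sketch of how one might analyse it – all numbers and variable names are invented for illustration, not taken from the paper.)

```python
# A hedged sketch of the 2x2 factorial analysis, with invented data.
# The researchers' claim corresponds to the interaction term:
# paracetamol offsets the effect of the distressing task.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200  # hypothetical sample size

# Crossed treatments: distressing writing task x paracetamol
distress = rng.integers(0, 2, n)
drug = rng.integers(0, 2, n)

# Invented data-generating process: distress raises the bond set,
# but only among those who did not take paracetamol
bond = 350 + 120 * distress * (1 - drug) + rng.normal(0, 80, n)

df = pd.DataFrame({"bond": bond, "distress": distress, "drug": drug})

# If the story holds, 'distress' comes out positive and
# 'distress:drug' negative and of similar magnitude
model = smf.ols("bond ~ distress * drug", data=df).fit()
print(model.params)
```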

Now things get even a little more bizarre. The researchers wanted to replicate the experiment with a similar premise but a different outcome measure and a different distressing activity. This time they made the control group watch a four-minute clip of The Simpsons, while the treatment group had to watch four minutes of the David Lynch short film “Rabbits”, which is populated by creepy humanoid rabbits. You can watch the entire thing here. I recommend having something lined up to cheer you up afterwards.


In this case the respondents had to choose how much to fine a group of public rioters. The results were very similar to the first experiment: the treatment group which did not receive any paracetamol ended up fining the rioters substantially more, but there was little difference between the other three groups. Again the researchers argue that the paracetamol made the difference.


Before you start slipping people paracetamol before giving them bad news, there are a number of reasons we might be very wary of these results. First, the theoretical groundwork is a bit shaky – while there are some psychology experiments suggesting that paracetamol influences what they call “social pain,” there is no compelling physiological link, other than some inconclusive evidence cited at the beginning of the article. We should discount results more heavily when they don’t have a strong grounding in either theory or prior evidence. We certainly shouldn’t use them for anything as headline-grabbing as “What is Tylenol Doing to Our Minds?”


The results also rely on what psychologists call a meaning-maintenance model, which predicts that individuals will seek compensatory affirmations of their beliefs when their expectations or ‘meanings’ are threatened by outside stimuli. Thus, punishing a prostitute or a rioter – the authors argue – gives the respondent a chance to affirm their belief that these practices are wrong. I don’t know enough about the subject to say whether or not the meaning-maintenance model is a sensible way of describing human behaviour, but the result seems dependent on a few too many assumptions: (a) that paracetamol interacts with a part of the brain that generates these compensatory desires, (b) that the treatments in this experiment would themselves generate compensatory desires, and (c) that the outcomes of these experiments meaningfully measure this desire to assert one’s beliefs after a distressing event.

That said – this is why we do replications, and the researchers do well to set up two separate experiments. Plus they got to randomize David Lynch. This is awesome.

On the ethical approval of RCTs

From Nicolas A. Christakis:

Incidentally, another thing that’s fascinating to me is that, there’s a very funny saying when it comes to the ethical review of science, or an anecdote, which is that if a doctor wakes up in the morning and decides that, for the next 100 patients with cancer that he or she sees that have this condition, he’s going to treat them all with this new drug because he thinks that drug works, he can do that. He doesn’t need to get anyone’s permission. He can use any drug “off-label” he wants when, in his judgment, it is helpful to the patient. He’ll talk to the patient. He needs to get the patient’s consent. He can’t administer the drug without the patient knowing. But, he can say to the patient, “I recommend that you do this,” and he can make this recommendation to every one of the next 100 patients he sees.

If, on the other hand, the doctor is more humble, and more judicious, and says “you know, I’m not sure that this drug works, I’m going to only give it to half of the next 100 patients I see,” then he needs to get IRB approval, because that’s research. So even though he’s giving it to fewer patients, now there’s more review.

It would be interesting to think of the off-label analogues in development. You could argue that a lot of new government policy is essentially off-label.

Hat tip to Marginal Revolution.

Almost as awesome as the title suggests


Charles Remington *is* the treatment

A new working paper is out, titled “Household Vulnerability to Wild Animal Attacks in Developing Countries: Experimental Evidence from Rural Pakistan.” Alas, this does not involve crazy academics running around unleashing wild animals on unsuspecting villages. The abstract:

Based on a three-year panel dataset of households collected in rural Pakistan, we first quantify the extent to which farmers are vulnerable to attacks by wild boars; we then examine the impact of an intervention on households’ capacity to reduce related income losses. A local nongovernmental organization implemented the intervention as a randomized controlled trial at the beginning of the second survey year. This experimental design enabled us to cleanly identify the impact of the intervention. We find that the intervention was highly effective in eliminating the crop-income loss of treated households in the second year, but that effects were not discernible in the third year. The finding from the third year could be due to the high implicit cost incurred by the households in implementing the treatment. Regarding the impact of the intervention on a number of consumption measures, the difference-in-difference estimate for the impact on consumption was insignificant in the second year, but highly positive in the third year when estimated without other controls. A part of this consumption increase was because of changes in remittance inflows. The overall results indicate the possibility that treatment in the absence of subsidies was costly for households due to hidden costs, and hence, the income gain owing to the initial treatment was transient.
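
(As an aside: the difference-in-difference estimates the abstract mentions are straightforward to sketch. Below is a toy version with invented household-by-year panel data – the variable names and effect sizes are mine, not the paper’s.)

```python
# A toy difference-in-differences setup mimicking the paper's design:
# a three-year household panel, with treatment assigned at the start
# of year 2. All data here are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_households = 300
treated = rng.integers(0, 2, n_households)

rows = []
for h in range(n_households):
    for year in (1, 2, 3):
        # hypothetical effect: crop-income losses eliminated in year 2 only
        effect = 0.3 * treated[h] * (year == 2)
        rows.append({"hh": h, "year": year, "treated": treated[h],
                     "crop_income": 1.0 + effect + rng.normal(0, 0.2)})
df = pd.DataFrame(rows)

# Year-by-year treatment effects relative to the pre-period (year 1),
# with standard errors clustered at the household level
did = smf.ols("crop_income ~ treated * C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["hh"]})
print(did.params[["treated:C(year)[T.2]", "treated:C(year)[T.3]"]])
```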

So instead of randomising boar attacks, they randomised what I will dub a boar counter-insurgency strategy:

With the help of the district’s agriculture and livestock departments, PHKN designed a pilot version of the Anti-WBA Program (AWBAP). The main objective of this program was to prevent WBAs and subsequent crop-income losses. The program comprises HRD training that focuses on the awareness and prevention of WBAs. The prevention component of the program imparts information on basic techniques for scaring or trapping animals and for curtailing boar-population growth. Moreover, under the program, some basic equipment and animal drugs were provided free of charge to the treated households, upon the successful completion of training.

Drugs? From the footnote:

Drugs are used in the long term to control the boar population. It is claimed that female boars lose their fertility after consuming the drugs; however, the efficacy of the drugs has not yet been established.

So, using The Ghost and the Darkness as an analytical framework (which, frankly, I do for most things in life), they aren’t randomising the lions, they’re randomising Michael Douglas.

Hat tip to Ranil for finding this one.

Random thoughts left lying around

There has been much talk of economists starting up a trial registry for randomised interventions, or at least promoting the use of pre-analysis plans. One of the chief reasons for doing this is to curb data mining – if researchers make it clear up front which hypotheses they plan to test, this will reduce the incentive to report results discovered only after they have had time to dig around in the data.

While I think trial registries are worth a try, I have already discussed my worries about their effects on the quantity of viable research (even if quality increases). These concerns aside, my question here is: why are trial registries primarily associated with randomised trials? Shouldn’t we also be moving to an equilibrium where all empirical research begins with a published pre-analysis plan?

I suppose the main hurdle here is honesty – for any dataset which already exists, it’s easy for me to download it, mine the data, then base my pre-analysis plan on empirical results I already know to exist. Furthermore, for any given dataset, the number of potential hypotheses (and thus the number of pre-analysis plans which can be written by different researchers) is very large. This suggests that there is something special about writing a pre-analysis plan before the data is even collected, rather than just before someone opens up Stata.
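
(To see why this matters, it takes about ten lines of code to ‘discover’ significant results in pure noise – a hypothetical illustration:)

```python
# Data mining in miniature: test enough hypotheses on pure noise and
# roughly 5% will clear the p < 0.05 bar by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_obs, n_hypotheses = 500, 100

outcome = rng.normal(size=n_obs)
false_positives = 0
for _ in range(n_hypotheses):
    x = rng.normal(size=n_obs)          # an unrelated candidate regressor
    _, p = stats.pearsonr(x, outcome)   # a hypothesis we never pre-registered
    false_positives += p < 0.05

print(f"{false_positives} 'significant' findings out of {n_hypotheses}")
```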

When an RCT would have been really handy

The BBC reports on a study by two psychologists, purporting to show that staying hydrated can improve grades:

Students who bring water into the examination hall may improve their grades, a study of 447 people found.

Controlling for ability from previous coursework results, researchers found those with water scored an average of 5% higher than those without.

The study, from the universities of East London and Westminster, also noted that older students were more likely to bring in water to exam halls.

I don’t believe an RCT is needed to answer every question out there, but it is a little silly in instances like this where a simple intervention could test the same hypothesis: just hand out water bottles to a random group of students before an exam, and see who performs better.

Surely, even controlling for ability (lagged dependent variable, anyone?), students who choose to bring water into exams might be different in some unobservable way. Of course, this doesn’t stop the researchers from making policy recommendations.
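
(For what it’s worth, the randomised version would be almost trivial to analyse. A sketch, with invented exam scores:)

```python
# The proposed experiment in miniature: randomly hand out water
# bottles, then compare exam scores. All data here are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_students = 400

# Random assignment: exactly half the students get a water bottle
water = rng.permutation(np.repeat([0, 1], n_students // 2))

# Hypothetical scores with a modest true effect of water
scores = rng.normal(60, 10, n_students) + 3 * water

t_stat, p = stats.ttest_ind(scores[water == 1], scores[water == 0])
print(f"difference in means: "
      f"{scores[water == 1].mean() - scores[water == 0].mean():.1f} points, "
      f"p = {p:.3f}")
```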

White men can’t run experimental games

"Don't mind me, I'm just here as a passive observer."

The Roving Bandit tipped me off about a (preliminary, so results may change) paper by Jacobus Cilliers, Oeindrila Dube and Bilal Siddiqi which finds that replacing a passive Sierra Leonean supervisor with a white foreigner causes experimental subjects to act more generously in dictator games:

Can the presence of white foreigners influence measured behavior in developing countries? We experimentally vary foreigner presence across behavioral games conducted in 60 communities in Sierra Leone, and assess its impact on standard measures of generosity. We find that foreigner presence substantially increases player contributions in dictator games, by as much as 23 percent.

This is the first time I’ve seen an explanatory variable labeled “white-man.” It suddenly makes me wonder about every single interview I’ve ever sat in on.

Jacobus sits behind me in the economics department at Oxford – I can’t say for sure if his being there has made me a more giving person or not.


Some thoughts on More Than Good Intentions

It is more than a little embarrassing that it has taken me this long to write up my thoughts on More Than Good Intentions, given that I received and read the book back in April (note to PR people: please don’t let this stop you sending us review copies of interesting books). I doubt readers need an introduction – Karlan and Appel’s book has been reviewed by pretty much every other blogger and development wonk out there, to general acclaim.

Perhaps having a bit more time to reflect and see how the book is received by others is a good thing – while my initial review would have been more comprehensive, it would also have had a fair bit of needless nitpicking. Instead of the review I would have written several months ago, I offer just a few thoughts on what the book gets right, and where it falls short:

Firstly, the acclaim is well-deserved – the book is a nice gateway into some of the more recent experimental evidence from developing countries. While it is obviously meant to be accessible to those with little working knowledge of development interventions or experimental methods, it can still be enjoyed by the more experienced researcher. However, those who are already quite familiar with Dean Karlan’s work, or that of Innovations for Poverty Action in general, might find much of the book redundant.

In terms of what it lends to big development debates, More Than Good Intentions is more a marginal than a seminal contribution – it will be most successful at helping its audience update their priors on conventional development interventions, rational behaviour, etc., rather than significantly shifting the terms of the debate. It is refreshing to see a book a little more humble in scope and (mostly) less certain that we’ve got all the answers we need (there is some language at the end about proven interventions, but it isn’t too strong). This is when the randomistas are at their best: concentrating on slowly shifting the body of evidence, rather than petitioning for seismic shifts in policy.

Yet, MTGI occasionally feels a little too limited in scope for its own good. By relying almost exclusively on rigorous experimental work (most of which the authors were involved in or closely associated with), they have really limited the amount of evidence they can draw on. Subsequently, while the book feels in full stride discussing the nuances of microcredit and microsavings (one could almost see a whole new book here), it begins to feel thin as we move on to other themes such as education and agriculture, where we get less of a sense of where the research Karlan and Appel highlight sits in the greater body of evidence. Development economists have been researching many of these issues for decades using less rigorous but still worthwhile methods, and it seems odd to discuss a few choice experiments without explaining how they connect to what we already know.

Chances are that you’ve already picked up or read this book. I’d definitely recommend it – but with the same grain of salt needed for all new works – use it to update your priors, not redefine them.

The external validity double standard

David McKenzie goes to town on those who complain about the lack of external validity in experimental methods. For one, the standard seems to be applied more often to research in developing countries:

So let’s look at the April 2011 AER. It contains among other papers (i) a lab experiment in which University of Bonn students were asked to count the number of zeros in tables that consisted of 150 randomly ordered zeros and ones; (ii) a paper on contracts as reference points using students of the University of Zurich or the Swiss Federal Institute of Technology Zurich; (iii) an eye-tracking experiment to see consumer choices done with 41 Caltech undergraduates; and (iv) a paper in which 62 students from an unnamed university were presented prospects for three sources of uncertainty with unknown probabilities; (v) a paper on backward induction among world class chess players.

And then, a swipe against those within development who argue that experimental methods aren’t externally valid:

Consider some of the most cited and well-known non-experimental empirical development research papers: Robert Townsend’s Econometrica paper on risk and insurance in India has over 1200 cites in Google Scholar, and is based on 120 households in 3 villages in rural India; Mark Rosenzweig and Oded Stark’s JPE paper on migration and marriage is based on the same Indian ICRISAT sample; Tim Conley and Chris Udry’s AER paper on technology adoption and pineapples is based on 180 households from 3 villages in southern Ghana; on a somewhat larger scale, Shankar Subramanian and Angus Deaton’s work on the demand for calories comes from 5630 households from one state in India in 1983.

From the perspective of a researcher (and one currently working on an experiment in a developing country), I completely agree with McKenzie here. Micro-empirical evidence is always useful, whether or not it is immediately generalizable – as long as we update our priors with care every time we read a new study.

From the perspective of a blogger who has taken swipes at the randomistas over external validity a few times, I think much of the push back on the external validity front has less to do with the research itself, and more with how the research is being trumpeted outside the academic sphere – there haven’t been any NYT articles about how eye-tracking experiments herald the end of poverty.
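
(Since ‘updating your priors’ is doing a lot of work in this post, here is the idea made literal – a toy Bayesian update of a belief about a treatment effect after reading one new study, with all numbers invented:)

```python
# A conjugate normal-normal update: combine a prior belief about a
# treatment effect with one new study's estimate. Numbers are invented.

def update(prior_mean, prior_var, estimate, se):
    """Posterior mean and variance after observing one noisy estimate."""
    precision = 1 / prior_var + 1 / se**2
    post_var = 1 / precision
    post_mean = post_var * (prior_mean / prior_var + estimate / se**2)
    return post_mean, post_var

# A vague prior moves a lot; a tight prior barely budges
print(update(prior_mean=0.0, prior_var=1.0, estimate=0.5, se=0.4))
print(update(prior_mean=0.0, prior_var=0.05, estimate=0.5, se=0.4))
```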

Some thoughts for the year

I am prepared to abandon these beliefs at the first sign of trouble.

In light of a recent shift into my late twenties and the arrival of the new year, I felt it would be reasonable to write down some of the relevant things that I have come to believe. These beliefs are not necessarily backed by hard, empirical evidence and I may be prepared to abandon many of them in the future. Still, it might be useful to clarify some of these thoughts, as many will seem quite obvious to frequent readers, while others will appear counter-intuitive.

The Proven Mean Initiative (external validity edition)

What's the sex ratio in Malawi? We're not sure, but we *know* the sex ratio in Mozambique.

Welcome to the Proven Mean Initiative (PMI), where we strive to bring you only the proven, true expected values of your variables of interest.

Want to know the true mean value of something? Wondering whether or not your assumptions about household consumption are correct?  Here at the Proven Mean Initiative, we have generated evidence for these important values through rigorous, randomized sampling in key locations of the world.

You might ask yourself “what is the mean income of the poor in Zambia?” Look below at our chart of results from our completed randomized surveys:

Want to know the life expectancy where you are? Well, we’ve got the life expectancy in Turkana (43) and Kathmandu (59.3)!

Here at PMI, we ask you to stop making potentially hazardous policy decisions without knowing whether your assumptions about pertinent values have been put to the test. If no one has performed a rigorous, randomized survey on your variable of interest, then you should consider any estimates unproven, and shy away from any relevant policy decisions.

Latest news: two new studies undertaken by PMI researchers have discovered that the female adult literacy rate in Eastern Ouagadougou is higher than the infant mortality rate.