The Lynchian Randomization

"We're going to estimate treatment effects via an ancient rock-throwing technique."

“We’re going to estimate treatment effects via an ancient rock-throwing technique.”

We developmentistas often associate randomized impact evaluations solely with development interventions (I’m looking at you, Eva Vivalt), so it’s easy to forget that there are other researchers out there doing some really bizarre RCTs. For example, did you know that randomizing paracetamol is still a thing? Psychologists seem to think that it blunts an individual’s emotions, in addition to the palliative effect it has on pain. In a recent Psychological Science article, several researchers wanted to observe whether paracetamol blunted our emotional responses to distressing events.

The first experiment they ran was, well, moderately distressing. There were two writing tasks: the control group was asked to write about a ‘placebo subject’ – something innocuous – while the treatment group was asked to write about their own death (distress! distress!). This was cross-cut with a standard double-blind randomization of paracetamol. Then the researchers recorded their outcome of interest, which was a bit… odd:

Finally, participants read a hypothetical arrest report about a prostitute and were asked to set the amount of the bail (on a scale from $0 to $999). This measure has been used in a number of other meaning-threat studies (Proulx & Heine, 2008; Proulx et al., 2010; Randles et al., 2011; Rosenblatt, Greenberg, Solomon, Pyszczynski, & Lyon, 1989). Participants are expected to increase the bond amount after experiencing a threat, because trading sex for money is both at odds with commonly held cultural views of relationships and against the law. Increasing the bond assessment provides participants an opportunity to affirm their belief that prostitution is wrong.

Um, I think we’ll probably leave that out of our next household survey, but fine. What was the result? The average bond levels set by each treatment group were similar, except for the group which experienced the distressing event but did not receive paracetamol.


The researchers claim this means that acetaminophen (paracetamol) is actually blunting people’s normal response to the emotionally-distressing task (i.e. punishing prostitutes). The difference between the control placebo group and the ‘mortality salience’ placebo group was approximately $120, but there appears to be no significant difference between the corresponding groups who were given the drug.
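To see what that comparison involves, here is a minimal sketch in Python (with made-up cell means and sample sizes – the real numbers are in the paper) of how a 2×2 design like this is usually analysed: compare mean bail across the four writing-task × pill cells, and look at the interaction, which is what carries the blunting claim.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60  # hypothetical participants per cell

# Hypothetical cell means chosen to mimic the pattern described above:
# only the distress + sugar-pill group sets noticeably higher bail.
means = {
    ("control", "placebo"): 380,
    ("distress", "placebo"): 500,  # the odd group out
    ("control", "drug"): 380,
    ("distress", "drug"): 385,
}
cells = {k: rng.normal(m, 150, n).clip(0, 999) for k, m in means.items()}

# Difference-in-means within each pill arm
placebo_gap = cells[("distress", "placebo")].mean() - cells[("control", "placebo")].mean()
drug_gap = cells[("distress", "drug")].mean() - cells[("control", "drug")].mean()

# The interaction (a difference-in-differences) is the "blunting" estimate
interaction = placebo_gap - drug_gap
print(f"distress effect, sugar pill: {placebo_gap:.0f}")
print(f"distress effect, paracetamol: {drug_gap:.0f}")
print(f"interaction: {interaction:.0f}")
```

With these invented numbers the distress effect shows up only in the sugar-pill arm, so the interaction term is large – the same shape as the pattern the authors report.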

Now things get a little more bizarre. The researchers wanted to replicate the experiment with a similar premise but a different outcome measure and a different distressing activity. So this time they made the control group watch a four-minute clip of The Simpsons, while the treatment group had to watch four minutes of the David Lynch short film “Rabbits”, which features creepy humanoid rabbits. You can watch the entire thing here. I recommend having something lined up to cheer you up afterwards.


In this case the respondents had to choose how much to fine a group of public rioters. The results were very similar to the first experiment’s: the treatment group which did not receive any paracetamol ended up fining the rioters substantially more, but there was little difference between the other three groups. Again, the researchers argue that the paracetamol made the difference.


Before you start slipping people paracetamol before giving them bad news, there are a number of reasons we might be very wary of these results. First, the theoretical groundwork is a bit shaky – while there are some psychology experiments suggesting that paracetamol influences what they call “social pain,” there is no compelling physiological link, other than some inconclusive evidence cited at the beginning of the article. We should discount results more heavily when they don’t have such a strong grounding in either theory or prior evidence. We certainly shouldn’t use them for anything as headline-grabbing as “What is Tylenol Doing to Our Minds?”


The results also rely on what the psychologists call a meaning-maintenance model, which predicts that individuals will seek compensatory affirmations of their beliefs when their expectations or ‘meanings’ are threatened by outside stimuli. Thus, punishing a prostitute or a rioter – the authors argue – gives the respondent a chance to affirm their belief that these practices are wrong. I don’t know enough about the subject to say whether or not the meaning-maintenance model is a sensible way of describing human behaviour, but the result seems dependent on a few too many assumptions: (a) that paracetamol interacts with a part of the brain that generates these compensatory desires, (b) that the treatments in this experiment would themselves generate compensatory desires, and (c) that the outcomes of these experiments meaningfully measure this desire to assert one’s beliefs after a distressing event.

That said – this is why we do replications, and the researchers do well to set up two separate experiments. Plus they got to randomize David Lynch. This is awesome.

You just don’t get me

Timothy Taylor has an excellent write-up of the behavioural economics results coming out of the recently-released 2015 World Development Report. One of the most striking findings is that World Bank staff tend to overestimate the tendency for poor people to be fatalistic. From Taylor’s post:

What do development experts think that the poor believe, and how does it compare to what the poor actually believe? For example, development experts were asked if they thought individuals in low-income countries would agree with the statement: “What happens to me in the future mostly depends on me.” The development experts thought that maybe 20% of the poorest third would agree with this statement, but about 80% actually did. In fact, the share of those agreeing with the statement in the bottom third of the income distribution was much the same as for the upper two-thirds – and higher than the answer the development experts gave for themselves!

A number of other bloggers have picked up on this result, albeit without much discussion of what it implies. The implicit assumption is that development professionals are out of touch with the poor, but I think there are a number of ways we can interpret these results. Here’s the graph in question:


So the first possibility is the implicit one: that Bank staff don’t know what the poor believe, and perhaps even assume the poor are fatalistic to a fault. Development economics is only starting to turn its head towards the intersection of fatalism, aspirations and economic outcomes (see, for example, the recent paper by Kate Orkin and her co-authors on aspirations in Ethiopia). The story that development experts buy into this belief is an easy one to accept, but not necessarily the right one. Note that it doesn’t at all take into account what the truth is, only perceptions.

Imagine your life’s outcomes are determined by (A) your own actions and (B) everything else, including randomness. How much weight would you put on (A) vs (B)? There’s no easy answer to this, but it is perfectly possible that the world’s poor ARE poor because (B) is actually much larger than (A). When you live in a country with terrible institutions, no social safety net, frequent economic or environmental shocks, it becomes very clear that (B) dominates (A).

So the second possibility is that Bank staff aren’t assuming the poor are being fatalistic, but that they are being realistic: that the poor (correctly?) judge that they have little control over their own lives – if they did, then they probably wouldn’t be poor. In this case, if the responses from the above sample are genuine (we might worry that respondents would be unwilling to admit that they have little control), then it’s the poor who have it the wrong way around: they are too optimistic about how much control they have over their own lives.

The second possibility isn’t necessarily any more likely than the first, but we should be cautious about what stories eventually emerge out of the above figure – there are a number of potentially overlapping biases at play, to the extent that it is not just a straightforward story of development professionals not ‘getting’ the poor.

When blind is not beautiful

"Hello? Is it a placebo effect that you're looking for?

“Hello? Is it a placebo effect that you’re looking for?

Note: this is an expanded version of a post published at CGD’s Views from the Center Blog

Over at Boring Development, Francisco Toro picks up on the recent Bulte et al. paper which attempts to implement a double-blind protocol in a ‘standard’ policy RCT. The study’s abstract:

Randomized controlled trials (RCTs) in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement a conventional RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed varieties to a sample of farmers. Effort responses can be quantitatively important—for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got; however, people who knew they had received the traditional seeds did much worse. Importantly, we also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.

So it appears that most of the treatment effects in this study are driven by changes in the behaviour of the farmers who knowingly received modern seed varieties. Toro touts this as some sort of massive blow to the randomista movement:

This gap between the results of the open and the double-blind RCTs raises deeply troubling questions for the whole field. If, as Bulte et al. surmise, virtually the entire performance boost arises from knowing you’re participating in a trial, believing you may be using a better input, and working harder as a result, then all kinds of RCT results we’ve taken as valid come to look very shaky indeed.


Still, the study is an instant landmark: a gauntlet thrown down in front of the large and growing RCT-Industrial Complex. At the very least, it casts serious doubt on the automatic presumption of internal validity that has long attached to open RCTs. And without that presumption, what’s left, really?

Even if you take the results of this paper at face value (and there are some good reasons we shouldn’t), it’s hard to see here why these results should be that troubling.

The reason that medical researchers use a double-blind protocol in clinical trials is to try to pin down the exact physiological impact of a medicine, independent of any conscious or subconscious behavioural response. Placebo effects have been fairly well established, so figuring out that medicine X has an effect above and beyond the health effects created by taking a sugar pill is important. One very important thing to note, however, is that unless there is a no-placebo control group, researchers using a double-blind protocol will be unable to identify the total average treatment effect of a medicine: we will know what the impact is compared to someone else given a pill, but if we randomly selected someone in the population (with the same characteristics as the study group) to receive the treatment, we can’t say much about what the overall effect would be. Also, fairly critically, while double-blind studies allow us to assume that placebo effects are similar across treatment and control groups, we cannot say anything about how they would compare to an explicitly non-blind clinical trial (i.e. placebo effects might be quite different when the treated know they are treated).
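A toy simulation makes this identification point concrete. The numbers below are entirely invented: assume the drug’s physiological effect and the placebo effect simply add. A blinded drug-vs-placebo comparison then recovers only the physiological component, while the “treat someone vs do nothing” effect a policymaker might care about includes the placebo component too.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # hypothetical arm size

# Stylised model: outcome = baseline + physiological effect (if drugged)
#                          + placebo effect (if you *believe* you're treated)
baseline, physio, placebo = 50.0, 5.0, 3.0

drug_arm    = baseline + physio + placebo + rng.normal(0, 10, n)  # real pill, believes treated
placebo_arm = baseline + placebo          + rng.normal(0, 10, n)  # sugar pill, believes treated
no_pill_arm = baseline                    + rng.normal(0, 10, n)  # no pill, no belief

# The blinded contrast identifies the physiological effect only...
blinded_est = drug_arm.mean() - placebo_arm.mean()
# ...while the "drug vs nothing" effect is larger.
total_est = drug_arm.mean() - no_pill_arm.mean()
print(f"blinded estimate: {blinded_est:.2f} (true physio = {physio})")
print(f"drug-vs-nothing estimate: {total_est:.2f} (true = {physio + placebo})")
```

Without the no-pill arm, nothing in the blinded trial alone tells you how big the second number is – which is exactly the gap between the clinical question and the policy question.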

Most development randomistas are answering substantially different questions than medical scientists are. It is fairly easy to establish the efficacy of a set of agricultural inputs in a controlled setting: we know fertilizer ‘works’ in that it improves yields. We know vaccines work in saving lives and that increasing educational inputs, to some extent, improves educational outcomes. This was Jeffrey Sachs’s reasoning when he sold much of the world on the Millennium Village Project: we know what works, we just need to implement it. But most of us running RCTs aren’t interested in the direct impact of an intervention, holding behaviour constant, because it is precisely this behaviour that matters the most. If our question is “do improved seeds work in a controlled setting?” then a double-blind RCT is fine, but if our question is “do improved seeds work when you distribute them openly, as you would in pretty much any standard intervention?” then you need transparent protocols to get at the average treatment effect you are interested in.

Many economists are interested in mechanisms – in picking apart the behavioural responses to a given treatment. In this respect, the Bulte et al. paper is very interesting: here we have an intervention which works primarily through behavioural response rather than through a change in household resources. This is intriguing and worth picking apart to get a better sense of why interventions like these work. However, from the perspective of a policy wonk, we might care less: if you give people improved seeds, then yields go up. If you de-worm children, then schooling goes up. These are answers worth knowing even if that’s all we know.

For those of us interested in behavioural responses, we don’t necessarily need to run around running double-blind RCTs to get a handle on them. Consider this excellent paper by Jishnu Das and others on the effect of anticipated versus unanticipated school grants: when parents knew that their child’s school would be receiving more money, they reduced their own spending on school inputs enough to completely offset the gains from the grants. In a world in which we could have run the grant programme as a blinded RCT, it would have looked like grants were successful in raising test scores – but it would have told us precious little about how grants operate in the real world.

There’s another issue here: imposing blinding in many development RCTs creates some substantial ethical issues. Imagine, for instance, that you could fool a Kenyan farmer into not knowing whether she received high-quality fertilizer or a bag of dirt. The average farmer might behave as if she had received nothing, behave as if she had received a perfectly good bag of fertilizer, or hedge and use only some of it, realizing that it may not be useful. Some of these decisions may be sub-optimal: if the farmer knew she was in the control group, she might have opted for a different planting method, one which would have resulted in a higher yield. In this particular example, obscuring the treatment from our study group actually runs the risk of doing them harm, especially if they believe they are treated and take complementary actions which are in fact wasteful if they are not actually in the treatment group.

The thing you should take away from the Bulte et al. study shouldn’t be “all RCTs are biased because we aren’t measuring placebo effects” but instead “behavioural response matters for evaluating real-world policies.” The latter statement actually reinforces the need for transparent RCTs, rather than attempts to mimic the double-blind nature of clinical trials.

Come work for me in Tanzania (short notice)

We are looking for a short term field manager to run a household survey in Dar es Salaam, Tanzania on short notice.

  • The field manager would oversee data collection exploiting a natural experiment in the roll-out of land titling in the city. The aim of the study is to investigate whether the provision of short-term land titles by the government has led to observable differences in household behaviour and welfare. This work is part of a larger portfolio of research of urban property rights in Tanzania.
  • The position would be for approximately 3 months, based in Dar es Salaam from early January through March 2014
  • Required: we are looking for candidates who have experience with Stata and working with household survey data. A bachelor’s or MA in a quantitative field is preferred. Previous experience working in developing countries, running surveys or managing complex projects is a plus.
  • You would be working with me, Justin Sandefur and Andy Zeitlin on a fixed contract with the University of Oxford and would oversee a third-party firm which would cover all the practical logistics of data collection. We need someone who understands data, can grasp the research design and can ensure quality.

If you are interested, please write to me at this e-mail address: Please send a CV, an e-mail cover letter explaining why you are interested in the position and your relative strengths, and a description of your experience working with data, field experience, etc. In the subject line, please write: “DSM position: <your name here>”. E-mails which fail to do this will not be considered. We will only be in contact if you make the short list.

UPDATE: the position has now been filled. Thank you everyone who submitted.

I felt a great disturbance in the force


“It’s as if millions of economists searching for a natural experiment suddenly cried out in joy.”

From the BBC:

China is to relax its policy of restricting most couples to having only a single child, state media say.

In future, families will be allowed two children if one parent is an only child, the Xinhua news agency said.

The proposal follows this week’s meeting of a key decision-making body of the governing Communist Party.

An apple a day means nothing in a complex system

"But Mulder, what I'm seeing here goes against every single case study and ethnographic paper ever written."

“But Mulder, the evidence I’m seeing here goes against every single case study and ethnographic paper I’ve ever read.”

Recently there has been much fuss made over how researchers and practitioners should be more cognisant of how development policy plays out in environments which are characterised by complexity. While many have used the presence of complex systems to motivate a move towards more experimentation, tracking and empiricism, others have argued that we should instead eschew rigorous empirical methods (such as RCTs) and one-shot policy instruments and opt towards a more dynamic, qualitative approach to development policy.

As of late I have been particularly wary of this second camp, especially when the argument is made that data-driven methods and randomised controlled trials have little place in a world of complexity. Let me explain why this makes me uneasy.

The human body is itself a complex system, characterised by feedback loops and a lot of unknown parameters. Despite the fact that we know a surprising amount about what makes us tick, thanks to both theory and evidence from biology and medical science, we’re surprisingly inept at determining long-term outcomes. Even so, when my complex system throws up signs that things are not well, I go to see my doctor. After examining me and assessing my symptoms, sometimes through laboratory testing, he makes a diagnosis. Based on that diagnosis, he chooses a treatment, often by selecting a pre-approved medication which has been tested using an RCT.

Let’s think about this for a moment. Most medical research is able to cleanly discern the short-term benefits of taking a certain medication. While these medicines are developed using a heavy dose of (biological) theory and iterative testing, trials are rarely long enough to determine what the long-term benefits or side-effects will be. While researchers can use previous results and theory to determine that chemical X will result in reaction Y in a human body, they rarely can account for all the possible effects. Randomised controlled trials get us part of the way there, but frequently cannot account for long-term effects. So, while we can measure the short-run aggregate effect of a treatment on an incredibly complex system, we really can’t say that much about the long term, nor can we say much about how these treatments might interact with other treatments.

In fact, it is in predictions about health over the long term that the precision of experimentation often gives way to less robust evidence (such as extended observational studies) or more ad hoc forms of rationalization (is milk good or bad for you?). Similarly, many of the bigger questions in development (how do we improve institutions? what causes economic growth?) are more difficult to address using the most rigorous methods. It is in these areas that, quite naturally, the randomistas have been least successful in their domination of the policy debate.

While we should find all of this disconcerting, the (current) inability of medical RCTs to give us definitive answers on what makes us live longer or healthier in aggregate is hardly a reason to rely on them any less. Imagine a world in which your doctor didn’t have access to any randomised medical research. Health professionals would have to resort to casual Bayesian inference to treat people (did John die when I gave him chemical Z?), and would have little sense of which medicines were ‘proven’ to work. We tend to look down on off-label use of medication, but in a world where rigorous scientific testing isn’t the norm, all prescriptions become off-label. It is a world not a million miles away from the one portrayed in the Mitchell & Webb sketch “Homoeopathy A&E.”

The sketch also highlights what the development policy world is like when we toss out rigorous empirical evidence. Yes, decisions are made based on qualitative expertise, but they are made without either definitive evidence (did this make a difference?) or appropriate empirical feedback (are things getting better?). A healthy dose of qualitative work is essential in development policy-making, but a world in which all decisions are made qualitatively is far from ideal: how many of you would wish to be treated by a doctor who had been practising for 40 years but had never read (or believed) a single medical study?

Just as medicines shown to work using rigorous clinical trials are an essential tool for a doctor navigating the complexities of human health, policies which have been shown to work in some context with an RCT become one of many tools policy-makers can use when operating within a complex policy environment. These types of rigorous trials certainly won’t solve all of our problems, but they are still extremely, extremely useful, even in a complex system. I’m glad that someone is putting out useful albeit marginal medicines which make me feel better when I get sick. It would be even better if someone could figure out more comprehensive interventions which take into account my entire biology, but in the meantime I’ll take what I can get.

The unbearable lightness of being a dead salmon

Oh my god, how many salmon died by my hands during my Odell Lake sessions?

The neuroscientist Gregory Berns describes how he used functional MRI scans to argue that dogs have basic emotional responses similar to those of humans, roughly equivalent to the sentience of a human child:

In dogs, we found that activity in the caudate increased in response to hand signals indicating food. The caudate also activated to the smells of familiar humans. And in preliminary tests, it activated to the return of an owner who had momentarily stepped out of view. Do these findings prove that dogs love us? Not quite. But many of the same things that activate the human caudate, which are associated with positive emotions, also activate the dog caudate. Neuroscientists call this a functional homology, and it may be an indication of canine emotions.

The ability to experience positive emotions, like love and attachment, would mean that dogs have a level of sentience comparable to that of a human child. And this ability suggests a rethinking of how we treat dogs.

Interesting. I’m not so sure that sentience is a necessary prerequisite for the humane treatment of animals, but it certainly would add weight to the moral argument.

Except I’m not so sure I believe the result. So I had a quick look at the study, and it appears this entire result is based on just two dogs, and the statistically significant increase in caudate activity appears to be restricted to a specific point in time after the test (while the caudate activity seems to be significant X seconds after, it doesn’t appear to be significant X-1 or X+1 seconds after, which does raise some suspicions that the researchers cherry-picked the results). The presence of similar activity in two dogs does bolster the result somewhat… but come on, two dogs?

This might be a good time to bring up the famous dead salmon study again. fMRI studies are known for being particularly dodgy on the statistical inference front. A few years ago a group of Dartmouth scientists highlighted this point when they put a dead Atlantic salmon in an fMRI scanner, showed it pictures of humans, and managed to detect a statistically significant emotional response.

You see, fMRI is apparently a particularly noisy way of measuring brain activity, so it’s fairly easy to throw up false positives, especially given that the unit of analysis is essentially a voxel. The Dartmouth study revealed what happens when you don’t make simple corrections to account for this. Now, Berns’s study did make these corrections, so we can’t quite claim that dead salmon have a similar level of sentience to dogs. Still, we should be somewhat sceptical of an fMRI result until it is replicated.
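To see how easily pure noise “lights up” without a multiple-comparisons correction, here’s a small sketch (the voxel and scan counts are illustrative, not from the salmon study): test thousands of noise-only voxels at p < 0.05 and you get hundreds of hits; a Bonferroni-style correction kills almost all of them.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)
n_voxels, n_scans = 8_000, 30  # illustrative; a real scan has far more voxels

# Our "dead salmon": every voxel is pure noise, with no real signal anywhere.
signals = rng.normal(0.0, 1.0, size=(n_voxels, n_scans))

# One-sample test per voxel against a true mean of zero
# (normal approximation to the t-test, for simplicity)
z = signals.mean(axis=1) / (signals.std(axis=1, ddof=1) / np.sqrt(n_scans))
p = np.array([erfc(abs(zi) / sqrt(2)) for zi in z])  # two-sided p-values

naive_hits = int((p < 0.05).sum())                 # roughly 5% of 8,000 voxels
corrected_hits = int((p < 0.05 / n_voxels).sum())  # Bonferroni: almost none
print(f"'active' voxels, uncorrected: {naive_hits}")
print(f"'active' voxels, Bonferroni-corrected: {corrected_hits}")
```

Real fMRI analyses use fancier corrections (cluster thresholds, false discovery rates), but the basic logic is the same: with this many tests, an uncorrected threshold guarantees “activity” somewhere, even in a dead fish.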

Of tribes and titles


“Sol, we’re new here and don’t really know anybody, so get over to Swearengen and secure us a title deed to some property.”

Things have been a bit quiet recently – part of this is due to a lengthy field-based ethnographic research trip focused on the interaction between late-80s and early-90s UK dance music and Croatian culture. I was also tied up by the always-impressive ‘Growth Week’ held by the International Growth Centre at LSE. I’ll let you guess which was more fun.

So let’s start with some blatant self-promotion – I’ve got a new working paper out. Here’s the short, short version: most unplanned settlements or ‘slums’ in SSA are dominated by informal tenure, where your right over land is more likely to be determined by customary law, social connections, or ad hoc semi-formal methods of establishing occupancy than by a formal land title. Some households are going to have an easier time securing their tenure through informal means; others, who face higher costs of doing so, might be more likely to accept property rights provided by the state. I examine this by looking at whether households in Dar es Salaam which are ethnically isolated (surrounded by neighbours from other tribes) are more likely to buy property rights offered by the Tanzanian government.

For more detail, head over to the CSAE blog, where I discuss the paper further.

I’ll leave you with an image which sums up all the fears and uncertainties of tenure in slums: a landowner on Oxford Street, Accra, who desperately wants to avoid the sale of his/her property (thanks to Elwyn Davies for this photo):


On the ethical approval of RCTs

From Nicolas A. Christakis:

Incidentally, another thing that’s fascinating to me is that, there’s a very funny saying when it comes to the ethical review of science, or an anecdote, which is that if a doctor wakes up in the morning and decides that, for the next 100 patients with cancer that he or she sees that have this condition, he’s going to treat them all with this new drug because he thinks that drug works, he can do that. He doesn’t need to get anyone’s permission. He can use any drug “off-label” he wants when, in his judgment, it is helpful to the patient. He’ll talk to the patient. He needs to get the patient’s consent. He can’t administer the drug without the patient knowing. But, he can say to the patient, “I recommend that you do this,” and he can make this recommendation to every one of the next 100 patients he sees.

If, on the other hand, the doctor is more humble, and more judicious, and says “you know, I’m not sure that this drug works, I’m going to only give it to half of the next 100 patients I see,” then he needs to get IRB approval, because that’s research. So even though he’s giving it to fewer patients, now there’s more review.

It would be interesting to think of the off-label analogues in development. You could argue that a lot of new government policy is essentially off-label.

Hat tip to Marginal Revolution.

On day care and large impacts


“Stop whining! You kids are soft. You lack discipline. Think of the cognitive benefits.”

When I was five I distinctly remember my parents debating whether or not they should leave me at a day care centre for the afternoon, or bring me with them to the showing of Tim Burton’s Batman. They eventually decided I was old enough to come along. I was absolutely terrified, especially during that scene where the Joker shocked some guy to death with an electric buzzer.

I’m sure my traumatized state detracted from my parents’ enjoyment of Michael Keaton’s performance. Indeed, there are a lot of reasons to think day care might be a valuable service for households – not only because parents can go off and enjoy Batman films unhindered by easily-scared children, but because – if the child is young enough – the alternative to day care involves someone in the family staying home rather than working or going to school.

Today at the Young Lives conference I saw Pedro Carneiro present a paper (see the talk here) which suggests that access to day care might seriously improve household welfare, at least within the context of poor slums in Brazil. Carneiro was lucky enough to stumble across a nice natural experiment: although there were several eligibility factors and a little bit of discretionary selection, most households living in the favelas only received access to state-provided day care if they were allocated a slot through a lottery. Thus it was relatively simple to see how households fared several years after being allocated a slot.

The results were a bit astonishing – household income went up (8%!), as did the labour supply of the carer (usually the mother). Children also fared better in terms of cognitive and anthropometric outcomes. Carneiro very much sold this as a story of day care freeing up the time of the carer, which led to more work and thus more income. It is still unclear whether the effects on children were due directly to the day care centres themselves or indirectly to the change in household and carer characteristics. This is an important distinction – if all the effects are driven by the indirect channel, then we might focus on just getting kids out of the favelas for the day, rather than worrying as much about the educational quality of day care. If they run through the centres themselves, quality matters – especially if carer work effort is complementary to child education (I see my kid is learning a lot at the day care centre, so I’ll work a bit more to buy, for example, some books for her to read at home).

During the discussion, I asked the question which economists love to ask when they aren’t sure what else to ask: if there are $100 bills on the sidewalk, why is no one picking them up? Translation: if day care in Brazil offers these amazing returns to people living in favelas, why aren’t we seeing either (A) a lot more private provision or (B) more local collective action, where neighbours coordinate to watch each other’s kids and free up time to go out and work more? Carneiro argued that not everyone understood these benefits – I found this hard to believe, given the extremely high levels of demand for day care (50% of those who didn’t win the lottery still managed to get their kid into a day care centre). Lee gave a more convincing answer: perhaps violent favelas are just awful places to have day care centres.