When blind is not beautiful

“Hello? Is it a placebo effect that you’re looking for?”

Over at Boring Development, Francisco Toro picks up on the recent Bulte et al. paper, which attempts to implement a double-blind protocol in a `standard’ policy RCT. The study’s abstract:

Randomized controlled trials (RCTs) in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement a conventional RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed varieties to a sample of farmers. Effort responses can be quantitatively important—for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got; however, people who knew they had received the traditional seeds did much worse. Importantly, we also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.

So it appears that most of the treatment effects in this study are driven by changes in behaviour among the farmers who knowingly received modern seed varieties. Toro touts this as some sort of massive blow to the randomista movement:

This gap between the results of the open and the double-blind RCTs raises deeply troubling questions for the whole field. If, as Bulte et al. surmise, virtually the entire performance boost arises from knowing you’re participating in a trial, believing you may be using a better input, and working harder as a result, then all kinds of RCT results we’ve taken as valid come to look very shaky indeed.


Still, the study is an instant landmark: a gauntlet thrown down in front of the large and growing RCT-Industrial Complex. At the very least, it casts serious doubt on the automatic presumption of internal validity that has long attached to open RCTs. And without that presumption, what’s left, really?

Even if you take the results of this paper at face value (and there are some good reasons we shouldn’t), it’s hard to see why they should be that troubling.

The reason that medical researchers use double-blind protocols in clinical trials is to try and pin down the exact physiological impact of a medicine, independent of any conscious or subconscious behavioural response. Placebo effects have been fairly well established, so figuring out that medicine X has an effect above and beyond the health effects created by taking a sugar pill is important. One very important thing to note, however, is that unless there is a no-placebo control group, researchers using a double-blind protocol will be unable to identify the total average treatment effect of a medicine: we will know what the impact is compared to someone else given a pill, but if we randomly selected someone in the population (with the same characteristics as the study group) to receive the treatment, we can’t say much about what the overall effect will be. Also, fairly critically, while double-blind studies allow us to make the assumption that placebo effects are similar across treatment and control groups, we cannot say anything about how they would compare with those in an explicitly non-blind clinical trial (i.e. placebo effects might be quite different when the treated know they are treated).

Most development randomistas are answering substantially different questions than medical scientists. It is fairly easy to establish the efficacy of a set of agricultural inputs in a controlled setting: we know fertilizer `works’ in that it improves yields. We know vaccines work in saving lives and that increasing educational inputs, to some extent, improves educational outcomes. This was Jeffrey Sachs’s reasoning when he sold much of the world on the Millennium Village Project: we know what works, we just need to implement it. But most of us running RCTs aren’t interested in the direct impact of an intervention, holding behaviour constant, because it is precisely this behaviour that matters the most. If our question is “do improved seeds work in a controlled setting?” then a double-blind RCT is well and fine, but if our question is, “do improved seeds work when you distribute them openly, as you would do in pretty much any standard intervention,” then you need transparent protocols to get at the average treatment effect you are interested in.
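To make the difference between the two questions concrete, here is a minimal sketch in Python. Every name and number in it is invented for illustration; nothing is taken from the Bulte et al. data. It assumes a yield that is linear in seed type and effort, and it assumes that farmers who are unsure what they received work as hard as those told they got modern seed (roughly the pattern the abstract describes).

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    """Hypothetical parameters, chosen only to illustrate the argument."""
    base_yield: float = 100.0              # yield with traditional seed, zero extra effort
    seed_effect: float = 0.0               # direct effect of modern seed, effort held fixed
    effort_return: float = 10.0            # extra yield per unit of effort
    effort_told_modern: float = 3.0        # effort when farmers know they got modern seed
    effort_told_traditional: float = 1.0   # effort when farmers know they got traditional seed
    effort_blind: float = 3.0              # effort when farmers do not know which seed they got


def expected_yield(a: Assumptions, modern: bool, effort: float) -> float:
    """Linear toy model: base + direct seed effect + return to effort."""
    return a.base_yield + a.seed_effect * (1.0 if modern else 0.0) + a.effort_return * effort


def open_rct_effect(a: Assumptions) -> float:
    """Treatment-control gap when everyone knows which seed they received."""
    treated = expected_yield(a, True, a.effort_told_modern)
    control = expected_yield(a, False, a.effort_told_traditional)
    return treated - control


def double_blind_effect(a: Assumptions) -> float:
    """Treatment-control gap when effort is the same in both arms."""
    treated = expected_yield(a, True, a.effort_blind)
    control = expected_yield(a, False, a.effort_blind)
    return treated - control


a = Assumptions()
print("open RCT effect:    ", open_rct_effect(a))      # 20.0
print("double-blind effect:", double_blind_effect(a))  # 0.0
```

Under these made-up numbers the open trial recovers the seed effect plus the effort response, while the blinded trial recovers only the seed effect. Neither number is wrong; they answer different questions.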

Many economists are interested in mechanisms – in picking apart the behavioural responses to a given treatment. In this respect, the Bulte et al. paper is very interesting: here we have an intervention which works primarily through behavioural response rather than a change in household resources, etc. This is intriguing and worth picking apart for getting a better sense of why interventions like these work. However, from the perspective of a policy wonk, we might care less: if you give people improved seeds then yields go up. If you de-worm children then schooling goes up. These are answers worth knowing even if that’s all we know.

For those of us interested in behavioural responses, we don’t necessarily need to run around running double-blind RCTs to get a handle on them. Consider this excellent paper by Jishnu Das and others on the effect of anticipated versus unanticipated school grants: when parents knew that their child’s school would be receiving more money, they reduced their own spending on school inputs enough to completely offset the gains from the grants. In a world in which we could have run the grant programme as an RCT, it would have looked like grants were successful in raising test scores – but it would have told us precious little about how grants operate in the real world.

There’s another issue here: imposing blinding in many development RCTs creates some substantial ethical issues. Imagine, for instance, that you could fool a Kenyan farmer into not knowing whether she had received high quality fertilizer or a bag of dirt. The average farmer might behave as if she has received nothing; she might also behave as if she had received a perfectly good bag of fertilizer, or she might hedge and use some of it, realizing that it may not be useful. Some of these decisions may be sub-optimal: if the farmer knew she was in the control group, she might have opted for a different planting method, one which would have resulted in a higher yield. In this particular example, obscuring the treatment from our study group actually runs the risk of doing them harm, especially if they believe they are treated and take complementary actions which are in fact wasteful if they are not actually in the treatment group.

The thing you should take away from the Bulte et al. study shouldn’t be “all RCTs are biased because we aren’t measuring placebo effects” but instead “behavioural response matters for evaluating real-world policies.” The latter statement actually reinforces the need to have transparent RCTs, rather than to try and mimic the double-blind nature of clinical trials.

Come work for me in Tanzania (short notice)

We are looking for a short term field manager to run a household survey in Dar es Salaam, Tanzania on short notice.

  • The field manager would oversee data collection exploiting a natural experiment in the roll-out of land titling in the city. The aim of the study is to investigate whether the provision of short-term land titles by the government has led to observable differences in household behaviour and welfare. This work is part of a larger portfolio of research of urban property rights in Tanzania.
  • The position would be for approximately 3 months, based in Dar es Salaam from early January through March 2014.
  • Required: we are looking for candidates who have experience with Stata and working with household survey data. A bachelor’s or MA in a quantitative field is preferred. Previous experience working in developing countries, running surveys or managing complex projects is a plus.
  • You would be working with me, Justin Sandefur and Andy Zeitlin on a fixed contract with the University of Oxford and would oversee a third-party firm which would cover all the practical logistics of data collection. We need someone who understands data, can grasp the research design and can ensure quality.

If you are interested, please write to me at this e-mail address: matt@aidthoughts.org. Please send a CV, an e-mail cover letter explaining why you are interested in the position and your relative strengths, and a description of your experience working with data, field experience, etc. In the subject line, please write: “DSM position: <your name here>”. E-mails which fail to do this will not be considered. We will only be in contact if you make the short list.

UPDATE: the position has now been filled. Thank you everyone who submitted.

I felt a great disturbance in the force


“It’s as if millions of economists searching for a natural experiment suddenly cried out in joy.”

From the BBC:

China is to relax its policy of restricting most couples to having only a single child, state media say.

In future, families will be allowed two children if one parent is an only child, the Xinhua news agency said.

The proposal follows this week’s meeting of a key decision-making body of the governing Communist Party.

An apple a day means nothing in a complex system

“But Mulder, the evidence I’m seeing here goes against every single case study and ethnographic paper I’ve ever read.”

Recently there has been much fuss made over how researchers and practitioners should be more cognisant of how development policy plays out in environments which are characterised by complexity. While many have used the presence of complex systems to motivate a move towards more experimentation, tracking and empiricism, others have argued that we should instead eschew rigorous empirical methods (such as RCTs) and one-shot policy instruments and opt for a more dynamic, qualitative approach to development policy.

As of late I have been particularly wary of this second camp, especially when the argument is that data-driven methods and randomised controlled trials have little place in a world of complexity. Let me explain why this makes me uneasy.

The human body is itself a complex system, characterised by feedback loops and a lot of unknown parameters. Despite the fact that we know a surprising amount about what makes us tick, thanks to both theory and evidence from biology and medical science, we’re surprisingly inept at determining long term outcomes. Even so, when my complex system throws up signs that things are not well, I go to see my doctor. After examining me and assessing my symptoms, sometimes through laboratory testing, he makes a diagnosis. Based on that diagnosis, he chooses a treatment, often by selecting a pre-approved medication which has been tested using an RCT.

Let’s think about this for a moment. Most medical research is able to cleanly discern short-term benefits to taking a certain medication. While these medicines are developed using a heavy dose of (biological) theory and iterative testing, trials are rarely long enough to determine what the long term benefits or side-effects will be. While researchers can use previous results and theory to determine that chemical X will result in reaction Y in a human body, they rarely can account for all the possible effects. Randomised controlled trials get us part of the way there, but frequently cannot account for long term effects. So, while we can measure the aggregate effect of a treatment on an incredibly complex system in the short run, we really can’t say that much about the long term, nor can we say much about how these treatments might interact with other treatments.

In fact, it is with predictions about health over the long term where the precision of experimentation often gives way to less robust evidence (such as extended observational studies) or more ad hoc forms of rationalization (is milk good or bad for you?). Similarly, many of the bigger questions in development (how do we improve institutions? What causes economic growth?) are more difficult to address using the most rigorous methods. It is in these areas that, quite naturally, the randomistas have been least successful in their domination of the policy debate.

While we should find all of this disconcerting, the (current) inability of medical RCTs to give us definitive answers on what makes us live longer or be healthier in aggregate is hardly a reason we should rely on them any less. Imagine a world in which your doctor didn’t have access to any randomised medical research. Health professionals would have to resort to casual Bayesian inference to treat people (did John die when I gave him chemical Z?), and would have little sense of which medicines were `proven’ to work. We tend to look down on off-label use of medication, but in a world where rigorous scientific testing isn’t the norm, all prescriptions become off-label. It is a world not a million miles away from the one portrayed in the Mitchell & Webb sketch “Homoeopathy A&E.”

The sketch also highlights what the development policy world is like when we toss out rigorous empirical evidence. Yes, decisions are made based on qualitative expertise, but they are made without either definitive evidence (did this make a difference?) or appropriate empirical feedback (are things getting better?). A healthy dose of qualitative work is essential in development policy-making, but a world in which all decisions are made qualitatively is far from ideal: how many of you would wish to be treated by that doctor who had been practising for 40 years, but had never read (or believed) a single medical study?

Just as medicines shown to work using rigorous clinical trials are an essential tool for a doctor navigating the complexities of human health, policies which have been shown to work in some context with an RCT become one of many tools policy-makers can use when operating within a complex policy environment. These types of rigorous trials certainly won’t solve all of our problems, but they are still extremely, extremely useful, even in a complex system. I’m glad that someone is putting out useful albeit marginal medicines which make me feel better when I get sick. It would be even better if someone could figure out more comprehensive interventions which take into account my entire biology, but in the meantime I’ll take what I can get.

The unbearable lightness of being a dead salmon

Oh my god, how many salmon died by my hands during my Odell Lake sessions?

The neuroscientist Gregory Berns describes how he used functional MRI scans to show that dogs have basic emotional reactions similar to those of humans, which he takes to be roughly equivalent to the level of sentience of a human child:

In dogs, we found that activity in the caudate increased in response to hand signals indicating food. The caudate also activated to the smells of familiar humans. And in preliminary tests, it activated to the return of an owner who had momentarily stepped out of view. Do these findings prove that dogs love us? Not quite. But many of the same things that activate the human caudate, which are associated with positive emotions, also activate the dog caudate. Neuroscientists call this a functional homology, and it may be an indication of canine emotions.

The ability to experience positive emotions, like love and attachment, would mean that dogs have a level of sentience comparable to that of a human child. And this ability suggests a rethinking of how we treat dogs.

Interesting. I’m not so sure that sentience is a necessary prerequisite for the humane treatment of animals, but it certainly would add weight to the moral argument.

Except I’m not so sure I believe the result. So I had a quick look at the study and it appears this entire result is based on just two dogs, and the statistically significant increase in caudate activity appears to be restricted to a specific point in time after the test (so while the caudate activity seems to be significant X seconds after, it doesn’t appear to be significant X-1 or X+1 seconds after, which does raise some suspicions that the researchers cherry-picked the results). The presence of similar activity in two dogs does bolster the result somewhat… but come on, two dogs?

This might be a good time to bring up the famous dead salmon study again. fMRI studies are known for being particularly dodgy on the statistical inference front. A few years ago a group of Dartmouth scientists highlighted this point when they put a dead Atlantic salmon in an fMRI scanner, showed it pictures of humans, and managed to get a statistically significant emotional response.

You see, fMRI is apparently a particularly noisy way of measuring brain activity, so it’s fairly easy to throw up false positives, especially given that the unit of analysis is essentially a voxel and a single scan contains a great many of them. The Dartmouth study revealed what happens when you don’t make simple corrections to account for this. Now, Berns’s study did make these corrections, so we can’t quite claim that dead salmon have a similar level of sentience to dogs. Still, we should be somewhat sceptical of an fMRI result until it is replicated.
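The false-positive problem is easy to sketch. What follows is a toy simulation in Python, not a reconstruction of the Dartmouth analysis: the voxel count, scan count and significance threshold are made-up numbers, the `count_active_voxels` helper is hypothetical, and Bonferroni is just one of several possible corrections. Every voxel is pure noise with a known variance, so any `activation’ is a false positive by construction.

```python
import random
from statistics import NormalDist


def count_active_voxels(n_voxels: int, n_scans: int, alpha: float, seed: int = 0) -> int:
    """Count pure-noise voxels that pass a one-sided z-test at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # critical value for the test
    active = 0
    for _ in range(n_voxels):
        # Each voxel's signal is standard normal noise: there is no real effect.
        signal = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
        # The sum of n unit-variance draws has standard deviation sqrt(n).
        z = sum(signal) / n_scans ** 0.5
        if z > z_crit:
            active += 1
    return active


N_VOXELS = 20_000   # assumed number of voxels, for illustration only
N_SCANS = 10        # assumed number of scans per voxel
ALPHA = 0.001       # assumed per-voxel threshold

uncorrected = count_active_voxels(N_VOXELS, N_SCANS, ALPHA)
bonferroni = count_active_voxels(N_VOXELS, N_SCANS, ALPHA / N_VOXELS)

print("voxels 'active' without correction:", uncorrected)
print("voxels 'active' with Bonferroni:   ", bonferroni)
```

With 20,000 independent tests at the 0.001 level we should expect around 20 voxels to light up by chance alone, which is more than enough to draw a convincing-looking blob on a dead fish. Dividing the threshold by the number of tests makes such hits very rare.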

Of tribes and titles


“Sol, we’re new here and don’t really know anybody, so get over to Swearengen and secure us a title deed to some property.”

Things have been a bit quiet recently – part of this is due to a lengthy field-based ethnographic research trip focused on the interaction between late 80s and early 90s UK dance music and Croatian culture. I also was tied up by the always-impressive `Growth Week’ held by the International Growth Centre at LSE. I’ll let you guess which was more fun.

So let’s start with some blatant self promotion – I’ve got a new working paper out. Here’s the short, short version: most unplanned settlements or `slums’ in most of SSA are dominated by informal tenure, where your right over land is more likely to be determined by customary law, social connections, or ad hoc semi-formal methods of establishing occupancy, than it is by a formal land title. Some households are going to have an easier time of securing their tenure through informal means; others, who face higher costs of doing so, might be more likely to accept property rights provided by the state. I examine this by looking to see whether or not households in Dar es Salaam which are ethnically isolated (surrounded by neighbours from other tribes) are more likely to buy property rights offered by the Tanzanian government.

For more detail, head over to the CSAE blog, where I discuss the paper at greater length.

I’ll leave you with an image which sums up all the fears and uncertainties of tenure in slums: a landowner on Oxford Street, Accra, who desperately wants to avoid the sale of his/her property (thanks to Elwyn Davies for this photo):


On the ethical approval of RCTs

From Nicholas A. Christakis:

Incidentally, another thing that’s fascinating to me is that, there’s a very funny saying when it comes to the ethical review of science, or an anecdote, which is that if a doctor wakes up in the morning and decides that, for the next 100 patients with cancer that he or she sees that have this condition, he’s going to treat them all with this new drug because he thinks that drug works, he can do that. He doesn’t need to get anyone’s permission. He can use any drug “off-label” he wants when, in his judgment, it is helpful to the patient. He’ll talk to the patient. He needs to get the patient’s consent. He can’t administer the drug without the patient knowing. But, he can say to the patient, “I recommend that you do this,” and he can make this recommendation to every one of the next 100 patients he sees.

If, on the other hand, the doctor is more humble, and more judicious, and says “you know, I’m not sure that this drug works, I’m going to only give it to half of the next 100 patients I see,” then he needs to get IRB approval, because that’s research. So even though he’s giving it to fewer patients, now there’s more review.

It would be interesting to think of the off-label analogues in development. You could argue that a lot of new government policy is essentially off-label.

Hat tip to Marginal Revolution.

On day care and large impacts


“Stop whining! You kids are soft. You lack discipline. Think of the cognitive benefits.”

When I was five I distinctly remember my parents debating whether or not they should leave me at a day care centre for the afternoon, or bring me with them to the showing of Tim Burton’s Batman. They eventually decided I was old enough to come along. I was absolutely terrified, especially during that scene where the Joker shocked some guy to death with an electric buzzer.

I’m sure my traumatized state detracted from my parents’ enjoyment of Michael Keaton’s performance. Indeed, there are a lot of reasons to think day care might be a valuable service for households – not only because they can go off and enjoy Batman films unhindered by easily-scared children, but because – if the child is young enough – the alternative to day care involves someone in the family staying home, rather than working or going to school.

Today at the Young Lives conference I saw Pedro Carneiro present a paper (see the talk here) which suggests that access to daycare might seriously improve household welfare, at least within the context of poor slums in Brazil. Carneiro was lucky enough to stumble across a nice natural experiment: although there were several eligibility factors, and a little bit of discretionary selection, most households living in the favelas only received access to state-provided daycare if they were allocated a slot through a lottery. Thus it was relatively simple to see how households fared several years after being allocated a slot.

The results were a bit astonishing – household income went up (8%!), as did the labour supply of the carer (usually the mother). Children also fared better in terms of cognitive and anthropometric outcomes. Carneiro very much sold this as a story of day care freeing up the time of the carer, which led to more work and thus more income. It is still unclear whether or not the effects on children were directly due to the day care centres themselves or indirectly through the change in household and carer characteristics. This is kind of an important distinction – if all the effects are driven by the latter channel, then we might focus on just getting kids out of the favelas for the day, rather than worrying as much about the educational quality of day care. If the former, then the quality of day care matters a good deal more, especially if carer work effort is one of the activities which are complementary to child education (I see my kid is learning a lot at the day care centre, so I’ll work a bit more to buy, for example, some books for her to read at home).

During the discussion, I asked a question which economists love to ask when they aren’t sure what else to ask: if there are $100 bills on the sidewalk, why is no one picking them up? Translation: if day care in Brazil offers these amazing returns to people living in favelas, why aren’t we seeing either A) a lot more private provision or B) more local collective action, where neighbours coordinate to watch each other’s kids and free up time to go out and work more? Carneiro argued that not everyone understood these benefits – I found this hard to believe, given the extremely high levels of demand for day care (50% of those who didn’t win the lottery still managed to get their kid into a day care centre). Lee gave a more convincing answer: perhaps violent favelas are just awful places to have daycare centres.

Too cool for school

“Perhaps it’s time to re-examine the notion that kids really love going to school.”

Despite, as Lee pointed out, having a bloody good time at the Cowley Road Carnival yesterday, I rolled out of bed in time today to get to the Young Lives conference on inequalities in child outcomes. I’ll share some thoughts on the other plenaries later on, but I was particularly entertained by the final talk of the day by Lant Pritchett on why kids in developing countries might not always want to spend all of their time in school.

Pritchett’s point was fairly simple: in many settings school can be a pretty awful place to be, especially if the curriculum is moving faster than you can keep up with it. Eventually, all but a select few are left behind, leading to a “flattening out” of the learning curve. At this point, you can’t really learn anything when you are this far behind, so why stick around? At one point – and without warning – Pritchett presented an entire slide in Spanish, to give the audience a sense of how this must feel.

His argument was backed up by some fairly disconcerting evidence – Karthik Muralidharan had presented results showing that learning trajectories were nearly flat in many Indian schools, the result of a system which adheres too strictly to a curriculum designed to weed out the best at the expense of other children (which Pritchett referred to as the Russian gymnastics theory of education).

This all reminded me of my time spent running a survey in Dar es Salaam – for simplicity and safety I would meet with my enumerators within a primary school compound. Often, when the school’s security guards opened the front gate for me, they’d physically strike at children with a switch to prevent them from slipping out and running off. Not exactly the picture painted by most of those working on education in developing countries.

While Pritchett laid most of the blame on an overambitious curriculum, there were some complaints about teachers themselves, especially from the audience, who pointed out that dismal learning outcomes were equally a result of teacher discrimination and absenteeism.

This is a popular line to take nowadays – and has led to a focus on interventions which directly change incentives for teachers, such as improving local accountability, performance pay, or using cameras to make sure they show up. Of course, these interventions feel increasingly marginal when the entire system is broken.

We also tend to forget that schools can also be miserable places for teachers. You might have to live in places you really don’t want to live in. You might have to teach dozens of children whose learning levels vary enormously. Not everyone can be Edward James Olmos. When I lived in Malawi, I briefly volunteered at a local orphanage – attempting to teach math to a group of kids aged 8-15, whose understanding was all over the place. I lasted one day.

Perhaps the most successful interventions are those which are complementary – incentivising both teachers and students to show up and make things happen.

PS – there should be a video of Pritchett’s talk up sometime soon – watch this space. Dude is so famous he doesn’t even bother wearing a name tag.

In which Roving Bandit and I join forces, for the greater good!


Next week Lee and I will be attending the Young Lives conference on child inequality in developing countries, being held here in Oxford. They’ve got some great speakers lined up (including my former supervisor Stefan Dercon) and a lot of interesting papers (including several by *ahem* good friends of mine – you can see a draft programme here). If you’re interested in child development at all, take a look here. We’ll be blogging as interesting things crop up, as well as tweeting at #younglives.