A randomista for hire is a dangerous thing

Our research shows that the treated (caged) group was 30% more likely to return home than the control (non-caged) group.

The Behavioural Insights Team is a research unit made up of randomistas who prefer to rely on behavioural economics and psychology to develop and test `nudges’ to achieve certain policy goals. They originally grew out of the Cabinet Office, but eventually went private (the CO has retained a stake in the BIT).

I was always excited by the mere existence of the Behavioural Insights Team – this was the first clear example of government investing in rigorous randomisation to test some of its policies.

That said, while the BIT likely comprises a group of people who want to make the world a better place, they are beholden to their clients. One of these clients is the Home Office, which is currently paying the BIT to find ways to convince illegal migrants to voluntarily leave the UK. From the BIT’s update report:

Increasing voluntary departures of illegal migrants

BIT has been working with the Home Office to consider new measures to help illegal migrants to voluntarily return home, focusing initially on engagement at reporting centres. Reporting centres are seen as an important but underutilised opportunity to prompt illegal migrants to consider whether leaving the UK voluntarily would be a preferable option in their circumstances.

Starting in December 2014, BIT undertook a short piece of ethnographic research at reporting centres across London, reviewing current procedures and interaction points to gain an understanding of the reporting centre experience from the perspective of a member of the reporting population and the reporting agent.

Informed by this, BIT developed several options for Home Office consideration to employ behaviourally informed trials in reporting centres that could encourage higher numbers of voluntary departures from the UK.

At this stage, the precise scope of a trial is still being finalised, with the aim to combine a number of behavioural elements to create a distinct reporting centre experience that encourages members of the reporting population to consider voluntary departure as an alternative to their current situation.

Note that many people who end up in reporting centres are asylum seekers, not just illegal `economic’ migrants. The BIT has another project in the pipeline aimed at targeting businesses that hire illegal migrants, with a similar end goal of convincing the migrants to voluntarily go home. The Home Office got a lot of pushback when it tried this before, in the not-too-subtle form of a van driving around telling migrants to go home.


So now the UK government has turned to more insidious methods, aided by a team of randomistas. It’s a useful reminder that rigorous, evidence-based methods can be used in the service of stupid, short-sighted policy as well.


*Disclaimer: I once applied to work at the BIT, but dropped out midway through the selection process to work on a project in Oxford.

The IMF, inequality and the trickle-down of empirical research

“It took so many assumptions to put you together!”

By Nicolas Van de Sijpe

A recent IMF staff discussion note has received a lot of attention for claiming that a smaller income share of the poor lowers economic growth (see also here and here). This piece in the FT is fairly typical, arguing that the paper “establishes a direct link between how income is distributed and national growth.”

It quotes Nicolas Mombrial, head of Oxfam International’s office in Washington DC, saying that (my emphasis): “the IMF proves that making the rich richer does not work for growth, while focusing on the poor and the middle class does” and that “the IMF has shown that `trickle down’ economics is dead; you cannot rely on the spoils of the extremely wealthy to benefit the rest of us.”

The aim of this blog post is to clarify that the results in Table 1 of the paper, which are based on system GMM estimation, rely on assumptions that are not spelled out explicitly and whose validity is therefore very difficult to assess. In not reporting this and other relevant information, the paper’s application of system GMM falls short of current best practices. As a result, without this additional information, I would be wary of updating my prior on the effect of inequality on growth based on the new results reported in this paper.

The paper attempts to establish the causal effect of various income quintiles (the share of income accruing to the bottom 20%, the next 20% etc.) on economic growth. It finds that a country will grow faster if the share of income held by the bottom three quintiles increases. In contrast, a higher income share for the richest 20% reduces growth. As you can imagine, establishing such a causal effect is difficult: growth might affect how income is distributed, and numerous other variables (openness to trade, institutions, policy choices…) might affect both growth and the distribution of income. Clearly, this implies that any association found between the income distribution and growth might reflect things other than just the causal effect of the former on the latter.

To try to get around this problem, the authors use a system GMM estimator. This estimator consists of (i) differenced equations where the changes in the variables are instrumented by their lagged levels and (ii) equations in levels where the levels of variables are instrumented by their lagged differences (Bond, 2002, is an excellent introduction). Roughly speaking, the hope is that these lagged levels and differences isolate bits of variation in income share quintiles that are not affected by growth or any of the omitted variables. These bits of variation can then be used to identify the causal effect of the income distribution on growth. The problem with the IMF paper is that it does not tell you exactly which lagged levels and differences it uses as instruments, making it hard for readers to assess how plausible it is that the paper has identified a causal effect.
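For readers who want to see what (i) and (ii) look like on paper, here is a rough sketch of the textbook setup. The notation is my own assumption rather than anything taken from the IMF note: y is log income, x is an income share treated as endogenous, η is a fixed country effect and v is the remaining error. The choice of which lags s to use is exactly the detail the paper does not report.

```latex
% Illustrative notation only (requires amsmath); not taken from the IMF note.
\begin{align*}
\text{model:}\quad & y_{it} = \alpha\, y_{i,t-1} + \beta\, x_{it} + \eta_i + v_{it} \\
\text{(i) differenced equations:}\quad
  & E\left[ y_{i,t-s}\, \Delta v_{it} \right] = 0, \quad
    E\left[ x_{i,t-s}\, \Delta v_{it} \right] = 0, \quad s \ge 2 \\
\text{(ii) levels equations:}\quad
  & E\left[ \Delta y_{i,t-1}\, (\eta_i + v_{it}) \right] = 0, \quad
    E\left[ \Delta x_{i,t-1}\, (\eta_i + v_{it}) \right] = 0
\end{align*}
```

Each expectation is an assumption that a particular lag is unrelated to the error term, which is why the choice of lags matters so much for how believable the results are.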

I drink your milkshake


The Ethiopians appear to be close to finalizing construction of a large hydroelectric dam on the Omo river, primarily to generate power but also to support local irrigation efforts.  Over the past five years the project has received substantial foreign financing and investment by China and indirectly by the World Bank. However, there appears to have been little consideration of the potential downstream impacts: the Omo river feeds Lake Turkana, which is a source of livelihood for a large number of communities in northern Kenya. The possibility that the lake may be partially drained is obviously upsetting a lot of people, although it does not seem that the Kenyan government is making a big fuss over the project.

This is a typical problem of negative externalities: the Ethiopians aren’t factoring in the welfare of Kenyan Turkana residents in the decision to build the dam. There’s actually some research showing that this is a common problem. From a recent World Bank paper by Sheila Olmstead and Hilary Sigman:

This paper examines whether countries consider the welfare of other nations when they make water development decisions. The paper estimates econometric models of the location of major dams around the world as a function of the degree of international sharing of rivers. The analysis finds that dams are more prevalent in areas of river basins upstream of foreign countries, supporting the view that countries free ride in exploiting water resources. There is weak evidence that international water management institutions reduce the extent of such free-riding.

By their very nature dams generate inequality in the flow of water between upstream and downstream areas. It is easier to pay the cost of hurting downstream communities when they are in a different country (hey, they don’t vote for you). Ergo, countries are more likely to build dams when the costs are external.

It would be interesting to see what mitigates these effects – it is possible that Kenya’s relative indifference is due to lack of political power on the part of the northern tribes. Are dams with substantial cross-border costs less likely in areas where the proximate ethnic group is quite powerful?


LaTeX Wars Episode V: The Word Users Strike Back

Of course I edit all my documents using the original Nintendo Power Glove

Throughout the mid-90s, my father used a DOS-based typesetting program called PC-Write to produce his books and journal articles. In stark contrast to more-popular word processing programs, PC-Write relied on a what-you-get-is-what-you-mean approach to typesetting: dad would indicate his formatting preferences as he wrote, but he would be forced to print out a page in order to see his formatting options being applied. By contrast, I grew up working with Microsoft Word and so with each passing year I found my father’s system to be increasingly archaic. Eventually, after a substantial amount of healthy mockery from his son, he migrated over to Word and hasn’t looked back since.

However, by the time I arrived in grad school an increasing number of other (economics) students were using LaTeX, a typesetting language that was much closer in design to the old-fashioned PC-Write than to the what-you-see-is-what-you-get format of Word. Although I suspected that LaTeX was another manifestation of the academic economist’s tendency to choose overly-complex methods and technical mastery over user-friendliness, I eventually became a convert. Somehow, I found that my preferences had begun to mirror Dad’s original love of PC-Write.

If you ever feel like experiencing a wonderfully-arbitrary argument, ask a group of economists if they prefer LaTeX or Word. Within the profession there is a pretty serious division between those who prefer the look and workflow of the former  and those who prefer the accessibility of the latter. While there are some of us who are comfortable working in both formats, each camp has its stalwarts who find members of the other camp to be bizarrely inefficient.

The two sides appeared to be in a stable stalemate until recently, when a new study comparing the efficiency and error rates among LaTeX and Word users appeared in PLOS One. The headline result: Word users work faster AND make fewer errors than LaTeX users.


Ooof – I hear the sound of a thousand co-authors crying out with righteous indignation. The Word camp was quick to seize upon this study as clear evidence that LaTeX users were probably deluding themselves and that now would be a good time for everyone to get off of their high horse. The authors of the report even went as far as to suggest that LaTeX users were wasting public resources and that journals should consider not accepting manuscripts written up using LaTeX:

Given these numbers it remains an open question to determine the amount of taxpayer money that is spent worldwide for researchers to use LaTeX over a more efficient document preparation system, which would free up their time to advance their respective field. Some publishers may save a significant amount of money by requesting or allowing LaTeX submissions because a well-formed LaTeX document complying with a well-designed class file (template) is much easier to bring into their publication workflow. However, this is at the expense of the researchers’ labor time and effort. We therefore suggest that leading scientific journals should consider accepting submissions in LaTeX only if this is justified by the level of mathematics presented in the paper.

Pretty damning, eh? Not so fast! There are several reasons we should doubt the headline result.

For one, rather than randomly assigning participants to Word or LaTeX, the researchers decided to allow participants to self-select into their respective groups. On one hand, this makes the result even more damning: even basic Word users outperformed expert LaTeX users. On the other hand, the authors themselves admit that preference for the two typesetting programs varied wildly across disciplines (e.g. computer scientists love LaTeX and health researchers prefer Word). It’s perfectly possible that the types of people that select into more math-based disciplines are inherently less efficient at performing the sort of formatting tasks set by the researchers. Indeed, the researchers found that LaTeX users actually outperformed Word users when it came to more complex operations such as formatting equations.

Furthermore, the researchers only evaluated these typesetting programs along two basic dimensions: formatting speed and error-rates, ignoring other advantages that LaTeX might have over Word. As an empirical researcher, I find it enormously easier to link LaTeX documents to automated data output from programs like Stata, making it simple to update results in a document without having to copy and paste all the time. Word can also do this, but it has always been far clunkier.

So, in short, the jury is still out. Feel free to return to your respective camps and let the war continue.

Troubling aspirations

“On second thought, I think I’ll keep the ring and become a lawyer.”

From a new paper in the Journal of Development Economics:

This paper sheds light on the relationship between oil rent and the allocation of talent, toward rent-seeking versus more productive activities, conditional on the quality of institutions. Using a sample of 69 developing countries, we demonstrate that oil resources orient university students toward specializations that provide better future access to rents when institutions are weak. The results are robust to various specifications, datasets on governance quality and estimation methods. Oil affects the demand for each profession through a technological effect, indicating complementarity between oil and engineering, manufacturing and construction; however, it also increases the ‘size of the cake’. Therefore, when institutions are weak, oil increases the incentive to opt for professions with better access to rents (law, business, and the social sciences), rather than careers in engineering, creating a deviation from the optimal allocation between the two types of specialization.

In plain speak, the authors posit that when there are large windfalls from natural resources, people will choose careers (and the necessary education) which will allow them to reap the benefits from those windfalls. Normally this involves choosing careers associated with oil extraction, like engineering. However, in weak states where it’s possible to gain access to oil rents in a less-than-legitimate manner, people choose to go into careers which better allow them to get access to those rents, like law or business. Hence talent is `misallocated’ in developing countries with weak institutions and oil booms, as the possibility of getting access to oil rents sends people into careers which they are less fit for.

I would not despair so quickly – the empirical results in the paper are more suggestive than definitive, dependent on a handful of mainly cross-country regressions. Still, the results are disconcerting – the authors do not investigate further, but the prospect of societies re-orienting themselves into a structure better suited for rent-seeking likely means that true institutional reform becomes all the more difficult.

The Lynchian Randomization

“We’re going to estimate treatment effects via an ancient rock-throwing technique.”

We developmentistas often associate randomized impact evaluations solely with development interventions (I’m looking at you, Eva Vivalt), so it’s easy to forget that there are other researchers out there doing some really bizarre RCTs. For example, did you know that randomizing paracetamol is still a thing? Psychologists seem to think that it affects an individual’s emotions, in addition to the palliative effect it has on pain. In a recent Psychological Science article, several researchers wanted to observe whether paracetamol blunted our emotional responses to distressing events.

The first experiment they ran was, well, moderately distressing. There were two types of treatment: some participants were asked to write about a `placebo subject’ – something innocuous – whereas the treatment group was asked to write about their own death (distress! distress!). This was cross-cut with a standard double-blind randomization of paracetamol. Then the researchers recorded their outcome of interest, which was a bit… odd:

Finally, participants read a hypothetical arrest report about a prostitute and were asked to set the amount of the bail (on a scale from $0 to $999). This measure has been used in a number of other meaning-threat studies (Proulx & Heine, 2008; Proulx et al., 2010; Randles et al., 2011; Rosenblatt, Greenberg, Solomon, Pyszczynski, & Lyon, 1989). Participants are expected to increase the bond amount after experiencing a threat, because trading sex for money is both at odds with commonly held cultural views of relationships and against the law. Increasing the bond assessment provides participants an opportunity to affirm their belief that prostitution is wrong.

Um, I think we’ll probably leave that out of our next household survey, but fine. What was the result? The average bond levels set by each treatment group were similar, except for the group which received a distressing event but not paracetamol.


The researchers claim this means that acetaminophen (paracetamol) is actually blunting people’s normal response to the emotionally-distressing task (i.e. punishing prostitutes). Among those given a placebo, the `mortality salience’ group set the bond approximately $120 higher than the control group, but there appears to be no significant difference between the treatment and control groups who were given the drug.
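To make the comparison concrete, here is a minimal Python sketch of the calculation that a two-by-two design like this boils down to: the effect of the distressing task among those on placebo, the same effect among those on the drug, and the difference between the two. The function name and every number below are invented for illustration (I have only picked values that mimic the rough $120 gap described above); none of it comes from the paper's data.

```python
from statistics import mean


def threat_effects(bonds):
    """Compare the effect of the distressing task with and without the drug.

    `bonds` maps (distressed, given_drug) -> list of bond amounts in dollars.
    """
    cell_mean = {cell: mean(values) for cell, values in bonds.items()}
    effect_on_placebo = cell_mean[(True, False)] - cell_mean[(False, False)]
    effect_on_drug = cell_mean[(True, True)] - cell_mean[(False, True)]
    # The interaction is the part of the distress effect that the drug removes.
    return effect_on_placebo, effect_on_drug, effect_on_placebo - effect_on_drug


# Invented numbers, chosen only to mimic the pattern described in the post.
example_bonds = {
    (False, False): [300, 320],  # innocuous task, placebo
    (True, False): [420, 440],   # distressing task, placebo
    (False, True): [310, 330],   # innocuous task, paracetamol
    (True, True): [315, 325],    # distressing task, paracetamol
}
on_placebo, on_drug, interaction = threat_effects(example_bonds)
print(on_placebo, on_drug, interaction)
```

The researchers' claim is about the third number: a large distress effect on placebo that shrinks towards zero on the drug. A real analysis would also need standard errors, which this sketch ignores.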

Now things get even a little more bizarre. The researchers want to replicate the experiment with a similar premise but a different outcome measure and a different distressing activity. So this time they made the control group watch a four-minute clip of The Simpsons, whereas the treatment group had to watch four minutes of the David Lynch short film “Rabbits”, which is composed of creepy humanoid rabbits. You can watch the entire thing here. I recommend having something lined up to cheer you up afterwards.


In this case the respondents had to choose how much to fine a group of public rioters. The results were very similar to the first experiment: the treatment group which did not receive any paracetamol ended up fining the rioters substantially more, but there was little difference between the other three groups. Again the researchers argue that the paracetamol made the difference.


Before you start slipping people paracetamol before you give them bad news, there are a number of reasons we might be very wary of these results. First, the theoretical groundwork is a bit shaky – while there are some psychology experiments suggesting that paracetamol does influence what they call “social pain,” there is no compelling physiological link, other than some inconclusive evidence cited at the beginning of the article. We should discount results more heavily when they don’t have a strong grounding in either theory or prior evidence. We certainly shouldn’t use them for anything as headline-grabbing as “What is Tylenol Doing to Our Minds?”


The results also rely on what the psychologists call a meaning-maintenance model, which predicts that individuals will seek compensatory affirmations of their beliefs when their expectations or `meanings’ are threatened by outside stimuli. Thus, punishing a prostitute or a rioter – the authors argue – gives the respondent a chance to affirm their belief that these practices are wrong. I don’t know enough about the subject to say whether or not the meaning-maintenance model is a sensible way of describing human behaviour, but the result seems dependent on a few too many assumptions: A) that paracetamol interacts with a part of the brain that generates these compensatory desires, B) that the treatments in this experiment themselves would generate compensatory desires and C) that the outcomes of these experiments are meaningfully measuring this desire to assert one’s beliefs after a distressing event.

That said – this is why we do replications, and the researchers do well to set up two separate experiments. Plus they got to randomize David Lynch. This is awesome.

You just don’t get me

Timothy Taylor has an excellent write up on the behavioural economics results coming out of the recently-released 2015 World Development Report. One of the most striking findings is that World Bank staff tend to overestimate the tendency for poor people to be fatalistic. From Taylor’s post:

What do development experts think that the poor believe, and how does it compare to what the poor actually believe? For example, development experts were asked if they thought individuals in low-income countries would agree with the statement: “What happens to me in the future mostly depends on me.” The development experts thought that maybe 20% of the poorest third would agree with this statement, but about 80% actually did. In fact, the share of those agreeing with the statement in the bottom third of the income distribution was much the same as for the upper two-thirds–and higher than the answer the development experts gave for themselves!

A number of other bloggers have picked up on this result, albeit without too much discussion about what this implies. The implicit assumption here is that development professionals are out of touch with the poor, but I think there are a number of ways we can interpret these results. Here’s the graph in question:


So the first possibility is the implicit one, that Bank staff don’t know what the poor believe, and possibly even that they assume the poor are fatalistic, possibly to a fault. Development economics is only starting to turn its head towards the convergence of fatalism, aspirations and economic outcomes (see, for example, the recent paper by Kate Orkin and her co-authors on aspirations in Ethiopia). The story that development experts buy into this belief is an easy one to believe, but not necessarily the right one. Note that it doesn’t at all take into account what the truth is, only perceptions.

Imagine your life’s outcomes are determined by (A) your own actions and (B) everything else, including randomness. How much weight would you put on (A) vs (B)? There’s no easy answer to this, but it is perfectly possible that the world’s poor ARE poor because (B) is actually much larger than (A). When you live in a country with terrible institutions, no social safety net, frequent economic or environmental shocks, it becomes very clear that (B) dominates (A).

So the second possibility is that Bank staff aren’t assuming that the poor are fatalistic, but that the poor are realistic: that the poor (correctly?) judge that they have little control over their own lives. If they did have control, then they probably wouldn’t be poor. In this case, if the responses from the above sample are genuine (we might worry that respondents would be unwilling to admit that they have little control), then it’s the poor who have it the wrong way around: they are too optimistic about how much control they have over their own lives.

The second possibility isn’t necessarily any more likely than the first, but we should be cautious about what stories eventually emerge out of the above figure – there are a number of potentially overlapping biases at play, to the extent that it is not just a straightforward story of development professionals not `getting’ the poor.

When blind is not beautiful

“Hello? Is it a placebo effect that you’re looking for?”

Note: this is an expanded version of a post published at CGD’s Views from the Center Blog

Over at Boring Development, Francisco Toro picks up on the recent Bulte et al. paper which attempts to implement a double-blind protocol in a `standard’ policy RCT. The study’s abstract:

Randomized controlled trials (RCTs) in the social sciences are typically not double-blind, so participants know they are “treated” and will adjust their behavior accordingly. Such effort responses complicate the assessment of impact. To gauge the potential magnitude of effort responses we implement a conventional RCT and double-blind trial in rural Tanzania, and randomly allocate modern and traditional cowpea seed varieties to a sample of farmers. Effort responses can be quantitatively important—for our case they explain the entire “treatment effect on the treated” as measured in a conventional economic RCT. Specifically, harvests are the same for people who know they received the modern seeds and for people who did not know what type of seeds they got; however, people who knew they had received the traditional seeds did much worse. Importantly, we also find that most of the behavioral response is unobserved by the analyst, or at least not readily captured using coarse, standard controls.

So it appears that most of the treatment effects in this study are driven by changing behaviour by the farmers who knowingly received modern seed varieties. Toro touts this as some sort of massive blow to the randomista movement:

This gap between the results of the open and the double-blind RCTs raises deeply troubling questions for the whole field. If, as Bulte et al. surmise, virtually the entire performance boost arises from knowing you’re participating in a trial, believing you may be using a better input, and working harder as a result, then all kinds of RCT results we’ve taken as valid come to look very shaky indeed.


Still, the study is an instant landmark: a gauntlet thrown down in front of the large and growing RCT-Industrial Complex. At the very least, it casts serious doubt on the automatic presumption of internal validity that has long attached to open RCTs. And without that presumption, what’s left, really?

Even if you take the results of this paper at face value (and there are some good reasons we shouldn’t), it’s hard to see here why these results should be that troubling.

The reason that medical researchers use double-blind protocol in clinical trials is to try and pin down the exact physiological impact of a medicine, independent of any conscious or subconscious behavioural response. Placebo effects have been fairly well established, so figuring out that medicine X has an effect above and beyond the health effects created by taking a sugar pill is important. One very important thing to note, however, is that unless there is a no-placebo control group, researchers using double-blind protocol will be unable to identify the total average treatment effect of a medicine: we will know what the impact is compared to someone else given a pill, but if we randomly selected someone in the population (with the same characteristics as the study group) to receive the treatment, we can’t say much about what the overall effect will be. Also, fairly critically, while double-blind studies allow us to make the assumption that placebo effects are similar across treatment and control groups, we cannot say anything about how they would compare to an explicit non-blind clinical trial (i.e. placebo effects might be quite different when the treated know they are treated).

Most development randomistas are answering substantially different questions than medical scientists. It is fairly easy to establish the efficacy of a set of agricultural inputs in a controlled setting: we know fertilizer `works’ in that it improves yields. We know vaccines work in saving lives and that increasing educational inputs, to some extent, improves educational outcomes. This was Jeffrey Sachs’s reasoning when he sold much of the world on the Millennium Village Project: we know what works, we just need to implement it. But most of us running RCTs aren’t interested in the direct impact of an intervention, holding behaviour constant, because it is precisely this behaviour that matters the most. If our question is “do improved seeds work in a controlled setting?” then a double-blind RCT is well and fine, but if our question is, “do improved seeds work when you distribute them openly, as you would do in pretty much any standard intervention,” then you need transparent protocols to get at the average treatment effect you are interested in.
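The two questions can be told apart with some simple arithmetic. Below is a minimal Python sketch, assuming a design with four arms like the one in the Bulte paper: treated and control farmers in an open trial, and treated and control farmers in a blinded trial. The function name, the labels and the harvest figures are all hypothetical, and calling the gap between the two estimates a "behavioural response" is a simplification, since the gap could in principle reflect other things.

```python
def split_open_effect(open_treated, open_control, blind_treated, blind_control):
    """Split an open-trial effect into a blinded effect and a residual.

    Each argument is the mean outcome (e.g. harvest) for one arm of the study.
    """
    open_effect = open_treated - open_control      # what a standard policy RCT reports
    blind_effect = blind_treated - blind_control   # effect with behaviour held fixed
    return {
        "open_effect": open_effect,
        "blind_effect": blind_effect,
        # Residual loosely attributed to people responding to what they know.
        "behavioural_response": open_effect - blind_effect,
    }


# Hypothetical harvests: only the farmers who know they have traditional seeds do worse.
example = split_open_effect(
    open_treated=100, open_control=70, blind_treated=100, blind_control=100
)
print(example)
```

In this made-up case the open effect is entirely behavioural, which is the pattern Bulte and co-authors report. For a policy question it is the first number we care about; the blinded effect tells us about the seeds rather than about the intervention.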

Many economists are interested in mechanisms – in picking apart the behavioural responses to a given treatment. In this respect, the Bulte et al. paper is very interesting: here we have an intervention which works primarily through behavioural response rather than a change in household resources, etc. This is intriguing and worth picking apart for getting a better sense of why interventions like these work. However, from the perspective of a policy wonk, we might care less: if you give people improved seeds then yields go up. If you de-worm children then schooling goes up. These are answers worth knowing even if that’s all we know.

For those of us interested in behavioural responses, we don’t necessarily need to run around running double-blind RCTs to get a handle on them. Consider this excellent paper by Jishnu Das and others on the effect of anticipated versus unanticipated school grants: when parents knew that their child’s school would be receiving more money, they reduced their own spending on school inputs enough to completely offset the gains from the grants. In a world in which we could have run the grant programme as a blinded RCT, it would have looked like grants were successful in raising test scores – but it would have told us precious little about how grants operate in the real world.

There’s another issue here: imposing blinding in many development RCTs creates some substantial ethical issues. Imagine, for instance, that you could fool a Kenyan farmer into not knowing whether or not she received high quality fertilizer or a bag of dirt. The average farmer might behave as if she has received nothing, she might behave as if she had received a perfectly good bag of fertilizer, or she might hedge and use some of it, realizing that it may not be useful. Some of these decisions may be sub-optimal: if the farmer knew she was in the control group, she might have opted for a different planting method, one which would have resulted in a higher yield. In this particular example, obscuring the treatment from our study group actually runs the risk of doing them harm, especially if they believe they are treated and take complementary actions which are in fact wasteful if they are not actually in the treatment group.

The thing you should take away from the Bulte et al. study shouldn’t be “all RCTs are biased because we aren’t measuring placebo effects” but instead “behavioural response matters for evaluating real-world policies.” The latter statement actually reinforces the need to have transparent RCTs, rather than to try and mimic the double-blind nature of clinical trials.

Come work for me in Tanzania (short notice)

We are looking for a short term field manager to run a household survey in Dar es Salaam, Tanzania on short notice.

  • The field manager would oversee data collection exploiting a natural experiment in the roll-out of land titling in the city. The aim of the study is to investigate whether the provision of short-term land titles by the government has led to observable differences in household behaviour and welfare. This work is part of a larger portfolio of research of urban property rights in Tanzania.
  • The position would be for approximately 3 months, based in Dar es Salaam from early January through March 2014.
  • Required: we are looking for candidates who have experience with Stata and working with household survey data. A bachelor’s or MA in a quantitative field is preferred. Previous experience working in developing countries, running surveys or managing complex projects is a plus.
  • You would be working with me, Justin Sandefur and Andy Zeitlin on a fixed contract with the University of Oxford and would oversee a third-party firm which would cover all the practical logistics of data collection. We need someone who understands data, can grasp the research design and can ensure quality.

If you are interested, please write to me at this e-mail address: matt@aidthoughts.org. Please send a CV, an e-mail cover letter explaining why you are interested in the position and your relative strengths, and a description of your experience working with data, field experience, etc. In the subject line, please write: “DSM position: <your name here>”. E-mails which fail to do this will not be considered. We will only be in contact if you make the short list.

UPDATE: the position has now been filled. Thank you everyone who submitted.

I felt a great disturbance in the force


“It’s as if millions of economists searching for a natural experiment suddenly cried out in joy.”

From the BBC:

China is to relax its policy of restricting most couples to having only a single child, state media say.

In future, families will be allowed two children if one parent is an only child, the Xinhua news agency said.

The proposal follows this week’s meeting of a key decision-making body of the governing Communist Party.