Machine learning and the replication struggle

“Of course you can have my replication code. Good luck with that.”

I was in a seminar a few months ago where someone presented results from a panel study that had a significant attrition problem. This can be a serious issue in empirical work, as the surviving sample could differ in unobservable ways that might also be correlated with the intervention the authors are studying. While there are already a large number of ways that one can tackle the attrition problem, the authors of this study had done something novel: they had handed that surviving sample over to a machine learning algorithm which used [insert black box here] to re-weight the sample so it matched the original study population as closely as possible.
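The paper's actual algorithm is a black box to me, but the general idea can be illustrated with a much simpler (hypothetical) reweighting scheme: post-stratify the surviving sample so that the distribution of an observed covariate matches the baseline sample. Everything below – the function name, the urban/rural covariate, the sample sizes – is made up for illustration:

```python
from collections import Counter

def attrition_weights(baseline, survivors):
    """Post-stratification: weight each survivor so that the weighted
    distribution of a categorical covariate (here, region) matches the
    baseline sample. Purely illustrative, not the paper's method."""
    base_counts = Counter(baseline)
    surv_counts = Counter(survivors)
    n_base, n_surv = len(baseline), len(survivors)
    # weight = (baseline share of group) / (survivor share of group)
    return [(base_counts[g] / n_base) / (surv_counts[g] / n_surv)
            for g in survivors]

baseline = ["urban"] * 50 + ["rural"] * 50   # original sample: 50/50 split
survivors = ["urban"] * 40 + ["rural"] * 20  # rural households attrited more

w = attrition_weights(baseline, survivors)
urban_w = sum(wi for wi, g in zip(w, survivors) if g == "urban")
print(urban_w / sum(w))  # weighted urban share is back to the 50/50 baseline
```

A real implementation would have to match on many covariates at once – for example by modelling each household's probability of remaining in the sample and weighting by its inverse – which is presumably where the machine learning comes in.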

The author presenting the results explained that they had made this choice because the machine learning algorithm was arguably neutral. Normally, the researchers might have consciously or unconsciously chosen weights that gave them the result that they wanted. But by handing things over to an algorithm that used its own sufficiently-inscrutable method of determining the appropriate sample weights, the authors could not be accused of producing results that were merely the product of researcher degrees of freedom.

Machine learning is rapidly becoming a tool that empirical economists rely on. It seems to be most useful for measurement and generating new data, ranging from creating new spatial measures of poverty to measuring conflict, but it is also increasingly being used for estimation issues like the one the authors above were facing. The rise of the machines is undoubtedly going to be incredibly useful for our work, drastically widening our ability to tackle difficult, computationally-intense problems. But I wonder if it also has implications for the way we check and challenge each other's work.

So far economics has been spared the full brunt of the empirical replication crisis, but there are occasionally warning signs that our own reckoning might not be far off. To ensure that research in empirical econ remains credible, there needs to be a general assurance that published results can be replicated. The term “replicate” means different things to different people, but I find Michael Clemens’s proposed definitions to be helpful. There are two types of replication that I think are likely to be affected by machine learning. The first is what Clemens refers to as `verification,’ the ability of a third party to take the same data as the original researchers, run the same code and generate the same results (and check that there are no errors in any of these processes). The second is a `reanalysis,’ where a third party takes the same data as the original researchers, but investigates whether the results hold up to different ways of analyzing the data (such as different estimation methods, assumptions, perturbations of the data, etc).

Wading through someone else’s Stata or R code in an effort to verify another researcher’s findings is not much fun, but it is manageable. PhD students and younger researchers may not have the same resources as their senior colleagues, but they usually have the time and diligence to go through others’ work and figure out whether the results really are there. But imagine a future where you first need to secure a sizable amount of server time and computing power before you can even think about a basic replication. In the above paper, the presenter noted that it took several days to run their re-weighting algorithm. Things seem even more daunting when it comes time for reanalysis, in this case changing the basic structure of the original algorithm (or re-training it on a different set of data). As methods grow more complex, results may be harder for a replicating party to parse. It would seem easy enough for the authors to hand over their algorithms and for third parties to run them, but significantly more difficult for the replicators to understand what the precise limitations of a particular approach might be.

Despite these concerns, I suspect these problems will be transitory. Empirical economics is continuously going through waves of methodological innovation. Each wave begins with a few pioneers using a new tool or a new set of data, and the in-depth expertise to really critique those new methods always lags behind by a few years. More and more econ students are learning to code along the way. New norms around replicating results (such as posting data and code in an online repository like GitHub) are coalescing.

And if the abilities of the replicators can’t keep up with the growing sophistication of algorithms, then maybe the same technology can be used to make the replicator’s life easier. Researchers in psychology have already used code to check large numbers of published papers for basic mathematical errors. Maybe in ten years, a normal part of the peer review process will entail turning over your results and code to the machines, so they can check them for errors and run an automated reanalysis.
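For a flavour of what such automated checking might look like, here is a toy statcheck-style test: does a reported t statistic agree with the reported coefficient and standard error? The function and all the numbers are invented for illustration:

```python
def check_reported_stats(coef, se, t_reported, tol=0.01):
    """Recompute t = coef / se and flag any result where the reported
    t statistic disagrees with it beyond a relative tolerance.
    A toy arithmetic check, not a real statcheck implementation."""
    t = coef / se
    return abs(t - t_reported) <= tol * max(1.0, abs(t_reported))

# hypothetical entries scraped from a results table
print(check_reported_stats(0.42, 0.20, 2.10))  # internally consistent
print(check_reported_stats(0.42, 0.20, 3.10))  # flagged: 0.42/0.20 is not 3.10
```

The psychology tool this gestures at scans thousands of papers for exactly this kind of internal inconsistency between reported test statistics, degrees of freedom, and p-values.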

As if dealing with peer reviewers wasn’t harrowing enough. Imagine referee reports that go something like this:





A Checklist for the Modern Development Bureaucrat

If you are a fan of podcasts then you really should be listening to NPR’s Hidden Brain. This week’s episode is a fascinating look at the impact of “checklists” or “to-do” lists used across a number of different professions to offset human error. It recounts how, during the development of the Boeing B-17 “Flying Fortress” bomber in the late 1930s, a fatal crash of a prototype plane led the US Army Air Corps to mandate the practice of running through checklists prior to future flights. While experts in high-skill professions like piloting or surgery typically feel confident in their abilities to get the job done, the addition of routine (and mundane) checks forces them to guard against unlikely but high-cost events.

In another recent podcast the journalist Sarah Kliff frames this as treating mistakes as plane crashes rather than car crashes, the former requiring a full rethink of how procedures are performed. She recounts how hospitals in the US began using checklists to reduce the incidence of central line infections. Even though medical staff are highly-trained professionals who should know the correct procedures to reduce the chance of contamination, infection rates plummeted once they were forced to use checklists prior to a procedure (there were of course other complementary changes in policy).

This led me to wonder whether `checklist culture’ is reflected in modern development policy. One might argue that in some ways policy has become too checklist-oriented. Many reform agendas – ranging from anti-money laundering standards to private sector reforms – rely on a simple list of indicators or policy changes that a country needs to check off in order to be compliant. While many of these agendas are centered around outcomes or processes worth achieving anyway, many are shallow in nature. The result is brittle institutions that look good on paper, but are incapable of doing much beyond that, a point laid out a long time ago by Lant Pritchett when he first spoke of isomorphic mimicry.

But there may be some ways in which the checklist mentality might be useful for decision makers in the development space. We know from psychology and behavioral economics that people often exhibit cognitive biases in their decision-making. This leads them to make decisions that are bad for them in the long term, but while the ramifications can be substantial, they are largely confined to the individual level.

The stakes are potentially a lot higher when those cognitive biases and errors are being made by people whose decisions affect hundreds, thousands or millions of others. It would then seem important that development professionals be able to act as impartial, rational decision makers; alas, there is evidence that we’re just as flawed as the rest of humankind. A recent working paper by researchers from the World Bank and the Universities of East Anglia and Oxford recruited development professionals from DFID and the World Bank to investigate, aaaaaaand the results ain’t too pretty. From the paper’s abstract:

“Experiments conducted on a novel subject pool of development policy professionals (public servants of the World Bank and the Department for International Development in the United Kingdom) show that policy professionals are indeed subject to decision making traps, including sunk cost bias, the framing of losses and gains, frame-dependent risk-aversion, and, most strikingly, confirmation bias correlated with ideological priors, despite having an explicit mission to promote evidence-informed and impartial decision making. These findings should worry policy professionals and their principals in governments and large organizations, as well as citizens themselves.”

Thankfully, development professionals are not unchecked autocrats; our decisions are constrained by the structures of the institutions we work for. But what we don’t know is whether those institutions mitigate or amplify our biases or priors – I think cases can be made in either direction. Development bureaucrats certainly have to clear a lot of hurdles to get their projects off the ground – but those `checklists’ are largely about mitigating risk, ensuring a proposal has been properly vetted and that it is likely development-friendly. There is some evidence from the above paper that deliberation is effective in reducing these biases, but one wonders whether the type of deliberation that the subjects (in this case DFID economists) participated in mirrors at all the kind of peer-review or administrative checks that the average bureaucrat at DFID or the World Bank goes through.

So maybe we need checklists specifically to offset our biases. I’ll start with a few ideas for both bureaucrats and researchy-economists working on a proposal or note or paper, but would be interested to hear what yours would be.

  1. Is there any rigorous evidence supporting the argument I am making?
  2. Have I sat down and examined whether my beliefs are based on emotion or reasoning?
  3. Would an ordinary person who doesn’t study development or economics understand what I am saying?
  4. Have I made the case that my proposal addresses a development/poverty question, rather than justifying its existence through internal or external politics or momentum around some issue?
  5. Have I listed, at least in my own head, the reasons why I might be wrong about this?
  6. Would someone in another team/department/institution make better use of these resources that I control?
  7. Have I written down a contingency plan for when things go wrong?
  8. Have I thought about how I will know if something has gone wrong?

By the way, you can find the podcasts I mentioned here:

Help me test a (very silly) hypothesis by answering a few questions


I’ve held a silly hypothesis in my head ever since I was a grad student, but never had the time/resources to test it. I just recently came across a publication which drastically reduced the costs of testing the idea. It will almost certainly result in a “jokey” paper, but a fun one nonetheless.

But I could use your help. I have constructed an online survey displaying photos of people, and I need respondents to tell me whether these people are smiling, frowning or have neutral expressions. There are over 170 questions, but they are randomized, so even if you only manage to answer a few (and then close the window), it still would help a lot!

I can’t tell you what the idea is just yet, because it might spoil how you answer the questions. More information to follow, once enough people have answered the survey.

Click through here to enter the survey.

The limitations of the Absolute Palma Index, in two graphs


Last year, the ODI’s Chris Hoy released a really useful and thoughtful paper pointing out that the basic maths of inequality are often not on the side of the poor. Even if economic growth is evenly spread, the absolute difference between the incomes of the poor and the richest must increase. That is, if you are 10 times as rich as I am and our incomes both grow by 10%, you’ll be taking home more money than I will at the end of the day. If we wanted to see a decrease in absolute differences of income around the world, it would require that the income of the poorest grow a great, great deal faster than that of the richest, something we are unlikely to see any time soon.
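The arithmetic behind that example is worth spelling out (the incomes below are illustrative):

```python
rich, poor = 100_000.0, 10_000.0  # rich is 10x richer; numbers are made up
g = 0.10                          # both incomes grow by the same 10%

gap_before = rich - poor          # 90,000
rich, poor = rich * (1 + g), poor * (1 + g)
gap_after = rich - poor           # 99,000: the gap itself grew by 10%

print(gap_before, gap_after)
```

Under equal proportional growth the absolute gap always grows at exactly the growth rate, so closing it requires the poor's income to grow disproportionately faster.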

The unanswered question, and one that Hoy poses himself at the end of the paper, is whether focusing on absolute differences in income makes more sense than doubling down on the relative differences in income that are captured by traditional inequality measures such as the Gini, Theil or Palma indices. We know that income is correlated with lots of good outcomes for its holder – better health, education, happiness and political power. However, if we are being truly honest with ourselves, we would have to admit that we don’t quite fully understand whether these relationships are absolute or relative in nature (although we suspect both matter for happiness). Do the richest 1% of Americans have more political power in the US than the richest 1% of Nigerians have in Nigeria? These are the questions we must ask ourselves if we are to make a strong case for caring about absolute income differences.

In the meantime, I woke up this morning to find that Nick Galasso from Oxfam has made a pitch for using the “Absolute Palma Index” as the next big measure of inequality. The Absolute Palma is a variation of the Palma Index of inequality, which itself is the ratio of the share of income earned by the top 10% of the distribution to that of the bottom 40%. The Absolute Palma, by contrast, is the absolute difference between the average income of the top 10% and the average income of the bottom 40%.
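Working directly from those two definitions, a quick (illustrative) implementation also previews the index's main quirk: scaling every income up leaves the relative Palma untouched but inflates the Absolute Palma one-for-one:

```python
def palma_indices(incomes):
    """Relative Palma (top-10% income share / bottom-40% income share) and
    Absolute Palma (mean income of top 10% minus mean income of bottom 40%).
    A sketch based on the definitions in the text, not the paper's code."""
    xs = sorted(incomes)
    n = len(xs)
    bottom = xs[: int(n * 0.4)]   # bottom 40% of the distribution
    top = xs[int(n * 0.9):]       # top 10% of the distribution
    total = sum(xs)
    relative = (sum(top) / total) / (sum(bottom) / total)
    absolute = sum(top) / len(top) - sum(bottom) / len(bottom)
    return relative, absolute

# ten hypothetical households: multiplying every income by 10 leaves the
# relative Palma unchanged but multiplies the Absolute Palma by 10
inc = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
r1, a1 = palma_indices(inc)
r2, a2 = palma_indices([10 * x for x in inc])
print(r1 == r2, a2 / a1)  # → True 10.0
```

That scale-dependence is exactly why, in the graphs below, the Absolute Palma ends up tracking average income rather than inequality.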

As the title suggests, I think there are limitations to the Absolute Palma Index, so consider this post a word of caution. I can think of one strong case against absolute measures: while they might be reasonable at describing immediate gains across a country’s income distribution after a year of growth, they aren’t very useful at describing differences between countries across the globe.

I happened to be playing around with data from Christoph Lakner and Branko Milanovic’s paper on the global income distribution, so I decided to see how the Absolute Palma Index varied across countries. Check out the graph below, which looks at how the Absolute Palma Index varies with mean income across countries. I’ve also highlighted countries which are either very unequal, very equal or somewhere in the middle as measured by the traditional Palma Index.



The first thing to note is that there is almost a one-to-one relationship between the log of GDP per capita and the log of the Absolute Palma. This is hardly surprising – take any income distribution and raise all incomes by a set percentage and by definition you will see an increase in the Absolute Palma. What this means is that on this index, poor countries do really, really well and rich countries do terribly. And that is most of the story. Log per capita income explains about 93% of the variance in the log of the Absolute Palma. The relative Palma explains most of the remaining variance, but on the whole has very, very little explanatory power.

The result is that we get some pretty counter-intuitive rankings. Even though Denmark, Sweden and Norway are considered by pretty much everyone I’ve ever spoken to to be the most equal places on the planet, they come out as being more unequal than countries at the top of the relative Palma rankings, places like South Africa, Honduras and Brazil.

Which of these countries would you rather be poor in? Presumably the one with the highest average income for the poorest 10%. If we graph the same relationship, instead using the average income of the bottom decile, we find the relationship is less strong, especially so for the poorest countries of the world. But if I had to choose whether I wanted to be born poor in a country with a high or low Absolute Palma index, sign me up for more inequality!



Now for the caveats: the data here only run up to 2008, so the basic cross-sectional relationship may have changed (although it didn’t appear to be changing in the years leading up to 2008). There is also a difference between moving between countries with different average/median/poorest-decile income levels and observing individual countries as they grow richer or poorer. This means that there might be value in keeping track of how growth is `allocated’ across the income distribution, something which is already done (and was done carefully in Chris Hoy’s paper).

Absolute measures might tell us something interesting about the world, and I welcome more work on them. But there is a world of difference between adding a tool to the (now overflowing) box of inequality measures and pushing for a headline measure that automatically penalizes rich, developed countries for being rich and developed. In addition, before we begin agonizing about absolute differences within countries, someone needs to make a pretty compelling case that they matter more than both absolute levels and relative differences, because these are things we already take great pains to measure. If we are worried that the incomes of the poor aren’t growing fast enough, then why isn’t it enough to measure that?

Stata code and underlying data available here.

Update: good comments from Chris Hoy below.

The difficulty of getting good feedback

Most of us have very little clue if what we are doing makes any sense

In a piece for Project Syndicate released today, Ricardo Hausmann makes a grand case against evidence-based policies, specifically the rise of randomized controlled trials:

My main problem with RCTs is that they make us think about interventions, policies, and organizations in the wrong way. As opposed to the two or three designs that get tested slowly by RCTs (like putting tablets or flipcharts in schools), most social interventions have millions of design possibilities and outcomes depend on complex combinations between them. This leads to what the complexity scientist Stuart Kauffman calls a “rugged fitness landscape.”

After presenting a theoretical case of an RCT which tests for and fails to find an impact of tablets on learning in schools, he offers up an alternative approach, one that relies on rapid experimentation and adaptation:

Consider the following thought experiment: We include some mechanism in the tablet to inform the teacher in real time about how well his or her pupils are absorbing the material being taught. We free all teachers to experiment with different software, different strategies, and different ways of using the new tool. The rapid feedback loop will make teachers adjust their strategies to maximize performance.

Over time, we will observe some teachers who have stumbled onto highly effective strategies. We then share what they have done with other teachers.

Notice how radically different this method is. Instead of testing the validity of one design by having 150 out of 300 schools implement the identical program, this method is “crawling” the design space by having each teacher search for results. Instead of having a baseline survey and then a final survey, it is constantly providing feedback about performance. Instead of having an econometrician do the learning in a centralized manner and inform everybody about the results of the experiment, it is the teachers who are doing the learning in a decentralized manner and informing the center of what they found.

Hausmann makes a compelling argument here, but it all hinges on an exceptional premise: that teachers have access to a magical device that gives them *real time* feedback on student learning. Iteration and adaptation make a lot of sense… if you are in an environment where you can actually observe the immediate effects of your decisions and be sure that those decisions are having a causal impact.

But most of us are not in those environments. Many teachers might have an idea of how good their particular method is, but in the absence of a technology which can provide them with high-quality real-time feedback, it would be very hard to be sure. Most of us are in an environment where we have little idea of whether what we are doing is effective at all. Even after 32 years of direct observation and some experimentation, I still can’t figure out if spicy food gives me indigestion.

Even when we can successfully parse the noise of life and match an action with a reaction, low-level experimentation still opens up the door to all sorts of internal biases. Human beings are fantastic at creating narratives (I feel good today, it must have been because of that thing I did yesterday) which would wither under larger-scale experimentation.

Of course there are clear examples of low-level, rapid experimentation being successful when we have access to technologies that give us good, quick feedback. Bridge Academies, which is now one of the largest private school providers in the world, succeeded largely due to a very high degree of internal experimentation. But to accomplish this, Bridge had to have access to a wealth of real time data on student achievement and attendance as well as enough centralized control to be able to experiment across classrooms and schools.

But in reality these kinds of feedback technologies just don’t exist in many contexts, at least not yet. If I am working in a Ministry of Health in a developing country and I want to discern whether a given health intervention has had an impact, I won’t necessarily have access to real time data on hospital admissions. Instead, I would have to rely on costly household surveys which take time to collect. This slows down the process of iteration and adaptation to a point where a randomized controlled trial combined with some qualitative fieldwork actually looks pretty attractive.

RCTs are far from a perfect solution and Hausmann is correct to point out that they can be slow and blunt tools for figuring out exactly how an intervention should be implemented. But that is a reason to complement them with other methods – not to chuck them out the door. If a teacher has come up with a new method of using a tablet through rapid experimentation and it is rolled out to the entire school, that method should be rigorously empirically tested. If an RCT of some new intervention finds no effect, we should turn to more rapid experimentation to find a better way.

We’ve been arguing about RCTs for years now – it is disheartening that this debate still feels very black and white.

The problem with nudges is that sometimes they don’t move things very much

Have you ever prescribed azithromycin when you didn’t have to? Know what I mean?

Over-prescribing of antibiotics is a problem because it speeds up the rate at which bacteria develop resistance. In a new study published in the Lancet yesterday, researchers attempted to use a simple `nudge’ to get doctors in the UK to prescribe less often:

In this randomised, 2 × 2 factorial trial, publicly available databases were used to identify GP practices whose prescribing rate for antibiotics was in the top 20% for their National Health Service (NHS) Local Area Team. Eligible practices were randomly assigned (1:1) into two groups by computer-generated allocation sequence, stratified by NHS Local Area Team. Participants, but not investigators, were blinded to group assignment. On Sept 29, 2014, every GP in the feedback intervention group was sent a letter from England’s Chief Medical Officer and a leaflet on antibiotics for use with patients. The letter stated that the practice was prescribing antibiotics at a higher rate than 80% of practices in its NHS Local Area Team. GPs in the control group received no communication. The sample was re-randomised into two groups, and in December, 2014, GP practices were either sent patient-focused information that promoted reduced use of antibiotics or received no communication. The primary outcome measure was the rate of antibiotic items dispensed per 1000 weighted population, controlling for past prescribing. Analysis was by intention to treat.

This is a fairly standard behavioural intervention – use information (or, less graciously, spam) to nudge people into behaving in a more optimal way. The behavioural insights/economics crowd loves these interventions because they are cheap, so the cost-effectiveness hurdle is easy to overcome. However, that cheapness sometimes overshadows a bigger problem: frequently these interventions just don’t have very large effects. Here are the results from the Lancet study:

Between Sept 8 and Sept 26, 2014, we recruited and assigned 1581 GP practices to feedback intervention (n=791) or control (n=790) groups. Letters were sent to 3227 GPs in the intervention group. Between October, 2014, and March, 2015, the rate of antibiotic items dispensed per 1000 population was 126·98 (95% CI 125·68–128·27) in the feedback intervention group and 131·25 (130·33–132·16) in the control group, a difference of 4·27 (3·3%; incidence rate ratio [IRR] 0·967 [95% CI 0·957–0·977]; p<0·0001), representing an estimated 73 406 fewer antibiotic items dispensed. In December, 2014, GP practices were re-assigned to patient-focused intervention (n=777) or control (n=804) groups. The patient-focused intervention did not significantly affect the primary outcome measure between December, 2014, and March, 2015 (antibiotic items dispensed per 1000 population: 135·00 [95% CI 133·77–136·22] in the patient-focused intervention group and 133·98 [133·06–134·90] in the control group; IRR for difference between groups 1·01, 95% CI 1·00–1·02; p=0·105).

Let’s focus on the intervention that worked: the peer information treatment. There was a clear decline in antibiotic use for the treatment group, and so the study focuses on the sheer number of prescriptions that were prevented (73,406). However, in terms of relative impact, the study barely changed behaviour. The treatment group’s prescription rate was a mere 3% lower than the control group’s rate.
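To see how small that relative effect is, you can recover it directly from the figures quoted in the abstract above:

```python
# Figures from the Lancet abstract quoted above (items per 1,000 population)
treat, control = 126.98, 131.25

diff = control - treat      # absolute difference: 4.27 items per 1,000
relative = diff / control   # ~3.3% relative reduction
irr = treat / control       # ~0.967, matching the reported IRR

print(round(diff, 2), round(relative * 100, 1), round(irr, 3))  # → 4.27 3.3 0.967
```

The headline "73,406 fewer items dispensed" and the 3.3% relative reduction are the same result; one just sounds a lot more impressive than the other.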

So if this is about finding cost-effective ways to reduce prescribing, then I’m on board. But clearly these sorts of nudges are not going to win the war on antibiotic resistance any time soon.

So how do you feel about not winning the lottery?

“Here’s to exogenous shocks to our neighbour’s wealth”

Happy New Year. So I’ve been thinking a lot about the charity GiveDirectly recently. They were my charity of choice a year ago and I am planning to make another donation soon. For those of you who are not in the know, GiveDirectly makes unconditional cash transfers to poor people in Kenya and Uganda. For every dollar I donate, roughly 91 cents of that ends up with a household, which is then free to do whatever they want with it.

The other day GiveDirectly sent me an e-mail which linked to a series of interviews with residents of a single village that had been on the receiving end of these unconditional transfers. What is particularly astonishing is that the charity not only asked recipients how they were faring (pretty good, thank you very much), but roughly half of the interviews are with households which were not deemed eligible.

What I might have expected was a degree of unhappiness or animosity over not being selected to receive a $1,000 transfer. GiveDirectly uses its own methods of determining whether or not a household is classified as “poor” (in the village in question it was households without a metal roof on their primary residence). Even though (I presume) the charity takes great pains to make the selection criteria transparent, to people on the ground the whole endeavour might seem a bit, well, random. A bit like manna from heaven.

Recently, three academics who have previously studied GiveDirectly released a paper suggesting that these transfers do have some sort of negative spillovers on households that didn’t receive the transfer. Johannes Haushofer, Jeremy Shapiro and James Reisinger found that non-recipients in villages which received GiveDirectly transfers reported substantially lower levels of life satisfaction. So if this negative spillover, which I will go ahead and call the Haushofer Effect (there – I just branded it – coming to a book store near you), really exists, then I would expect a substantial amount of lamentation in the GiveDirectly interviews of non-recipients.

To the contrary, most non-recipients said that, overall, they were happy that their village had received the transfers. I found this hard to believe, but after going through 50 interviews of non-recipients, most replied positively to the question “Are you, overall, happy that GiveDirectly came to your village?”, a handful replied neutrally, and only one was vocally unhappy about it. There was another question aimed more at the negative effects of not being selected, and even then only about 25% responded with identifiably negative comments.

So what is going on here? Why is the Haushofer Effect not appearing in these qualitative interviews? As much as I would like to believe that people do feel happy about seeing their neighbours get a shitload of money, I think I am more likely to believe one of the following:

(1) People don’t want to appear selfish, especially in front of a charity which might might might might give them a ton of cash some day. One respondent actually spelled it out: “I am happy with your coming with the hope that one day I will also benefit.”

(2) The more complicated answer is that there is something about the conditionality of the question that changes its meaning. These families might be honestly happy about the fact that their neighbours (who are poorer) got transfers. But all the negative externalities associated with that (envy, local prices, etc) still make them unhappy in aggregate. A great example of this appeared in a recent episode of This American Life, where Neil Drummond tried to reconcile the fact that he really was happy his old friend Ta-Nehisi Coates had found fame and fortune with the reality that their friendship was slowly dissolving as a result of it.

(3) This village is different than the average village in the study above in some unobservable way.


I have no sense as to which answer is the most likely. And none of this will stop me from donating to GiveDirectly again. That said, while the charity should be praised for putting these interviews up on their website, they could take it a step further and link to the paper on negative spillovers.


Update: GiveDirectly’s Max Chapnick has a helpful reply/explanation in the comments below, rightly pointing out that the academic paper I cited relies on within-village randomization (rather than GD’s method of targeting poor households), so the Haushofer Effect might be primarily driven by the unfairness inherent in that lottery mechanism. This is a pretty plausible reason for the differences between the empirical study and the informal interviews.

A randomista for hire is a dangerous thing

Our research shows that the treated (caged) group was 30% more likely to return home than the control (non-caged) group.

The Behavioural Insights Team is a research unit made up of randomistas who rely on behavioural economics and psychology to develop and test `nudges’ to achieve certain policy goals. They originally grew out of the Cabinet Office, but eventually went private (the Cabinet Office has retained a stake in the BIT).

I was always excited by the mere existence of the Behavioural Insights Team – this was the first clear example of government investing in rigorous randomisation to test some of its policies.

That said, while the BIT likely comprises a group of people who want to make the world a better place, they are beholden to their clients. One of these clients is the Home Office, which is currently paying the BIT to find ways to convince illegal migrants to voluntarily leave the UK. From the BIT’s update report:

Increasing voluntary departures of illegal migrants

BIT has been working with the Home Office to consider new measures to help illegal migrants to voluntarily return home, focusing initially on engagement at reporting centres. Reporting centres are seen as an important but underutilised opportunity to prompt illegal migrants to consider whether leaving the UK voluntarily would be a preferable option in their circumstances.

Starting in December 2014, BIT undertook a short piece of ethnographic research at reporting centres across London, reviewing current procedures and interaction points to gain an understanding of the reporting centre experience from the perspective of a member of the reporting population and the reporting agent.

Informed by this, BIT developed several options for Home Office consideration to employ behaviourally informed trials in reporting centres that could encourage higher numbers of voluntary departures from the UK.

At this stage, the precise scope of a trial is still being finalised, with the aim to combine a number of behavioural elements to create a distinct reporting centre experience that encourages members of the reporting population to consider voluntary departure as an alternative to their current situation.

Note that many people who end up in reporting centres are asylum seekers, not just illegal `economic’ migrants. The BIT has another project in the pipeline aimed at targeting businesses that hire illegal migrants, with a similar end goal of convincing the migrants to voluntarily go home. The Home Office got a lot of pushback when it tried this before, in the not-too-subtle form of a van driving around telling migrants to go home:


So now the UK government has turned to more insidious methods, aided by a team of randomistas. It’s a useful reminder that rigorous, evidence-based methods can be used in the service of stupid, short-sighted policy as well.


*Disclaimer: I once applied to work at the BIT, but dropped out midway through the selection process to work on a project in Oxford.

The IMF, inequality and the trickle-down of empirical research

“It took so many assumptions to put you together!”

By Nicolas Van de Sijpe

A recent IMF staff discussion note has received a lot of attention for claiming that a smaller income share of the poor lowers economic growth (see also here and here). This piece in the FT is fairly typical, arguing that the paper “establishes a direct link between how income is distributed and national growth.”

It quotes Nicolas Mombrial, head of Oxfam International’s office in Washington DC, saying that (my emphasis): “the IMF proves that making the rich richer does not work for growth, while focusing on the poor and the middle class does” and that “the IMF has shown that `trickle down’ economics is dead; you cannot rely on the spoils of the extremely wealthy to benefit the rest of us.”

The aim of this blog post is to clarify that the results in Table 1 of the paper, which are based on system GMM estimation, rely on assumptions that are not spelled out explicitly and whose validity is therefore very difficult to assess. In not reporting this and other relevant information, the paper’s application of system GMM falls short of current best practices. As a result, without this additional information, I would be wary of updating my prior on the effect of inequality on growth based on the new results reported in this paper.

The paper attempts to establish the causal effect of various income quintiles (the share of income accruing to the bottom 20%, the next 20% etc.) on economic growth. It finds that a country will grow faster if the share of income held by the bottom three quintiles increases. In contrast, a higher income share for the richest 20% reduces growth. As you can imagine, establishing such a causal effect is difficult: growth might affect how income is distributed, and numerous other variables (openness to trade, institutions, policy choices…) might affect both growth and the distribution of income. Clearly, this implies that any association found between the income distribution and growth might reflect things other than just the causal effect of the former on the latter.

To try to get around this problem, the authors use a system GMM estimator. This estimator consists of (i) differenced equations where the changes in the variables are instrumented by their lagged levels and (ii) equations in levels where the levels of variables are instrumented by their lagged differences (Bond, 2002, is an excellent introduction). Roughly speaking, the hope is that these lagged levels and differences isolate bits of variation in income share quintiles that are not affected by growth or any of the omitted variables. These bits of variation can then be used to identify the causal effect of the income distribution on growth. The problem with the IMF paper is that it does not tell you exactly which lagged levels and differences it uses as instruments, making it hard for readers to assess how plausible it is that the paper has identified a causal effect.
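To make the instrument logic concrete, here is a minimal toy sketch in Python. It is emphatically not the paper’s specification (which is precisely what is not reported), and the numbers are made up; it only shows how the instrument set for a differenced equation is assembled from lagged levels.

```python
# Toy panel: income share of the bottom quintile for one country over
# six periods (hypothetical numbers, purely for illustration).
share_q1 = [0.050, 0.052, 0.049, 0.051, 0.054, 0.053]

# The differenced equation at period t explains share_q1[t] - share_q1[t-1].
# Under no serial correlation in the errors, valid instruments are the
# *levels* dated t-2 and earlier.
def difference_instruments(series, t, max_lags=None):
    """Instrument set for the differenced equation at period t (0-based):
    all levels dated t-2 and earlier, optionally capped at max_lags."""
    levels = series[: max(t - 1, 0)]
    if max_lags is not None:  # applied papers often restrict the lag depth
        levels = levels[-max_lags:]
    return levels

# Instruments for the difference equation at the fifth period (t = 4):
print(difference_instruments(share_q1, 4))              # levels at t = 0, 1, 2
print(difference_instruments(share_q1, 4, max_lags=1))  # only the most recent admissible level

# (The level equations, symmetrically, are instrumented with lagged
# *differences*. Which lags appear in each instrument set is exactly
# the detail the IMF paper does not report.)
```

How deep the lag set goes, and whether it is collapsed, can change the estimates and the validity of the identifying assumptions, which is why readers need the specification spelled out.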


I drink your milkshake


The Ethiopians appear to be close to finalizing construction of a large hydroelectric dam on the Omo river, primarily to generate power but also to support local irrigation efforts.  Over the past five years the project has received substantial foreign financing and investment from China and, indirectly, from the World Bank. However, there appears to have been little consideration of the potential downstream impacts: the Omo river feeds Lake Turkana, which is a source of livelihood for a large number of communities in northern Kenya. The possibility that the lake may be partially drained is obviously upsetting a lot of people, although it does not seem that the Kenyan government is making a big fuss over the project.

This is a typical problem of negative externalities: the Ethiopians aren’t factoring in the welfare of Kenyan Turkana residents in the decision to build the dam. There’s actually some research showing that this is a common problem. From a recent World Bank paper by Sheila Olmstead and Hilary Sigman:

This paper examines whether countries consider the welfare of other nations when they make water development decisions. The paper estimates econometric models of the location of major dams around the world as a function of the degree of international sharing of rivers. The analysis finds that dams are more prevalent in areas of river basins upstream of foreign countries, supporting the view that countries free ride in exploiting water resources. There is weak evidence that international water management institutions reduce the extent of such free-riding.

By their very nature, dams generate inequality in the flow of water between upstream and downstream areas. It is easier to pay the cost of hurting downstream communities when they are in a different country (hey, they don’t vote for you). Ergo, countries are more likely to build dams when the costs are external.
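The free-riding logic can be reduced to a one-line decision rule. The sketch below is a hypothetical toy model, not Olmstead and Sigman’s estimation: a planner builds a dam when the benefit exceeds the costs it internalizes, and it internalizes downstream damage only insofar as it falls on its own citizens.

```python
def builds_dam(benefit, downstream_cost, domestic_share):
    """Toy planner decision (illustrative only): the planner counts
    only the share of downstream damage borne by its own citizens."""
    internalized_cost = downstream_cost * domestic_share
    return benefit > internalized_cost

# Same dam, same benefit, same total downstream damage:
benefit, downstream_cost = 10.0, 15.0

# Downstream communities are domestic -> full cost counted -> no dam.
print(builds_dam(benefit, downstream_cost, domestic_share=1.0))  # False

# Downstream communities are across the border -> most of the cost is
# someone else's problem -> dam gets built.
print(builds_dam(benefit, downstream_cost, domestic_share=0.2))  # True
```

A socially inefficient dam (total cost 15 > benefit 10) gets built whenever enough of the damage lands across a border, which is exactly the pattern the paper finds in the cross-country data.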

It would be interesting to see what mitigates these effects – it is possible that Kenya’s relative indifference is due to a lack of political power on the part of the northern tribes. Are dams with substantial cross-border costs less likely in areas where the proximate ethnic group is quite powerful?