Archive for category Research

Africa, the safest web region?

Despite the masses of negative publicity heaped on the continent by the famed Nigerian spam industry, Africa is actually one of the world’s safest places to go online in—featuring seven of the ten nations least attacked by malware.

Virus-checker company AVG surveyed 127 million computers in 144 countries and calculated the average rate of attacks—with the African nation of Sierra Leone emerging as the least assaulted, with only one virus event logged per 692 web users.

Really? My experience (admittedly limited primarily to the offices of the Malawian government) is that you should treat any internet-capable computer south of the Sahara as instant death.

Since internet and e-mail access on the continent tends to be less reliable and more expensive, a lot of information transfer is done using memory sticks. Even if computers aren’t subjected to very many attacks from the outside, it just takes one infected stick and a few marginally motivated employees to spread a virus to every other computer in the office. Many of these are the nasty, older viruses/trojans/worms which knock out the antivirus program’s ability to function, which means that AVG can’t see them.

This can happen astonishingly quickly. Tired of spending five minutes scanning my colleagues’ USB drives every time I wanted to get an Excel table from them, I once tried to quarantine and clear every computer in my department, installing new (trial) antivirus on each cleared system. Unfortunately I missed a couple of computers, and within a month, when the trial software stopped working, the entire department had been reinfected.

It’s possible that AVG’s results are due to pretty extreme selection bias on three fronts:

  1. AVG users are probably a little more concerned and careful than those who don’t bother to update (as most don’t).
  2. As I mentioned before, many attacks can knock out AVG, which means no reporting.
  3. Many don’t bother to update AVG’s virus definitions, leaving the program incapable of detecting or reporting new viruses.

So yes, Africa might be a safe continent to go online by yourself in a locked room with tape over your USB drives, but any file-swapping outside the net should be handled with extreme caution.

Hat tip to Chris Blattman’s Google Reader shared items.

The migrant’s dilemma

Where would people end up if there were no barriers to movement?

The folks at Gallup, who recently produced some interesting figures on the large number of people from developing countries who  would like to permanently emigrate, have followed up with new data on where people would like to move to.

Using their survey data to predict the proportion of the population who would move if all barriers were dropped, they constructed net migration indices, basically showing the increase/decrease in adult population which would result if everyone got their wish. For example, below we have the top gainers (in percentage terms):

Randomized trials are so 1930s

Jim Manzi, the CEO of Applied Predictive Technologies (a randomized trial software firm), reminds us that we’ve been subjecting public policy to experimental methods for quite some time:

In fact, Peirce and others in the social sciences invented the RFT decades before the technique was widely used for therapeutics. By the 1930s, dozens of American universities offered courses in experimental sociology, and the English-speaking world soon saw a flowering of large-scale randomized social experiments and the widely expressed confidence that these experiments would resolve public policy debates. RFTs from the late 1960s through the early 1980s often attempted to evaluate entirely new programs or large-scale changes to existing ones, considering such topics as the negative income tax, employment programs, housing allowances, and health insurance.

So the randomistas aren’t so much a “new wave” as the “next wave.” More interesting, though, are Manzi’s thoughts on external validity:

By about a quarter-century ago, however, it had become obvious to sophisticated experimentalists that the idea that we could settle a given policy debate with a sufficiently robust experiment was naive. The reason had to do with generalization, which is the Achilles’ heel of any experiment, whether randomized or not. In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial’s results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?

One example he discusses is the frequent experimentation used in crime prevention, and how the (very few) subsequent attempts at replication fared:

Criminologists at the University of Cambridge have done the yeoman’s work of cataloging all 122 known criminology RFTs with at least 100 test subjects executed between 1957 and 2004. By my count, about 20 percent of these demonstrated positive results—that is, a statistically significant reduction in crime for the test group versus the control group. That may sound reasonably encouraging at first. But only four of the programs that showed encouraging results in the initial RFT were then formally replicated by independent research groups. All failed to show consistent positive results.

My biggest fear about the current trend in social science RCT work is not only the failure to confirm positive results, but the failure to confirm negative results. While there is a small, but real incentive to repeat a ‘proven’ randomized study in a new setting, there isn’t much being done to confirm that a negligible treatment effect doesn’t improve elsewhere. While big RCT research groups do care about external validity, it is the initial findings that get seared into the mind of the policymakers. Flashy graphs which generalize without concern don’t help.

Here’s part of the closing to Manzi’s piece, which is a must-read if you’re interested or involved in this type of work:

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.

Questionable parentage

Gabriel Demombynes over at the World Bank blog has some more interesting things to say about the Multidimensional Poverty Index (MPI). There’s one claim he makes which I find particularly interesting:

The MPI is a descendant of the earlier Human Development Index and is similar to the various Unsatisfied Basic Needs indices long used in many countries.

Several others, including Duncan Green, have also stated that the MPI is a natural follow-on from the Human Development Index (HDI), which I’m not sure is correct, as the two have a very different conceptual basis.

As its name implies, the MPI falls into a class of indices known as poverty measures. While they can get quite complex and opaque, the more basic of these share a similar approach. First we have to pick a welfare measure. This could really be anything that is measurable, but is most commonly income, consumption or asset wealth. Then comes the surprisingly contentious task of choosing a threshold below which people will be classified as poor. These poverty lines can be absolute or relative, the latter indicating a greater concern for inequality than absolute deprivation. Counting the poor gives us a final tally of those living below the poverty line.

The MPI is an extension of this approach, instead using a range of indicators wrangled together into a multidimensional poverty line. While single-dimension poverty lines make very precise statements about people along one dimension (person i can only be not-poor if their income Xi exceeds the poverty threshold P, i.e. Xi > P), multidimensional lines can classify two households as being poor even when they face vastly different circumstances. For example: two people might be equally unhealthy, but one has enough asset wealth to be classified as “not-poor”. The MPI also tries to include information on the severity of poverty for those who face many different deprivations all at once, a conceptually similar approach to the poverty gap and squared poverty gap indices.
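To make the single-dimension counting approach and the gap indices concrete, here is a minimal sketch of the Foster-Greer-Thorbecke family of poverty measures. The function name `fgt_index`, the incomes and the poverty line are all made up for illustration, and this is not how the MPI itself is calculated.

```python
def fgt_index(welfare, poverty_line, alpha=0):
    """Foster-Greer-Thorbecke poverty index for one welfare dimension.

    alpha=0 gives the headcount ratio, alpha=1 the poverty gap,
    alpha=2 the squared poverty gap.
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if not welfare:
        raise ValueError("welfare must not be empty")
    total = 0.0
    for x in welfare:
        if x < poverty_line:
            # Normalised shortfall below the line, between 0 and 1.
            gap = (poverty_line - x) / poverty_line
            total += gap ** alpha
    # The non-poor add nothing to the sum but still count in the population.
    return total / len(welfare)


# Hypothetical incomes and poverty line, for illustration only.
incomes = [0.5, 1.0, 1.5, 3.0, 10.0]
line = 2.0

headcount = fgt_index(incomes, line, alpha=0)    # share of people below the line
poverty_gap = fgt_index(incomes, line, alpha=1)  # average normalised shortfall
squared_gap = fgt_index(incomes, line, alpha=2)  # puts more weight on the poorest
```

Raising the richest income in this list from 10.0 to 1000.0 would leave all three numbers unchanged, since only people below the line contribute to the sum.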

The MPI, like the other poverty measures that came before it, focuses on a particular segment of the population, discarding all information about the non-poor. Because it is derived by counting individuals who fall into a pre-specified condition, it is best thought of as a way to describe the state of this sub-population, rather than as a comprehensive indicator.

In contrast, the Human Development Index was intended to be used to make statements about the overall progress of a country’s development. While all of its components are aggregated from individual or household information, or from counting those in a certain condition (i.e. those that are literate, or who have died this year), they do not give the same type of insight. The education component is similar (we are just counting those who are in the state of literacy or who are enrolled in school), but with GNI and life expectancy, we aren’t really counting anything, we’re expressing moments and expectations from interesting country-wide distributions. We cannot say “X number of people have an HDI of Y.”

The HDI was initially introduced as an alternative to just relying on income as a measure of human welfare. This way of looking at the world, which became very popular following Sen’s work on the capabilities approach, also motivates the MPI as an alternative to only considering poverty in income. The weakness of both indices lies in how they deal with multidimensionality: they use ad hoc methods of averaging different dimensions together to come up with a single number.

So, when describing the MPI to someone new, one might refer to it as “an extension of traditional income-based poverty measures, taking into account the multidimensional nature of poverty, much as the Human Development Index considers the multidimensional nature of development. Both consider just measuring income, or consumption, to be insufficient,” rather than as a natural follow-on from the HDI.

How does the MPI measure up?

Duncan Green introduces us to the new Multi-Dimensional Poverty Index (MPI), developed by the Oxford Poverty and Human Development Initiative (OPHI):

The MPI brings together 10 indicators of health (child mortality and nutrition), education (years of schooling and child enrolment) and standard of living (access to electricity, drinking water, sanitation, flooring, cooking fuel and basic assets like a radio or bicycle). It’s thus a logical extension of its predecessor, UNDP’s pioneering Human Development Index, launched in the first Human Development Report back in 1990, which combined life expectancy, education (literacy + enrolment rates) and GDP per capita.

The measure, like the HDI, is part of an attempt to get a “better measure” of poverty, by including many non-income indicators. While I think most would agree that policymakers and researchers should always consider non-income indicators of welfare, does it make sense to average them out into a single index?

What precisely are we measuring when the HDI for a given country increases by .01? These questions always seem to lead back to the original indicators: “A advanced in rank because of education improvements” or  “B is lower than C despite being richer, because life expectancy in B is much lower.” Given that we need to unpack these indices to figure out what’s going on, why do we bother to pack them in the first place?

Duncan, always open for a healthy debate, has already posted a criticism of the MPI by Martin Ravallion of the World Bank, which questions the implicit values placed on different indicators when they are weighted:

The index is essentially adding up “apples and oranges” without knowing their relative price. When one measures aggregate consumption from household-survey data for the purpose of measuring poverty, as in the World Bank’s “$1 a day” measures, one relies on economic theory, which says that (under certain conditions) market prices provide the correct weights for aggregation. We have no such theory for an index like the MPI. A decision has to be taken, and no consensus exists on how the multiple dimensions should be weighted to form the composite index.

On closer scrutiny, the embedded trade-offs (stemming from the weights chosen by the analyst) can be questioned, and may be unacceptable to many people.  In the context of the HDI, I pointed out 15 years ago that by aggregating GDP per capita with life expectancy the HDI implicitly put a value on an extra year of life, and I showed that this value rises from a very low level in poor countries to a remarkably high level in rich ones (4-5 times GDP per capita).   If it was made clearer to users, I expect that they would question this trade-off embedded in the HDI.

The MPI index faces the same problem. How can one contend (as the MPI does implicitly) that the death of a child is equivalent to having a dirt floor, cooking with wood, and not having a radio, TV, telephone, bike or car?  Or that attaining these material conditions is equivalent to an extra year of schooling (such that someone has at least 5 years) or to not having any malnourished family member?  These are highly questionable value judgments. Sometimes such judgments are needed in policy making at country level, but we would not want to have them buried in some aggregate index.  Rather, they should be brought out explicitly in the specific country and policy context, which will determine what trade off is considered appropriate; any given dimension of poverty will have higher priority in some countries and for some policy problems than others.

One could continue to argue about the weights – but Ravallion’s argument will still stand. I fail to see why these indices amount to anything more than intellectual exercises – while the HDI has got us all thinking about other things than income, has it really been useful as a method of actually measuring development? Is the MPI likely to do any better with poverty?

Counting desires

Preference aggregation can be tricky business

A key assumption behind the Global Burden of Disease project is that it is possible to come up with a “Disability Weight” for each health state.  Diseases conditions that are considered worse than other carry higher disability weights than others.  A very important issue in the development of such weights is the question of who should define these conditions?  Should those who have the conditions be the best judge or are they biased?  Should healthy people who have never experienced these conditions be the judge?  Should doctors decide?  Should policy makers?  Should health economists (gasp!!)?

In the past, the GBD has relied upon “expert opinion” to make such decisions.  Well, it seems for the next update of the GBD, which is currently underway, you can also be an expert.  I came across a link to the following survey earlier today that allows you to have some input in these weights.

That’s Karen Grepin discussing an attempt to aggregate beliefs over disease burdens to better define the weights given to different ailments. This is a very similar exercise to preference aggregation, where we attempt to construct a unified set of beliefs that will govern public policy. The result is something approaching a social welfare function, which allows us to make statements like “Society strictly prefers A to B.” One way of doing this is to get a sample of individuals to compare different states and to try and tease out an overall ordinal ranking of these states. Using Grepin’s example, each person has to make a pairwise comparison:

The first person has swelling and tenderness in the testicles and pain during urination.

The second person has lost part of both legs, leaving pain, tingling, and frequent sores in the stumps. The person has great difficulty moving around and has episodes of depression, anxiety and flashbacks to the injury.

By asking enough people to compare different states with different combinations of symptoms, we can tease out their overall ranking of those symptoms – how this is done can sometimes be contentious and quite technical. That ranking then represents the best approximation of everyone’s relative rankings of disease burden.
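As a rough illustration of how pairwise answers can be turned into an ordinal ranking, the sketch below ranks states by the share of comparisons in which each was judged the healthier one. This simple win-share rule is an assumption chosen for brevity, not the method used by the GBD project, and the state labels and answers are hypothetical.

```python
from collections import defaultdict


def rank_by_win_share(comparisons):
    """Rank health states from pairwise judgements.

    comparisons: iterable of (healthier, less_healthy) tuples, one per
    respondent answer. Returns the states ordered from least to most
    severe, by the share of comparisons in which each was judged healthier.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for healthier, less_healthy in comparisons:
        wins[healthier] += 1
        appearances[healthier] += 1
        appearances[less_healthy] += 1
    share = {state: wins[state] / appearances[state] for state in appearances}
    # Highest win share first; ties broken alphabetically for determinism.
    return sorted(share, key=lambda state: (-share[state], state))


# Hypothetical answers: each tuple is (state judged healthier, other state).
answers = [
    ("mild", "moderate"),
    ("mild", "severe"),
    ("moderate", "severe"),
    ("severe", "moderate"),  # respondents do not always agree
    ("mild", "severe"),
    ("moderate", "mild"),
]

ranking = rank_by_win_share(answers)
```

The output is only an ordering. Turning it into cardinal weights, or deciding what to do when respondents' preferences cycle, is where the contentious and technical choices come in.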

The burden of proof

Honestly, folks, we don't know what the impact will be.

Over at Aidwatch, Alanna Shaikh, citing a few others, considers the limits of impact analysis. At one point she cites a post by Steven Lawry:

“Many real-world problems are not easily described with the kind of precision that professional mathematicians insist upon. This is due to the limitations of data, the costs of collecting and analyzing data, and the inherent difficulties of giving mathematical expression to the complexity of human behavior.” This strikes me as very true. At what point are we expecting too much from our impact assessments?

While the more rigorous impact assessments certainly require some statistical knowhow and reliable data, they don’t necessarily require giving “mathematical expression” to human behaviour. Even though the resulting academic publications might have some calculus window-dressing, an impact measurement is generally about as atheoretical as they come: what was the impact of X on Y? When academics move on and start asking why X impacts Y, they then often retreat to the black box of mathematical modeling (usually in a desperate attempt to avoid qualitative methods, which Chris Blattman writes an excellent post about here).

Alanna also discusses a point made by Andrew Natsios:

Natsios points out that USAID has begun to favor health programs over democracy strengthening or governance programs because health programs can be more easily measured for impact. Rule of law efforts, on the other hand, are vital to development but hard to measure and therefore get less funding.

I think this is the most important criticism of over-reliance on empirical assessment – donors will prefer to fund causes that can easily signal an impact that can be touted back home. A reasonable counter is that those that swim in murkier waters just have to work harder to show their impacts, but in reality they are more likely to either let effort collapse, or just migrate over to programs that do get the funding.

While I’m partially sympathetic to doubts about impact analysis, I think that much of the criticism (though not Alanna’s) is self-serving: let us continue using the same methods we’ve always used, which happen to always show an impact despite the never-ending micro-macro paradox.

That’s fine if you choose to reject statistical rigour, but please don’t pop up five years later and claim that your project/aid flow is responsible for all sorts of wonderful things you can’t really prove. Some may be content with photos, anecdotes and correlations, but don’t be surprised if the rest of us aren’t.

This doesn’t mean that everything should (or could) be judged by a hardcore RCT starting tomorrow, but when the evidence is less direct, the onus is on the presenter to be more modest and careful with their assertions.

The tricky ethics of education information, J-Pal edition

The Roving Bandit discovered this graph produced by J-Pal on cost-effective interventions for education.

That red bar is a result from an RCT in Madagascar which provided families with information on the “returns to education,” resulting in a reasonable increase in attendance (3.5%).

What’s the catch? The study wasn’t actually giving people an accurate measure of the returns to education in Madagascar; it was giving them the average correlation between education and income or job availability.

Why the distinction? Most economists believe that educational attainment is highly endogenous: brighter people and those from more advantaged backgrounds are more likely to attend and do well in school. This muddies the waters, as high-ability people are also more likely to earn higher wages.

There have been many, many attempts to control for the sort of characteristics that might bias the effect of education on earnings, but the study cited by J-PAL doesn’t really do this. This means they aren’t presenting people with an accurate estimate of their own gains from further education, but instead giving them a picture of the sort of lives educated people live.
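The gap between a raw correlation and a causal return can be shown with a small simulation. Everything here is invented for illustration (the 5% true return, the effect of ability on schooling and wages, the noise levels); none of it comes from the Madagascar study. Ability raises both schooling and wages, so a naive regression of wages on schooling overstates what an extra year of school would actually deliver.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate_naive_return(n=20000, true_return=0.05, seed=0):
    """Simulate wages with unobserved ability; return the naive schooling slope."""
    rng = random.Random(seed)
    schooling, log_wage = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        # Higher-ability people get more schooling (the endogeneity).
        years = 8 + 2 * ability + rng.gauss(0, 2)
        # Ability also raises wages directly, independent of schooling.
        wage = true_return * years + 0.3 * ability + rng.gauss(0, 0.5)
        schooling.append(years)
        log_wage.append(wage)
    return ols_slope(schooling, log_wage)


naive_return = simulate_naive_return()
```

With these made-up parameters the naive slope tends in large samples toward 0.05 + (0.3 × 2) / 8 = 0.125, two and a half times the true return of 0.05. Telling families the naive figure would be telling them about the lives of educated people, not about their own likely gains.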

Shouldn’t we be luring people into school anyway, though? Isn’t more education, in general, a good thing? J-Pal’s cost-effectiveness chart assumes this (notice that it assumes education is an end in itself, even though the intervention uses education as a means to further income).

Possibly, but I am dubious about the ethics of boosting people’s underlying demand for education. Providing accurate information on returns would allow them to make informed decisions (i.e. does this really make sense given my situation?), but instead providing them with a rosy picture may be leading them to make decisions that aren’t actually in their own best interest.

Supply-side interventions that lower the cost of attending school – pretty much everything to the right of the red bar – are more likely to improve outcomes without the implicit deception.

Be careful who you nudge

A new study by economists from UCLA looked at the effect of a randomised program which informed households of their energy expenditure relative to their peers:

We show that while the electricity conservation “nudge” of providing feedback to households on own and peers’ home electricity usage works with liberals, it can backfire with conservatives. Our regression estimates predict that a Democratic household that pays for electricity from renewable sources, that donates to environmental groups, and that lives in a liberal neighborhood reduces its consumption by 3 percent in response to this nudge. A Republican household that does not pay for electricity from renewable sources and that does not donate to environmental groups increases its consumption by 1 percent.

An overview of the study at Slate. Hat tip to MR.

Do the MDGs influence national policy? Should they?

Duncan Green looks over a paper by Sakiko Fukuda-Parr on how well PRSPs in developing countries reflect the priorities in the Millennium Development Goals. The results reveal mixed involvement:

The analysis found a high degree of commitment to MDGs as a whole but both PRSPs and donor statements are selective, consistently emphasising income poverty and social investments for education, health and water but not other targets concerned with empowerment and inclusion of the most vulnerable such as gender violence or women’s political representation.

Fukuda-Parr and (to a lesser extent) Green seem to be making an implicit judgement: that further alignment between national development strategies and the MDGs is the most desirable outcome.

While the MDGs have been incredibly important for shaping how we perceive development and offer a reasonable set of indicators for tracking the progress of poor societies (which we should continue to use), I think it’s unreasonable to expect or promote broad policy harmonisation around them.

For one, the MDGs are a broad set of international goals, not a one-size-fits-all policy, yet we continue to treat them as comparable indicators and implicitly weight them equally (this is reinforced by the structure of the MDGs). Why should India, where fewer than 80 children per 1,000 die before their fifth birthday, put the same weight on reducing under-five mortality by two-thirds as Malawi, where over 130 children suffer the same fate? What if Indonesia decides it wants to put more weight on industrial policy than agricultural policy, with the expectation that the former will do more to reduce poverty in the long run? I think policy-makers and researchers often confuse the normative aspects of the MDGs (what we want to achieve) with the operational side (directly targeting each of the things we want to happen).

To be fair, Fukuda-Parr isn’t suggesting that developing countries should be just copying and pasting, even if that’s what appears to be happening:

Most [PRSPs], however, appear to have applied MDG Targets somewhat mechanistically, without adaptation.

There seems to be a preference for adaptation; taking the normative framework of the MDGs and adjusting it for the local context. Even so, why must developing countries mold their strategies around a normative framework they don’t truly own? The MDGs represent an international consensus, but it’s not clear that the goal set that results from such a negotiation will bear much resemblance to any individual country’s aspirations.

Several months ago, I suggested that the next set of MDGs be built from the ground up, as an aggregation of the goals of multiple development strategies. Instead of the international community telling developing countries what their priorities should be, then scouring planning documents to ensure adherence, the structure should grow from the opposite direction. Governments and civil societies in poor countries need to determine their own objectives for development, after which the international community should do its best to help achieve them.
