Research error and the reliability of big reports

Reports produced by NGOs and think tanks are often a ragged amalgamation of other, questionable research. In other news, Dr. Frankenstein's results are still awaiting verification through a randomized trial.

Chris Blattman points to an Atlantic Monthly article on the likelihood that most published research delivers false results:

Simply put, if you’re attracted to ideas that have a good chance of being wrong, and if you’re motivated to prove them right, and if you have a little wiggle room in how you assemble the evidence, you’ll probably succeed in proving wrong theories right.

Following the same train of thought, Alex Tabarrok points out that by pure statistical chance, about 5% of all false hypotheses that are tested will give statistically significant results. If you believe that most hypotheses are false, and that we’re only successful in identifying true hypotheses part of the time, then our collection of statistically significant results could be heavily contaminated with false positives (Tabarrok gives an arbitrary, but alarming, figure of 25%).
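To see how that contamination arises, here is a rough back-of-the-envelope sketch in Python. The share of true hypotheses and the statistical power below are illustrative assumptions (chosen so the arithmetic lands near Tabarrok’s 25% figure), not numbers taken from his post:

```python
# Back-of-the-envelope: how 'significant' findings get contaminated with
# false positives. The inputs below are illustrative assumptions.
n_tested = 1000      # hypotheses put to a test
share_true = 0.20    # assume only 1 in 5 tested hypotheses is actually true
power = 0.60         # assume a true effect is detected 60% of the time
alpha = 0.05         # conventional significance threshold

n_true = n_tested * share_true
n_false = n_tested - n_true

true_positives = n_true * power        # real effects correctly flagged
false_positives = n_false * alpha      # flukes that clear the 5% bar

share_flukes = false_positives / (true_positives + false_positives)
print(f"Share of significant findings that are flukes: {share_flukes:.0%}")
# -> 25% under these assumed inputs
```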

This makes it terribly important to treat these results with care, and not to take each individual study as the final word on a subject. It is also a very, very good reason to be distrustful of big, conclusive reports; the sort that are often produced by international NGOs and many of the bilateral donors and think tanks.

Researchers are often called upon to make expansive, sometimes global statements about tenuous, uncertain relationships (like the relationship between climate change and HIV/AIDS). The tendency is for these researchers to mine results for useful ‘impacts’, then use those as underpinning assumptions for bigger leaps of logic: a researcher takes person A’s results and person B’s results, proclaims they are true, and uses them to produce result C.

Even if report-writers do this very carefully, they are still bound by the limitations of the original studies – and the probability of error goes up. If there is a 25% chance that person A’s results are a fluke, and a 25% chance that person B’s results are a fluke, then there is only a 0.75 x 0.75 ≈ 56% chance that result C isn’t built on some false result. This is less of a problem if the researcher is considering many results from a single hypothesis, but if the researcher cherry-picks different hypotheses (say, an assumption about the impact of X on Y and another about the impact of Z on Q) and strings them together, such flaws become more and more pronounced.
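A minimal sketch of that compounding, assuming (hypothetically) that every borrowed result carries the same 25% chance of being a fluke:

```python
# How the chance that *none* of the borrowed results is a fluke shrinks as a
# report stacks up independent assumptions. The 25% fluke rate is the
# illustrative figure used above, not an empirical estimate.
p_sound = 0.75   # chance any single borrowed result is not a fluke

for n_results in range(1, 6):
    p_all_sound = p_sound ** n_results
    print(f"{n_results} borrowed result(s): {p_all_sound:.0%} chance none is a fluke")
# 2 results -> 56%; 4 results -> 32%: the headline number quickly rests on
# little better than a coin flip.
```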

Tabarrok has a list of things everyone should consider in a world where most research is false. I’ll add a few more, pertinent to the uncertain world of development reports and policy briefs:

  1. Be extremely wary of reports touting specific numbers. A report which says that climate change will cost us exactly $50 billion in the next ten years probably has many, many, many assumptions behind it. For each additional assumption, consider the collective probability that the whole estimate is wrong.
  2. Read the footnotes and references behind assumptions, and follow up with the source literature. Be wary if a number is taken from a study that has never been published, or that shows no sign of having inspired debate. Be wary if the author does not acknowledge the potential problems with those assumptions, or the reference’s place in the wider literature.
  3. Please, please, put on your causality cap before you start touting any numbers.

4 thoughts on “Research error and the reliability of big reports”

  1. Ranil Dissanayake

    November 19, 2010 at 5:37am

    I think these insights are much stronger when you’re talking about economics and specifically econometric studies.

    Historical/anthropological and to some extent sociological studies rely on very different research methodologies, and rarely try and deliver big grand ‘answers’. They demonstrate linkages, try and establish causality to the best of their ability through a range of evidence and normally stop short of any attempt to say ‘wherever x happens this will happen to y’.

    Grand statements in these disciplines are often greeted with deep skepticism, and rightly so. Fukuyama’s ‘End of History’ is one example – it got a lot of popular press, but plenty of historians saw the big flaw: his history of capitalism v. communism was a lot newer and had a lot less staying power than the much older narratives of religious conflict.

    btw – This is the second time we’ve used a shot from this precise scene of Young Frankenstein!

  2. Matt

    November 19, 2010 at 10:09am

    Sort of – the basic probability of research ‘wrongness’ is mostly down to statistical chance – any study that uses traditional methods of inference will be subject to it (I reject a null hypothesis at a low-enough p-value, which is roughly the probability that a result this strong would appear by chance alone). That covers pretty much all of the social sciences, medical research, and statistics, and is by no means concentrated in economics.

    It’s also a problem independent of causality issues – it exists even if I want to use statistical inference to say “X is associated with Y”.

    You could make an argument that big reports are more likely to involve economic research, but I don’t see the basic probability of being wrong as any worse for econometric research than for, say, psychometrics, or a medical trial with 30 people.

    These reports are not usually compiled by academic economists, partly because they are the ones who, on average, are more nervous about making grand statements. There are exceptions to this rule. But because we often deal with numbers, and produce statistical papers with a numerical result, our results are more likely to be taken and used as some underlying numerical assumption. Anthropology and history are, by default, immune to such cribbing.

  3. Ranil Dissanayake

    November 19, 2010 at 11:14am

    Good points. But I’m also suggesting that the ways in which historians and anthropologists make their arguments are less prone to ‘definitiveness’ in proving or not proving a hypothesis, which is both a strength and a weakness. You provide evidence for it, but don’t generally attempt to quantify the likelihood of it being correct – you always allow for the counterpoint.

    Point taken re: other disciplines, though; you make good points throughout.

  4. MJ

    November 19, 2010 at 4:20pm

    Excellent post. I am also v worried when I see big numbers thrown about as being the cost to do X. In statistics there is, of course, a solution to this: publish your numbers with confidence limits, but very few non-academic reports do this. These sorts of things can matter a lot. E.g. in conservation people get v worried about rates of illegal logging, and there are some highly cited numbers out there which, upon further inspection, turn out to be v soft, but are treated as hard estimates by most, with some major policy decisions made on the back of them. Unfortunately, for the most part, the media are also not interested in such ranges, and want instead the simplest answer. All is not completely lost, though; I note that these days opinion polls are usually published with a +/- ‘margin of error’, so people clearly are capable of grasping that not everything is as simple as it is often first presented.
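    As an aside, the margin of error MJ mentions is straightforward to compute; a quick sketch with made-up numbers, where the sample size and poll result are assumptions rather than figures from any real poll:

```python
# Illustrative only: the ~95% margin of error for a poll proportion, using
# made-up numbers (1,000 respondents, 52% answering "yes").
import math

n = 1000    # respondents (assumed)
p = 0.52    # share answering "yes" (assumed)

margin = 1.96 * math.sqrt(p * (1 - p) / n)   # ~95% margin of error
print(f"{p:.0%} +/- {margin:.1%}")           # -> 52% +/- 3.1%
```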
