The Freakonomics blog has an interesting blurb on a South African study which uses randomized response to ask sensitive questions about poaching. Here’s how it works: the interviewer gives the respondent a six-sided die, which is rolled while the interviewer’s back is turned. When asked if he has poached, the respondent answers “no” or “yes” if the die roll results in a one or six, respectively, and answers truthfully if the result is anything else.
This is basically a form of self-induced measurement error – but it protects respondents from revealing their illegal actions with certainty, and so should lead to more honest reporting (on average). If respondent A tells me he poached, I can give you the probability that he really poached – the exact figure depends on how common poaching is, and Bayes' rule with a fifty-fifty prior puts it at roughly 83% – but I cannot say with certainty that he poached.
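The mechanics are easy to simulate. Here is a minimal Python sketch (the 30% poaching rate and the function name are my own inventions for illustration, not figures from the study) showing how a researcher can recover the population rate even though no individual answer is incriminating:

```python
import random

def randomized_response(truly_poached, rng):
    """Answer under the die scheme: a roll of 1 forces 'no', a roll of 6
    forces 'yes', and rolls 2-5 mean the respondent answers truthfully."""
    roll = rng.randint(1, 6)
    if roll == 1:
        return False
    if roll == 6:
        return True
    return truly_poached

rng = random.Random(42)   # fixed seed so the run is reproducible
true_rate = 0.30          # assumed true poaching rate (made up for the demo)
n = 100_000

answers = [randomized_response(rng.random() < true_rate, rng) for _ in range(n)]

# On average, P(answer = yes) = 1/6 + (4/6) * true_rate, so invert that:
yes_share = sum(answers) / n
estimated_rate = (yes_share - 1 / 6) * 6 / 4
print(round(estimated_rate, 3))  # close to 0.30, though every single answer stays deniable
```

The inversion in the last step is the whole trick: the noise the die injects is known exactly, so it can be subtracted back out at the population level even though it can never be subtracted out for any one respondent.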
This simple example of removing culpability by introducing randomness also illustrates how difficult it is to determine causality in other events which are subject to even a small degree of uncertainty. Many have tied the recent drought in and around the Horn of Africa to global warming. At first glance, this isn’t an unreasonable position: most models of climate change predict that the variation in rainfall will increase (increasing the probability of extreme events, like droughts).
Yet, just because we can determine statistically that an intervention X has an impact on the probability of event Y, doesn’t mean that event Y was caused by intervention X. In these cases, we have no true counterfactual – we cannot rewind to a world where intervention X did not occur to see if event Y still happened. If the drought was still possible in a world without climate change, then it becomes very difficult for us to link the two with certainty*. We can also use data on the incidence of riots to show that they are correlated with budget cuts, but that doesn’t mean the recent rioting in London had anything to do with budget cuts, even if those cuts made this type of riot more likely.
If this seems overbearingly defeatist to you, let me go a little further. Much of this uncertainty is due to the way we model things statistically – data informs the empirical model, not the other way around. Luckily, we have other ways of determining causality for single observations (my co-blogger Ranil has almost certainly blown a gasket by now, waiting for me to step away from statistics). If we find that a randomly allocated nutritional intervention reduces the probability of death for toddlers in a village, we as researchers can’t use that result to pick out single children whose lives have been saved, but a doctor who administers the intervention might be able to.
We can, if we’re confident enough in our models, make statements like “more riots like this will happen the worse the cuts get” or “more droughts like this one will happen if global warming gets worse” and use these assertions to inform policy. We can also use the empirical evidence of a relationship between budget cuts and rioting to shape the way we study the recent riots, even if we can’t use the same results to “prove” that the riots were caused by cuts. Our empirical evidence is still invaluable, but we should use caution in how we bring it into arguments about today’s events.
*Although this does depend on how informative our model is – culpability would be more assured if, for example, the respondent in the first example rolled a 100-sided die, answering truthfully on rolls of 2–99.
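The footnote's point can be made explicit with Bayes' rule. A small sketch (the function name is mine, and I assume, by analogy with the six-sided version in the main text, that a roll of 1 forces a "no" and the top face forces a "yes"):

```python
def posterior_poached_given_yes(die_sides, prior):
    """P(actually poached | answered 'yes') under a randomized-response die:
    a roll of 1 forces 'no', the top face forces 'yes', anything else is truthful."""
    p_yes_if_poached = (die_sides - 1) / die_sides   # truthful yes or forced yes
    p_yes_if_innocent = 1 / die_sides                # forced yes only
    return (p_yes_if_poached * prior) / (
        p_yes_if_poached * prior + p_yes_if_innocent * (1 - prior)
    )

# With an even prior, a six-sided die leaves real doubt; a hundred-sided one barely does.
print(posterior_poached_given_yes(6, 0.5))    # ≈ 0.83
print(posterior_poached_given_yes(100, 0.5))  # ≈ 0.99
```

The larger die buys culpability at the cost of deniability: the closer the mechanism gets to "always answer truthfully," the less protection the respondent has, and the less reason to answer honestly in the first place.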