An apple a day means nothing in a complex system

“But Mulder, the evidence I’m seeing here goes against every single case study and ethnographic paper I’ve ever read.”

Recently there has been much fuss made over how researchers and practitioners should be more cognisant of how development policy plays out in environments characterised by complexity. While many have used the presence of complex systems to motivate a move towards more experimentation, tracking and empiricism, others have argued that we should instead eschew rigorous empirical methods (such as RCTs) and one-shot policy instruments and opt for a more dynamic, qualitative approach to development policy.

Of late I have been particularly wary of this second camp, especially the argument that data-driven methods and randomised controlled trials (RCTs) have little place in a world of complexity. Let me explain why this makes me uneasy.

The human body is itself a complex system, characterised by feedback loops and a lot of unknown parameters. Despite the fact that we know a surprising amount about what makes us tick, thanks to both theory and evidence from biology and medical science, we’re surprisingly inept at determining long-term outcomes. Even so, when my complex system throws up signs that things are not well, I go to see my doctor. After examining me and assessing my symptoms, sometimes through laboratory testing, he makes a diagnosis. Based on that diagnosis, he chooses a treatment, often by selecting a pre-approved medication which has been tested using an RCT.

Let’s think about this for a moment. Most medical research is able to cleanly discern the short-term benefits of taking a certain medication. While these medicines are developed using a heavy dose of (biological) theory and iterative testing, trials are rarely long enough to determine what the long-term benefits or side-effects will be. While researchers can use previous results and theory to determine that chemical X will result in reaction Y in a human body, they rarely can account for all the possible effects. Randomised controlled trials get us part of the way there, but frequently cannot account for long-term effects. So, while we can measure the aggregate effect of a treatment on an incredibly complex system in the short run, we really can’t say that much about the long term, nor can we say much about how these treatments might interact with other treatments.

In fact, it is with predictions about health over the long term that the precision of experimentation often gives way to less robust evidence (such as extended observational studies) or more ad hoc forms of rationalisation (is milk good or bad for you?). Similarly, many of the bigger questions in development (how do we improve institutions? What causes economic growth?) are more difficult to address using the most rigorous methods. It is in these areas that, quite naturally, the randomistas have been least successful in their domination of the policy debate.

While we should find all of this disconcerting, the (current) inability of medical RCTs to give us definitive answers on what makes us live longer or be healthier in aggregate is hardly a reason to rely on them any less. Imagine a world in which your doctor didn’t have access to any randomised medical research. Health professionals would have to resort to casual Bayesian inference to treat people (did John die when I gave him chemical Z?), and would have little sense of which medicines were ‘proven’ to work. We tend to look down on off-label use of medication, but in a world where rigorous scientific testing isn’t the norm, all prescriptions become off-label. It is a world not a million miles away from the one portrayed in the Mitchell and Webb sketch “Homeopathic A&E.”

The sketch also highlights what the development policy world is like when we toss out rigorous empirical evidence. Yes, decisions are made based on qualitative expertise, but they are made without either definitive evidence (did this make a difference?) or appropriate empirical feedback (are things getting better?). A healthy dose of qualitative work is essential in development policy-making, but a world in which all decisions are made qualitatively is far from ideal: how many of you would wish to be treated by that doctor who had been practising for 40 years, but had never read (or believed) a single medical study?

Just as medicines shown to work using rigorous clinical trials are an essential tool for a doctor navigating the complexities of human health, policies which have been shown to work in some context with an RCT become one of many tools policy-makers can use when operating within a complex policy environment. These types of rigorous trials certainly won’t solve all of our problems, but they are still extremely, extremely useful, even in a complex system. I’m glad that someone is putting out useful, albeit marginal, medicines which make me feel better when I get sick. It would be even better if someone could figure out more comprehensive interventions which take into account my entire biology, but in the meantime I’ll take what I can get.

10 thoughts on “An apple a day means nothing in a complex system”

  1. Development Intern

    October 14, 2013 at 9:55pm

    Aren’t you ignoring the issue of limited external validity? This is much more relevant to development than in medicine – while human biology might be broadly similar around the world, human society sure isn’t.

  2. Matt

    October 14, 2013 at 10:16pm

    Good point. External validity is a major issue – I’ve written frequently about the need to be more cautious in our assumptions that RCT results will be generalisable. And it’s true, the unifying ‘theoretical’ framework of human biology is more coherent and useful than that of economic contexts. But the point here is about complex systems: even though results are more generalisable across different people,* human beings still represent a complex system. If we are willing to accept that clinical trials have some use in this particular complex system, it isn’t clear that the complexity of economic settings is a good enough reason to reject RCT evidence.

    *Although I would still question this – would we really expect similar treatment effect sizes if we replicated clinical trials in different settings? Generally results point in the same direction, but all sorts of factors (subject characteristics, other medicines, underlying health status) can change the size of the treatment effect.
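The footnote’s point can be illustrated with a toy simulation (everything here is hypothetical: the settings, effect sizes, sample sizes and noise levels are made up purely for illustration):

```python
# Toy simulation of the footnote's point: the same treatment can show
# different effect *sizes* in different settings, even when every trial
# points in the same direction. All numbers are made up for illustration.

import random

def run_trial(true_effect, n=2000, noise=1.0, seed=0):
    """Simulate a two-arm trial; return the estimated treatment effect."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, noise) for _ in range(n)]
    treated = [rng.gauss(true_effect, noise) for _ in range(n)]
    return sum(treated) / n - sum(control) / n

# The underlying effect differs by setting (baseline health, other
# medicines, subject characteristics) -- the factors the footnote lists.
true_effects = {"setting A": 0.50, "setting B": 0.30, "setting C": 0.15}
estimates = {name: run_trial(effect, seed=i)
             for i, (name, effect) in enumerate(true_effects.items())}

# Every estimate is positive (same direction), but the sizes differ a lot.
for name, est in estimates.items():
    print(name, round(est, 2))
```

Each simulated trial would pass a directional replication test, yet a policy-maker reading only the effect size from setting A would badly over-predict the benefit in setting C – which is the heterogeneity worry in a nutshell.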

  3. Jeff

    October 14, 2013 at 11:56pm


    Thanks for another interesting contribution to the debate on RCTs in development. I can’t pretend that I’ve been much of a randomista, but I do think there is an important role for different types of evidence in policy making. RCTs certainly play a role – they are brilliant at answering the question of ‘did x intervention work on this occasion?’ And as another respondent has brought up external validity, I won’t go down that path here. Rather, I would just say that just as important is understanding the question ‘why did x intervention work on this occasion?’ – which can’t always be answered by an RCT.

    And from a policy-maker’s perspective, the question is actually ‘if I implement this policy/programme will it be successful?’ At its heart, that IS a Bayesian question of probability. I would suggest that RCTs help give us a more informed and potentially more accurate Bayesian prior upon which to base that prediction. Nothing more, nothing less. But yes, that’s much better than a shot in the dark!
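Jeff’s Bayesian framing can be made concrete with a toy calculation (all numbers here are hypothetical, and down-weighting the evidence at half strength is just one illustrative way to express external-validity doubt, not a standard recipe):

```python
# Toy Beta-Binomial illustration: treat "will this programme succeed?"
# as a probability we update with RCT evidence. All numbers hypothetical.

def beta_update(alpha, beta, successes, failures):
    """Update a Beta(alpha, beta) belief with observed evidence."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Expected success probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

# With no evidence at all: a flat prior, expected success rate 0.5.
prior = (1, 1)
print(beta_mean(*prior))  # 0.5 -- Jeff's "shot in the dark"

# Suppose an RCT elsewhere found the intervention worked in 18 of 25 sites.
# Because of external-validity worries we down-weight that evidence,
# counting it at half strength rather than taking it at face value.
weight = 0.5
posterior = beta_update(*prior, 18 * weight, 7 * weight)
print(round(beta_mean(*posterior), 2))  # 0.69 -- a more informed prior
```

The RCT doesn’t hand the policy-maker a guarantee; it just moves the expected success rate from 0.5 to roughly 0.69 under these made-up numbers – ‘nothing more, nothing less’, as Jeff puts it.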

  4. Søren

    October 15, 2013 at 12:27am

    Yesterday I wrote a comment to Kirsty Newman’s blog that I think is relevant here as well.

    … let me explain why I and many others aren’t enthusiastic about RCTs – I believe you’re missing it. Perhaps it’s because we disagree about the nature of most development interventions. I believe it’s mostly about changing institutions. Now, if institutions were merely observable rules, RCTs would be terrific. RCTs would similarly be alright even if we consider institutions as icebergs where some parts are not observable – you just need to accept the black-box approach.

    However, more and more research is uncovering the complexity of institutions: how institutions are continuously pieced together, strategically and unconsciously, by people and groups situated in different localities and circumstances. Drawing on Frances Cleaver and Avner Greif, I define an institution as: a complex and dynamic emergent property of socially positioned actors’ organising of rules, beliefs, norms, and organizations that together generate a regularity of (social) behaviour.

    If this is true, what we need to identify is the mechanism by which institutions change, more than the inputs. RCTs are not terrific for this – and that’s why I’m not enthusiastic about them.

    In other words: Yes, external validity is a pretty major issue but the point is not that another tool will do the job. It’s that the doctor’s recipe analogy doesn’t really hold water. Cells, unlike humans, are not social and political actors.

  5. Matt

    October 15, 2013 at 10:23am


    You and many others seem to think this post was some argument that RCTs should be the gold standard for all evidence on development interventions. That wasn’t what I was arguing. What I was arguing (and what I repeated in the comment above) is that the presence of complex systems in development contexts does not negate the usefulness of RCTs, no more than the presence of complexity in human biology negates the usefulness of clinical trials. We can argue about how analogous these two are (and this largely falls on issues of external validity), but the point still stands.

    Your argument is a separate one, and one I’ve heard you make time and time again. You are right: development is largely an issue of institutional change, and RCTs are not very good at answering these questions (just as clinical trials cannot answer exactly what we need to do to live to 100, although we might suspect that they contribute at the margin).

    That argument in itself doesn’t get us very far, because it’s clear that 1) RCTs are *incredibly* useful at answering questions about specific policy interventions that institutions themselves must decide to take, i.e. they represent an extremely powerful tool for policymakers in countries (NOT donors) to consider when trying to choose their optimal bundle of policies at any given time, and 2) donors can push these micro-level interventions and improve welfare enormously at the margin, even if institutional change isn’t happening.

    Does this get us all the way there? No, it doesn’t – but that’s not what I’m arguing here!

  6. Alan Hudson

    October 16, 2013 at 1:05pm

    Thanks Matt for a useful piece and comments.

    Here’s my take. RCTs are good for understanding whether a specific intervention works in a particular context. But development is more about the evolution of a social/political system, with “internal” dynamics the primary driver rather than external interventions.

    I think the language of interventions is very problematic. It suggests that outsiders have the solution – they know what works and how to administer it – when in many cases, particularly when the focus is institutional change rather than vaccination, that simply isn’t true. Development isn’t and shouldn’t be about external experts intervening.

  7. Søren

    October 16, 2013 at 4:19pm

    Hey Matt, erh, thanks. Apologies in advance but I’m gonna make my argument one more time. I think the feeling of being misunderstood is mutual.

    Let me put it differently. I’m not thinking you’re arguing for the universal supremacy of RCTs. Neither am I against RCTs on principle. I agree they can be *incredibly* valuable in some settings. I believe RCTs can be very useful in mixed-method approaches. And to be honest, they are the gold standard, although sometimes a costly one, when it comes to impact assessments, aren’t they?

    However, I disagree with you about their relevance for complex systems. Complexity fundamentally does in fact negate the value of RCTs. Correct me if I’m wrong, but for an RCT to be useful, what we’re studying has to be an acceptably stable system. The average, or the specific point in time, needs to be of interest. If the subject of study is dynamic, this is not true. Then it’s the trajectories that are of interest. Identifying a particular point in time is of little value if we have no idea how we got there and where we might be heading. Similarly, the average pathway isn’t all that relevant – at least without the probability function.

  8. Søren

    October 16, 2013 at 8:19pm

    I’m now realising my use of the word ‘negate’ in the last paragraph makes me look a bit mad. My intention was a tad less categorical. I hope it’s not taking the attention away from the overall argument.

  9. Adam Cernea Clark

    October 17, 2013 at 8:10am

    It doesn’t seem to me that a focus on institutions is incompatible with RCT-based intervention development. Neither does it seem like the fact that an RCT is not going to provide one with a generalizable solution is terribly disconcerting. True enough, an RCT will not give you a remedy for every situation. How could it? But I think that this critique misses the mark a bit. One of the whole points of trying to look at complex adaptive systems as such is to uncover some of the non-linear dynamics going on. If you find out that an RCT works in one set of instances but not another set, the inquiry from a complexity perspective does not end there. In fact, if your results were truly generalizable, you would have a much harder time understanding why a particular intervention is optimal. When looking at complex adaptive systems, the failures of a model (or intervention based on a model or set of models, such as the parameters for an RCT) are just as (if not more) important as its successes. Indeed, without parameters framing the efficacy of a particular intervention, there is little explanatory power for why that measure works, particularly in the case of an RCT.
    The development world could do well to take failure more seriously as a crucial programmatic/policy tool. RCTs can at least identify important failures. One of the benefits of identifying the set of systems in which the optimal intervention yielded from an RCT does not work is that it points out real, important differences between those systems where the intervention works and those where it doesn’t. These differences may provide the actor with a greater understanding of both kinds of systems—those where the measure works and those where it doesn’t. RCTs are not just important for identifying what might work, but for making the bases upon which interventions rest and the understanding of the impacted systems more robust. They can also point out important research areas in the course of identifying the parameters for success and failure of an RCT. No doubt, institutions play a crucial role, but they do not exist over-and-above the systems that development agencies attempt to intervene in, and they may well be precisely what is identified in further research into the parameters of success and failure of an RCT. Further, if actors employing RCTs are indeed looking at the world through a complexity lens, it seems likely that any resultant explanation of the role those institutions play in the success or failure of an intervention will be more useful for identifying the dynamics of institutional failure.

  10. Kartik

    October 17, 2013 at 10:20pm

    “…the presence of complex systems in development contexts does not negate the usefulness of RCTs, no more than the presence of complexity in human biology negates the usefulness of clinical trials.”

    Matt, it may not negate the usefulness of RCTs, but I would argue that it mitigates their usefulness.

    I’m with Soren here—the usefulness of RCTs rests heavily on their ability to tell us something valuable that can be applied across contexts. In fact, this is one of the rejoinders that RCT supporters use to justify the cost/length of time critique against RCTs. So the ability of RCTs to answer micro questions is only one part of the usefulness equation, and application of these answers across contexts is another part of the equation. Hence, we can’t just put the external validity critique to the side.

    That RCTs can teach us about the effectiveness of micro interventions despite complexity does not mean they are useful in dealing with complexity.

    -First, different systems have different levels of complexity (as you hint at). Just because the human body is a complex system and an economy/community is a complex system does not mean that they are analogous enough that RCTs are equally useful in both realms (Soren’s point from above).

    -Second, as complexity within a certain setting increases, external validity is likely to decrease, so we should pay a lot more attention to mechanisms and impact trajectories for more complex interventions to increase the usefulness of evaluation. RCTs have the potential to help with this, but do not address this issue by definition or by current practice (in large part). Plus, qualitative methods may be better suited to filling this role.

    -Third, even in medicine, we’re seeing that results that have been accepted by the literature/profession do not stand up to closer scrutiny. If we believe that external validity in human biology is higher than it is across social systems, then this raises a note of caution for the usefulness of RCTs in development.

    -Some recent work suggests that RCTs may not in fact tackle complexity as well as we might think: (Sandefur et al. on extra teacher program in Kenya, which you already know about) (Paper on team learning vs. RCTs in dealing with complex problems)

    Ultimately, it’s not entirely right to leave it at “RCTs are useful for policymaking.” The more important question is not whether RCTs are useful, but how useful they are compared to other methods (e.g. in grappling with complexity). This is an empirical question, one that we have a dearth of evidence on in development.

    I think the sentiment that it’s hard to determine the value of any given type of research ex ante applies here. Hence, I’m all for rigor in learning about how things work, but I’m against rigor narrowly defined and practiced.

    P.S. There is in fact a lot of understanding about how life expectancies have increased dramatically over the past 150 years, and many of the innovations that produced this result were not tested by RCTs. Instead, they required a lot of experimentation (writ large), which is not necessarily the same thing as RCTs.
