There has been much talk of economists starting up a trial registry for randomised interventions, or at least promoting the use of pre-analysis plans. One of the chief reasons for doing this is to curb data mining – if researchers make it clear up front which hypotheses they plan to test, this will reduce the incentive to report new results, discovered only after the researchers have had time to dig around.
While I think trial registries are worth a try, I have already discussed my worries about their effects on the quantity of viable research (even if quality increases). These concerns aside, my question here is: why are trial registries primarily associated with randomised trials? Shouldn’t we also be moving to an equilibrium where all empirical research begins with a published pre-analysis plan?
I suppose the main hurdle here is honesty – for any dataset which already exists, it’s easy for me to download it, mine the data, then base my pre-analysis plan on empirical results I already know to exist. Furthermore, for any given dataset, the number of potential hypotheses (and thus the number of pre-analysis plans which can be written by different researchers) is very large. This suggests that there is something special about writing a pre-analysis plan before the data is even collected, rather than before someone opens up Stata.