Random thoughts left lying around

There has been much talk of economists starting up a trial registry for randomised interventions, or at least promoting the use of pre-analysis plans. One of the chief reasons for doing this is to curb data mining – if researchers make it clear up front which hypotheses they plan to test, this will reduce the incentive to report new results, discovered only after the researchers have had time to dig around.

While I think trial registries are worth a try, I have already discussed my worries about their effects on the quantity of viable research (even if quality increases). These concerns aside, my question here is: why are trial registries primarily associated with randomised trials? Shouldn’t we also be moving to an equilibrium where all empirical research begins with a published pre-analysis plan?

I suppose the main hurdle here is honesty – for any dataset which already exists, it’s easy for me to download it, mine the data, then base my pre-analysis plan on empirical results I already know to exist. Furthermore, for any given dataset, the number of potential hypotheses (and thus the number of pre-analysis plans which can be written by different researchers) is very large. This suggests that there is something special about writing a pre-analysis plan before the data is even collected, rather than before someone opens up Stata.