Scientists redid 100 psychology studies — and most didn't hold up
A 2015 effort to repeat 100 published experiments triggered psychology's 'replication crisis'.
A finding only counts as science if someone else can get the same result. In 2015, the Reproducibility Project: Psychology — a collaboration of hundreds of researchers coordinated by Brian Nosek — put that principle to a brutal test.
The teams carefully re-ran 100 experiments drawn from leading psychology journals, using large samples and, where possible, the original materials. When they tallied the results, only about 39 clearly replicated, and even the effects that did repeat were, on average, half the original size.
The culprits had names. P-hacking — slicing the data until something crosses the significance line. Publication bias, the ‘file drawer’ problem, where null results never see daylight and the literature skews positive. Small samples that turn noise into apparent signal. And HARKing — hypothesizing after the results are known, then dressing up a lucky pattern as a prediction.
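To see how quickly p-hacking manufactures findings, consider a minimal simulation, offered as a sketch rather than a reconstruction of any real study. It assumes two groups drawn from the same population (so there is no true effect), 20 people per group, and a researcher who measures five independent outcomes and reports whichever one crosses p < 0.05. The function names, sample sizes, and the simple z-test with known variance are illustrative assumptions, not anything taken from the Reproducibility Project.

```python
import random
from statistics import NormalDist, mean


def p_value_null_study(n, rng):
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1) population."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # The population variance is known to be 1 here, so the standard error
    # of the difference in means is sqrt(1/n + 1/n).
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(outcomes_tried, n=20, studies=2000, alpha=0.05, seed=0):
    """Share of no-effect studies that still report a 'significant' result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # A p-hacked study counts as a success if ANY outcome is significant.
        if any(p_value_null_study(n, rng) < alpha for _ in range(outcomes_tried)):
            hits += 1
    return hits / studies


honest = false_positive_rate(outcomes_tried=1)
hacked = false_positive_rate(outcomes_tried=5)
print(f"one planned outcome:  {honest:.1%} false positives")
print(f"best of five outcomes: {hacked:.1%} false positives")
```

With a single planned outcome, the false-positive rate should land near the nominal 5 percent. With five independent tries it should land near 1 - 0.95^5, or roughly 23 percent, even though nothing real is there to find. Real outcome measures are usually correlated, which would soften the inflation somewhat.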
The shortfall didn’t prove the original studies were false. An effect can be genuine but smaller than first reported, or real only in a particular context; replication is a measurement, not a verdict. The rot also ran wider than psychology, surfacing in medicine, economics, and cancer biology.
The shock fueled a reform movement that is now reshaping how research is done across fields: pre-registration, which locks in hypotheses before the data arrive; Registered Reports, in which journals peer-review the method before results exist; open-data and open-materials badges; and large multi-lab efforts like the Many Labs project.



