So I’ve finally got around to reading John Ioannidis’ article in PLoS Medicine, “Why Most Published Research Findings Are False”. I’m a little surprised to see it hinges on a concept I was taught scant weeks ago: positive predictive value. The article might be gibberish to somebody without a background in biochemistry, statistics or epidemiology, but I think it’s really important for anybody with an interest in science, particularly health science / fitness / nutrition, to understand. For simplicity I’m not going to address bias or effect sizes, but Ioannidis explains their effects well nearer the bottom of the article.
When studies are being designed, researchers accept that there is a possibility they will interpret their data to mean their hypothesis is true when it’s actually false (a false positive), or false when it’s actually true (a false negative). The standard in health research is to design studies so that the probability of a false negative is 10% or 20%, and that of a false positive is less than 5% (and ideally less than 1% – these are the biologist’s beloved p<0.05 and p<0.01). Lowering the probability of one of these errors increases the likelihood of the other for reasons which are slightly too fiddly to explain here, but keeping false positives below 5% is always regarded as more important. Consequently, a great many studies in underfunded fields like exercise physiology are designed such that far more than 20% of true effects will be missed as false negatives, because lowering that proportion requires more participants than funding can be obtained for. By extension, a great number of true positives are “lost”.
Let’s say I’m going to spend a long and illustrious career in epidemiology. Let’s say I’m going to do 1,000 studies to see whether individual things make you healthier, but that only 10% of those things really work. Particular vitamin supplements, for instance. Let’s say I always have enough funding to keep my chances of false negatives to 20%, and false positives to <5%.
So, I have 1,000 supplements, 900 of which are useless, and 100 of which work. Of the 900 useless ones, I will incorrectly believe that 45 are useful (5% false positives). From the 100 useful ones, I will get 20 false negatives (I’ll “lose” 20% of my true positives, and retain 80 of them).
So although only 100 of my hypotheses are really true, I will publish articles saying that 125 are true. Of these 125 positive results, 45 are false positives, so only 80 of my 125 published articles (64%) are correct.
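That arithmetic is just the positive predictive value formula in disguise. Here’s a quick Python sketch of it (the `ppv` helper and its names are mine, not anything from the article):

```python
def ppv(prior, power, alpha):
    """Positive predictive value: the share of published positive
    findings that are actually true."""
    true_positives = prior * power          # true hypotheses I correctly detect
    false_positives = (1 - prior) * alpha   # false hypotheses that sneak through
    return true_positives / (true_positives + false_positives)

# 10% of hypotheses true, 20% false negatives (80% power), 5% false positives
print(round(ppv(prior=0.10, power=0.80, alpha=0.05), 2))  # prints 0.64
```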
Now, assuming 10% of my hypotheses are correct is actually quite optimistic, as is assuming I would always have funding to keep my false negative rate to 20%. If I were working in exercise physiology, that rate might be more like 50% – I just wouldn’t have the funding to do better. So now I have just 50 true positives against 45 false positives – I’ll publish 95 articles, and almost half of them will be incorrect.
Imagine again that only 1% of my hypotheses are correct. Maybe I’m trying to identify which of 1,000 bacteria in our guts contribute to a certain outcome – gastric cancer, for instance. I have to test all of them, but only ten of them are really to blame. My lab facilities are good and we’ve got some cash, so I can keep my false negatives down to 10%. Things are looking up! But wait. Even now, one of my little guys will slip under the radar (a false negative), and 50 innocent bugs will be falsely accused (5% of the 990 harmless ones)! I will publish articles claiming 59 species cause gastric cancer – and a mere 15% of my publications will be correct.
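Running the bacteria numbers through the same sort of check (again my own sketch, with the rates from the paragraph above assumed):

```python
# Assumed: 10 true culprits among 1,000 bacteria, 10% false negatives, 5% false positives
true_pos = 0.01 * 0.90          # 9 correctly accused species per 1,000
false_pos = (1 - 0.01) * 0.05   # ~49.5 falsely accused species per 1,000
print(round(true_pos / (true_pos + false_pos), 2))  # prints 0.15
```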
Now imagine I was looking for the one gene in ten thousand which caused a particular disease. We are in big trouble here.
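To put a number on “big trouble” – keeping the same decent lab (10% false negatives, 5% false positives) but a prior of one true gene in ten thousand, the sums look like this (my extrapolation, not a figure from the article):

```python
# Assumed: 1 true gene in 10,000 candidates, 10% false negatives, 5% false positives
true_pos = 0.0001 * 0.90          # true genes correctly flagged
false_pos = (1 - 0.0001) * 0.05   # innocent genes wrongly flagged
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 4))  # prints 0.0018
```

In other words, fewer than 2 in 1,000 of my positive findings would be real.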