Taken together, headlines like these might suggest that science is a shady enterprise that spits out a bunch of dressed-up nonsense. But I’ve spent months investigating the problems hounding science, and I’ve learned that the headline-grabbing cases of misconduct and fraud are mere distractions. The state of our science is strong, but it’s plagued by a universal problem: Science is hard — really fucking hard.
If we’re going to rely on science as a means for reaching the truth — and it’s still the best tool we have — it’s important that we understand and respect just how difficult it is to get a rigorous result. I could pontificate about all the reasons why science is arduous, but instead I’m going to let you experience one of them for yourself. Welcome to the wild world of p-hacking.
A very important piece at 538.com on p-values and the likely prevalence of p-hacking.
The p-value reveals almost nothing about the strength of the evidence, yet a p-value of 0.05 has become the ticket to get into many journals. “The dominant method used [to evaluate evidence] is the p-value,” said Michael Evans, a statistician at the University of Toronto, “and the p-value is well known not to work very well.”
But that doesn’t mean researchers are a bunch of hucksters, a la LaCour. What it means is that they’re human. P-hacking and similar types of manipulations often arise from human biases. “You can do it in unconscious ways — I’ve done it in unconscious ways,” Simonsohn said. “You really believe your hypothesis and you get the data and there’s ambiguity about how to analyze it.” When the first analysis you try doesn’t spit out the result you want, you keep trying until you find one that does. (And if that doesn’t work, you can always fall back on HARKing — hypothesizing after the results are known.)
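To see why "keep trying until you find one that does" is so corrosive, here is a minimal simulation sketch. It is not the 538 widget or anyone's actual method. It assumes each study measures several independent outcomes with no true effect at all, runs a simple z-test on each (assuming unit variance), and counts the study as a "finding" if any outcome comes in under p < 0.05. The function names and parameters (`two_sided_p_value`, `false_positive_rate`, `num_outcomes`) are made up for illustration.

```python
import math
import random


def two_sided_p_value(sample_a, sample_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both samples come from normal distributions with variance 1,
    which holds for the simulated data below but not for real data.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(num_outcomes, num_studies=1000, n=30, alpha=0.05, seed=0):
    """Share of null studies that report at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        p_values = []
        for _ in range(num_outcomes):
            # Both groups are drawn from the same distribution: no real effect.
            group_a = [rng.gauss(0, 1) for _ in range(n)]
            group_b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sided_p_value(group_a, group_b))
        # The p-hacker reports whichever outcome looks best.
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, "outcomes tried:", false_positive_rate(k))
```

Under the independence assumption, the chance of at least one false positive among k tries is 1 - 0.95^k: about 23% for 5 tries and about 64% for 20. Real analyses of the same data set are correlated, so the actual inflation will differ, but the direction is the same.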
The larger lessons apply not just to science. Journalism is hard, especially investigative journalism. You can spend months reporting a piece only to find no real striking narrative, no clear conclusions of note. And yet, if you have to fill a certain number of pages every day...
In tech, really successful and/or counterintuitive A/B test results are passed around like koans. However, anyone who has done enough A/B testing in the tech world knows that most experiments show no statistically significant results. To design a test that won't show the obvious and that will reveal some hidden truth is not easy.
All data suggests most of us should hold our unproven beliefs more loosely than we're inclined to. Who first came up with the saying “Strong opinions, weakly held” (sometimes “loosely” is substituted)? Most of us are good at the first half, not so good at the second, a dangerous combination when it turns out that truth is low yield.
Some things might help. One is a reference that collects links to all the studies that have tried to answer a particular question, along with a summary of the current state of thinking. For example, does drinking a glass of red wine a day improve your health? Why are Americans obese? Does taking a multivitamin every day really do anything for your health? What's the best exercise to improve core strength? And so on. Imagine something like the genetic offspring of Vox, Wikipedia, and Richard Feynman.
Another is something like GitHub, but for the research data from all these studies. The small experiment widget in 538's piece is a simplified example of the type of tool that might enable more people to get experience and a deeper understanding of the craft of designing studies and the slippery nature of truth. Also, the more people who can analyze a data set, the greater the likelihood that biases of different types balance each other out and that mistakes are caught. Strong hypotheses can often lead one to control for the very variable that explains a result.
The web is so sprawling, information so infinite now, we need more structured ways to traverse it intelligibly. It's no coincidence one of the words that's entered our vocabulary this past year is “explainer” (here is an explainer on the term explainer). We have so much flow, we need more stock.