Where I am, I get news stories every day about some new medical “breakthrough” or discovery. It might be on a television news program, in the newspaper, or in a magazine. And these are what are called legitimate news sources! Online there are always posts about some amazing trick doctors don’t want you to know (just why they don’t want you to know is never really explained), or some amazing new diet breakthrough that lets you eat as much as you want of anything you like and still lose weight. If you actually believe any of that, I would like to interest you in a bridge I have for sale. But even the legitimate news sources have a problem, which is that they have a “news hole” to fill every day, and stories about health and medicine are popular. The trouble is that making these stories sound exciting almost always means at the very least overstating the results, and may mean hyping a result that does not exist at all.
To give you some idea of how bad this is, I will reference an article from JournalistsResource.org called “Covering health research? Choose your studies (and words) wisely”. This article is very enlightening if you have never looked closely at this issue. In it, they cover the results of a study by Noah Haber et al., published in PLOS|One.
You can go to the original PLOS|One paper if you like, but if you are not used to reading academic papers I think the JournalistsResource.org article is much more accessible.
Haber et al. enlisted 21 reviewers, all of whom had at least a Master’s degree, and a majority of whom had enrolled in or completed a doctoral program. The reviewers looked at 64 articles that were among the most shared on Facebook and Twitter, and then at the 50 studies that were the basis for those stories.
The first issue they dealt with was causality. To say in a scientific study that “A causes B” requires some pretty strict, high-quality evidence. You may have heard the truism that correlation is not causation, and it is true. For example, a study of food and drug use could easily show a correlation between drinking milk as a child and opioid addiction later in life; after all, the addicts all consumed milk as children. That does not mean milk causes addiction, and no reputable scientific study would claim that. (There is an opposite error, which is when someone uses “correlation is not causation” to dismiss any evidence they don’t like. This is an error because every causal relationship starts with a correlation, by definition.)
In Haber’s study, the reviewers felt that the claims in many papers were stronger than the evidence really supported; they said that a third of the papers they reviewed made claims the data could not support. And often the language used is a bit weasel-worded, like saying that there is an “association” between A and B, which is just another term for a correlation that may or may not mean anything. It is not technically wrong to say that, but what happens when a journalist gets that paper? Will the journalist be as careful as they should be? In many cases, no. But they also warn against dismissing all “associational” studies. Now, the article in JournalistsResource.org is aimed at journalists and wants to encourage them to choose better studies, so it says to check with the author of a paper and ask them straight out whether what they found is causal. For you and me that might not be an option (though there is no law against it as far as I know), but it is one way to get a handle on something.
Peer Review
Next, you want to consider the peer review process. The best quality research goes through peer review before it is published, which means that scientists in the field have read the paper, examined the methods employed, and looked at the conclusions to determine whether the appropriate standards have been met. In many cases, the reviewers will raise questions, or even suggest additional work be done before a paper meets the standards for publishing. This is certainly how the major journals operate, but lately there has been a push to open things up. Many of the major journals are very expensive, and can delay publication by a year or more. In the age of the Internet that is seen as unnecessary and a bit elitist, so many researchers have taken to publishing their papers online. PLOS|One, in fact, is an online journal that incorporates peer review, and it is part of a family of such journals (the Public Library of Science) that focus on biology, medicine, and the life sciences.
In other sciences, there is something called ArXiv (pronounced “archive”), which focuses on what are called “pre-print” articles that may later be published in a traditional journal, though I think ArXiv is starting to have some status of its own. Articles there are moderated, but they do not go through formal peer review.
Statistical Significance
Medical and biological statistics is a complex field. People get PhDs in this stuff, and it is regarded as one of the more difficult degrees to get. And I am not one of the people who has done this. I have, however, taught statistics at the university level, so I feel I can offer some basic guidance here. If a study is well done, there will be a test of significance that determines whether or not you have a real result. Generally, the way you should proceed is to state a hypothesis up front (e.g., eating breakfast will raise a child’s grades). Then you gather the data that can test this hypothesis. Ideally, you have a study with a study group (children getting breakfast) and a control group (children who do not get breakfast), and right away you see how tricky this is. Who on earth is going to make a bunch of children go without breakfast? I can just picture the politicians holding hearings on that one!
But, once the data is gathered, you do a test. The way this is done may be a little counter-intuitive, but it works like this:
- Your hypothesis is that eating breakfast results in better grades.
- You therefore have what is called the Null Hypothesis, which is that eating breakfast does *not* improve grades.
- You employ a statistical test; in this case, let’s suppose it is a t-test. You also choose a level of significance, and in the vast majority of cases that will be .05.
- You compute a test statistic from the data, and compare that to your table of t-statistics.
- You may well have data that looks like eating breakfast improves grades, but you want to guard against random chance, so if the probability of getting a result at least that strong purely by chance, from a population where breakfast has no effect, is .05 or more, you “fail to reject the null hypothesis”. In other words, you did not find a statistically significant improvement in student grades. (A small code sketch of this procedure follows the list.)
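Just to make the procedure concrete, here is a minimal sketch of how that breakfast-and-grades t-test might look in code. This assumes you have Python with the SciPy library installed, and the grade numbers are invented purely for illustration, not real data.

```python
# Minimal sketch of the two-sample t-test described above.
# Hypothetical data: test scores (out of 100) for each group.
from scipy import stats

breakfast_group = [78, 85, 92, 74, 88, 81, 79, 90, 86, 77]
control_group   = [72, 80, 75, 70, 84, 78, 69, 82, 76, 73]

alpha = 0.05  # significance level, chosen before looking at the data

# Is the difference in mean grades bigger than chance alone would explain?
t_stat, p_value = stats.ttest_ind(breakfast_group, control_group)

print(f"t statistic = {t_stat:.2f}, p-value = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis: no significant difference found.")
```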
Now, there are some interesting consequences of this. First, by definition, a certain percentage of the time you will reach the wrong conclusion. Statisticians refer to these as Type I and Type II errors, but you may know them better as false positives and false negatives. In this case, we assumed a t-test with a significance level of .05: we fail to reject the null hypothesis if we get a p-value (probability) over .05, or 5%. But that cuts both ways. When there really is no effect at all, pure chance will still hand us a “significant” result about 5% of the time, or one time out of twenty, even if the research was done 100% properly by good researchers who made no mistakes.
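If you want to see that for yourself, here is a small simulation sketch (again assuming Python with NumPy and SciPy; the population numbers are made up). Both groups are drawn from the same population, so the null hypothesis is true by construction, yet roughly one test in twenty still comes out “significant”.

```python
# Simulate the Type I (false positive) error rate of a t-test at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
trials = 10_000
false_positives = 0

for _ in range(trials):
    # Both groups come from the same normal distribution: no real effect exists.
    group_a = rng.normal(loc=75, scale=10, size=30)
    group_b = rng.normal(loc=75, scale=10, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives / trials:.1%}")
# Expect something close to 5%, i.e., about one experiment in twenty.
```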
Now, the proper conclusion to all of this is *not*, as some might have it, that nobody knows anything, so do whatever you feel like. We have made great strides in medicine in the last few decades; many diseases that were once automatic death sentences (such as many forms of cancer) can now be managed or even cured. We do have a big problem, though, with misplaced cynicism and distrust that leads to insane ideas like the one that vaccinations are a bad thing. The only way to reliably avoid such things is to ground our thinking in science, but that in turn means understanding how science works and how we should interpret the results we get.
Listen to the audio version of this post on Hacker Public Radio!