Part 1: Top 5 Reasons Correlation Does Not Imply Causation

Correlation Does Not Imply Causation, So What Does?

This video is part of the Correlation Does Not Imply Causation, So What Does series, presented by Jim Colton, Lead Statistical Consultant at GraphPad.


You know, this topic has been around for as long as I've been in statistics. I've come across this, and people don't go into a lot of detail. Everyone seems to know about correlation does not imply causation, but it usually stops there, and maybe there's one example. But I put this together. I really wanted to take all of the thoughts I had over the years as a statistician on this topic and kind of put it down on paper. So this is my take on all the important aspects of this conversation about correlation does not imply causation. And I'm going to talk about what does imply causation, towards the end of the talk.

Before I get into the main talk though, I do want to clarify one thing. I'm going to use a definition of correlation that's different from the classic definition, because in the conversations that people have about correlation and causation, they typically are not referring to the strict definition of correlation: a value between -1 and 1 that represents the strength of the linear relationship between two numeric variables. That's the classic definition.

I'm going to use the English language definition, which is just a mutual relationship or connection between two or more things, because a lot of the things I'm going to correlate together are not numeric. And if you've been involved with statistics, it's going to sound a little strange, and you're going to say, "Wait a second, how can you correlate two things that are not even numeric?" But the English language definition allows us to, and that's what people usually use when they talk about this topic of correlation and causation.

All right, with that out of the way, I want to go through five reasons for a strong correlation. A lot of times researchers are trying to prove A causes B. Unfortunately, that's only one of the five reasons for a high correlation, and the other four are gotcha moments: fallacies in being able to say it's a causal effect, because ...

Well, the first one is a lurking or hidden variable. This is actually a very common one in medical data, and I'm going to focus a lot on medical examples. It means having a third variable that causes both A and B to move. So the only cause and effect is between A and some third variable, and between B and that third variable, but A and B happen to move together, and it makes it look like A and B are in a causal relationship. And I'll give you examples of all of these, plenty of examples, in a minute. This is just an overview.

Another one is a coincidence. Two things trend together over time. This isn't so common in medical datasets, but there are a lot of examples of this. I'll give a few fun ones about global warming, as temperatures increase over time, or decrease; in most cases, increase. Anything else that trends over time is actually correlated with that.

So another one is B causes A. If you're trying to show A causes B, a lot of times, well, it's B causing A, but there's a high correlation, and you may be tempted to think it's A causing B.

A variation of that is you get a very high correlation and there is a cause and effect, but it's a weak cause and effect. It looks very strong, though, because you have A causing B and B causing A. And I know this is kind of vague right now, but with some examples, I think you'll see what that one's about.

The fifth reason for a strong correlation is, well, A causes B. And let's say that's what we're trying to show. So sometimes what you want to prove is actually the case, but in the other four cases, you have something else causing the high correlation that's not A causing B.

One of the things I'm doing here is giving you some practical, simple examples that everyone can understand. So you can share this with other people, not necessarily in statistics. Going to bed with shoes on is actually correlated to getting a headache the next day. That is a true correlation. Some people are laughing. Anyone know why? Maybe you've heard this one before. 

Hangover. Number of drinks the night before. So this is a lurking or hidden variable. The third variable is the number of alcoholic drinks. In each of these cases, there's going to be some other variable causing both A and B to move.
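The hangover example can be sketched as a small simulation. This is a minimal illustration with made-up probabilities, not data from the talk: the names `simulate_nights` and `pearson`, and the rule that each drink adds 10 percentage points to both outcomes, are assumptions chosen for clarity. Shoes and headaches never influence each other in the code, yet they correlate until the number of drinks is held fixed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_nights(n=20000, seed=0):
    """Simulate nights where drinks drive both shoes-in-bed and headaches."""
    rng = random.Random(seed)
    drinks, shoes, headache = [], [], []
    for _ in range(n):
        d = rng.randint(0, 8)  # hidden variable: drinks the night before
        # Assumed rule: each outcome has probability d / 10 and depends only on d.
        shoes.append(1 if rng.random() < d / 10 else 0)
        headache.append(1 if rng.random() < d / 10 else 0)
        drinks.append(d)
    return drinks, shoes, headache

drinks, shoes, headache = simulate_nights()

# Correlation across all nights: positive, driven entirely by drinks.
overall = pearson(shoes, headache)

# Hold the hidden variable fixed: keep only nights with exactly 4 drinks.
same = [(s, h) for d, s, h in zip(drinks, shoes, headache) if d == 4]
within = pearson([s for s, _ in same], [h for _, h in same])

print(f"all nights:          r = {overall:.2f}")
print(f"4-drink nights only: r = {within:.2f}")
```

Comparing the two printed values shows the pattern: a clear positive correlation overall, and a value near zero once the lurking variable is controlled for.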

This one's a little harder to figure out. Let's see if anyone can get this one. In early elementary school, astrological sign ... let's say first grade, for example. Astrological sign is correlated with how the first graders would do on an IQ test. But this correlation disappears by adulthood. So it's truly a correlation that gets weaker over time. As you get to the 8th and 9th and 12th grade, you won't see that correlation as much anymore.

What you have is ... I think, at least in California, if you're born in September, you're going to be one of the oldest students in the class. And age matters when you're in first grade; it doesn't matter as much as you go through the years. So that third variable, age in months, is hidden or lurking. And we would have to account for that to properly analyze this data, if we had this data.

So a third one I'll give, and then I'll get on to sort of a medical example. The size of your hand negatively correlates with how long you live. So if you have a larger hand, you're not going to live as long as if you have a smaller hand. This one might be easier than the last, but it's maybe not so obvious. It's males and females. Females have smaller hands, and they live longer than men, who have bigger hands. You could substitute in height in this example, too, and work that in. That's a good one.

Okay, so those are some fun examples. Like I say, I put these out there just to make it easy to remember what this issue with the lurking or hidden variable is about. Let me go through a medical example.

This is, I don't know, quite a few years ago. This was kind of a big topic, I think, on hormone replacement therapy for women. It actually correlated pretty strongly with lower-than-average rates of heart disease. And so researchers were pretty optimistic that, "Hey, maybe there's something really good about this hormone replacement therapy." And it turns out that people on this therapy, and maybe it's still true today, but back 10, 20 years ago, tended to be of higher economic status. Probably insurance didn't cover it and they'd pay for it out of their own pocket, or you needed a great healthcare plan to cover this, so they tended to have better diet and better exercise. And it turns out that actually the opposite was true. When they ran a randomized trial, they found that hormone replacement therapy actually increased ... There was a significant increase in heart attacks, strokes, breast cancer and blood clots.

So it shows you how dangerous correlations can be when you actually have a reversal. You thought it was a good thing, and it turns out it's a bad thing. So we'll talk later, a lot about randomized clinical trials and how valuable they are. That's how you get to really cause and effect.
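A rough simulation can show how this kind of reversal happens. It is a sketch under invented assumptions, not a model of the actual hormone replacement studies: the function name `event_rate_gap`, the hidden `healthy_lifestyle` flag, and every probability below are hypothetical. The therapy is coded to raise risk by 5 percentage points, but in the observational setting healthier people are more likely to take it.

```python
import random

def event_rate_gap(randomized, n=100000, seed=1):
    """Return (event rate among treated) minus (event rate among untreated)."""
    rng = random.Random(seed)
    treated_n = treated_events = 0
    untreated_n = untreated_events = 0
    for _ in range(n):
        healthy_lifestyle = rng.random() < 0.5  # hidden variable
        if randomized:
            # Randomized trial: a coin flip that ignores lifestyle.
            treated = rng.random() < 0.5
        else:
            # Observational data: healthier people take the therapy more often.
            treated = rng.random() < (0.8 if healthy_lifestyle else 0.2)
        risk = 0.10 if healthy_lifestyle else 0.30  # assumed baseline risk
        if treated:
            risk += 0.05  # assumed true effect: the therapy raises risk
        event = rng.random() < risk
        if treated:
            treated_n += 1
            treated_events += event
        else:
            untreated_n += 1
            untreated_events += event
    return treated_events / treated_n - untreated_events / untreated_n

observational_gap = event_rate_gap(randomized=False)
randomized_gap = event_rate_gap(randomized=True)

print(f"observational: treated minus untreated = {observational_gap:+.3f}")
print(f"randomized:    treated minus untreated = {randomized_gap:+.3f}")
```

With these assumed numbers the observational gap comes out negative (the therapy looks protective) while the randomized gap comes out positive, close to the 5-point harm built into the simulation. Randomization works here because the coin flip cuts the link between lifestyle and treatment.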

All right, coincidence. This one's a little bit shorter to go through. Not as medically related, but ... So if you want to stop global warming, become a pirate. What this is about: if you look at the number of pirates, it's been decreasing over time since probably the early 1800s. And the ocean temperature has been increasing. So of course both are just trending over time. Anything that trends over time is going to be correlated with ocean temperatures, either positively or negatively. And that's just a coincidence. There's nothing that would be cause and effect about that.
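The pirate example can be imitated with two independent, made-up series. The numbers below are invented for illustration and are not real pirate counts or ocean temperatures; `pearson` and `year_over_year` are helper names introduced here. One common way to check for a trend-only correlation, used in this sketch, is to correlate the year-over-year changes instead of the raw levels.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def year_over_year(series):
    """Differences between consecutive values, which removes a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(2)
years = range(1820, 2020)

# Two series generated independently: one trends down, one trends up.
pirates = [5000 - 20 * (y - 1820) + rng.gauss(0, 150) for y in years]
temperature = [15.0 + 0.01 * (y - 1820) + rng.gauss(0, 0.2) for y in years]

level_corr = pearson(pirates, temperature)
change_corr = pearson(year_over_year(pirates), year_over_year(temperature))

print(f"correlation of levels:  {level_corr:.2f}")
print(f"correlation of changes: {change_corr:.2f}")
```

The levels are strongly negatively correlated only because both series move steadily with time. Once the trend is differenced away, the correlation should sit close to zero, which is what independence predicts.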

So another example. I saw this on the internet, I guess this is from the Organic Trade Association: if you look at autism, it's increasing over time, and organic food sales are increasing over time. But, clearly, I'm pretty sure there's no relationship in there, nothing causal. It just shows you how you can't look at two things that both increase over time and say one caused the other. So that's kind of the implication here.

And B causes A was the third one. So the more firefighters sent to a fire, the more damage is done. So what do we do, send fewer firefighters? Well, it's not the firefighters that are causing more damage. It's the more damage that's causing more firefighters to come.

I thought this one was funny. By the way, I didn't invent any of these. I did a lot of research to find good ones, but these are all ones I got from the internet. And so passengers who are in a rush at the airport tend to be late. So maybe if you don't rush you won't be late. Well it's clearly being late that's causing them to rush, and not the other way around.

So here's one that's a little bit more serious. And I think this is still a debate that's going on. People have noticed a correlation between drinking diet soda and obesity. People who drink diet soda tend to be overweight, but the soda companies clearly are claiming that it's people who are overweight, and are maybe trying to lose weight or not gain as much, who are turning to diet soda. So they're claiming it's B causes A. A lot of these things can't get resolved that easily, and there have to be a lot of lawyers, I think, probably involved in this, but that's at least the argument.

And in general ... Here's a more general take on this topic, using a medical scenario. A lot of times, the body reacts to a disease that you have, and so there's a correlation between something in your body, some protein level or some change in your body, and, say, a tumor size. There might be some correlation there. And so you might be tempted to think that the thing that's changing in your body is actually causing the disease, when in reality, it's the disease causing your body to change. And it's hard to tell, with just observational data, whether it is a cause and effect relationship or not.

But there are a lot of studies like this one here that show a correlation between disease severity and, in this case, urine PLIN2 in your body. And all the articles I found did a really good job of saying that this is a correlation and it's not cause and effect. This one pointed that out, and I found, just in a quick search, four other ones that all talk about a correlation between some overexpression of a protein and severity of disease. And they were all saying this is just a correlation, it's not cause and effect. So no mistakes were made by these articles, but it's something to be careful of in your research: that you don't just assume you're seeing a cause and effect relationship in one direction. It could be the other.

This one, B causes A, and A causes B. Couple examples. This is probably not as common, but drug use. A lot of people feel that people who abuse drugs tend to ... It may cause them to have more psychiatric disorders. And also, people who have psychiatric disorders tend to abuse drugs. So there may be a cause and effect in both directions. There's probably not a strong relationship in either direction, but because there's a small cause and effect in both directions, the correlation is really high. So with these examples, you're getting a really high correlation but there's a very weak cause and effect.

Another one like that: there was a study done about people who cycle and body mass index. If you just look at raw data and correlations between people's body mass index and whether or not they cycle, there's a very strong correlation. But a lot of that ... There's probably a small effect of cycling giving people a better body mass index. But really, a lot of that correlation is that people who are overweight tend not to cycle. So it goes both ways, and you're probably seeing a very high correlation when there's actually a very weak cause and effect. So be aware of that; it can happen.
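How much two-way causation can inflate a correlation is easy to see in a toy linear model. This is a sketch under strong assumptions that are not from the talk: A and B are standardized numeric quantities, each depends linearly on the other plus independent noise, and the pair has settled into equilibrium. The function `feedback_correlation` and the effect size 0.3 are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def feedback_correlation(a_to_b, b_to_a, n=50000, seed=3):
    """Correlation of A and B when A = b_to_a*B + noise and B = a_to_b*A + noise."""
    rng = random.Random(seed)
    a_values, b_values = [], []
    denom = 1 - a_to_b * b_to_a  # must be nonzero for a stable solution
    for _ in range(n):
        noise_a, noise_b = rng.gauss(0, 1), rng.gauss(0, 1)
        # Closed-form solution of the two simultaneous equations.
        a_values.append((noise_a + b_to_a * noise_b) / denom)
        b_values.append((noise_b + a_to_b * noise_a) / denom)
    return pearson(a_values, b_values)

one_way = feedback_correlation(a_to_b=0.3, b_to_a=0.0)
two_way = feedback_correlation(a_to_b=0.3, b_to_a=0.3)

print(f"A causes B only:         r = {one_way:.2f}")
print(f"A causes B, B causes A:  r = {two_way:.2f}")
```

In this model the theoretical correlations are 0.3 / sqrt(1.09), about 0.29, for the one-way case and 0.6 / 1.09, about 0.55, for the two-way case. Each individual effect is just as weak in both runs, but the correlation roughly doubles when the effects run in both directions.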

And then finally there's A causes B, which I won't spend much time on, because this is not really a fallacy. Your alarm clock doesn't go off, and that causes you to be late for work. If you exercise, you'll burn calories. The amount of light in a room causes your pupil size to change. Smoking and lung cancer, which is interesting because there is not really a randomized trial, but there is a lot of science behind that. So early on, lawyers were arguing, "Hey, that's just a correlation. It's not cause and effect." But as more research has been done, there's good science behind it being a cause and effect.
