Part 2: How to Use Correlation

Correlation Does Not Imply Causation, So What Does?

This video is part of the Correlation Does Not Imply Causation, So What Does series, presented by Jim Colton, Lead Statistical Consultant at GraphPad.

Transcript:

So what are correlations good for?

And then I'll get into how to establish cause and effect.

Because I've just torn apart correlations pretty good. And I might say some negative things going forward still, but for the most part I'm going to talk about the good side to correlations. They're good at coming up with hypotheses to formally test. In other words you see a correlation, it's not necessarily cause and effect, but you might explore it further. It's a signal that says, hey look here.

Actually, over the last I'd say maybe about 10 years, anyone who's known I'm a statistician has wondered if I've watched the movie Moneyball. I actually watched two thirds of it this past weekend for the first time, and I didn't make it all the way through. But it kind of motivated me to put in an example related to that.

And so here what I have is all the 2008 Major League batting statistics for each team: the total home runs and doubles and on base percentage. And I've got this big correlation matrix, and no one can really read anything off it individually. But the dark blue is a strong positive correlation, and the dark red is a strong negative. And one thing you can do with correlations, and I think they did this in the movie, is just look at a lot of things. I think on base percentage is what they were focusing on, and it actually is a very dark blue one.

I don't think I have my laser, let me see. Oh yeah, I do. It's this one right here. I'm sorry, OBP. So that's a pretty dark blue.

That's a very strong correlation with win-loss percentage. That's what they care about the most, the W-L. So you can look at all these variables, all of these combinations and look at the correlations and say, hey, let's pick off a few of these and study them further, okay.

We don't know that on base percentage is causing the wins or losses, but hey, it's a signal. Let's look at that signal and study it further.
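The screening idea described here can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up for six hypothetical teams (they are not the 2008 data shown in the video), and the helper names `pearson` and `screen` are invented for this sketch. It computes the correlation of each statistic with win percentage and ranks the statistics by strength, which is the "pick off a few and study them further" step.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, made-up team-level numbers (NOT the real 2008 statistics).
teams = {
    "win_pct":   [0.62, 0.55, 0.48, 0.52, 0.41, 0.58],
    "obp":       [0.350, 0.338, 0.322, 0.330, 0.310, 0.342],
    "home_runs": [180, 150, 200, 160, 170, 190],
    "doubles":   [300, 290, 285, 270, 280, 275],
}

def screen(data, target):
    """Rank every other variable by |correlation| with the target variable."""
    scores = {
        name: pearson(column, data[target])
        for name, column in data.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

ranked = screen(teams, "win_pct")
for name, r in ranked:
    print(f"{name:10s} r = {r:+.2f}")
```

The top of the ranked list is a candidate for a formal follow-up study, not a demonstrated cause of winning.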

You want to be careful though. I saw this in the USA Today newspaper about five years ago, there was a statement in there about, hey, NFL teams that pass more tend to lose, which is true. Teams who pass more tend to lose, but it doesn't mean you shouldn't pass. So think about why that is and we'll talk at the end.

So another use of correlation is for predictions. Just because it's not cause and effect doesn't mean you can't predict with it. You can predict fantastically just on correlations. For example, overexpression of ANLN, I've seen a lot of that in the literature, is correlated with a variety of cancers getting worse. The higher the ANLN, the worse the cancer, all right. And here's a graph that shows that: the blue is low amounts of ANLN, the green is high amounts, and survival is on the Y axis. So you can see the survival rate for the green line is pretty poor, dropping fast. That's, again, high expression. The blue is low expression, so the survival rate isn't dropping as fast.

I don't care if that's cause and effect or not. It's a great predictor of survival rate. So correlations can be used for prediction, even if they're not cause and effect. A lot of people don't realize that.
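The point that a non-causal correlation can still predict well can be checked with a small simulation. This is a sketch under stated assumptions, not anything from the ANLN data: a hidden factor `z` drives both `x` and `y`, and `x` never influences `y` at all. All names and noise levels are invented for illustration. A straight line fitted to `x` still predicts `y` far better than guessing the average.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate(n):
    """Hypothetical data: hidden z drives both x and y; x never affects y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                     # unobserved common cause
        xs.append(z + rng.gauss(0, 0.3))        # observed marker
        ys.append(2 * z + rng.gauss(0, 0.3))    # outcome we want to predict
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(predictions, actuals):
    """Mean squared prediction error."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

train_x, train_y = simulate(400)
test_x, test_y = simulate(100)

slope, intercept = fit_line(train_x, train_y)
model_mse = mse([slope * x + intercept for x in test_x], test_y)

# Baseline: ignore x and always predict the training average of y.
mean_y = sum(train_y) / len(train_y)
baseline_mse = mse([mean_y] * len(test_y), test_y)

print(f"error using x: {model_mse:.2f}, error ignoring x: {baseline_mse:.2f}")
```

The prediction works because `x` carries information about the hidden factor. Changing `x` directly would do nothing to `y`, which is exactly the gap between prediction and cause and effect.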
