What Is Correlation?
The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables. It ranges from -1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship. It is one of the most widely used statistical measures in research, analytics, and data science.
Correlation helps identify whether changes in one variable are associated with changes in another. It forms the basis for regression analysis and predictive modeling. However, it only captures linear relationships and can miss non-linear patterns entirely.
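As a rough illustration of what the coefficient captures, the sketch below computes r as the summed product of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample values are made up for this example, and only the Python standard library is assumed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]   # deviations from the mean of x
    dy = [y - mean_y for y in ys]   # deviations from the mean of y
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]                 # illustrative values only
rising = [2 * h + 1 for h in hours]     # exact increasing line
falling = [10 - 3 * h for h in hours]   # exact decreasing line

print(pearson_r(hours, rising))    # exact increasing line, r = +1
print(pearson_r(hours, falling))   # exact decreasing line, r = -1
```

Real data rarely falls on an exact line, so values strictly between -1 and +1 are the usual case.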
Pearson Formula
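For n paired observations, the standard sample definition of the coefficient can be written as below. The symbols (x_i and y_i for the observations, x̄ and ȳ for their means) are notation chosen here for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when X and Y tend to sit on the same side of their means and negative when they tend to sit on opposite sides. The denominator rescales the result so that it always falls between -1 and +1.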
Interpreting r
| \|r\| Range | Strength | Example |
|---|---|---|
| 0.9 - 1.0 | Very Strong | Height and weight |
| 0.7 - 0.9 | Strong | Study hours vs scores |
| 0.5 - 0.7 | Moderate | Income vs happiness |
| 0.3 - 0.5 | Weak | Coffee vs productivity |
| 0.0 - 0.3 | Very Weak | Shoe size vs IQ |
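The bands in the table can be turned into a small lookup, sketched below. The function name `strength_label` is hypothetical. Because adjacent bands share an endpoint (0.9 appears in two rows), the code assumes each boundary value belongs to the stronger band.

```python
def strength_label(r):
    """Map a correlation coefficient to the strength labels in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign; the sign gives direction
    # Assumption: a value on a shared boundary goes to the stronger band.
    if magnitude >= 0.9:
        return "Very Strong"
    if magnitude >= 0.7:
        return "Strong"
    if magnitude >= 0.5:
        return "Moderate"
    if magnitude >= 0.3:
        return "Weak"
    return "Very Weak"

print(strength_label(-0.75))  # Strong
print(strength_label(0.2))    # Very Weak
```

Cutoffs like these are conventions rather than fixed rules, and different fields draw the lines in different places.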
Correlation vs Causation
A strong correlation does not by itself imply causation. Ice cream sales and drowning deaths are positively correlated because both increase in hot weather, not because one causes the other. Always consider confounding variables and, where possible, use controlled experiments to establish causation.
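The confounding effect can be reproduced with a toy simulation, sketched below. Everything in it is invented for illustration: the coefficients, the noise levels, and the variable names are assumptions, and the data is simulated rather than observed. Temperature drives both outcomes, and neither outcome affects the other. The partial correlation (the standard first-order formula) then estimates what association is left once temperature is held fixed.

```python
import math
import random

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Toy model (all numbers invented): temperature influences both series,
# and the two series have no direct effect on each other.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [20 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)

# Partial correlation: association remaining after controlling for temperature.
r_it = pearson_r(ice_cream, temperature)
r_dt = pearson_r(drownings, temperature)
r_partial = (r_raw - r_it * r_dt) / math.sqrt((1 - r_it**2) * (1 - r_dt**2))

print(f"raw r = {r_raw:.2f}, partial r given temperature = {r_partial:.2f}")
```

In a setup like this the raw coefficient should come out clearly positive, while the partial coefficient should sit close to zero, since the shared dependence on temperature is the only link between the two series.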
Frequently Asked Questions
What is R-squared?
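R-squared (r²) is the proportion of the variance in Y explained by its linear relationship with X. In simple linear regression with a single predictor, it equals the square of Pearson r. An r² of 0.85 means 85% of the variability in Y is explained by X, which corresponds to |r| ≈ 0.92.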
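The link between r and r² can be checked numerically. The sketch below fits a least-squares line to a few made-up points, computes the explained share of variance as 1 - SS_res / SS_tot, and compares it with the square of Pearson r. The data values and variable names are illustrative assumptions.

```python
import math

xs = [1, 2, 3, 4, 5, 6]                   # illustrative values only
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)   # total sum of squares of Y

r = s_xy / math.sqrt(s_xx * s_yy)           # Pearson r

# Ordinary least-squares line y = intercept + slope * x
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual sum of squares: variation in Y the line does not explain
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / s_yy

print(r ** 2, r_squared)  # the two values agree up to rounding error
```

The equality holds for a straight-line fit with an intercept and one predictor. With several predictors, R-squared is no longer the square of any single pairwise r.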
When should I use Spearman instead?
Use Spearman rank correlation for ordinal data, non-normal distributions, or monotonic non-linear relationships. It measures whether ranks move together rather than raw values.
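To see the difference on a monotonic but non-linear relationship, the sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, using average ranks for ties. The helper names `average_ranks` and `spearman_rho` are hypothetical, and the exponential data is chosen purely for illustration.

```python
import math

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = list(range(10))
ys = [math.exp(x) for x in xs]   # strictly increasing, far from linear

pearson = pearson_r(xs, ys)
spearman = spearman_rho(xs, ys)
print(f"Pearson = {pearson:.2f}, Spearman = {spearman:.2f}")
```

Because the ranks of X and Y match exactly here, Spearman reports a perfect monotonic relationship, while Pearson is pulled well below 1 by the curvature.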
Can r be 0 even if data is related?
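Yes. Pearson r only measures linear relationships. A symmetric quadratic pattern (such as y = x² with x values centered on zero) or a circular pattern can give r = 0 even though the variables are clearly related. Always plot your data before interpreting correlation coefficients.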
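The quadratic case is easy to verify. In the sketch below, Y is completely determined by X through y = x², yet the positive and negative deviations cancel because the x values are symmetric around zero. The data and the function name are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

xs = [-3, -2, -1, 0, 1, 2, 3]   # symmetric around zero
ys = [x * x for x in xs]        # y depends on x exactly, but not linearly

r = pearson_r(xs, ys)
print(r)  # 0.0
```

Shifting the x values so they are no longer centered on the vertex of the parabola (for example, using 0 through 6) would produce a non-zero r, which is why the symmetry matters.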