What Is Pearson Correlation?
Pearson's correlation coefficient (r) quantifies the strength and direction of the linear relationship between two continuous variables. It ranges from -1 (perfect negative linear correlation) through 0 (no linear correlation) to +1 (perfect positive linear correlation). A value of 0 does not rule out a nonlinear relationship.
Developed by Karl Pearson in the 1890s, this statistic is one of the most widely used measures of association in science, medicine, psychology, and business analytics. It is particularly useful when both variables are normally distributed and their relationship is expected to be linear.
Formula
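For n paired observations, the sample coefficient is commonly written as below. The symbols are notation assumed here rather than taken from the text above: x_i and y_i are the paired values, and x̄ and ȳ are their sample means.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator measures how the two variables vary together, and the denominator rescales it by the spread of each variable, which is why r always falls between -1 and +1.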
Interpreting r Values
| Absolute value of r | Strength | Example |
|---|---|---|
| 0.00 - 0.19 | Very weak | Shoe size vs. IQ |
| 0.20 - 0.39 | Weak | Income vs. happiness |
| 0.40 - 0.59 | Moderate | Study hours vs. GPA |
| 0.60 - 0.79 | Strong | Height vs. weight |
| 0.80 - 1.00 | Very strong | Temperature C vs. F |
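
The bands in the table above can be turned into a small lookup. This is a minimal sketch: the function name `strength_label` is hypothetical, and the cut-offs are simply the table's conventions, not a universal standard.

```python
def strength_label(r):
    """Map a Pearson r to the strength label used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign; the sign gives direction
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


for value in (0.1, -0.45, 0.8):
    direction = "negative" if value < 0 else "positive"
    print(f"r = {value:+.2f}: {strength_label(value)}, {direction}")
```

Using half-open bands (for example, below 0.20) avoids the small gaps between the two-decimal ranges shown in the table.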
Assumptions
- Both variables should be continuous and measured on interval or ratio scales.
- The relationship between variables should be approximately linear.
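- Both variables should be roughly normally distributed. This matters mainly for significance tests and confidence intervals on r, not for computing r itself.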
- Data should be free from significant outliers that could distort the result.
- Observations should be independent of each other.
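
To see why the outlier assumption matters, the sketch below computes r from the deviations about the means, then repeats the calculation after adding one extreme point. The data are made up for illustration, and `pearson_r` is a hypothetical helper, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data with no linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 7, 4, 6, 3]
r_clean = pearson_r(x, y)

# The same data plus a single point far from the rest.
r_outlier = pearson_r(x + [30], y + [30])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

With these particular numbers, r moves from 0 to roughly 0.95 because of one added point, so it is worth plotting the data before trusting the coefficient.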
Frequently Asked Questions
Does correlation imply causation?
No. A high Pearson r only indicates that two variables move together linearly. It does not prove that one causes the other. Confounding variables, reverse causation, or coincidence may explain the observed correlation.
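
A small simulation can illustrate the confounding case. In this sketch, a hidden variable z drives both x and y, which have no direct effect on each other; all names, noise levels, and the seed are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson r (hypothetical helper, not a library call)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]      # hidden confounder
x = [zi + rng.gauss(0, 0.5) for zi in z]     # x depends only on z plus noise
y = [zi + rng.gauss(0, 0.5) for zi in z]     # y depends only on z plus noise

r_xy = pearson_r(x, y)

# Subtract the confounder's contribution and correlate what is left.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(f"r(x, y) with confounder present: {r_xy:.2f}")
print(f"r after removing z:              {r_rest:.2f}")
```

Here x and y are strongly correlated even though neither influences the other, and the correlation largely disappears once z is accounted for.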
What is R-squared?
R-squared (r²) is the coefficient of determination. For two variables, it represents the proportion of variance in one variable that is explained by its linear relationship with the other. For example, r = 0.8 gives r² = 0.64, meaning 64% of the variance is shared.
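
The "proportion of variance explained" reading can be checked numerically. The sketch below, on made-up data, fits a least-squares line and compares r² with 1 minus the ratio of residual to total variation; the variable names are illustrative assumptions.

```python
import math

# Made-up, roughly linear data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mx = sum(x) / len(x)
my = sum(y) / len(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)  # total variation in y

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Ordinary least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - ss_res / syy  # share of variation the line accounts for

print(f"r^2 from correlation: {r_squared:.6f}")
print(f"1 - SS_res / SS_tot:  {explained:.6f}")
```

The two printed values agree up to floating-point rounding, which holds for a straight-line fit with one predictor.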
When should I use Spearman instead?
Use Spearman's rank correlation when your data is ordinal, contains outliers, or the relationship is monotonic but not necessarily linear. Pearson assumes linearity and continuous data.
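
The difference shows up on data that is monotonic but curved. The sketch below computes Spearman's coefficient as Pearson's r applied to ranks; the `ranks` helper is hypothetical and assumes there are no tied values (ties would need average ranks).

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson r (hypothetical helper)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Monotonic but nonlinear: y grows as the cube of x.
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]

pearson = pearson_r(x, y)
spearman = pearson_r(ranks(x), ranks(y))

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman reaches 1 because the ordering of the two variables matches perfectly, while Pearson falls short of 1 because the points do not lie on a straight line.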