Table of Contents
What Is a Residual?
A residual is the difference between an observed value and its predicted (fitted) value from a statistical model. In regression analysis, residuals indicate how well the model fits individual data points. A positive residual means the model underestimated the actual value, while a negative residual means the model overestimated it.
Residuals are fundamental to regression diagnostics. By examining the pattern of residuals, statisticians can assess whether the assumptions of a regression model (linearity, constant variance, normality of errors) are reasonable. Systematic patterns in residual plots indicate model inadequacy and suggest that a different model form may be needed.
Residual Formula
Types of Residuals
| Type | Formula | Use |
|---|---|---|
| Raw Residual | y - ŷ | Basic error measurement |
| Standardized | (y - ŷ) / s | Outlier detection |
| Studentized | (y - ŷ) / (s × √(1-h)) | Influence analysis |
| Pearson | (y - ŷ) / √ŷ | GLM diagnostics |
Residual Analysis
- Random scatter: Residuals should show no systematic pattern when plotted against predicted values. Random scatter confirms the model assumptions are met.
- Normality: Residuals should be approximately normally distributed, which can be checked using a Q-Q plot or the Shapiro-Wilk test.
- Homoscedasticity: The variance of residuals should be constant across all levels of the predictor variable. A funnel-shaped pattern indicates heteroscedasticity.
- Independence: Residuals should not be correlated with each other, which is particularly important in time series data.
Frequently Asked Questions
What does a negative residual mean?
A negative residual means the observed value is less than the predicted value. The model overestimated the actual outcome. In a scatterplot, this data point falls below the regression line.
Should residuals sum to zero?
In ordinary least squares (OLS) regression with an intercept term, the sum of residuals always equals zero. This is a mathematical property of the least squares solution, not something that needs to be verified.
How do I identify outliers using residuals?
Standardized residuals with absolute values greater than 2 are potential outliers, and those greater than 3 are strong outliers. However, context matters, and statistical tests like Cook's distance provide more robust outlier detection.