Residual Calculator

Calculate the residual (error) between observed and predicted values. Residuals measure the vertical distance between each data point and the regression line.

What Is a Residual?
Residual Formula
Types of Residuals
Residual Analysis
Frequently Asked Questions

What Is a Residual?

A residual is the difference between an observed value and its predicted (fitted) value from a statistical model. In regression analysis, residuals indicate how well the model fits individual data points. A positive residual means the model underestimated the actual value, while a negative residual means the model overestimated it.

Residuals are fundamental to regression diagnostics. By examining the pattern of residuals, statisticians can assess whether the assumptions of a regression model (linearity, constant variance, normality of errors) are reasonable. Systematic patterns in residual plots indicate model inadequacy and suggest that a different model form may be needed.

Residual Formula

Residual = Observed Value - Predicted Value = y - ŷ

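Sum of Squared Residuals (SSR, also written RSS or SSE) = Σ(y_i - ŷ_i)²

Note that some textbooks use SSR for the regression (explained) sum of squares, so check which convention a source follows.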
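As a quick illustration of both formulas, here is a minimal sketch in Python. The observed and predicted values are made-up numbers chosen for easy arithmetic, not output from a real model.

```python
# Hypothetical observed values and model predictions (illustrative only).
observed = [3.0, 5.0, 7.5, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

# Residual for each point: observed minus predicted (y - y_hat).
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Sum of squared residuals: square each residual, then add them up.
ssr = sum(e ** 2 for e in residuals)

print(residuals)  # [0.5, -0.5, 0.5, -0.5]
print(ssr)        # 1.0
```

The first and third points have positive residuals (the prediction was too low), while the second and fourth have negative residuals (the prediction was too high).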

Types of Residuals

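Type	Formula	Use
Raw Residual	y - ŷ	Basic error measurement
Standardized	(y - ŷ) / s	Outlier detection
Studentized	(y - ŷ) / (s × √(1 - h))	Outlier detection adjusted for leverage
Pearson (Poisson model)	(y - ŷ) / √ŷ	GLM diagnostics

Here s is the residual standard error of the fitted model and h is the leverage of the observation. The Pearson formula shown is the Poisson case, where the variance equals the mean; for other GLMs the denominator is the square root of the model's variance function evaluated at ŷ. Naming varies between textbooks and software: some sources call the leverage-adjusted version "standardized" and reserve "studentized" for a version that recomputes s with the observation left out.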
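The first three rows of the table can be sketched for a simple linear regression with one predictor. This example assumes s is the residual standard error with n - 2 degrees of freedom and h is the leverage 1/n + (x - x̄)² / Σ(x - x̄)². The function names and the data are hypothetical, chosen only for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residual_types(x, y):
    """Return raw, standardized, and studentized residuals plus leverages."""
    n = len(x)
    a, b = fit_line(x, y)
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard error; n - 2 because two parameters were fitted.
    s = math.sqrt(sum(e ** 2 for e in raw) / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    standardized = [e / s for e in raw]
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]
    return raw, standardized, studentized, leverage

# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]

raw, standardized, studentized, leverage = residual_types(x, y)
for i in range(len(x)):
    print(f"x={x[i]:.0f}  raw={raw[i]:+.3f}  std={standardized[i]:+.3f}  "
          f"stud={studentized[i]:+.3f}  h={leverage[i]:.2f}")
```

Because √(1 - h) is less than 1, each studentized residual is at least as large in magnitude as the corresponding standardized one, and the gap is widest for points far from the mean of x.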

Residual Analysis

Random scatter: Residuals should show no systematic pattern when plotted against predicted values. Random scatter is consistent with the linearity and constant-variance assumptions, although it does not prove that every assumption holds.
Normality: Residuals should be approximately normally distributed, which can be checked using a Q-Q plot or the Shapiro-Wilk test.
Homoscedasticity: The variance of residuals should be constant across all levels of the predictor variable. A funnel-shaped pattern indicates heteroscedasticity.
Independence: Residuals should not be correlated with each other, which is particularly important in time series data.
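The independence check can be made numeric with the Durbin-Watson statistic, which compares successive residuals. This is a minimal sketch; the two residual sequences are invented to show the extremes, and the function name is hypothetical. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals in time order.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every step
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]     # neighbors are similar

dw_alternating = durbin_watson(alternating)
dw_trending = durbin_watson(trending)

print(round(dw_alternating, 2))  # 3.33, toward 4: negative autocorrelation
print(round(dw_trending, 2))     # 0.29, toward 0: positive autocorrelation
```

The statistic only detects correlation between adjacent residuals, so it complements rather than replaces a plot of residuals against time.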

Frequently Asked Questions

What does a negative residual mean?

A negative residual means the observed value is less than the predicted value. The model overestimated the actual outcome. In a scatterplot, this data point falls below the regression line.

Should residuals sum to zero?

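In ordinary least squares (OLS) regression with an intercept term, the sum of residuals always equals zero. This is a mathematical property of the least squares solution, so it holds for any data set, although a computed sum may differ from zero by a tiny rounding error. The property does not hold for models fitted without an intercept.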
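The role of the intercept can be seen by fitting the same data twice. The sketch below uses made-up data and fits one line with an intercept and one forced through the origin; the variable names are illustrative only.

```python
import math

# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.5, 7.0, 8.0]
n = len(x)

# OLS with an intercept: y = intercept + slope * x
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar
resid_with_intercept = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Least squares through the origin: y = slope0 * x (no intercept term)
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid_no_intercept = [yi - slope0 * xi for xi, yi in zip(x, y)]

sum_with = sum(resid_with_intercept)
sum_without = sum(resid_no_intercept)

# The first sum is zero up to rounding; the second generally is not.
print(math.isclose(sum_with, 0.0, abs_tol=1e-9))
print(math.isclose(sum_without, 0.0, abs_tol=1e-9))
```

With an intercept, the fitted line passes through the point (x̄, ȳ), which forces positive and negative residuals to cancel. Without one, nothing enforces that balance.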

How do I identify outliers using residuals?

Standardized residuals with absolute values greater than 2 are potential outliers, and those greater than 3 are strong candidates. These cutoffs are rules of thumb, so context matters. Influence measures such as Cook's distance complement them by showing how much each point changes the fitted model, which a large residual alone does not reveal.
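Both ideas can be sketched together for a simple linear regression. The data below are hypothetical, with one point deliberately shifted upward. The sketch assumes standardized residuals are computed as residual / s, and uses the usual Cook's distance formula D = (e² / (p × s²)) × h / (1 - h)², where p is the number of fitted parameters and h is the leverage.

```python
import math

# Hypothetical data: roughly y = 2x + 1, with the point at index 4
# deliberately shifted upward by 10 to act as an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.1, 4.8, 7.15, 8.9, 21.0, 13.2, 14.85, 17.1, 18.9, 21.0]

n = len(x)
p = 2  # fitted parameters: intercept and slope

# Ordinary least squares fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Rule of thumb: flag |standardized residual| > 2.
standardized = [e / s for e in residuals]
flagged = [i for i, r in enumerate(standardized) if abs(r) > 2]

# Cook's distance combines residual size with leverage.
cooks_d = [
    (e ** 2 / (p * s ** 2)) * h / (1 - h) ** 2
    for e, h in zip(residuals, leverage)
]
most_influential = max(range(n), key=lambda i: cooks_d[i])

print("flagged indices:", flagged)
print("largest Cook's distance at index:", most_influential)
```

In this example the shifted point is the only one flagged and also has the largest Cook's distance. In general the two can disagree: a point with a modest residual but high leverage can be more influential than an obvious outlier near the center of the data.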
