What Is Sum of Squares?
The sum of squares (SS) is the sum of the squared deviations of each data point from the mean. It is a fundamental quantity in statistics that measures the total variability in a dataset. Sum of squares is the basis for calculating variance and standard deviation, and it is central to ANOVA (Analysis of Variance) and regression analysis.
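The definition above can be sketched in a few lines of Python; the data values here are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical sample data to illustrate the sum of squares.
data = [2.0, 4.0, 6.0, 8.0]

mean = sum(data) / len(data)                 # x̄ = 5.0
ss = sum((x - mean) ** 2 for x in data)      # Σ(x − x̄)² = 9 + 1 + 1 + 9 = 20
variance = ss / (len(data) - 1)              # sample variance = SS / (n − 1)

print(ss)        # 20.0
print(variance)  # 6.666...
```

Dividing SS by n − 1 gives the sample variance, and its square root gives the standard deviation, which is how SS underlies both measures.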
In ANOVA, the total sum of squares is partitioned into components: the sum of squares between groups (SSB), which measures variability due to group differences, and the sum of squares within groups (SSW), which measures variability within groups. The ratio of these components forms the F-statistic used to test for significant differences between group means.
Formulas
Types in ANOVA
| Type | Formula | Measures |
|---|---|---|
| SS Total | ΣΣ(xij - x̄)² | Total variability |
| SS Between | Σnj(x̄j - x̄)² | Between-group variability |
| SS Within | ΣΣ(xij - x̄j)² | Within-group variability |
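The partition in the table can be verified numerically: SS Total equals SS Between plus SS Within, and the F-statistic is the ratio of their mean squares. This is a minimal sketch using made-up group data.

```python
# Hypothetical data for three groups, to illustrate the ANOVA partition
# SS Total = SS Between + SS Within, and the resulting F-statistic.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

all_x = [x for g in groups for x in g]
grand_mean = sum(all_x) / len(all_x)                      # x̄ = 5.0

ss_total = sum((x - grand_mean) ** 2 for x in all_x)       # ΣΣ(xij − x̄)²
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups)                          # Σ nj(x̄j − x̄)²
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in groups for x in g)                # ΣΣ(xij − x̄j)²

k, n = len(groups), len(all_x)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))    # MS Between / MS Within

print(ss_total, ss_between, ss_within)  # 60.0 54.0 6.0
print(f_stat)                           # 27.0
```

Here SS Between (54) dominates SS Within (6), so the F-statistic is large, which is exactly the pattern that signals real differences between group means.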
Frequently Asked Questions
Why do we square the deviations?
Squaring serves two purposes: it eliminates negative deviations (which would cancel out positive ones) and it gives more weight to larger deviations. The sum of raw deviations from the mean is always zero by definition, so squaring is necessary to obtain a useful measure of spread.
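The "raw deviations sum to zero" point is easy to check directly; the data below are hypothetical.

```python
# Raw deviations from the mean always sum to zero (up to floating-point
# rounding), so they carry no information about spread; squaring fixes this.
data = [1.0, 3.0, 5.0, 11.0]

mean = sum(data) / len(data)                 # 5.0
deviations = [x - mean for x in data]        # [-4.0, -2.0, 0.0, 6.0]

print(sum(deviations))                       # 0.0
print(sum(d ** 2 for d in deviations))       # 56.0
```

Note also how the single large deviation (6.0) contributes 36 of the 56 total after squaring, illustrating the extra weight given to large deviations.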
What is the relationship between SS and R-squared?
In regression, R-squared = 1 - (SS_residual / SS_total). It represents the proportion of total variability explained by the model. An R-squared of 0.80 means the model explains 80% of the total sum of squares.
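The R-squared formula can be sketched directly from the two sums of squares; the observed values and model predictions below are hypothetical.

```python
# Hypothetical observed values y and model predictions y_hat, to show
# R² = 1 − SS_residual / SS_total.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.2, 3.6, 4.8, 4.4, 5.0]

y_mean = sum(y) / len(y)                                        # 4.0
ss_total = sum((yi - y_mean) ** 2 for yi in y)                  # 6.0
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # 0.4

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.933
```

An R-squared near 0.93 here means the model leaves only about 7% of the total sum of squares unexplained.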