What Is Polynomial Regression?
Polynomial regression is a form of regression analysis that models the relationship between an independent variable x and a dependent variable y as an nth-degree polynomial in x. Unlike linear regression, which fits a straight line, polynomial regression can capture curved relationships in the data.
This technique is widely used in engineering, economics, and natural sciences where the relationship between variables is nonlinear. A quadratic polynomial (degree 2) creates a parabola, while a cubic polynomial (degree 3) can model S-shaped curves and inflection points.
Formula
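The model is y = a0 + a1*x + a2*x^2 + ... + an*x^n, where n is the degree.
Coefficients are found by minimizing the sum of squared residuals, which leads to the normal equations in matrix form: (X'X)a = X'Y. Here X is the matrix whose columns are the powers 1, x, x^2, ..., x^n of the observed x values, Y is the vector of observed y values, and a is the vector of coefficients a0, ..., an.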
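As a rough illustration of the normal equations, the sketch below builds the matrix of powers and solves (X'X)a = X'Y with NumPy. The function name `fit_polynomial` and the sample data are assumptions made for this example; they are not part of any particular tool.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Return coefficients a0..a_degree by solving the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^degree
    X = np.vander(x, degree + 1, increasing=True)
    # Solve (X'X) a = X'y for the coefficient vector a
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 1 + 2 * x_demo + 3 * x_demo**2

coeffs = fit_polynomial(x_demo, y_demo, degree=2)
print(coeffs)  # approximately [1. 2. 3.], lowest power first
```

Solving the normal equations directly mirrors the formula, but X'X can become ill-conditioned for higher degrees or widely spread x values. In practice a least-squares routine such as `np.polyfit` or `np.linalg.lstsq` is usually a safer choice.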
Choosing the Degree
| Degree | Type | Shape | Min Points |
|---|---|---|---|
| 1 | Linear | Straight line | 2 |
| 2 | Quadratic | Parabola (U or inverted U) | 3 |
| 3 | Cubic | S-curve with inflection | 4 |
| 4 | Quartic | W or M shape | 5 |
Overfitting Warning
- Raising the degree never worsens the fit on training data, but it may overfit: the curve starts to follow noise rather than the underlying trend.
- Use R-squared and adjusted R-squared to evaluate model quality.
- A polynomial of degree n-1 passes exactly through n points with distinct x values, so its perfect fit says nothing about how well it predicts new data.
- Generally keep the degree at 2 or 3 unless you have strong theoretical reasons for a higher degree.
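The points above can be checked numerically. The following sketch uses made-up noisy data (8 points from an assumed quadratic trend) and fits every degree from 1 to 7. The training error never goes up as the degree rises, and at degree 7 (n-1 for n = 8) it drops to essentially zero even though the data are noisy.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical data: a quadratic trend plus random noise, 8 points
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.2, size=x.size)

def training_sse(degree):
    """Sum of squared residuals on the training points for a given degree."""
    coeffs = P.polyfit(x, y, degree)   # coefficients, lowest power first
    fitted = P.polyval(x, coeffs)
    return float(np.sum((y - fitted) ** 2))

# Degrees 1 through n-1, where n is the number of points
sse_by_degree = [training_sse(d) for d in range(1, x.size)]

for degree, sse in enumerate(sse_by_degree, start=1):
    print(f"degree {degree}: training SSE = {sse:.3e}")
```

A near-zero error at the highest degree reflects interpolation of the noise, not a better model. Judging the fit on points held out from fitting is one common way to see the difference.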
Frequently Asked Questions
How is this different from linear regression?
Linear regression fits y = ax + b (a straight line). Polynomial regression fits y = ax^n + ... + c (a curve). Polynomial regression includes linear regression as the special case where degree = 1.
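To illustrate the special case, this short sketch fits a degree 1 polynomial with `np.polyfit` and compares it with the textbook slope and intercept formulas for simple linear regression. The data values are invented for the example.

```python
import numpy as np

# Hypothetical data, roughly linear
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 polynomial fit; np.polyfit returns the highest power first
slope_poly, intercept_poly = np.polyfit(x, y, 1)

# Closed-form simple linear regression
slope_lin = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_lin = y.mean() - slope_lin * x.mean()

# Both approaches should agree up to floating-point rounding
print(np.isclose(slope_poly, slope_lin), np.isclose(intercept_poly, intercept_lin))
```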
What is R-squared?
R-squared measures the proportion of the variance in y that the model explains. A value of 1.0 means a perfect fit; 0 means the model explains none of the variance. R-squared never decreases when the degree is raised, which is why adjusted R-squared, which penalizes additional terms, is also used.
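A minimal sketch of both measures follows, using the standard definitions R-squared = 1 - SS_res / SS_tot and adjusted R-squared = 1 - (1 - R-squared)(n - 1) / (n - p - 1), where p is the number of predictor terms (here, the degree). The helper names and the sample data are assumptions for this example.

```python
import numpy as np

def r_squared(y, fitted):
    """Proportion of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_points, degree):
    """Penalize R-squared for the number of polynomial terms (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - degree - 1)

# Hypothetical data: quadratic trend plus noise, 20 points
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 20)
y = 2.0 - 1.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

scores = {}
for degree in range(1, 5):
    coeffs = np.polyfit(x, y, degree)    # highest power first
    fitted = np.polyval(coeffs, x)
    r2 = r_squared(y, fitted)
    scores[degree] = (r2, adjusted_r_squared(r2, x.size, degree))
    print(degree, round(r2, 4), round(scores[degree][1], 4))
```

R-squared can only stay the same or rise as the degree goes up, while adjusted R-squared may fall when an extra term adds little. That makes the adjusted value the more useful of the two for comparing degrees.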
How many data points do I need?
You need at least (degree + 1) data points with distinct x values to fit a polynomial. For reliable results, use significantly more data points than the polynomial degree, ideally at least 3 times as many.