Polynomial Regression Calculator
Introduction: What this polynomial regression calculator does
This calculator fits a polynomial curve to your data using least squares regression. You provide paired x and y values and choose the polynomial degree (for example, linear, quadratic, or cubic). The tool then computes the polynomial coefficients, the predicted values for each x, and the residuals (the differences between the observed and predicted values).
Use it to explore curved relationships between variables, build simple predictive models, or compare how different polynomial degrees fit the same data.
How to enter your data
- x values: Enter your independent variable values as a comma-separated list, for example: 0, 1, 2, 3, 4.
- y values: Enter the corresponding dependent variable values, also comma-separated, for example: 1.0, 2.1, 3.9, 6.2, 8.1.
- Matching lengths: The number of x and y values must be the same because each pair forms one data point.
- Degree: Choose an integer polynomial degree between 1 and 5. Degree 1 is a straight line, degree 2 is a quadratic curve, degree 3 is cubic, and so on.
After you click the button to compute the fit, the calculator solves a least squares problem and displays the resulting polynomial, together with predictions and residuals for each data point.
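The input rules above can be sketched in a few lines of Python. This is only an illustration of the checks described here, not the calculator's actual code; the function names `parse_values` and `validate_inputs` are hypothetical.

```python
def parse_values(text):
    """Parse a comma-separated string such as "0, 1, 2" into a list of floats."""
    parts = [p.strip() for p in text.split(",")]
    if any(p == "" for p in parts):
        raise ValueError("empty entry in comma-separated list")
    return [float(p) for p in parts]


def validate_inputs(x_text, y_text, degree):
    """Return (xs, ys), or raise ValueError if the inputs break the rules above."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    if not isinstance(degree, int) or not 1 <= degree <= 5:
        raise ValueError("degree must be an integer between 1 and 5")
    return xs, ys


xs, ys = validate_inputs("0, 1, 2, 3, 4", "1.0, 2.1, 3.9, 6.2, 8.1", 2)
print(xs)
print(ys)
```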
Polynomial regression in a nutshell
Polynomial regression generalizes straight-line (linear) regression by allowing the model to include powers of x. Instead of fitting
y ≈ b₀ + b₁ x
we allow higher powers of x up to some degree d:
y ≈ a₀ + a₁ x + a₂ x² + ⋯ + a_d x^d.
In compact mathematical notation, the fitted polynomial is
P(x) = Σ (from j = 0 to d) a_j x^j,
where the coefficients a₀, a₁, …, a_d are chosen to make the curve follow the pattern of your data as closely as possible.
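To make the formula concrete, here is a minimal sketch of evaluating P(x) from a coefficient list. It uses Horner's rule, a standard way to evaluate polynomials that avoids computing each power separately; the function name and the sample coefficients are illustrative assumptions.

```python
def evaluate_polynomial(coeffs, x):
    """Evaluate P(x) = a_0 + a_1 x + ... + a_d x^d with Horner's rule.

    coeffs is [a_0, a_1, ..., a_d], lowest power first.
    """
    result = 0.0
    for a in reversed(coeffs):
        # Work from the highest-degree coefficient down to a_0.
        result = result * x + a
    return result


# Illustrative coefficients: P(x) = 1 + 0x + 2x^2, so P(3) = 1 + 18 = 19
print(evaluate_polynomial([1.0, 0.0, 2.0], 3.0))  # 19.0
```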
Least squares objective
Suppose you have n observations (x₁, y₁), …, (xₙ, yₙ). For a given set of coefficients, the prediction at xᵢ is P(xᵢ) and the residual is
rᵢ = yᵢ − P(xᵢ).
Least squares regression chooses the coefficients that minimize the sum of squared residuals:
S(a₀, …, a_d) = Σ (from i = 1 to n) (yᵢ − P(xᵢ))².
Squaring the residuals emphasizes larger errors and makes the optimization problem smooth and easier to solve with linear algebra.
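The objective S can be computed directly for any candidate coefficients. The sketch below uses made-up data lying on the line y = 1 + 2x to show that the true coefficients give S = 0 while a worse candidate gives a larger value; the function name is hypothetical.

```python
def sum_squared_residuals(coeffs, xs, ys):
    """Return S = sum over i of (y_i - P(x_i))^2 for coeffs [a_0, ..., a_d]."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(a * x**j for j, a in enumerate(coeffs))
        total += (y - prediction) ** 2
    return total


# Illustrative data generated from y = 1 + 2x (not from the calculator)
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
print(sum_squared_residuals([1.0, 2.0], xs, ys))  # 0.0, the exact line
print(sum_squared_residuals([0.0, 2.0], xs, ys))  # 3.0, each residual is 1
```

Least squares picks the coefficients for which this number is as small as possible.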
Vandermonde design matrix
The calculator constructs a design matrix (often called a Vandermonde matrix) that contains powers of each x value. For a polynomial of degree d, each row of the matrix corresponds to one data point:
[ 1  x₁  x₁²  …  x₁^d ]
[ 1  x₂  x₂²  …  x₂^d ]
[ ⋮   ⋮    ⋮        ⋮   ]
[ 1  xₙ  xₙ²  …  xₙ^d ]
If we denote this matrix by X, the coefficient vector by a, and the vector of observed y-values by y, we can write the model compactly as
y ≈ X a.
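Building this matrix takes one line per row. A minimal sketch, with a hypothetical function name and plain Python lists standing in for whatever matrix type the calculator actually uses:

```python
def vandermonde(xs, degree):
    """Return the design matrix whose row i is [1, x_i, x_i^2, ..., x_i^degree]."""
    return [[x**j for j in range(degree + 1)] for x in xs]


for row in vandermonde([0, 1, 2, 3], 2):
    print(row)
# [1, 0, 0]
# [1, 1, 1]
# [1, 2, 4]
# [1, 3, 9]
```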
Normal equations
The least squares solution is found by solving the normal equations
(Xᵀ X) a = Xᵀ y.
Here Xᵀ is the transpose of X. The calculator forms these matrices and solves the resulting linear system to obtain the coefficients.
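For readers who want the missing step, the normal equations follow from writing the objective S in matrix form and setting its gradient with respect to a to zero:

```latex
\begin{aligned}
S(a) &= \lVert y - Xa \rVert^2 = (y - Xa)^{\mathsf T}(y - Xa), \\
\nabla_a S(a) &= -2\,X^{\mathsf T}(y - Xa) = 0
\;\Longrightarrow\; (X^{\mathsf T}X)\,a = X^{\mathsf T}y .
\end{aligned}
```

Because S is a convex quadratic in a, any solution of this system is a minimizer, and it is unique when X has full column rank.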
Solving this equation gives the coefficient vector a, which defines the fitted polynomial.
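The whole procedure can be sketched in pure Python: form XᵀX and Xᵀy, then solve the small linear system by Gaussian elimination with partial pivoting. This is one reasonable way to do it under stated assumptions, not necessarily how the calculator is implemented; the function name, the singularity threshold of 1e-12, and the sample data are all illustrative.

```python
def fit_polynomial(xs, ys, degree):
    """Least squares fit via the normal equations (X^T X) a = X^T y.

    Returns coefficients [a_0, ..., a_degree], lowest power first.
    """
    m = degree + 1
    X = [[x**j for j in range(m)] for x in xs]
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(m)]
        + [sum(row[i] * y for row, y in zip(X, ys))]
        for i in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # assumed threshold for "singular"
            raise ValueError("singular system: too few distinct x values")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (A[i][m] - known) / A[i][i]
    return coeffs


# Illustrative data on the line y = 2 + 3x; the fit should be close to [2, 3].
print(fit_polynomial([0, 1, 2, 3], [2, 5, 8, 11], 1))
```

For higher degrees or badly scaled x values, solvers based on QR factorization are usually more accurate than the normal equations, which is one reason to keep the degree modest here.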
Interpreting the output
Once you compute the fit, you will typically see three main pieces of information:
- Polynomial coefficients: The numbers a₀, a₁, …, a_d that define the fitted curve.
- Predicted values: For each input xᵢ, the calculator shows P(xᵢ), the corresponding value on the fitted polynomial.
- Residuals: For each data point, the residual rᵢ = yᵢ − P(xᵢ) measures how far the fitted curve is from the observed value.
You can use these outputs to judge how well the polynomial captures the trend in the data. Residuals that are small in magnitude and show no clear pattern when plotted against x or P(x) indicate a reasonably good model for the chosen degree.
Worked example
To see how the calculator behaves, consider the following simple dataset. Suppose we enter
- x values: 0, 1, 2, 3
- y values: 1, 2, 5, 10
- Polynomial degree: 2 (quadratic)
The calculator will construct the Vandermonde matrix
X =
[ 1  0  0² ]   [ 1  0  0 ]
[ 1  1  1² ] = [ 1  1  1 ]
[ 1  2  2² ]   [ 1  2  4 ]
[ 1  3  3² ]   [ 1  3  9 ]
and then solve the normal equations to find the coefficients
a₀ = 1, a₁ = 0, a₂ = 1.
The fitted polynomial is therefore
P(x) = 1 + x².
The calculator then evaluates this polynomial at each input value:
- At x = 0, P(0) = 1, residual = 1 − 1 = 0.
- At x = 1, P(1) = 2, residual = 2 − 2 = 0.
- At x = 2, P(2) = 5, residual = 5 − 5 = 0.
- At x = 3, P(3) = 10, residual = 10 − 10 = 0.
A table summarizing the results might look like:
| x | Observed y | Predicted P(x) | Residual y − P(x) |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 |
| 1 | 2.0 | 2.0 | 0.0 |
| 2 | 5.0 | 5.0 | 0.0 |
| 3 | 10.0 | 10.0 | 0.0 |
This example is intentionally simple: the four points lie exactly on the quadratic y = 1 + x², so the fit is exact and every residual is zero. With noisy real-world data the residuals will generally be nonzero, but the steps the calculator performs are the same for any dataset you provide.
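A candidate coefficient vector can be checked against the normal equations without re-solving them: at a least squares solution, the vector Xᵀ(y − Xa) is zero. The sketch below applies that check to the example dataset with the candidate P(x) = 1 + x²; the function name is hypothetical and the candidate is supplied by hand for illustration.

```python
def normal_equation_gap(xs, ys, coeffs):
    """Return X^T (y - X a), which is all zeros at a least squares solution."""
    residuals = [
        y - sum(a * x**j for j, a in enumerate(coeffs))
        for x, y in zip(xs, ys)
    ]
    return [
        sum(r * x**j for r, x in zip(residuals, xs))
        for j in range(len(coeffs))
    ]


xs = [0, 1, 2, 3]
ys = [1, 2, 5, 10]
print(normal_equation_gap(xs, ys, [1, 0, 1]))  # [0, 0, 0]
```

Any other quadratic gives at least one nonzero entry on this dataset, so the check separates the least squares solution from nearby guesses.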
Choosing the polynomial degree: comparison
The degree you choose controls how flexible the curve is. Lower degrees are simpler but may miss subtle curvature; higher degrees can follow the data more closely but risk overfitting noise. The table below summarizes typical behavior.
| Degree | Model form | Flexibility | Typical use |
|---|---|---|---|
| 1 | a₀ + a₁ x | Low | Approximately linear trends |
| 2 | a₀ + a₁ x + a₂ x² | Moderate | Single bend (U-shaped or inverted U curves) |
| 3 | a₀ + a₁ x + a₂ x² + a₃ x³ | Higher | More complex curvature with up to two bends |
| 4–5 | Includes terms up to x⁴ or x⁵ | High | Exploratory fitting on small to medium datasets; can capture wiggly patterns but more prone to overfitting |
As a rule of thumb, start with a low degree and increase it only if the residuals suggest systematic curvature that a simpler model cannot explain.
Assumptions and limitations
This calculator is designed for small to moderate datasets and low polynomial degrees. Keep the following assumptions and limitations in mind when interpreting the results:
- Polynomial relationship: The method assumes that a polynomial is a reasonable approximation for the relationship between x and y over the range of your data. If the true relationship is far from polynomial (for example, highly periodic or discontinuous), the fit may be misleading.
- Enough data points: You need at least degree + 1 distinct data points to estimate a polynomial of a given degree at all. In practice, it is better to have significantly more than degree + 1 points to avoid overfitting and numerical instability.
- Distinct x values: The underlying linear algebra requires that the design matrix have full column rank. This holds exactly when the data contain at least degree + 1 distinct x-values; repeated x-values are fine as long as that many different ones remain.
- Sensitivity to outliers: Because least squares minimizes squared residuals, large errors carry a lot of weight. A few extreme outliers can strongly influence the fitted polynomial.
- Numerical stability: Solving normal equations for higher-degree polynomials or for x-values that are very large, very small, or tightly clustered can lead to numerical issues. For best results, keep degrees modest (up to 5, as supported here) and avoid extreme scaling of the input variable when possible.
- Extrapolation risk: The polynomial may behave unpredictably outside the range of your observed x-values. Use caution when using the fitted model to predict far beyond the data you supplied.
- Not a full statistical analysis: The tool focuses on computing the least squares fit and basic diagnostics like residuals. It does not provide confidence intervals, hypothesis tests, or formal model comparison statistics.
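Two of these points lend themselves to a quick pre-check before fitting: counting distinct x-values, and rescaling x to a modest range. The sketch below maps x linearly onto [-1, 1], a common remedy for badly scaled inputs. Nothing here is stated to be part of the calculator; the function names and the sample years are assumptions for illustration.

```python
def has_enough_distinct_x(xs, degree):
    """True if there are at least degree + 1 different x values."""
    return len(set(xs)) >= degree + 1


def rescale_to_unit_interval(xs):
    """Map xs linearly onto [-1, 1]; returns (scaled, center, half_range)."""
    lo, hi = min(xs), max(xs)
    if lo == hi:
        raise ValueError("all x values are identical")
    center = (lo + hi) / 2
    half_range = (hi - lo) / 2
    return [(x - center) / half_range for x in xs], center, half_range


# Illustrative inputs: calendar years, whose fifth powers would be enormous
xs = [2000.0, 2005.0, 2010.0, 2020.0]
print(has_enough_distinct_x(xs, 3))  # True
scaled, center, half_range = rescale_to_unit_interval(xs)
print(scaled)  # [-1.0, -0.5, 0.0, 1.0]
```

Coefficients fitted on the rescaled values describe a polynomial in the rescaled variable, so any new x must go through the same transformation, (x − center) / half_range, before prediction.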
Within these limits, polynomial regression is a powerful, easy-to-apply tool for exploring nonlinear trends and building simple predictive models. Use the residuals and your domain knowledge to decide whether the chosen polynomial degree gives a fit that is both accurate and interpretable.
How to use this calculator
- Enter the x values as a comma-separated list.
- Enter the y values as a comma-separated list with the same number of entries.
- Enter the polynomial degree as an integer from 1 to 5.
- Run the calculation, then review the coefficients, predictions, and residuals. Try a second degree on the same data to compare fits before relying on the result.
