Polynomial Regression Calculator

JJ Ben-Joseph

What this polynomial regression calculator does

This calculator fits a polynomial curve to your data using least squares regression. You provide paired x and y values and choose the polynomial degree (for example, linear, quadratic, or cubic). The tool then computes the polynomial coefficients, the predicted values for each x, and the residuals (the differences between the observed and predicted values).

Use it to explore curved relationships between variables, build simple predictive models, or compare how different polynomial degrees fit the same data.

How to enter your data

  • x values: Enter your independent variable values as a comma-separated list, for example: 0, 1, 2, 3, 4.
  • y values: Enter the corresponding dependent variable values, also comma-separated, for example: 1.0, 2.1, 3.9, 6.2, 8.1.
  • Matching lengths: The number of x and y values must be the same because each pair forms one data point.
  • Degree: Choose an integer polynomial degree between 1 and 5. Degree 1 is a straight line, degree 2 is a quadratic curve, degree 3 is cubic, and so on.

After you click the button to compute the fit, the calculator solves a least squares problem and displays the resulting polynomial, together with predictions and residuals for each data point.
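The calculator's own source is not shown on this page, but the whole pipeline can be sketched in a few lines of Python with NumPy. The input strings, variable names, and the use of numpy.polyfit below are illustrative assumptions, not the tool's actual implementation:

```python
import numpy as np

# Hypothetical comma-separated inputs, as they might be typed into the form.
x_text = "0, 1, 2, 3, 4"
y_text = "1.0, 2.1, 3.9, 6.2, 8.1"
degree = 2

x = np.array([float(v) for v in x_text.split(",")])
y = np.array([float(v) for v in y_text.split(",")])
assert len(x) == len(y), "x and y must have the same number of values"

# polyfit returns coefficients from the highest power down to the constant.
coeffs = np.polyfit(x, y, degree)
predicted = np.polyval(coeffs, x)
residuals = y - predicted

print("coefficients (highest power first):", coeffs)
print("predicted:", predicted)
print("residuals:", residuals)
```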

Polynomial regression in a nutshell

Polynomial regression generalizes straight-line (linear) regression by allowing the model to include powers of x. Instead of fitting

y ≈ b₀ + b₁ x

we allow higher powers of x up to some degree d:

y ≈ a₀ + a₁ x + a₂ x² + ⋯ + a_d x^d.

In compact mathematical notation, the fitted polynomial is

P(x) = Σ (from j = 0 to d) a_j x^j,

where the coefficients a₀, a₁, …, a_d are chosen to make the curve follow the pattern of your data as closely as possible.
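To make the summation concrete, here is a minimal sketch of evaluating such a polynomial from a coefficient list; the function name and the use of Horner's method are illustrative choices, not part of the calculator:

```python
def poly_eval(coeffs, x):
    """Evaluate P(x) = a0 + a1*x + ... + ad*x^d, with coeffs = [a0, a1, ..., ad].

    Uses Horner's method: P(x) = a0 + x*(a1 + x*(a2 + ...)).
    """
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# Example: P(x) = 1 + 0*x + 1*x^2 evaluated at x = 3 gives 10.
print(poly_eval([1.0, 0.0, 1.0], 3.0))  # 10.0
```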

Least squares objective

Suppose you have n observations (x₁, y₁), …, (xₙ, yₙ). For a given set of coefficients, the prediction at xᵢ is P(xᵢ) and the residual is

rᵢ = yᵢ − P(xᵢ).

Least squares regression chooses the coefficients that minimize the sum of squared residuals:

S(a₀, …, a_d) = Σ (from i = 1 to n) (yᵢ − P(xᵢ))².

Squaring the residuals emphasizes larger errors and makes the optimization problem smooth and easier to solve with linear algebra.
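A short sketch of this objective, assuming NumPy and using the small dataset from the worked example further down this page:

```python
import numpy as np

def sum_squared_residuals(coeffs, x, y):
    """S(a0, ..., ad) for coefficients ordered [a0, a1, ..., ad]."""
    # np.polyval expects the highest power first, so reverse the order.
    predicted = np.polyval(np.asarray(coeffs)[::-1], x)
    return float(np.sum((y - predicted) ** 2))

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])

print(sum_squared_residuals([1.0, 0.0, 1.0], x, y))  # 0.0  (the minimizer: P(x) = 1 + x^2)
print(sum_squared_residuals([1.0, 0.1, 1.0], x, y))  # 0.14 (any other coefficients do worse)
```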

Vandermonde design matrix

The calculator constructs a design matrix (often called a Vandermonde matrix) that contains powers of each x value. For a polynomial of degree d, each row of the matrix corresponds to one data point:

[ 1   x₁   x₁²   …   x₁^d ]
[ 1   x₂   x₂²   …   x₂^d ]
[ ⋮    ⋮    ⋮         ⋮   ]
[ 1   xₙ   xₙ²   …   xₙ^d ]
  

If we denote this matrix by X, the coefficient vector by a, and the vector of observed y-values by y, we can write the model compactly as

y ≈ X a.
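NumPy can build this matrix directly; the sketch below assumes the worked example's x-values and degree 2:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
d = 2

# Columns are x^0, x^1, ..., x^d; increasing=True puts the constant column first,
# matching the [1, x, x^2, ...] layout shown above.
X = np.vander(x, d + 1, increasing=True)
print(X)
# [[1. 0. 0.]
#  [1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```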

Normal equations

The least squares solution is found by solving the normal equations

(Xᵀ X) a = Xᵀ y.

Here Xᵀ is the transpose of X. The calculator forms these matrices and solves the resulting linear system to obtain the coefficients.
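A direct translation into Python, assuming NumPy and the same small dataset (this is a sketch of the mathematics, not necessarily how the calculator is coded):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])
X = np.vander(x, 3, increasing=True)  # degree 2 -> 3 columns

# Solve (X^T X) a = X^T y directly.
a = np.linalg.solve(X.T @ X, X.T @ y)
print(a)  # approximately [1. 0. 1.]
```

For higher degrees, numpy.linalg.lstsq(X, y, rcond=None) solves the same least squares problem through an SVD and is numerically more robust than forming Xᵀ X explicitly.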

Interpreting the output

Once you compute the fit, you will typically see three main pieces of information:

  • Polynomial coefficients: The numbers a₀, a₁, …, a_d that define the fitted curve.
  • Predicted values: For each input xᵢ, the calculator shows P(xᵢ), the corresponding value on the fitted polynomial.
  • Residuals: For each data point, the residual rᵢ = yᵢ − P(xᵢ) measures how far the fitted curve is from the observed value.

You can use these outputs to judge how well the polynomial captures the trend in the data. Residuals that are small in magnitude and show no clear pattern when plotted against x or P(x) indicate a reasonably good model for the chosen degree.
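All three outputs can be reproduced together; the sketch below assumes NumPy and the example inputs from the data-entry section above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 3.9, 6.2, 8.1])

coeffs = np.polyfit(x, y, 2)        # a2, a1, a0 (highest power first)
predicted = np.polyval(coeffs, x)   # P(x_i) for each input
residuals = y - predicted           # r_i = y_i - P(x_i)

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi:4.1f}  observed={yi:5.2f}  predicted={pi:5.2f}  residual={ri:+6.3f}")
```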

Worked example

To see how the calculator behaves, consider the following simple dataset. Suppose we enter

  • x values: 0, 1, 2, 3
  • y values: 1, 2, 5, 10
  • Polynomial degree: 2 (quadratic)

The calculator will construct the Vandermonde matrix

X =
[ 1   0   0² ]
[ 1   1   1² ]
[ 1   2   2² ]
[ 1   3   3² ]
  =
[ 1   0   0 ]
[ 1   1   1 ]
[ 1   2   4 ]
[ 1   3   9 ]
  

and then solve the normal equations. For this dataset the solution is exact, because the four points lie exactly on the curve y = 1 + x²:

a₀ = 1, a₁ = 0, a₂ = 1.

The fitted polynomial is therefore

P(x) = 1 + x².

The calculator then evaluates this polynomial at each input value:

  • At x = 0, P(0) = 1, residual = 1 − 1 = 0.
  • At x = 1, P(1) = 2, residual = 2 − 2 = 0.
  • At x = 2, P(2) = 5, residual = 5 − 5 = 0.
  • At x = 3, P(3) = 10, residual = 10 − 10 = 0.

A table summarizing the results:

  x   Observed y   Predicted P(x)   Residual y − P(x)
  0   1.0          1.0              0.0
  1   2.0          2.0              0.0
  2   5.0          5.0              0.0
  3   10.0         10.0             0.0

This example is intentionally simple; the data happen to lie exactly on a quadratic, so every residual is zero. Real data will usually leave small nonzero residuals, but the steps the calculator performs are the same for any dataset you provide.
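You can verify this fit independently; a minimal check, assuming NumPy (the calculator itself may be implemented differently):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])

coeffs = np.polyfit(x, y, 2)       # highest power first: [a2, a1, a0]
print(coeffs)                      # approximately [1. 0. 1.] -> P(x) = 1 + x^2
print(y - np.polyval(coeffs, x))   # residuals, all approximately 0
```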

Choosing the polynomial degree: comparison

The degree you choose controls how flexible the curve is. Lower degrees are simpler but may miss subtle curvature; higher degrees can follow the data more closely but risk overfitting noise. The table below summarizes typical behavior.

Degree   Model form                    Flexibility   Typical use
1        a₀ + a₁ x                     Low           Approximately linear trends
2        a₀ + a₁ x + a₂ x²             Moderate      Single bend (U-shaped or inverted-U curves)
3        a₀ + a₁ x + a₂ x² + a₃ x³     Higher        More complex curvature with up to two bends
4–5      Terms up to x⁴ or x⁵          High          Exploratory fitting on small to medium datasets; can capture wiggly patterns but is more prone to overfitting

As a rule of thumb, start with a low degree and increase it only if the residuals suggest systematic curvature that a simpler model cannot explain.
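One practical way to apply this rule of thumb is to fit several degrees and compare the sum of squared residuals. The data below are hypothetical. Note that this sum never increases as the degree grows, so look for the point where the improvement levels off rather than chasing the smallest value:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 4.8, 10.1, 17.2, 26.3])  # hypothetical, roughly quadratic data

for d in range(1, 6):
    coeffs = np.polyfit(x, y, d)
    sse = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    print(f"degree {d}: sum of squared residuals = {sse:.4f}")
```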

Assumptions and limitations

This calculator is designed for small to moderate datasets and low polynomial degrees. Keep the following assumptions and limitations in mind when interpreting the results:

  • Polynomial relationship: The method assumes that a polynomial is a reasonable approximation for the relationship between x and y over the range of your data. If the true relationship is very non-polynomial (for example, highly periodic or discontinuous), the fit may be misleading.
  • Enough data points: You need at least degree + 1 distinct data points to estimate a polynomial of a given degree at all. In practice, it is better to have significantly more than degree + 1 points to avoid overfitting and numerical instability.
  • Distinct x values: The underlying linear algebra requires the design matrix to have full column rank, which holds exactly when your data contain at least degree + 1 distinct x-values.
  • Sensitivity to outliers: Because least squares minimizes squared residuals, large errors carry a lot of weight. A few extreme outliers can strongly influence the fitted polynomial.
  • Numerical stability: Solving normal equations for higher-degree polynomials or for x-values that are very large, very small, or tightly clustered can lead to numerical issues. For best results, keep degrees modest (up to 5, as supported here) and avoid extreme scaling of the input variable when possible; a common mitigation is sketched after this list.
  • Extrapolation risk: The polynomial may behave unpredictably outside the range of your observed x-values. Use caution when using the fitted model to predict far beyond the data you supplied.
  • Not a full statistical analysis: The tool focuses on computing the least squares fit and basic diagnostics like residuals. It does not provide confidence intervals, hypothesis tests, or formal model comparison statistics.
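On the numerical-stability point above: a common mitigation, sketched here under the assumption that you are fitting with NumPy rather than with this calculator, is to center and scale x before fitting and to solve the least squares problem with an SVD-based routine instead of the raw normal equations:

```python
import numpy as np

x = np.array([1000.0, 1001.0, 1002.0, 1003.0, 1004.0])
y = np.array([2.0, 2.9, 4.1, 5.2, 5.8])

# Center and scale x so its powers stay well-behaved.
x_scaled = (x - x.mean()) / x.std()

X = np.vander(x_scaled, 3, increasing=True)  # degree-2 design matrix
a, *_ = np.linalg.lstsq(X, y, rcond=None)    # SVD-based, more stable than forming X^T X
print(a)  # coefficients in the scaled variable
```

The resulting coefficients describe the polynomial in the scaled variable, so any new x must be transformed the same way before prediction.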

Within these limits, polynomial regression is a powerful, easy-to-apply tool for exploring nonlinear trends and building simple predictive models. Use the residuals and your domain knowledge to decide whether the chosen polynomial degree gives a fit that is both accurate and interpretable.
