Linear Regression Calculator

JJ Ben-Joseph

What this linear regression calculator does

This tool fits a straight line to your paired X and Y data using simple linear regression (ordinary least squares). It returns the slope, intercept, correlation, and R-squared, along with the equation of the best fit line so you can make predictions and understand how strongly X and Y move together.

You provide two equal-length lists of numbers (X values and Y values). The calculator then computes the line Y = slope × X + intercept that best summarizes the relationship between them, in the least squares sense.

How to enter your data

Enter your X and Y values as two lists of the same length. Each position in the X list must match the same position in the Y list (they form pairs).

  • Equal length: The X and Y lists must contain the same number of values (e.g., 10 X values and 10 Y values).
  • Allowed separators: Use commas, spaces, tabs, or line breaks between numbers; the calculator treats them all as separators.
  • Numeric values only: Remove text, headers, or symbols; keep only numbers (e.g., 3.5, -2, 10).
  • Order matters: The first X value pairs with the first Y value, the second X with the second Y, and so on.
  • Check for outliers: Extremely large or unusual values can strongly influence the fitted line.

You can copy a column directly from a spreadsheet and paste it into the X or Y box. Then paste the matching column into the other box, making sure you do not insert or delete any rows in between.
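
The input rules above can be sketched in a few lines of Python. This is an illustration only, not the calculator's actual code; the function names `parse_numbers` and `parse_pairs` are hypothetical, and the minimum of two pairs is an assumption made for this sketch.

```python
import re


def parse_numbers(text):
    """Split pasted text on commas and any whitespace, then convert to floats.

    float() raises ValueError on non-numeric tokens such as headers.
    """
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]


def parse_pairs(x_text, y_text):
    """Return (x, y) pairs, matching values by position."""
    xs = parse_numbers(x_text)
    ys = parse_numbers(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    if len(xs) < 2:
        # Assumption for this sketch: a line needs at least two points.
        raise ValueError("need at least two (x, y) pairs")
    return list(zip(xs, ys))


# Mixed separators (comma, tab, newline, space) are all accepted.
pairs = parse_pairs("1, 2\t3\n4 5,6", "52 57 63 68 74 79")
print(pairs[0], len(pairs))
```

Raising an error on a length mismatch, rather than silently truncating the longer list, avoids pairing values that were never meant to go together.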

Example dataset you can try

To see how the calculator works, try this small example of study time (hours) and test scores (points out of 100):

X (hours studied):

1
2
3
4
5
6

Y (test score):

52
57
63
68
74
79

If you paste these values into the calculator, you will get a positive slope. That slope tells you how many points the test score tends to rise, on average, for each additional hour of study. You will also see an R-squared close to 1, meaning the straight line explains most of the variation in scores for this simple example.

Formulas used by the linear regression calculator

The calculator uses the standard least squares formulas for simple linear regression with one predictor X and one outcome Y.

Let there be n data pairs \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\). First compute the sample means of X and Y:

\[ \mu_X = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \mu_Y = \frac{\sum_{i=1}^{n} y_i}{n} \]

Here, the summation symbol means “add up over all data points.” The slope \(b_1\) and intercept \(b_0\) of the best fit line are:

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y)}{\sum_{i=1}^{n} (x_i - \mu_X)^2} \]

\[ b_0 = \mu_Y - b_1 \mu_X \]

The resulting regression line is:

\[ \hat{y} = b_0 + b_1 x \]

For any chosen X value, \(\hat{y}\) is the predicted Y on the line.
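
The slope and intercept formulas translate almost directly into code. The following is a minimal sketch in plain Python, not the calculator's own implementation; the function name `fit_line` and the error handling are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept b0, slope b1) of the least squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator and denominator of the slope formula.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Study-time example from this page.
b0, b1 = fit_line([1, 2, 3, 4, 5, 6], [52, 57, 63, 68, 74, 79])
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")  # y_hat = 46.40 + 5.46 * x
```

The check for identical X values matters because the denominator of the slope formula is zero in that case, so no unique line exists.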

Correlation and R-squared

The calculator also computes the Pearson correlation coefficient r between X and Y:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_X)^2 \, \sum_{i=1}^{n} (y_i - \mu_Y)^2}} \]

R-squared (the coefficient of determination) is then the square of r:

\[ R^2 = r^2 \]

R-squared measures the proportion of the variability in Y that can be explained by a linear relationship with X in this model.
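
As a rough sketch of these two quantities, the code below computes r from the formula and squares it. It also cross-checks the result against the "explained variability" reading, using the standard identity that R-squared equals 1 minus the ratio of residual to total sum of squares for a least squares line with an intercept. The function names are hypothetical and not taken from the calculator.

```python
import math


def sums_of_squares(xs, ys):
    """Return (mean_x, mean_y, sxy, sxx, syy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    _, _, sxy, sxx, syy = sums_of_squares(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when X or Y is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_residuals(xs, ys):
    """Cross-check: 1 - SSE/SST for the fitted least squares line."""
    mean_x, mean_y, sxy, sxx, syy = sums_of_squares(xs, ys)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy


hours = [1, 2, 3, 4, 5, 6]
scores = [52, 57, 63, 68, 74, 79]
r = pearson_r(hours, scores)
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")  # r = 0.9997, R^2 = 0.9993
```

Both routes give the same R-squared up to floating-point rounding, which is why the calculator can report it simply as the square of r.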

Plain-language meaning of the symbols

  • \(x_i\), \(y_i\): the i-th observed values of X and Y.
  • \(\mu_X\), \(\mu_Y\): the average of all X values and the average of all Y values.
  • \(b_1\): the slope of the line (change in Y per 1-unit change in X).
  • \(b_0\): the intercept (predicted Y when X = 0).
  • \(r\): Pearson correlation coefficient between X and Y.
  • \(R^2\): proportion of variance in Y explained by the linear model.

Interpreting the results

Interpreting the slope

The slope tells you how much Y tends to change when X increases by one unit.

  • Positive slope: As X increases, Y tends to increase (e.g., more hours studied, higher test scores).
  • Negative slope: As X increases, Y tends to decrease (e.g., more distance from a router, lower Wi-Fi speed).
  • Slope near zero: Little or no linear relationship; Y does not systematically rise or fall with X in a straight-line way.

Always interpret the slope in the original units. For example, if X is “hours” and Y is “dollars,” a slope of 15 means each extra hour is associated with about $15 more.

Interpreting the intercept

The intercept is the predicted value of Y when X is zero. Sometimes this has a direct meaning (e.g., predicted starting weight when age = 0 days in a growth study). In other cases, X = 0 is outside the realistic range (e.g., 0 years of education in a dataset of adults), so the intercept is just a mathematical anchor for the line and should not be over-interpreted.
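
One way to guard against over-interpreting the intercept, or any prediction far from the data, is to flag X values that fall outside the observed range. The sketch below does this for the study-time example; `fit_line` and `predict` are hypothetical helper names, and the range check is an illustrative convention rather than a feature of the calculator.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(b0, b1, x, observed_xs):
    """Return (predicted Y, True if x is outside the observed X range)."""
    y_hat = b0 + b1 * x
    outside = x < min(observed_xs) or x > max(observed_xs)
    return y_hat, outside


hours = [1, 2, 3, 4, 5, 6]
scores = [52, 57, 63, 68, 74, 79]
b0, b1 = fit_line(hours, scores)

for x in (0, 3.5, 10):
    y_hat, outside = predict(b0, b1, x, hours)
    note = " (outside observed X range)" if outside else ""
    print(f"x = {x}: predicted y = {y_hat:.1f}{note}")
```

Here X = 0 is flagged because no one in the example studied for fewer than 1 hour. At 10 hours the line predicts a score slightly above the 100-point maximum, which shows why predictions outside the observed range should be treated with caution.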

Understanding R-squared

R-squared ranges from 0 to 1:

  • R-squared close to 1: The line explains a large share of the variation in Y; points cluster tightly around the line.
  • R-squared around 0.5: The line captures some trend, but there is also a lot of scatter around it.
  • R-squared near 0: The linear model explains almost none of the variation; Y does not follow a clear linear pattern with X.

A high R-squared does not prove a causal relationship, and a low R-squared does not necessarily mean the model is useless; what counts as a good value depends on the context and on how much noise is typical in your field.

Worked example (step by step)

Using the earlier study-time dataset:

X (hours): 1, 2, 3, 4, 5, 6

Y (score): 52, 57, 63, 68, 74, 79

  1. Compute the average of X and Y.
    • Mean of X: (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
    • Mean of Y: (52 + 57 + 63 + 68 + 74 + 79) / 6 = 65.5
  2. For each pair, subtract the means and multiply: \((x_i - \mu_X)(y_i - \mu_Y)\). Add up those products.
    • Sum of products: 33.75 + 12.75 + 1.25 + 1.25 + 12.75 + 33.75 = 95.5
  3. Also compute \((x_i - \mu_X)^2\) for each point and add them up.
    • Sum of squares: 6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25 = 17.5
  4. Divide the sums to get the slope:
    • \(b_1 = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sum (x_i - \mu_X)^2} = \frac{95.5}{17.5} \approx 5.46\)
  5. Compute the intercept:
    • \(b_0 = \mu_Y - b_1 \mu_X = 65.5 - 5.457 \times 3.5 \approx 46.4\)

For this dataset, the resulting equation is approximately:

Predicted score = 46.4 + 5.46 × hours studied

This means each extra hour of study is associated with roughly a 5.5-point increase in the test score. The intercept of 46.4 is the predicted score at zero hours of study, which lies just below the observed range of 1 to 6 hours. The calculator performs all these steps automatically for any dataset you enter.
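
To follow the arithmetic point by point, the steps of the worked example can be laid out as a small table. This is a sketch for checking the hand calculation, with variable names chosen for illustration only.

```python
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 57, 63, 68, 74, 79]

n = len(hours)
mean_x = sum(hours) / n   # step 1: mean of X
mean_y = sum(scores) / n  # step 1: mean of Y

sxy = 0.0  # step 2: running sum of products of deviations
sxx = 0.0  # step 3: running sum of squared X deviations

print(f"{'x':>3} {'y':>4} {'x-mean':>7} {'y-mean':>7} {'product':>8} {'square':>7}")
for x, y in zip(hours, scores):
    dx = x - mean_x
    dy = y - mean_y
    sxy += dx * dy
    sxx += dx * dx
    print(f"{x:>3} {y:>4} {dx:>7.1f} {dy:>7.1f} {dx * dy:>8.2f} {dx * dx:>7.2f}")

slope = sxy / sxx                      # step 4
intercept = mean_y - slope * mean_x    # step 5

print(f"sum of products = {sxy}, sum of squares = {sxx}")
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

Each row shows one data pair's contribution, so an unusually large entry in the product column immediately identifies a point that dominates the slope.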

Comparing key linear regression outputs

Output | What it represents | How to interpret it
Slope (\(b_1\)) | Average change in Y for a one-unit increase in X | Sign (positive/negative) shows direction; magnitude shows strength of change per unit of X
Intercept (\(b_0\)) | Predicted value of Y when X = 0 | Meaningful only if X = 0 is realistic; otherwise mainly anchors the line
Correlation (\(r\)) | Strength and direction of linear association | Ranges from -1 (perfect negative) to +1 (perfect positive); 0 means no linear relationship
R-squared (\(R^2\)) | Proportion of variance in Y explained by the linear model | Close to 1: strong linear fit; close to 0: weak linear fit

Assumptions and limitations of simple linear regression

The calculator implements standard simple linear regression, which relies on several assumptions. These are not enforced by the tool, so it is your responsibility to judge whether they are reasonable for your data.

  • Linearity: The true relationship between X and Y is assumed to be (approximately) a straight line. Strong curves, thresholds, or other non-linear patterns may be poorly captured.
  • Independent errors: The residuals (differences between observed and predicted Y) are assumed to be independent from one data point to the next. Time series or clustered data can violate this.
  • Constant variance (homoscedasticity): The spread of residuals should be roughly the same across all X values. If variability grows or shrinks with X, standard regression inferences may be unreliable.
  • Normality of residuals (for inference): For significance tests and confidence intervals, residuals are often assumed to be approximately normally distributed. For simple prediction, this is less critical.
  • Sensitivity to outliers: A few extreme points can strongly pull the slope and intercept. Always check whether unusual observations are errors, special cases, or genuinely part of the process.
  • Correlation is not causation: A strong linear relationship does not prove that changes in X cause changes in Y. Hidden variables or reversed causality may be responsible.
  • Simple regression only: This calculator fits one predictor (X) to one outcome (Y). Situations with many predictors or more complex relationships require multiple regression or other methods.
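
Two of the points above, residuals and sensitivity to outliers, are easy to explore numerically. The sketch below computes the residuals for the study-time example and then refits the line after replacing one score with a made-up extreme value. The altered score of 40 is hypothetical and chosen only to show the effect; `fit_line` is an illustrative helper, not the calculator's code.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


hours = [1, 2, 3, 4, 5, 6]
scores = [52, 57, 63, 68, 74, 79]

# Residual = observed Y minus predicted Y on the fitted line.
b0, b1 = fit_line(hours, scores)
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, scores)]
for x, res in zip(hours, residuals):
    print(f"x = {x}: residual = {res:+.3f}")

# Hypothetical outlier: replace the last score (79) with 40 and refit.
altered = scores[:-1] + [40]
b0_out, b1_out = fit_line(hours, altered)
print(f"original slope = {b1:.2f}, slope with outlier = {b1_out:.2f}")
```

In this sketch, changing a single score moves the slope from about 5.46 to about -0.11, reversing the apparent direction of the relationship. Listing the residuals against X is also a quick first check for curvature or for spread that grows with X.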

Because the tool is designed for quick, exploratory analysis, you should treat its output as one piece of evidence and combine it with subject-matter knowledge and more detailed diagnostics when decisions are important.
