Scatter Plot Generator

JJ Ben-Joseph

What this scatter plot generator does

This tool lets you paste or type pairs of numeric values and instantly see a scatter plot, the Pearson correlation coefficient r, and a least-squares regression line. It runs entirely in your browser, so your data is not uploaded to a server.

Quick how-to

  1. Enter one pair per line in the textbox, in the format x,y.
  2. You can use integers or decimals, and negative values are allowed.
  3. Provide at least two valid points to draw a scatter plot; at least three distinct x-values are recommended for meaningful correlation and regression.
  4. Click the button to generate the chart, see the correlation coefficient, and view the regression line equation.

What is a scatter plot?

A scatter plot is a graph that shows how two numerical variables relate to each other. Each point on the plot represents a single observation with an x-value and a corresponding y-value. By looking at the pattern of points, you can quickly see whether there is a visible relationship, whether the cloud of points is tightly clustered or widely spread, and whether any obvious outliers stand apart from the rest.

Common examples include:

  • Study time (hours) vs. test score (percentage).
  • Temperature vs. ice cream sales.
  • Advertising spend vs. number of website visits.
  • Age vs. blood pressure.

When the points tend to rise from left to right, the relationship is positive: larger x values usually come with larger y values. When the points tend to fall from left to right, the relationship is negative: larger x values tend to come with smaller y values. A loose cloud with no obvious tilt suggests little or no linear relationship.

Correlation and the Pearson r coefficient

This generator computes the Pearson correlation coefficient, often written as r. The value of r is always between -1 and 1:

  • r close to 1: strong positive linear relationship.
  • r close to -1: strong negative linear relationship.
  • r near 0: little or no linear relationship (though a nonlinear pattern could still exist).

Conceptually, Pearson's r compares how much x and y vary together (their covariance) with how much they vary on their own (their standard deviations). The sample correlation formula is:

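$$
r = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_x)^2 \; \sum_{i=1}^{n} (y_i - \mu_y)^2}}
$$

Here μ_x and μ_y are the means of the x and y values (written x̄ and ȳ later on this page), and each sum runs over all n points.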

In practice, the calculator uses an equivalent computational formula that avoids rounding errors, but the meaning is the same: it measures the strength and direction of a linear association.
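
As a rough illustration of the definition above, the following Python sketch computes r directly from the sums of deviations. It is not the tool's own code, which runs as JavaScript in the browser; the function name `pearson_r` is a hypothetical choice for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation computed straight from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how x and y vary together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each variable varies on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising line → 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling line → -1.0
```

The explicit check for zero variation matters because the denominator becomes zero when every x (or every y) is identical, in which case no correlation can be reported.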

How to interpret r

There are no universal cutoffs, but many practitioners use rules of thumb like these:

  • |r| < 0.1: essentially no linear correlation.
  • 0.1 ≤ |r| < 0.3: weak linear correlation.
  • 0.3 ≤ |r| < 0.5: moderate linear correlation.
  • |r| ≥ 0.5: strong linear correlation.
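
The bands above translate into a short lookup. This Python sketch is only a convenience for applying the rule of thumb; the function name `describe_strength` and its labels are assumptions made for this example, and the cutoffs are no more universal in code than they are in prose.

```python
def describe_strength(r):
    """Label |r| using the rule-of-thumb bands listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction; strength depends only on magnitude
    if size < 0.1:
        return "essentially none"
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"

for r in (0.05, -0.25, 0.4, -0.9):
    print(r, describe_strength(r))
```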

Always look at the plot itself in addition to the number. A single outlier can change r dramatically, and a curved pattern can produce an r value near zero even when there is a clear nonlinear relationship.
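
Both effects are easy to reproduce with made-up numbers. The Python sketch below uses a small hypothetical helper, `pearson_r`, and two invented data sets: a perfect U-shape, and five scattered points with one extreme observation added.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from sums of deviations (illustrative helper)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect U-shape: y is fully determined by x, yet the linear correlation is zero.
xs = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs, [x * x for x in xs])

# Five scattered points, then the same points plus one extreme observation.
base_x = [1, 2, 3, 4, 5]
base_y = [3, 1, 4, 1, 5]
r_base = pearson_r(base_x, base_y)
r_outlier = pearson_r(base_x + [20], base_y + [40])

print(f"U-shape:      r = {r_curve:.2f}")
print(f"no outlier:   r = {r_base:.2f}")
print(f"with outlier: r = {r_outlier:.2f}")
```

For these invented values, the single added point moves r from about 0.35 to about 0.98, while the U-shaped data give r = 0 despite the obvious pattern.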

Least-squares regression line

Along with the scatter plot and correlation, the tool draws the least-squares regression line. This is the straight line that best summarizes the linear trend in the data by minimizing the sum of squared vertical distances between the points and the line.

The regression line has the equation

y = m x + b

where m is the slope and b is the y-intercept. For a set of n points (xi, yi), the slope and intercept can be written in terms of sample means and sums of squares. In simplified symbolic form:

  • m = Sxy / Sxx, where Sxy = Σ (xi − x̄)(yi − ȳ) and Sxx = Σ (xi − x̄)², with both sums taken over all n points.
  • b = ȳ − m x̄, where x̄ and ȳ are the sample means of x and y.

On the chart, the regression line gives a quick visual summary of the trend. You can also use it to make rough predictions: plug a new x into the equation to estimate the corresponding y. Remember that such predictions are only reliable within the range of your observed data and only when the linear model is a good fit.
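
A minimal Python sketch of the slope and intercept formulas, followed by a prediction, might look like this. It takes Sxy and Sxx to be sums of products and squares of deviations from the means, the usual meaning of that notation. The function name `fit_line` is hypothetical, and this is not the tool's own JavaScript.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x value is the same")
    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return m, b

m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # points lie exactly on y = 2x + 1
print(m, b)         # → 2.0 1.0
print(m * 1.5 + b)  # estimated y at x = 1.5 → 4.0
```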

Worked example

Suppose you collect data on hours spent studying and exam scores for five students:

| Student | Study time (hours) | Score (%) |
| --- | --- | --- |
| A | 1 | 65 |
| B | 2 | 70 |
| C | 3 | 78 |
| D | 4 | 85 |
| E | 5 | 88 |

You would enter these data as:

1,65
2,70
3,78
4,85
5,88
  

After generating the plot, you would see points that rise from left to right, indicating a positive relationship between study time and score. The calculator will report an r value close to 1 (about 0.99 for these five points), signaling a strong positive linear correlation. The least-squares regression line for these data is:

score = 6.1 × hours + 58.9 (displayed values may vary slightly depending on rounding).

This means that, on average, each extra hour of study time is associated with about 6.1 additional percentage points on the exam. You can visually check how well the line follows the data and whether any point lies unusually far from the trend.
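
The slope, intercept, and correlation for these five points can be checked by hand. The working below uses m = Sxy / Sxx and b = ȳ − m x̄, and takes Sxy, Sxx, and Syy to be sums of products and squares of deviations from the means, the usual meaning of that notation:

```latex
\begin{aligned}
\bar{x} &= \tfrac{1+2+3+4+5}{5} = 3, \qquad
\bar{y} = \tfrac{65+70+78+85+88}{5} = 77.2 \\
S_{xy} &= (-2)(-12.2) + (-1)(-7.2) + (0)(0.8) + (1)(7.8) + (2)(10.8) = 61 \\
S_{xx} &= 4 + 1 + 0 + 1 + 4 = 10 \\
S_{yy} &= 148.84 + 51.84 + 0.64 + 60.84 + 116.64 = 378.8 \\
m &= \frac{S_{xy}}{S_{xx}} = \frac{61}{10} = 6.1, \qquad
b = 77.2 - 6.1 \times 3 = 58.9 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} = \frac{61}{\sqrt{3788}} \approx 0.991
\end{aligned}
```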

Scatter plot, correlation, and regression at a glance

| Feature | What it shows | Best use | Main limitation |
| --- | --- | --- | --- |
| Scatter plot | Individual points for each (x,y) pair. | Spot patterns, clusters, outliers, and general shape of the relationship. | Visual only; does not give a single numeric summary. |
| Pearson r | Single number between -1 and 1 summarizing linear association. | Quickly judge strength and direction of a linear relationship. | Insensitive to nonlinear patterns; can be distorted by outliers and small samples. |
| Regression line | Best-fitting straight line through the data points. | Summarize trend and make approximate predictions within the data range. | Assumes a linear relationship and can mislead if the pattern is curved or heavily influenced by outliers. |

Assumptions and limitations of this tool

To use the scatter plot generator effectively, it is important to understand its assumptions and the situations where its results may be misleading.

Data format and input assumptions

  • Numeric input only: The tool expects both x and y values to be valid numbers. Non-numeric entries, missing values, or extra commas may cause rows to be skipped or produce errors.
  • One pair per line: Each line should contain exactly one x,y pair. Empty lines are ignored.
  • At least two valid points: A minimum of two valid data pairs is required to draw a scatter plot; more points give a clearer picture.
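  • Correlation and regression need enough data: While the code can compute r and a regression line with as few as two points, two points with different x and y values always give r = ±1 because a line passes through them exactly, so the result says nothing about the strength of the relationship. Meaningful interpretation usually requires more observations (for example, 8–10 or more).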
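
The input rules above can be sketched as a small parser. This Python version is an assumption about reasonable behavior, not a description of the tool's code: it skips invalid rows rather than raising an error, and the name `parse_pairs` and its return shape are hypothetical.

```python
import math

def parse_pairs(text):
    """Parse 'x,y' lines; return (points, skipped_line_numbers)."""
    points, skipped = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # empty lines are ignored
        parts = line.split(",")
        if len(parts) != 2:
            skipped.append(line_no)  # missing value or extra comma
            continue
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            skipped.append(line_no)  # non-numeric entry
            continue
        if not (math.isfinite(x) and math.isfinite(y)):
            skipped.append(line_no)  # reject "nan" and "inf", which float() accepts
            continue
        points.append((x, y))
    return points, skipped

sample = "1,3\n2.5,4\n\n-1,abc\n4,5,6\n7,-8.25\n"
points, skipped = parse_pairs(sample)
print(points)   # → [(1.0, 3.0), (2.5, 4.0), (7.0, -8.25)]
print(skipped)  # → [4, 5]
if len(points) < 2:
    print("Provide at least two valid points.")
```

Returning the skipped line numbers alongside the points lets a caller report which rows were dropped instead of discarding them silently.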

Statistical assumptions

  • Linear relationship: Pearson correlation and simple linear regression describe linear patterns. If your data follow a curve (e.g., U-shaped), the scatter plot may show a clear pattern even when r is near zero and the regression line is not appropriate.
  • Influence of outliers: A single extreme point can strongly affect both r and the regression line, making the relationship appear stronger or weaker than it really is. Always check the plot for outliers before trusting the summary statistics.
  • No causation implied: Correlation and regression describe association, not cause and effect. A high r does not prove that changes in x cause changes in y. There may be lurking variables or common causes.
  • Extrapolation risk: Predictions far outside the range of observed x-values (extrapolation) can be very unreliable, even if the line fits well inside the original data range.
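
One way to guard against the extrapolation risk described above is to flag any prediction whose x lies outside the observed range. The Python sketch below is illustrative only. The helper `predict` is hypothetical, and the slope 6.1 and intercept 58.9 are the least-squares values for the five study-time points in the worked example, which can be verified from that data.

```python
def predict(m, b, x_new, observed_xs):
    """Return (estimate, is_extrapolation) for the line y = m*x + b."""
    lo, hi = min(observed_xs), max(observed_xs)
    estimate = m * x_new + b
    # True when x_new falls outside the range the line was fitted on.
    return estimate, not (lo <= x_new <= hi)

hours = [1, 2, 3, 4, 5]  # x-values from the worked example
print(predict(6.1, 58.9, 3.5, hours))  # inside the observed range
print(predict(6.1, 58.9, 12, hours))   # flagged: 12 hours is far outside 1 to 5
```

The second call shows why the flag is useful: at 12 hours the line predicts a score above 100%, which is impossible, even though the same line fits the observed points closely.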

Technical limitations

  • Browser precision: Calculations use standard JavaScript floating-point arithmetic. For typical educational or exploratory data sets this is accurate enough, but extremely large or tiny values may accumulate rounding error.
  • Scaling and readability: If your data span a very wide range (for example, mixing values near 0 with values in the millions), the automatic scaling may make small-scale structure hard to see.
  • Local-only computation: All processing happens in your browser. This is good for privacy, but it also means that very large data sets may be slower to plot depending on your device.
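
The browser precision point can be made concrete. Python floats use the same IEEE 754 double-precision format as JavaScript numbers, so the sketch below shows the kind of rounding error that large values can cause. It compares two algebraically equivalent ways of computing the sum of squared deviations of x. Both function names are hypothetical, and nothing here is a claim about which approach the tool uses.

```python
def sxx_two_pass(xs):
    """Subtract the mean first, then square the small deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sxx_one_pass(xs):
    """Shortcut: sum of squares minus (sum squared) / n."""
    total = sum(xs)
    return sum(x * x for x in xs) - total * total / len(xs)

# Five values near one billion that differ only in the last digit.
xs = [1e9 + k for k in range(1, 6)]

print(sxx_two_pass(xs))  # → 10.0, the exact answer
print(sxx_one_pass(xs))  # not 10: the huge squares swallow the small differences
```

With values of ordinary size the two functions agree closely, which is why this effect rarely matters for typical classroom data.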

For classroom work, quick analysis, and exploratory data visualization, these limitations are usually not a problem. For high-stakes decisions or formal statistical studies, consider using specialized statistical software and consulting a statistics reference or expert.

Embed this calculator

Copy and paste the HTML below to add the Scatter Plot Generator and Correlation Explorer to your website.