Principal Component Analysis (PCA) Calculator

JJ Ben-Joseph

Paste your dataset (CSV/TSV/space‑delimited). Rows = observations. Columns = variables (numeric only). Optionally include a header row with variable names.

All calculations run locally in your browser. No data leaves your device.
Paste data and click Run PCA.

The PCA Guidebook: Practical, Intuitive, and Thorough

Quick start

  1. Paste your table above (rows = observations, columns = numeric variables). Headers are auto‑detected.
  2. Pick Basis: use Correlation when variables have different units or spreads; use Covariance when scales are comparable and you want large‑variance features to dominate.
  3. Choose how to treat missing values: Drop rows (listwise deletion) or Impute means (simple baseline).
  4. Click Run PCA. Read the Summary, then scan the Scree (elbow), Loadings (variable weights), Scores (observation positions), and the Correlation Circle (quality of representation on PC1–PC2).
  5. Export Scores and Loadings as CSV for downstream analysis (the sketch after this list mirrors the whole run in code).
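If you want to reproduce a run outside the browser, here is a minimal Python sketch of the same workflow, using numpy only; the small array is just a stand‑in for your pasted table:

```python
import numpy as np

# Stand-in for a pasted table: 5 observations x 3 numeric variables.
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
    [5.5, 2.3, 4.0],
])

# Correlation basis: center and scale each column to unit variance.
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mean) / std

# PCA via SVD of the standardized matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals = s**2 / (len(X) - 1)       # variance captured by each PC
loadings = Vt.T                     # columns = principal directions
scores = Z @ loadings               # observations in PC coordinates

print("Explained variance %:", np.round(100 * eigvals / eigvals.sum(), 1))
```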

PCA in plain language

PCA is a smart rotation of your data cloud. Imagine plotting every observation in p‑dimensional space. PCA finds perpendicular axes (principal components) that capture as much spread (variance) as possible, one after another. PC1 points where the data is widest; PC2 is the next widest direction orthogonal to PC1, and so on. Because these axes are uncorrelated, they remove redundancy and make structure easier to see.

Pre‑processing: what to do before PCA

Mathematics of PCA (clear but compact)

Let X be the n×p data matrix after centering (and optional standardization). The sample covariance (or correlation) matrix is:

C = \frac{X^{\top} X}{n - 1}

PCA solves the eigenproblem and orthonormality conditions:

C v_j = \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0, \qquad v_j^{\top} v_k = \delta_{jk}

The scores and a low‑rank reconstruction are:

T = X V, \qquad X \approx T_k V_k^{\top}

(Add back the column means you subtracted to return to the original space.)

Explained variance ratio for PC j is:

\mathrm{EVR}_j = \frac{\lambda_j}{\sum_{i=1}^{p} \lambda_i}
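To make the eigenproblem concrete, here is a small numpy sketch of the covariance route on illustrative random data (np.linalg.eigh returns eigenvalues in ascending order, so they are flipped to match λ₁ ≄ ⋯ ≄ λ_p):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 observations, 4 variables
mu = X.mean(axis=0)
Xc = X - mu                              # center columns

C = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
lam, V = np.linalg.eigh(C)               # symmetric eigensolver, ascending order
lam, V = lam[::-1], V[:, ::-1]           # sort descending

T = Xc @ V                               # scores
evr = lam / lam.sum()                    # explained variance ratio per PC

k = 2
X_hat = T[:, :k] @ V[:, :k].T + mu       # rank-k reconstruction, means restored
```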

SVD view (why it’s numerically stable)

Singular Value Decomposition factors the centered/standardized matrix as:

X = U \Sigma V^{\top}

The eigenvalues of C relate to the singular values via:

\lambda_j = \frac{\sigma_j^2}{n - 1}

Scores are equivalently:

T = U \Sigma
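A short sketch checking that the two routes agree numerically (same toy-data conventions as above):

```python
import numpy as np

rng = np.random.default_rng(1)
Xc = rng.normal(size=(50, 3))
Xc -= Xc.mean(axis=0)
n = len(Xc)

lam = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / (n - 1)))[::-1]  # eigen route
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)             # SVD route

print(np.allclose(lam, s**2 / (n - 1)))   # True: sigma_j^2/(n-1) = lambda_j
print(np.allclose(Xc @ Vt.T, U * s))      # True: T = X V = U Sigma
```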

Scores, Loadings, Communalities & Contributions

Choosing the number of components (k)

Interpreting components like a pro

  1. Read loadings first. Identify which variables drive PC1, PC2, and so on. High positive vs. negative weights can indicate meaningful trade‑offs (e.g., price ↑ while efficiency ↓).
  2. Use the scores plot to detect groups/outliers. Clusters in PC1–PC2 space often correspond to meaningful segments; extreme scores flag outliers or novel cases.
  3. Check the correlation circle. Variables close together are positively correlated; opposite sides indicate negative correlation; near‑orthogonal ≈ weakly related.
  4. Relate back to the domain. Components are combinations of variables—name them by what they measure (e.g., “overall size”, “sweetness vs. acidity”, “market risk”).
  5. Remember non‑uniqueness. If λ’s are tied or nearly equal, the corresponding PCs can rotate within their subspace. Focus on the subspace, not exact axes.

High‑dimensional case (p ≫ n)

When variables outnumber observations, at most n−1 eigenvalues are non‑zero. PCA still works and is often essential. Computation is faster via the SVD of X or by eigendecomposing the n×n matrix XXᔀ and mapping its eigenvectors back to feature space. Interpretation is unchanged.
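A quick sketch of the wide case (illustrative sizes; the tolerance is just a numerical zero cutoff):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 500                            # many more variables than rows
Xc = rng.normal(size=(n, p))
Xc -= Xc.mean(axis=0)

# Thin SVD avoids ever forming the 500 x 500 covariance matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (n - 1)

# Centering costs one degree of freedom: at most n-1 non-zero eigenvalues.
print(np.sum(eigvals > 1e-10))            # prints 19
```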

Outliers & robustness

Advanced PCA topics

Domain‑specific tips

Common pitfalls

FAQ (extended)

How do I apply these loadings to new data? Store your training means (and stds for correlation PCA). For a new row x, compute xâ€Č = (x − mean)/std as appropriate, then scores = xâ€Č · V_k. This tool reports means and stds so you can replicate preprocessing.
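A sketch of that projection step (names like train_mean and V_k are illustrative, not part of this tool's exports):

```python
import numpy as np

def project(x_new, train_mean, train_std, V_k):
    """Score a new observation using statistics stored from training.

    x_new:      1-D array with the same variables, in the same order
    train_mean: per-variable means from the training data
    train_std:  per-variable stds (use np.ones_like for covariance PCA)
    V_k:        p x k loading matrix (first k principal directions)
    """
    z = (np.asarray(x_new) - train_mean) / train_std  # training preprocessing
    return z @ V_k                                    # scores on PC1..PCk
```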

Why don’t my results match another package exactly? PCA is unique up to sign flips; small differences arise from numerical methods, missing‑value handling, and whether covariance or correlation was used.
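If sign flips are a nuisance when comparing runs, one common convention is to make the largest‑magnitude loading in each component positive (a sketch, not something this tool enforces):

```python
import numpy as np

def fix_signs(V):
    """Flip each loading vector (column of V) so its largest-magnitude
    entry is positive; a deterministic convention for comparing runs."""
    idx = np.argmax(np.abs(V), axis=0)                # row of max |entry| per PC
    signs = np.sign(V[idx, np.arange(V.shape[1])])
    return V * signs                                  # flip scores the same way
```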

Can I rotate PCs (e.g., varimax)? Rotation is a factor analysis concept. PCA already yields orthogonal components that maximize variance; rotated solutions optimize different criteria.

Does scaling change scores? Yes—correlation PCA gives each variable equal variance, shifting both loadings and scores; covariance PCA lets high‑variance variables dominate.
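A small demonstration of that effect with two correlated variables on very different scales (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(size=200)
X = np.column_stack([
    100 * a + rng.normal(size=200),        # std ~ 100
    0.1 * a + 0.01 * rng.normal(size=200)  # std ~ 0.1, strongly correlated
])
Xc = X - X.mean(axis=0)

_, _, Vt_cov = np.linalg.svd(Xc, full_matrices=False)        # covariance basis
_, _, Vt_cor = np.linalg.svd(Xc / X.std(axis=0, ddof=1),     # correlation basis
                             full_matrices=False)

print(np.round(np.abs(Vt_cov[0]), 3))  # ~[1, 0]: the large-scale variable dominates
print(np.round(np.abs(Vt_cor[0]), 3))  # ~[0.707, 0.707]: equal footing
```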

Glossary

Scree Plot (Explained Variance %)

PC Scores (PC1 vs PC2)

Correlation Circle (Variable Loadings on PC1–PC2)

Related Calculators

Covariance Matrix Calculator - Analyze Multivariate Data

Compute the covariance matrix for two or three datasets and explore its meaning.


Mohr's Circle Stress Calculator - Principal Stresses and Angles

Compute principal stresses and orientation using Mohr's circle from plane stress components.


Eigenvalue and Eigenvector Calculator - Understand Matrix Behavior

Calculate eigenvalues and eigenvectors of a 2x2 matrix. Useful for systems analysis, vibrations, and more.
