Principal component analysis (PCA) is a widely used technique for reducing the dimensionality of datasets while retaining as much variability as possible. Given a collection of observations, PCA finds new axes, called principal components, that point in the directions of greatest variance. Mathematically, if X is the centered data matrix with n observations in its rows, PCA computes the eigenvectors of the covariance matrix C = XᵀX / (n − 1). These eigenvectors form an orthogonal basis capturing the most significant features.
High-dimensional data can be difficult to visualize or analyze directly. PCA projects the data onto a smaller set of orthogonal axes ranked by variance. By keeping only the first few components, we simplify the dataset while preserving its essential patterns. This helps in noise reduction, visualization, and speeding up subsequent machine learning algorithms. For example, images with thousands of pixels can often be approximated accurately using just a handful of principal components.
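The calculator itself runs in JavaScript, but the idea is easy to try elsewhere. As an illustration only (a sketch assuming NumPy and scikit-learn are available, neither of which is part of this page), a cloud of noisy 3-D points can be reduced to two coordinates:

```python
# Sketch only: assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 noisy points that mostly vary along a single direction in 3-D.
data = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
       + 0.05 * rng.normal(size=(200, 3))

pca = PCA(n_components=2)          # keep only the first two components
reduced = pca.fit_transform(data)  # project the 3-D points onto 2-D

print(reduced.shape)                  # (200, 2)
print(pca.explained_variance_ratio_)  # first ratio is close to 1
```

Because the points are generated along one direction plus small noise, almost all of the variance is captured by the first component.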
The first step in PCA is centering the data by subtracting the mean of each feature. The covariance matrix C summarizes how the features vary together. Solving the eigenvalue problem C vᵢ = λᵢ vᵢ yields eigenvalues λᵢ representing the variances explained by their corresponding eigenvectors vᵢ. Sorting the eigenvalues in descending order ranks the components from most to least significant. The proportion of variance explained by the i-th component is λᵢ / (λ₁ + λ₂ + ⋯ + λₚ), helping us decide how many components to keep.
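These steps translate directly into code. The sketch below uses Python with NumPy purely for illustration; it is not the calculator's own JavaScript implementation:

```python
import numpy as np

def pca_eigen(X):
    """Return eigenvalues (descending), eigenvectors, and explained-variance ratios."""
    Xc = X - X.mean(axis=0)               # 1. center each feature
    C = Xc.T @ Xc / (X.shape[0] - 1)      # 2. sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # 3. eigen-decomposition (C is symmetric)
    order = np.argsort(eigvals)[::-1]     # 4. sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()   # 5. proportion of variance explained
    return eigvals, eigvecs, explained

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
vals, vecs, ratio = pca_eigen(X)
print(ratio)   # the first component explains most of the variance
```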
Suppose we analyze the dataset with rows (1, 2), (3, 4), and (5, 6). After centering, the covariance matrix is [[4, 4], [4, 4]]. Its eigenvectors after normalization are (1, 1)/√2 and (1, −1)/√2, corresponding to eigenvalues 8 and 0. The first component accounts for all of the variance, indicating the points lie along a straight line. By projecting onto that component, we reduce the two-dimensional data to a single coordinate without losing any information.
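The arithmetic above can be checked numerically (again a NumPy sketch, not part of the calculator):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)            # centered rows: (-2, -2), (0, 0), (2, 2)
C = Xc.T @ Xc / (X.shape[0] - 1)   # [[4, 4], [4, 4]]
vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order

print(C)
print(vals)            # [0. 8.]
print(vecs[:, 1])      # about [0.7071, 0.7071], the dominant direction (up to sign)
print(Xc @ vecs[:, 1]) # 1-D coordinates, about [-2.83, 0, 2.83] (up to sign)
```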
This calculator performs PCA entirely in your browser. After parsing the input matrix, it centers each column, computes the covariance matrix, and applies a simple eigen-decomposition using the numeric.js JavaScript library. Because the dataset is expected to be small (a few dozen rows at most), the computation completes almost instantly. The resulting eigenvalues and eigenvectors are displayed as plain text so you can verify how much variance each component captures.
The eigenvectors reveal directions in feature space where the data varies the most. If an eigenvector has entries of similar magnitude, the corresponding component blends all original features. Conversely, a component with a large coefficient for one feature and small coefficients for others indicates that feature dominates. By examining the eigenvalues, you can gauge how many components are necessary to approximate the data effectively. A sharp drop-off after the first few values often signals that a lower-dimensional representation suffices.
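One common, if informal, rule is to keep enough components to cover a chosen share of the total variance. Here is a small sketch with made-up eigenvalues and an arbitrary 95% cutoff:

```python
import numpy as np

eigvals = np.array([8.1, 1.4, 0.3, 0.1])  # hypothetical eigenvalues, already sorted
ratio = eigvals / eigvals.sum()           # variance explained per component
cumulative = np.cumsum(ratio)             # running total

k = int(np.searchsorted(cumulative, 0.95)) + 1  # smallest k reaching 95%
print(ratio)       # sharp drop after the first value
print(cumulative)  # [0.818..., 0.959..., 0.989..., 1.0]
print(k)           # 2 components already explain over 95% of the variance
```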
PCA is closely related to singular value decomposition (SVD), which factorizes a matrix into left singular vectors, singular values, and right singular vectors. In fact, performing SVD on the centered data matrix, X = UΣVᵀ, yields the principal component directions in the columns of V, with the squared singular values proportional to the eigenvalues of the covariance matrix. PCA also provides the foundation for more advanced dimensionality reduction methods such as kernel PCA, which applies the technique in a feature space induced by a nonlinear mapping. Understanding standard PCA prepares you to explore these extensions.
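The equivalence is straightforward to verify numerically; the following NumPy sketch compares the two routes on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)

# Route 1: eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))

# Route 2: SVD of the centered data, Xc = U S Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.sort(eigvals)[::-1])  # descending eigenvalues
print(S**2 / (len(X) - 1))     # same values, recovered from the singular values
# The rows of Vt (columns of V) match the eigenvectors up to sign.
```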
While PCA is powerful, its linear nature means it cannot fully capture nonlinear relationships. It also assumes that the directions of greatest variance are the most informative, which may not hold if the data contains outliers or irrelevant noise. Standardizing features to have unit variance can mitigate scale differences, and robust PCA variants attempt to handle outliers. Nevertheless, ordinary PCA remains a staple analysis tool due to its simplicity and broad applicability.
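Standardizing simply means dividing each centered column by its standard deviation, which makes the analysis equivalent to PCA on the correlation matrix. A brief sketch of the effect on the eigenvalues, with artificial data on very different scales:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two independent features on very different scales (think meters vs. millimeters).
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0, ddof=1)           # unit variance per feature

raw = np.linalg.eigvalsh(Xc.T @ Xc / 99)   # dominated by the large-scale feature
std = np.linalg.eigvalsh(Xs.T @ Xs / 99)   # both eigenvalues are close to 1
print(raw)
print(std)
```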
Enter small datasets and observe how the principal components align with obvious patterns. Try rotating the points in the plane or adding noise to see how the eigenvalues change. By experimenting with different configurations, you can build intuition for how PCA reacts to variations in spread and orientation. This interactive approach turns an abstract algebraic procedure into a tangible exploration of data structure.
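Those experiments can also be scripted. The sketch below builds points on a line, rotates them, adds noise, and prints the eigenvalues in each case:

```python
import numpy as np

def eigvals_of(points):
    """Eigenvalues of the sample covariance, largest first."""
    Xc = points - points.mean(axis=0)
    return np.sort(np.linalg.eigvalsh(Xc.T @ Xc / (len(points) - 1)))[::-1]

t = np.linspace(-1, 1, 50)
line = np.column_stack([t, 2 * t])          # points on the line y = 2x

theta = np.radians(30)                      # rotate everything by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = line @ R.T

noisy = line + 0.1 * np.random.default_rng(3).normal(size=line.shape)

print(eigvals_of(line))     # second eigenvalue is exactly 0
print(eigvals_of(rotated))  # same eigenvalues: rotation does not change the spread
print(eigvals_of(noisy))    # second eigenvalue becomes small but nonzero
```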