The Pareto distribution is perhaps the most famous example of a power law. Originally proposed by economist Vilfredo Pareto in the late nineteenth century, it was used to describe how a small fraction of the population often controls a large proportion of wealth. Since then, power-law distributions have been discovered across a staggering range of subjects: city sizes, earthquake magnitudes, internet file popularity, and even the energy of solar flares. The hallmark of these phenomena is a "heavy tail" in which extreme events occur far more often than predicted by a normal distribution. Instead of dropping off sharply, the probability of exceeding a value \(x\) decays slowly, like \(x^{-\alpha}\).
The Pareto distribution is defined by two positive parameters: the scale \(x_m\), which is the smallest possible value, and the shape parameter \(\alpha\), which controls the tail. The probability density function for \(x \geq x_m\) is
\[ f(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}} \]
with the cumulative distribution function
\[ F(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}. \]
For \(\alpha > 1\) the mean is \( \alpha x_m / (\alpha - 1) \) and for \(\alpha > 2\) the variance is \( \alpha x_m^2 / ((\alpha - 1)^2 (\alpha - 2)) \). As \(\alpha\) grows, the tail becomes thinner and these moments shrink. Conversely, when \(\alpha \leq 2\) the variance is infinite and when \(\alpha \leq 1\) even the mean does not exist. This highlights just how wild heavy-tailed distributions can be.
The shape parameter plays a critical role in determining the frequency of rare, high-magnitude events. In wealth distributions a small \(\alpha\) implies extreme inequality—vast fortunes concentrated in a few hands—while a larger \(\alpha\) corresponds to a more even spread. In internet traffic or file sizes, \(\alpha\) between 1 and 2 means that although small requests dominate by count, a significant amount of total traffic is still driven by a relatively small number of very large transfers.
\(\alpha\) | Tail Weight | Mean Exists? | Variance Exists? |
---|---|---|---|
≤ 1 | Extremely heavy | No | No |
1 < \(\alpha\) ≤ 2 | Heavy | Yes | No |
> 2 | Moderate | Yes | Yes |
Even simple systems such as file sharing networks can produce power-law distributions due to preferential attachment or multiplicative growth. Understanding the tail behavior is essential for assessing risk, planning capacity, or estimating the probability of rare events. In finance, for example, heavy tails imply that extreme market moves happen more often than a Gaussian assumption would suggest.
Provide the scale \(x_m\), the shape \(\alpha\), and the point \(x\) at which to evaluate the probability density and cumulative probability. The script computes \(f(x)\) and \(F(x)\). It also returns the mean and variance if those moments are defined. Enter values greater than the scale for \(x\). If you attempt to compute at a value below \(x_m\) the density is zero and the cumulative probability is likewise zero.
The calculations are straightforward algebra. For the density we multiply \(\alpha\) by the scale raised to the power \(\alpha\) and divide by \(x^{\alpha+1}\). For the cumulative probability we take one minus the ratio \((x_m/x)^\alpha\). The mean and variance formulas follow from integrating the distribution when the shape parameter permits. This tool performs everything directly in your browser so you can experiment with different parameters or copy the resulting text.
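The same algebra can be sketched in a few lines of Python. This is only an illustration of the formulas, not the calculator's own script; the function name `pareto_summary` and the use of `None` for undefined moments are assumptions made for this example.

```python
def pareto_summary(x_m, alpha, x):
    """Return the PDF, CDF, mean, and variance of a Pareto(x_m, alpha) at x."""
    if x_m <= 0 or alpha <= 0:
        raise ValueError("x_m and alpha must both be positive")

    if x < x_m:
        # Below the scale there is no probability mass.
        pdf, cdf = 0.0, 0.0
    else:
        pdf = alpha * x_m**alpha / x**(alpha + 1)
        cdf = 1.0 - (x_m / x)**alpha

    # Moments exist only for large enough alpha; None marks "not defined".
    mean = alpha * x_m / (alpha - 1) if alpha > 1 else None
    variance = (
        alpha * x_m**2 / ((alpha - 1)**2 * (alpha - 2)) if alpha > 2 else None
    )
    return {"pdf": pdf, "cdf": cdf, "mean": mean, "variance": variance}


# Example: scale 1, shape 1.5, evaluated at x = 2.
result = pareto_summary(x_m=1.0, alpha=1.5, x=2.0)
print(result)
```

With these inputs the mean is defined (because \(\alpha > 1\)) but the variance comes back as `None` (because \(\alpha \leq 2\)).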
When modeling data with a Pareto distribution, it's common to estimate \(\alpha\) from the slope of a log–log plot. Many data sets only obey the power law over a certain range, with cutoffs at the lower or upper end. Nevertheless, the Pareto model offers valuable insight into systems where "the rich get richer" or where network effects and positive feedback dominate. Even a simple spreadsheet or a few lines of code, like this calculator, provide intuition about the explosive behavior of heavy tails.
As a quick example, set \(x_m=1\) and \(\alpha=1.5\) and evaluate at \(x=2\). The density is then about 0.27 while the cumulative probability is approximately 0.65. Since \(\alpha\) is less than two, the variance does not exist. If we increase \(\alpha\) to 3 while keeping the scale at 1, the variance becomes 0.75 and the tail probability beyond 10 drops from about 3.2% to just 0.1%. These calculations illustrate how the distribution becomes better behaved as the shape parameter rises. Yet the heavy tail persists, making the Pareto distribution a useful model whenever extremes dominate averages.
Analysts frequently need to invert the cumulative distribution to find the value of \(x\) associated with a probability \(p\). For the Pareto distribution the quantile function has the simple form
\[ x_p = x_m (1 - p)^{-1/\alpha}. \]
This formula powers inverse transform sampling: draw a uniform random number U on [0,1), plug it into the expression above, and obtain a heavy‑tailed sample. Monte Carlo simulations based on this procedure help evaluate worst‑case scenarios, from insurance losses to network traffic spikes.
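The sampling procedure described above can be sketched as follows. The helper names `pareto_quantile` and `sample_pareto` are hypothetical, and the quantile used is \(x_m (1-p)^{-1/\alpha}\), obtained by inverting the cumulative distribution \(1 - (x_m/x)^\alpha\).

```python
import random


def pareto_quantile(p, x_m, alpha):
    """Invert the Pareto CDF: the value x with F(x) = p, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return x_m * (1.0 - p) ** (-1.0 / alpha)


def sample_pareto(n, x_m, alpha, seed=None):
    """Draw n Pareto samples by inverse transform sampling."""
    rng = random.Random(seed)
    # rng.random() returns a uniform value in [0, 1), so 1 - U is never zero.
    return [pareto_quantile(rng.random(), x_m, alpha) for _ in range(n)]


samples = sample_pareto(10_000, x_m=1.0, alpha=3.0, seed=42)
print(min(samples), max(samples))
```

Every sample is at least \(x_m\), and roughly half of them should fall below the median \(x_m \cdot 2^{1/\alpha}\). The seed is fixed here only to make the run repeatable.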
Real‑world data rarely arrive with known parameters. A common approach estimates α by fitting a straight line to the survival function on logarithmic axes. The negative slope of that line provides an estimate of α. This technique connects to Zipf's law for ranked data: when you sort observations from largest to smallest and plot rank versus value on a log–log scale, a linear trend indicates Pareto behavior. Linguists observe this pattern in word frequencies, and economists see it in wealth distributions. Determining α quantifies how extreme the inequality or concentration truly is.
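As a rough sketch of the line-fitting approach described above, the snippet below regresses the log of the empirical survival function on the log of the observed values and reports the negative slope. The function name `estimate_alpha_loglog`, the synthetic data, and the choice of \((n - i)/n\) as the empirical survival estimate are all assumptions made for illustration. An unweighted fit like this can be noisy in the far tail.

```python
import math
import random


def estimate_alpha_loglog(data):
    """Estimate alpha as the negative least-squares slope of
    log(empirical survival) against log(value)."""
    xs = sorted(data)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # Fraction of observations >= xs[i]; it is never zero, so the log is safe.
    log_s = [math.log((n - i) / n) for i in range(n)]

    mean_x = sum(log_x) / n
    mean_s = sum(log_s) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ls - mean_s) for lx, ls in zip(log_x, log_s))
    return -sxy / sxx


# Synthetic data with a known shape, generated by inverse transform sampling.
rng = random.Random(0)
true_alpha = 1.8
data = [20.0 * (1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(5000)]

alpha_hat = estimate_alpha_loglog(data)
print(f"estimated alpha: {alpha_hat:.2f} (true value {true_alpha})")
```

On real data the fit should usually be restricted to the range where the log–log plot actually looks linear.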
No real quantity grows without bound. Physicists recognize that earthquake magnitudes cannot exceed the energy available in tectonic plates, and economists note that personal wealth is capped by global resources. To acknowledge these limits, analysts sometimes use a truncated Pareto model that sets an upper cutoff \(x_{\max}\). Truncation alters normalization and ensures moments remain finite even for small \(\alpha\). While the calculator models the classic unbounded form, you should consider plausible cutoffs when applying the distribution to data.
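To make the effect of truncation concrete, here is a minimal sketch of a Pareto distribution restricted to the interval from \(x_m\) to an upper cutoff. It assumes the usual construction in which the density is renormalized by the probability mass below the cutoff, \(1 - (x_m/x_{\max})^\alpha\). The function names are hypothetical, and this is not something the calculator itself computes.

```python
import math


def truncated_pareto_cdf(x, x_m, alpha, x_max):
    """CDF of a Pareto(x_m, alpha) restricted to [x_m, x_max]."""
    if not 0 < x_m < x_max:
        raise ValueError("require 0 < x_m < x_max")
    if x < x_m:
        return 0.0
    if x >= x_max:
        return 1.0
    norm = 1.0 - (x_m / x_max) ** alpha  # mass of the untruncated law below x_max
    return (1.0 - (x_m / x) ** alpha) / norm


def truncated_pareto_mean(x_m, alpha, x_max):
    """Mean of the truncated distribution; finite for every alpha > 0."""
    if not 0 < x_m < x_max:
        raise ValueError("require 0 < x_m < x_max")
    norm = 1.0 - (x_m / x_max) ** alpha
    if alpha == 1.0:
        # The integral of x**(-1) gives a logarithm in this special case.
        return x_m * math.log(x_max / x_m) / norm
    integral = (x_max ** (1.0 - alpha) - x_m ** (1.0 - alpha)) / (1.0 - alpha)
    return alpha * x_m**alpha * integral / norm


# With alpha = 0.8 the unbounded Pareto has no mean, but the truncated one does.
print(truncated_pareto_mean(x_m=1.0, alpha=0.8, x_max=1000.0))
```

As the cutoff grows, the truncated mean for \(\alpha > 1\) approaches the unbounded value \(\alpha x_m / (\alpha - 1)\).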
Imagine an online retailer tracking order sizes. Suppose \(x_m = 20\) dollars and past data suggest \(\alpha = 1.8\). If management wants to know the purchase threshold that only five percent of orders exceed, set \(p = 0.95\) in the quantile function to obtain \(x \approx 20 / (1 - 0.95)^{1/1.8} \approx 106\) dollars. The survival probability at \(x = 106\) is therefore about five percent, revealing how a small set of large orders drives revenue. The median, computed as \(x_m \cdot 2^{1/\alpha}\), is roughly 29 dollars, so half of all purchases fall below that amount despite the heavy‑tailed nature.
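The retailer scenario can be checked with a short script. This is a sketch under the stated parameters (scale of 20 dollars, shape 1.8); the helper name `pareto_quantile` is hypothetical and implements the inverse of the cumulative distribution, \(x_m (1-p)^{-1/\alpha}\).

```python
def pareto_quantile(p, x_m, alpha):
    """Value x such that a fraction p of the distribution lies at or below x."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return x_m * (1.0 - p) ** (-1.0 / alpha)


x_m, alpha = 20.0, 1.8

p95 = pareto_quantile(0.95, x_m, alpha)     # threshold exceeded by 5% of orders
median = pareto_quantile(0.50, x_m, alpha)  # equals x_m * 2**(1/alpha)
survival_at_p95 = (x_m / p95) ** alpha      # should recover 0.05

print(f"95th percentile: {p95:.2f} dollars")
print(f"median:          {median:.2f} dollars")
print(f"P(order > p95):  {survival_at_p95:.3f}")
```

Running it gives a 95th percentile of about 105.6 dollars and a median of about 29.4 dollars, and the survival probability at the threshold comes back as 0.05.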
Heavy tails pose challenges. Sample means converge slowly and can be dominated by a single outlier, making parameter estimates unstable. Curvature on a log–log plot may indicate that only part of the data follows a power law. Before adopting the Pareto model, compare it with alternatives such as the lognormal or Weibull to ensure a good fit. When reporting summary statistics, remember that the mean exists only when α>1 and the variance only when α>2; citing them otherwise is mathematically incorrect.
The Pareto distribution links to numerous areas of probability. Suitably normalized sums of independent Pareto variables with \(\alpha < 2\) converge to stable Lévy distributions, and Bayesian analyses often use Pareto priors to encode belief in occasional large deviations. In reliability engineering it models lifetimes whose failure rate decreases with age, since the Pareto hazard rate \(\alpha / x\) falls as \(x\) grows, while network scientists apply it to degree distributions in scale‑free graphs. Exploring these connections deepens intuition for heavy‑tailed phenomena and underscores why Pareto's simple formula continues to captivate researchers.