Kullback–Leibler (KL) divergence measures how one probability distribution diverges from a second, reference distribution. Given discrete distributions P and Q over the same set, the KL divergence of P from Q is defined as

D_KL(P ∥ Q) = Σᵢ P(i) · log( P(i) / Q(i) )
Intuitively, the formula quantifies the extra information required to encode events sampled from P if we use a code optimized for Q. When P and Q are identical, the divergence is zero. As the distributions diverge, the value grows, reflecting the inefficiency of using Q to approximate P.
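The definition translates almost directly into code. The function below is a minimal sketch (not the calculator's actual script) that computes the divergence in bits, skipping zero-probability terms:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits: sum of p_i * log2(p_i / q_i).

    Terms with p_i == 0 are skipped, since x * log(x) -> 0 as x -> 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions diverge by zero.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # → 0.0
```

Using base-2 logarithms gives the answer in bits, matching the coding interpretation; natural logarithms (nats) are equally common in the literature.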
KL divergence appears throughout machine learning and information theory. In variational inference, models minimize KL divergence to find a simpler approximate distribution that mimics a complicated posterior. In reinforcement learning, policies are often constrained by a maximum KL divergence from prior policies to ensure stable updates. The concept also helps compare language models, evaluate generative networks, and track training progress in classification problems.
Because KL divergence is asymmetric, D_KL(P ∥ Q) generally differs from D_KL(Q ∥ P). This asymmetry underscores its interpretation as a measure of relative entropy: the expected extra message length when P is encoded with Q's code.
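The asymmetry is easy to see numerically. The distributions below are hypothetical values chosen only to make the gap visible:

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) in bits; zero-probability terms contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions: a skewed P against a uniform Q.
p = [0.9, 0.1]
q = [0.5, 0.5]

print(kl_divergence(p, q))  # D_KL(P || Q)
print(kl_divergence(q, p))  # D_KL(Q || P) — a different number
```

Swapping the arguments changes which distribution supplies the expectation and which supplies the code, so the two results need not agree.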
Enter probabilities for P and Q separated by commas. Each list should contain the same number of values and sum to 1; the script normalizes them if necessary. When you press the compute button, it iterates through the arrays, sums P(i) · log( P(i) / Q(i) ), and displays the result. Terms with P(i) = 0 contribute nothing, because x · log x tends to zero as x vanishes.
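The page does not show the script's source, so the sketch below only mirrors the behavior described: normalize each input list, then accumulate the sum. The function name and internals are assumptions, not the calculator's code.

```python
import math

def kl_from_raw_inputs(p_raw, q_raw):
    """Normalize each list of raw values, then sum p * log2(p / q)."""
    p_total, q_total = sum(p_raw), sum(q_raw)
    p = [x / p_total for x in p_raw]
    q = [x / q_total for x in q_raw]
    # Skip p_i == 0 terms: p * log(p/q) -> 0 as p -> 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Inputs that do not sum to 1 are normalized first:
# [2, 1, 1] becomes (0.5, 0.25, 0.25), [1, 1, 1] becomes uniform.
print(kl_from_raw_inputs([2, 1, 1], [1, 1, 1]))
```

Normalizing up front keeps the computation well defined even when the user's values sum to something other than 1.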
Suppose P assigns probabilities p₁, …, pₙ to a set of outcomes while Q assigns q₁, …, qₙ to the same outcomes. Plugging these into the formula yields D_KL(P ∥ Q), a single non-negative number.
Small values indicate the distributions are close, while large values highlight stark differences.
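The example's original numbers did not survive, so the values below are hypothetical stand-ins chosen only to illustrate a moderate mismatch:

```python
import math

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# Sum p_i * log2(p_i / q_i), skipping zero-probability terms.
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
print(round(kl, 4))  # → 0.3145
```

Here P puts more mass on the first outcome than Q expects, and the resulting 0.31 bits measures the average coding penalty for that mismatch.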
By experimenting with the inputs, you can see how skewing probability mass increases the divergence. Extreme mismatches quickly produce large values. This sensitivity to improbable events is a hallmark of KL divergence and influences its use in robust statistics. In practice, KL divergence informs algorithms ranging from expectation-maximization to reinforcement learning policy updates, showcasing its broad relevance.