The negative binomial distribution models the number of failures that occur before a specified number of successes is achieved in a sequence of independent trials. Imagine repeatedly flipping a coin until you get three heads. The count of tails observed before that third head is a negative binomial random variable. Because the distribution tallies failures until a set number of successes is reached, it tends to be right skewed, especially when the success probability is low. The negative binomial is closely related to the geometric distribution, which is the special case where only one success is required.
Mathematically, the probability of observing exactly $k$ failures before the $r$-th success is given by

$$P(X = k) = \binom{k + r - 1}{k} p^{r} (1 - p)^{k},$$

where $p$ is the probability of success in each trial. The binomial coefficient counts the number of ways to arrange the $k$ failures and the $r - 1$ successes that precede the final success, while the probability terms account for the chance of each arrangement. The mean of the distribution is

$$E[X] = \frac{r(1 - p)}{p}$$

and the variance is

$$\operatorname{Var}(X) = \frac{r(1 - p)}{p^{2}}.$$

These relationships show how variability increases dramatically as the success probability decreases.
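As a quick check of the formula, take the coin example from the introduction: three heads required ($r = 3$) with a fair coin ($p = 0.5$), evaluated at $k = 2$ tails:

$$P(X = 2) = \binom{2 + 3 - 1}{2} (0.5)^{3} (0.5)^{2} = \binom{4}{2} (0.5)^{5} = 6 \times 0.03125 = 0.1875,$$

so there is roughly an 18.75% chance of seeing exactly two tails before the third head.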
The negative binomial distribution arises in many real-world settings. Epidemiologists use it to model the number of people one patient might infect before recovery. Reliability engineers describe the number of component failures before a system breaks down. Marketing analysts look at how many prospects decline before a set number of sales is reached. Whenever events occur independently with a constant success chance and the process stops after a target number of successes, a negative binomial model is a natural fit. The distribution’s long tail captures the possibility of observing many failures, which the regular binomial distribution does not express as directly.
The shape of the distribution depends heavily on the parameters. A high success probability produces a sharply peaked distribution near zero failures. A low probability leads to a long tail where many failures might accumulate. The table below shows how the mean number of failures grows as the success probability decreases for a fixed $r = 3$.
| Success Probability | Mean Failures | Variance |
|---|---|---|
| 0.9 | 0.33 | 0.37 |
| 0.7 | 1.29 | 1.84 |
| 0.5 | 3 | 6 |
| 0.3 | 7 | 23.3 |
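Each row follows directly from the mean and variance formulas above. For the last row, with $r = 3$ and $p = 0.3$:

$$E[X] = \frac{3(1 - 0.3)}{0.3} = 7, \qquad \operatorname{Var}(X) = \frac{3(1 - 0.3)}{0.3^{2}} \approx 23.3.$$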
Enter the desired number of successes $r$, the success probability $p$ (as a fraction between 0 and 1), and the observed failure count $k$. Press Compute to calculate four key quantities: the probability mass at exactly $k$ failures, the cumulative probability up to and including $k$, and the distribution’s mean and variance. The calculator performs a straightforward summation for the cumulative value, so very large $k$ may take a moment. If your browser supports clipboard operations, you can copy the text result after calculation for later use.
This tool is useful for anyone dealing with count data where a process must succeed a certain number of times. For instance, a factory might record how many defective items appear before the tenth non-defective one. A call center might track how many rejections an agent receives before three positive responses. Because the negative binomial models the failure count, it is especially handy when successes are rare. The ordinary binomial distribution would instead measure the number of successes in a fixed number of trials, which is not always convenient if the trial count varies or could be extremely large.
To compute the binomial coefficient in the equation above, the script uses a basic iterative factorial routine. This approach is adequate for small integer parameters—typical in most applications—and avoids any external libraries. The probability terms are multiplied carefully to reduce rounding error. For the cumulative distribution, the calculator simply sums the individual probabilities from zero up to the requested failure count. While direct closed-form expressions exist, the summation technique is easier to implement and sufficient for moderate parameter ranges.
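The sketch below illustrates that approach in JavaScript. It is not the actual calculator script, only a minimal implementation of the same idea; it computes the binomial coefficient with a running product rather than full factorials, which keeps intermediate values small and mirrors the careful multiplication mentioned above.

```javascript
// Binomial coefficient C(n, k) via a running product (avoids huge factorials).
function binomial(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// P(X = k): probability of exactly k failures before the r-th success,
// with per-trial success probability p.
function negBinomialPmf(r, p, k) {
  return binomial(k + r - 1, k) * Math.pow(p, r) * Math.pow(1 - p, k);
}

// P(X <= k): cumulative probability, obtained by summing the mass function,
// as in the straightforward summation described above.
function negBinomialCdf(r, p, k) {
  let total = 0;
  for (let i = 0; i <= k; i++) {
    total += negBinomialPmf(r, p, i);
  }
  return total;
}

// Example: three heads required (r = 3) with a fair coin (p = 0.5).
console.log(negBinomialPmf(3, 0.5, 2)); // 0.1875
console.log(negBinomialCdf(3, 0.5, 2)); // 0.5
```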
The negative binomial is sometimes described with the roles of failures and successes reversed, particularly in early literature. In that formulation the distribution represents the number of successes before a fixed number of failures. Although the algebra looks different, the probabilities are equivalent after adjusting the parameters. This calculator sticks with the more common convention where $r$ is the target number of successes and the random variable counts failures.
The negative binomial assumes independent, identical trials and a constant success probability. In many practical cases these assumptions hold only approximately. For example, if success probability changes over time or trials influence one another, the distribution may not fit perfectly. However, it often serves as a reasonable approximation even when conditions are not ideal. The ability to model over-dispersion—variability that exceeds the mean—is particularly valuable in statistics. Compared with the Poisson distribution, the negative binomial allows the variance to be larger than the mean.
When the success probability is very high or the number of successes is small, the distribution becomes narrow. In these cases the geometric distribution ($r = 1$) may suffice. Conversely, with low success probability and higher $r$, a very long tail arises and the mean can be quite large. Always examine whether your sample data align with the theoretical mean and variance before relying heavily on the distribution.
The negative binomial distribution is a versatile model for discrete counts where a process continues until a predetermined number of successes occur. Whether you are testing reliability, analyzing infection spread, or studying marketing conversions, it can provide valuable insight into how many failures you might expect along the way. This calculator makes it easy to explore the distribution without specialized statistical software, offering quick access to probabilities and summary measures. By adjusting the parameters you can see just how sensitive the results are to changes in success chance or required successes, deepening your understanding of this fundamental statistical tool.
Some texts describe the negative binomial in terms of a mean $\mu$ and a dispersion parameter rather than $r$ and $p$. The relationships $\mu = r(1 - p)/p$ and $\operatorname{Var}(X) = \mu + \mu^{2}/r$ allow conversion between forms. This perspective highlights how the variance can exceed the mean, a key feature when modeling over-dispersed count data in fields like ecology and public health.
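A small helper (a sketch, not part of the calculator; the function names are illustrative) makes the conversion explicit. Inverting the mean relationship gives $p = r / (r + \mu)$.

```javascript
// Convert from the (r, p) form to the (mean, dispersion) form.
// Here the dispersion parameter is the same r that counts required successes.
function toMeanDispersion(r, p) {
  const mean = (r * (1 - p)) / p;
  return { mean: mean, dispersion: r, variance: mean + (mean * mean) / r };
}

// Recover p from a mean and dispersion parameter.
function toSuccessProbability(mean, r) {
  return r / (r + mean);
}

console.log(toMeanDispersion(3, 0.3));   // { mean: 7, dispersion: 3, variance: ~23.3 }
console.log(toSuccessProbability(7, 3)); // 0.3
```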
A helpful way to understand the negative binomial is through its representation as a Poisson-Gamma mixture. Imagine a Poisson process with a random rate parameter that itself follows a gamma distribution. The resulting marginal distribution for the count of events is negative binomial. This view clarifies why the distribution is appropriate when the underlying event rate varies between observations, such as infection counts across regions with different exposure levels. In Bayesian analysis, this mixture interpretation emerges naturally when a gamma prior is placed on a Poisson rate.
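Concretely, if the Poisson rate $\lambda$ is drawn from a gamma distribution with shape $r$ and scale $(1 - p)/p$, integrating $\lambda$ out recovers the mass function above (a standard calculation, shown here for reference):

$$P(X = k) = \int_{0}^{\infty} \frac{\lambda^{k} e^{-\lambda}}{k!} \cdot \frac{\lambda^{r-1} e^{-\lambda p/(1-p)}}{\Gamma(r)\left(\frac{1-p}{p}\right)^{r}} \, d\lambda = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}\, p^{r} (1 - p)^{k} = \binom{k + r - 1}{k} p^{r} (1 - p)^{k}.$$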
Estimating $r$ and $p$ from data can be done via maximum likelihood or the method of moments. For the latter, compute the sample mean $\bar{x}$ and sample variance $s^{2}$, then solve $\bar{x} = r(1 - p)/p$ and $s^{2} = r(1 - p)/p^{2}$. Solving yields $\hat{p} = \bar{x}/s^{2}$ and $\hat{r} = \bar{x}^{2}/(s^{2} - \bar{x})$, provided the variance exceeds the mean.
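A sketch of the moment estimator, assuming a plain array of observed failure counts (the function name and sample data are illustrative):

```javascript
// Method-of-moments estimates for r and p from observed failure counts.
// Returns null when the sample variance does not exceed the mean,
// since the formulas above then break down.
function estimateNegBinomial(counts) {
  const n = counts.length;
  const mean = counts.reduce((a, b) => a + b, 0) / n;
  const variance =
    counts.reduce((a, b) => a + (b - mean) * (b - mean), 0) / (n - 1);
  if (variance <= mean) {
    return null;
  }
  return {
    p: mean / variance,
    r: (mean * mean) / (variance - mean),
  };
}

// Example with made-up counts of failures before a target number of successes.
console.log(estimateNegBinomial([2, 5, 0, 7, 3, 9, 1, 4]));
```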
Simulating a negative binomial variable is straightforward: generate geometric variables representing failures before each success and sum them, or more efficiently, draw from a gamma distribution for the Poisson rate and then from a Poisson distribution. For instance, to model how many emails a marketer sends before receiving five positive replies, one might assume a success probability of 0.2. Repeatedly simulating the process reveals a wide spread, with some campaigns succeeding after only a few rejections and others taking dozens of attempts. Such simulations help set realistic expectations and evaluate strategies.
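The geometric-sum method is easy to write down directly; the sketch below simulates the marketer example from the text, five positive replies with a success probability of 0.2 (the sample size and variable names are illustrative):

```javascript
// Draw one negative binomial sample: total failures before the r-th success,
// built by summing r geometric failure counts.
function sampleNegBinomial(r, p) {
  let failures = 0;
  for (let i = 0; i < r; i++) {
    while (Math.random() >= p) {
      failures++; // another rejection before this success arrives
    }
  }
  return failures;
}

// Five positive replies (r = 5), success probability 0.2 per email.
const samples = Array.from({ length: 10000 }, () => sampleNegBinomial(5, 0.2));
const average = samples.reduce((a, b) => a + b, 0) / samples.length;
console.log(average); // should hover near r(1 - p)/p = 20
```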
When applying the negative binomial distribution, be cautious about its assumptions. Trials must be independent, and the success probability should remain constant. In real-world data these conditions may be violated if, for example, a learning effect increases success chances over time or if failures are clustered due to unobserved factors. Misinterpreting the parameters is another risk: $r$ need not be an integer in some statistical treatments, but this calculator assumes a whole number of required successes. Always verify that your modeling choices reflect the context of the problem.