Previous posts focus on maximum likelihood estimation for continuous distributions (this post and this post). In this post we shift the attention to parameter estimation for discrete distributions, in particular, the three commonly used discrete distributions – Poisson, binomial and negative binomial.
Practice problems to reinforce concepts discussed here are found here.
Practice problems for maximum likelihood estimation for continuous distributions are found here and here.
In fitting a discrete distribution to observed data, we focus on two procedures – method of moments and maximum likelihood estimation.
For method of moments estimation, we equate the sample mean with the population mean for distributions with one parameter (e.g. Poisson). For distributions with two parameters (e.g. negative binomial), we equate the sample mean with the population mean and the sample variance with the population variance. Of course, for two-parameter distributions, instead of equating the sample variance with the population variance, we can equate the sample second moment with the population second moment.
For maximum likelihood estimation (MLE), the idea is similar to MLE for continuous distributions. In the discrete case, the likelihood function is built from the probability function (or probability mass function) instead of the probability density function. The rest of the procedure works similarly: take the natural log of the likelihood function, take derivative(s) and solve the equation(s) that result from setting the derivative(s) to zero. In addition to working through examples, we point out the issues in implementing MLE for the negative binomial distribution and the binomial distribution.
Poisson Distribution
The Poisson distribution has only one parameter $\lambda$, which is the mean of the distribution. When complete data is available, the method of moments estimate of $\lambda$ is the sample mean, and the maximum likelihood estimate of $\lambda$ is also the sample mean. Thus for the Poisson distribution, the method of moments estimate coincides with the maximum likelihood estimate in the presence of complete data. However, when the sample data is not complete (e.g. grouped data, censored data or truncated data), the maximum likelihood estimate of $\lambda$ generally does not equal the method of moments estimate.
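To see why the two estimates coincide with complete data, here is a short derivation. The symbols are chosen for illustration: $\lambda$ is the Poisson mean and $x_1,\dots,x_n$ is the complete sample.

```latex
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}, \\
\ln L(\lambda) &= -n\lambda + \Big(\sum_{i=1}^{n} x_i\Big)\ln\lambda - \sum_{i=1}^{n}\ln(x_i!), \\
\frac{d}{d\lambda}\ln L(\lambda) &= -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.
\end{aligned}
```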
Example 1
The claim frequency data of 100 insureds is given in the following table.
# of Claims  # of Insureds 

0  40 
1  24 
2  20 
3  8 
4  5 
5  3 
6+  0 
Total  100 
A Poisson distribution is fitted to the claim frequency data using maximum likelihood estimation. Determine the resulting estimate of the probability of having zero claims.
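The sample mean frequency is:
$\displaystyle \bar{x}=\frac{0(40)+1(24)+2(20)+3(8)+4(5)+5(3)}{100}=\frac{123}{100}=1.23$
The method of moments estimate of the mean is $\hat{\lambda}=1.23$, which is also the maximum likelihood estimate. The estimated probability of having zero claims is $e^{-1.23}\approx 0.2923$.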
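The arithmetic in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the dictionary `counts` and the other variable names are introduced here for illustration only.

```python
import math

# Claim counts from Example 1: number of claims -> number of insureds
counts = {0: 40, 1: 24, 2: 20, 3: 8, 4: 5, 5: 3}

n = sum(counts.values())
sample_mean = sum(k * n_k for k, n_k in counts.items()) / n

# With complete data, the Poisson MLE equals the sample mean
lam_hat = sample_mean
p_zero = math.exp(-lam_hat)

print(f"lambda_hat = {lam_hat:.2f}, P(0 claims) = {p_zero:.4f}")
```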
Example 2
The following table gives the claim frequency data of a group of insureds.
# of Claims  # of Insureds 

0 or 1  26 
2  12 
3  3 
4+  0 
Fit the Poisson distribution to the claim frequency data using maximum likelihood. Determine the estimated probability of observing 0 or 1 claim.
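Since the given observed claim frequency data is not complete data, do not equate the maximum likelihood estimate with the sample mean. In any case, the sample mean cannot be computed since we do not know how many of the 26 insureds have zero claims. The probability of 0 or 1 claim is $e^{-\lambda}+\lambda e^{-\lambda}=e^{-\lambda}(1+\lambda)$. The likelihood function is given by the following.
$\displaystyle L(\lambda)=\left[e^{-\lambda}(1+\lambda)\right]^{26}\left[\frac{e^{-\lambda}\lambda^{2}}{2}\right]^{12}\left[\frac{e^{-\lambda}\lambda^{3}}{6}\right]^{3}$
The denominators $2$ and $6$ contribute only a multiplicative constant, which can be ignored. The following gives the loglikelihood function (up to an additive constant) and its derivative.
$\displaystyle \ln L(\lambda)=-41\lambda+26\ln(1+\lambda)+33\ln\lambda$
$\displaystyle \frac{d}{d\lambda}\ln L(\lambda)=-41+\frac{26}{1+\lambda}+\frac{33}{\lambda}$
Setting the derivative equal to zero leads to the quadratic equation $41\lambda^{2}-18\lambda-33=0$. Solving this equation and taking the positive root produces the following estimate of $\lambda$ and the estimated probability.
$\displaystyle \hat{\lambda}=\frac{18+\sqrt{18^{2}+4(41)(33)}}{2(41)}\approx 1.1431$
$\displaystyle e^{-\hat{\lambda}}(1+\hat{\lambda})\approx 0.6833$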
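The grouped-data calculation in Example 2 can be sketched in Python as follows. The variable names (`n_01`, `n_2`, `n_3`) and the helper `loglik` are hypothetical; the quadratic coefficients come from multiplying the derivative of the loglikelihood by $\lambda(1+\lambda)$.

```python
import math

# Grouped data from Example 2
n_01, n_2, n_3 = 26, 12, 3      # insureds with 0 or 1, 2, and 3 claims
n = n_01 + n_2 + n_3            # sample size
s = 2 * n_2 + 3 * n_3           # exponent of lambda in the likelihood

def loglik(lam):
    """Poisson loglikelihood for the grouped data, additive constants dropped."""
    return -n * lam + n_01 * math.log(1 + lam) + s * math.log(lam)

# d/d(lam) loglik = -n + n_01/(1+lam) + s/lam = 0
# is equivalent to n*lam^2 + (n - n_01 - s)*lam - s = 0
a, b, c = n, n - n_01 - s, -s
lam_hat = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # positive root

p_01 = math.exp(-lam_hat) * (1 + lam_hat)
print(f"lambda_hat = {lam_hat:.4f}, P(0 or 1 claim) = {p_01:.4f}")
```

The sketch solves the quadratic directly rather than calling a numerical optimizer, since the closed form is available in this example.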
Negative Binomial Distribution
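The negative binomial distribution has two parameters. We consider two parametrizations of the negative binomial distribution.
(1)…… $\displaystyle P(X=k)=\binom{r+k-1}{k}\left(\frac{1}{1+\theta}\right)^{r}\left(\frac{\theta}{1+\theta}\right)^{k}, \quad k=0,1,2,\dots$
(2)…… $\displaystyle P(X=k)=\binom{r+k-1}{k}\,p^{r}(1-p)^{k}, \quad k=0,1,2,\dots$
Depending on the version, the negative binomial parameters are either $r$ and $\theta$ or $r$ and $p$. To get ready for method of moments estimation, note the population mean and variance in the two versions.
(3)…… $\displaystyle E(X)=r\theta, \quad Var(X)=r\theta(1+\theta)$
(4)…… $\displaystyle E(X)=\frac{r(1-p)}{p}, \quad Var(X)=\frac{r(1-p)}{p^{2}}$
Equating the sample mean $\bar{x}$ with $E(X)$ and the sample variance $s^{2}$ with $Var(X)$ produces the following method of moments estimates.
(5)…… $\displaystyle \hat{\theta}=\frac{s^{2}-\bar{x}}{\bar{x}}, \quad \hat{r}=\frac{\bar{x}^{2}}{s^{2}-\bar{x}}$
(6)…… $\displaystyle \hat{p}=\frac{\bar{x}}{s^{2}}, \quad \hat{r}=\frac{\bar{x}^{2}}{s^{2}-\bar{x}}$
The estimates in (5) are the method of moments estimates for the negative binomial distribution as described in (1). The estimates in (6) are the method of moments estimates for the negative binomial distribution as described in (2). For both cases to work, the sample variance must exceed the sample mean, i.e. $s^{2}>\bar{x}$. In both (5) and (6), the sample variance is the biased sample variance, i.e. the one obtained by dividing by the sample size $n$ rather than $n-1$.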
Example 3
Use the sample data in Example 1. Fit negative binomial distribution to the observed claim frequency data using method of moments. Determine the probability of observing zero claims according to the fitted distribution.
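From Example 1, $\bar{x}=1.23$. The following gives the sample variance.
$\displaystyle s^{2}=\frac{0^{2}(40)+1^{2}(24)+2^{2}(20)+3^{2}(8)+4^{2}(5)+5^{2}(3)}{100}-1.23^{2}=3.31-1.5129=1.7971$
According to (5), the estimates of $r$ and $\theta$ are:
$\displaystyle \hat{\theta}=\frac{s^{2}-\bar{x}}{\bar{x}}=\frac{0.5671}{1.23}\approx 0.4611, \quad \hat{r}=\frac{\bar{x}^{2}}{s^{2}-\bar{x}}=\frac{1.5129}{0.5671}\approx 2.6678$
Then $P(X=0)$, the probability of observing zero claims, is $(1+\hat{\theta})^{-\hat{r}}\approx 0.3637$.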
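The method of moments fit in Example 3 can be reproduced with the sketch below. It assumes the parametrization in which the mean is $r\theta$ and the variance is $r\theta(1+\theta)$, and it uses the biased sample variance (dividing by the sample size). The names `theta_hat` and `r_hat` are illustrative.

```python
# Claim counts from Example 1: number of claims -> number of insureds
counts = {0: 40, 1: 24, 2: 20, 3: 8, 4: 5, 5: 3}

n = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / n
# Biased sample variance: second sample moment minus squared sample mean
var = sum(k * k * c for k, c in counts.items()) / n - mean ** 2

if var <= mean:
    raise ValueError("method of moments needs sample variance > sample mean")

# Assumed parametrization: mean = r*theta, variance = r*theta*(1+theta)
theta_hat = (var - mean) / mean
r_hat = mean ** 2 / (var - mean)

# P(X = 0) = (1/(1+theta))^r under this parametrization
p_zero = (1 + theta_hat) ** (-r_hat)
print(f"theta_hat = {theta_hat:.4f}, r_hat = {r_hat:.4f}, P(0) = {p_zero:.4f}")
```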
When both parameters are unknown, maximum likelihood estimation for the negative binomial distribution requires using a numerical software package. The following example demonstrates why.
Example 4
The observed claim counts for three insureds: 0, 1, 2. Fit a negative binomial distribution to the observed data.
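The likelihood function is based on the probability function in (1).
$\displaystyle L(r,\theta)=\left(\frac{1}{1+\theta}\right)^{r}\cdot r\left(\frac{1}{1+\theta}\right)^{r}\frac{\theta}{1+\theta}\cdot\frac{r(r+1)}{2}\left(\frac{1}{1+\theta}\right)^{r}\left(\frac{\theta}{1+\theta}\right)^{2}=\frac{r^{2}(r+1)}{2}\,\frac{\theta^{3}}{(1+\theta)^{3r+3}}$
$\displaystyle \ln L(r,\theta)=2\ln r+\ln(r+1)+3\ln\theta-(3r+3)\ln(1+\theta)-\ln 2$
Taking the partial derivatives with respect to both parameters and setting them equal to zero gives the following.
$\displaystyle \frac{\partial \ln L}{\partial\theta}=\frac{3}{\theta}-\frac{3r+3}{1+\theta}=0$
$\displaystyle \frac{\partial \ln L}{\partial r}=\frac{2}{r}+\frac{1}{r+1}-3\ln(1+\theta)=0$
Solving these two equations produces the following equations.
$\displaystyle r\theta=1$
$\displaystyle \frac{2}{r}+\frac{1}{r+1}-3\ln\left(1+\frac{1}{r}\right)=0$
Solving for $r$ in the last equation would require numerical techniques.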
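To give a feel for the numerical work involved, here is a rough sketch that fits both negative binomial parameters by maximum likelihood to the Example 1 data, whose sample variance exceeds its sample mean. It assumes the parametrization with mean $r\theta$, uses the fact that for a fixed $r$ the best $\theta$ is the sample mean divided by $r$, and then searches over $r$ with a golden-section search. The search interval `[0.05, 50]` and the helper names are choices made for this illustration, not part of any standard routine.

```python
import math

# Claim counts from Example 1: number of claims -> number of insureds
counts = {0: 40, 1: 24, 2: 20, 3: 8, 4: 5, 5: 3}
n = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / n

def profile_loglik(r):
    """Loglikelihood at (r, theta = mean / r), the best theta for this r."""
    theta = mean / r
    total = 0.0
    for k, c in counts.items():
        # log of C(r+k-1, k) written with log-gamma so r may be non-integer
        log_coef = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
        log_pk = (log_coef - r * math.log(1 + theta)
                  + k * (math.log(theta) - math.log(1 + theta)))
        total += c * log_pk
    return total

def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # maximizer lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:                            # maximizer lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2

r_hat = golden_max(profile_loglik, 0.05, 50.0)   # assumed search interval
theta_hat = mean / r_hat
print(f"r_hat = {r_hat:.4f}, theta_hat = {theta_hat:.4f}")
```

In practice a statistical package would handle this step; the sketch only illustrates why a closed-form answer is not available.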
In light of Example 4, we do not focus on MLE for the case that both of the negative binomial parameters are unknown. When the parameter $r$ is known, maximum likelihood estimation works like method of moments in that the product of the two parameters $r$ and $\theta$ is set equal to the sample mean, i.e. $r\hat{\theta}=\bar{x}$.
Example 5
Using maximum likelihood estimation, fit the negative binomial distribution with parameters and to the claim frequency data in Example 1. Determine the probability of observing zero claims according to the fitted distribution.
With , we have . Then the probability of observing zero claims is .
Binomial Distribution
The binomial distribution has two parameters $m$ and $p$ where $m$ is a positive integer and $p$ is a real number between 0 and 1. This is a model for counting the number of successes in performing a series of $m$ independent Bernoulli trials (a Bernoulli trial is a random experiment in which there are two distinct outcomes called success and failure). Usually the parameter $m$ is denoted by $n$. However, we already use $n$ to mean the sample size. So the parameters of the binomial distribution are $m$ and $p$. The following is the probability function.
(7)…… $\displaystyle P(X=k)=\binom{m}{k}p^{k}(1-p)^{m-k}, \quad k=0,1,\dots,m$
The mean of the binomial distribution is $mp$ and its variance is $mp(1-p)$. When both parameters $m$ and $p$ are unknown, we can use the method of moments estimation. However, it is likely that the estimate of $m$ may end up not being an integer. In that case, the compromise is to round to the nearest integer. This is one pitfall of working with an integer parameter.
For maximum likelihood estimation, let’s start with the simpler case that $m$ is known. In this case the parameter $p$ is the only one that needs to be estimated. Suppose that $x_1,x_2,\dots,x_n$ is the sample data where $0\le x_i\le m$ for each $i$. Then the maximum likelihood estimator of $p$ is given by
(8)…… $\displaystyle \hat{p}=\frac{\sum_{i=1}^{n}x_i}{mn}$
There is a handy way to interpret the maximum likelihood estimate of $p$. Each data point $x_i$ is an observed number of successes when performing $m$ Bernoulli trials. In the sample of size $n$, $mn$ is the total number of trials. The sum of all the $x_i$ is the total number of successes out of the $mn$ trials. Thus $\hat{p}$ is the sample proportion of successes.
Based on (8), $m\hat{p}=\bar{x}$. When the parameter $m$ is known, the maximum likelihood estimate is also the method of moments estimate.
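Formula (8) follows from the usual steps of taking the log and differentiating. A brief derivation is given here, with symbols chosen for illustration: $m$ is the known number of trials, $p$ is the success probability and $x_1,\dots,x_n$ is the sample.

```latex
\begin{aligned}
L(p) &= \prod_{i=1}^{n} \binom{m}{x_i} p^{x_i} (1-p)^{m-x_i}, \\
\ln L(p) &= \text{const} + \Big(\sum_{i=1}^{n} x_i\Big) \ln p
            + \Big(mn - \sum_{i=1}^{n} x_i\Big) \ln(1-p), \\
\frac{d}{dp} \ln L(p) &= \frac{\sum_{i=1}^{n} x_i}{p} - \frac{mn - \sum_{i=1}^{n} x_i}{1-p} = 0
\quad\Longrightarrow\quad \hat{p} = \frac{\sum_{i=1}^{n} x_i}{mn}.
\end{aligned}
```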
When both $m$ and $p$ are unknown, maximum likelihood estimation of $m$ and $p$ is done by creating a likelihood profile over the possible values of $m$. A possible value of $m$ has to be at least as large as the largest binomial observation. The steps for creating a likelihood profile are as follows:
 1. Start with the value of $m$ that is the largest observed value.
 2. Using the chosen $m$, calculate $\hat{p}$ according to (8).
 3. Evaluate the loglikelihood at $(m,\hat{p})$.
 4. Increase $m$ by 1.
 5. Repeat Step 2 to Step 4 until a maximum in loglikelihood is found.
For the likelihood profile approach to work, the sample variance must be less than the sample mean. Otherwise, the loglikelihood values keep increasing as $m$ increases and no maximum is attained (see Problem 4J here).
Example 6
Claim frequency data has been collected from 100 insureds and is given in the following table.
# of Claims  # of Insureds 

0  30 
1  40 
2  25 
3  5 
4+  0 
Fit the binomial distribution to the given claim frequency data using the method of moments.
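The following gives the sample mean and sample variance.
$\displaystyle \bar{x}=\frac{0(30)+1(40)+2(25)+3(5)}{100}=1.05$
$\displaystyle s^{2}=\frac{0^{2}(30)+1^{2}(40)+2^{2}(25)+3^{2}(5)}{100}-1.05^{2}=1.85-1.1025=0.7475$
Note that the sample variance is less than the sample mean. It is then possible to fit a binomial distribution to the observed data. This fact is crucial for performing maximum likelihood estimation (the next two examples). The following steps give the method of moments estimates.
$\displaystyle mp=1.05, \quad mp(1-p)=0.7475$
$\displaystyle 1-\hat{p}=\frac{0.7475}{1.05}\approx 0.7119, \quad \hat{p}\approx 0.2881$
$\displaystyle \hat{m}=\frac{1.05}{0.2881}\approx 3.64$
Since the calculated $\hat{m}$ is not an integer, round to 4. As a result, the method of moments estimates are $\hat{m}=4$ and $\hat{p}=1.05/4=0.2625$.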
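The method of moments steps in Example 6 can be sketched as follows. The rounding of the estimate of $m$ and the recomputation of $p$ from the rounded value mirror the compromise described in the text; the variable names are illustrative.

```python
# Claim counts from Example 6: number of claims -> number of insureds
counts = {0: 30, 1: 40, 2: 25, 3: 5}

n = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / n
# Biased sample variance: second sample moment minus squared sample mean
var = sum(k * k * c for k, c in counts.items()) / n - mean ** 2

# Solve m*p = mean and m*p*(1-p) = var
p_raw = 1 - var / mean
m_raw = mean / p_raw

# m must be an integer, so round and recompute p from the sample mean
m_hat = round(m_raw)
p_hat = mean / m_hat
print(f"m_raw = {m_raw:.4f}, m_hat = {m_hat}, p_hat = {p_hat:.4f}")
```

A rounded value of $m$ smaller than the largest observation would not be usable; that does not happen with this data set.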
Example 7
Use the same data in Example 6. Fit the binomial distribution to the observed claim frequency data using maximum likelihood estimation. Assume that $m$ is known, with $m$ ranging from 3 to 8.
The maximum likelihood estimate of $p$ can be obtained by formula (8), which gives $\hat{p}=\bar{x}/m=1.05/m$. The estimated $p$ are:
$m$  $\hat{p}$
3  0.35
4  0.2625
5  0.21
6  0.175
7  0.15
8  0.13125
Example 8
Use the same data in Example 6. Fit the binomial distribution to the observed claim frequency data using maximum likelihood. Assume that both parameters $m$ and $p$ are unknown. The maximum likelihood estimation is performed by creating a likelihood profile as described above.
The largest observation in the sample is 3 (there are 5 such observations). In creating the likelihood profile, the starting value of $m$ is 3. Use this value to set up the likelihood function $L(p)$ and the corresponding loglikelihood function $\ln L(p)$. Then evaluate $\ln L(p)$ at $\hat{p}=0.35$ (0.35 is found in Example 7).
Next, perform the same process using $m=4$. The process is continued until a maximum in loglikelihood is found. The following table shows the results.
$m$  $\hat{p}$  loglikelihood
3  0.35  -122.8242
4  0.2625  -123.0850
5  0.21  -123.5233
6  0.175  -123.8856
7  0.15  -124.1702
8  0.13125  -124.3953
The loglikelihood is the greatest at the starting value of $m=3$. The loglikelihood decreases as $m$ increases. Thus the maximum likelihood estimates are $\hat{m}=3$ and $\hat{p}=0.35$.
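The likelihood profile in this example can be reproduced with a short script. This is a sketch under the assumptions of the example: the trial counts scanned are 3 through 8, and for each one the success probability is set to the total number of claims divided by the total number of trials. The helper `binom_loglik` is a name introduced here.

```python
import math

# Claim counts from Example 6: number of claims -> number of insureds
counts = {0: 30, 1: 40, 2: 25, 3: 5}
n = sum(counts.values())
total_successes = sum(k * c for k, c in counts.items())

def binom_loglik(m, p):
    """Binomial loglikelihood of the grouped sample for trial count m."""
    return sum(
        c * (math.log(math.comb(m, k)) + k * math.log(p) + (m - k) * math.log(1 - p))
        for k, c in counts.items()
    )

profile = []
largest_obs = max(k for k, c in counts.items() if c > 0)
for m in range(largest_obs, 9):              # m = 3, 4, ..., 8 as in the example
    p_hat = total_successes / (m * n)        # MLE of p for this fixed m
    profile.append((m, p_hat, binom_loglik(m, p_hat)))

for m, p_hat, ll in profile:
    print(f"{m}  {p_hat:.5f}  {ll:.4f}")

best_m, best_p, best_ll = max(profile, key=lambda row: row[2])
```

Scanning a fixed range is a simplification; the stopping rule in the steps above would instead stop once the loglikelihood starts to fall.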
Other Considerations
Poisson, binomial and negative binomial are three commonly used discrete distributions. One important distinction among these three distributions is that the mean and variance are identical for the Poisson distribution, the mean is greater than the variance for the binomial distribution and the mean is less than the variance for the negative binomial distribution. Thus we have the following observation.
In examining sample data for discrete distributions, we should compare the sample mean and sample variance. If the sample mean and the sample variance are roughly the same, then the Poisson distribution might be a good fit. If the sample mean is greater than the sample variance, the binomial distribution might be a good fit. If the sample mean is less than the sample variance, the negative binomial distribution might be a good fit.
The universe of discrete distributions is larger than the three commonly used discrete distributions. However, the guideline described in the above paragraph is a good starting point in the modeling process.
For the sample claim frequency data in Example 1, the sample mean is 1.23 and the sample variance is 1.7971. Among the three distributions of Poisson, binomial and negative binomial, the negative binomial distribution best represents the data since the sample variance exceeds the sample mean. For the sample claim frequency data in Example 6, the binomial distribution best represents the data since the sample variance (0.7475) is significantly less than the sample mean (1.05).
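The comparison of sample mean and sample variance is easy to automate. The sketch below is only a rough screening aid: the function names and the 5% tolerance used to decide that the two quantities are "roughly the same" are arbitrary choices made for this illustration.

```python
def mean_and_variance(counts):
    """Sample mean and biased sample variance of grouped count data."""
    n = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / n
    var = sum(k * k * c for k, c in counts.items()) / n - mean ** 2
    return mean, var

def suggest_family(counts, rel_tol=0.05):
    """Suggest a starting distribution; rel_tol is an assumed tolerance."""
    mean, var = mean_and_variance(counts)
    if abs(var - mean) <= rel_tol * mean:
        return "Poisson"
    return "negative binomial" if var > mean else "binomial"

example_1 = {0: 40, 1: 24, 2: 20, 3: 8, 4: 5, 5: 3}
example_6 = {0: 30, 1: 40, 2: 25, 3: 5}

print(mean_and_variance(example_1), suggest_family(example_1))
print(mean_and_variance(example_6), suggest_family(example_6))
```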
The above observation about comparing the sample mean and sample variance is a useful one. When fitting a Poisson, binomial or negative binomial distribution, there is another technique that is more refined. The key is to consider these distributions as members of the (a,b,0) class of distributions (the (a,b,0) class is introduced here). The distributions in the (a,b,0) class are characterized by the following recursive relation.
(9)…… $\displaystyle \frac{p_k}{p_{k-1}}=a+\frac{b}{k}, \quad k=1,2,3,\dots$
The notation $p_k$ refers to the probability that the distribution takes on the value $k$. For any member of the (a,b,0) class, the probabilities $p_k$ can be generated according to (9) for some constants $a$ and $b$. The three commonly used discrete distributions (Poisson, binomial, and negative binomial) are (a,b,0) distributions. This means that any one of these distributions can be generated recursively using (9). See Table 1 in this post for the $a$ and $b$ associated with each of the three distributions. The relation (9) can be rearranged as follows:
(10)…… $\displaystyle k\,\frac{p_k}{p_{k-1}}=a\,k+b$
The relation (10) says that the ratio $k\,p_k/p_{k-1}$ is a linear function of $k$ with the slope being $a$ and the y-intercept being $b$. If the (a,b,0) distribution is a Poisson distribution, then $a=0$. If the (a,b,0) distribution is a negative binomial distribution, then $a>0$. If the (a,b,0) distribution is a binomial distribution, then $a<0$. Thus the slope in (10) is an indicator of the (a,b,0) distribution.
Using observed data, $p_k$ is estimated by the ratio $n_k/n$ where $n$ is the sample size and $n_k$ is the number of observations that equal $k$. Then relation (10) is approximated by the following.
(11)…… $\displaystyle k\,\frac{n_k}{n_{k-1}}\approx a\,k+b$
If the sample data is drawn from an (a,b,0) distribution, the quantity on the left-hand side of (11) should have a linear pattern when plotted against $k$. If the plot is roughly horizontal, it is an indication that the (a,b,0) distribution is a Poisson distribution. If the plot has a positive slope, it is an indication that the (a,b,0) distribution is a negative binomial distribution. If the plot has a negative slope, it is an indication that the (a,b,0) distribution is a binomial distribution. This is further discussed in the following example.
Example 9
Consider the sample claim frequency data in Example 1. The quantities $k\,n_k/n_{k-1}$ are shown in the following table.
$k$  $n_k$  $k\,n_k/n_{k-1}$
0  40  
1  24  0.6 
2  20  1.67 
3  8  1.2 
4  5  2.5 
5  3  3 
6+  0 
The following is a plot of the ratio $k\,n_k/n_{k-1}$ against $k$.
The plot shows roughly a linear pattern. The slope is clearly positive. This suggests that the negative binomial distribution is a good fit.
When fitting an (a,b,0) distribution, it is a good idea to construct a plot according to relation (11). A couple of caveats: any category with $n_k=0$ cannot be used in the plot, and the plot is less reliable if there is an insufficient amount of data.
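The ratios used in the plot are simple to compute. The sketch below uses the Example 1 counts, skips any $k$ where $n_k$ or $n_{k-1}$ is zero, and summarizes the trend with an unweighted least-squares line. The least-squares line is an addition made for this illustration; the discussion above relies only on a visual reading of the plot.

```python
# n_0, n_1, ..., n_5 from Example 1 (the empty 6+ category is left out)
counts = [40, 24, 20, 8, 5, 3]

# Ratio k * n_k / n_{k-1}, skipping categories that would give 0 or divide by 0
ratios = {}
for k in range(1, len(counts)):
    if counts[k] > 0 and counts[k - 1] > 0:
        ratios[k] = k * counts[k] / counts[k - 1]

# Unweighted least-squares line through the points (k, ratio)
ks = list(ratios)
ys = [ratios[k] for k in ks]
k_bar = sum(ks) / len(ks)
y_bar = sum(ys) / len(ys)
slope = (sum((k - k_bar) * (y - y_bar) for k, y in zip(ks, ys))
         / sum((k - k_bar) ** 2 for k in ks))
intercept = y_bar - slope * k_bar

for k in ks:
    print(k, round(ratios[k], 2))
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

A positive fitted slope points toward the negative binomial distribution, a slope near zero toward Poisson, and a negative slope toward binomial. With only five points the fitted line is a rough guide at best.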
2019 – Dan Ma