# More on calculating maximum likelihood estimators

This post continues the preceding post on maximum likelihood estimation. The preceding post focuses on calculating the MLE when there is complete data (or individual data). This post focuses on calculating the MLE for other data scenarios: grouped data, censored data and truncated data.

Individual data refers to a data set where the exact value of every data point in the data set is completely known. Grouped data refers to a summarized data set that consists of frequency data, i.e. the counts that fall into a set of intervals.

Censored data refers to a data set where information on some of the data points is only partially known. For example, a data point exceeding a limit $u$ is recorded as $u$ (the data point is right censored or censored from above). A data point lower than a limit $l$ is recorded as $l$ (the data point is left censored or censored from below). A handy example of a censored data set is a reliability study where the times at failure for machines are recorded during a 5-year period. In this study, the time at failure for any machine that is still operating at the end of the study is recorded as 5 even though the machine may continue to work for several more years.

Truncated data refers to a data set where data values in some intervals are not observed and are thus excluded. For example, in an insurance coverage with a deductible $d$, when considering payment data, any loss below $d$ is not included in the calculation. This is an example of a data set that is truncated from below. Any data set in which values above a certain threshold are not observed or collected is truncated from above.

For censored data and truncated data, we focus on claim data with a policy limit (censored from above) or on claim data with a deductible (truncated from below) or on claim data with both a policy limit and a deductible.

Several examples (Example 3, Example 4, Example 6 and Example 7) concern the Pareto distribution. The Pareto distribution used here is also called Pareto type II distribution. For useful facts about Pareto type II, see this post in a companion blog.

## Grouped Data

In this scenario, the data points are not available individually. Instead, we know the counts of the data points that fall into a set of intervals. Unlike the case for complete data, the likelihood of an observation is not the value of the density function. It is the difference of two values of the cumulative distribution function (CDF), which is the probability of a data point falling into an interval. The rest of the procedure is the same as before: find the likelihood function, then take the natural log to get the log-likelihood function. Then take the derivative or partial derivatives and set each of them equal to zero. The maximum likelihood estimates are the solutions of the resulting equations. This is illustrated in Example 1.

Example 1
The following claim data has been collected from a large group of insureds.

| Interval | Frequency |
|---|---|
| $(0,5)$ | 10 |
| $(5,10)$ | 2 |
| $(10,15)$ | 6 |
| $(15,20)$ | 1 |
| $(20,\infty)$ | 1 |
| Total | 20 |

The exponential distribution with mean $\theta$ is fitted to the grouped data. Calculate the maximum likelihood estimate of the parameter $\theta$.

Note that the density is $f(x)=\frac{1}{\theta} \ e^{-x/\theta}$. The CDF is $F(x)=1-e^{-x/\theta}$.

Any observation that falls into the interval (0, 5) has likelihood $1-e^{-5/\theta}$, which is $F(5)$, accounting for the probability of an observation being in that interval. The likelihood for the interval (0, 5) is then $(1-e^{-5/\theta})^{10}$. Any observation that falls into the interval (5, 10) has likelihood $e^{-5/\theta}-e^{-10/\theta}$, which is $F(10)-F(5)$. The likelihood for the interval is $(e^{-5/\theta}-e^{-10/\theta})^2$. Continue on with the same process. The likelihood function $L(\theta)$ is the product of the likelihood of the intervals.

$\displaystyle L(\theta)=\biggl[1-e^{-5/\theta} \biggr]^{10} \ \biggl[e^{-5/\theta}-e^{-10/\theta} \biggr]^2 \ \biggl[e^{-10/\theta}-e^{-15/\theta} \biggr]^6 \ \biggl[e^{-15/\theta}-e^{-20/\theta} \biggr] \ e^{-20/\theta}$

The likelihood function can be further simplified before obtaining the log-likelihood function.

$\displaystyle L(\theta)=e^{-105/\theta} \ \biggl[1-e^{-5/\theta} \biggr]^{19}$

$\displaystyle l(\theta)=\ln L(\theta)=-\frac{105}{\theta}+19 \ \ln (1-e^{-5/\theta})$

Solving the equation obtained by setting the derivative of the log-likelihood function equal to zero gives the maximum likelihood estimate.

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=\frac{105}{\theta^2}-\frac{19}{1-e^{-5/\theta}} \ \frac{5 e^{-5/\theta}}{\theta^2}=0$

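Multiplying through by $\theta^2 (1-e^{-5/\theta})$ gives $105 \ (1-e^{-5/\theta})=95 \ e^{-5/\theta}$, so that $200 \ e^{-5/\theta}=105$.

$\displaystyle e^{-5/\theta}=\frac{21}{40}$

$\displaystyle -5/\theta=\ln \biggl[\frac{21}{40}\biggr]$

$\displaystyle \hat{\theta}=\frac{-5}{\ln \biggl[\frac{21}{40}\biggr]}=7.7597$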

The most obvious difference from the individual data case is that the likelihood function is a product of differences of CDF values. Otherwise, the same process applies. For some distributions, the maximum likelihood estimate is hard to obtain for grouped data because the CDF is hard to manipulate mathematically. The method of moments is also difficult to carry out for grouped data.
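When the grouped-data likelihood does not simplify by hand, the log-likelihood can be maximized numerically instead. The following is a minimal sketch in Python for the data in Example 1. The function names (`exp_cdf`, `grouped_loglik`, `golden_section_max`) and the search bracket from 1 to 50 are illustrative assumptions, not part of the derivation above.

```python
import math

# Grouped data from Example 1 as (lower, upper, count).
# math.inf marks the open-ended top interval.
GROUPS = [(0, 5, 10), (5, 10, 2), (10, 15, 6), (15, 20, 1), (20, math.inf, 1)]


def exp_cdf(x, theta):
    """CDF of the exponential distribution with mean theta."""
    if math.isinf(x):
        return 1.0
    return 1.0 - math.exp(-x / theta)


def grouped_loglik(theta, groups=GROUPS):
    """Sum of count * ln(F(upper) - F(lower)) over all intervals."""
    return sum(
        count * math.log(exp_cdf(upper, theta) - exp_cdf(lower, theta))
        for lower, upper, count in groups
    )


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            a = c  # the maximum lies to the right of c
        else:
            b = d  # the maximum lies to the left of d
    return (a + b) / 2


# Assumed bracket: the estimate is taken to lie between 1 and 50.
theta_numeric = golden_section_max(grouped_loglik, 1.0, 50.0)
print(round(theta_numeric, 4))
```

The search should agree with the estimate 7.7597 obtained by hand. To fit a different distribution, only `exp_cdf` would need to be swapped out, provided the log-likelihood stays unimodal on the chosen bracket.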

## Censored Data

An example of censored data is an insurance coverage with a policy limit. Any loss exceeding the limit $u$ is recorded as $u$. The likelihood of such a data point is then $1-F(u)$, the probability of a data point exceeding $u$. The rest of the MLE procedure works the same as before. In contrast, if a data point is censored from below at a threshold $m$, then its likelihood is $F(m)$.

Example 2
Observed claims are: 5, 6, 9, 15, 23. In addition, there are two claims exceeding the policy limit of 25.

An exponential distribution with mean $\theta$ is fitted to the claim data. Calculate the maximum likelihood estimate of $\theta$.

For the individual data points, the likelihood is $f(x)=\frac{1}{\theta} \ e^{-x/\theta}$. For the censored data points, the likelihood is $1-F(x)=e^{-x/\theta}$, the probability of exceeding the limit. The following is the likelihood function.

$\displaystyle L(\theta)=\frac{1}{\theta^5} \ e^{-\frac{5+6+9+15+23}{\theta}} \ e^{-\frac{25+25}{\theta}}=\frac{1}{\theta^5} \ e^{-\frac{108}{\theta}}$

The following derivation gives the maximum likelihood estimate $\hat{\theta}$.

$\displaystyle l(\theta)= \ln L(\theta)=-5 \ln(\theta)-\frac{108}{\theta}$

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=-\frac{5}{\theta}+\frac{108}{\theta^2}=0$

$\displaystyle -5+\frac{108}{\theta}=0$

$\displaystyle \hat{\theta}=\frac{108}{5}=21.6$
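The derivation above shows that, for the exponential distribution, the censored-data estimate is the total of all recorded amounts (with the limit counted once per censored claim) divided by the number of uncensored claims. A short Python sketch of that calculation follows; the function names are hypothetical and chosen only for illustration.

```python
import math


def exp_censored_mle(exact, n_censored, limit):
    """MLE of the exponential mean when some claims are censored at `limit`.

    Solves -k/theta + total/theta^2 = 0, where k counts the exact claims
    and total adds the limit once for each censored claim.
    """
    k = len(exact)
    total = sum(exact) + n_censored * limit
    return total / k


def exp_censored_loglik(theta, exact, n_censored, limit):
    """Log-likelihood: ln f(x) for exact claims, ln(1 - F(limit)) for censored."""
    density_part = sum(-math.log(theta) - x / theta for x in exact)
    censored_part = n_censored * (-limit / theta)
    return density_part + censored_part


claims = [5, 6, 9, 15, 23]  # exact claims from Example 2
theta_hat = exp_censored_mle(claims, n_censored=2, limit=25)
print(theta_hat)
```

Note that the divisor is 5, not 7: the censored claims add to the numerator but contribute no $\frac{1}{\theta}$ factor to the likelihood.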

## Truncated Data

We center the discussion on the scenario of a coverage with a deductible. Truncation is due to the fact that payment on a claim is conditional on the loss exceeding the deductible. Suppose that the insurance coverage has a deductible $d$. Suppose that claims $x_1,x_2,\cdots,x_n$ have been observed (individual data). We assume that losses below $d$ are not submitted. So all observations $x_i$ are above the deductible $d$. There are two ways to apply maximum likelihood estimation to such truncated claim data.

1. Work with the claim data $x_1,x_2,\cdots,x_n$ as is without any modification. Then the resulting maximum likelihood fitted distribution would be for claim data before applying any deductible. The mean of this fitted distribution would be the mean claim cost without a deductible. Of course, we can then estimate from this fitted distribution the claim cost of imposing a deductible.
2. This approach is called shifting since it subtracts the deductible $d$ from each observed claim $x_i$. The resulting maximum likelihood fitted distribution would be for the claim payment reflecting a deductible of $d$. The mean of this fitted distribution would be the mean claim cost per payment (over all losses exceeding the deductible of $d$). For this reason, the original mean claim cost (without a deductible) cannot be recovered from this fitted distribution. However, imposing a deductible of $d$ on this fitted distribution would be equivalent to imposing a deductible of $2 d$ on the original loss distribution.

Essentially in approach 1, we fit a distribution to the truncated claim data (but unmodified by the deductible). The resulting maximum likelihood fitted distribution is for the original loss distribution before any deductible is applied. In the second approach we fit a distribution to the claim payment data (after subtracting the deductible from the data). The resulting maximum likelihood fitted distribution is for the claim payment distribution reflecting the deductible used in the shifting. Which approach to use depends on whether we want to fit a distribution to the truncated claim data including the deductible or fit a distribution to the claim payment data (with the deductible not included).

To illustrate how these two approaches work, we fit the Pareto distribution to a set of claim data in both ways (Example 3 and Example 4). We round out the discussion on truncated data with an example using exponential distribution (Example 5).

Example 3
An insurance coverage has a deductible of 5. The following claims are observed:

12, 8, 14, 17, 13

A Pareto distribution with parameters $\alpha$ and $\theta=20$ is fitted to these data. Determine the maximum likelihood estimate of $\alpha$. We want the fitted Pareto distribution to be an estimated model for claim cost before the deductible, so we do not subtract the deductible of 5 from the data points (i.e. approach 1). We discuss several ways of using this fitted Pareto distribution to estimate claim costs.

The density function and the CDF of the Pareto distribution are:

$\displaystyle f(x)=\frac{\alpha \ 20^\alpha}{(x+20)^{\alpha+1}} \ \ \ \ \ \ x>0$

$\displaystyle F(x)=1-\biggl(\frac{20}{x+20} \biggr)^\alpha \ \ \ \ \ \ \ x>0$

Because we assume that we do not have any information about claims below 5, observing a claim $x$ is conditional on the fact that the loss underlying that claim exceeds 5. Thus the likelihood of a claim $x$ is a conditional probability. The likelihood of a claim amount $x$ is $\frac{f(x)}{1-F(5)}$. Plugging in the Pareto information, the following is the likelihood of a claim $x$.

$\displaystyle \frac{f(x)}{1-F(5)}=\frac{\frac{\alpha \ 20^\alpha}{(x+20)^{\alpha+1}}}{\biggl(\frac{20}{5+20} \biggr)^\alpha}=\frac{\alpha \ 25^\alpha}{(x+20)^{\alpha+1}}$

There are 5 data points. The likelihood function is then the product of these 5 likelihood values.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 25^{5 \alpha}}{\prod \limits_{i=1}^5 (x_i+20)^{\alpha+1}}=\frac{\alpha^5 \ 25^{5 \alpha}}{37196544^{\alpha+1}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+5 \alpha \ln(25)-(\alpha+1) \ \ln (37196544)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+5 \ln(25)-\ln(37196544)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(37196544)-5 \ln(25)}=3.7387$
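The estimate above has the form $n$ divided by $\sum \ln(x_i+\theta)-n \ln(d+\theta)$, which is easy to evaluate directly. The following Python sketch does this for the data in Example 3; the function name `pareto_truncated_alpha` is an assumption made for illustration.

```python
import math


def pareto_truncated_alpha(claims, deductible, theta):
    """MLE of the Pareto shape alpha for data truncated below at the deductible.

    Uses the closed form n / (sum ln(x_i + theta) - n ln(d + theta)),
    which comes from the conditional likelihood f(x) / (1 - F(d)).
    """
    n = len(claims)
    log_sum = sum(math.log(x + theta) for x in claims)
    return n / (log_sum - n * math.log(deductible + theta))


claims = [12, 8, 14, 17, 13]  # unshifted claims from Example 3
alpha_hat = pareto_truncated_alpha(claims, deductible=5, theta=20)
print(round(alpha_hat, 4))
```

Summing logarithms avoids forming the large product 37196544 explicitly, which matters little here but helps with bigger data sets.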

The Pareto distribution with $\hat{\alpha}=3.7387$ and $\theta=20$ is the fitted distribution for claim data in this insurance coverage. The deductible of 5 is not factored into this Pareto distribution. So this is the fitted distribution for the claim cost before applying the deductible. Thus, the mean claim cost without any deductible is $E[X]=\frac{20}{\hat{\alpha}-1}=7.3027$. Solving the equation $F(x)=0.5$ gives the median. Thus, the median claim cost without any deductible is 4.0739. When imposing a deductible of 5, here are the estimated claim costs:

Limited Expected Value: $\displaystyle E[X \wedge 5]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{5+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=3.3392$

Claim Cost Per Loss: $\displaystyle E[X]-E[X \wedge 5]=3.9635$

Claim Cost Per Payment: $\displaystyle \frac{E[X]-E[X \wedge 5]}{1-F(5)}=9.1284$

When imposing a deductible of 10, here are the estimated claim costs based on the fitted Pareto distribution.

Limited Expected Value: $\displaystyle E[X \wedge 10]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{10+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=4.8971$

Claim Cost Per Loss: $\displaystyle E[X]-E[X \wedge 10]=2.4056$

Claim Cost Per Payment: $\displaystyle \frac{E[X]-E[X \wedge 10]}{1-F(10)}=10.9541$

The claim cost without a deductible is $E[X]=7.3027$ over all losses. When imposing a deductible of 5, the claim cost per loss is reduced to 3.9635. When imposing a deductible of 10, the claim cost per loss is further reduced to 2.4056. Note that the claim costs per payment are conditional means (calculated over all losses exceeding the deductible). So they are higher than the claim cost per loss.
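The claim cost figures in this example all come from three ingredients: the Pareto mean, the limited expected value and the survival function. The Python sketch below recomputes them for deductibles of 5 and 10, using the rounded estimate 3.7387 from the text. The helper names are hypothetical.

```python
ALPHA_HAT = 3.7387  # rounded estimate from Example 3
THETA = 20


def pareto_survival(x, alpha, theta):
    """1 - F(x) for the Pareto type II distribution."""
    return (theta / (x + theta)) ** alpha


def pareto_mean(alpha, theta):
    """E[X], valid only when alpha > 1."""
    return theta / (alpha - 1)


def pareto_limited_ev(limit, alpha, theta):
    """E[X ^ limit], the limited expected value (alpha not equal to 1)."""
    return pareto_mean(alpha, theta) * (
        1 - (theta / (limit + theta)) ** (alpha - 1)
    )


def cost_per_loss(d, alpha, theta):
    """Expected payment per loss under a deductible d."""
    return pareto_mean(alpha, theta) - pareto_limited_ev(d, alpha, theta)


def cost_per_payment(d, alpha, theta):
    """Expected payment given that the loss exceeds the deductible d."""
    return cost_per_loss(d, alpha, theta) / pareto_survival(d, alpha, theta)


for d in (5, 10):
    print(
        d,
        round(pareto_limited_ev(d, ALPHA_HAT, THETA), 4),
        round(cost_per_loss(d, ALPHA_HAT, THETA), 4),
        round(cost_per_payment(d, ALPHA_HAT, THETA), 4),
    )
```

Because the per payment cost divides by a survival probability less than 1, it always exceeds the per loss cost, which is the pattern noted above.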

Example 4
We now show how to calculate the MLE using the second approach for truncated data. We continue to use the claim data from Example 3. We still wish to fit a Pareto distribution with parameters $\alpha$ and $\theta=20$ to the same data. This time we subtract the deductible of 5 from the claims. The resulting fitted Pareto distribution is for the distribution of claim payments based on the deductible of 5.

After subtracting the deductible of 5, the data are: 7, 3, 9, 12, 8. The maximum likelihood estimation is based on this shifted data. This data set is a complete data set. We can use the formula shown in the preceding post.

$\displaystyle \hat{\alpha}=\frac{n}{\ln\biggl(\prod \limits_{i=1}^n (\theta+x_i) \biggr)-n \ln(\theta)}=\frac{5}{\ln(16136064)-5 \ln(20)}=3.0904$

The Pareto distribution with parameters $\hat{\alpha}=3.0904$ and $\theta=20$ is the fitted distribution for claim payments. The deductible of 5 is baked into this Pareto distribution. The mean of this distribution is $E[X]=\frac{20}{\hat{\alpha}-1}=9.5675$. This mean is the mean claim payment with a deductible of 5 baked in. So we cannot recover the claim cost without deductible from this fitted distribution. This fitted Pareto distribution is modified from the original Pareto distribution describing the losses without the deductible. If we impose a deductible of 5 on this modified distribution, the result would be equivalent to imposing a deductible of 10 on the original distribution.

Limited Expected Value: $\displaystyle E[X \wedge 5]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{5+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=3.5666$

Claim Cost Per Payment: $\displaystyle \frac{E[X]-E[X \wedge 5]}{1-F(5)}=11.9594$

The mean claim cost of 11.9594 corresponds to the mean claim cost per payment when imposing a deductible of 10 on the claim data before the deductible. Note that 11.9594 is in line with the corresponding figure of 10.9541 in Example 3. The two approaches estimate the same quantity, but the answers usually do not agree exactly.
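The shifting approach can be checked with a few lines of Python: subtract the deductible, apply the complete-data formula, then compute the cost per payment for a further deductible of 5. This is a sketch under the same assumptions as the example ($\theta=20$ known); the function names are made up for illustration.

```python
import math


def pareto_alpha_complete(data, theta):
    """Complete-data MLE of alpha: n / (sum ln(x_i + theta) - n ln(theta))."""
    n = len(data)
    log_sum = sum(math.log(x + theta) for x in data)
    return n / (log_sum - n * math.log(theta))


def pareto_cost_per_payment(d, alpha, theta):
    """(E[X] - E[X ^ d]) / (1 - F(d)) for the Pareto type II, alpha > 1."""
    mean = theta / (alpha - 1)
    limited_ev = mean * (1 - (theta / (d + theta)) ** (alpha - 1))
    survival = (theta / (d + theta)) ** alpha
    return (mean - limited_ev) / survival


claims = [12, 8, 14, 17, 13]        # observed claims from Example 3
shifted = [x - 5 for x in claims]   # subtract the deductible of 5
alpha_shifted = pareto_alpha_complete(shifted, theta=20)
payment_cost = pareto_cost_per_payment(5, alpha_shifted, theta=20)
print(round(alpha_shifted, 4), round(payment_cost, 4))
```

The two printed values should match 3.0904 and 11.9594 from this example, up to rounding.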

Example 5
This example deals with the same coverage as in Example 3 (a deductible of 5). Suppose the observed claims are 7, 10, 12, 16, 22. This time we fit the exponential distribution with mean $\theta$ to the claim data. We apply the maximum likelihood estimation using the first approach (without subtracting the deductible from the claim data). Observing a claim $x$ is conditional on it exceeding the deductible 5. The likelihood of a claim $x$ is

$\displaystyle \frac{\frac{1}{\theta} e^{-x/\theta}}{e^{-5/\theta}}=\frac{1}{\theta} \ e^{-(x-5)/\theta}$

Thus the likelihood function is:

$\displaystyle \begin{aligned} L(\theta)&=\frac{1}{\theta^5} \ e^{-(7-5)/\theta} \ e^{-(10-5)/\theta} \ e^{-(12-5)/\theta} \ e^{-(16-5)/\theta} \ e^{-(22-5)/\theta} \\&=\frac{1}{\theta^5} \ e^{-42/\theta} \end{aligned}$

The maximum likelihood estimate is derived as follows:

$\displaystyle l(\theta)=\ln [L(\theta)]=-5 \ \ln(\theta)-\frac{42}{\theta}$

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=-\frac{5}{\theta}+\frac{42}{\theta^2}=0$

$\displaystyle \hat{\theta}=\frac{42}{5}=8.4$

On careful examination, note that if we use the shifted approach (the second approach) on the exponential distribution, we get the same maximum likelihood estimate $\hat{\theta}=8.4$. Because the exponential distribution is memoryless, either approach for truncated data leads to the same likelihood function $L(\theta)$. The exponential distribution is the only case where the maximum likelihood fitted distribution is both for claim data without a deductible and for claim payment with a deductible. Any other distribution would lead to two different fitted distributions when using both approaches for truncated claim data (just like the Pareto distribution in Example 3 and Example 4).
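The claim that both approaches give the same likelihood for the exponential distribution can be checked numerically. The Python sketch below evaluates the two log-likelihoods at a few arbitrary values of $\theta$, using the claims 7, 10, 12, 16, 22 that appear in the likelihood function above. The function names and the trial values of $\theta$ are assumptions for illustration.

```python
import math

CLAIMS = [7, 10, 12, 16, 22]  # claims used in the likelihood function above
DEDUCTIBLE = 5


def loglik_truncated(theta, claims=CLAIMS, d=DEDUCTIBLE):
    """Approach 1: each claim contributes ln f(x) - ln(1 - F(d))."""
    log_density = sum(-math.log(theta) - x / theta for x in claims)
    log_survival_at_d = -d / theta
    return log_density - len(claims) * log_survival_at_d


def loglik_shifted(theta, claims=CLAIMS, d=DEDUCTIBLE):
    """Approach 2: each shifted claim x - d contributes ln f(x - d)."""
    return sum(-math.log(theta) - (x - d) / theta for x in claims)


# Both log-likelihoods are maximized at the mean of the shifted claims.
theta_hat = sum(x - DEDUCTIBLE for x in CLAIMS) / len(CLAIMS)

for theta in (2.0, 8.4, 30.0):
    print(theta, loglik_truncated(theta), loglik_shifted(theta))
print(theta_hat)
```

The two columns should agree at every trial value, which is the memoryless property at work: dividing $f(x)$ by $1-F(d)$ has the same effect as replacing $x$ with $x-d$.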

One comment about the two approaches. If there are two approaches in handling truncated claim data, how do we know which approach to use in an exam problem? The answer depends on the goal of the problem. If the goal is to generate a fitted distribution to answer questions about the loss distribution or the claim data before applying any deductible, the first approach is used. Possible wordings: applying MLE on the original claim data, the fitted distribution is the loss distribution, or the loss distribution is fitted to a distribution.

If the goal is to generate a fitted distribution to answer questions about claim payment reflecting a certain deductible, then use approach 2 by subtracting the deductible from the claim data. Possible wordings: shifting the data by some amount, a certain distribution is fitted to the claim payment data, or claim payment data is fitted to this certain distribution. The idea is that we should look for instruction in the problem.

## Censoring and Truncation Combined

We can also apply maximum likelihood estimation on claim data arising from insurance coverage with both a deductible and a policy limit. The addition of the policy limit poses no new challenge. The deductible is already taken care of by the two approaches discussed in the preceding section. The only new piece of information we need is on how to handle the censored limit. Any data point that is above the maximum covered loss $u$ is represented as $u$. Its likelihood is one of the following depending on the approach.

Approach 1: $\displaystyle \frac{1-F(u)}{1-F(d)}$

Approach 2: $\displaystyle 1-F(u-d)$

In Approach 1, the denominator is $1-F(d)$ indicating that the likelihood is a conditional probability. The numerator is $1-F(u)$ indicating that the original data point is not known but is above the limit $u$. In Approach 2, we use the limit $u$ to stand in for the actual data point but subtract the deductible from it to make $u-d$ the claim payment.

For any individual data point in the claim data (any data point above the deductible and below the limit), the likelihood has already been described in the preceding section (in one of two approaches). We now close with two more examples demonstrating combining truncation and censoring.

Example 6
An insurance coverage has a deductible of 5 and a maximum covered loss of 25. The following claims are observed:

12, 8, 14, 17, 13, 25*, 25*

The first 5 data points are individual data, the same data set found in Example 3. The last two claims, marked with an asterisk, are claims that exceed 25 and are recorded as 25. Just like Example 3, we fit the Pareto distribution with parameters $\alpha$ and $\theta=20$ to these data in order to estimate the claim cost without a deductible.

For the 2 data points 25, the following is the likelihood:

$\displaystyle \frac{1-F(25)}{1-F(5)}=\frac{\biggl(\frac{20}{45} \biggr)^\alpha}{\biggl(\frac{20}{25} \biggr)^\alpha}= \frac{25^\alpha}{45^\alpha}$

The individual data points are the same as in Example 3. We only need to multiply the $L(\alpha)$ in Example 3 by the above likelihood twice.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 25^{5 \alpha}}{\prod \limits_{i=1}^5 (x_i+20)^{\alpha+1}} \ \frac{25^\alpha}{45^\alpha} \ \frac{25^\alpha}{45^\alpha}=\frac{\alpha^5 \ 25^{7 \alpha}}{37196544^{\alpha+1} \ 45^{2 \alpha}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+7 \alpha \ln(25)-(\alpha+1) \ \ln (37196544)-2 \alpha \ln(45)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+7 \ln(25)-\ln(37196544)-2 \ln(45)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(37196544)+2 \ln(45)-7 \ln(25)}=1.9897$

The fitted Pareto distribution with parameters $\hat{\alpha}=1.9897$ and $\theta=20$ is a model for the claim cost without a deductible.

Example 7
Use the same data set as in Example 6 but use the shifting approach (the second approach described in the preceding section). The fitted Pareto distribution will be a model for claim payments for the insurance coverage with a deductible of 5.

For the two data points of 25, the likelihood is $1-F(25-5)=(20/40)^\alpha$. The likelihood function is obtained by multiplying the likelihood of the individual data points by this likelihood twice.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 20^{5 \alpha}}{16136064^{\alpha+1}} \ \biggl(\frac{20}{40}\biggr)^\alpha \ \biggl(\frac{20}{40}\biggr)^\alpha=\frac{\alpha^5 \ 20^{7 \alpha}}{16136064^{\alpha+1} \ 40^{2 \alpha}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+7 \alpha \ln(20)-(\alpha+1) \ \ln (16136064)-2 \alpha \ln(40)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+7 \ln(20)-\ln(16136064)-2 \ln(40)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(16136064)+2 \ln(40)-7 \ln(20)}=1.6643$

The fitted Pareto distribution with parameters $\hat{\alpha}=1.6643$ and $\theta=20$ is a model for the claim payment after a deductible of 5 is met.
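Examples 6 and 7 differ only in how each observation enters the log-likelihood, so both estimates can be produced by one function with a switch for the approach. The Python sketch below is one way to write it, assuming $\theta$ is known. The function name `pareto_alpha_mle` and its `shift` flag are hypothetical.

```python
import math


def pareto_alpha_mle(exact, n_censored, d, u, theta, shift):
    """MLE of the Pareto shape alpha with deductible d and maximum covered loss u.

    shift=False (approach 1): exact claims contribute f(x) / (1 - F(d)),
        censored claims contribute (1 - F(u)) / (1 - F(d)).
    shift=True (approach 2): exact claims contribute f(x - d),
        censored claims contribute 1 - F(u - d).
    In both cases the estimate is n divided by a sum of log ratios,
    where n counts only the exact (uncensored) claims.
    """
    n = len(exact)
    if shift:
        exact_terms = sum(math.log((x - d + theta) / theta) for x in exact)
        censored_term = n_censored * math.log((u - d + theta) / theta)
    else:
        exact_terms = sum(math.log((x + theta) / (d + theta)) for x in exact)
        censored_term = n_censored * math.log((u + theta) / (d + theta))
    return n / (exact_terms + censored_term)


claims = [12, 8, 14, 17, 13]  # exact claims; two more are censored at 25
alpha_unshifted = pareto_alpha_mle(claims, 2, d=5, u=25, theta=20, shift=False)
alpha_shifted = pareto_alpha_mle(claims, 2, d=5, u=25, theta=20, shift=True)
print(round(alpha_unshifted, 4), round(alpha_shifted, 4))
```

Setting `n_censored=0` reduces the function to the truncation-only estimates of Example 3 and Example 4, which is a convenient sanity check.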


$\copyright$ 2018 – Dan Ma
