Practice Problem Set 3 – maximum likelihood estimation

This practice problem set continues Practice Problem Set 2 with more exercises on maximum likelihood estimation.

This problem set and the previous one present basic practice problems to reinforce the concepts discussed in two posts – this one and this one. The first post shows how to obtain maximum likelihood estimates given complete data (individual data). The second post focuses on maximum likelihood estimation for other data scenarios (grouped data, censored data and truncated data).

Practice Problem 3-A

The following information is given about a sample of 5 observations. The observations are drawn from a Weibull distribution with parameters $\tau=2.5$ and $\theta$. Three of the observations are 13, 25 and 36. The only thing known about the remaining two observations is that they exceed 40. Determine the following:

• The maximum likelihood estimate of the parameter $\theta$.
• The median of the fitted distribution.

Practice Problem 3-B

A random sample of 10 losses is given below:

4, 60, 274, 95, 56, 121, 26, 228, 49, 56

The distribution that models the losses is known to be an exponential distribution with mean $\theta$.

• Determine the maximum likelihood estimate of the parameter $\theta$.
• A policy that covers these losses has a deductible of 50 and a maximum covered loss of 500 (the maximum payment per loss is 450). Determine the expected amount paid per loss under this policy.

Practice Problem 3-C

In a study of patients with cardiovascular disease, 5 patients are observed for a period of 5 years. Three of the patients die during the study, with times of death at 1, 1 and 3. The remaining two patients survive to the end of the study. The time until death is modeled by a distribution with the following cumulative distribution function:

$\displaystyle F(x)=1-e^{-x^2/ \theta^2} \ \ \ \ \ \ x>0$

• Use the method of maximum likelihood estimation to estimate the parameter $\theta$.
• Determine the probability of observing a patient surviving to the end of the study period.

Practice Problem 3-D

An insurance coverage has a deductible of 50. A sample of 7 losses is given: 65, 100, 150, 200, 350, 505 and 600. No information is known about losses below 50. The losses are known to follow an exponential distribution with mean $\theta$.

• Estimate the parameter $\theta$ using maximum likelihood estimation.
• Determine the expected amount paid per loss under this insurance coverage.
• Determine the expected amount paid per payment under this insurance coverage.

Practice Problem 3-E

An insurance coverage has a deductible of 50 and a maximum covered loss of 500. A sample of 7 losses is given: 65, 100, 150, 200, 350, 500 and 500. No information is known about losses below 50. The two data points of 500 are the result of censoring at 500. The losses are known to follow an exponential distribution with mean $\theta$.

• Estimate the parameter $\theta$ using maximum likelihood estimation.
• Determine the expected amount paid per loss under this insurance coverage.
• Determine the expected amount paid per payment under this insurance coverage.

Practice Problem 3-F

An insurance coverage has a deductible of 50 and a maximum covered loss of 500. A sample of 7 payments is given: 15, 50, 100, 150, 300, 450 and 450. The two data points of 450 are the result of censoring at 500 and then subtracting the deductible. The payments are known to follow an exponential distribution with mean $\theta$.

• Estimate the parameter $\theta$ using maximum likelihood estimation.
• Determine the expected amount paid per payment under this insurance coverage.
• Determine the expected amount paid per loss under this insurance coverage.

Practice Problem 3-G

An insurance coverage has a deductible of 50 and a maximum covered loss of 500. A sample of 7 losses is given: 55, 60, 100, 150, 250, 500 and 500. No information is known about losses below 50. The two data points of 500 are the result of censoring at 500. The losses (including the losses below the deductible and the losses exceeding the limit) are known to follow a Pareto distribution with parameters $\alpha$ and $\theta=150$.

• Estimate the parameter $\alpha$ using maximum likelihood estimation.
• Determine the expected amount paid per loss under this insurance coverage.
• Determine the expected amount paid per payment under this insurance coverage.

Practice Problem 3-H

An insurance coverage has a deductible of 5. The following claims are observed:

7, 10, 12, 16, 22

The above sample is the result of truncation from below at 5. A Weibull distribution with parameters $\tau=2$ and $\theta$ is fitted to the loss distribution (including losses below the deductible).

• Determine the maximum likelihood estimate of $\theta$.
• Determine the fitted median for losses.
• Determine the fitted median for submitted claims.

Practice Problem 3-I

Two groups of insureds are pooled for maximum likelihood estimation. Losses for Group 1 have a Pareto distribution with parameters $\alpha$ and $\theta=500$. Losses for Group 2 have a Pareto distribution with parameters $\alpha$ and $\theta=1000$. The following losses have been observed:

Group 1: 500, 585, 900
Group 2: 875, 980, 1500

Determine the maximum likelihood estimate for the parameter $\alpha$.

Practice Problem 3-J

Suppose that the lifetimes of a certain type of washing machine have a Weibull distribution with parameters $\tau=3$ and $\theta$. Seven such machines are tested during a 5-year period. Two of the machines fail before the end of the testing period. Their times at failure are 2 and 3. The other 5 machines are in working condition at the end of the testing period.

• Determine the maximum likelihood estimate of the parameter $\theta$.
• Using the fitted distribution, determine the median lifetime of such washing machines.


3-A
• $\hat{\theta}=40.7233$
• Median of fitted distribution: 35.17
3-B
• $\hat{\theta}=96.9$
• $\displaystyle E[X \wedge 500]-E[X \wedge 50]=57.28378$ where $\displaystyle E[X \wedge 500]=\hat{\theta} (1-e^{-500/ \hat{\theta}})$ and $\displaystyle E[X \wedge 50]=\hat{\theta} (1-e^{-50/ \hat{\theta}})$
3-C
• $\hat{\theta}=4.50925$
• $\displaystyle P[X > 5]=0.292436$
3-D
• $\hat{\theta}=231.4285714$
• $\displaystyle E[X]-E[X \wedge 50]=186.46096$ where $\displaystyle E[X]=\hat{\theta}$ and $\displaystyle E[X \wedge 50]=\hat{\theta} (1-e^{-50/ \hat{\theta}})$
• $\displaystyle \frac{E[X]-E[X \wedge 50]}{1-F(50)}=\hat{\theta}=231.4285714$
3-E
• $\hat{\theta}=303$
• $\displaystyle E[X \wedge 500]-E[X \wedge 50]=198.7260$ where $\displaystyle E[X \wedge x]=\hat{\theta} (1-e^{-x/ \hat{\theta}})$
• $\displaystyle \frac{E[X \wedge 500]-E[X \wedge 50]}{1-F(50)}=234.38$
3-F
Same answers as in 3-E because the exponential distribution is memoryless.
3-G
• $\hat{\alpha}=1.332427776$
• $\displaystyle E[X \wedge 500]-E[X \wedge 50]=132.9344$ where $\displaystyle E[X \wedge x]=\frac{150}{\hat{\alpha}-1} \biggl[1-\biggl(\frac{150}{x+150} \biggr)^{\hat{\alpha}-1} \biggr]$
• $\displaystyle \frac{E[X \wedge 500]-E[X \wedge 50]}{1-F(50)}=195.0334$
3-H
• $\displaystyle \hat{\theta}=13.4759$
• Fitted median for losses = 11.2194
• Fitted median for submitted claims = 12.2831
3-I
• $\hat{\alpha}=1.2697$
3-J
• $\hat{\theta}=6.9104$
• median: 6.1157
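
As a cross-check on the Weibull problems with a known shape parameter (3-A, 3-C and 3-H), the following is a minimal sketch. It assumes the closed form $\hat{\theta}^\tau=\frac{1}{n} \bigl[\sum (x_i^\tau-d^\tau)+\sum (u_j^\tau-d^\tau) \bigr]$, obtained by setting the derivative of the log-likelihood to zero, where $n$ is the number of uncensored observations $x_i$, the $u_j$ are the censoring limits and $d$ is the truncation point. The function name and its arguments are hypothetical, chosen here for illustration.

```python
def weibull_theta_mle(exact, tau, censored=(), d=0.0):
    """MLE of the Weibull scale theta when the shape tau is known.

    exact:    fully observed values
    censored: recorded limits of right-censored values
    d:        left-truncation point (0 means no truncation)
    """
    if not exact:
        raise ValueError("need at least one uncensored observation")
    # Each observation contributes x^tau - d^tau to the numerator.
    total = sum(x ** tau - d ** tau for x in exact)
    total += sum(u ** tau - d ** tau for u in censored)
    # Only the uncensored observations count in the denominator.
    return (total / len(exact)) ** (1.0 / tau)

theta_3a = weibull_theta_mle([13, 25, 36], tau=2.5, censored=[40, 40])
theta_3c = weibull_theta_mle([1, 1, 3], tau=2, censored=[5, 5])
theta_3h = weibull_theta_mle([7, 10, 12, 16, 22], tau=2, d=5)
print(round(theta_3a, 4), round(theta_3c, 5), round(theta_3h, 4))
```

The three printed values can be compared with the first bullet of 3-A, 3-C and 3-H.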


$\copyright$ 2018 – Dan Ma

Practice Problem Set 2 – maximum likelihood estimation

This post presents basic practice problems for the topic of parametric model selection, focusing on maximum likelihood estimation. The practice problems reinforce the concepts discussed in two posts – this one and this one. The first post shows how to obtain maximum likelihood estimates given complete data (individual data). The second post focuses on maximum likelihood estimation for other data scenarios (grouped data, censored data and truncated data).

More maximum likelihood practice problems are found in the next problem set.

Practice Problem 2-A

The following losses are recorded for a group of insureds:

19, 45, 12, 31, 32, 4, 1, 19, 30, 15

An exponential distribution with mean $\theta$ is fitted to the loss data.

• Determine the maximum likelihood estimate of the parameter $\theta$.
• Suppose an insurance coverage with a deductible of 5 is to cover these losses. Determine the expected insurance payment per loss.
• Suppose an insurance coverage with a deductible of 5 is to cover these losses. Determine the expected insurance payment per payment.

Practice Problem 2-B

A sample of size 5 produced the values 332, 42, 94, 6, 9533. You fit a lognormal distribution with parameters $\mu$ and $\sigma$ using maximum likelihood estimation.

• Determine the estimates of the lognormal parameters $\mu$ and $\sigma$.
• Use these estimates to determine the probability of observing a value exceeding 500.

Practice Problem 2-C

The claim size follows a Weibull distribution with parameters $\tau=2$ and $\theta$. The following claim experience is recorded: 15, 5, 9, 10, 11, 20.

• Use the method of maximum likelihood estimation to estimate the parameter $\theta$.
• Determine the probability of observing a claim in excess of 15.

Practice Problem 2-D

Five claims have been observed: 11, 13, 9, 8, 10. The claim distribution is known to be a gamma distribution with shape parameter $\alpha=2$ and scale parameter $\theta$.

• Estimate the parameter $\theta$ using maximum likelihood estimation.
• Determine the probability of observing a claim in excess of 10.2.

Practice Problem 2-E

A claim size distribution is a Pareto distribution with parameters $\alpha$ and $\theta=100$. A sample of 10 claims is observed: 20, 61, 110, 8, 23, 3, 27, 7, 35, 9. These observed claims are before the application of a deductible.

• Use maximum likelihood estimation to estimate the parameter $\alpha$.
• According to the fitted Pareto distribution, determine the mean insurance payment per loss if the insurance coverage has no deductible.
• According to the fitted Pareto distribution, determine the mean insurance payment per loss if the insurance coverage has a deductible of 20.

Practice Problem 2-F

The distribution of claim size is an inverse exponential distribution with parameter $\theta$. Eight claims are observed: 55, 8, 23, 22, 59, 64, 106, 25.

• Estimate the parameter $\theta$ using maximum likelihood.
• Using the fitted distribution, determine the 85th percentile of the claim size.


Practice Problem 2-G

A total of 40 claims have been observed for a loss distribution that is known to be an exponential distribution with mean $\theta$. The data is summarized in the table below.

Interval Frequency
(0, 40) 22
(40, 80) 7
(80, 120) 4
(120, $\infty$) 7
Total 40

Determine the maximum likelihood estimate of $\theta$.

Practice Problem 2-H

An insurance coverage with a policy limit of 30 is purchased to cover a random loss. If the loss exceeds 30, the coverage will pay 30. Otherwise, the coverage pays for the loss in full. The reported losses are: 19, 30*, 12, 30*, 30*, 4, 1, 19, 30, 15, where an asterisk marks a loss that exceeded the limit and is recorded as 30. The loss distribution is known to be an exponential distribution with mean $\theta$.

Determine the maximum likelihood estimate of $\theta$.

Practice Problem 2-I

An insurance coverage has a deductible of 20. The following losses are part of a data set that has been truncated at 20: 25, 61, 110, 23, 27, 35. The truncated claim data, without modification, is fitted to a Pareto distribution with parameters $\alpha$ and $\theta=100$.

• Determine the maximum likelihood estimate of $\alpha$.
• Use the fitted distribution to determine the mean insurance payment per loss without a deductible.
• Use the fitted distribution to determine the mean insurance payment per loss with respect to the deductible of 20.
• Use the fitted distribution to determine the mean insurance payment over all losses exceeding the deductible of 20.

Practice Problem 2-J

An insurance coverage has a deductible of 20. The following losses are part of a data set that has been truncated at 20: 25, 61, 110, 23, 27, 35. The truncated data is shifted by 20 and is fitted to a Pareto distribution with parameters $\alpha$ and $\theta=100$.

• Determine the maximum likelihood estimate of $\alpha$.
• Use the fitted distribution to determine the mean insurance payment over all losses exceeding the deductible of 20. Compare this with the last part of Problem 2-I.


2-A
• $\hat{\theta}=20.8$
• $\displaystyle E[X]-E[X \wedge 5]=\hat{\theta} \ e^{-5/\hat{\theta}}=16.3556$
• $\displaystyle \frac{E[X]-E[X \wedge 5]}{1-F(5)}=\hat{\theta} =20.8$
2-B
• $\hat{\mu}=5.008$ and $\hat{\sigma}=2.4523$
• $\displaystyle P[X > 500]=0.3121$
2-C
• $\hat{\theta}=12.5963$
• $\displaystyle P[X > 15]=0.24218$
2-D
• $\hat{\theta}=5.1$
• $\displaystyle P[X > 10.2]=3 e^{-2}=0.4060$
2-E
• $\hat{\alpha}=4.1546$
• $\displaystyle E[X]-E[X \wedge 20]=31.70-13.8649=17.8351$
2-F
• $\hat{\theta}=25.46775$
• 85th percentile = 156.7064
2-G
• $\hat{\theta}=61.48288$
2-H
• $\displaystyle \hat{\theta}=27.1429$
2-I
• $\hat{\alpha}=5.48686$
• $E[X]=22.287296$
• $\displaystyle E[X]-E[X \wedge 20]=22.287296-12.45212=9.835176$
• $\displaystyle \frac{E[X]-E[X \wedge 20]}{1-F(20)}=26.74475$
2-J
• $\hat{\alpha}=4.7199$
• $E[X]=26.8824$


More on calculating maximum likelihood estimators

This post continues the preceding post on maximum likelihood estimation. The preceding post focuses on calculating the MLE when there is complete data (or individual data). This post focuses on calculating the MLE for the other data scenarios: grouped data, censored data and truncated data.

Individual data refers to a data set where the exact value of every data point in the data set is completely known. Grouped data refers to a summarized data set that consists of frequency data, i.e. the counts that fall into a set of intervals.

Censored data refers to a data set where information on some of the data points is only partially known. For example, a data point exceeding a limit $u$ is recorded as $u$ (the data point is right censored or censored from above). A data point lower than a limit $l$ is recorded as $l$ (the data point is left censored or censored from below). A handy example of a censored data set is a reliability study where the times at failure for machines are recorded during a 5-year period. In this study, the time at failure for any machine that is still operating at the end of the study is recorded as 5 even though the machine may continue to work for several more years.

Truncated data refers to a data set where data values in some intervals are not observed and are thus ignored. For example, in an insurance coverage with a deductible $d$, when considering payment data, any loss that is below $d$ is not included in the calculation. This is an example of a data set that is truncated below. Any data set such that values above a certain threshold are not observed or collected is truncated above.

For censored data and truncated data, we focus on claim data with a policy limit (censored from above) or on claim data with a deductible (truncated from below) or on claim data with both a policy limit and a deductible.

Several examples (Example 3, Example 4, Example 6 and Example 7) concern the Pareto distribution. The Pareto distribution used here is also called Pareto type II distribution. For useful facts about Pareto type II, see this post in a companion blog.

Grouped Data

In this scenario, the data points are not available individually. Instead, we know the counts of the data points that fall into a set of intervals. Unlike the case for complete data, the likelihood is not the value of the density function. It is the difference of two values of the cumulative distribution function (CDF) to account for the probability of a data point falling into an interval. The rest of the procedure is the same as before – finding the likelihood function, and then taking log to get the log-likelihood function. Then take the derivative or partial derivatives and set the derivative or partial derivative equal to zero. The maximum likelihood estimates are then the solutions of the resulting equations. This is illustrated in Example 1.

Example 1
The following claim data has been collected from a large group of insureds.

Interval Frequency
$(0,5)$ 10
$(5,10)$ 2
$(10,15)$ 6
$(15,20)$ 1
$(20,\infty)$ 1
Total 20

The exponential distribution with mean $\theta$ is fitted to the grouped data. Calculate the maximum likelihood estimate of the parameter $\theta$.

Note that the density is $f(x)=\frac{1}{\theta} \ e^{-x/\theta}$. The CDF is $F(x)=1-e^{-x/\theta}$.

Any observation that falls into the interval (0, 5) has likelihood $1-e^{-5/\theta}$, which is $F(5)$, accounting for the probability of an observation being in that interval. The likelihood for the interval (0, 5) is then $(1-e^{-5/\theta})^{10}$. Any observation that falls into the interval (5, 10) has likelihood $e^{-5/\theta}-e^{-10/\theta}$, which is $F(10)-F(5)$. The likelihood for the interval is $(e^{-5/\theta}-e^{-10/\theta})^2$. Continue on with the same process. The likelihood function $L(\theta)$ is the product of the likelihood of the intervals.

$\displaystyle L(\theta)=\biggl[1-e^{-5/\theta} \biggr]^{10} \ \biggl[e^{-5/\theta}-e^{-10/\theta} \biggr]^2 \ \biggl[e^{-10/\theta}-e^{-15/\theta} \biggr]^6 \ \biggl[e^{-15/\theta}-e^{-20/\theta} \biggr] \ e^{-20/\theta}$

The likelihood function can be further simplified before obtaining the log-likelihood function.

$\displaystyle L(\theta)=e^{-105/\theta} \ \biggl[1-e^{-5/\theta} \biggr]^{19}$

$\displaystyle l(\theta)=\ln L(\theta)=-\frac{105}{\theta}+19 \ \ln (1-e^{-5/\theta})$

Solving the equation obtained by setting the derivative of the log-likelihood function equal to zero gives the maximum likelihood estimate.

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=\frac{105}{\theta^2}-\frac{19}{1-e^{-5/\theta}} \ \frac{5 e^{-5/\theta}}{\theta^2}=0$

$\displaystyle e^{-5/\theta}=\frac{21}{40}$

$\displaystyle -5/\theta=\ln \biggl[\frac{21}{40}\biggr]$

$\displaystyle \hat{\theta}=\frac{-5}{\ln \biggl[\frac{21}{40}\biggr]}=7.7597$

The most obvious difference from the individual data case is that the likelihood function is a product of differences of CDF values. Otherwise, the same process applies. For some distributions, the maximum likelihood estimate is hard to obtain for grouped data because the CDF is hard to manipulate mathematically. The method of moments is also difficult to carry out for grouped data.
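
When the CDF does not simplify as nicely as it does here, the grouped-data log-likelihood can be maximized numerically instead. The following is a minimal sketch for the data in Example 1, assuming an exponential model and a log-likelihood with a single peak on the search interval. The function names and the search interval of 1 to 50 are choices made here for illustration.

```python
import math

def grouped_exp_loglik(theta, boundaries, counts):
    """Log-likelihood of grouped data under an exponential with mean theta.

    boundaries: interval endpoints, e.g. [0, 5, 10, 15, 20, inf]
    counts:     number of observations in each interval
    """
    def cdf(x):
        return 1.0 if math.isinf(x) else 1.0 - math.exp(-x / theta)

    # Each interval contributes count * ln(F(upper) - F(lower)).
    return sum(n * math.log(cdf(hi) - cdf(lo))
               for lo, hi, n in zip(boundaries[:-1], boundaries[1:], counts))

def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a single-peaked function."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return (a + b) / 2

boundaries = [0, 5, 10, 15, 20, math.inf]
counts = [10, 2, 6, 1, 1]
theta_numeric = maximize(
    lambda t: grouped_exp_loglik(t, boundaries, counts), 1.0, 50.0)
print(round(theta_numeric, 4))
```

The printed value can be compared with the estimate $\hat{\theta}=7.7597$ derived above. The same two functions work for unequal interval widths, where no closed form may be available.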

Censored Data

An example of censored data would be an insurance coverage with a policy limit. Any loss exceeding the limit $u$ is recorded as $u$. The likelihood of this data point is then $1-F(u)$, the probability of a data point exceeding $u$. The rest of the MLE procedure works the same as before. In contrast, if a data point is censored below a threshold $m$, then its likelihood is $F(m)$.

Example 2
Observed claims are: 5, 6, 9, 15, 23. In addition, there are two claims exceeding the policy limit of 25.

An exponential distribution with mean $\theta$ is fitted to the claim data. Calculate the maximum likelihood estimate of $\theta$.

For the individual data points, the likelihood is $f(x)=\frac{1}{\theta} \ e^{-x/\theta}$. For the censored data points, the likelihood is $1-F(x)=e^{-x/\theta}$, the probability of exceeding the limit. The following is the likelihood function.

$\displaystyle L(\theta)=\frac{1}{\theta^5} \ e^{-\frac{5+6+9+15+23}{\theta}} \ e^{-\frac{25+25}{\theta}}=\frac{1}{\theta^5} \ e^{-\frac{108}{\theta}}$

The following derivation gives the maximum likelihood estimate $\hat{\theta}$.

$\displaystyle l(\theta)= \ln L(\theta)=-5 \ln(\theta)-\frac{108}{\theta}$

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=-\frac{5}{\theta}+\frac{108}{\theta^2}=0$

$\displaystyle -5+\frac{108}{\theta}=0$

$\displaystyle \hat{\theta}=\frac{108}{5}=21.6$
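
The derivation above shows that, for the exponential distribution, the estimate is the total of all recorded amounts (censored ones included) divided by the number of uncensored observations. A minimal sketch of that rule follows; the function name is hypothetical.

```python
def exp_mle_right_censored(exact, censored):
    """MLE of the exponential mean with right-censored observations.

    exact:    fully observed values
    censored: recorded values (the limits) of right-censored observations
    """
    if not exact:
        raise ValueError("need at least one uncensored observation")
    # Censored values add to the numerator but not to the denominator.
    return (sum(exact) + sum(censored)) / len(exact)

theta_hat = exp_mle_right_censored([5, 6, 9, 15, 23], [25, 25])
print(theta_hat)
```

With the data of Example 2 this reproduces $108/5$.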

Truncated Data

We center the discussion on the scenario of a coverage with a deductible. Truncation is due to the fact that payment on a claim is conditional on the loss exceeding the deductible. Suppose that the insurance coverage has a deductible $d$. Suppose that claims $x_1,x_2,\cdots,x_n$ have been observed (individual data). We assume that losses below $d$ are not submitted. So all observations $x_i$ are above the deductible $d$. There are two ways to apply maximum likelihood estimation to such truncated claim data.

1. Work with the claim data $x_1,x_2,\cdots,x_n$ as is without any modification. Then the resulting maximum likelihood fitted distribution would be for claim data before applying any deductible. The mean of this fitted distribution would be the mean claim cost without a deductible. Of course, we can then estimate from this fitted distribution the claim cost of imposing a deductible.
2. This approach is called shifting since the approach is to subtract the deductible $d$ from each observed claim $x_i$. The resulting maximum likelihood fitted distribution would be for the claim payment reflecting a deductible of $d$. The mean of this fitted distribution would be the mean claim cost per payment (over all losses exceeding the deductible of $d$). For this reason, the original mean claim cost (without a deductible) cannot be recovered from this fitted distribution. However, imposing a deductible of $d$ on this fitted distribution would be equivalent to imposing a deductible of $2 d$ on the original loss distribution.

Essentially in approach 1, we fit a distribution to the truncated claim data (but unmodified by the deductible). The resulting maximum likelihood fitted distribution is for the original loss distribution before any deductible being applied. In the second approach we fit a distribution to the claim payment data (after shifting a deductible from the data). The resulting maximum likelihood fitted distribution is for the claim payment distribution reflecting the deductible used in the shifting. Which approach to use depends on whether we want to fit a distribution to the truncated claim data including the deductible or fit a distribution to the claim payment data (with the deductible not included).

To illustrate how these two approaches work, we fit the Pareto distribution to a set of claim data in both ways (Example 3 and Example 4). We round out the discussion on truncated data with an example using exponential distribution (Example 5).

Example 3
An insurance coverage has a deductible of 5. The following claims are observed:

12, 8, 14, 17, 13

A Pareto distribution with parameters $\alpha$ and $\theta=20$ is fitted to these data. Determine the maximum likelihood estimate of $\alpha$. We want the fitted Pareto distribution to be an estimated model for claim cost before the deductible. So we do not subtract the deductible of 5 from the data points (i.e. approach 1). We discuss several ways of using this fitted Pareto distribution to estimate claim costs.

The density function and the CDF of the Pareto distribution are:

$\displaystyle f(x)=\frac{\alpha \ 20^\alpha}{(x+20)^{\alpha+1}} \ \ \ \ \ \ x>0$

$\displaystyle F(x)=1-\biggl(\frac{20}{x+20} \biggr)^\alpha \ \ \ \ \ \ \ x>0$

Because we assume that we do not have any information about claims below 5, observing a claim $x$ is conditional on the fact that the loss underlying that claim exceeds 5. Thus the likelihood of a claim $x$ is a conditional probability. The likelihood of a claim amount $x$ is $\frac{f(x)}{1-F(5)}$. Plugging in the Pareto information, the following is the likelihood of a claim $x$.

$\displaystyle \frac{f(x)}{1-F(5)}=\frac{\frac{\alpha \ 20^\alpha}{(x+20)^{\alpha+1}}}{\biggl(\frac{20}{5+20} \biggr)^\alpha}=\frac{\alpha \ 25^\alpha}{(x+20)^{\alpha+1}}$

There are 5 data points. The likelihood function is then the product of these 5 likelihood values.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 25^{5 \alpha}}{\prod \limits_{i=1}^5 (x_i+20)^{\alpha+1}}=\frac{\alpha^5 \ 25^{5 \alpha}}{37196544^{\alpha+1}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+5 \alpha \ln(25)-(\alpha+1) \ \ln (37196544)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+5 \ln(25)-\ln(37196544)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(37196544)-5 \ln(25)}=3.7387$

The Pareto distribution with $\hat{\alpha}=3.7387$ and $\theta=20$ is the fitted distribution for claim data in this insurance coverage. The deductible of 5 is not factored into this Pareto distribution. So this is the fitted distribution for the claim cost before applying the deductible. Thus, the mean claim cost without any deductible is $E[X]=\frac{20}{\hat{\alpha}-1}=7.3027$. Solving the equation $F(x)=0.5$ gives the median. Thus, the median claim cost without any deductible is 4.0739. When imposing a deductible of 5, the estimated claim costs are:

Limited Expected value………..$\displaystyle E[X \wedge 5]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{5+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=3.3392$

Claim Cost Per Loss……………….$\displaystyle E[X]-E[X \wedge 5]=3.9635$

Claim Cost Per Payment………..$\displaystyle \frac{E[X]-E[X \wedge 5]}{1-F(5)}=9.1284$

When imposing a deductible of 10, the estimated claim costs based on the fitted Pareto distribution are:

Limited Expected value………..$\displaystyle E[X \wedge 10]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{10+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=4.8971$

Claim Cost Per Loss……………….$\displaystyle E[X]-E[X \wedge 10]=2.4056$

Claim Cost Per Payment………..$\displaystyle \frac{E[X]-E[X \wedge 10]}{1-F(10)}=10.9541$

The claim cost without a deductible is $E[X]=7.3027$ over all losses. When imposing a deductible of 5, the claim cost per loss is reduced to 3.9635. When imposing a deductible of 10, the claim cost per loss is further reduced to 2.4056. Note that the claim costs per payment are conditional means (calculated over all losses exceeding the deductible). So they are higher than the claim cost per loss.
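
The calculations in Example 3 can be reproduced with a short script. This is a sketch under the assumptions of the example (Pareto type II with known $\theta=20$, data truncated below at 5); the function names are hypothetical. Because the script carries $\hat{\alpha}$ at full precision, the results may differ from the figures above in the last decimal place.

```python
import math

def pareto_mle_truncated(claims, theta, d):
    """MLE of alpha for a Pareto with known theta, data truncated below at d."""
    n = len(claims)
    return n / (sum(math.log(x + theta) for x in claims)
                - n * math.log(d + theta))

def pareto_sf(x, alpha, theta):
    """Survival function 1 - F(x)."""
    return (theta / (x + theta)) ** alpha

def pareto_lev(x, alpha, theta):
    """Limited expected value E[X ^ x], valid for alpha != 1."""
    return theta / (alpha - 1) * (1 - (theta / (x + theta)) ** (alpha - 1))

claims, theta = [12, 8, 14, 17, 13], 20
alpha_hat = pareto_mle_truncated(claims, theta, d=5)
mean = theta / (alpha_hat - 1)

results = {}
for d in (5, 10):
    per_loss = mean - pareto_lev(d, alpha_hat, theta)
    per_payment = per_loss / pareto_sf(d, alpha_hat, theta)
    results[d] = (per_loss, per_payment)
    print(d, round(per_loss, 4), round(per_payment, 4))
```

Changing the tuple `(5, 10)` gives the claim costs for other deductibles under the same fitted distribution.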

Example 4
We now show how to estimate MLE using the second approach for truncated data. We continue to use the claim data from Example 3. We still wish to fit a Pareto distribution with parameters $\alpha$ and $\theta=20$ to the same data. This time we subtract the deductible of 5 from the claims. The resulting fitted Pareto distribution is for the distribution of claim payments based on the deductible of 5.

After subtracting the deductible of 5, the data are: 7, 3, 9, 12, 8. The maximum likelihood estimation is based on this shifted data. This data set is a complete data set. We can use the formula shown in the preceding post.

$\displaystyle \hat{\alpha}=\frac{n}{\ln\biggl(\prod \limits_{i=1}^n (\theta+x_i) \biggr)-n \ln(\theta)}=\frac{5}{\ln(16136064)-5 \ln(20)}=3.0904$

The Pareto distribution with parameters $\hat{\alpha}=3.0904$ and $\theta=20$ is the fitted distribution for claim payments. The deductible of 5 is baked into this Pareto distribution. The mean of this distribution is $E[X]=\frac{20}{\hat{\alpha}-1}=9.5675$. This mean is the mean claim payment with a deductible of 5 baked in. So we cannot recover the claim cost without deductible from this fitted distribution. This fitted Pareto distribution is modified from the original Pareto distribution describing the losses without the deductible. If we impose a deductible of 5 on this modified distribution, the result would be equivalent to imposing a deductible of 10 on the original distribution.

Limited Expected value………..$\displaystyle E[X \wedge 5]=\frac{\theta}{\hat{\alpha}-1} \ \biggl[1-\biggl(\frac{\theta}{5+\theta} \biggr)^{\hat{\alpha}-1} \biggr]=3.5666$

Claim Cost Per Payment………..$\displaystyle \frac{E[X]-E[X \wedge 5]}{1-F(5)}=11.9594$

The mean claim cost per payment of 11.9594 corresponds to imposing a deductible of 10 on the claim data before the deductible. It is comparable to the corresponding figure of 10.9541 in Example 3. The two approaches estimate the same quantity but fit different distributions, so the two answers generally do not agree exactly.
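
The shifting approach of Example 4 can be sketched the same way: subtract the deductible, then apply the complete-data formula for the Pareto distribution with known $\theta$. The function and variable names below are hypothetical.

```python
import math

def pareto_mle_complete(data, theta):
    """Complete-data MLE of alpha for a Pareto with known theta."""
    return len(data) / sum(math.log((x + theta) / theta) for x in data)

def pareto_lev(x, alpha, theta):
    """Limited expected value E[X ^ x], valid for alpha != 1."""
    return theta / (alpha - 1) * (1 - (theta / (x + theta)) ** (alpha - 1))

claims, theta, d = [12, 8, 14, 17, 13], 20, 5
payments = [x - d for x in claims]          # shifted data: 7, 3, 9, 12, 8

alpha_shifted = pareto_mle_complete(payments, theta)
mean_payment = theta / (alpha_shifted - 1)  # deductible of 5 already baked in

# Impose a further deductible of 5 on the payment distribution.
survival_5 = (theta / (5 + theta)) ** alpha_shifted
per_payment = (mean_payment - pareto_lev(5, alpha_shifted, theta)) / survival_5

print(round(alpha_shifted, 4), round(mean_payment, 4), round(per_payment, 4))
```

The three printed values correspond to $\hat{\alpha}$, the mean claim payment and the claim cost per payment in Example 4.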

Example 5
This example deals with the same coverage as in Example 3 (a deductible of 5), but with the observed claims 7, 10, 12, 16, 22. This time we fit the exponential distribution with mean $\theta$ to the claim data. We apply the maximum likelihood estimation using the first approach (without subtracting the deductible from the claim data). Observing a claim $x$ is conditional on it exceeding the deductible 5. The likelihood of a claim $x$ is

$\displaystyle \frac{\frac{1}{\theta} e^{-x/\theta}}{e^{-5/\theta}}=\frac{1}{\theta} \ e^{-(x-5)/\theta}$

Thus the likelihood function is:

$\displaystyle \begin{aligned} L(\theta)&=\frac{1}{\theta^5} \ e^{-(7-5)/\theta} \ e^{-(10-5)/\theta} \ e^{-(12-5)/\theta} \ e^{-(16-5)/\theta} \ e^{-(22-5)/\theta} \\&=\frac{1}{\theta^5} \ e^{-42/\theta} \end{aligned}$

The maximum likelihood estimate is derived as follows:

$\displaystyle l(\theta)=\ln [L(\theta)]=-5 \ \ln(\theta)-\frac{42}{\theta}$

$\displaystyle \frac{d \ l(\theta)}{d \ \theta}=-\frac{5}{\theta}+\frac{42}{\theta^2}=0$

$\displaystyle \hat{\theta}=\frac{42}{5}=8.4$

On careful examination, note that if we use the shifted approach (the second approach) on the exponential distribution, we get the same maximum likelihood estimate $\hat{\theta}=8.4$. Because the exponential distribution is memoryless, either approach for truncated data leads to the same likelihood function $L(\theta)$. The exponential distribution is the only case where the maximum likelihood fitted distribution is both for claim data without a deductible and for claim payment with a deductible. Any other distribution would lead to two different fitted distributions when using both approaches for truncated claim data (just like the Pareto distribution in Example 3 and Example 4).
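
The claim that both approaches give the same likelihood function for the exponential distribution can be checked numerically. The sketch below uses the claim values that appear in the likelihood function above and evaluates both log-likelihoods at a few arbitrary values of $\theta$; the function names are hypothetical.

```python
import math

claims, d = [7, 10, 12, 16, 22], 5

def loglik_truncated(theta):
    """Approach 1: each claim x contributes f(x) / (1 - F(d))."""
    return sum(math.log(math.exp(-x / theta) / theta)
               - math.log(math.exp(-d / theta))
               for x in claims)

def loglik_shifted(theta):
    """Approach 2: each payment x - d contributes f(x - d)."""
    return sum(math.log(math.exp(-(x - d) / theta) / theta) for x in claims)

for theta in (2.0, 8.4, 30.0):
    print(theta, loglik_truncated(theta), loglik_shifted(theta))

# Either way, the maximizer is the sample mean of the shifted data.
theta_hat = sum(x - d for x in claims) / len(claims)
print(theta_hat)
```

The two columns agree up to floating-point rounding at every value of $\theta$, which is the memoryless property at work.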

One comment about the two approaches. If there are two approaches in handling truncated claim data, how do we know which approach to use in an exam problem? The answer depends on the goal of the problem. If the goal is to generate a fitted distribution to answer questions about the loss distribution or the claim data before applying any deductible, the first approach is used. Possible wordings: applying MLE on the original claim data, the fitted distribution is the loss distribution, or the loss distribution is fitted to a distribution.

If the goal is to generate a fitted distribution to answer questions about claim payment reflecting a certain deductible, then use approach 2 by shifting a number from the claim data. Possible wordings: shifting the data by some amount, a certain distribution is fitted to the claim payment data, or claim payment data is fitted to this certain distribution. The idea is that we should look for instruction in the problem.

Censoring and Truncation Combined

We can also apply maximum likelihood estimation on claim data arising from insurance coverage with both a deductible and a policy limit. The addition of the policy limit poses no new challenge. The deductible is already taken care of by the two approaches discussed in the preceding section. The only new piece of information we need is on how to handle the censored limit. Any data point that is above the maximum covered loss $u$ is represented as $u$. Its likelihood is one of the following depending on the approach.

Approach 1………..$\displaystyle \frac{1-F(u)}{1-F(d)}$

Approach 2………..$\displaystyle 1-F(u-d)$

In Approach 1, the denominator is $1-F(d)$ indicating that the likelihood is a conditional probability. The numerator is $1-F(u)$ indicating that the original data point is not known but is above the limit $u$. In Approach 2, we use the limit $u$ to stand in for the actual data point but subtract the deductible from it to make $u-d$ the claim payment.
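
In symbols, the following is a sketch of the full likelihood function under each approach, assembled from the pieces described so far. Here $x_1,\cdots,x_n$ are the uncensored claims and $c$ is the number of claims censored at $u$ (the symbol $c$ is notation introduced here for illustration).

Approach 1………..$\displaystyle L=\prod \limits_{i=1}^n \frac{f(x_i)}{1-F(d)} \ \biggl[\frac{1-F(u)}{1-F(d)} \biggr]^c$

Approach 2………..$\displaystyle L=\prod \limits_{i=1}^n f(x_i-d) \ \biggl[1-F(u-d) \biggr]^c$

In Approach 1, $f$ and $F$ describe the loss distribution before the deductible. In Approach 2, they describe the distribution of claim payments.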

For any individual data point in the claim data (any data point above the deductible and below the limit), the likelihood has already been described in the preceding section (in one of two approaches). We now close with two more examples demonstrating combining truncation and censoring.

Example 6
An insurance coverage has a deductible of 5 and a maximum covered loss of 25. The following claims are observed:

12, 8, 14, 17, 13, 25*, 25*

The first 5 data points are individual data, the same data set found in Example 3. The last two claims with asterisk are claims that exceed 25 and are recorded as 25. Just like Example 3, we fit the Pareto distribution with parameters $\alpha$ and $\theta=20$ to these data in order to estimate the claim cost without a deductible.

For the 2 data points 25, the following is the likelihood:

$\displaystyle \frac{1-F(25)}{1-F(5)}=\frac{\biggl(\frac{20}{45} \biggr)^\alpha}{\biggl(\frac{20}{25} \biggr)^\alpha}= \frac{25^\alpha}{45^\alpha}$

The individual data points are the same as in Example 3. We only need to multiply the $L(\alpha)$ in Example 3 by the above likelihood twice, once for each censored claim.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 25^{5 \alpha}}{\prod \limits_{i=1}^5 (x_i+20)^{\alpha+1}} \ \frac{25^\alpha}{45^\alpha} \ \frac{25^\alpha}{45^\alpha}=\frac{\alpha^5 \ 25^{7 \alpha}}{37196544^{\alpha+1} \ 45^{2 \alpha}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+7 \alpha \ln(25)-(\alpha+1) \ \ln (37196544)-2 \alpha \ln(45)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+7 \ln(25)-\ln(37196544)-2 \ln(45)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(37196544)+2 \ln(45)-7 \ln(25)}=1.9897$

The fitted Pareto distribution with parameters $\hat{\alpha}=1.9897$ and $\theta=20$ is a model for the claim cost without a deductible.
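The calculation in Example 6 can be checked numerically. The following is a minimal sketch, not part of the original solution; the variable and function names are chosen here for illustration. It builds the log-likelihood from the individual and censored contributions and evaluates the closed-form solution of $l'(\alpha)=0$.

```python
import math

# Data from Example 6: five uncensored claims and two claims censored at u = 25.
claims = [12, 8, 14, 17, 13]
n_censored = 2
d, u, theta = 5, 25, 20


def log_likelihood(alpha):
    """Log-likelihood under Approach 1 (likelihoods conditioned on exceeding d)."""
    # Uncensored claim x: f(x) / (1 - F(d))
    #   = alpha * (theta + d)**alpha / (x + theta)**(alpha + 1)
    uncensored = sum(
        math.log(alpha) + alpha * math.log(theta + d) - (alpha + 1) * math.log(x + theta)
        for x in claims
    )
    # Censored claim: (1 - F(u)) / (1 - F(d)) = ((theta + d) / (theta + u))**alpha
    censored = n_censored * alpha * (math.log(theta + d) - math.log(theta + u))
    return uncensored + censored


# Closed-form solution of l'(alpha) = 0.
denominator = (
    sum(math.log(x + theta) for x in claims)
    + n_censored * math.log(theta + u)
    - (len(claims) + n_censored) * math.log(theta + d)
)
alpha_hat = len(claims) / denominator
print(round(alpha_hat, 4))
```

The printed value should agree with the estimate above up to rounding, and evaluating `log_likelihood` slightly to either side of `alpha_hat` gives smaller values.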

Example 7
Use the same data set in Example 6 but use the shifting approach (the second approach described in the preceding section). The fitted Pareto distribution will be a model for claim payments for the insurance coverage with a deductible of 5.

For the two data points of 25, the likelihood is $1-F(25-5)=(20/40)^\alpha$. The likelihood function is obtained by multiplying the likelihood of the individual data points by this likelihood twice.

$\displaystyle L(\alpha)=\frac{\alpha^5 \ 20^{5 \alpha}}{16136064^{\alpha+1}} \ \biggl(\frac{20}{40}\biggr)^\alpha \ \biggl(\frac{20}{40}\biggr)^\alpha=\frac{\alpha^5 \ 20^{7 \alpha}}{16136064^{\alpha+1} \ 40^{2 \alpha}}$

The usual steps produce the maximum likelihood estimate for $\alpha$.

$\displaystyle l(\alpha)=\ln L(\alpha)=5 \ln(\alpha)+7 \alpha \ln(20)-(\alpha+1) \ \ln (16136064)-2 \alpha \ln(40)$

$\displaystyle \frac{d \ l(\alpha)}{d \ \alpha}=\frac{5}{\alpha}+7 \ln(20)-\ln(16136064)-2 \ln(40)=0$

$\displaystyle \hat{\alpha}=\frac{5}{\ln(16136064)+2 \ln(40)-7 \ln(20)}=1.6643$

The fitted Pareto distribution with parameters $\hat{\alpha}=1.6643$ and $\theta=20$ is a model for the claim payment after a deductible of 5 is met.
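As a cross-check of Example 7, the log-likelihood can also be maximized numerically instead of by calculus. The sketch below uses a simple ternary search, which is a technique swapped in here for illustration (it relies on the log-likelihood being unimodal in $\alpha$, which holds since it is $5 \ln(\alpha)$ plus a linear term). The helper name `argmax_unimodal` and the search interval are assumptions.

```python
import math

# Data from Example 7: shift each claim by the deductible to get claim payments.
claims = [12, 8, 14, 17, 13]
n_censored = 2
d, u, theta = 5, 25, 20
payments = [x - d for x in claims]  # 7, 3, 9, 12, 8
censored_payment = u - d            # 20


def log_likelihood(alpha):
    """Log-likelihood under Approach 2 (Pareto fitted to claim payments)."""
    uncensored = sum(
        math.log(alpha) + alpha * math.log(theta) - (alpha + 1) * math.log(y + theta)
        for y in payments
    )
    censored = n_censored * alpha * (math.log(theta) - math.log(censored_payment + theta))
    return uncensored + censored


def argmax_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


alpha_numeric = argmax_unimodal(log_likelihood, 0.01, 10.0)
alpha_closed = len(payments) / (
    sum(math.log(y + theta) for y in payments)
    + n_censored * math.log(censored_payment + theta)
    - (len(payments) + n_censored) * math.log(theta)
)
print(round(alpha_numeric, 4), round(alpha_closed, 4))
```

The numerical maximizer and the closed-form value should agree to several decimal places, which is a useful sanity check when a closed form is not available.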

actuarial practice problems

Dan Ma actuarial

Daniel Ma actuarial

Daniel Ma Math

Daniel Ma Mathematics

Actuarial exam

$\copyright$ 2018 – Dan Ma

Calculating maximum likelihood estimators

If the probability model that describes a population is completely known (along with its parameters), then we can use it to obtain information about the population. However, in the real world, this is rarely the case. Instead, we have observed data. We may have the information that the observed data follows a particular distribution but its parameters are not known. In other words, the form of the distribution from which the observed data is drawn is known (perhaps it is an assumption) but the specific values of the parameters are not known. Then the option we have is to use the observed data to estimate the values of the parameters.

One way to estimate the parameters is the method of moments, which is relatively easy to use (for the most part). This is the focus of the practice problem set. In this post, we discuss the method of maximum likelihood estimation.

The method of maximum likelihood estimation is to maximize the probability or likelihood of observing the data we collected. Suppose that the form of the distribution is known and its density function is $f(x; \theta_1, \theta_2, \cdots, \theta_k)$. But the $k$ parameters are not known. The goal is to choose one particular member of the assumed parametric distribution family $f(x; \hat{\theta_1}, \hat{\theta_2}, \cdots, \hat{\theta_k})$ that gives the highest likelihood of the observed data. Let’s consider the exponential distribution as an example.

Exponential Example

Suppose it is known that the size of claims from a large group of insureds has an exponential distribution with unknown mean $\theta$. The density function is $f(x)=\frac{1}{\theta} e^{-x/\theta}$ where $x>0$. We observe $n$ claims $x_1,x_2,\cdots,x_n$. The method of maximum likelihood is to choose the value of $\theta$ under which these observations have the highest likelihood. The likelihood of observing the data is:

$\displaystyle \begin{aligned} L(\theta)&=f(x_1) \cdot f(x_2) \cdots f(x_n) \\&=\frac{1}{\theta} e^{\frac{-x_1}{\theta}} \cdot \frac{1}{\theta} e^{\frac{-x_2}{\theta}} \cdots \frac{1}{\theta} e^{-\frac{x_n}{\theta}} \\&=\frac{1}{\theta^n} \ e^{\frac{-\sum \limits_{i=1}^n x_i}{\theta}} \end{aligned}$

The goal is to choose the value of $\theta$ so that the function $L(\theta)$ is as large as possible. In other words, the goal is to maximize the function $L(\theta)$, which is called the likelihood function. In many cases, it is easier to maximize the natural log of $L(\theta)$.

$\displaystyle l(\theta)=\ln[L(\theta)]=-n \ln(\theta)-\frac{\sum \limits_{i=1}^n x_i}{\theta}$

The function $l(\theta)$ is called the log-likelihood function. The $\theta$ for which $l(\theta)$ is maximum is also a value for which $L(\theta)$ is maximum. The following gives the first and second derivatives of $l(\theta)$.

$\displaystyle l'(\theta)=-\frac{n}{\theta}+\frac{\sum \limits_{i=1}^n x_i}{\theta^2}$

$\displaystyle l''(\theta)=\frac{n}{\theta^2}-2 \ \frac{\sum \limits_{i=1}^n x_i}{\theta^3}$

Setting the first derivative equal to zero and solving for $\theta$ gives

$\displaystyle \hat{\theta}=\frac{\sum \limits_{i=1}^n x_i}{n}$

Plugging $\hat{\theta}$ into the second derivative produces a negative value. Thus $\hat{\theta}$ gives the maximum log-likelihood $l(\theta)$ and thus the maximum likelihood $L(\theta)$. The value $\hat{\theta}$ is called the maximum likelihood estimate (MLE) of the parameter $\theta$. It is also called the maximum likelihood estimator of the parameter $\theta$ since $\hat{\theta}$ is also a function (as the observations change, the estimate will change). Note that $\hat{\theta}$ is the mean of the sample $x_1,x_2,\cdots,x_n$. In this instance, the maximum likelihood estimate coincides with the method of moments estimate. Though such examples are the exception, several more examples of MLE = method of moments estimates are discussed below.
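The derivation above can be checked numerically. The following is a minimal sketch using a hypothetical sample of claim sizes (the numbers are made up for illustration). It confirms that the sample mean makes the first derivative zero and the second derivative negative.

```python
import math

# Hypothetical claim sizes, made up purely for illustration.
claims = [3.0, 7.0, 12.0, 18.0, 10.0]
n = len(claims)
total = sum(claims)


def log_likelihood(theta):
    """l(theta) = -n ln(theta) - sum(x_i) / theta."""
    return -n * math.log(theta) - total / theta


def first_derivative(theta):
    """l'(theta) = -n / theta + sum(x_i) / theta^2."""
    return -n / theta + total / theta**2


def second_derivative(theta):
    """l''(theta) = n / theta^2 - 2 sum(x_i) / theta^3."""
    return n / theta**2 - 2 * total / theta**3


theta_hat = total / n  # the sample mean
print(theta_hat, first_derivative(theta_hat), second_derivative(theta_hat))
```

At $\hat{\theta}$ the second derivative simplifies to $-n/\hat{\theta}^2$, which is negative for any sample, so the critical point is always a maximum.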

MLE

As the above example suggests, the first step in maximum likelihood estimation is to come up with the likelihood function and then the log-likelihood function (by taking the natural log of the likelihood function). If there is only one parameter, take the derivative of the log-likelihood function and then set it equal to zero and solve for the parameter. If there is more than one parameter in the log-likelihood function, take the partial derivative with respect to each parameter. Then set the resulting partial derivatives equal to zero and solve the resulting system of equations.

The likelihood of a data point $x$ (if its value is completely known) is simply the density function evaluated at $x$ (for a continuous distribution) or the probability function evaluated at $x$ (for a discrete distribution). For a given sample $x_1,x_2,\cdots,x_n$, the likelihood function is simply the product of the likelihoods at the individual data points $x_i$.

Another point to keep in mind: when working with the likelihood function or the log-likelihood function, positive multiplicative constants (factors that do not involve the parameters) can be omitted. This is illustrated by the example of the normal distribution.

Normal Example

Observations: $x_1,x_2,\cdots,x_n$. We assume that the data are drawn from a normal distribution with parameters $\mu$ and $\sigma$. The following is the density function.

$\displaystyle f(x)=\frac{1}{\sqrt{2 \pi} \sigma} \ e^{-\frac{1}{2} \frac{(x-\mu)^2}{\sigma^2}} \ \ \ \ \ \ \ -\infty<x<\infty$

The following is the full likelihood function.

$\displaystyle \begin{aligned} L(\mu,\sigma)&=f(x_1) \cdot f(x_2) \cdots f(x_n) \\&=\frac{1}{(\sqrt{2 \pi})^n} \ \frac{1}{\sigma^n} \ e^{\frac{-\frac{1}{2} \sum \limits_{i=1}^n (x_i-\mu)^2 }{\sigma^2}} \end{aligned}$

The constant $\frac{1}{(\sqrt{2 \pi})^n}$ in the last expression can be skipped. In the log-likelihood function, the log of this constant is an additive term that does not involve the parameters, so its derivative is zero. Thus the essential likelihood function and the log-likelihood function are the following:

$\displaystyle L(\mu,\sigma)=\frac{1}{\sigma^n} \ e^{\frac{-\frac{1}{2} \sum \limits_{i=1}^n (x_i-\mu)^2 }{\sigma^2}}$

$\displaystyle l(\mu,\sigma)=\ln[L(\mu,\sigma)]=-n \ln(\sigma)-\frac{\frac{1}{2} \sum \limits_{i=1}^n (x_i-\mu)^2 }{\sigma^2}$

Now take partial derivatives of $l(\mu,\sigma)$, first with respect to $\mu$ and then with respect to $\sigma$.

$\displaystyle \frac{\partial \ l(\mu,\sigma)}{\partial \ \mu}=\frac{\sum \limits_{i=1}^n (x_i-\mu)}{\sigma^2}=\frac{\biggl(\sum \limits_{i=1}^n x_i \biggr) - n \mu }{\sigma^2}=0$

$\displaystyle \frac{\partial \ l(\mu,\sigma)}{\partial \ \sigma}=-\frac{n}{\sigma}+\frac{\sum \limits_{i=1}^n (x_i-\mu)^2}{\sigma^3}=0$

Solving the first equation, we obtain the solution $\hat{\mu}$. Plugging $\hat{\mu}$ into the second equation and using the identity $\sum \limits_{i=1}^n (x_i-\hat{\mu})^2=\bigl( \sum \limits_{i=1}^n x_i^2 \bigr) - n \hat{\mu}^2$, we obtain $\hat{\sigma}^2$.

$\displaystyle \hat{\mu}=\frac{\sum \limits_{i=1}^n x_i}{n} \ \ \ \ \ \ \ \ \ \ \hat{\sigma}^2=\frac{\sum \limits_{i=1}^n x_i^2}{n}- \hat{\mu}^2$

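The MLE estimate for the mean $\mu$ for the normal distribution is the sample mean and the MLE estimate for $\sigma^2$ is the sample variance computed with divisor $n$ (the biased version, rather than the unbiased version with divisor $n-1$).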
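The two normal estimates are easy to compute directly. The sketch below uses a hypothetical sample (made up for illustration) and compares the result with the `pvariance` and `variance` functions in Python's standard `statistics` module, which use divisors $n$ and $n-1$ respectively.

```python
import statistics

# Hypothetical observations, made up purely for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)

# MLE formulas from the derivation above.
mu_hat = sum(data) / n
sigma2_hat = sum(x * x for x in data) / n - mu_hat**2

print(mu_hat, sigma2_hat)
# pvariance divides by n (matches the MLE); variance divides by n - 1.
print(statistics.pvariance(data), statistics.variance(data))
```

For a small sample the two versions of the variance can differ noticeably; the gap shrinks as $n$ grows.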

Formulas

The MLE method does not always have a closed form calculation. For some distributions, the only way to get MLE estimates is through a software package. The following list gives several distributions that have accessible calculation for MLE. The list is by no means exhaustive. In this list, the MLE estimates coincide with the method of moments estimates for the exponential, normal, gamma ($\alpha$ fixed), binomial ($m$ fixed), Poisson and negative binomial ($r$ fixed) distributions.

.

Exponential Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{1}{\theta} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ \ \ x>0$
MLE Estimate $\displaystyle \hat{\theta}=\frac{\sum \limits_{i=1}^n x_i }{n}$

Inverse Exponential Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{\theta \ e^{-\theta / x} }{x^2} \ \ \ \ \ \ \ \ x>0$
MLE Estimate $\displaystyle \hat{\theta}=\frac{n }{\sum \limits_{i=1}^n 1/x_i}$

Normal Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{1}{\sqrt{2 \pi} \ \sigma} \ e^{-\frac{1}{2} \frac{(x-\mu)^2}{\sigma^2}} \ \ \ \ -\infty<x<\infty$
MLE Estimate $\displaystyle \hat{\mu}=\frac{\sum \limits_{i=1}^n x_i}{n}$
MLE Estimate $\displaystyle \hat{\sigma}^2=\frac{\sum \limits_{i=1}^n x_i^2}{n}- \hat{\mu}^2$

Lognormal Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{1}{\sqrt{2 \pi} \ \sigma \ x} \ e^{-\frac{1}{2} \frac{(\ln(x)-\mu)^2}{\sigma^2}} \ \ \ \ 0<x<\infty$
MLE Estimate $\displaystyle \hat{\mu}=\frac{\sum \limits_{i=1}^n \ln(x_i)}{n}$
MLE Estimate $\displaystyle \hat{\sigma}^2=\frac{\sum \limits_{i=1}^n [\ln(x_i)]^2}{n}- \hat{\mu}^2$

Pareto Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}} \ \ \ \ \ \ x>0, \ \ \theta \ \text{fixed}$
MLE Estimate $\displaystyle \hat{\alpha}=\frac{n }{\ln \biggl(\prod \limits_{i=1}^n (\theta+x_i) \biggr)- n \ \ln(\theta) }$

Weibull Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\tau \ \frac{x^{\tau-1}}{\theta^\tau} \ e^{- x^\tau / \theta^\tau} \ \ \ \ \ \ x>0, \ \ \tau \ \text{fixed}$
MLE Estimate $\displaystyle \hat{\theta}=\biggl[\frac{\sum \limits_{i=1}^n x_i^\tau}{n} \biggr]^{1/\tau}$

Uniform Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{1}{\theta} \ \ \ \ \ \ 0<x<\theta$
MLE Estimate $\displaystyle \hat{\theta}=\text{max}(x_1,x_2,\cdots,x_n)$

Gamma Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ x>0, \ \ \alpha \ \text{fixed}$
MLE Estimate $\displaystyle \hat{\theta}=\frac{\sum \limits_{i=1}^n x_i }{\alpha \ n}$
Fitted Mean $\displaystyle \alpha \ \hat{\theta}=\frac{\sum \limits_{i=1}^n x_i }{n}$

Binomial Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle P[X=x]=\binom{m}{x} \ p^x \ (1-p)^{m-x} \ \ \ \ \ \ x=0,1,2,\cdots,m, \ \ m \ \text{fixed}$
MLE Estimate $\displaystyle \hat{p}=\frac{\sum \limits_{i=1}^n x_i }{m \cdot n}$
Fitted Mean $\displaystyle m \hat{p}=\frac{\sum \limits_{i=1}^n x_i }{n}$

Poisson Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle P[X=x]=\frac{e^{-\lambda} \ \lambda^x}{x!} \ \ \ \ \ \ x=0,1,2,3,\cdots,$
MLE Estimate $\displaystyle \hat{\lambda}=\frac{\sum \limits_{i=1}^n x_i }{n}$

Negative Binomial Distribution
Data $x_1,x_2,\cdots,x_n$
Density $\displaystyle P[X=x]=\binom{r+x-1}{x} \ p^r \ (1-p)^{x} \ \ \ \ \ \ x=0,1,2,3,\cdots, \ \ r \ \text{fixed}$
MLE Estimate $\displaystyle \hat{p}=\frac{r }{r+\frac{1}{n} \sum \limits_{i=1}^n x_i}=\frac{r }{r+\overline{x}}$
Fitted Mean $\displaystyle \frac{r \ (1-\hat{p})}{\hat{p}}=\frac{1}{n} \sum \limits_{i=1}^n x_i=\overline{x}$
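Several of the closed-form estimates in the list translate directly into short functions. The following is a minimal sketch covering four of them; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pareto_alpha_mle(xs, theta):
    """MLE of alpha for the Pareto distribution with theta fixed."""
    n = len(xs)
    return n / (sum(math.log(theta + x) for x in xs) - n * math.log(theta))


def weibull_theta_mle(xs, tau):
    """MLE of theta for the Weibull distribution with tau fixed."""
    return (sum(x**tau for x in xs) / len(xs)) ** (1 / tau)


def inverse_exponential_theta_mle(xs):
    """MLE of theta for the inverse exponential distribution."""
    return len(xs) / sum(1 / x for x in xs)


def negative_binomial_p_mle(xs, r):
    """MLE of p for the negative binomial distribution with r fixed."""
    sample_mean = sum(xs) / len(xs)
    return r / (r + sample_mean)


# Hypothetical sample, made up purely for illustration.
sample = [2.0, 5.0, 10.0]
print(pareto_alpha_mle(sample, theta=20.0))
print(weibull_theta_mle(sample, tau=2.5))
print(inverse_exponential_theta_mle(sample))
print(negative_binomial_p_mle([0, 3, 1, 4], r=2))
```

Two quick consistency checks follow from the formulas: the Weibull estimate with $\tau=1$ reduces to the sample mean (the exponential case), and the fitted negative binomial mean $r(1-\hat{p})/\hat{p}$ equals the sample mean.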

Remarks

The observed data discussed in all the above examples and formulas are complete data (or individual data). In this scenario, each data point in the data set $x_1,x_2,\cdots,x_n$ is known. In other words, the data is not grouped data (not summarized in any way), not censored and not truncated. For claims data in the form of individual data, no deductible or other insurance coverage modification has been applied. So complete data or individual data is exactly as it is recorded. The next post discusses how to calculate MLE for grouped data and censored or truncated data.

$\copyright$ 2018 – Dan Ma