Statistical models for car insurance pricing

Using Generalized Linear Models to estimate the pure premium and create a risk classification of insureds.

Image credit: Freepik

Introduction

In the insurance sector, contracting and pricing policies follow the mutuality principle: the insurance company diversifies its risks (the individual policies) by grouping them into homogeneous classes. This classification evens out the differences between categories of customers with respect to their insured risk, so that the company is better able to absorb damaging events and pay the compensation due. The basic idea of class pricing is therefore to have each insured contribute to the overall riskiness of the insurer’s portfolio in an appropriate and efficient manner.

The pure premium, net of any expense loading, is estimated as the company’s expected loss on a given insured risk:

$$Premium = E(Indemnity \ | \ Insured \ characteristics)$$

On the basis of the personal characteristics of the insured, and possibly of the insured item, it is therefore possible to quantify the risk incurred by the company. This risk, in the case of non-life insurance, depends on two factors:

  • The frequency of claims: \(Frequency = \frac{N \ Claims}{Exposure}\)
  • The severity of claims: \(Severity = \frac{Total \ Loss}{N \ Claims}\)
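
As an illustration of the two definitions, here is a minimal Python sketch on a made-up portfolio of four policies. The numbers and variable names are hypothetical and are not taken from the project's data (the project itself is written in R).

```python
# Hypothetical toy portfolio, one tuple per policy:
# (number of claims, exposure in policy-years, total loss paid)
policies = [
    (0, 1.0, 0.0),
    (1, 0.5, 900.0),
    (0, 1.0, 0.0),
    (2, 1.0, 2100.0),
]

n_claims = sum(p[0] for p in policies)
exposure = sum(p[1] for p in policies)
total_loss = sum(p[2] for p in policies)

# Frequency: claims per unit of exposure (here, per policy-year).
frequency = n_claims / exposure
# Severity: average cost of a single claim.
severity = total_loss / n_claims

print(f"frequency = {frequency:.3f} claims per policy-year")
print(f"severity  = {severity:.2f} per claim")
```

Dividing by exposure rather than by the number of policies matters because a policy observed for half a year has had only half the opportunity to produce a claim.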

Assuming that these can be modelled as two independent random variables, we can express the pure premium that each policyholder will pay to the insurer as the product of the expected frequency and the expected severity:

$$Premium = E(Frequency) \times E(Severity)$$
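
A short derivation shows where the product comes from. The symbols here are assumptions introduced for illustration and do not appear in the original text: a policy with fixed exposure \(w\) produces \(N\) claims with amounts \(X_1, \dots, X_N\), which are identically distributed and independent of \(N\). The total loss is then

$$S = \sum_{i=1}^{N} X_i$$

and conditioning on the number of claims gives

$$E(S) = E\big(E(S \mid N)\big) = E\big(N \, E(X_1)\big) = E(N) \, E(X_1)$$

Per unit of exposure, this is the product used above:

$$Premium = \frac{E(S)}{w} = \frac{E(N)}{w} \times E(X_1) = E(Frequency) \times E(Severity)$$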

The objective of this project is therefore to build models for claim frequency and claim severity, and to use them to construct risk-class pricing for Motor Third Party Liability (Motor-TPL) insurance.

About the project

The analysis is developed entirely in R using data from the CASdatasets package, which contains a wide variety of actuarial datasets originally collected for the book “Computational Actuarial Science with R”, edited by Arthur Charpentier. Two datasets are used, freMTPLfreq and freMTPLsev, which describe 413,169 motor liability policies (mostly observed over one year of exposure) of a French insurance company: freMTPLfreq contains the risk characteristics and the number of claims of each policy, while freMTPLsev contains the claim amounts and the corresponding policy ID.
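
Because the claim amounts sit in a separate table keyed by policy ID, the two tables have to be combined before severity can be analysed. The sketch below shows one way to do this in Python on made-up rows. The column names `PolicyID`, `ClaimNb`, `Exposure` and `ClaimAmount` are assumed here and should be checked against the package documentation; the project's own R code is not reproduced.

```python
from collections import defaultdict

# Hypothetical rows mimicking the frequency table (one row per policy).
freq_rows = [
    {"PolicyID": 1, "ClaimNb": 0, "Exposure": 1.0},
    {"PolicyID": 2, "ClaimNb": 2, "Exposure": 0.5},
    {"PolicyID": 3, "ClaimNb": 1, "Exposure": 1.0},
]

# Hypothetical rows mimicking the severity table (one row per claim).
sev_rows = [
    {"PolicyID": 2, "ClaimAmount": 800.0},
    {"PolicyID": 2, "ClaimAmount": 1200.0},
    {"PolicyID": 3, "ClaimAmount": 500.0},
]

# Aggregate the claim amounts by policy ID.
total_by_policy = defaultdict(float)
for row in sev_rows:
    total_by_policy[row["PolicyID"]] += row["ClaimAmount"]

# Left join: every policy is kept, and policies without claims get a loss of 0.
merged = [
    dict(row, TotalLoss=total_by_policy.get(row["PolicyID"], 0.0))
    for row in freq_rows
]

for row in merged:
    print(row)
```

Keeping the policies without claims is deliberate: they carry exposure and are needed for the frequency model, even though they contribute nothing to the severity model.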

The project is structured as follows:

  • Importing data
  • Descriptive analysis, detection of outliers and creation of variables for frequency and severity model estimation
  • Estimation and choice of model for claims frequency using a GLM approach with Poisson distribution and a “log” link function
  • Estimation and choice of model for severity using a GLM approach with Gamma distribution and a “log” link function
  • Pure premium estimation and creation of tariff classes
  • Conclusion
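
The modelling steps in this outline can be sketched end to end on toy data. The Python code below is an illustration under stated assumptions, not the project's R implementation: it uses a single hypothetical binary rating factor `young`, made-up grouped data, a hand-written Fisher scoring routine, and exposure as an offset in the frequency model (the source does not say how exposure is handled).

```python
import math


def fit_log_glm(y, x, family, exposure=None, n_iter=50, tol=1e-12):
    """Fit log(mu_i) = log(exposure_i) + b0 + b1 * x_i by Fisher scoring.

    family="poisson" assumes Var(y) = mu; family="gamma" assumes Var(y) ~ mu**2.
    """
    n = len(y)
    expo = list(exposure) if exposure is not None else [1.0] * n
    b0 = math.log(sum(y) / sum(expo))  # intercept-only starting value
    b1 = 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(expo, x)]
        if family == "poisson":
            resid = [yi - mi for yi, mi in zip(y, mu)]
            weight = mu
        elif family == "gamma":
            resid = [(yi - mi) / mi for yi, mi in zip(y, mu)]
            weight = [1.0] * n
        else:
            raise ValueError("family must be 'poisson' or 'gamma'")
        # Score vector and Fisher information for the design matrix [1, x].
        s0 = sum(resid)
        s1 = sum(r * xi for r, xi in zip(resid, x))
        i00 = sum(weight)
        i01 = sum(w * xi for w, xi in zip(weight, x))
        i11 = sum(w * xi * xi for w, xi in zip(weight, x))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system: step = information^{-1} * score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical grouped frequency data: one row per portfolio cell.
young = [0, 0, 0, 1, 1, 1, 0, 1]
claims = [0, 1, 0, 2, 0, 1, 0, 0]
exposure = [10.0, 10.0, 5.0, 10.0, 5.0, 10.0, 10.0, 5.0]

# Hypothetical severity data: one row per claim.
claim_young = [0, 1, 1, 1]
claim_amount = [1500.0, 900.0, 1300.0, 1100.0]

f0, f1 = fit_log_glm(claims, young, "poisson", exposure)  # frequency model
s0, s1 = fit_log_glm(claim_amount, claim_young, "gamma")  # severity model

# With a log link in both models, the pure premium is multiplicative:
# a base premium times a relativity for each rating-factor level.
base_premium = math.exp(f0 + s0)
relativity = math.exp(f1 + s1)
tariff = {"young=0": base_premium, "young=1": base_premium * relativity}

for label, premium in tariff.items():
    print(f"{label}: pure premium = {premium:.2f}")
```

With a single categorical factor the fitted values simply reproduce the observed claims per unit of exposure and the mean claim amount in each level, which makes the sketch easy to check by hand. The log link is what makes the resulting tariff multiplicative, so each risk class can be read as a base premium adjusted by a factor.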

The full analysis is not yet published; it will be available soon.
