Case study on Loss Curves (Actuarial Science)

We just added a new case study:

Modelling Loss Curves in Insurance with RStan

Mick Cooney

http://mc-stan.org/users/documentation/case-studies/losscurves_casestudy.html

Abstract
Loss curves are a standard actuarial technique for helping insurance companies assess the amount of reserve capital they need to keep on hand to cover claims from a line of business. Claims made and reported for a given accounting period are tracked separately over time. This enables the use of historical patterns of claim development to predict expected total claims for newer policies.

In insurance, depending on the types of risks, it can take many years for an insurer to learn the full amount of liability incurred on policies written in any particular year. At a given point in time after a policy is written, some claims may not yet have been reported or known about, and other claims may still be working through the legal system, so the final amount due has not been determined.

Total claim amounts from a single accounting period are laid out in a single row of a table, with each column showing the total claim amount after that period of time. Subsequent accounting periods have had less time to develop, so the data takes a triangular shape - hence the term 'loss triangles'. Using previous patterns, data in the upper part of the triangle is used to predict values in the unknown lower triangle, giving the insurer a probabilistic forecast of the ultimate claim amounts to be paid for all business written.

The ChainLadder package provides functionality to generate and use these loss triangles.

In this case study, we take a related but different approach: we model the growth of the losses in each accounting period as an increasing function of time, and use the model to estimate the parameters that determine the shape and form of this growth. We also use the sampler to estimate the values of the "ultimate loss ratio", i.e. the ratio of the total claims for an accounting period to the total premium received to write those policies. We treat each accounting period as a cohort.
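To give a feel for the structure of the model, here is a heavily simplified and untested Stan sketch of a growth-curve approach like the one described above; the names, the Weibull-style growth form, and the priors are illustrative choices rather than the case study's actual code:

```stan
// Simplified sketch of a growth-curve loss model: cumulative losses for each
// cohort grow towards premium * ultimate-loss-ratio as development time increases.
data {
  int<lower=1> n_obs;                          // observed (cohort, dev-time) cells
  int<lower=1> n_cohort;                       // number of accounting periods
  int<lower=1, upper=n_cohort> cohort[n_obs];  // accounting-period index per cell
  vector<lower=0>[n_obs] t;                    // development time since period start
  vector<lower=0>[n_obs] loss;                 // cumulative reported loss
  vector<lower=0>[n_cohort] premium;           // premium written in each period
}
parameters {
  real<lower=0> omega;                         // growth-curve shape
  real<lower=0> theta;                         // growth-curve scale
  vector<lower=0>[n_cohort] LR;                // ultimate loss ratio per cohort
  real<lower=0> sigma;                         // observation noise on the log scale
}
model {
  vector[n_obs] mu;
  for (i in 1:n_obs) {
    // Weibull-style growth factor in (0, 1): share of ultimate loss emerged by time t
    real gf = 1 - exp(-pow(t[i] / theta, omega));
    mu[i] = log(premium[cohort[i]] * LR[cohort[i]] * gf);
  }
  omega ~ lognormal(0, 0.5);
  theta ~ lognormal(0, 0.5);
  LR ~ lognormal(log(0.6), 0.3);
  sigma ~ normal(0, 0.5);
  loss ~ lognormal(mu, sigma);
}
```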


I left out a few references that I would like to add to the case study.

I will add them tonight. Once that is done, would it be possible to get the updates added to the version on the website?

Sure. Easiest for us if you can create a pull request on the web site's repo on stan-dev. But if you don't know how to use GitHub, it's probably not worth learning for this and I can do it for you.

Hi! I'm brand new to Stan, trying to explore MCMC for estimating reserve variability. My primary goal is to create a model that imitates the traditional chain ladder method of reserve estimation while allowing for some judgmental input via prior distributions. So it is slightly different from the probability-distribution-as-a-loss-curve approach taken in this case study, but definitely a related concept. The model I'm aiming for was first introduced in this paper, but the code there was written for WinBUGS, and I'm attempting to adapt the model for Stan.

My question reveals my naivety in Stan and in modeling in general, but I feel asking here is my best route to an answer. I'm learning a lot through the user manual and different forum threads, but one thing I haven't been able to work out yet is whether it's possible to use a traditional loss triangle (an N x N array with NAs in the lower triangle) as the data for the model. I know Stan treats variables declared in the data block as known, but I'm wondering whether the User's Guide sections 3.1 or 3.3 are hinting at a workaround for this. If I set the lower-triangle values equal to zero, can I index the upper and lower triangles separately, differentiate the two in the model block, and just ignore the "fit" for the lower triangle? Or am I really asking for (unsupported) ragged arrays here? In the case study it looks like a one-dimensional vector or array was used, so maybe I'm asking for too much. Hopefully this question makes sense. Thanks in advance!
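To make the zero-padding idea concrete, here is a rough, untested sketch of what I have in mind; the simple cross-classified log-normal structure and all the names are just placeholders:

```stan
// Rough sketch: pass the full N x N cumulative-loss triangle with zeros in the
// unobserved lower part, and only let the upper triangle contribute to the likelihood.
data {
  int<lower=1> N;                     // number of accident years = development years
  matrix<lower=0>[N, N] cum_loss;     // rows = accident year, cols = development year;
                                      // upper-triangle cells must be positive
}
parameters {
  vector[N] alpha;                    // accident-year level (log scale)
  vector[N] beta;                     // development-year effect (log scale)
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 1);
  for (ay in 1:N)
    for (dy in 1:(N - ay + 1))        // upper triangle only: ay + dy <= N + 1
      cum_loss[ay, dy] ~ lognormal(alpha[ay] + beta[dy], sigma);
}
```

The zero cells in the lower triangle are declared but never referenced, so they just sit unused in the data.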

I've found it best to use a matrix with the first column being the AY (accident year) index, the second the DY (development year) index, and the third the CY (calendar year) index (I do a lot of inflation modeling); the fourth and subsequent columns are then the paid, incurred, etc. values. Depending on the data/model these aren't strictly indices but rather times from the initial period, to handle fractional amounts.
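As a rough illustration (the names below are mine, not a prescribed layout), a data block for that long format could look something like this:

```stan
// Sketch of a long-format data block: one row per observed triangle cell,
// so the missing lower triangle never appears in the data at all.
data {
  int<lower=1> n_obs;                        // number of observed cells
  int<lower=1> n_ay;                         // number of accident years
  int<lower=1, upper=n_ay> ay[n_obs];        // accident-year index
  vector<lower=0>[n_obs] dev_time;           // development time (can be fractional)
  vector<lower=0>[n_obs] cal_time;           // calendar time, e.g. for inflation terms
  vector<lower=0>[n_obs] paid;               // paid losses for that cell
  vector<lower=0>[n_obs] incurred;           // incurred losses for that cell
}
```

Only the observed cells go in, so there are no NAs to work around, and predictions for the unobserved lower triangle can be produced in generated quantities from the fitted parameters.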


I agree with Stephenll that data in this "long" format is easier to model in Stan. I am also currently exploring these types of actuarial reserving models, although I am definitely still a relative beginner compared to many people here on this forum. I have found Markus Gesmann's blog (Correlated log-normal chain-ladder model), Glenn Meyers's CAS monographs, and Markus Gesmann's research paper Hierarchical Compartmental Reserving Models to be great beginner-friendly resources for getting started with various Bayesian model structures.

I have a few questions that have come up over the last few months of learning Bayesian modelling in this context:

  • Have people found more success in fitting to cumulative losses or to incremental losses? My issue with modelling incremental losses is that they are often negative, and I'm not aware of an appropriate or well-studied response distribution whose support includes negative values while still having a skewed right tail.

  • One of the most commonly used stochastic reserving models in the industry models incremental losses with a GLM using an over-dispersed Poisson distribution. Has anyone found a resource that describes how to replicate this model in Stan, or is there a Bayesian equivalent? I also have the issue that the Poisson distribution takes a count/integer response, and Stan won't allow you to use it for a continuous response.

  • If I have a vector of industry or benchmark CDFs for each line of business that I'm reserving for, is there a way to set up a function that gives credibility to the data such that earlier development periods rely more on the data and later development periods rely more on the industry/benchmark? Ideally I would like this distribution to converge to 1 fairly quickly for the triangles I work on.
    I tried thinking about this in two different ways. The first was an explicit weighting, beta_weight = w1 * beta_ind + (1 - w1) * beta_data, where w1 goes from 0 to 1 over time. The other was to manipulate the priors on beta_data so that they have mean beta_ind and decreasing variance, so that the posterior distributions are drawn more and more towards beta_ind as development increases; a rough sketch of that second idea is included just after this list. I'm leaning towards the second being the better approach, but I'm not sure whether it's an appropriate way to model this and I haven't had time to implement and test it yet.
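Here is a very rough, untested sketch of that second approach; beta_ind, the 0.5 starting scale, and the exponential decay rate are all placeholder choices:

```stan
// Sketch of the prior-based credibility idea: per-development-period parameters with
// priors centred on the industry benchmark, whose scale shrinks as development
// increases so that later periods are pulled harder towards the benchmark.
data {
  int<lower=1> n_dev;              // number of development periods
  vector[n_dev] beta_ind;          // industry / benchmark parameter by development period
}
parameters {
  vector[n_dev] beta_data;         // data-driven parameter by development period
}
model {
  for (d in 1:n_dev) {
    real tau = 0.5 * exp(-0.3 * (d - 1));   // prior scale decays with development period
    beta_data[d] ~ normal(beta_ind[d], tau);
  }
  // the likelihood for the observed losses, written in terms of beta_data, would go here
}
```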

I would also be very appreciative if anyone could point me to references for work that has used Bayesian models in a pricing context with more predictive variables (especially for the WC, GL, and AL lines of business).

Building out these types of models as someone with no formal higher level statistical training is quite challenging, but I really hope to be exploring this area more and more over the next few years.

I answered many of my own questions above by finding this excellent and well-written paper by David Clark. It looks tricky for me to implement in Stan, but as I get some free time I'll work my way through it.

Cheers to anyone who tries to implement Bayesian actuarial reserving models in their work - there are a lot of nuances to this type of data structure and time-dependent model that are really tricky for a beginner to get right without some provided Stan code to reference.
