Multivariate Frequency-Severity BRMS Model

Hi there,

I am currently trying to model aggregate insurance losses in BRMS using a multivariate LogNormal-Poisson model. This has not proven straightforward, because the frequency component is intrinsically linked to the severity component.

To give some context, every insurance policy in a dataset has a deductible (aka “excess”), which can vary significantly between policies. No losses are reported if they fall below the deductible, so the severity data is left-truncated at the deductible. All else being equal, we would expect a lower frequency for high-deductible policies than for low-deductible ones.

If we assume all policies follow the same frequency distribution for ground-up frequency (i.e. frequency assuming all losses are reported, whether above or below the deductible) and the same severity distribution, then:

Ground-Up Frequency ~ Poisson(lambda)

Ground-Up Severity ~ LogN(mu, sigma)

This means that, for policy i in the data, the aggregate loss S_i, given deductible d_i, is:

S_i = Sum(X_{ij}) from j = 1 to N_i

N_i ~ Poisson(lambda * (1 - LogN_CDF(d_i, mu, sigma)))

X_{ij} ~ LogN(mu, sigma), left-truncated at d_i (i.e. conditional on X_{ij} > d_i)

For simplicity, I am assuming no population- or group-level effects.

I have copied in some example code below.

My key questions are:

  1. Is there a way in a BRMS multivariate model to refer to parameters from the other model component? I want the Poisson model parameter to be adjusted based on the LogNormal severity at the deductible.
  2. Is it possible to adjust the log-likelihood such that frequency and severity have equal influence on the parameter fit? The Poisson and LogNormal log-likelihoods are on quite different scales, so the fit will currently be biased towards the severity component.
  3. The ‘ded * 0’ term in my code is included purely to bring the deductible into the Stan model data. Is there a cleaner way of doing this?

Using stanvars, I can refer to Intercept_loss and sigma_loss, which works in this context. However, it would not work if I started adding population- or group-level effects to the severity model, so the methodology needs to generalise beyond its current form.


#### Load Packages ####

library(brms)
library(dplyr)  # for the pipe and the data-manipulation verbs used below

options(stringsAsFactors = FALSE)

#### Simulate Frequency Data ####


freq_n = 10e3
freq_lambda = 2

freq_data =
  data.frame(
    pol_id = seq(freq_n),
    freq = TRUE,   # row feeds the frequency response
    sev = FALSE,
    ded = runif(freq_n, 1e3, 10e3),
    claimcount_fgu = rpois(freq_n, freq_lambda),  # ground-up claim count
    loss = 1  # dummy value for the severity response
  )

#### Simulate Severity Data ####

sev_mu = 8
sev_sigma = 1.5

sev_data =
  data.frame(
    ded = rep(freq_data$ded,
              freq_data$claimcount_fgu),
    loss =
      unlist(
        lapply(
          seq(freq_n),
          function(i){
            # simulate ground-up losses for policy i
            rlnorm(freq_data$claimcount_fgu[i],
                   sev_mu,
                   sev_sigma)
          }
        )
      )
  ) %>%
  mutate(
    pol_id = rep(seq(freq_n), freq_data$claimcount_fgu)
  ) %>% 
  filter(
    # only losses above the deductible are observed
    loss > ded
  ) %>%
  mutate(
    claim_id = row_number(),
    freq = FALSE,
    sev = TRUE,
    # dummy values; the true counts live in the frequency rows
    claimcount = 0,
    claimcount_fgu = 0
  )

freq_data_net =
  freq_data %>%
  left_join(
    sev_data %>%
      group_by(
        pol_id
      ) %>%
      summarise(
        claimcount = n()
      ) %>%
      ungroup(),
    by = "pol_id"
  ) %>%
  mutate(
    claimcount = coalesce(claimcount, 0L)
  )

#### Join Data ####

full_data = 
  bind_rows(freq_data_net, sev_data)
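
As a quick sanity check on the simulation, the empirical reported claim frequency should track the thinned Poisson rate:

# mean reported claim count vs. mean of lambda * P(loss > deductible)
mean(freq_data_net$claimcount)
mean(freq_lambda * (1 - plnorm(freq_data_net$ded, sev_mu, sev_sigma)))
# these two values should be close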

#### Multivariate Model ####

stanvars =
  "
  // adjust the log-scale frequency parameter for the probability
  // that a ground-up loss exceeds the deductible
  nlp_claimcount_f1 =
    nlp_claimcount_f1 +
    log(1 - lognormal_cdf(C_claimcount_1,
                          Intercept_loss,
                          sigma_loss));
  "

fit_freq = 
  bf(claimcount | subset(freq) ~ f1 + ded * 0,
     f1 ~ 1,
     nl = TRUE) + 
  poisson()

fit_sev =
  bf(loss | subset(sev) + trunc(lb = ded) ~ 1) +
  lognormal()


mv_model_fit =
  brm(fit_freq + fit_sev + set_rescor(FALSE),
      data = full_data,
      chains = 1,
      iter = 1000,
      warmup = 250,
      stanvars =
        stanvar(
          scode = stanvars,
          block = "model"
        ),
      prior = 
        c(prior(normal(0, 1),
                class = b,
                coef = Intercept,
                resp = claimcount,
                nlpar = f1)
          )
      )

Please let me know if my explanation or code is not clear. Any advice would be very much appreciated!


Hi,
sorry for not getting to you earlier; this is a well-written question.

No, this is currently not possible. The closest you can get with brms out of the box is to add varying slopes/intercepts that are correlated across responses. Beyond that, you can play with stanvars (as you did), or it might also be possible to write a custom family that accepts the claim count as the response and the severity information as additional data (via vint or vreal), as sketched below.
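
To sketch the custom family idea (untested; the names ded_poisson, mus and sigmas are purely illustrative, and you would still need to bring the severity observations in somehow, e.g. via further vreal terms, for mus and sigmas to be well identified):

# custom frequency family with the Poisson rate thinned at the deductible
ded_poisson <- custom_family(
  "ded_poisson",
  dpars = c("mu", "mus", "sigmas"),  # ground-up rate + lognormal severity params
  links = c("log", "identity", "log"),
  lb = c(NA, NA, 0),
  type = "int",
  vars = "vreal1[n]"                 # per-policy deductible
)

stan_funs <- "
  real ded_poisson_lpmf(int y, real mu, real mus, real sigmas, real ded) {
    // thin the ground-up rate by the probability a loss exceeds the deductible
    // (older Stan releases use a comma instead of the pipe in the CDF call)
    return poisson_lpmf(y | mu * (1 - lognormal_cdf(ded | mus, sigmas)));
  }
"

fit <- brm(
  claimcount | vreal(ded) ~ 1,
  data = freq_data_net,
  family = ded_poisson,
  stanvars = stanvar(scode = stan_funs, block = "functions")
)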

If this is an issue, it may be that the model does not actually represent your domain knowledge well. If you don’t want exactly the same values to predict both responses, just “somewhat related” values, then an intercept/slope that is correlated between the responses might in fact be a better choice. A correlated per-row intercept plus a sigma fixed to a smallish constant can closely mimic correlated residuals (and works for non-Gaussian likelihoods); maybe that would represent your domain knowledge better?

You can pass additional data via stanvars.
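
For example (a minimal sketch; ded_vec is an arbitrary name):

# make the deductible available in the Stan data block directly,
# avoiding the 'ded * 0' trick in the model formula
stanvars_ded <- stanvar(x = full_data$ded, name = "ded_vec", block = "data")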

Hope that helps at least a little - feel free to ask for further clarifications if my descriptions seem mysterious :-)

This is a very comprehensive and clear response, so thank you very much!

BRMS is still incredibly useful, even if I can’t apply it directly to this specific context. I will continue with my current approach (with some of your suggested tweaks).

I look forward to seeing how BRMS continues to develop as time goes on.

Hi Chris,

A fellow actuary trying to figure out how to do the same thing here.
I think your solution is great, and it should work with covariates if you replace Intercept_loss with mu_loss. The code you included also seems to no longer work with brms, and the inserted stanvars code would need to be moved anyway in the code generated by brms::make_stancode.
brms::make_standata can then create the data needed for an rstan::stan call, using the Stan code saved in a .stan file.

Regards,
Allen

Hi Allen,

Good to see there are others working on the same problem!

Using mu_loss won’t solve the issue, as mu_loss is applicable to the severity data but we need to calculate a mu and sigma for each point in the frequency data when we calculate the deductible adjustment. This may be achievable by setting the severity and frequency data to be the same length, putting in some dummy values and then adjusting the log-likelihood so that they do not impact the posterior calculation, but this could be a bit fiddly.

Using block = "likelihood" could solve the issue of adjusting the likelihoods for frequency and severity to ensure the model is not biased towards optimising one over the other.

I’d like to publish a report on this approach but it’s not something I have done before. Do you know of any routes to getting something like this published, e.g. through the IFoA, SOA or elsewhere?


Hi Chris,

You are right. I found that out after replying to your original message. I ended up doing exactly what you described in the model block to set up a conditional target += operation, which can then work with predictors. I know that the CAS has a few magazines/journals that would probably welcome such a report. Here is a link to Variance: https://www.variancejournal.org/ Would you be interested in collaborating on the topic? My email is tzw0423@hotmail.com

Regards,
Allen

Hi Martin,

Allen and I have been collaborating on a GitHub project to try and solve this problem in BRMS, which you can find here:

There are two key issues which are somewhat blocking us:

Adjusting Lambda for Deductible

In order to adjust the mu_claimcount parameter (i.e. the lambda parameter in the frequency model) for deductible censoring (multiplying by (1 - lognormal_cdf(deductible, mu_loss, sigma_loss))), we would need to edit it before the target += poisson_log_lpmf(Y_claimcount, mu_claimcount) statement in the Stan model code.

However, if we use stanvars to make the adjustment at the start of the block, mu_claimcount will not yet have been defined, so the model will fail. Conversely, if we put it at the end of the block, it will come after the target += ... line. Is there a way to adjust mu_claimcount while still using the BRMS framework?

The only thing I can think of is to add something like this at the end, to cancel the effect of the unadjusted mu_claimcount and substitute our own (since poisson_log_lpmf expects the log of the rate, the deductible adjustment is additive on that scale):

target += poisson_log_lpmf(Y_claimcount | mu_claimcount + log_ded_adj) - poisson_log_lpmf(Y_claimcount | mu_claimcount)

but this seems a bit hacky.
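
Spelled out per observation, with the adjustment computed from the severity parameters, the end of the model block would be something like this (a sketch only; the names follow the brms-generated Stan code and may differ between versions):

// cancel the unadjusted frequency contribution and add the thinned one
for (n in 1:N_claimcount) {
  real log_ded_adj = log1m(lognormal_cdf(C_claimcount_1[n] | Intercept_loss, sigma_loss));
  target += poisson_log_lpmf(Y_claimcount[n] | mu_claimcount[n] + log_ded_adj)
            - poisson_log_lpmf(Y_claimcount[n] | mu_claimcount[n]);
}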

Adjusting the weighting of the likelihoods

We want our model to solve for mu_loss and sigma_loss not just by optimising against the severity data, but also so that it produces realistic deductible censoring adjustments to the frequency. In a single-response model the relative sizes of the likelihoods are irrelevant, but in a multi-response model where the parameters impact both responses, this is not the case.

Is there an easy way to adjust the target += by a scale factor - one which differs for frequency and severity? If we were to divide the severity likelihood by the total likelihood for severity and the equivalent for frequency, the model would give equal weighting to both. At present, I think there is a big bias towards optimising for severity.
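
For illustration, brms’s weights() addition term scales each observation’s log-likelihood contribution, so something along these lines might achieve the rebalancing (a sketch, assuming weights() combines with subset() and trunc(); w_freq and w_sev are whatever scale factors we settle on):

full_data$w <- ifelse(full_data$freq, w_freq, w_sev)

fit_freq <- bf(claimcount | subset(freq) + weights(w) ~ f1 + ded * 0,
               f1 ~ 1, nl = TRUE) + poisson()

fit_sev <- bf(loss | subset(sev) + weights(w) + trunc(lb = ded) ~ 1) + lognormal()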

Do these questions make sense, and am I misunderstanding how BRMS and Stan work?

Any advice would be really appreciated.

Best regards,

Chris


I won’t be able to take a closer look before late next week, but maybe writing your own custom family (you can mimic a multivariate response by passing the other outcome via vreal or vint) would let you have enough control over what is added to target?

I have created a function called brms_freq_sev which allows you to run brms frequency-severity models with the aforementioned censoring of frequency at the deductible. You can find it on the GitHub repo:

It would be great to get some feedback on it, as I feel a fair amount of it could be made more efficient.


Hi, thanks for the work. I am testing it right now, but I am unfamiliar with the model spec as defined in the bookdown. My data has only one loss recorded per policy, grouped by year (I don’t plan to use year as an explanatory variable). The model spec does not specifically prohibit a single loss, but will it work, or is there some way to go about this using the package? Thanks.
Edit: Suggestion: Treat year as a policy.