Hi there,
I am currently trying to model aggregate insurance losses in brms using a multivariate lognormal-Poisson model. This has not proven straightforward, because the frequency component is intrinsically linked to the severity component.
To give some context: every insurance policy in the dataset has a deductible (aka “excess”), which can vary significantly between policies. No loss is reported if it falls below the deductible, so the severity data are left-truncated at the deductible. All else being equal, we would expect a lower reported frequency for high-deductible policies than for low-deductible ones.
If we assume all policies share the same ground-up frequency distribution (i.e. the frequency assuming all losses are reported, whether above or below the deductible) and the same ground-up severity distribution, then:
Ground-up frequency ~ Poisson(lambda)
Ground-up severity ~ LogN(mu, sigma)
This means that, for policy i with deductible d_i, the aggregate loss S_i is:
S_i = sum_{j=1}^{N_i} X_{ij}
N_i ~ Poisson(lambda * (1 - LogN_CDF(d_i; mu, sigma)))
X_{ij} ~ LogN(mu, sigma), left-truncated at d_i
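As a quick sanity check on the thinning step above, here is a standalone simulation sketch (my own, independent of the brms code below; the parameter values match the simulation code, the variable names are mine): keeping only the ground-up losses above the deductible should leave a count whose mean equals lambda * P(X > d).

```r
# Standalone check of Poisson thinning: report a claim only when the
# lognormal loss exceeds the deductible, then compare the mean reported
# count with the thinned rate lambda * P(X > d).
set.seed(1)
lambda <- 2; mu <- 8; sigma <- 1.5; d <- 5e3

n_pol <- 1e5
n_fgu <- rpois(n_pol, lambda)            # ground-up claim counts
n_rep <- sapply(n_fgu, function(n)       # reported claim counts
  sum(rlnorm(n, mu, sigma) > d))

p_exceed <- 1 - plnorm(d, mu, sigma)     # lognormal survival probability at d
mean(n_rep)          # approximately 0.73
lambda * p_exceed    # approximately 0.73
```

The empirical mean of the reported counts matches the thinned rate, which is exactly the adjustment the stanvars code below applies on the log scale.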
For simplicity, I am assuming no population- or group-level effects.
I have copied in some example code below.
My key questions are:
- Is there a way, in a brms multivariate model, to refer to parameters from the other model component? I want the Poisson rate to be adjusted by the lognormal survival probability at the deductible.
- Is it possible to adjust the log-likelihood so that frequency and severity have equal influence on the parameter fit? The Poisson and lognormal log-likelihoods are on quite different scales, so the fit is currently biased towards the severity component.
- The ‘ded * 0’ term in my code is there purely to bring the deductible into the Stan model data. Is there a cleaner way of doing this?
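To put rough numbers on the second bullet, here is an illustration of the scale mismatch (my own, using the parameter values from the simulation code below; this is not a brms mechanism, just the raw per-observation contributions):

```r
# Per-observation log-likelihood contributions at typical values from the
# simulation below: a single lognormal log-density is roughly an order of
# magnitude larger than a single Poisson log-pmf.
dpois(1, 0.73, log = TRUE)        # roughly -1.0
dlnorm(8000, 8, 1.5, log = TRUE)  # roughly -10.5
```

Part of the lognormal magnitude is the constant -log(x) Jacobian term, which does not affect the gradient with respect to mu and sigma, so the practical imbalance may be smaller than the raw numbers suggest.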
Using stanvars, I can refer to Intercept_loss and sigma_loss, which works in this context. However, it would break as soon as I add population- or group-level effects to the severity model, so the approach needs to generalise beyond what it is now.
#### Load Packages ####
library(brms)
library(dplyr)   # for mutate(), filter(), joins and %>%
options(stringsAsFactors = FALSE)
#### Simulate Frequency Data ####
freq_n = 10e3
freq_lambda = 2
freq_data =
  data.frame(
    pol_id = seq(freq_n),
    freq = TRUE,
    sev = FALSE,
    ded = runif(freq_n, 1e3, 10e3),
    claimcount_fgu = rpois(freq_n, freq_lambda),  # ground-up claim counts
    loss = 1  # placeholder; severity rows carry the actual losses
  )
#### Simulate Severity Data ####
sev_mu = 8
sev_sigma = 1.5
sev_data =
  data.frame(
    ded = rep(freq_data$ded, freq_data$claimcount_fgu),
    loss =
      unlist(
        lapply(
          seq(freq_n),
          function(i) {
            rlnorm(freq_data$claimcount_fgu[i], sev_mu, sev_sigma)
          }
        )
      )
  ) %>%
  mutate(
    pol_id = rep(seq(freq_n), freq_data$claimcount_fgu)
  ) %>%
  filter(
    loss > ded  # losses below the deductible go unreported
  ) %>%
  mutate(
    claim_id = row_number(),
    freq = FALSE,
    sev = TRUE,
    claimcount = 0,
    claimcount_fgu = 0
  )
freq_data_net =
  freq_data %>%
  left_join(
    sev_data %>%
      group_by(pol_id) %>%
      summarise(claimcount = n()) %>%
      ungroup(),
    by = "pol_id"
  ) %>%
  mutate(
    claimcount = coalesce(claimcount, 0L)  # policies with no reported claims
  )
#### Join Data ####
full_data =
bind_rows(freq_data_net, sev_data)
#### Multivariate Model ####
stanvars =
  "
  // shift the log-rate f1 by log P(X > deductible),
  // thinning the Poisson rate by the lognormal survival probability
  nlp_claimcount_f1 = nlp_claimcount_f1 +
    log(1 - lognormal_cdf(C_claimcount_1,
                          Intercept_loss,
                          sigma_loss));
  "
fit_freq =
  bf(claimcount | subset(freq) ~ f1 + ded * 0,  # ded * 0 only pulls ded into the Stan data
     f1 ~ 1,
     nl = TRUE) +
  poisson()
fit_sev =
  bf(loss | subset(sev) + trunc(lb = ded) ~ 1) +
  lognormal()
mv_model_fit =
  brm(fit_freq + fit_sev + set_rescor(FALSE),
      data = full_data,
      chains = 1,
      iter = 1000,
      warmup = 250,
      stanvars =
        stanvar(
          scode = stanvars,
          block = "model"
        ),
      prior =
        c(prior(normal(0, 1),
                class = b,
                coef = Intercept,
                resp = claimcount,
                nlpar = f1))
  )
Please let me know if my explanation or code is not clear. Any advice would be very much appreciated!