Operating System: Windows 10
R v. 3.6.0
brms v. 2.15.0
loo v. 2.4.1
I have several GAMMs and would like to choose the best model for my data based on LOO information criterion. However, many of the models have pareto-k values that are too high, as indicated by the loo function. I am concerned that the models may be misspecified, and would like to compare the number of parameters in the models to their p_loo values. However, I’m not sure how to determine the exact number of parameters in my models.
The code for two of the models I am comparing is below, along with the output from running the loo function on each model. The structure of the two models is essentially the same, with the only difference being that in model1 the response variable is log-transformed whereas in model2 the response variable is scaled. The data set contains 24,428 observations (reproductive_state, Rgroup, and animalID are factors with 8, 5, and 68 levels, respectively, scaled_julianDay and scaled_age are numerical, with 366 and 73 unique values, respectively).
model1 <- brm(bf(log_movementrate ~ s(scaled_julianDay,
reproductive_state, bs="fs", xt="cc") +
Rgroup +
s(scaled_age, by=reproductive_state) +
(1 | animalID) +
arma(time = julianDay, gr=animalID, p=4, q=0)),
data = dm, family = gaussian(),
control=list(adapt_delta=0.99, max_treedepth=15))
model2 <- brm(bf(scaled_movementrate ~ s(scaled_julianDay,
reproductive_state, bs="fs", xt="cc") +
Rgroup +
s(scaled_age, by=reproductive_state) +
(1 | animalID) +
arma(time = julianDay, gr=animalID, p=4, q=0)),
data = dm, family = gaussian(),
cores=4,
control=list(adapt_delta=0.99, max_treedepth=15))
loo(model1)
Computed from 4000 by 24428 log-likelihood matrix
Estimate SE
elpd_loo -27721.5 185.5
p_loo 161.6 4.1
looic 55442.9 371.1
------
Monte Carlo SE of elpd_loo is 0.2.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 24426 100.0% 483
(0.5, 0.7] (ok) 2 0.0% 581
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 0 0.0% <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo(model2)
Computed from 4000 by 24428 log-likelihood matrix
Estimate SE
elpd_loo -59886.1 1170.3
p_loo 38954.4 1095.0
looic 119772.2 2340.7
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 16311 66.8% 450
(0.5, 0.7] (ok) 2173 8.9% 111
(0.7, 1] (bad) 1740 7.1% 10
(1, Inf) (very bad) 4204 17.2% 0
See help('pareto-k-diagnostic') for details.
Below are pp_check plots for each model (row 1: pp_check type= “stat” stat= “mean”, row 2: ppc_dens_overlay, row 3: ppc_loo_pit_overlay, row 4: pp_check type= “ecdf_overlay”).
Questions:
- How do I determine the exact number of parameters in my models?
- Why are the p_loo values and pareto-k values so different between the two models, when the models have essentially the same structure?
Any help or suggestions would be appreciated. Thanks!