Determine number of parameters in brms GAMM to compare to p_loo value

Operating System: Windows 10
R v. 3.6.0
brms v. 2.15.0
loo v. 2.4.1

I have several GAMMs and would like to choose the best model for my data based on LOO information criterion. However, many of the models have pareto-k values that are too high, as indicated by the loo function. I am concerned that the models may be misspecified, and would like to compare the number of parameters in the models to their p_loo values. However, I’m not sure how to determine the exact number of parameters in my models.

The code for two of the models I am comparing is below, along with the output from running the loo function on each model. The structure of the two models is essentially the same, with the only difference being that in model1 the response variable is log-transformed whereas in model2 the response variable is scaled. The data set contains 24,428 observations (reproductive_state, Rgroup, and animalID are factors with 8, 5, and 68 levels, respectively, scaled_julianDay and scaled_age are numerical, with 366 and 73 unique values, respectively).

model1 <- brm(bf(log_movementrate ~ s(scaled_julianDay, 
                        reproductive_state, bs="fs", xt="cc") +
                   	Rgroup +
                  	s(scaled_age, by=reproductive_state) + 
                 	(1 | animalID) +
                   	arma(time = julianDay, gr=animalID, p=4, q=0)),
                   data = dm, family = gaussian(),
                   control=list(adapt_delta=0.99, max_treedepth=15))

model2 <- brm(bf(scaled_movementrate ~ s(scaled_julianDay, 
                         reproductive_state, bs="fs", xt="cc") +
                   	 Rgroup +
                  	 s(scaled_age, by=reproductive_state) + 
                  	 (1 | animalID) +
                 	 arma(time = julianDay, gr=animalID, p=4, q=0)),
                   data = dm, family = gaussian(),
                   cores=4,
                   control=list(adapt_delta=0.99, max_treedepth=15))


loo(model1)
Computed from 4000 by 24428 log-likelihood matrix

        	 Estimate    SE
elpd_loo -27721.5 185.5
p_loo       161.6   4.1
looic     55442.9 371.1
------
Monte Carlo SE of elpd_loo is 0.2.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     24426 100.0%  483       
(0.5, 0.7]   (ok)           2   0.0%  581       
(0.7, 1]   (bad)          0   0.0%  <NA>      
(1, Inf)   (very bad)     0   0.0%  <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

loo(model2)
Computed from 4000 by 24428 log-likelihood matrix

         Estimate     SE
elpd_loo -59886.1 1170.3
p_loo     38954.4 1095.0
looic    119772.2 2340.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     16311 66.8%   450       
(0.5, 0.7]   (ok)        2173  8.9%   111       
(0.7, 1]   (bad)       1740  7.1%   10        
(1, Inf)   (very bad)  4204 17.2%   0         
See help('pareto-k-diagnostic') for details.

Below are pp_check plots for each model (row 1: pp_check type= “stat” stat= “mean”, row 2: ppc_dens_overlay, row 3: ppc_loo_pit_overlay, row 4: pp_check type= “ecdf_overlay”).

Questions:

  1. How do I determine the exact number of parameters in my models?
  2. Why are the p_loo values and pareto-k values so different between the two models, when the models have essentially the same structure?

Any help or suggestions would be appreciated. Thanks!

2 Likes

I can’t easily answer, but in order to ping somebody who can, can you clarify for me: Is your question about how to count the parameters in a conceptual sense (i.e. how many effective parameters are in your model)? Or is your question about how to extract the nominal number of parameters from the brms object?

I believe I’m looking for the nominal number of parameters in the brms model; I basically want to find the p value referred to in this post: A quick note what I infer from p_loo and Pareto k values - Modeling - The Stan Forums (mc-stan.org). My understanding, which may be wrong, is that the p_loo value from the output of the loo function gives me the effective number of parameters in the model, and that I should compare this number to the nominal number of parameters in the model, p, to help determine if the model is misspecified.

I think that what you want (but maybe @paul.buerkner can chime in) is available via
rstan::get_num_upars(your_brms_model$fit)

1 Like

I agree that this could be a reasonable choice indeed.

1 Like