I am conducting an analysis of various effects on the harvest of different fish species. Each species is analyzed independently using the same set of 8 nested models. I will not be using the results for prediction; rather, I am trying to identify which predictors modify the response variable and in what direction.
It was suggested to me to use both loo_compare and loo_model_weights for model selection. For some of the species, the model selected by loo_compare is not the same as the model with the highest stacking weight from loo_model_weights, and I do not know how best to proceed with reporting and discussing the results.
I have read some papers and the various documentation, but it is still unclear to me which method would be best for my study, whether another method would be better, or whether I should report the results from both methods (although this sounds like it could be confusing for the reader).
I should add that these are not exactly nested models: I checked each model for multicollinearity and removed the predictor with the highest VIF > 5; this process was then repeated until all VIFs were below 5.
These are so close to each other that we can’t say that one would be better than the others. See more information at CV-FAQ 16 and the reference given there.
Stacking is not model selection but model averaging, and it can give non-zero weight to a model that is on average much worse, because the mixture of weighted models can still be better. For example, if the true distribution were a t-distribution and you have one narrow and one wide normal, one of the normals can be much worse alone, but the mixture can be closer to the t-distribution than either normal, and thus the weights can be non-zero for both.
Your stacking weights are mostly non-zero for the models that have good predictive performance, but for example mod13 has weight 0.106, although its predictive performance alone is much worse than the performance of the best ones. This is likely due to mod13 having thicker tails in the predictive distribution, and adding it to the mixture improves the combination just a little. This hints that it might be possible to modify one of the top models to have an observation model that better matches the true distribution.
These models are indistinguishable based on the predictive performance and the rest have much worse predictive performance.
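To make the two approaches concrete, here is a minimal sketch of how they are typically called with the loo package. The model objects `fit1`–`fit3` are placeholders for fitted rstanarm or brms models (you would have eight of them):

```r
library(loo)

# Hypothetical fitted models fit1..fit3 (e.g. from rstanarm or brms);
# the names and the number of models are placeholders.
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)

# Pairwise elpd differences with standard errors: differences within a
# couple of SEs mean the models are practically indistinguishable.
loo_compare(loo1, loo2, loo3)

# Stacking weights are optimized for the predictive performance of the
# *mixture*, so a model that is worse alone can still get non-zero weight.
loo_model_weights(list(loo1, loo2, loo3), method = "stacking")
```

This is why the two outputs can "disagree": loo_compare ranks individual models, while the stacking weights describe the best-predicting combination of models.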
I didn’t assume they are nested, and the stacking weights even hint that they are not nested, as so many models have non-zero weight.
This will make the model comparison very difficult. Instead, I recommend including all the covariates and using projection predictive variable selection, especially if the only difference between the models is which covariates are used. If you tell me more about the models, I may be able to provide further advice.
I am fitting a neg_binomial_2 family model, which does not seem to be supported by projpred. Here are two figures: first the ppc_dens_overlay, and second the mcmc_areas for the full model for one of my species (one with a particularly large number of zeros!).
I got as far as identifying the solution terms ( soltrms_lat_final in the example).
The example then refers to the main vignette for post-selection inference. That vignette calls the function project(), which does not accept neg_binomial_2 models.
I do not need to make predictions from the final model; rather, I am looking for influential variables and the direction in which they modify the response variable. Can I simply use the identified solution terms to rerun the model on the whole data set?
Did you set latent = TRUE and the other arguments relevant for the latent projection (here: latent_ll_oscale = latent_ll_oscale_nebin and latent_ppd_oscale = latent_ppd_oscale_nebin) also in the project() call, not only in the varsel() or cv_varsel() call? This is explained in section Implementation, but I guess I should have pointed that out more clearly.
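For concreteness, a sketch of repeating the latent-projection arguments in both calls, assuming `refm_fit` is your fitted reference model (a placeholder name) and the `latent_ll_oscale_nebin` / `latent_ppd_oscale_nebin` helper functions are defined as in the latent-projection vignette:

```r
# Variable selection with the latent projection; refm_fit and
# soltrms_lat_final follow the latent-projection vignette's naming.
vs <- cv_varsel(
  refm_fit,
  latent = TRUE,
  latent_ll_oscale = latent_ll_oscale_nebin,
  latent_ppd_oscale = latent_ppd_oscale_nebin
)

# The same latent-projection arguments must be repeated in project()
# when it is called on the reference model fit:
prj <- project(
  refm_fit,
  solution_terms = soltrms_lat_final,
  latent = TRUE,
  latent_ll_oscale = latent_ll_oscale_nebin,
  latent_ppd_oscale = latent_ppd_oscale_nebin
)
```

If the latent arguments are left out of the project() call, projpred falls back to the original (unsupported) neg_binomial_2 family, which is why the call fails.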
That sounds like you don’t have an actual prediction problem but rather want to interpret the parameter estimates. Just be warned that projpred is not made for parameter interpretation, but for prediction problems.
You can avoid repeating the latent-projection arguments by first defining the reference model object of class refmodel via projpred::get_refmodel() (which redirects to projpred:::get_refmodel.stanreg() in your case); it is in this creation of the reference model object that you add the arguments related to the latent projection. Then call varsel(), cv_varsel(), or project() on that predefined reference model object (instead of on the reference model fit), without needing to repeat the latent-projection arguments.
Alternatively, if you have run the code from the latent-projection vignette, you can call project() on the output of varsel() or cv_varsel() (instead of on the reference model fit) without the need for repeating the arguments related to the latent projection.
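The two alternatives above could be sketched as follows; `refm_fit` is a placeholder for the reference model fit, and `nterms = 3` is an arbitrary example size:

```r
# Alternative 1: define the reference model object once, with the
# latent-projection arguments, and reuse it everywhere.
refm <- projpred::get_refmodel(
  refm_fit,  # reference model fit (placeholder name)
  latent = TRUE,
  latent_ll_oscale = latent_ll_oscale_nebin,
  latent_ppd_oscale = latent_ppd_oscale_nebin
)
vs <- cv_varsel(refm)  # no need to repeat the latent arguments

# Alternative 2: call project() on the varsel()/cv_varsel() output,
# which already carries the reference model object internally.
prj <- project(vs, nterms = 3)  # submodel size is a placeholder
```

Either way, the latent-projection arguments are supplied only once.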
All of this might sound a bit complicated now but is in fact a quite natural approach when you get more familiar with the UI of projpred. Anyway, I guess I really need to revise that part of the latent-projection vignette. Sorry for that. I’ll put it on the issue tracker.
Again, thank you both for your help.
I was able to get the projections to work using the latent = TRUE argument.
Getting back to my problem. @avehtar I started down this path at your suggestion that I could use this approach instead of model selection. But once I have selected the best predictor terms for my model, I need to deviate from the vignettes to use those terms for parameter interpretation. Is it simply enough to rerun the model using the identified terms?
I missed this earlier (you tried to tag me, but missed one letter in my username).
For the parameter interpretation, you need to take into account causal assumptions and possible collinearities.
If you want to be careful, we recommend doing the interpretation with the reference model (the big model). Although the projection has a nice decision-theoretic interpretation, the actual implementation can differ from the theory, and we don’t yet have a diagnostic we would be happy to recommend for justifying use of the projected posterior for interpretation (assuming the causal assumptions allow that). For posterior interpretation (again, assuming the causal assumptions allow it), the posterior of the refitted model with the selected variables is likely to be safer than the projected posterior (the variable selection procedure itself is stable), but we still recommend making any causal interpretations using the reference model.
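As a concrete illustration of "refitting with the selected variables", here is a hedged sketch with rstanarm; the data set `dat`, the covariate names, and the assumption that the selection chose `x1` and `x3` are all placeholders:

```r
# Hypothetical reference model with all candidate covariates;
# formula, data, and names are placeholders.
refm_fit <- rstanarm::stan_glm(
  harvest ~ x1 + x2 + x3 + x4,
  family = rstanarm::neg_binomial_2(),
  data = dat
)

# Suppose the projection predictive selection chose x1 and x3:
# refit with only those terms and inspect that posterior, but make
# causal interpretations with refm_fit (the reference model).
sel_fit <- update(refm_fit, formula. = harvest ~ x1 + x3)
summary(sel_fit)
```

The refit here is for reporting effect directions and magnitudes conditional on the selected submodel; it is not a new round of selection.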