The coefficients that are not sensitive to the prior specification remain small when using a wide prior, and I have no reason to believe that the coefficients of the parameters that are sensitive to the prior specification could have a larger impact than the others.
My reference model is the model with all 11 predictors (some are categorical with several levels), and it is likely overfitting beyond 7-8 parameters.
The model with 1 predictor is the most parsimonious, but I guess that carefully interpreting the model with 6 predictors would be fine, given the small improvement in elpd (and that the projpred elpd plot is based on models without monotonic effects) and that the results make sense. Is that correct?
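For reference, a minimal sketch of the kind of projpred workflow I mean (the reference fit `fit_ref` is a placeholder name, not my actual call):

```r
library(projpred)

vs <- cv_varsel(fit_ref)                  # cross-validated variable selection on the brms reference fit
plot(vs, stats = "elpd", deltas = TRUE)   # elpd difference vs. submodel size
suggest_size(vs)                          # heuristic suggestion for the number of predictors
```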
I should have been clearer that, when asking about the model, I consider the prior to be part of the model. The prior matters a lot for whether the model with all predictors overfits or not (we did run the experiments with Noa).
What prior did you use?
For predictive purposes I’m in favor of the bigger model; for explanatory purposes it gets more difficult.
Probably. I don’t have any idea how to interpret any R2 for an ordinal model.
This is not a bad choice, but because some levels (in both the target and the predictors) have so few observations, the predictive performance is also prior sensitive. I did run a few test runs, and I would say you should not trust any single prior or report results based on just a single prior, but illustrate the sensitivity instead. Prior sensitivity is common with rare events, and then you can’t avoid dealing with the complications of the data not being informative on its own.
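As a rough illustration of what I mean by illustrating the sensitivity, something along these lines (the formula and variable names are placeholders, not your model):

```r
library(brms)

# Refit the same model under a few prior widths for the coefficients
scales <- c(0.5, 1, 2.5)
fits <- lapply(scales, function(s) {
  brm(
    outcome ~ x1 + nominalA + mo(ordinalB),              # placeholder formula
    data = d,
    family = cumulative("logit"),
    prior = prior_string(paste0("normal(0, ", s, ")"), class = "b"),
    cores = 4, seed = 123
  )
})

# Compare posterior summaries of the population-level effects across priors
lapply(fits, fixef)
```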
Thank you for your suggestion.
I will then report the main results with prior(normal(0, 1)) and discuss the sensitivity in the supplementary material.
Is there a way to integrate this sensitivity into one final result? With some kind of model averaging maybe?
I will mark your latest reply as the “Solution” but all the posts in this thread were really helpful! Thank you all!
Yes, but then you still probably have sensitivity with respect to the distribution over the different prior parameter values. Really, the problem is having target and predictor levels with 0 observations, which makes the log predictive density and the parameter posteriors weakly identified by the likelihood. It is still possible that some of your quantities of interest and conclusions are not sensitive. So far you have said that you are interested in parameter posteriors and model selection, but it would be interesting to know what the actual scientific question is.
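If you do want a single combined result over a set of candidate priors, one possibility is to stack the fits by cross-validated predictive performance; a minimal sketch, reusing the hypothetical `fits` list from the earlier sketch:

```r
library(loo)

loos <- lapply(fits, loo)                               # PSIS-LOO for each prior variant
wts  <- loo_model_weights(loos, method = "stacking")    # stacking weights per prior
print(wts)
# Weighted posterior predictions could then be formed, e.g., with brms::pp_average()
```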
Please also post here some of the results on how you end up presenting the sensitivity, as this is an interesting example and I’d like to learn more about it; it might help in developing better advice in the future.
Hello again,
I am back to working on this project and have become more familiar with shrinkage priors. I am considering using the R2D2 prior (as suggested by avehtari) on b, as it allows me to retain all predictors. I am using the default mean and precision of the Beta prior (also suggested in Yanchenko et al. 2024, The R2D2 Prior for Generalized Linear Mixed Models), but I slightly increased the concentration to imply less shrinkage: prior(R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 1), class = "b")
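For context, a minimal sketch of how I am passing this prior to brm() (the formula and variable names below are simplified placeholders, not my actual model):

```r
library(brms)

fit_r2d2 <- brm(
  outcome ~ num_std + nominalA + mo(ordinalB),   # placeholder formula with mixed predictor types
  data = d,
  family = cumulative("logit"),
  prior = prior(R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 1), class = "b"),
  cores = 4, seed = 123
)
summary(fit_r2d2)
```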
However, my predictors belong to different classes (numerical [standardized], categorical, and ordinal modeled as monotonic mo()). I came across R2D2M2 prior and monotonic predictors - #3 by Xavier_La_Rochelle, but I can’t dummy-code all my categorical/ordinal covariates (for convergence purposes). My questions:
(1) Can I still interpret coefficients and conditional effects, given that not all predictors are on the same scale?
(2) If yes, is there a general guideline on how to choose the concentration value?
(3) Regarding the interpretation of the results (see below), some coefficients are centered on 0 (b_nominalA_1), which I interpret as shrunk and thus not meaningful. However, the coefficients that seem to have an impact (b_nominalA_2 and A3) are still slightly shrunk toward 0. I would have expected the shrinkage to either pull coefficients to be centered on 0 (strong shrinkage) or leave them largely unaffected (anywhere from slightly encompassing 0 to far from 0). Could someone clarify what I am missing here?
Categorical variables have between 3 and 6 levels. Do I need to (and can I) adjust anything in the prior?
Thank you for the link, this is very interesting! However, I have not been able to reproduce the PAV calibration plots (even looking at the GitHub repo)…
One of the posterior predictive checks seems to indicate that one of the levels (F) is poorly represented in the model, probably due to the low number of observations: [posterior predictive check plot]
I am working on adapting the PAV calibration code to my model; I had an issue with cmdstanr…
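In the meantime, here is the kind of minimal PAV (pool-adjacent-violators, i.e. isotonic regression) reliability plot I am trying to build; this is my own sketch rather than the case-study code, assuming `p` holds posterior predictive probabilities for one response level and `y` the corresponding 0/1 indicator:

```r
# Isotonic (PAV) calibration curve for one response level
ord <- order(p)
pav <- isoreg(p[ord], y[ord])          # isotonic fit of outcomes on predictions
plot(pav$x, pav$yf, type = "s",
     xlab = "Predicted probability",
     ylab = "PAV-calibrated event probability")
abline(0, 1, lty = 2)                  # diagonal = perfect calibration
```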
Would you have other examples for the interpretation of the PAV calibration plot, and guidance on how to fix issues?
For example, the caption of Fig. 29 says that the S-shape means underconfidence in predictions and is due to an implementation error. What would the guidance be if the implementation is correct but the same S-shape issue is found?
I think the wording could be improved (ping @TeemuSailynoja), as the S-shape can also be due to a too rigid model. Any shape deviating significantly from the diagonal is likely due to a too rigid model, for example using a latent linear model when a non-linear model would be needed (e.g. splines or Gaussian processes). Also, when looking at the need for zero-inflation, a too rigid count data distribution can cause miscalibration in the probability of zero.
Maybe try a spline smooth s() for the continuous variables? If you have changed your model since the first version, post the current version and the plots you see.
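A minimal sketch of what that could look like (placeholder formula and variable names, not your actual model):

```r
library(brms)

fit_spline <- brm(
  outcome ~ s(num_std) + nominalA + mo(ordinalB),   # spline smooth on the continuous predictor
  data = d,
  family = cumulative("logit"),
  cores = 4, seed = 123
)
conditional_effects(fit_spline)   # inspect the smooth and the other effects
```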