Guidance on prior predictive checks in an ordinal model

Coefficients that are not sensitive to the prior specification remain small when using a wide prior, and I have no reason to believe that the coefficients that are sensitive to the prior specification could have larger impacts than the others.

My reference model is the one with all 11 predictors (some are categorical with several levels), and it likely overfits above 7-8 parameters.

The model with 1 predictor is the most parsimonious, but I think carefully interpreting the model with 6 predictors would be fine, given the small improvement in elpd (and that the projpred elpd plot is based on models without monotonic effects) and that the results make sense. Is that correct?
(loo_compare output not shown)
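For context, the comparison workflow I used looks roughly like this (a sketch; fit_ref, fit_sub, and fit_ref_nomo are placeholder names for the full model, a smaller candidate, and the reference model refitted without the mo() terms for projpred):

    library(brms)
    library(projpred)

    # LOO comparison of the reference model and a smaller candidate model
    loo_compare(loo(fit_ref), loo(fit_sub))

    # projpred variable selection, run on a refit of the reference model
    # without the mo() terms (as mentioned above)
    vsel <- cv_varsel(get_refmodel(fit_ref_nomo))
    plot(vsel, stats = "elpd")   # the elpd vs. number-of-predictors plot
    suggest_size(vsel)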

A last question would be on computing an R2. I read that bayes_R2() is probably biased for ordinal models, and I saw in another post (R^2 calculation for brm model with cumulative family type - #4 by andymilne) that a Bayesian McKelvey-Zavoina R2 could be a better approximation?
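For reference, this is roughly the computation I have in mind (a sketch only; I am assuming posterior_linpred() returns the latent-scale linear predictor for the cumulative("logit") model, whose standard-logistic residual variance is pi^2/3):

    # McKelvey-Zavoina R2 per posterior draw -- sketch, assuming a fitted model `fit`
    eta <- brms::posterior_linpred(fit)       # draws x observations, latent scale
    var_eta <- apply(eta, 1, var)             # variance of the linear predictor per draw
    r2_mz <- var_eta / (var_eta + pi^2 / 3)   # logit link: latent residual variance = pi^2/3
    quantile(r2_mz, probs = c(0.05, 0.5, 0.95))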


On projpred and monotonic effect:

Will do, thanks!

I should have been more clear that, when asking about the model, I consider the prior to be part of the model. The prior matters a lot for whether the model with all predictors overfits or not (we did run the experiments with Noa).

What prior did you use?

For predictive purposes I’m in favor of the bigger model, for explanatory purposes it gets more difficult.

Probably. I don’t have any idea how to interpret any R2 for an ordinal model.

I only modified the priors on the b class, setting them to normal(0, 1), in the reference model:

brm(
  formula = extinction_cat ~ 
      cov_numeric1
    + cov_numeric2
    + cov_nominal1
    + cov_nominal2
    + cov_nominal3
    + cov_nominal4
    + cov_nominal5
    + cov_nominal6
    + mo(cov_ordinal1)
    + mo(cov_ordinal2)
    + mo(cov_ordinal3)
  , data = df_extinction
  , family = cumulative("logit")
  , prior = c(prior(normal(0, 1), class = "b"))
)

This is not a bad choice, but because some levels (both in the target and in the predictors) have so few observations, the predictive performance is also prior sensitive. I ran a few test runs, and I would say you should not trust any single prior or report results based on just a single prior, but rather illustrate the sensitivity. Prior sensitivity is common with rare events, and then you can’t avoid dealing with the complications of the data not being informative on its own.

Thank you for your suggestion.
I will then report the main results with prior(normal(0, 1)) and discuss the sensitivity in the supplementary material.
Is there a way to integrate this sensitivity into one final result? With some kind of model averaging maybe?
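To make the question more concrete, what I had in mind was something along these lines (a sketch; fit_narrow and fit_wide are placeholder names for the same model fitted under two different prior scales):

    library(brms)
    library(loo)

    # Stack the same model fitted under different priors by predictive performance
    loo_model_weights(
      list(narrow = loo(fit_narrow), wide = loo(fit_wide)),
      method = "stacking"
    )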


I will mark your latest reply as the “Solution” but all the posts in this thread were really helpful! Thank you all!


Yes, but then you still probably have sensitivity with respect to the distribution over the different prior parameter values. Really, the problem is having target and predictor levels with 0 observations, which makes the log predictive density and the parameter posteriors weakly identified by the likelihood. It is still possible that some of your quantities of interest and conclusions are not sensitive. So far you have said you are interested in parameter posteriors and model selection, but it would be interesting to know what the actual scientific question is.

Please also post here some of the results on how you end up presenting the sensitivity analysis, as this is an interesting example and I’d like to learn more about it; it might help in developing better advice in the future.

I’m interested in explaining extinction risk of species (response variable) relative to the biology/ecological traits of species (all predictors).

I will! (I think I will add the power-scaling sensitivity analysis and some discussion in the supplementary material.)
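Roughly what I plan to run (a sketch; function names as I understand the priorsense package):

    library(priorsense)

    # Flags quantities whose posterior is sensitive to power-scaling the prior or likelihood
    powerscale_sensitivity(fit)

    # Diagnostic plots over a sequence of power-scaling amounts
    pss <- powerscale_sequence(fit)
    powerscale_plot_dens(pss)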

Hello again,
I am back to working on this project and have become more familiar with shrinkage priors. I am considering using the R2D2 prior (as suggested by avehtari) on b, as it allows me to retain all predictors. I am using the default mean and precision of the Beta prior (also suggested in Yanchenko et al. 2024 - The R2D2 Prior for Generalized Linear Mixed Models), but I slightly increased the concentration to imply less shrinkage.
prior(R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 1), class = "b")
However, my predictors belong to different classes (numerical [standardized], categorical, and ordinal modeled as monotonic with mo()). I came across R2D2M2 prior and monotonic predictors - #3 by Xavier_La_Rochelle, but I can’t dummy-code all my categorical/ordinal covariates (for convergence purposes).
My questions :
(1) Can I still interpret coefficients and conditional effects, given that not all predictors are on the same scale?
(2) If yes, is there a general guideline on how to choose the concentration value?
(3) Regarding the interpretation of the results (see below), some coefficients are centered on 0 (b_nominalA_1), which I interpret as shrunk and thus not meaningful. However, the coefficients that seem to have an impact (b_nominalA_2 and b_nominalA_3) are still slightly shrunk toward 0. I would have expected the shrinkage to either pull coefficients to be centered on 0 (strong shrinkage) or leave them largely unaffected (from slightly encompassing 0 to far from 0). Could someone clarify what I am missing here?

Thanks!

  • If the scales of different covariates are not hugely different, then the prior is still reasonable
  • For interpretation, it’s better to focus on effects on predictions instead of coefficients (see the sketch below)
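For example (a sketch; fit is the fitted brms model and cov_nominal1 is just one predictor picked for illustration):

    # Effects on the response-category probabilities instead of latent-scale coefficients
    conditional_effects(fit, effects = "cov_nominal1", categorical = TRUE)

    # Or predicted category probabilities for chosen covariate combinations
    posterior_epred(fit, newdata = df_extinction[1:5, ])  # draws x rows x categories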

I have not seen anything concrete beyond doing prior predictive simulations, and this requires more research
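As a simple starting point, the implied Beta prior on R2 for different concentration (precision) values can be simulated directly (a sketch; the shape parameterization below is my reading of the brms R2D2 documentation):

    # Prior draws of R2 under mean_R2 = 0.5 and increasing precision
    mean_R2 <- 0.5
    for (prec_R2 in c(1, 2, 5)) {
      r2 <- rbeta(1e4, shape1 = mean_R2 * prec_R2, shape2 = (1 - mean_R2) * prec_R2)
      print(round(quantile(r2, probs = c(0.05, 0.5, 0.95)), 2))
    }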

Looks like you have quite informative likelihood

You were asking about prior predictive checks, but see also posterior predictive checks for ordinal data in Recommendations for visual predictive checks in Bayesian workflow
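For an ordinal outcome, the discrete bar plots are a natural default (a sketch; ndraws and the grouping variable are arbitrary choices):

    # Observed vs. predicted counts per response category
    pp_check(fit, type = "bars", ndraws = 500)

    # The same, split by the levels of a categorical predictor
    pp_check(fit, type = "bars_grouped", group = "cov_nominal1", ndraws = 500)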

Thank you for your reply!

Categorical variables have between 3 and 6 levels. Do I need to (and can I) adjust anything in the prior?

Thank you for the link, this is very interesting! However, I have not been able to reproduce the PAV calibration plots (even looking at the github repo)…
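In the meantime, one alternative I might try (not the code from the paper’s repository) is the reliabilitydiag package, which implements PAV-based reliability diagrams; the sketch below checks a single response category one-vs-rest, assuming posterior_epred() returns a draws x observations x categories array for the cumulative model:

    library(brms)
    library(reliabilitydiag)

    epred <- posterior_epred(fit)                   # draws x obs x categories
    p_cat1 <- colMeans(epred[, , 1])                # P(category 1) per observation
    y_cat1 <- as.integer(df_extinction$extinction_cat ==
                           levels(df_extinction$extinction_cat)[1])

    rd <- reliabilitydiag(p = p_cat1, y = y_cat1)   # PAV-based reliability diagram
    autoplot(rd)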


One of the posterior predictive checks seems to indicate that one of the levels (F) is poorly represented in the model, probably due to the low number of observations:
[posterior predictive check plot for level F]


But the prior check is not really good either for this level:
[prior predictive check plot for level F]

Is there anything that can be done here (apart from being aware of this and being cautious in the interpretation)?

Probably not

What is the problem you have?

I am working on adapting the PAV calibration code to my model; I had an issue with cmdstanr…

Would you have other examples of how to interpret the PAV calibration plot, and guidance on how to fix issues?
For example: the caption of Fig. 29 says that the S-shape means underconfidence in predictions and is due to an implementation error. What would the guidance be if the implementation is correct but the same S-shape is found?

I think the wording could be improved (ping @TeemuSailynoja), as the S-shape can also be due to a too rigid model. Any shape deviating significantly from the diagonal is likely due to a too rigid model, for example using a latent linear model when a non-linear model would be needed (e.g. splines or Gaussian processes). Also, when looking at the need for zero-inflation, a too rigid count data distribution can cause miscalibration for the probability of zero.

Maybe try spline smooths s() for the continuous variables? If you have changed your model since the first version, post the current version and the plots you see.
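For example, only the formula needs to change (a sketch of the formula change; smooth terms also get sds-class priors, which are set separately from the b-class prior):

    extinction_cat ~ s(cov_numeric1) + s(cov_numeric2) +
      cov_nominal1 + cov_nominal2 + cov_nominal3 +
      cov_nominal4 + cov_nominal5 + cov_nominal6 +
      mo(cov_ordinal1) + mo(cov_ordinal2) + mo(cov_ordinal3)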

I tried spline smooths on the continuous variables, but the results did not make sense when approaching the extreme values. It slightly improved the PAV calibration plots but did not make a huge difference.


Here is the original model:

       brm(
         extinction_cat ~ 
         cov_numeric1
         + cov_numeric2
         + cov_nominal1
         + cov_nominal2
         + cov_nominal3
         + cov_nominal4
         + cov_nominal5
         + cov_nominal6
         + mo(cov_ordinal1)
         + mo(cov_ordinal2)
         + mo(cov_ordinal3)
         , data = df_extinction
         , family = cumulative("logit")
         , prior = c(prior(R2D2(mean_R2 = 0.5, prec_R2 = 2, cons_D2 = 1), class = "b")) # R2D2
         , iter = 18000
         , warmup = 12000
         , thin = 3
         , chains = 3
         , cores = 3
       )

Most posterior predictive checks look good (apart from two variables, see post #30), but I have trouble interpreting the PAV calibration plots (and they do not look as good as in the recommendations in the link you sent):




I also tried a simpler model (threatened vs. not threatened, instead of 5 extinction categories) with family = bernoulli("logit") (still using the R2D2 prior). The posterior predictive checks look better and, I think, so does the PAV calibration plot:

Is there anything I could do to improve the model? I guess it may be the best I can get given the data I have.

These are quite good. As the curves have big steps, it means that most of the predictive information is in covariates that are discrete with few states. As the beginning of the red line is at some points slightly below the blue envelope, it means that for those predicted probabilities the actually observed probability is smaller. As, in addition, the end of the red line is near the top edge of the blue envelope, we can see that the probabilities are slightly shrunk towards mid-range values. We can say the model is underconfident.

This calibration plot has a similar shape, but for the low probabilities the quantile dot plot is hiding the envelope.

It is possible to improve the calibration. Right now the models are underconfident. One option is to give up on improving the model and instead post-process the predicted probabilities so that they are better calibrated. Another option is to continue investigating the reason for the underconfidence. As splines did not help, my next suggestions would be to 1) check possible interactions, and 2) try varying coefficients (a kind of interaction, too).
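For example (formula sketches only; which covariates to combine is a subject-matter choice, so the pairings below are placeholders):

    # 1) Fixed interaction between a continuous and a nominal covariate
    #    (remaining predictors as in the original formula)
    extinction_cat ~ cov_numeric1 * cov_nominal1 + cov_numeric2 + mo(cov_ordinal1)

    # 2) Varying intercept and varying slope of cov_numeric1 across levels of cov_nominal1
    extinction_cat ~ cov_numeric1 + (1 + cov_numeric1 | cov_nominal1) +
      cov_numeric2 + mo(cov_ordinal1)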

Having better calibrated predictive probabilities is especially useful if the goal is prediction, but if you are interested just in the posterior, it is possible that further improving the model would not change the posterior for the quantity of interest.

Can you also show the output of get_prior()? I don’t remember how R2D2 behaves for mo()
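That is, something like this, with the same formula, data, and family as in your brm() call:

    get_prior(
      extinction_cat ~ cov_numeric1 + cov_numeric2 +
        cov_nominal1 + cov_nominal2 + cov_nominal3 +
        cov_nominal4 + cov_nominal5 + cov_nominal6 +
        mo(cov_ordinal1) + mo(cov_ordinal2) + mo(cov_ordinal3),
      data = df_extinction,
      family = cumulative("logit")
    )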

Thanks again for all the help and information!

I am interested in how the response variable changes relative to covariates, not in the distribution of the parameters per se.


So if I am looking at the first PAV plot in my previous post, the model is less confident than it should be when assigning response category 1?
(1) When adding the two interactions that would make sense in my context, the PAV plots stay relatively similar (the first PAV plot is shown below just for comparison):


(2) What should I specifically be looking for here?


Here is the get_prior() output for a monotonic covariate:

    prior         class  coef                 source
    dirichlet(1)  simo   monotonic_covariate  default


Also, maybe I should specify that I have coded the categorical covariates through:
contrasts(covariate) <- contr.sum(number_level_covariate)
which, if I am correct, would now make the intercept represent the grand mean.
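Spelled out for one covariate (a sketch; cov_nominal1 picked as an example):

    # Sum-to-zero (deviation) coding: each level's coefficient is a deviation
    # from the unweighted mean of the level means
    df_extinction$cov_nominal1 <- factor(df_extinction$cov_nominal1)
    contrasts(df_extinction$cov_nominal1) <- contr.sum(nlevels(df_extinction$cov_nominal1))
    contrasts(df_extinction$cov_nominal1)   # the last level gets -1 in every column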

Yes.

That’s all we can infer from there. Still underconfident, which is likely due to the model not being flexible enough.

As I suspected, it’s not included in R2D2. Maybe post the whole output of get_prior()?

I’m not familiar with this (looking forward to learning, but now too late in Finland)

My bad, I did not paste the b class. Here is the full output, where the monotonic terms seem to be integrated in the R2D2 approach:

I think I will go with this model parameterization, but I will regularly check the forum in case people want to chime in or add information on the sum contrast coding for categorical covariates.
Thanks again for all the help and information!
