I fitted stan_jm models with different options for assoc, such as etavalue, etaauc, etc. How can I compare these models? Is there any goodness-of-fit information I can get from them?

This is a general question; I cannot find guidance on how to evaluate models fitted with different options for a parameter.



  • loo for model comparison (e.g. comparing models with different association structures).

  • posterior uncertainty intervals for assessing the magnitude of parameters.

  • ps_check for a rough and ready plot to assess calibration of the survival curve.

  • I once started work on measures of prediction error (e.g. Brier score) and discrimination (time-dependent AUC) – see this branch – but I never finished it for the official rstanarm release.
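As a rough sketch, a loo comparison of two stan_jm fits with different association structures might look like the following (untested; the formulas, data arguments, and time_var value are placeholders you would replace with your own):

```r
# Sketch only: compare stan_jm models with different association structures.
# The stan_jm arguments below are placeholders -- substitute your own
# formulas, data frames, and time variable.
library(rstanarm)
library(loo)

f_etavalue <- stan_jm(formulaLong = ..., formulaEvent = ...,
                      dataLong = ..., dataEvent = ...,
                      time_var = "year", assoc = "etavalue")
f_etaslope <- update(f_etavalue, assoc = "etaslope")

loo1 <- loo(f_etavalue)
loo2 <- loo(f_etaslope)

# Higher elpd_loo means better expected predictive performance, but check
# the Pareto k diagnostics before trusting the comparison.
loo_compare(loo1, loo2)
```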

@jackinovik might also have some suggestions or code base that I haven’t mentioned here.


Sorry for reviving this old thread.
Model comparison using loo/waic seems to be a bit tricky for joint models.
Since joint models generally have many subject-specific parameters (random effects), many observations are highly influential, and removing them may be problematic.
I always get a lot of warnings about p_waic > 0.4 or pareto_k > 0.7 when I apply waic/loo to stan_jm objects.
In this case, should I trust those model comparison results?

Yeah, you are right – I have seen the same thing. loo is aggregated at the level of an individual, not an observation, so "leave one out" in stan_jm means leaving one entire individual out of both the longitudinal and survival submodels. As you have seen, each individual therefore appears far more influential than a single observation would in another model.

How much this matters, or exactly why it happens, I'm not sure, sorry. I'm not really knowledgeable enough about the loo theory and Pareto smoothing. Perhaps someone better versed in that area, like @jonah or @avehtari, can advise.

First you should decide which one you care about computing: leave-one-observation-out or leave-one-individual-out. If you really want leave-one-individual-out, then it seems stan_jm already does that, and high Pareto k values indicate that PSIS-LOO fails (and WAIC fails even more badly), so you should not trust the results (unless elpd_diff is very large). You could try solving the issue with moment matching loo (see Avoiding model refits in leave-one-out cross-validation with moment matching • loo) or with K-fold-CV (see, e.g., Holdout validation and K-fold cross-validation of Stan programs with the loo package • loo).
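In rstanarm, those two remedies might look like the sketch below (untested; `fit` stands for a previously fitted stan_jm object, and the moment_match argument requires a recent rstanarm/loo version and may not be supported for every model type):

```r
# Sketch only: remedies for high Pareto k values, assuming `fit` is a
# previously fitted stan_jm object.
library(rstanarm)

# Moment matching adjusts the importance-sampling draws for the
# problematic individuals without refitting the whole model.
loo_mm <- loo(fit, moment_match = TRUE)

# Exact K-fold cross-validation: refits the model K times, each time
# holding out a subset of individuals, so there is no Pareto k issue.
kf <- kfold(fit, K = 10)
```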

If you would be fine with leave-one-observation-out, then you would need to modify how log_lik is computed in generated quantities, so that log_lik is computed separately for each observation.
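A hypothetical generated quantities fragment for the longitudinal submodel might look like this (the names N, y, mu, and sigma are illustrative, not the actual variable names in the stan_jm-generated Stan code):

```stan
// Hypothetical sketch: one log_lik entry per longitudinal observation,
// rather than one summed entry per individual.
generated quantities {
  vector[N] log_lik;  // N = number of observations, not individuals
  for (n in 1:N)
    log_lik[n] = normal_lpdf(y[n] | mu[n], sigma);
}
```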


Thank you for the advice! I do want leave-one-individual-out.
I will read the linked vignettes and try to address this issue.