I am using cmdstan_diagnose() to show the diagnostic results. However, it takes much longer to print when I calculate log_lik in the model than without log_lik. I might be wrong, but I guess it is because there are many data points, each with its own log_lik distribution, and cmdstan_diagnose() is also "diagnosing" log_lik. My gut feeling is that we should not consider log_lik in cmdstan_diagnose(), since they are generated from the posteriors. Could you please teach me a simple way to exclude log_lik from cmdstan_diagnose()? It seems there is no optional argument for this in the cmdstan_diagnose() function. Should I manually remove log_lik from the fit results before running it? I believe the same applies to cmdstan_summary().
Yeah that’s right. cmdstan_diagnose() and cmdstan_summary() are just calling underlying methods from CmdStan itself, which doesn’t provide a way to select variables. However, if you use fit$summary() (which uses the posterior package) then you can specify which variables to summarize. For example:
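A minimal sketch of such a call. It assumes the logistic example model bundled with CmdStanR stands in for your own fit, and that its parameters are named `alpha` and `beta`; substitute the names from your model.

```r
library(cmdstanr)

# Assumption: the bundled logistic example stands in for your own fitted model
fit <- cmdstanr_example("logistic")

# Summarize only the named variables; log_lik is never touched
summ <- fit$summary(variables = c("alpha", "beta"))
print(summ)
```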
This will give you posterior summary statistics, rhat, and effective sample sizes, but not divergence and treedepth warnings. Those are coming in this pull request, which will be merged soon but is already usable if you want to try it.
Hi @jonah, thanks for your suggestions. May I ask if you have any suggestions for ignoring log_lik when running diagnostics? Should I use your method to export the posteriors without log_lik and then apply a user-defined diagnostic function? But then, as you mentioned, I will lose the divergence info. Do you think it is possible to add this as a new feature to allow something like
In rstan, we have stan_diag() (RStan Diagnostic plots), so I could probably remove log_lik from the fitting result before applying this function. However, it requires an rstan fit object, and converting CmdStan results with a large number of parameters to an rstan fit object usually takes some time.
We can’t do this in CmdStanR without a change in CmdStan because cmdstan_diagnose() is just calling CmdStan’s diagnose utility. But we just merged a pull request on the master branch of CmdStanR that adds a method for getting other diagnostics like divergences. So if you install CmdStanR from GitHub with
devtools::install_github("stan-dev/cmdstanr")
you can use this:
# this will tell you if you have divergence, treedepth or E-BFMI issues and shouldn't be
# affected by a large number of log_lik elements (these are sampler-level diagnostics, not per-variable ones)
# see ?diagnostic_summary for details
fit$diagnostic_summary()
And then you can use fit$summary() to get the r-hat and ess values like I mentioned above.
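Putting the two together, a rough sketch of a log_lik-free diagnostic pass. It again assumes the bundled logistic example as a stand-in fit, that the generated quantity is named `log_lik`, and that `fit$metadata()$stan_variables` lists the model's top-level variable names in your CmdStanR version.

```r
library(cmdstanr)

# Assumption: the bundled logistic example stands in for your own fitted model
fit <- cmdstanr_example("logistic")

# Sampler-level diagnostics: divergences, max treedepth hits, E-BFMI per chain
diag <- fit$diagnostic_summary()
print(diag)

# Every top-level variable except log_lik (assumed name of the generated quantity)
vars <- setdiff(fit$metadata()$stan_variables, "log_lik")

# Per-variable convergence diagnostics for the remaining variables only
summ <- fit$summary(variables = vars)
print(summ[, c("variable", "rhat", "ess_bulk", "ess_tail")])
```

Dropping the variable by name rather than listing the ones to keep means the call does not need updating when parameters are added to the model.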