cmdstan_diagnose() is slow with log_lik included in the model

Hi there,

I am using cmdstan_diagnose() to show the diagnostic results. However, it takes much longer to print when I calculate log_lik in the model than without log_lik. I might be wrong, but I guess it is because there are many data points, each with its own log_lik distribution, and cmdstan_diagnose() is also “diagnosing” log_lik. My gut feeling is that we should not consider log_lik in cmdstan_diagnose(), since it is generated from the posterior. Could you please teach me a simple way to exclude log_lik from cmdstan_diagnose()? It seems there is no such argument in the cmdstan_diagnose() function. Should I manually remove log_lik from the fit results before running it? I believe the same applies to cmdstan_summary().

Thank you very much.

Yeah that’s probably the reason it’s so slow.

Yeah that’s right. cmdstan_diagnose() and cmdstan_summary() are just calling underlying methods from CmdStan itself, which doesn’t provide a way to select variables. However, if you use fit$summary() (which uses the posterior package) then you can specify which variables to summarize. For example:

fit$summary(variables = c("alpha", "beta")) 

or to include everything except log_lik:

exclude_log_lik <- grep("log_lik", fit$metadata()$model_params, value = TRUE, invert = TRUE)
fit$summary(variables = exclude_log_lik)

This will give you posterior summary statistics, rhat, and effective sample sizes, but not divergence and treedepth warnings. Those are coming in this pull request, which will be merged soon but is already usable if you want to try it.
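One thing to watch with the grep() filter above: an unanchored "log_lik" pattern drops every variable whose name contains that string. Here is a minimal sketch of the difference, using a made-up vector of parameter names in place of fit$metadata()$model_params (the names, including log_lik_total, are hypothetical):

```r
# Hypothetical parameter names, standing in for fit$metadata()$model_params.
params <- c("lp__", "alpha", "beta", "log_lik[1]", "log_lik[2]", "log_lik_total")

# Unanchored pattern: drops every name that contains "log_lik".
loose <- grep("log_lik", params, value = TRUE, invert = TRUE)

# Anchored pattern: drops only log_lik itself and its indexed elements.
strict <- grep("^log_lik(\\[|$)", params, value = TRUE, invert = TRUE)

print(loose)
print(strict)
```

If log_lik is the only variable with that prefix in your model, both patterns give the same result and the simpler one is fine.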


Hi @jonah, thanks for your suggestions. Do you have any suggestion for ignoring log_lik when running the diagnostics? Should I use your method to export the posterior draws without log_lik and then apply a user-defined diagnostic function? But then, as you mentioned, I would lose the divergence info. Would it be possible to add this as a new feature, allowing something like

exclude_log_lik <- grep("log_lik", fit$metadata()$model_params, value = TRUE, invert = TRUE)
fit$cmdstan_diagnose(variables = exclude_log_lik)

In rstan, we have stan_diag() (see “Diagnostic plots” in the rstan documentation), so I could probably remove log_lik from the fit before applying this function. However, it requires an rstan fit object, and converting CmdStan results with a large number of parameters to an rstan fit object usually takes some time.


We can’t do this in CmdStanR without a change in CmdStan because cmdstan_diagnose() is just calling CmdStan’s diagnose utility. But we just merged a pull request on the master branch of CmdStanR that adds a method for getting other diagnostics like divergences. So if you install CmdStanR from GitHub with


you can use this:

fit$diagnostic_summary()
# this will tell you if you have divergence, treedepth or E-BFMI issues and shouldn't be
# affected by a large number of log_lik elements (these are sampler-level diagnostics,
# not per-variable ones)
# see ?diagnostic_summary for details

And then you can use fit$summary() to get the r-hat and ess values like I mentioned above.
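Putting the two pieces together, a rough sketch might look like the following. The helper name check_fit_without_log_lik is made up for illustration and is not part of CmdStanR; it assumes fit is a CmdStanMCMC object from a CmdStanR version that has diagnostic_summary().

```r
# Hypothetical helper (not part of CmdStanR): sampler diagnostics plus a
# posterior summary for everything except log_lik.
check_fit_without_log_lik <- function(fit) {
  # Divergences, max treedepth hits and E-BFMI are sampler-level, so the
  # number of log_lik elements does not matter for this call.
  sampler <- fit$diagnostic_summary()

  # Same filter as earlier in the thread: keep every variable whose name
  # does not contain "log_lik".
  all_vars <- fit$metadata()$model_params
  keep <- grep("log_lik", all_vars, value = TRUE, invert = TRUE)

  # Summary statistics, rhat and ESS for the remaining variables only.
  list(sampler = sampler, summary = fit$summary(variables = keep))
}
```

You would call it as res <- check_fit_without_log_lik(fit) and then look at res$sampler and res$summary separately.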


Sounds great. Thanks a lot.
