Thinning and diagnostics

Hi,

I’m trying to deal with some memory constraints caused by the large number of iterations I need. I’m running 4 chains with 4000 iterations each (2000 warmup/2000 sampling). I noticed that when I use “thin=2” I start seeing the warnings below, which don’t appear without thinning.

3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I was expecting these diagnostic checks to be run before thinning. Is this wrong?

Thanks,
Karim

The thin option is used to reduce the number of draws saved (for example, a laptop may be unable to load all draws from very long chains into memory, and thinning helps there). Diagnostics are run only on the saved draws.
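
To make the effect of thin concrete, here is a rough count of the post-warmup draws kept for the settings in the question. It is plain arithmetic on iteration indices, assuming every thin-th post-warmup iteration is kept, not a call into rstan; saved_per_chain is a made-up helper name.

```r
# Settings from the question: 4 chains, 4000 iterations, 2000 warmup.
chains <- 4
iter   <- 4000
warmup <- 2000

# Hypothetical helper: count the post-warmup iterations kept when
# every thin-th iteration is saved.
saved_per_chain <- function(iter, warmup, thin) {
  length(seq(warmup + 1, iter, by = thin))
}

no_thin <- saved_per_chain(iter, warmup, thin = 1)  # 2000
thin_2  <- saved_per_chain(iter, warmup, thin = 2)  # 1000

# Total draws available to the diagnostics in each case.
c(no_thin = chains * no_thin, thin_2 = chains * thin_2)  # 8000 and 4000
```

With thin=2 the diagnostics see half as many draws, which is one plausible reason the ESS estimates can fall below the warning threshold.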

Are your draws taking up too much space, or is there another reason you are thinning?

Yes, I’m running out of RAM when I extract the data to a data.frame. Would it be valid for me to fit the model without thinning and then thin the samples after extracting them?

You can run the diagnostics on the non-thinned chains and, if everything looks fine, thin and then check the ESS or MCSE relevant for the quantities of interest. That ESS warning uses quite a high ESS threshold in order to keep the convergence diagnostics more reliable.
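
A minimal sketch of that workflow, assuming the draws are available as an iterations x chains x parameters array such as the one returned by as.array(fit). The array here is simulated noise so the sketch runs without a fitted model, and thin_draws and the parameter names a, b, c are hypothetical.

```r
# Stand-in for as.array(fit): iterations x chains x parameters.
# Simulated noise is used only so the sketch runs without a fitted model.
set.seed(1)
sims <- array(rnorm(2000 * 4 * 3), dim = c(2000, 4, 3),
              dimnames = list(NULL, paste0("chain:", 1:4), c("a", "b", "c")))

# Hypothetical helper: keep every k-th iteration of every chain.
thin_draws <- function(sims, k) {
  keep <- seq(1, dim(sims)[1], by = k)
  sims[keep, , , drop = FALSE]
}

thinned <- thin_draws(sims, k = 2)
dim(thinned)  # 1000 4 3

# Re-check ESS on the thinned draws for one quantity of interest.
# ess_bulk() and ess_tail() take an iterations x chains matrix.
if (requireNamespace("rstan", quietly = TRUE)) {
  print(rstan::ess_bulk(thinned[, , "a"]))
  print(rstan::ess_tail(thinned[, , "a"]))
}
```

Thinning the array before converting it to a data.frame keeps the larger object from ever being built, which may help with the RAM problem.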

Everything looks fine before I thin. Is there a function in rstan that I can use to check ESS after I thin for the quantities I’m interested in?

See https://rdrr.io/cran/rstan/man/Rhat.html and https://rdrr.io/cran/rstan/man/monitor.html
There are more useful functions in monitor.R
See also https://github.com/avehtari/rhat_ess

Sorry, quick question on ess_bulk and ess_tail: the generated quantity I’m interested in is a matrix. Can I pass them an (iterations * rows * columns) × chains array, or should I pass each cell of the matrix separately, i.e., as an iterations × chains array?

The rstan help page “Rhat: Convergence and efficiency diagnostics for Markov Chains” says that ess_bulk and ess_tail accept

A two-dimensional array whose rows are equal to the number of iterations of the Markov Chain(s) and whose columns are equal to the number of Markov Chains (preferably more than one).

and the rstan help page “monitor: Compute summaries of MCMC draws and monitor convergence” says that monitor accepts

A 3-D array (iterations * chains * parameters) of MCMC simulations from any MCMC algorithm.
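
In practice this means each cell of the matrix is treated as its own parameter. A minimal sketch, assuming the matrix is a 2 x 2 generated quantity called M (a hypothetical name) and the draws have the iterations x chains x parameters layout described above; the array is simulated here so the code runs without a fitted model, and per_cell is a made-up helper.

```r
# Stand-in for rstan::extract(fit, pars = "M", permuted = FALSE):
# iterations x chains x parameters, one "parameter" per matrix cell.
set.seed(2)
cells <- c("M[1,1]", "M[2,1]", "M[1,2]", "M[2,2]")
sims <- array(rnorm(1000 * 4 * 4), dim = c(1000, 4, 4),
              dimnames = list(NULL, NULL, cells))

# Hypothetical helper: apply a diagnostic to each iterations x chains slice.
per_cell <- function(sims, diag_fun) {
  apply(sims, 3, diag_fun)
}

if (requireNamespace("rstan", quietly = TRUE)) {
  # One ESS value per cell of M.
  print(per_cell(sims, rstan::ess_bulk))
  print(per_cell(sims, rstan::ess_tail))
  # monitor() accepts the whole 3-D array; warmup = 0 because the
  # array holds post-warmup draws only.
  rstan::monitor(sims, warmup = 0)
}
```

So ess_bulk and ess_tail get one iterations × chains slice per cell, while monitor can take the full 3-D array in a single call.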

Awesome, thanks. I think I figured it out. It looks like my bulk and tail ESS are both > 100, so thinning is working for these parameters.