Hi, Stan experts
I am a beginner in Bayesian modeling and Stan. I love Stan because it is very powerful, but I have several questions about the sampling and optimizing functions.

Stan reports n_eff, the number of effective samples from MCMC sampling. However, there seem to be no markers indicating which samples are considered ‘effective’; I just get all of the samples after warmup. Is that correct?

I am not sure how the ‘mean’ of a variable in the output is computed. I took the mean of all the samples I obtained, and the result is not equal to the value shown in the output. Does the ‘mean’ refer to the mean of the effective samples only?

Using the optimizing function, I can get a point estimate of a variable. Can Stan also provide the log likelihood of the observations given this fitted value? If not, is there an easy way to calculate it?

Or, more generally, can someone point me to explanations of all the output metrics?

Hi, here’s a quick response to at least the first two of your questions.

Yes. There is no distinction between effective and ineffective samples; they all carry information. The point is that MCMC samples are autocorrelated, so a given number of them carries less information than the same number of independent samples would. If, for instance, you decide you want at least the equivalent of 200 independent samples for each parameter, you can check that n_eff is above 200 for every parameter (which will take more than 200 MCMC samples). If you end up with a very large number of samples, thinning can be a relevant option to reduce storage.
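To make the autocorrelation point concrete, here is a minimal Python sketch that estimates an effective sample size as N / (1 + 2 * sum of autocorrelations). It is a simplified single-chain estimator with a crude truncation rule (stop at the first non-positive autocorrelation), not Stan's actual algorithm, which combines multiple chains and uses a more careful truncation. The AR(1) chain is an assumed stand-in for real MCMC draws, and all function names are hypothetical.

```python
import random
import statistics

def ar1_chain(n, phi, seed=0):
    """Generate an AR(1) chain as a stand-in for autocorrelated MCMC draws.
    phi = 0 gives independent draws; phi near 1 gives strong autocorrelation."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = statistics.fmean(x)
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n
    return cov / var

def ess(x, max_lag=None):
    """Simplified ESS: N / (1 + 2 * sum of positive-leading autocorrelations).
    Summation stops at the first non-positive autocorrelation."""
    n = len(x)
    max_lag = max_lag or n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

correlated = ar1_chain(2000, 0.9, seed=1)
independent = ar1_chain(2000, 0.0, seed=2)
print(round(ess(correlated)), round(ess(independent)))
```

All 2000 draws of the correlated chain are kept and used; the ESS just tells you they are worth far fewer independent draws, while the independent chain's ESS stays close to 2000.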

As far as I know, the returned posterior mean is the mean of the posterior samples for the parameter in question. If you cannot reproduce this quantity, I would double-check how you obtained your own statistic. Make sure you applied the mean to the samples for the parameter in question.

This I may not be able to answer, as I rarely use the optimizing function. Still, as a general note on producing log-likelihoods in Stan, you could have a look at the code at the end of Gelman’s paper on WAIC and LOO: http://www.stat.columbia.edu/~gelman/research/unpublished/loo_stan.pdf

My best suggestion would be to look at the Stan and RStan manuals.

The posterior mean is the mean of all the draws after warmup, pooled across all chains.
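A common reason a hand-computed mean does not match is including warmup draws or using only one chain. Here is a small Python sketch, assuming a hypothetical layout where each chain's draws for one parameter are a list with the warmup iterations still at the front; the numbers are made up purely for illustration.

```python
import statistics

def posterior_mean(chains, num_warmup):
    """Mean of all post-warmup draws, pooled across chains.

    `chains` is a list of per-chain draw lists for a single parameter,
    each still containing `num_warmup` warmup draws at the front
    (a hypothetical layout used only for this illustration).
    """
    kept = [d for chain in chains for d in chain[num_warmup:]]
    return statistics.fmean(kept)

# Two toy chains with 2 warmup draws each (made-up numbers).
chains = [[10.0, 8.0, 1.0, 2.0, 3.0],
          [9.0, 7.0, 2.0, 3.0, 4.0]]

print(posterior_mean(chains, num_warmup=2))      # post-warmup, all chains: 2.5
print(posterior_mean(chains, num_warmup=0))      # warmup wrongly included: 4.9
print(posterior_mean(chains[:1], num_warmup=2))  # one chain only: 2.0
```

Only the first call corresponds to the reported posterior mean; the other two show how easy it is to get a different number.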

Yes, but it drops constant terms in any expression involving a ~ statement. If you want those terms, you need to use target += foo_lpdf(y | ...) instead.
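To see what the dropped constant amounts to, here is a minimal Python sketch for a normal likelihood. It assumes both mu and sigma are parameters, so the only constant term is -0.5 * log(2π) per observation (if sigma were data, the -log(sigma) term would be constant too). The function names, data, and point estimate are hypothetical; plugging a point estimate, such as the one from optimizing, into the full version gives the log likelihood of the observations at that value.

```python
import math

def normal_loglik_full(y, mu, sigma):
    """Full normal log likelihood, the quantity target += normal_lpdf(y | mu, sigma) adds."""
    return sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
               - 0.5 * ((v - mu) / sigma) ** 2 for v in y)

def normal_loglik_dropped(y, mu, sigma):
    """Same expression with the constant -0.5*log(2*pi) removed, mimicking
    what y ~ normal(mu, sigma) contributes when mu and sigma are parameters."""
    return sum(-math.log(sigma) - 0.5 * ((v - mu) / sigma) ** 2 for v in y)

# Hypothetical observations and a hypothetical point estimate.
y = [1.2, 0.7, 2.1, 1.5]
mu_hat, sigma_hat = 1.375, 0.5

full = normal_loglik_full(y, mu_hat, sigma_hat)
dropped = normal_loglik_dropped(y, mu_hat, sigma_hat)
# The gap is exactly len(y) * 0.5 * log(2*pi), independent of mu and sigma.
print(full, dropped, dropped - full)
```

Because the gap does not depend on the parameters, dropping it does not change sampling or the location of the optimum; it only matters when you need the actual log likelihood value.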

The MCMC chapter of the manual explains what all those numbers are and how they’re all computed and why we report the things that we do.