Questions from a beginner

fenbukaide · October 24, 2017, 12:18am

Hi, Stan experts
I am a beginner in Bayesian modeling and Stan. I love Stan since it is very powerful. But I have several questions about the sampling and optimizing functions.

1. Stan provides the n_eff number that indicates the number of effective samples in MCMC sampling. However, there seem to be no markers showing which samples are considered ‘effective’. I only get all samples after warm-up. Is that correct?
2. I am not sure how the ‘mean’ of a variable in the output is derived. I computed the mean of all the samples I obtained, and the resulting number does not equal the value reported in the output. Does the ‘mean’ refer to the mean of the effective samples only?
3. The optimizing function gives a point estimate of a variable. Can Stan also provide the log likelihood of the observations given this fitted value? If not, is there an easy way to calculate it?
4. Or, in general, can someone point me to explanations of all the output metrics?

Thanks so much for your time
best
ruyuan

novice · October 24, 2017, 8:53am

Hi, here’s a quick response to at least the first two of your questions.

1. Yes. There is no distinction between effective and ineffective samples; they all carry information. The point is that MCMC samples are autocorrelated, so a given number of samples carries less information than the same number of independent samples would. If you decide, for instance, that you want at least the equivalent of 200 independent samples for each parameter, you can make sure that your number of effective samples is higher than this (which will usually take more than 200 MCMC samples). If you end up with a high number of samples, thinning can be a relevant option.
2. As far as I know, the returned posterior mean is the mean of the posterior samples for the parameter in question. If you cannot reproduce this quantity, I would double check how you obtained your own statistic. Make sure you applied the mean to the samples for the parameter in question.
3. This I may not be able to answer, as I rarely use the optimizing function. Still, as a general note on producing log-likelihoods in Stan, you could have a look at the code at the end of Gelman’s paper on WAIC and LOO: http://www.stat.columbia.edu/~gelman/research/unpublished/loo_stan.pdf
4. My best suggestion would be to look at the Stan and RStan manuals.
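
To make the first two answers a bit more concrete, here is a minimal Python sketch. It is not how Stan computes n_eff (as far as I know, Stan's estimator combines several chains and is more careful about truncation); the function names ar1_chain, autocorr and crude_ess are made up for this illustration, and the chain is simulated rather than taken from a real fit. It only shows that the mean uses every draw, while the effective sample size is a single summary number derived from the autocorrelation.

```python
import random

def ar1_chain(n, phi, seed=1):
    """Simulate an autocorrelated chain: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n
    return cov / var

def crude_ess(x):
    """n / (1 + 2 * sum of autocorrelations), stopping at the first
    non-positive lag. A simplified estimator, assumed for illustration."""
    n = len(x)
    total = 0.0
    for lag in range(1, n):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)

draws = ar1_chain(2000, phi=0.9)           # strongly autocorrelated draws
posterior_mean = sum(draws) / len(draws)   # the mean uses all draws
ess = crude_ess(draws)                     # one number, no per-draw labels

iid_draws = ar1_chain(2000, phi=0.0)       # independent draws for comparison
iid_ess = crude_ess(iid_draws)
```

With phi=0.9 the crude estimate comes out far below the 2000 draws, while for the independent chain it stays close to 2000. No individual draw is marked as effective or ineffective in either case.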

Cheers,
Joergen

Bob_Carpenter · November 2, 2017, 9:31pm

The posterior mean will be the mean of all the draws after warmup.

Yes, but it drops constant terms in any expressions involving the ~ statement — if you want those, you need to use target += foo_lpdf(y | ...) instead.
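
As a rough illustration of what dropping constant terms means, here is a small Python sketch for a normal likelihood with known sigma. This is not Stan code or Stan's internals; the helper names normal_lpdf and normal_lpdf_dropped and the data values are hypothetical. The point is only that the two versions differ by an offset that does not depend on the parameter, so the optimum is the same but the reported log density is not the full log likelihood.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Full log density of y ~ Normal(mu, sigma), summed over observations."""
    return sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
               - 0.5 * ((v - mu) / sigma) ** 2 for v in y)

def normal_lpdf_dropped(y, mu, sigma):
    """Same density with the terms that do not depend on mu left out."""
    return sum(-0.5 * ((v - mu) / sigma) ** 2 for v in y)

y = [1.2, 0.7, 1.9, 1.1]   # hypothetical observations
sigma = 1.0                # treated as known, so only mu is a parameter

# The gap between the two versions is the same for every value of mu.
offsets = [normal_lpdf(y, mu, sigma) - normal_lpdf_dropped(y, mu, sigma)
           for mu in (0.0, 1.0, 2.5)]
expected_offset = -len(y) * (0.5 * math.log(2 * math.pi) + math.log(sigma))
```

Because the offset is constant in mu, both versions peak at the same mu; to recover the full log likelihood you need the version that keeps every term.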

The MCMC chapter of the manual explains what all those numbers are and how they’re all computed and why we report the things that we do.
