# Unit testing sampling

This is a question about unit testing samplers.

1. I read the wiki page and I am a bit confused. Suppose that in a given model I have a parameter f, I set it to a known value f₀, simulate data, and using that data I run Stan (with the model) and get posterior draws f_hat after burn-in. Then in

Δ = (f_hat − E[f]) / MCMC-SE ~ N(0, 1)

should I use E[f] = f₀, or the sample average of f_hat? Similarly, is MCMC-SE the standard error of f_hat, calculated using the effective sample size?

2. Cook, Gelman and Rubin (2006) [Validation of Software for Bayesian Models…] propose using posterior quantiles of the true value, transformed via a unit normal into a χ² statistic. Is that related to, or complementary with, the approach above?

Yes, MCMC-SE is the MCMC standard error, which is estimated by dividing the estimated posterior standard deviation by the square root of the estimated effective sample size.
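As a minimal sketch of that formula (the function name and inputs here are illustrative, not Stan's API — in practice Stan reports both the posterior standard deviation and the effective sample size for you):

```python
import numpy as np

def mcmc_se(draws, ess):
    """MCMC standard error: estimated posterior standard deviation
    divided by the square root of the estimated effective sample size.
    `draws` is a 1-D array of posterior draws; `ess` is the ESS estimate."""
    return np.std(draws, ddof=1) / np.sqrt(ess)
```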

@betanalpha will have to answer the rest about (1), but I can say the technique is different from Cook et al.'s, which looks at posterior quantiles of the simulated true values, uses the inverse CDF of a standard normal to map them back to what should be standard-normal draws, and then tests whether that actually holds.

Regarding (1), no. The MCMC Central Limit Theorem that you quote makes no statement about how close the MCMC estimator is to the true value of the parameters.

Plugging data into the likelihood and combining it with the prior gives you a posterior distribution, and we extract information from that distribution by computing posterior expectations, E[f]. The MCMC Central Limit Theorem says that for certain well-behaved Markov chains the MCMC estimator of a given posterior expectation, f_hat, will be close to the true expectation E[f]. In fact it will be unbiased with standard error MCMC-SE relative to all possible realizations of the Markov chain. But if your model is poorly specified then those posterior expectations could be very far off from the true values, and MCMC can’t fix a bad model.
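To make the CLT statement concrete, here is a toy sketch using independent draws from a known distribution in place of a real Markov chain (the target expectation of 1.0 and the ESS ≈ N shortcut are assumptions of this toy, not part of the Stan workflow):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "posterior draws"; in practice these come from a Stan fit.
draws = rng.normal(loc=1.0, scale=2.0, size=4000)

# Independent draws, so ESS is just the number of draws here.
ess = draws.size
f_hat = draws.mean()
mcmc_se = draws.std(ddof=1) / np.sqrt(ess)

# The CLT statement: (f_hat - E[f]) / MCMC-SE ~ N(0, 1) over realizations
# of the chain, where E[f] is the exact posterior expectation (here 1.0).
# It says nothing about how close E[f] is to any "true" parameter value.
delta = (f_hat - 1.0) / mcmc_se
```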

Note that all of this assumes that the MCMC Central Limit Theorem holds, which is not guaranteed in practice. This is why I’m always yelling about divergences and other diagnostics that identify obstructions to the Central Limit Theorem holding in practice.

The only guarantees that we have about calibration of posterior expectations against true parameter values hold over ensembles of experiments, which leads to (2). In other words, even if the model is well specified, there is no guarantee that in any given analysis the posterior captures the true value. But it will over an ensemble of analyses following the procedure in Cook et al.
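The ensemble calibration idea can be sketched with a conjugate-normal toy model where the exact posterior is known in closed form (the prior, likelihood, and closed-form posterior here are assumptions for illustration; a real check would draw the truth from the prior, simulate data, and fit with Stan):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n_sims = 500
z = np.empty(n_sims)

for i in range(n_sims):
    theta = rng.normal(0.0, 1.0)        # "true" value drawn from the prior
    y = rng.normal(theta, 1.0)          # one simulated observation
    # Exact conjugate posterior is N(y/2, 1/2); a real check would use
    # draws from a Stan fit of the same model here instead.
    draws = rng.normal(y / 2.0, np.sqrt(0.5), 1000)
    q = (draws < theta).mean()          # posterior quantile of the truth
    q = min(max(q, 1e-6), 1.0 - 1e-6)   # keep inv_cdf finite
    z[i] = NormalDist().inv_cdf(q)      # map quantile back to a z-score

# If the software is correct, the z-scores should look standard normal
# across the ensemble of simulations, as in Cook et al.
```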


So effectively, to use the heuristic integration testing procedure, I would have to sample from a distribution for which I know E[f]?

Also, what do you mean by

look at Delta = (f_hat - E[f]) / MCMC-SE and compare to the distribution N(0, 1)

since f_hat is a single value from each chain. Should the f_hats from multiple chains be compared to N(0, 1) with a test, say Kolmogorov-Smirnov? Or p-values for single f_hats?

Finally, where can I find these tests in the codebase? I found the test directory in stan-dev, but nothing like the example in the wiki.

The closest thing we have is this repo from Michael:

We’re still working on getting this all better integrated into the overall workflow.

This should hold for all the quantities of interest over multiple simulations of data and fit. You could also run something like the Cook-Gelman-Rubin diagnostics.

What method is used to calculate the “fit” values for various models?

We’re just using Stan and other algorithms we’re evaluating to do the fitting. Then just doing things like simple greater-than comparisons.
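Those greater-than comparisons amount to computing the rank of the simulated true value among the posterior draws, which can be sketched like this (the draws and true value here are stand-ins, not output from an actual fit):

```python
import numpy as np

rng = np.random.default_rng(2)
draws = rng.normal(size=1000)   # stand-in posterior draws from one fit
theta_true = 0.3                # stand-in simulated "true" value

# Rank of the true value via simple greater-than comparisons; across many
# simulated datasets this rank should be uniform on 0..len(draws).
rank = int((draws < theta_true).sum())
```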