8 school problem question

Yiu_Lau · February 23, 2018, 11:38pm

In this case study http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

it was claimed that the true value of the marginal posterior mean of log(tau) is 0.7657852.

How was this derived? Do we know that the marginal posterior distribution has a closed-form solution, or was it calculated some other way? I suspect it was done by Gibbs sampling, because the model is very close to conditionally conjugate.

Bob_Carpenter · February 27, 2018, 7:42am

We derive reference results with long, carefully validated runs of NUTS.

avehtari · February 27, 2018, 8:04am

BDA[1-3] chapter 5 shows that the 8 schools posterior can be factored so that the marginal posterior of tau can be computed with one-dimensional quadrature. Thus the expectation of log(tau) can be computed with arbitrary accuracy without Monte Carlo.
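
For reference, the factorization in question can be written as follows. This is a sketch in the notation of the thread (theta for the school effects, y for the data), not a quotation from the book:

$$
p(\theta, \mu, \tau \mid y) = p(\tau \mid y)\, p(\mu \mid \tau, y)\, p(\theta \mid \mu, \tau, y),
\qquad
\mathbb{E}[\log \tau \mid y] = \frac{\int_0^\infty \log(\tau)\, q(\tau)\, d\tau}{\int_0^\infty q(\tau)\, d\tau},
$$

where $q(\tau) \propto p(\tau \mid y)$ is the unnormalized marginal. Both integrals are one-dimensional, so mu never has to be integrated numerically.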

betanalpha · February 27, 2018, 7:21pm

In that particular case study the reference was calculated as @Bob_Carpenter described – by using a very long and validated run of the dynamic Hamiltonian Monte Carlo implementation in Stan.

Yiu_Lau · March 1, 2018, 8:00pm

I don’t understand this. The point of the case study is to show that NUTS samples are biased, because the chain can explore the highly correlated region only very rarely in order to maintain detailed balance. Wouldn’t a long NUTS chain then also give biased estimates for tau? That is, every time the chain gets near the problematic spot, it overcompensates by biasing in the other direction, and a long chain would simply repeat this phenomenon. For example, when you say you ran a really long NUTS chain to calculate tau, did you tune it to the point where it encountered no divergences? Or did you simply disregard the divergences? If there were divergences while sampling this long chain, wouldn’t that go against the idea that divergences result in biased estimates?

Yiu_Lau · March 1, 2018, 9:09pm

Maybe I am not understanding correctly, but I think chapter 5 of BDA3 (I assume it’s the bit around equation (5.5)) only gives the expression for p(mu, tau | data), because phi = (mu, tau) in this example. To calculate p(tau | data) we would then still need to integrate out mu.

So do you mean doing a one-dimensional quadrature over mu first and then over tau, or doing a two-dimensional quadrature?

betanalpha · March 2, 2018, 12:02am

A long chain does indeed repeat the oscillating bias for the centered parameterization. The baseline value is derived from a fit with the non-centered parameterization that does not suffer from this problem. That’s what @Bob_Carpenter and I meant by “validated run”.

avehtari · March 2, 2018, 8:20am

Equation 5.21 gives the analytical form of the unnormalized marginal p(tau|y). This is a one-dimensional function and is easy to integrate by quadrature with arbitrary accuracy. As Mike wrote, this approach was not used in that specific case study, but it would be easy to check the result this way, too.
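
A minimal sketch of such a check, in plain Python with no dependencies. It assumes the setup of BDA3 equation (5.21), that is, a flat prior on mu and, by default, a flat prior on tau. The function names, the `log_prior` argument, the grid limits, and the choice of the trapezoid rule on u = log(tau) are all illustrative assumptions. The case study's Stan model places its own priors on mu and tau, so the number this produces should not be expected to match 0.7657852 unless the priors are made to agree.

```python
import math

# Eight schools data: estimated effects and their standard errors.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]


def log_marginal_tau(tau, log_prior=lambda tau: 0.0):
    """Unnormalized log p(tau | y) with mu integrated out analytically.

    Assumes a flat prior on mu, as in BDA3 eq. (5.21). `log_prior` is a
    hypothetical hook for the log prior density of tau (default: flat).
    """
    v = [s * s + tau * tau for s in sigma]      # marginal variance of each y_j
    precision = sum(1.0 / vj for vj in v)       # 1 / V_mu
    mu_hat = sum(yj / vj for yj, vj in zip(y, v)) / precision
    out = log_prior(tau) - 0.5 * math.log(precision)   # p(tau) * V_mu^(1/2)
    for yj, vj in zip(y, v):
        out += -0.5 * math.log(vj) - 0.5 * (yj - mu_hat) ** 2 / vj
    return out


def expected_log_tau(log_prior=lambda tau: 0.0, lo=-30.0, hi=12.0, n=20001):
    """E[log tau | y] by the trapezoid rule on the grid u = log(tau)."""
    h = (hi - lo) / (n - 1)
    us = [lo + i * h for i in range(n)]
    # Log density of u: log p(tau | y) plus the log Jacobian, log(tau) = u.
    logf = [log_marginal_tau(math.exp(u), log_prior) + u for u in us]
    m = max(logf)                               # subtract max to avoid underflow
    w = [math.exp(lf - m) for lf in logf]
    w[0] *= 0.5                                 # trapezoid end-point weights
    w[-1] *= 0.5
    # The grid spacing h cancels in the ratio of the two integrals.
    return sum(u * wi for u, wi in zip(us, w)) / sum(w)


if __name__ == "__main__":
    print(expected_log_tau())
```

Refining the grid (larger `n`) or widening `lo` and `hi` gives a direct check on the quadrature error. Integrating over log(tau) rather than tau is a design choice: it spreads the grid points sensibly across small and large values of tau.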
