8 schools problem question

In this case study http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

it was claimed that the true value of the marginal posterior mean of log(tau) is 0.7657852.

How was this derived? Do we know that the marginal posterior distribution has a closed-form solution, or was it calculated some other way? I suspect it was done by Gibbs sampling, because the model is very close to conditionally conjugate.

We derive reference results with long, carefully validated runs of NUTS.

BDA[1-3] chapter 5 shows that the 8 schools posterior can be factored so that the marginal posterior of tau can be computed with one-dimensional quadrature. Thus the expectation of log(tau) can be computed to arbitrary accuracy without Monte Carlo.
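For concreteness, here is a sketch of that one-dimensional quadrature in Python, using the unnormalized marginal p(tau | y) from BDA3 eq. 5.21 with the flat priors on mu and tau used there. Note these are not the case study's priors, so the resulting number is not expected to match 0.7657852; the point is only that the expectation reduces to a 1-D integral.

```python
import numpy as np

# Eight schools data (Rubin 1981; BDA3 section 5.5)
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
s = np.array([15., 10., 16., 11., 9., 11., 10., 4.])

# Midpoint grid over tau; the integrand is negligible well before tau = 100
tau = np.arange(0.005, 100.0, 0.01)

# Unnormalized log p(tau | y), BDA3 eq. 5.21, with flat priors on mu and tau
v = s**2 + tau[:, None]**2                  # sigma_j^2 + tau^2, shape (n_grid, 8)
V_mu = 1.0 / np.sum(1.0 / v, axis=1)        # conditional posterior variance of mu
mu_hat = V_mu * np.sum(y / v, axis=1)       # conditional posterior mean of mu
lp = (0.5 * np.log(V_mu)
      - 0.5 * np.sum(np.log(v), axis=1)
      - 0.5 * np.sum((y - mu_hat[:, None])**2 / v, axis=1))

# On a uniform grid the spacing cancels when normalizing, so plain sums suffice
q = np.exp(lp - lp.max())
e_log_tau = np.sum(np.log(tau) * q) / np.sum(q)
print(round(e_log_tau, 3))
```

Swapping in the case study's actual priors just means adding their log density to `lp` (and, for a non-flat prior on mu, redoing the mu integral accordingly).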

In that particular case study the reference was calculated as @Bob_Carpenter described – by using a very long and validated run of the dynamic Hamiltonian Monte Carlo implementation in Stan.

I don’t understand this. Since the point of the case study is to show that NUTS samples are biased because the chain can only explore the highly correlated region very rarely in order to maintain detailed balance, wouldn’t a long NUTS chain also give biased estimates for tau? That is, every time the chain goes near the problematic gradient region, it overcompensates by biasing in the other direction, and a long chain would simply repeat this phenomenon. For example, when you ran a really long NUTS chain to calculate tau, did you tune it to the point where it encountered no divergences, or did you simply disregard the divergences? If there were divergences while sampling this long chain, wouldn’t that go against the idea that divergences result in biased estimates?

Maybe I am not understanding correctly, but I think chapter 5 of BDA3 (I assume it’s the bit around equation (5.5)) only gives the expression for p(mu, tau | data), because phi = (mu, tau) in this example. Then to calculate p(tau | data) we still need to integrate out mu.

So do you mean doing one-dimensional quadrature over mu first and then over tau, or doing a two-dimensional quadrature?

A long chain does indeed repeat the oscillating bias for the centered parameterization. The baseline value is derived from a fit with the non-centered parameterization that does not suffer from this problem. That’s what @Bob_Carpenter and I meant by “validated run”.


Equation 5.21 gives the analytical form of the unnormalized marginal p(tau|y). This is a one-dimensional function and easy to integrate with quadrature to arbitrary accuracy. As Mike wrote, this approach was not used in that specific case study, but it would be easy to check the result this way, too.
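If the analytic marginalization of mu feels opaque, a brute-force two-dimensional grid quadrature over (mu, tau) gives the same answer and sidesteps the algebra entirely. A minimal sketch, with the theta_j integrated out analytically; the priors here (mu ~ normal(0, 5), tau ~ half-Cauchy(0, 5)) are my recollection of the case study's model and should be checked against the actual Stan program:

```python
import numpy as np

# Eight schools data (Rubin 1981)
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
s = np.array([15., 10., 16., 11., 9., 11., 10., 4.])

# Uniform grids; the posterior is negligible outside these ranges
mu = np.arange(-30.0, 50.0, 0.25)
tau = np.arange(0.05, 80.0, 0.1)
T, M = np.meshgrid(tau, mu, indexing="ij")    # both shape (n_tau, n_mu)

# Log likelihood with the theta_j integrated out analytically:
#   y_j | mu, tau ~ normal(mu, sqrt(sigma_j^2 + tau^2))
# (additive constants are dropped; they cancel on normalization)
V = s**2 + T[..., None]**2                    # shape (n_tau, n_mu, 8)
loglik = np.sum(-0.5 * np.log(V) - 0.5 * (y - M[..., None])**2 / V, axis=-1)

# Assumed priors: mu ~ normal(0, 5), tau ~ half-Cauchy(0, 5)
logprior = -0.5 * (M / 5.0)**2 - np.log1p((T / 5.0)**2)

# Grid spacings cancel when normalizing, so plain sums implement the quadrature
lp = loglik + logprior
w = np.exp(lp - lp.max())
p_tau = w.sum(axis=1)                         # integrate out mu -> p(tau | y)
e_log_tau = np.sum(np.log(tau) * p_tau) / np.sum(p_tau)
print(round(e_log_tau, 3))
```

Summing `w` over the mu axis is exactly "integrate out mu", answering the 1-D vs. 2-D question above: either order works, and on a grid the two views coincide.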
