A general approach to reparameterization


#1

Hi all,

In finite-dimensional models, a Central Limit Theorem argument (the Bernstein-von Mises theorem) can usually be applied to show that the posterior distribution converges to a multivariate normal as the amount of data grows. However, Stan can have trouble sampling from multivariate normals when the parameters are highly correlated. My idea is to use draws from an initial “bad” parameterization to locate a better one.

Suppose f(y|theta) is the model and f(theta) is the prior. What do you think about the following general strategy:

  1. Obtain an initial set of draws from the posterior, which is proportional to f(y|theta)f(theta), in the usual fashion
  2. Use the draws to compute a posterior mean vector mu and a posterior covariance matrix Sigma
  3. Reparameterize the model as f(y|mu + L_Sigma * epsilon)f(mu + L_Sigma * epsilon), where L_Sigma is the lower-triangular Cholesky factor of Sigma (Sigma = L_Sigma * L_Sigma^T) and epsilon is the new parameter vector
  4. Obtain draws from the posterior of epsilon
  5. Iterate: map the new draws back to theta, update mu and Sigma, and repeat
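
Steps 2 and 3 can be sketched in NumPy roughly as follows. This is only an illustration under assumptions: the draws are synthetic (a correlated bivariate normal standing in for real posterior draws), and the helper names `fit_affine_reparam`, `to_theta`, and `to_epsilon` are made up for this example, not part of Stan.

```python
import numpy as np

def fit_affine_reparam(draws):
    """Step 2: estimate mu and the Cholesky factor of Sigma from draws (S x D)."""
    mu = draws.mean(axis=0)
    sigma = np.cov(draws, rowvar=False)
    chol = np.linalg.cholesky(sigma)  # lower triangular, sigma = chol @ chol.T
    return mu, chol

def to_theta(eps, mu, chol):
    """Step 3: theta = mu + L_Sigma * epsilon, applied row-wise."""
    return mu + eps @ chol.T

def to_epsilon(theta, mu, chol):
    """Inverse map: epsilon = L_Sigma^{-1} (theta - mu), applied row-wise."""
    return np.linalg.solve(chol, (theta - mu).T).T

# Synthetic stand-in for posterior draws: scales 1 and 10, correlation 0.95.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 9.5], [9.5, 100.0]])
draws = rng.multivariate_normal([1.0, -2.0], true_cov, size=2000)

mu, chol = fit_affine_reparam(draws)
eps = to_epsilon(draws, mu, chol)
eps_cov = np.cov(eps, rowvar=False)  # identity up to floating-point error
```

Because the map from epsilon to theta is affine, its Jacobian is constant, so the reparameterized model needs no Jacobian adjustment. In a real model the posterior of epsilon would only be approximately standard normal, since mu and Sigma come from a finite, possibly poorly mixed, set of draws.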

I’ve had some success with this approach in practice, as the posterior of epsilon is usually “close” to a multivariate standard normal.


#2

That is computationally very expensive. The same effect would be better achieved by adapting the whole (dense) mass matrix, which has the same cost and is more efficient.
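
To spell out why the two are related, here is a short derivation, assuming a Gaussian kinetic energy with inverse mass matrix M^{-1} = Sigma and writing L for L_Sigma. The momentum symbols p_theta and p_epsilon are notation introduced here, not taken from the original post.

```latex
\begin{aligned}
\theta &= \mu + L\,\epsilon, \qquad \Sigma = L L^{\top}, \qquad M^{-1} = \Sigma \\
p_\theta &= L^{-\top} p_\epsilon \\
\tfrac{1}{2}\, p_\theta^{\top} M^{-1} p_\theta
  &= \tfrac{1}{2}\, p_\epsilon^{\top} L^{-1} \left( L L^{\top} \right) L^{-\top} p_\epsilon
   = \tfrac{1}{2}\, p_\epsilon^{\top} p_\epsilon
\end{aligned}
```

So Hamiltonian dynamics on theta with a dense metric Sigma matches dynamics on epsilon with a unit metric, which is why adapting the mass matrix can stand in for the explicit reparameterization.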


#3

This is similar to what Stan already does if you specify a dense mass matrix (Euclidean metric). During warmup it uses exponentially increasing blocks, alternating between drawing samples and updating the covariance matrix estimate. We can then use that estimate to adjust the metric over which we sample.
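
As a rough picture of the "exponentially increasing blocks", here is a simplified sketch of a doubling-window schedule. The function name `warmup_windows` is hypothetical, and the default values (1000 warmup iterations, an initial buffer of 75, a terminal buffer of 50, a first window of 25) are what I understand Stan's defaults to be, so treat them as assumptions rather than a description of Stan's actual code.

```python
def warmup_windows(num_warmup=1000, init_buffer=75, term_buffer=50, base_window=25):
    """Return (start, stop) iteration ranges for covariance-estimation windows.

    Each window is twice as long as the previous one. The covariance estimate
    is refreshed at the end of each window. If the next doubled window would
    not fit before the terminal buffer, the current window is stretched to
    cover the remaining iterations.
    """
    start = init_buffer
    end_slow = num_warmup - term_buffer
    size = base_window
    windows = []
    while start < end_slow:
        stop = start + size
        if stop + 2 * size > end_slow:
            stop = end_slow  # stretch the last window to the end
        windows.append((start, stop))
        start = stop
        size *= 2
    return windows

schedule = warmup_windows()
```

With these assumed defaults the windows close at iterations 100, 150, 250, 450, and 950, so later covariance estimates are based on progressively more draws.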

We essentially do what you’re suggesting informally when we reparameterize a model by eye to try to put all the parameters on the same scale. Once everything’s on a unit scale, correlation doesn’t matter. It’s not the correlation that hurts Stan, it’s the varying scales plus correlation when we use a diagonal metric. Varying curvature is also a problem: no fixed mass matrix/metric works everywhere, so we have to be conservative, which can result in slow sampling with a small step size. The alternative, a larger step size, introduces more bias because you won’t get into the high-curvature regions.