Speed up adaptation with Variational Approximation?

If you want to automate the choice of locations, you could try k-means. It is quite commonly used in this kind of situation and usually works quite well (although it is not necessarily optimal).
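Here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It assumes each census region is summarized by a single (x, y) centroid and that you want k RBF centers. The function name `kmeans`, its parameters, and the toy coordinates are hypothetical illustrations, not part of anyone's actual model.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: return k centers for a list of (x, y) points."""
    rng = random.Random(seed)
    # Initialize the centers at k distinct, randomly chosen data points.
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster ends up empty).
        new_centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers


# Hypothetical toy data: centroids of six census regions in three metro areas.
regions = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
rbf_centers = kmeans(regions, k=3)
print(rbf_centers)
```

Lloyd's algorithm only finds a local optimum, so it is worth trying several seeds and keeping the solution with the smallest within-cluster sum of squares. If scikit-learn is available, `sklearn.cluster.KMeans(n_clusters=k).fit(X).cluster_centers_` does the same job with k-means++ initialization (restarts are controlled by `n_init`).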


That’s not a bad idea. I had thought of defining a probability over a vector of locations in terms of an attraction force toward individual Census regions and a repulsion force between the RBF centers, and then letting Stan move the RBF centers around for me, but I think this is getting fancier than my model needs.
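If one did want to try the attraction/repulsion idea, a minimal sketch of such an unnormalized log-density over center locations might look like the following. The squared-distance attraction, the inverse-distance repulsion, the `repulsion` weight, and the function name are all assumptions chosen for illustration; the post does not specify any functional forms.

```python
import math


def center_log_density(centers, regions, repulsion=1.0):
    """Hypothetical unnormalized log-density over RBF center locations.

    Attraction: each census region pulls on its nearest center
    (squared-distance penalty). Repulsion: every pair of centers
    pushes apart (inverse-distance penalty).
    """
    attraction = sum(min(math.dist(r, c) ** 2 for c in centers) for r in regions)
    repulsion_term = sum(
        1.0 / math.dist(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
    return -(attraction + repulsion * repulsion_term)


# Toy check: centers sitting on the two regions score higher than
# two centers bunched up next to one region.
regions = [(0.0, 0.0), (10.0, 0.0)]
print(center_log_density([(0.0, 0.0), (10.0, 0.0)], regions))
print(center_log_density([(0.0, 0.0), (0.5, 0.0)], regions))
```

The `min` over centers makes this surface only piecewise smooth, and the repulsion term blows up when two centers coincide, so HMC would likely find it awkward to sample. That is one more argument for the simpler k-means route.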

One question I have, though: I seem to get poor mixing for the RBF coefficients, whereas the individual census region multipliers, which are basically Mult[i] = RBF(x[i],y[i]) + error_i with t-distributed errors, mix just fine. Since it is these multipliers that affect the predictions, this suggests to me that the smooth function I’m estimating is not well identified (much of the variation is at a fine spatial scale, within each metropolitan area for example). However, I don’t actually need that smooth function to be well identified; it is, after all, basically a regularization device for the Mult[i] parameters.
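To make the model in the question concrete, here is a minimal Python sketch of Mult[i] = RBF(x[i],y[i]) + error_i. It assumes Gaussian radial basis functions with a shared length scale and scaled Student-t errors; the question does not say which kernel is used. It also illustrates one plausible reason the coefficients mix poorly: when two centers sit close together, very different coefficient vectors produce nearly the same surface. The names `rbf`, `simulate_mult`, `length_scale`, `nu`, and `sigma` are hypothetical.

```python
import math
import random


def rbf(x, y, centers, coefs, length_scale=1.0):
    """Smooth surface: a weighted sum of Gaussian bumps at the RBF centers."""
    return sum(
        w * math.exp(-math.dist((x, y), c) ** 2 / (2 * length_scale**2))
        for c, w in zip(centers, coefs)
    )


def simulate_mult(regions, centers, coefs, nu=4.0, sigma=0.1, seed=0):
    """Mult[i] = RBF(x[i], y[i]) + error_i, with error_i = sigma * Student-t(nu)."""
    rng = random.Random(seed)
    out = []
    for x, y in regions:
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared(nu) draw
        t = z / math.sqrt(v / nu)            # Student-t(nu) draw
        out.append(rbf(x, y, centers, coefs) + sigma * t)
    return out


# Two nearly coincident centers: very different coefficients,
# almost the same value of the surface at a data location.
centers = [(0.0, 0.0), (0.05, 0.0)]
f1 = rbf(1.0, 1.0, centers, [1.0, 1.0])
f2 = rbf(1.0, 1.0, centers, [2.0, 0.0])
print(f1, f2)
```

Because the likelihood only sees the surface at the region locations, the sampler can slide along directions like (1, 1) to (2, 0) at almost no cost, which shows up as poor mixing in the individual coefficients even when Mult[i] is pinned down.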

So, how much should I care about things like Rhat or effective sample size for the RBF coefficients? Provided that I have Rhat ~ 1 for Mult[i] and good mixing in the traceplots, it seems that I should be good to go to use these estimates for prediction or explanation, and can ignore the fact that the nuisance parameters of the regularization function struggle to converge.
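A small simulation can illustrate the pattern being described. The sketch below implements the classic split-R̂ (without the rank normalization that Stan now uses) and applies it to synthetic draws in which one coefficient `a` wanders without mixing while the combination a + b, the analogue of Mult[i], is well identified. The draws and the setup are assumptions for illustration, not output from the model in the question.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Classic split-R-hat (no rank normalization) for equal-length chains."""
    halves = []
    for ch in chains:
        n = len(ch) // 2
        halves.extend([ch[:n], ch[n:2 * n]])  # split each chain in two
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    w = statistics.fmean(statistics.variance(h) for h in halves)  # within-chain variance
    b = n * statistics.variance(means)                             # between-chain variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


# Synthetic draws for 4 chains. `a` is a poorly mixing coefficient (a slow
# random walk started at a different value in each chain); `s` stands in for
# the identified combination a + b (think Mult[i]), drawn independently.
rng = random.Random(1)
chains_a, chains_sum = [], []
for k in range(4):
    a = float(k)
    draws_a, draws_s = [], []
    for _ in range(1000):
        a += rng.gauss(0.0, 0.05)
        draws_a.append(a)
        draws_s.append(rng.gauss(1.0, 0.1))  # b would be s - a
    chains_a.append(draws_a)
    chains_sum.append(draws_s)

print("R-hat for coefficient a:", round(split_rhat(chains_a), 2))
print("R-hat for a + b:", round(split_rhat(chains_sum), 3))
```

In practice you would read R̂ from Stan's own output or from a tool such as ArviZ's `rhat`, both of which use the rank-normalized version; the toy version here is only meant to show that a non-mixing component and a converged combination can coexist.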

We see exactly this behavior all the time in hierarchical models: the hierarchical parameters won’t be well estimated, but the lower-level parameters and predictions will converge. I think ADVI had similar problems, with hierarchical parameters being off in ways that didn’t much affect prediction.

The problem is that we don’t know whether the convergence problems we’re seeing affect predictions until we solve them. So we generally recommend trying to remediate convergence problems.

But I’m not sure we need to be so strict in some of these cases. @betanalpha ?