Sparse or low-rank approximate mass matrix


#1

I was looking at the manual, and it suggested that for some models, for example the stochastic volatility model, using the full (dense) mass matrix rather than the diagonal one could speed things up, though this does not scale to larger problems.

If I have some sense of the covariance structure of my problem, it seems like it should be possible to specify which elements of the mass matrix to keep track of and save on computation. Or would this not work because the mass matrix needs to be inverted?
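On the inversion question: a general sparsity pattern usually does not survive inversion (the inverse of a sparse matrix is typically dense), but a block-diagonal pattern does, because each block can be factored and inverted on its own. Below is a minimal numpy sketch, not Stan's implementation, assuming the inverse metric (the quantity Stan adapts, roughly the posterior covariance) is block-diagonal with hypothetical user-chosen blocks. The kinetic energy and the momentum draw then cost the sum of the per-block work instead of one dense n x n operation.

```python
import numpy as np

# Hypothetical block-diagonal inverse metric M^{-1}: a list of dense,
# symmetric positive-definite blocks; entries outside the blocks are zero.

def make_block_metric(blocks):
    """Store each inverse-metric block together with its Cholesky factor."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    chols = [np.linalg.cholesky(b) for b in blocks]
    return blocks, chols

def kinetic_energy(p, metric):
    """K(p) = 0.5 * p^T M^{-1} p, accumulated block by block."""
    blocks, _ = metric
    k, start = 0.0, 0
    for b in blocks:
        n = b.shape[0]
        pb = p[start:start + n]
        k += 0.5 * pb @ b @ pb
        start += n
    return k

def sample_momentum(rng, metric):
    """Draw p ~ Normal(0, M), where M is the inverse of M^{-1}.

    If a block of M^{-1} is L L^T, then L^{-T} z with z ~ Normal(0, I) has
    covariance L^{-T} L^{-1} = (L L^T)^{-1}, the matching block of M.
    """
    _, chols = metric
    parts = [np.linalg.solve(L.T, rng.standard_normal(L.shape[0])) for L in chols]
    return np.concatenate(parts)
```

If the blocks line up with groups of parameters you already believe are correlated, the dense cost is only paid within each block, and everything across blocks stays diagonal.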

Alternatively, would it be possible to use a low-rank approximation to the mass matrix, similar to what is done in L-BFGS? I realize all of this will likely involve digging into Stan's internals, so I wanted to get a sense of whether it sounds reasonable before I start.
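A diagonal-plus-low-rank structure is one way to get L-BFGS-style savings. The sketch below is a hypothetical construction, not something Stan provides: it parameterizes the metric M itself (which plays the role of an approximate precision, i.e. Hessian) as diag(d) + U U^T with k << n columns. Momentum draws are then cheap, and the M^{-1} p needed in the kinetic energy comes from the Woodbury identity, so each step costs O(nk) after an O(nk^2) setup.

```python
import numpy as np

class DiagPlusLowRankMetric:
    """Hypothetical metric M = diag(d) + U U^T, with U of shape (n, k), k << n."""

    def __init__(self, d, U):
        self.d = np.asarray(d, dtype=float)
        self.U = np.asarray(U, dtype=float)
        k = self.U.shape[1]
        # D^{-1} U and the k x k "capacitance" matrix C = I + U^T D^{-1} U.
        self.DinvU = self.U / self.d[:, None]
        self.C_chol = np.linalg.cholesky(np.eye(k) + self.U.T @ self.DinvU)

    def inv_metric_times(self, p):
        """M^{-1} p via Woodbury:
        (D + U U^T)^{-1} = D^{-1} - D^{-1} U C^{-1} U^T D^{-1}."""
        Dinv_p = p / self.d
        rhs = self.U.T @ Dinv_p
        # Solve C w = rhs with the Cholesky factor C = L L^T.
        y = np.linalg.solve(self.C_chol, rhs)
        w = np.linalg.solve(self.C_chol.T, y)
        return Dinv_p - self.DinvU @ w

    def kinetic_energy(self, p):
        """K(p) = 0.5 * p^T M^{-1} p."""
        return 0.5 * p @ self.inv_metric_times(p)

    def sample_momentum(self, rng):
        """p ~ Normal(0, D + U U^T) as a sum of independent diagonal and
        low-rank Gaussian pieces."""
        n, k = self.U.shape
        return np.sqrt(self.d) * rng.standard_normal(n) + self.U @ rng.standard_normal(k)
```

Where d and U would come from is the open question; an L-BFGS-like history of gradient differences is one candidate, but that part is not sketched here.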


#2

Have you incorporated that information into the statistical model itself? If you have, then the next thing to do is to transform the parameters so that the unconstrained parameter space is more regular. The non-centered parameterization is an example of that.
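As a concrete reminder of what the non-centered parameterization changes, here is a small numpy sketch for a generic hierarchical prior theta_j ~ Normal(mu, tau) (an assumed example, not the poster's model). In the centered form the sampler moves on theta directly; in the non-centered form it moves on eta, with theta = mu + tau * eta. The two log densities differ exactly by the log-Jacobian J * log(tau), which is why the scale of the non-centered coordinates no longer depends on tau.

```python
import numpy as np

# Hierarchical prior theta_j ~ Normal(mu, tau), j = 1..J (illustrative).

def non_centered_to_theta(mu, tau, eta):
    """Non-centered: the sampler works with eta; theta = mu + tau * eta."""
    return mu + tau * eta

def centered_log_prior(theta, mu, tau):
    """log Normal(theta | mu, tau), summed; sampler coordinates are theta."""
    z = (theta - mu) / tau
    return np.sum(-0.5 * z**2 - np.log(tau) - 0.5 * np.log(2 * np.pi))

def non_centered_log_prior(eta):
    """log Normal(eta | 0, 1), summed; sampler coordinates are eta.
    Equals centered_log_prior(theta, mu, tau) + J * log(tau)."""
    return np.sum(-0.5 * eta**2 - 0.5 * np.log(2 * np.pi))
```

For example, with eta = [0.5, -1.0, 2.0], mu = 1 and tau = 0.3, theta comes out as [1.15, 0.7, 1.6], and adding 3 * log(0.3) to the centered log prior recovers the non-centered one.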

Estimating the mass matrix well will only take you so far. If the gradients of the log probability function differ by orders of magnitude across different regions of the typical set, estimating the (single) mass matrix well isn't going to be enough, because no one mass matrix suits all of those regions at once.
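Neal's funnel makes the point concrete. The sketch below uses v ~ Normal(0, 3) and x | v ~ Normal(0, exp(v/2)) purely as an illustration: the curvature of log p in x is -exp(-v), so between v = -6 and v = 6 (about two prior standard deviations each way) it changes by a factor of e^12, roughly 1.6e5. No single mass matrix can match both ends. After the non-centered change of variables x = exp(v/2) * z, the curvature in z is -1 everywhere.

```python
import numpy as np

# Neal's funnel (illustration): v ~ Normal(0, 3), x | v ~ Normal(0, exp(v / 2)).

def log_p_x_given_v(x, v):
    """Centered: log Normal(x | 0, exp(v / 2))."""
    sd = np.exp(v / 2)
    return -0.5 * (x / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

def log_p_z(z):
    """Non-centered: x = exp(v / 2) * z with z ~ Normal(0, 1); no v dependence."""
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

if __name__ == "__main__":
    for v in (-6.0, 0.0, 6.0):
        cx = second_derivative(lambda x: log_p_x_given_v(x, v), 0.0)
        cz = second_derivative(log_p_z, 0.0)
        print(f"v = {v:+.0f}: curvature in x = {cx:.4g}, in z = {cz:.4g}")
```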

Hopefully that makes sense. If not, maybe read through some of the literature on HMC and NUTS? @betanalpha’s stuff on arXiv is great.


#3

I’ve tried that, and it has improved things a lot, but I am still running into issues. It’s a hierarchical model, and some parameters have more data than others. In one parameterization, correlations show up between some pairs of parameters; I can reparameterize to eliminate those, but then correlations appear between other parameters. From the data I have a good sense of which parameterization works better for which parameters, so I could pick and choose, but I was trying to avoid implementing both parameterizations in the same model.
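One way to avoid maintaining two full versions of the model is to choose the parameterization per group with a flag computed from the data. The numpy sketch below is hypothetical (the names mixed_to_theta, mixed_log_prior and the centered mask are made up here); it follows the usual rule of thumb that well-informed groups tend to do better centered and weakly informed groups non-centered. In a Stan program the same idea could be expressed with two index arrays, one per parameterization.

```python
import numpy as np

# Hypothetical per-group choice: centered[j] is True for groups with lots of
# data (centered) and False for weakly informed groups (non-centered).

def mixed_to_theta(raw, mu, tau, centered):
    """Map raw sampler coordinates to group effects theta.

    centered[j] True:  raw[j] is theta_j itself.
    centered[j] False: raw[j] is eta_j and theta_j = mu + tau * eta_j.
    """
    return np.where(centered, raw, mu + tau * raw)

def mixed_log_prior(raw, mu, tau, centered):
    """Log density of the raw coordinates under theta_j ~ Normal(mu, tau).

    Centered groups use Normal(theta_j | mu, tau); non-centered groups use
    Normal(eta_j | 0, 1), which already includes the Jacobian of
    theta_j = mu + tau * eta_j.
    """
    z = np.where(centered, (raw - mu) / tau, raw)
    log_scale = np.where(centered, np.log(tau), 0.0)
    return np.sum(-0.5 * z**2 - log_scale - 0.5 * np.log(2 * np.pi))
```

The likelihood always uses the theta returned by mixed_to_theta, so only the prior term and the meaning of the raw coordinates change from group to group. The mask itself might come from per-group observation counts compared against a threshold, which is an assumption to tune, not a fixed rule.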

I’ve also gotten rid of the ‘funnel’ problems using the NCP, so what I’m dealing with now is rotation issues (strong correlations between parameters).
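For rotation issues the relevant fact is that a dense metric is equivalent to a fixed linear change of variables: running HMC with inverse metric Sigma = L L^T is equivalent to running it with the identity metric on z = L^{-1}(x - m). The numpy sketch below (an illustration with made-up function names) estimates m and L from draws and applies that transform. It also shows the limit of the approach: it removes one global correlation structure, not correlations that change across the posterior.

```python
import numpy as np

def whitening_transform(draws):
    """Mean and Cholesky factor of the sample covariance of `draws`
    (shape: n_draws x n_params)."""
    mean = draws.mean(axis=0)
    cov = np.cov(draws, rowvar=False)
    return mean, np.linalg.cholesky(cov)

def whiten(x, mean, L):
    """z = L^{-1} (x - mean), applied to each row of x."""
    return np.linalg.solve(L, (x - mean).T).T

def unwhiten(z, mean, L):
    """Inverse map: x = mean + L z, applied to each row of z."""
    return mean + z @ L.T
```

If the whitened draws from one region of the posterior still look strongly correlated after using a transform estimated from another region, that points to the "no single mass matrix" situation rather than a poor estimate.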


#4

Sounds like you tried the easy and not so easy fixes. Awesome!

If I were in your shoes, I’d still attack it from the Stan program side rather than try to tweak warm-up, but I can see why you’d want to do that.

If you think the estimation of the mass matrix is the problem (and not that no single mass matrix can scale the joint density well), then I’d verify that next by swapping in a different mass matrix, or by trying a different estimation procedure for it.
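One way to try a different estimation procedure is to compute the dense inverse metric yourself from a previous run and hand it to the sampler. The sketch below shrinks the sample covariance toward its diagonal with a hypothetical `shrinkage` knob (one simple regularizer among many, not what Stan does internally) and serializes it as JSON with an `inv_metric` entry. My understanding is that CmdStan reads an initial inverse metric in that shape through its `metric_file` argument together with `metric=dense_e`, but check the CmdStan guide for your version.

```python
import json
import numpy as np

def shrunk_inv_metric(draws, shrinkage=0.1):
    """Dense inverse-metric estimate from draws (n_draws x n_params), shrunk
    toward its own diagonal. `shrinkage` is a hypothetical knob in [0, 1]:
    0 keeps the sample covariance, 1 keeps only the variances."""
    S = np.cov(draws, rowvar=False)
    return (1.0 - shrinkage) * S + shrinkage * np.diag(np.diag(S))

def metric_json(inv_metric):
    """Serialize as {"inv_metric": [[...], ...]} (assumed CmdStan layout).

    Hypothetical usage:
        with open("inv_metric.json", "w") as f:
            f.write(metric_json(shrunk_inv_metric(draws)))
    """
    return json.dumps({"inv_metric": np.asarray(inv_metric).tolist()})
```

Because the shrinkage keeps the variances and only damps the off-diagonal terms, any value above 0 yields a positive-definite matrix as long as every variance is positive.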


#5

By “Stan program”, do you mean the Stan model or Stan itself? There are probably some other, more ad hoc reparameterizations I could explore before getting into the Stan model.

I’m not sure I follow?