State-space, GP best parameterization & recentering

I’ve been looking at some of the examples in the manual and on the forums for time-series models and was wondering what the preferred way to parameterize these models is.

It seems that in some cases people use:
f ~ multi_normal(rep_vector(0, N), K(x | alpha, rho));

and others use a conditional (Markov) specification, e.g. for the Matern 1/2 (Ornstein-Uhlenbeck) kernel:

f[1] ~ normal(0, alpha);
for (i in 2:N) {
  real delta = x[i] - x[i - 1];
  f[i] ~ normal(f[i - 1] * exp(-delta / rho),
                alpha * sqrt(1 - exp(-2 * delta / rho)));
}

It seems these should work out to the same model, at least for Markov kernels like Matern 1/2, but I was wondering whether there is a difference in computational efficiency in Stan. Also, for some covariance kernels, e.g. Matern 1/2, the precision matrix has a sparse (tridiagonal) representation, and one could use multi_normal_prec.
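
For concreteness, here is a rough sketch of the precision-matrix route I have in mind, assuming a Matern 1/2 / OU kernel so that the precision matrix is tridiagonal (the function name, priors, and observation model are just placeholders):

// Sketch: latent OU (Matern 1/2) GP expressed through its tridiagonal precision.
functions {
  matrix ou_precision(vector x, real alpha, real rho) {
    int N = rows(x);
    matrix[N, N] Q = rep_matrix(0, N, N);
    Q[1, 1] = 1;
    for (i in 1:(N - 1)) {
      real r = exp(-(x[i + 1] - x[i]) / rho);  // lag-one correlation
      real s = 1 - square(r);                  // conditional variance scale
      Q[i, i] += square(r) / s;
      Q[i + 1, i + 1] = 1 / s;
      Q[i, i + 1] = -r / s;
      Q[i + 1, i] = -r / s;
    }
    return Q / square(alpha);
  }
}
data {
  int<lower=2> N;
  vector[N] x;  // ordered inputs
  vector[N] y;  // observations
}
parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
  vector[N] f;
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  // multi_normal_prec still takes a dense matrix, so Stan does not
  // automatically exploit the tridiagonal structure here.
  f ~ multi_normal_prec(rep_vector(0, N), ou_precision(x, alpha, rho));
  y ~ normal(f, sigma);
}

As far as I can tell, to really exploit the sparsity you would have to write out the log density by hand, which basically brings you back to the conditional specification.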

Additionally, using the non-centered parameterization you could rewrite the second version in terms of a standardized vector f’ such that f’ ~ normal(0, 1). This seems very similar (equivalent?) to multiplying a standard normal vector by the Cholesky factor of the covariance matrix.
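
For reference, the Cholesky-based non-centered version I have in mind looks roughly like this (a sketch using a squared-exponential kernel via cov_exp_quad just to be concrete; f_raw plays the role of f’, and the jitter and priors are arbitrary):

// Sketch: non-centered GP via the Cholesky factor of the covariance matrix.
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
  vector[N] f_raw;  // standardized latent values, a priori N(0, 1)
}
transformed parameters {
  vector[N] f;
  {
    matrix[N, N] K = cov_exp_quad(to_array_1d(x), alpha, rho);
    for (n in 1:N)
      K[n, n] += 1e-9;  // small diagonal jitter for numerical stability
    f = cholesky_decompose(K) * f_raw;  // implies f ~ multi_normal(0, K)
  }
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  f_raw ~ std_normal();
  y ~ normal(f, sigma);
}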

I don’t have a good intuition about what would be best in terms of keeping parameters on unit scales, vectorization, and the depth of the autodiff graph. I was going to start exploring these options for a model I am working on, but wanted to ask here first whether people have experience or recommendations between:

Multinormal vs. Conditional Specification
and
Centered vs. Non-Centered Parameterization

I just finished writing up a post to get some feedback on related time-series work I’ve been doing: Approximate GPs with Spectral Stuff

I really don’t have much experience with this. I’m just trying to get a grip on what other people are doing, so I am curious as well.

There is no absolute answer – each of those model implementations can drastically change your posterior geometry, and those changes will depend on the size and structure of your data. Ultimately you start with one implementation, then check speed and, most importantly, all of the diagnostics (especially Rhat and divergences). If there are issues, you can try another implementation.

Do you have any suggestions or intuition about how the size and structure of the data affect things? I’m guessing the relevant considerations for a hierarchical GP would be the length scale of the data, the amount of sampling noise, the number of GPs, and the number of data points per process?

I switched from the non-centered conditional specification to the centered multi-normal one, taking advantage of the sparse precision matrix. For a small dataset it slowed things down by about 75%, with no noticeable increase in sampling efficiency. I was a bit surprised, since with the precision matrix the derivatives don’t have to propagate as far, but there are around twice as many multiplication operations, which I guess slowed things down. However, without running this for the 24 hours or so it takes on the full simulated data, I’m not really sure how things hold up…

Try fitting without any data, then try fitting with lots of data. The centered and non-centered parameterizations trade off between pathologies driven by the latent Gaussian structure and pathologies driven by the top-level measurement model, so you can build intuition for when to apply each by studying those two extremes.
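
One concrete way to run those two extremes on the same model is a data flag that switches the likelihood off (a sketch extending the non-centered program above, with the parameters and transformed parameters blocks unchanged; prior_only is just an illustrative name):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
  int<lower=0, upper=1> prior_only;  // 1 = drop the likelihood (the "no data" extreme)
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  f_raw ~ std_normal();
  if (prior_only == 0)
    y ~ normal(f, sigma);  // measurement model, included only when fitting the data
}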

Thanks! I’ll give that a shot.