fit <- stan(file=‘poisson.stan’,
There are 26 2-dimensional observed inputs x and outputs y, and I want to estimate the posterior predictive distribution given 1296 2-dimensional inputs x. However, running Stan code overnight will stuck at 0% of iterations. Does anyone know how to make the estimation more quickly? Are there any techniques that I could possibly use? Many thanks in advance!
It’s been a while since I last looked at how to do interpolation with GPs in Stan, but as I recall you need to model based on the dimensionality of the observed data (so, cov should be a 23x23 matrix), then in the generated quantities section you construct a larger predict_cov using the desired interpolation dimensionality.
Ah, but in the following “Analytical Form of Joint Predictive Inference” section, it shows a faster variant where you do the smaller observed covariance in the model and construct a larger prediction covariance in the generated quantities section.
Yes, I’m aware of that. But I thought it only applies for gaussian likelihood but does not apply for poisson likelihood, as the predictive posterior distribution does not have the closed form if the likelihood is poisson?
I would use brms’s approximate gp (gpa) approach. It’s detailed here, and can be used for multidimensional GPs as well. It’s already in brms, so you can just do brms(y ~ gp(x1,x2,K=50), family = …). It is magical. It turns a 2 hour model into a 20 second model; an intractably large GP into something that takes 2-3 hours. The gist of it, is that it decomposes the RBF kernel function into eigenfunctions and eigenvalues. The larger the number of eigenfunctions used (K), the more accurate it is. I’ve found that even with 50, it tends to do very well. It’s far faster, nearly as accurate, converges better, and prediction is simpler. Brms, again, will handle the prediction side for you.
Stan will soon have possibility to use Laplace method to integrate over the latent values, which is usually very accurate with Poisson likelihood and faster than sampling the latent values.
brms has two different GP approximations. s(x1, bs="gp") and gp(x1, k=20). The first one is based on spline basis function presentation of Matern covariance function GP from mgcv package, and the latter is based on the basis function approximation via Laplace eigenfunctions (based on Solin & Särkkä, Hilbert space methods for reduced-rank Gaussian process regression and further implementation details will be described in a paper under preparation).
s(x1, bs="gp") works for 1D and 2D
gp(x1, k=20) works also for 3D and sometimes up to 4D given the approximated function is quite smooth. The number of basis functions is k^D.
The current exact GP in brms gp(x1) is not using the more efficient builtin covariance functions, but for small data can still be useful for more than 4 dimensions.
Given “given 1296 2-dimensional inputs x” I would use one of the basis function approaches.
@Stephen_Martin can you describe a bit more what kind of data and models you have had when using gp(x1, x2, k=20) in brms? We are interested learning success and failure stories before finishing the paper.
@avehtari I currently am using additive univariate kernels, currently (so, in brms terms: gp(x1) + gp(x2)) [there’s a reason for this]. I’ve been using generated data, and tested on a couple others. Typically, it’s std.normal predictors between N=300 and N=1000. I am currently applying it to latent variable models (e.g., in CFA/SEM). Specifically, I have a latent location variable for J indicators, and a corresponding latent scale variable for the J indicator’s residual variances. I have the latent location variable predicting the latent scale variable via an approximate gp.
This lets us have non-linear functions of reliability over latent space, ala IRT test information.
@avehtari Nope. It may help that my input space is always similar, because inputs are always N(0,1), so my limits are just between -3*(5/2) and 3*(5/2).
Occasionally there are convergence problems (in terms of Rhats), but certainly less often than when using an exact GP.
What sort of behavior should I be on the lookout for?
Hi @avehtari, Thank you for your reply! I have played with the basis function approximation and it works well! However,I want to further increase the number of 2D data points to 2500 or 3000. I ran the basis function approximation GP and in three days it only finishes 2000 iteration. I may need more iterations. Do you think there are any tricks in basis function approximation GP to further speed up the method when the number of data is large? Or what do people usually do when the number of data points is large when using GP? Many thanks in advance!
How many basis functions? Less basis functions will make it faster. How does your basis function model Stan code look like? You could combine basis functions with poisson_log_glm for at least 4x speedup. You might benefit from @bbbales2’s new adaptation and low rank mass matrix code, but that requires installing special version of CmdStan.
Thank you for your help! I used 40 basis functions and 1.5 boundary factor as I’m only sure my normalised length scale is greater than 1*2/35=0.05. I attached my Stan code here. gp.stan (2.6 KB) Do you have any examples which combines poisson_log_glm and basis function?
Ah, this is not optimal and can’t be directly switched to is poisson_log_glm. You would need to create the full design matrix instead.
You could try to generate one with brms make_stancode and use tensor splines instead of GP. For 2D and that many observations, Although the GP basis function approach you use can be slightly better as a prior, mgcv creates the full design matrix for you and then it’s trivial to use poisson_log_glm. The GP basis function approach you use, could also create that full design matrix, but is not just doing in the code you are using,