I have a GP regression (GPR) model with a squared exponential kernel. The design matrix is 5000 × 14 (5000 observations of 14 continuous variables). Is it possible to solve such problems in Stan, or is it better to use GPstuff?
Not enough information for a complete answer. In special cases, yes, and faster with optimization: for example, (1) a normal observation model for the target with a single length scale (this might even work with several length scales if done carefully), or (2) an additive model with basis functions. In the general case, autodiff may require too much memory, or dynamic HMC integrating over the latent values can be very slow.
Thank you. Is there a way to do something similar to ARD in this case? The problem I have is really explainable AI (XAI). One approach is to use SHAP values to establish the importance of features; I would like to design something Bayesian using a Gaussian process.
I guess. I was thinking about building a Gaussian process model of my system and using ARD to determine the important variables, but it seems my problem is too large for a latent Gaussian process. Thus the second paper might be a good solution. Do you have Stan or R code to share?
Question: do you scale the data? Let's say in your design matrix the first column ranges from 0 to 1 while the second column ranges from 0 to 100. Do you want to scale all columns to a common scale, say -1 to 1, when calculating the distances for the radial basis function kernel?
Depends on the data, but usually I would normalize continuous covariates to have mean = 0 and sd = 1, so that it's easier to define the priors for the lengthscales (see the sketch below). Depending on additional domain-specific information, other scalings can be better for making the covariates easier to compare.
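For illustration, this standardization can be done directly in Stan's `transformed data` block; a minimal sketch (doing the same with `scale()` in R before passing the data works just as well):

```stan
data {
  int<lower=1> N;        // number of observations
  int<lower=1> D;        // number of covariates
  matrix[N, D] X;        // raw design matrix
}
transformed data {
  // Standardize each column to mean 0, sd 1 so that one common
  // lengthscale prior is reasonable across all covariates.
  matrix[N, D] X_std;
  for (d in 1:D)
    X_std[, d] = (X[, d] - mean(X[, d])) / sd(X[, d]);
}
```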
Thank you. The original problem was to find the important variables, and my intention was to use ARD for this. However, for 5000 observations and 14 variables with a binomial likelihood, ARD was too slow.
My understanding is that the algorithms in both articles require estimating the posterior predictive distribution (PPD). If I have a squared exponential kernel with a different length scale per variable, I gain nothing in computation time. Perhaps if I instead use a radial basis function kernel to estimate the PPD for the training points with Stan, I could reduce the computational burden, and then apply the ideas from the older paper. Right? Or do you have any other suggestions?
With that ratio, no need for MCMC. Use something like GPy, GPflow, or, in R, the gplite package on CRAN.
Sorry, I don’t understand what you are saying.
If you want to model all interactions, and have a binomial observation model, 5000 observations, and 14 variables, use something other than Stan.
If you are happy with an additive model, with a binomial observation model, 5000 observations, and 14 variables, you can also try the basis-function approximated GP in Stan (probably good to use the glm compound functions for speedup); see the sketch below.
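To make that concrete, here is a minimal sketch of an additive basis-function (Hilbert-space) approximated GP with a binomial observation model, along the lines of the Riutort-Mayol et al. approach; the number of basis functions `M`, the boundary `L`, and all priors are placeholder assumptions you would need to tune:

```stan
// Sketch: additive Hilbert-space basis-function GP, binomial likelihood.
// Covariates are assumed standardized to mean 0, sd 1.
data {
  int<lower=1> N;                    // observations, e.g. 5000
  int<lower=1> D;                    // covariates, e.g. 14
  int<lower=1> M;                    // basis functions per covariate
  real<lower=0> L;                   // boundary, e.g. 4 for standardized x
  matrix[N, D] X;                    // standardized covariates
  array[N] int<lower=0, upper=5> y;  // successes out of 5 trials
}
transformed data {
  array[D] matrix[N, M] PHI;         // eigenfunctions per covariate
  for (d in 1:D)
    for (m in 1:M)
      PHI[d][, m] = sqrt(1 / L) * sin(m * pi() / (2 * L) * (X[, d] + L));
}
parameters {
  real alpha0;                       // intercept
  vector<lower=0>[D] rho;            // lengthscale per covariate (ARD-like)
  vector<lower=0>[D] sigma;          // marginal sd per covariate
  array[D] vector[M] z;              // basis-function weights
}
model {
  vector[N] f = rep_vector(alpha0, N);
  for (d in 1:D) {
    // sqrt of the SE-kernel spectral density at the sqrt-eigenvalues
    vector[M] sqrt_spd;
    for (m in 1:M) {
      real w = m * pi() / (2 * L);
      sqrt_spd[m] = sigma[d] * sqrt(sqrt(2 * pi()) * rho[d])
                    * exp(-0.25 * square(rho[d] * w));
    }
    f += PHI[d] * (sqrt_spd .* z[d]);
    z[d] ~ std_normal();
  }
  alpha0 ~ normal(0, 1.5);           // placeholder priors
  rho ~ lognormal(0, 1);
  sigma ~ normal(0, 1);
  y ~ binomial_logit(5, f);          // 5 trials per observation
}
```

Variable relevance can then be judged from the posteriors of `sigma` and `rho`, in the spirit of ARD. The glm compound functions mentioned above could be exploited by, for example, disaggregating the 5 trials into Bernoulli observations and using `bernoulli_logit_glm` with the concatenated basis functions as the design matrix.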
Based on the information you gave about the model: the conditional posterior of the latent values is well approximated by a normal distribution, so the latent values can be efficiently integrated out with Laplace, EP, or VI. The remaining marginal posterior of the covariance function parameters is low dimensional and close to normal, so for variable relevance purposes, finding the maximum of the marginal posterior with optimization would be sufficient. The packages I mentioned do this kind of inference efficiently. If you had a more complicated model, or much smaller data, then MCMC could produce more accurate inference.
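For concreteness, the Laplace version of this integration (a standard result, see e.g. Rasmussen & Williams, ch. 3; the packages above implement this or EP/VI analogues) approximates the log marginal likelihood of the covariance function parameters $\theta$ as

$$
\log p(y \mid X, \theta) \approx \log p(y \mid \hat{f}) - \tfrac{1}{2} \hat{f}^\top K^{-1} \hat{f} - \tfrac{1}{2} \log \lvert I + W^{1/2} K W^{1/2} \rvert,
$$

where $\hat{f}$ is the mode of $p(f \mid y, X, \theta)$, $K = K(x \mid \theta)$, and $W = -\nabla \nabla \log p(y \mid \hat{f})$ is the negative Hessian of the log likelihood at the mode. This is the quantity that gets maximized over $\theta$.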
The original problem can be posed as a latent-variable GP: $y_i \sim \mathrm{Binomial}(5, p_i)$ with $p_i = \mathrm{logit}^{-1}(f_i)$ and $f \sim \mathrm{MultivariateNormal}(0, K(x \mid \theta))$.
I would like to estimate the posterior of $p$ as well as to determine the important variables. There are 5000 observations, and the vector $x$ has 14 components.
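In Stan, the direct latent-variable version of this model would, I believe, look roughly like the sketch below (priors are placeholders); the $O(N^3)$ Cholesky factorization in every leapfrog step is presumably what makes it too slow at N = 5000:

```stan
// Sketch: exact latent GP with a separate lengthscale per covariate
// (ARD-style) and a binomial observation model. Feasible for small N;
// the O(N^3) Cholesky per gradient evaluation is the bottleneck here.
data {
  int<lower=1> N;                    // 5000 observations
  int<lower=1> D;                    // 14 covariates
  array[N] vector[D] x;              // covariate vectors
  array[N] int<lower=0, upper=5> y;  // successes out of 5 trials
}
parameters {
  real<lower=0> alpha;               // marginal sd of the GP
  vector<lower=0>[D] rho;            // one lengthscale per covariate
  vector[N] eta;                     // standardized latent values
}
transformed parameters {
  vector[N] f;
  {
    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
    for (n in 1:N) K[n, n] += 1e-8;  // jitter for numerical stability
    f = cholesky_decompose(K) * eta; // non-centered parameterization
  }
}
model {
  alpha ~ normal(0, 1);              // placeholder priors
  rho ~ lognormal(0, 1);
  eta ~ std_normal();
  y ~ binomial_logit(5, f);
}
```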
I'm still struggling with how to approximate the binomial outcome with a normal one.
The problem I have can be posed as a latent-variable GP: $y_i \sim \mathrm{Binomial}(5, p_i),\; i = 1, \dots, 5000$, with $p_i = \mathrm{logit}^{-1}(f_i)$ and $f \sim \mathrm{MultivariateNormal}(0, K(x \mid \theta))$.
I would like to estimate the posterior of $p$ as well as to determine the important variables. The vector $x$ has 14 components.
Can you please explain how to integrate out the latent variables with variational inference to get the marginal posterior of the covariance function parameters? Can I use Stan to perform the integration?
This is what I expected; it is the very basic form of a Gaussian process model, and there are many software packages that can handle it much faster than Stan.
There are many GP-specific software packages implementing these.
You can use Stan MCMC to perform the integration, but it is currently much slower than GP-specific software (in the future it will be less slow, but still slower than GP-specific software).