I am looking for any form of approximate inference (EP, variational, …) for Gaussian processes with a binomial likelihood. I have a GP model with a binomial likelihood in Stan that currently uses MCMC, and I am now looking for another form of approximate inference for the posterior, simply because MCMC is too slow. Any pointer, paper, or documentation with the formulas for this case would be much appreciated.

Hi @NTorabi, are you aware that Stan can do variational inference out of the box? The exact code to call Stan's variational inference on a Stan model varies a little by interface, but in all cases it's a small and straightforward change to your analysis script, with no change to your Stan code.
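For example, with the CmdStan interface the switch is just a different subcommand on the compiled model binary (the model and file names below are placeholders for your own):

```shell
# Sketch: run Stan's ADVI instead of HMC on an already-compiled model.
# ./binomial_gp and data.json are hypothetical names for your model and data.
./binomial_gp variational algorithm=meanfield \
    data file=data.json \
    output file=vi_output.csv
```

The Python and R interfaces expose the same functionality as a method call (e.g. `variational()` in CmdStanPy) in place of the sampling call, again with no change to the `.stan` file.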

I have an example of a stochastic differential equation (which has some equivalence to GPs) with binary data at Binary data in state space models | Charles Driver

By default this fits using optimization, an extended Kalman filter (a linearized approximation to the state update), and no priors (maximum likelihood), but you can enable priors for MAP estimation, switch to the Stan HMC sampler, and also do full state sampling instead of the linearization if you are willing to wait around :)

Like others mentioned, you could try other inference approaches available in Stan, such as variational inference or MAP estimation, but MCMC sampling of a Gaussian process with a non-Gaussian likelihood is quite feasible and can scale to relatively large problems (of course, *large* here is relative).

I implemented it for a multi-channel GP with a 26 \times 26 matrix and a negative binomial likelihood in this preprint, and it ran in a couple of minutes or so. It is possible to scale up to M channels and N observation points (an MN \times MN matrix), although at some point your parameter space becomes too large and things slow down: with M = 10 or 20 it probably becomes more like hours, and convergence starts to worsen (plus jointly estimating M > 2 channels is roughly equivalent to estimating multiple pairs). There are also strategies for scaling up in N, such as using a smaller number of control points.

We will release detailed, commented code and a walkthrough when we organize it for the peer-reviewed submission, but until then you can find the essential code here.

Can you tell us more about your model? A GP with what kind of covariance function? What is the structure of the data used to compute the covariance (e.g., 1D, 2D, 3D, more dimensions, hierarchical structure, etc.)? How much data do you have? Depending on the answers, there are different possibilities for speeding up GP computations.

This is a multi-task learning problem using GPs, where we simultaneously model M segments together. The kernel has the form K^y \otimes K^x: a 1-dimensional Kronecker GP with binomial output. We model K^y directly and put an LKJ prior on it to capture the correlation among the M segments, so it has shape M \times M. K^x is an RBF kernel, common to all segments, with shape N \times N. I sometimes see convergence issues, and sampling is slow; these are the reasons I am looking for an EP formulation of a GP with binomial likelihood. Thanks
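For concreteness, the covariance structure described above can be sketched in NumPy as follows (the sizes, the lengthscale, and the equicorrelation stand-in for an LKJ draw are all illustrative assumptions, not the poster's actual model):

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, sigma=1.0):
    # Squared-exponential (RBF) kernel on 1-D inputs.
    d2 = (x[:, None] - x[None, :]) ** 2
    return sigma**2 * np.exp(-0.5 * d2 / lengthscale**2)

# Hypothetical sizes: M = 3 segments, N = 4 input points.
M, N = 3, 4
x = np.linspace(0.0, 1.0, N)

# K^y: an M x M correlation matrix across segments (a simple
# equicorrelation matrix here, standing in for an LKJ-distributed draw).
rho = 0.5
K_y = rho * np.ones((M, M)) + (1.0 - rho) * np.eye(M)

# K^x: the N x N RBF kernel over the inputs, shared by all segments.
K_x = rbf_kernel(x, lengthscale=0.3)

# Full MN x MN covariance of the latent GP via the Kronecker product.
K = np.kron(K_y, K_x)
```

The binomial likelihood would then be applied elementwise to the latent function drawn from this MN-dimensional Gaussian, e.g. through a logit link.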

How big are M and N? Do you compute the Cholesky using the Kronecker property? You could also use the profiler to check which part of the model code is taking most of the time. Can you share your code? There may be possibilities for significant speed-ups without the need to switch to EP. The black-box VI in Stan is unlikely to perform well here. For EP or VI that works well with GPs, it would be better to look at GP-specific software packages.
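The Kronecker property mentioned here is that the Cholesky factor of K^y \otimes K^x is the Kronecker product of the individual Cholesky factors, so you never need to factor the full MN \times MN matrix; the cost drops from O((MN)^3) to O(M^3 + N^3). A small NumPy check of the identity (with random SPD matrices as placeholders for the actual kernels):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    # Random symmetric positive-definite matrix for the demonstration.
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

A, B = random_spd(3), random_spd(4)

# chol(A ⊗ B) = chol(A) ⊗ chol(B): both sides are lower triangular
# with positive diagonal and reproduce A ⊗ B, so they coincide.
L_direct = np.linalg.cholesky(np.kron(A, B))
L_kron = np.kron(np.linalg.cholesky(A), np.linalg.cholesky(B))
```

In a Stan model this means computing `cholesky_decompose` on the small M \times M and N \times N matrices separately and forming their Kronecker product, rather than decomposing the full joint covariance.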