LIke others mentioned you could try other inference approaches available in Stan, like variational or MAP estimation, but MCMC sampling of a Gaussian Process with nongaussian likelihood is quite feasible and can scale to relatively large (of course the large here is also relative).
I implemented it for a multi-channel GP with a 26 \times 26 matrix and negative binomial likelihood in this preprint and it ran in a couple of minutes or so, but it is possible to scale up to M chanels and N observation points (MN \times MN-sized matrix), although at some point your parameter space will become too large and things will become slow, with M=10,20 ir probably became more like hours and convergence started to worsen (plus jointly estimating M>2 is roughly equivalent to estimating multiple pairs). There are also strategies for scaling up for large N, like a smaller number of control points.
We will release detailed, commented code and a walkthrough when we organize it for the peer-reviewed submission, but until then you can find the essential code here.