Sorry for the long description but I promise (that I believe) this is going somewhere:
I am trying to infer parameters from a large Gaussian process model (or maybe this should be called a field, I’m not sure) with correlated tasks. The total number of tasks is something we can prioritize: a few dozen, a few hundred, or ideally all of them (which would be a few thousand), and the number of observations for each task is in the tens, so inference involves a Cholesky decomposition of a matrix with a few to several thousand rows and columns (as well as sampling the latent Gaussian function to use a negative binomial likelihood).
As a consequence, running 10k Stan iterations is taking well over a week (or several weeks), depending on whether I run it with all subgroups (for instance, to make them share the negative binomial dispersion) or not, so I’m in need of ways to speed up Stan, or of approximate methods like INLA.
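To make the scale concrete, here is a toy numpy sketch of the structure I mean (all sizes, the kernel, and the dispersion value are made up; the real model correlates tasks rather than a 1-D index):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes just to show the structure: n_tasks tasks, n_obs points each.
n_tasks, n_obs = 40, 10
n = n_tasks * n_obs              # the Cholesky below is O(n^3) in this n

# Toy covariance (squared-exponential over an arbitrary 1-D index + jitter);
# the real model would use a task-correlation structure instead.
t = np.linspace(0.0, 1.0, n)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.1) ** 2) + 1e-6 * np.eye(n)

L = np.linalg.cholesky(K)        # the expensive step that dominates runtime
f = L @ rng.standard_normal(n)   # one draw of the latent Gaussian function

def nb_loglik(y, log_mu, phi):
    """Negative binomial log-likelihood with log link and dispersion phi."""
    mu = np.exp(log_mu)
    return sum(
        math.lgamma(yi + phi) - math.lgamma(phi) - math.lgamma(yi + 1.0)
        + phi * math.log(phi / (phi + mi)) + yi * math.log(mi / (phi + mi))
        for yi, mi in zip(y, mu)
    )

y = rng.poisson(np.exp(f))       # stand-in counts, only for illustration
print(nb_loglik(y, f, phi=5.0))
```

Every likelihood evaluation pays for that `cholesky` call, which is why the runtime blows up as the number of tasks grows.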
To that end I’ve been reading Rue, Martino & Chopin (2009). The main limitation of INLA seems to be that the model must be a latent Gaussian field for the Laplace approximation of \tilde{\pi}(x_i \mid \theta, y) to be reasonable, but since Gaussian processes are the prototypical latent Gaussian field this doesn’t seem to be a problem for me right now. Nevertheless, several questions came up, so finally here they are:

1. INLA requires a maximum-likelihood-like optimization step to find the modes of \tilde{\pi}(x_i \mid \theta, y) and \tilde{\pi}(\theta \mid y) around which to expand the Laplace approximation. Is this feasible for large parameter spaces?

2. Sparsity is invoked for computational efficiency, but it doesn’t seem to be required for INLA. Is that correct?
About 1., I was under the impression that this was the reason why things like simulated annealing exist, and why MCMC approaches wouldn’t get stuck in the same way as regular maximum likelihood methods. And about 2., this seems to apply to any method where the model is a latent Gaussian field, so if the covariance matrix happens to be sparse (it probably isn’t) it would improve Stan’s performance as well.
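To show what I mean about 1., here is my (possibly wrong) understanding of the optimization step as a sketch, with made-up sizes and a Poisson likelihood for simplicity (negative binomial with fixed dispersion should behave similarly): since the likelihood is log-concave and the prior Gaussian, the conditional posterior of the latent field is unimodal, so a few Newton steps find the mode without the multimodality that simulated annealing is built for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                    # made-up size
t = np.linspace(0.0, 1.0, n)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.2) ** 2) + 1e-6 * np.eye(n)
f_true = np.linalg.cholesky(K) @ rng.standard_normal(n)
y = rng.poisson(np.exp(f_true))

# Newton iteration for the mode of pi(f | theta, y): maximize
# log p(y | f) - 0.5 f' K^{-1} f.  Uses (K^{-1} + W)^{-1} = K (I + W K)^{-1}
# so K^{-1} is never formed explicitly.
f = np.zeros(n)
for _ in range(25):
    mu = np.exp(np.clip(f, -30.0, 30.0))   # clip to guard against overshoot
    grad = y - mu                          # gradient of the Poisson log-lik
    W = mu                                 # negative Hessian (diagonal)
    b = W * f + grad
    f_new = K @ np.linalg.solve(np.eye(n) + W[:, None] * K, b)
    step = np.max(np.abs(f_new - f))
    f = f_new
    if step < 1e-10:
        break

print(step)   # final Newton step size; ~0 once the unique mode is reached
```

So my question 1. is really about whether this stays feasible, numerically and in memory, when the latent field has thousands of dimensions, not about getting stuck.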
Finally, (3.) in the case of Stan and that kind of model, what can be done besides reducing the number of samples? Is GPU acceleration already the default for matrix operations, or are there specific commands or customizations needed?
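For what it’s worth, my current (possibly wrong) understanding from the CmdStan docs is that GPU support is opt-in via OpenCL rather than on by default: you rebuild the model with something like the following in CmdStan’s `make/local` (the platform/device ids below are placeholders for whatever your system reports), after which functions like `cholesky_decompose` use GPU routines. Is that right, and does it apply to the other interfaces too?

```
STAN_OPENCL=true
OPENCL_PLATFORM_ID=0
OPENCL_DEVICE_ID=0
```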
I guess if you’ve read this far I can only thank you for your patience as well.