I have a spatiotemporal hierarchical model with a latent process on which I place a Gaussian process prior. The model has been tested and validated and fits my data well, but sampling is rather slow. To make the model more computationally tractable and to be able to apply it to larger data sets, I am implementing a nearest neighbor approximation for the Gaussian process. However, in doing this, I run into a puzzling issue. I will not show the entire model as it is rather involved (the model has been published here: https://www.pnas.org/content/117/4/1877) and will try to explain the issue in simple terms and with just a few lines of code.

The relevant code reads (note the distinction between the original model and the nearest neighbor model):

parameters {
  vector[n] a;
  ...
}
transformed parameters {
  vector[n] w; // this is the latent process
  matrix[n, n] L;
  // in the original model I have:
  L = cholesky_decompose(Q); // Q is the full covariance matrix
  // in the nearest neighbor model I have:
  L = nngp_chol(...); // nngp_chol is a user-defined function that computes the nearest neighbor Cholesky factor of the covariance matrix Q
  w = L * a;
  ...
}
model {
  a ~ normal(0, 1);
  ...
}

By setting the number of neighbors equal to ‘n-1’, the nearest neighbor approximation should reproduce the full covariance matrix ‘Q’ exactly. Indeed, printing the variable ‘w’ in this case shows exactly the same output in both implementations of the model. That is,
L = cholesky_decompose(Q) and L = nngp_chol(…) produce exactly the same Cholesky factor. However, the model only runs when using the first. When I use L = nngp_chol(…), Stan throws the following error:
Log probability evaluates to log(0), i.e., negative infinity.
Stan can’t start sampling from this initial value.

This is puzzling to me. Apart from the line L = cholesky_decompose(Q) (or L = nngp_chol(…)), the two models are exactly the same, and both produce the same ‘L’, yet only one works. Any idea why this might be happening?
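For reference, the claim that using all previous points as neighbors recovers the exact Cholesky factor can be checked outside Stan with a small NumPy sketch of a generic Vecchia/NNGP-style construction. This is an illustration of the idea only, not the actual nngp_chol from the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
Q = A @ A.T + n * np.eye(n)  # a random SPD matrix standing in for the covariance

def vecchia_chol(Q, m):
    """Lower-triangular factor from conditioning each point i on its
    (at most) m previous indices; m = n-1 means full conditioning sets."""
    n = Q.shape[0]
    B = np.zeros((n, n))  # regression coefficients on previous points
    d = np.zeros(n)       # conditional variances
    d[0] = Q[0, 0]
    for i in range(1, n):
        nb = np.arange(max(0, i - m), i)               # previous-neighbour set
        b = np.linalg.solve(Q[np.ix_(nb, nb)], Q[nb, i])
        B[i, nb] = b
        d[i] = Q[i, i] - Q[i, nb] @ b
    # (I - B) w = sqrt(D) a  =>  w = L a  with  L = (I - B)^{-1} sqrt(D)
    return np.linalg.solve(np.eye(n) - B, np.diag(np.sqrt(d)))

L_full = np.linalg.cholesky(Q)
L_vecchia = vecchia_chol(Q, n - 1)  # all previous points as neighbours
print(np.allclose(L_full, L_vecchia))  # True: the factors agree exactly
```

With full conditioning sets, L L' reproduces Q exactly, and by uniqueness of the Cholesky factorization the Vecchia factor must coincide with cholesky_decompose(Q), which matches what printing ‘w’ shows.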

Sorry, I really have trouble giving any advice. Parts of the model are missing, so I doubt anybody else can figure out what is wrong with nngp_chol. Could you be more specific?

Hi!
I’m interested to know how the model can be ‘efficient’ in terms of estimation. A Gaussian latent variable (GP prior, Gaussian random field, or whatever) is generally slow to fit in this type of model (in spatial and spatio-temporal contexts). I know that a Gaussian latent variable with Markov properties (GMRF) is more efficient because \boldsymbol{Q}^{-1} = \boldsymbol{\Sigma}, i.e. the precision matrix \boldsymbol{Q} is sparse, especially when you use a neighbor structure. In Stan this should still be very expensive with the HMC algorithm (for a large data set).
I don’t know anything about extreme value theory, but if you have a good example of this please share it!

I identified and fixed the issue. In building the covariance function that defines the Gaussian processes (GPs), I declare a matrix to store Euclidean distances, but not all elements of the matrix were initialized, since in the nearest neighbor implementation only some of the distances are used. Because Stan does not support ragged arrays, this was the best approach I could think of. The problem was that I then have parameters operating element-wise on all elements of this matrix. It appears that Stan won’t start sampling in this case, even if the uninitialized elements are not used anywhere. To illustrate this with a simple example:

data {
  int<lower=0> N;
  vector[N] y; // some observations
}
transformed data {
  vector[2] f;
  f[1] = 2; // f[2] is not initialized
}
parameters {
  real s_raw;
}
transformed parameters {
  vector[2] s = s_raw * f;
}
model {
  s_raw ~ normal(0, 1);
  y ~ normal(0, s[1]);
}

Stan will fail to start sampling in the model above. However, if ‘f[2]’ is initialized to any arbitrary value, e.g. f[2] = 0, then the model will run without issue.
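This behaviour is consistent with the uninitialized element being NaN (which is how Stan fills unassigned values): the NaN propagates through the element-wise product, so the transformed parameter vector itself contains NaN even though the likelihood never touches that element. A minimal NumPy analogue, making the undefined element an explicit NaN:

```python
import numpy as np

# in Stan, an unassigned element of a transformed-data vector is NaN;
# here we make that explicit
f = np.array([2.0, np.nan])  # f[0] set, f[1] "uninitialized"

s_raw = 0.5
s = s_raw * f                # element-wise product drags the NaN into s

# the element actually used in the likelihood is fine...
print(np.isnan(s[0]))        # False
# ...but the vector as a whole is contaminated, so any check or
# reduction that touches every element comes back NaN
print(np.isnan(s).any())     # True
```

If Stan validates or writes out the full transformed parameter vector, a single NaN element is enough to trip it up, which would explain why initializing f[2] to any value fixes the problem.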

A further question is: are the following two models the same? Model 1 is the same as above, but with f[2] initialized; the second model is:

data {
  int<lower=0> N;
  vector[N] y; // some observations
}
transformed data {
  real f = 2;
}
parameters {
  real s_raw;
}
transformed parameters {
  real s = s_raw * f;
}
model {
  s_raw ~ normal(0, 1);
  y ~ normal(0, s);
}
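For what it’s worth, the two parameterizations should define the same posterior, since the unused s[2] contributes nothing to the target density. A quick numerical check of the log density at a point (assuming s_raw > 0 so the normal scale is valid; normal_lpdf here is hand-rolled, not Stan’s):

```python
import numpy as np

def normal_lpdf(x, mu, sigma):
    # log density of Normal(mu, sigma), vectorized over x
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

y = np.array([0.3, -1.2, 0.7])
s_raw = 0.8  # a positive value so sigma > 0

# vector model: s = s_raw * [2, 0]; only the first element enters the likelihood
s_vec = s_raw * np.array([2.0, 0.0])
lp_vec = normal_lpdf(s_raw, 0, 1) + np.sum(normal_lpdf(y, 0, s_vec[0]))

# scalar model: s = s_raw * 2
lp_scl = normal_lpdf(s_raw, 0, 1) + np.sum(normal_lpdf(y, 0, 2 * s_raw))

print(np.isclose(lp_vec, lp_scl))  # True: identical log densities
```

Since the log densities agree for every value of s_raw, HMC sees the same target in both cases and the posteriors coincide.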

@CavieresJJ: fitting the spatiotemporal model with full GPs takes a few hours with about 100 data sites. I am not sure whether this is slow or fast; what I do know is that with standard MCMC algorithms I was unable to make any meaningful inferences with this rather complex model, due to extremely slow convergence and poor mixing. Stan provides good mixing and yields reasonable answers. With the nearest neighbor implementation I am now able to fit the model to thousands of data sites in several hours. Examples? I published the model in the PNAS paper to which I referred above and also made the code publicly available, so feel free to take a look.

@Louis
Thanks for your reply! I’m building a spatio-temporal model using a GMRF (via the SPDE method), assuming that a Gaussian process is less efficient (because \Sigma is not sparse). My model currently runs 4 chains with max_treedepth = 12 and adapt_delta = 0.9, but I’m having some complications getting good mixing of the chains; that is my problem now. I think the difficult parameter to estimate is kappa (\kappa) in the SPDE method.
After this I will look at your code in detail; maybe I could use a GP for my problem.