Problem for accounting for spatial correlation with NNGP model

Hello everyone

First of all I am not a statistician but a simple PhD candidate in geography who use statistics and try to understand it. So feel free to correct my statistical vocabulary!

I am actually learning how to fit a model accounting for spatial correlation in Stan and my model seems very inefficient in accounting for it. I am using the meuse dataset as my toy dataset.
I will go through the steps I have been doing for the analysis - so you can understand better and eventually correct:

First I fitted a simple linear model which didn’t account for spatial correlation. A simple
lm(meuse$log_zinc ~ 1 + meuse$dist)

I checked if there was any correlation within the residuals by fitting a semi-variogram. Here is the result:

Form this graph I deduce that the range might be somewhere around 1000.

Because I wanted to compare the results with a software I know better I fit the model in INLA and the SPDE. With INLA I get great results, the standard error are 2 times bigger and when plotting the semivariogram on the residuals I do not notice any spatial correlation:

Because I am trying to switch from INLA to Stan (more flexible and help me to understand better what I am really doing) I try to fit the same model in Stan. I found a great tutorial of how to approximate the Gaussian Random Field: https://mc-stan.org/users/documentation/case-studies/nngp.html and try to use the code provided to fit my own model.

Note that in the Stan code I changed the correlation structure from

                          iNNdistM[j, k] = exp(- phi * NN_distM[(i - 1), h]);

to

                          iNNdistM[j, k] = exp(- NN_distM[(i - 1), h] / phi);

Because I have no clue about how to set a prior for phi I set phi = Normal(300,200), don’t blame me. I tried a lot of different priors but nothing conclusive (I guess because I tried randomly)

After running the model I compute the semi variogram and obtain this:

As I said I am not an expert but this semi-variogram doesn’t look good at all. What worries me more is that the standard deviation of the coefficients are almost the same as the non-spatial model. I guess if the model accounted properly for the spatial correlation I should have standard deviation more or less like the INLA ones.

If you could provide some insights I would be very grateful!

Here is the stan code for the model. I also join the script for fitting the model in INLA and in Stan.

  functions{
      real nngp_w_lpdf(vector w, real sigmasq, real phi, matrix NN_dist,
                       matrix NN_distM, int[,] NN_ind, int N, int M){

          vector[N] V;
          vector[N] I_Aw = w;
          int dim;
          int h;

          for (i in 2:N) {

              matrix[ i < (M + 1)? (i - 1) : M, i < (M + 1)? (i - 1): M]
              iNNdistM;
              matrix[ i < (M + 1)? (i - 1) : M, i < (M + 1)? (i - 1): M]
              iNNCholL;
              vector[ i < (M + 1)? (i - 1) : M] iNNcorr;
              vector[ i < (M + 1)? (i - 1) : M] v;
              row_vector[i < (M + 1)? (i - 1) : M] v2;

              dim = (i < (M + 1))? (i - 1) : M;

              if(dim == 1){iNNdistM[1, 1] = 1;}
              else{
                  h = 0;
                  for (j in 1:(dim - 1)){
                      for (k in (j + 1):dim){
                          h = h + 1;
                          iNNdistM[j, k] = exp(- NN_distM[(i - 1), h] / phi);
                          iNNdistM[k, j] = iNNdistM[j, k];
                      }
                  }
                  for(j in 1:dim){
                      iNNdistM[j, j] = 1;
                  }
              }

              iNNCholL = cholesky_decompose(iNNdistM);
              iNNcorr = to_vector(exp(- phi * NN_dist[(i - 1), 1:dim]));

              v = mdivide_left_tri_low(iNNCholL, iNNcorr);

              V[i] = 1 - dot_self(v);

              v2 = mdivide_right_tri_low(v', iNNCholL);

              I_Aw[i] = I_Aw[i] - v2 * w[NN_ind[(i - 1), 1:dim]];

          }
          V[1] = 1;
          return - 0.5 * ( 1 / sigmasq * dot_product(I_Aw, (I_Aw ./ V)) +
                          sum(log(V)) + N * log(sigmasq));
      }
  }

  data {
      int<lower=1> N;
      int<lower=1> M;
      int<lower=1> P;
      vector[N] Y;
      matrix[N, P] X; // design matrix
      int NN_ind[N - 1, M];
      matrix[N - 1, M] NN_dist;
      matrix[N - 1, (M * (M - 1) / 2)] NN_distM; // Distance between the neighbours
  }

  parameters{
      vector[P] beta;
      real<lower = 0> sigma;
      real<lower = 0> tau;
      real<lower = 0> phi;
      vector[N] w;
  }

  model{
      beta ~ normal(0,5);
      phi ~ normal(1000, 200);
      sigma ~ inv_gamma(2, 1);
      tau ~ inv_gamma(2, 1);
      w ~ nngp_w(sigma, phi, NN_dist, NN_distM, NN_ind, N, M);
      Y ~ normal(X * beta + w, tau);
  }
  
  generated quantities{
    real y_rep[N];
    
    for(n in 1:N){
      y_rep[n] = normal_rng(X[n] * beta + w[n], tau);
    }
  }

instead of

model{
vector[N] mu = alpha+beta*x;
y~normal(mu,sigma);
}


meuse_SPDE.R (3.2 KB)
meuse_NNGP.R (1.9 KB)
Helper_functions.R (6.1 KB)

1 Like

There’s some new stuff in brms that should make GPs easier to work with that I think is relevant here.

Have a look here: Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming and the associated paper: https://arxiv.org/abs/2004.11408 (especially the examples at the end).

1 Like

Thanks for the answer! As I said I am trying to limit my use of packages type brms or INLA to get a better grasp of what I am doing though. In the model that I gave, could it have something to do with the exponential correlation function or with the prior for phi ?

EDIT: Just to give an update. I forgot it is possible to see the Stan code with a brms model. I fitted the model in brms with the gp() and although this is very slow compared to the NNGP this seem to work. Here is the variogram:

Thanks for the help!

Not a bad plan.

I think then you could start with the no-approximation model in Stan and work up from there. What dimension would the GP covariance functions be?

Did you get a chance to try out the approximate gp stuff in brms?

I’m not sure if the approximate thing will work as well as the SPDE thing, but to use the same model the spectrum is

S(lambda) = tau/(kappa^2 + llambdal^2)^2

Yes I copy pasted the Stan code from brms. It is actually a great feature of brms to see what is the machine made of. The covariance matrix will be big … I am working with Citizen Science observations and I have ~1500 observations which are all spatially correlated. So as I understand the covariance matrix will be 1500x1500

Did you get a chance to try out the approximate gp stuff in brms?

Yes, results were way faster! For the meuse dataset (155 observations) it took ~1000 seconds to run the model with an exact GP and ~100 seconds with an approximate GP.

brms has the Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming implemented. Just add option k to gp(). From gp: Set up Gaussian process terms in 'brms' in brms: Bayesian Regression Models using 'Stan'

  • k Optional number of basis functions for computing approximate GPs. If NA (the default), exact GPs are computed.

What would be a good recommandation for k? Otherwise, with k=10 the model runs really smoothly!

2 Likes

See the linked paper for the discussion and how to diagnose whether k is big enough.

2 Likes

Great, thanks again this is very helpful! I will go through the paper. I will certainly have some questions though.

2 Likes

So I tested fitting the exact GP in both Stan and brms and with the same model specifications I get different results.

While brms results are very close to the INLA ones (so I guess these are the right results) the model that I wrote in Stan underestimate the uncertainty in the intercept and has a much lower rho that with brms. Moreover, the model in Stan somehow doesn’t change the shape of the variogram compared to a non-spatial model (even though the semi variance is much reduced):

Here is the semi variogram that I obtain with the brms model:

What I do not understand is that I do use the same priors, same specifications as when fitting the model in brms. Here is the model I wrote:

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data related to GPs
  // coordinates
  vector[2] x[N];
}
transformed data {
  real delta = 1e-12;
}
parameters {
  vector[K] b;  // population-level effects
  real<lower=0> sigma;  // residual SD
  
  // GP standard deviation parameters
  real<lower=0> alpha;
  // GP length-scale parameters
  real<lower=0> rho;
  // latent variables of the GP
  vector[N] zgp;
}
transformed parameters {
  vector[N] f;
  {
    matrix[N, N] L_cov;
    matrix[N, N] cov = cov_exp_quad(x, alpha, rho);
  
    // diagonal elements
    for (n in 1:N)
      cov[n, n] = cov[n, n] + delta;
    
    L_cov = cholesky_decompose(cov);
    f = L_cov * zgp;
  }
}
model {
  // initialize linear predictor term
  vector[N] mu = X * b + f;
  // priors including all constants
  target += normal_lpdf(b | 0, 10);
  target += cauchy_lpdf(alpha | 0, 1);
  target += normal_lpdf(zgp | 0, 1);
  target += inv_gamma_lpdf(rho | 2.874624, 0.393695);
  target += exponential_lpdf(sigma | 1);
  // likelihood including all constants
  target += normal_lpdf(Y | mu, sigma);
}
generated quantities {
   vector[N] y_rep;
  for (n in 1:N)
    y_rep[n] = normal_rng(X[n]*b + f[n], sigma);
}

And here the brms model:

exact_gp <- brm(data = df_meuse, family = 'gaussian',
                  log_zinc ~ 1 + dist + gp(xcoord,ycoord),
                  prior = c(prior(normal(0, 10), class = Intercept),
                            prior(normal(0, 10), class = b),
                            prior(inv_gamma(2.874624, 0.393695), class = lscale),
                            prior(cauchy(0, 1), class = sdgp)),
                  iter = 5000, warmup = 2000, chains = 1, cores = 4,
                  seed = 13,
                  control = list(adapt_delta = 0.999,
                                 max_treedepth = 12))

I just prefer to be sure of what I am doing before implementing the approximate GP in Stan!

EDIT:
Another quick question (which shouldn’t be complicated).

I am trying to see at which range the spatial correlation disappear given the alpha and rho parameters from the exponential covariance function.

As I understand this would work like this:

# Compute the range
mean_alpha <- mean(samples$alpha)
mean_rho <- mean(samples$rho)

# Min and max distance in the distance matrix
dist_seq <- seq(from = 0, to = 4000, length.out = 100)

gp <- sapply(dist_seq, function(x) mean_alpha*exp(-x^2/(2*mean_rho)))
plot(gp ~ dist_seq, type = 'l')   

However I obtain a curve like this:

Which I interpret as a range of 0 … However if the sequence goes from 0 to 1 I get this:

Are the distances somehow standardized inside the function cov_exp_quad?

Use make_stancode for the brms model to get Stan code so you can compare where the differences are.

I think INLA doesn’t have exponentiated quadratic?

Exponential or exponentiated quadratic?

No.

Use make_stancode for the brms model to get Stan code so you can compare where the differences are.

Yes that is what I have been doing! So after digging a bit more I figured that the difference in results between my own Stan model and the one produce by brms comes from the data list

It seems that brms scale the coordinates when fitting a GP so that the maximum distance between 2 points is 1. Basically I was using the brms prior for the scaled data on the raw coordinates. I am now able to recover the results of brms using a prior fit for my data, giving me a coherent semi-variogram.

Sorry I confused myself here. As far as my experience goes with INLA it is only possible to use the Matern covarience function. And I was talking about the exponentiated quadratic.

Thanks for your answers - and patience!

3 Likes

Hi Ben - I wonder if I could hijack your thread to ask a brief question of you. I’m myself playing around with GP models in Stan/BRMS and I’m wondering if you could provide some code showing how you got your variogram plots from your BRMS model.

1 Like