Case study: NNGP based models in Stan


#1

I wrote a case study on the Nearest neighbor Gaussian process (NNGP) based model and would be happy to get feedback. It would be great if someone can help me to put it on the Stan website.

Nearest Neighbor Gaussian Processes (NNGP) based models in Stan

Nearest Neighbor Gaussian Processes (NNGP) based models is a family of highly scalable Gaussian processes based models. In brief, NNGP extends the Vecchia’s approximation to a process using conditional independence given information from neighboring locations. In this case study, I will briefly review response and random-effects NNGP models, illustrate how to code NNGP in Stan, and provide examples of implementing NNGP based models in Stan.

The Rmarkdown file and the compiled HTML file are on my Github:

https://github.com/LuZhangstat/NNGP_STAN

And here is a pdf for easier viewing. The format in the pdf is a little bit off-shape.
nngp_stan.pdf (1.6 MB)

Thanks, and wish all of you a happy Thanksgiving!


#2

This is cool.

Some things based on a (very) quick read:

  • In Section 2.3 you have a parameter (L_VB) in your model block that isn’t defined anywhere.
  • What does ./ mean in (10). You should use standard matrix notation in equations
  • You don’t need the scare quotes around “sequential” in Section 4.3

#3

Great to see that this is progressing, Here are some comments

Section 1.1

  • Could you make a connection to terminology used in Stan manual Chapter 18 “Gaussian Processes”? That chapter also discusses the benefits of marginalization. It would make it easier for those who have already been using GPs in Stan to make the connection to the terminology you use.

Section 1.2

  • Eq 4 is missing mean in the first N().
  • Are A and D after eq 5 same or different as A and D after eq 4? It seems they should be different?

Section 1.3

  • "among the locations indexed less that i"
    that -> than?
  • "the nonzero entries appear in the positions indexed by N( s i ) are obtained as row vectors"
    something worng with this sentence

Section 2.2

  • "the locations indexed less that i ."
    that -> than?

Section 2.2.2

  • Add caption to the figure. What are different filled, non-filled , yellow and blue circles?

Section 2.3

  • “a uniform prior for phi” and real<lower = ap, upper = bp> phi;
    Uniform priors with or without constraints are not recommended. 1) It’s better to use gradually decreasing prior density anyway, and 2) having lower and upper constraints which are not physically or logically justified (like variance has to be positive) has been shown often to lead parameter transformation which makes the MCMC sampling to behave poorly. It would be better to have real<lower = 0> phi;
    And for a prior choice, see Section 18.3 in Stan manual and “Robust Gaussian Processes in Stan” draft case study part 3 https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html

Section 2.4

  • "The flop…"
    The flops… ?
  • For clarity combine these declarations
vector[N] YXb;
vector[N] U;
real kappa_p_1;
kappa_p_1 = tausq / sigmasq + 1;
YXb = Y - X_beta;
U = YXb;

like this

vector[N] YXb = Y - X_beta;;
vector[N] U = YXb;
real kappa_p_1 = tausq / sigmasq + 1;
  • No need for
real out;

instead just have

return - 0.5 * ( 1 / sigmasq * dot_product(U, (U
./ V)) + sum(log(V)) + N * log(sigmasq));

Section 3.1

  • "It’s worth mentioning that we use…"
    This implies that other sentences are not worth mentioning, so just drop it and write “We use…”

Section 4.1

  • "Response NNGP model is faster and easier to sample from posterior distribution
    than the random-effects NNGP models."
    You compare “Response NNGP model” to “Random-effects NNGP model”. Could you also compare “Response NNGP model” to “Response GP model”? That is how much faster NNGP is than basic GP?

#4

Great work! and thanks for sharing your code and providing this case study, I started to adapt your original implementation in the google group into my model a few month ago.

here are some comments:

Section 3.2

  • the printed code is for "nngp_random_b1.stan’, but I assume you wanted “nngp_random.stan”.

Section 4.3

  • "we only recommend coding random-effects NNGP model in Stan when sample size is small"
    Would you like to mention when the response is non-Gaussian (e.g. binomial/Poisson), one do not have the option of using the response NNGP model?

Section 2.4

  • Can some of the for loops be avoided:

for (j in 1: dim){ iNNcorr[j] = exp(- phi * NN_dist[(i - 1 ), j]);}

becomes

iNNcorr = to_vector(exp(- phi * NN_dist[(i - 1), 1: dim]));

,and

for (j in 1:dim){ U[i] = U[i] - v2[j] * YXb[NN_ind[(i - 1 ), j]];

becomes

U[i] = U[i] - v2 * YXb[NN_ind[(i - 1), 1:dim]];


#5

Thank you so much for all of these excellent comments. I just updated my case study on my Github. Here is the new pdf: nngp_stan.pdf (1.5 MB)


#6

Thanks a lot for your timely comments.

Best Regards,


#7

Thank you so much for your comments. They are very helpful, and I updated my case study accordingly. My only hesitation is the one for Section 4.1. Works for the comparison of NNGP and GP can be found in the referred papers. Since this case study focuses on implementing NNGP in Stan, I think it would be fine if I don’t compare NNGP with GP in this case study.

Best Regards,


#8

Many thanks indeed. I like your code for saving for loops. As for your second comment, I would like to suggest reading Finley’s 2017 paper. That paper not only covers the response and random-effect NNGP models, it also discusses collapsed NNGP model and conjugate NNGP model. It might be helpful for the modeling of non-Gaussian response, but sorry I can give limited suggestions on this question.

Best Regards,


#9

I didn’t mean you would need to run it. It would be just easier for the reader if you could mention some speed comparison results (and refer for further information in the papers, but then the reader doesn’t necessarily need to dig up these papers). Since the speed of GPs in Stan depends a lot on how they have been implemented, it would be useful to show the reference code or give a link to the reference and just mention that you don’t run it in the notebook as it is so slow.


#10

Okay, I will think about it. Thanks a lot.


#11

I just added the basic GP model in the case study. Here is the pdf: stan_nngp.pdf (1.5 MB) . The Stan code for basic GP is available on my github: https://github.com/LuZhangstat/NNGP_STAN


#12

Great!

Both your basic GP and NNGP make the computations only in Stan language using autodiff for gradients. For the basic GP we know that there’s a huge speedup if the covariance matrix computation is coded as a internal function with hand written gradients (this makes the autodiff expression tree much smaller and faster). You could mention that your basic GP is not the recommended way if specific covariance function is available as internal function in Stan, but your basic GP is fair comparison to your NNGP as it is also implemented only in Stan language. It is likely that your NNGP could also be made faster with built-in function (the speedup in builtin-function comes from using hand written gradients instead of autodiff, otherwise Stan code is anyway transplied to C++ before execution).


#13

This sounds amazing. I checked the Stan manual but didn’t find an internal function with hand written gradients. Or do you mean ov_exp_quad also provides handwritten gradient? Would you like to provide a link for the “fast” basic GP code? Thanks a lot!

Best,


#14

Yes. Using cov_exp_quad is much faster than writing the same covariance matrix directly in Stan. Naturally this doesn’t change the difference in theoretical scaling of GP vs NNGP, but shows that there can be significant computational overheads depending on the implementation.

For the latest recommended GP Stan code and prior recommendations, see
Robust Gaussian Processes in Stan, Part 3
https://betanalpha.github.io/writing/


#15

Thank you so much. I think I need some time to learn these stuff first and then update my code. I will let you know when I have progress. ^_^


#16

Hi Aki,

Sorry for the late reply. I was busy with other works. I checked ‘cov_exp_quad’ and found that the function uses squared distance, while here I need Euclidean distance.

I also checked folder “math/prim/mat/fun” and didn’t find other covariance functions which are useful for my study case, so I think there is no built-in function for the exponential kernel I use in this study case.

BTW, I didn’t find the gradient for cov_exp_quad. Would you please tell me where it is?

Best Regards,


#17

Should be in https://github.com/stan-dev/math/blob/develop/stan/math/rev/mat/fun/cov_exp_quad.hpp

It’s in the rev folder cause there’s a specialized implementation for reverse mode autodiff. It’s in the mat folder cause the code uses Eigen matrix types.

I’m honestly not sure which specializations of cov_exp_quad this covers though haha. Kinda have to dig to figure that out.


#18

Not that deeply.

rstan::lookup("cov_exp_quad")
93 cov_exp_quad           (real[] x1, real[] x2 real alpha, real rho)     matrix  457
94 cov_exp_quad                      (real[] x, real alpha, real rho)     matrix  457
95 cov_exp_quad (row_vectors x1, row_vectors x2 real alpha, real rho)     matrix  457
96 cov_exp_quad                 (row_vectors x, real alpha, real rho)     matrix  457
97 cov_exp_quad         (vectors x1, vectors x2 real alpha, real rho)     matrix  457
98 cov_exp_quad                     (vectors x, real alpha, real rho)     matrix  457

which is really just parsing the index of the Stan User Manual. The primary signature is {row_}vectors x, real alpha, real rho, which works for the case where row_vector[N] x[K]; is declared in the data block but recalculates the squared Euclidean distance between all pairs of vectors every leapfrog step. It is somewhat unfortunate that there is currently no signature for matrix x, real alpha, real rho that uses the same analytic gradients but would take a fixed NxN distance matrix.


#20

Thank you so much!


#21

Sorry, the matrix I want to put as an argument is the coordinates of data which contains, for example, longitude and latitude. But I just realized that I could put it as “row-vectors”. I’ve already edited my reply. Thanks anyway. ^_^