Log-likelihood of Nearest Neighbor Gaussian Process

Hi all,

I wrote a header file for getting log-likelihood of Nearest Neighbor Gaussian Process (NNGP) with exponential covariance function. This is my first attempt to code in C++ using stan math library. I am really grateful to Rob for all the help in my first attempt.

I have some questions about my code.

1. I tried to follow the template in other stan code, but I don’t know how good it is.

2. I only check the dimension of one input matrix and the positivity of parameters. I am not sure what are the other things that I should check. For instance, NA, finite value, symmetry etc.

3. I am using $\sigma^2$ instead of $\sigma$ as my input, which is different from most of the stan code. Should I change it into $\sigma$?

Besides the code I wrote for getting log-likelihood of NNGP, I also wrote a simple test file. You can get access to the executable file in folder “nngp_lu” through this link: link

Thank you very much for all your help, I am looking forward to hearing from you.

#ifndef NNGP_LOG_HPP
#define NNGP_LOG_HPP

#include <stan/math/prim/mat/fun/cholesky_decompose.hpp>
#include <stan/math/prim/scal/err/check_size_match.hpp>
#include <stan/math/prim/scal/err/check_greater.hpp>
#include <stan/math/prim/scal/err/check_not_nan.hpp>
#include <stan/math/prim/scal/err/check_positive.hpp>
#include <stan/math/prim/scal/fun/constants.hpp>
#include <stan/math/prim/scal/meta/return_type.hpp>
#include <stan/math/prim/mat/fun/to_row_vector.hpp>
#include <stan/math/prim/mat/fun/mdivide_left_tri_low.hpp>
#include <stan/math/prim/mat/fun/mdivide_right_tri_low.hpp>
#include <stan/math/prim/mat/fun/square.hpp>
namespace stan {
namespace math {

/**
* Return the (natural) log probability of the Nearest
* Neighbors Gaussian Process with exponential kernal
* given the nearest neighbors distance matrics, the
* index of the nearest neighbors and specified
* parameters including partial sill, nuggets, and range.
*
*
* @param sigmasq  Partial Sill
* @param tausq    Nuggets
* @param phi      inverse of Range
* @param beta     vector of regression coefficients
*/

template <typename T_y, typename T_x, typename T_beta,
typename T_psill, typename T_nugget, typename T_range,
typename T_dist, typename T_distM, typename T_ind>
typename return_type<T_y, T_x>::type

nngp_log(const T_y& Y, const T_x& X, const T_beta& beta,
const T_psill& sigmasq, const T_nugget& tausq,
const T_range& phi, const T_dist& neardist,
const T_distM& neardistM, const T_ind& nearind){

static const char* function("nearest_neighbor_GP_log");
double kappa_p_1 = tausq / sigmasq + 1.0;

typedef typename return_type<T_y, T_x, T_dist>::type lp_type;
lp_type lp(0.0);

int M = neardist.cols();    // no. of neighbors
int N = X.rows();           // no. of observations
int P = X.cols();           // no. of predictors

using Eigen::Dynamic;
using Eigen::Matrix;
using Eigen::VectorXd;

/* check dimension matching: */
check_size_match(function,
"No. of columes in neardist", M,
"No. of columes in nearind", nearind.cols());

/* check positivity: */
check_greater(function, "partial sill", sigmasq, 0);
check_greater(function, "nuggets", tausq, 0);
check_greater(function, "phi", phi, 0);

VectorXd V(N);
V(0) = kappa_p_1;

VectorXd Uw(N);             // Uw(N) = temp_w(N) = Y - X * beta
VectorXd temp_w(N);

Uw = subtract(Y, multiply(X, beta));
temp_w = Uw;

// get exp(-phi * dist):
for (int i = 0; i < (N - 1); i++){
const int dim = (i < M) ? (i + 1) : M;
Matrix<double, Dynamic, Dynamic> temp_distM(dim, dim);
int h = 0;
for (int j = 0; j < dim; j++){
for (int k = j; k < dim; k++){
temp_distM(j, k) = std::exp(-phi * neardistM(i, j*dim + k));
temp_distM(k, j) = temp_distM(j, k);
}
}
for(int j = 0; j < dim; j++){
temp_distM(j, j) = kappa_p_1;
}

check_positive(function, "Covariance matrix rows", temp_distM.rows());
Matrix<double, Dynamic, Dynamic> choldistM(cholesky_decompose(temp_distM));

VectorXd u(dim);
VectorXd v(dim);
Matrix<double, 1, Dynamic> v2(1, dim);
for (int j = 0; j < dim; j++){
u(j) = std::exp(-phi * neardist(i, j));
}

// v = L^-1 * u
v = mdivide_left_tri_low(choldistM, u);
V((i+1)) = kappa_p_1 - stan::math::square(v).sum();
v2 = mdivide_right_tri_low(to_row_vector(v), choldistM);

for (int j = 0; j < dim; j++){
Uw(i + 1) -= v2(j) * temp_w(nearind(i, j) - 1);
}
}

for (int i = 0; i < N; i++){
lp += - 0.5 * log(V(i)) - 0.5 / sigmasq * (Uw(i) * Uw(i) / V(i));
}

lp -= 0.5 * N * log(sigmasq);

return lp;
}

}
}


Hi @Lu.Zhang,

• are you on GitHub? If not, you get yourself an account and get familiar with using Git.
• this isn’t going to work within Stan: double kappa_p_1 = tausq / sigmasq + 1.0; won’t work with Stan’s autodiff types.

Let’s get you on Git so you don’t have to send 28 MB files around.

I agree with @syclik: it’ll be much easier to analyze your code if you can add it as a branch in GitHub. To answer one of your questions, yes, definitely use sigma instead of sigma^{2}.

Thank you so much for all your advice. I use bitbucket. I will create a
GitHub account soon and change my code accordingly.

Best,

Lu Zhang

Hi all,

Sorry for the late reply. I asked a student from computer science about GitHub, and she told me to create a branch and pull a request for merge. The following is the GitHub link: https://github.com/stan-dev/math/pull/439. I am not sure if this is what I am expected to do on Github. So please don’t hesitate to tell me if I need to make any change.

Thank you so much!

Happy Thanksgiving!

Lu Zhang