Gaussian Process(es): gp_exp_quad_cov not found

I am following the Applied Gaussian Processes in Stan document by @anon79882417 (Applied Gaussian Processes in Stan, Part 1: A Case Study).

transformed parameters {
    matrix[N, N] L_K;
    {
        matrix[N, N] K;
        K = gp_exp_quad_cov(X, magnitude, length_scale);
        K = add_diag(K, square(eta));
        L_K = cholesky_decompose(K);
    }
}

but when I try to compile the model I get the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for: 

  gp_exp_quad_cov(vector[], real, real)

Function gp_exp_quad_cov not found.

Is this function still only in the development code, and must I implement the kernel as a Stan function myself?
The version of Stan that I use is:

rstan_2.18.2       StanHeaders_2.18.0

You’re after cov_exp_quad()?

Thank you @hhau!
I replaced

K = gp_exp_quad_cov(X, magnitude, length_scale);
K = add_diag(K, square(eta));

with

 K = cov_exp_quad(X, magnitude, length_scale) + diag_matrix(rep_vector(square(eta), N));

Now I get:

Error in cpp_object_initializer(.self, .refClassDef, ...) : 
  could not find function "cpp_object_initializer"
failed to create the sampler; sampling not done

but I think this is a totally unrelated error.

Hi @fabio -

What version of Stan/math are you using? If you’re on an older version (before 2.18, I believe), you can replace gp_exp_quad_cov with cov_exp_quad, as we’re unifying the names of the covariance functions. I think gp_exp_quad_cov is in 2.18.

As to your second question, try the initialization of K in two separate lines, like so:

K = cov_exp_quad(x, magnitude, length_scale);
K = K + diag_matrix(rep_vector(square(eta), N));

And I think this should solve the compilation issue. With that said, many of the kernels (Matérn 3/2, Matérn 5/2, exponential) are not yet part of Stan math; you can find the implementations here: https://github.com/drezap/math/tree/kernels-beta if you want to play around with some other kernels (I don’t think ARD works on that branch yet, though the Matérn 5/2 got merged a few weeks ago).

I’ve been slacking at getting these merged, apologies.
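
In the meantime, you can always write a kernel as a user-defined Stan function. Here is a minimal sketch of a Matérn 3/2 covariance (my own illustration, not the code on that branch), which should also work on 2.18:

functions {
    // Sketch only: Matern 3/2 covariance,
    // k(r) = magnitude^2 * (1 + sqrt(3) * r / l) * exp(-sqrt(3) * r / l),
    // where r = ||x_i - x_j|| and l is the length scale.
    matrix gp_matern32_cov(vector[] x, real magnitude, real length_scale) {
        int N = size(x);
        matrix[N, N] K;
        real sq_mag = square(magnitude);
        for (i in 1:N) {
            K[i, i] = sq_mag;
            for (j in (i + 1):N) {
                real z = sqrt(3.0) * distance(x[i], x[j]) / length_scale;
                K[i, j] = sq_mag * (1 + z) * exp(-z);
                K[j, i] = K[i, j];
            }
        }
        return K;
    }
}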

Please let me know if that solution works.

I’d also recommend checking out Betancourt’s case studies for suggestions on how to scale the hyperprior distributions, as I’ve neglected this in my case study.


Hi @anon79882417.
First of all, thank you for your document. It pairs very well with the workflow presented by @betanalpha.
That said, I changed the transformed parameters block following your advice:

transformed parameters {
    matrix[N, N] L_K;
    {
        matrix[N, N] K;
        K = cov_exp_quad(X, magnitude, length_scale);
        K = K + diag_matrix(rep_vector(square(eta), N));
        L_K = cholesky_decompose(K);
    }
}

but the error remains:

recompiling to avoid crashing R session
Error in cpp_object_initializer(.self, .refClassDef, ...) : 
  could not find function "cpp_object_initializer"
failed to create the sampler; sampling not done

I am using

StanHeaders_2.18.0 
rstan_2.18.2

Can you please run it in CmdStan to make sure it’s not an issue with RStan? I’ve never seen this error before. If I have a syntax error in RStan, it usually just crashes; CmdStan will catch it with a lint error.

Could you also post the full Stan code, not just the transformed parameters block? That might help.

@anon79882417: here’s all the Stan code that I use:

data {
    int<lower = 1> N;
    int<lower = 1> D;
    vector[D] X[N];
    vector[N] Y;
}

transformed data {
    vector[N] Y_log = log(Y);
    vector[N] mu = rep_vector(0, N);
}

parameters {
    real<lower = 0> magnitude;
    real<lower = 0> length_scale;
    real<lower = 0> eta;
}

transformed parameters {
    matrix[N, N] L_K;
    {
        matrix[N, N] K;
        K = cov_exp_quad(X, magnitude, length_scale);
        K = K + diag_matrix(rep_vector(square(eta), N));
        L_K = cholesky_decompose(K);
    }
}

model {
    magnitude ~ normal(0, 2);
    length_scale ~ inv_gamma(5, 5);
    Y_log ~ multi_normal_cholesky(mu, L_K);
}

Long story short: I am trying to reproduce the two-dimensional kriging on the meuse dataset, so in this case D = 2 and Y corresponds to the zinc variable, which I log-transform (see the transformed data block).

CmdStan (v2.18.0) gives me no error, just one warning:

Model name=GP_model
Input file=/home/fabio/Documents/STAN/spatial/GP.stan
Output file=/home/fabio/Documents/STAN/spatial/GP.hpp
DIAGNOSTIC(S) FROM PARSER:
Warning: left-hand side variable (name=K) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.

Compiling pre-compiled header
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe  -c -O3 stan/src/stan/model/model_header.hpp -o stan/src/stan/model/model_header.hpp.gch

--- Linking C++ model ---
g++ -Wall -I . -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.66.0 -isystem stan/lib/stan_math/lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -I src -isystem stan/src -isystem stan/lib/stan_math/ -DFUSION_MAX_VECTOR_SIZE=12 -Wno-unused-local-typedefs -DEIGEN_NO_DEBUG -DNO_FPRINTF_OUTPUT -pipe   src/cmdstan/main.cpp  -O3 -o /home/fabio/Documents/STAN/spatial/GP -include /home/fabio/Documents/STAN/spatial/GP.hpp stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_3.1.0/lib/libsundials_idas.a 

with the following output, which I now just have to interpret:

                 mean    se_mean        sd       2.5%       25%      50%      75%    97.5%     n_eff      Rhat
magnitude    1.369176 0.12199123 1.0094181 0.05219338 0.5674067 1.189485 2.045213 3.687646  68.46758 1.0221040
length_scale 1.323135 0.04959016 0.7037901 0.52565260 0.8159852 1.157775 1.596740 3.024755 201.41658 0.9982247
eta          5.725117 0.06227755 0.4925067 4.46203250 5.4712100 5.761075 6.035350 6.610091  62.54049 1.0115286

The interpretation of the hyperparameters wasn’t clear to me initially, either. If you want help interpreting them, I mention it (once, I think) in my case study, and Betancourt has some info about it, but the main reference for GPs is GPML (Gaussian Processes for Machine Learning), which you can find with a Google search; I highly recommend you check it out. Please put some thought into the scale of the length-scale hyperparameter, at minimum. There’s also a paper by Trangucci et al. about it.
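
For reference, cov_exp_quad implements the exponentiated quadratic kernel

k(x_i, x_j) = magnitude^2 * exp(-||x_i - x_j||^2 / (2 * length_scale^2))

so magnitude is the marginal standard deviation of the latent function, and length_scale is the input distance over which correlations decay, which is why its prior should be judged against the spread of your X values.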

On that note, I would look into the posterior predictive distribution. Using it, you can generate predictions for any range of input data you desire. So if you want to do prediction for spatial data (with lat/long), one approach is to set x_pred to an evenly spaced grid along each dimension; I’ve also seen spatial analyses that use census tracts. Generating the posterior predictive is straightforward multivariate Gaussian algebra; a sketch follows.
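
To make that algebra concrete, here is a sketch adapted from the predictive-inference example in the Stan manual, renamed to match the model above. It assumes (my addition, not in your code) that you add int<lower = 1> N_pred; and vector[D] x_pred[N_pred]; to the data block:

functions {
    // Sketch: draw the latent GP at x_pred by conditioning the joint
    // multivariate normal on the observed y (standard GP prediction).
    vector gp_pred_rng(vector[] x_pred, vector y, vector[] x,
                       real magnitude, real length_scale, real eta) {
        int N = rows(y);
        int N_pred = size(x_pred);
        vector[N_pred] f_pred;
        {
            matrix[N, N] K;
            matrix[N, N] L_K;
            vector[N] K_div_y;
            matrix[N, N_pred] k_x_xp;
            matrix[N, N_pred] v_pred;
            vector[N_pred] f_pred_mu;
            matrix[N_pred, N_pred] cov_f_pred;
            K = cov_exp_quad(x, magnitude, length_scale)
                + diag_matrix(rep_vector(square(eta), N));
            L_K = cholesky_decompose(K);
            // K^-1 * y via two triangular solves against L_K
            K_div_y = mdivide_left_tri_low(L_K, y);
            K_div_y = mdivide_right_tri_low(K_div_y', L_K)';
            k_x_xp = cov_exp_quad(x, x_pred, magnitude, length_scale);
            f_pred_mu = k_x_xp' * K_div_y;             // predictive mean
            v_pred = mdivide_left_tri_low(L_K, k_x_xp);
            cov_f_pred = cov_exp_quad(x_pred, magnitude, length_scale)
                         - v_pred' * v_pred;           // predictive covariance
            // small jitter keeps the covariance numerically positive definite
            f_pred = multi_normal_rng(f_pred_mu,
                cov_f_pred + diag_matrix(rep_vector(1e-8, N_pred)));
        }
        return f_pred;
    }
}

generated quantities {
    vector[N_pred] f_pred;
    f_pred = gp_pred_rng(x_pred, Y_log, X, magnitude, length_scale, eta);
}

Note that f_pred is the noise-free latent function; for predictive draws of new observations you would add independent normal_rng(0, eta) noise per point, and since the model works on Y_log, exponentiate to get back to the original zinc scale.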

I’m finding GPs most useful for prediction, or as prior distributions where we need more flexibility, or when I need a basis-function approach to data analysis. Honestly, it’s a lot easier to interpret linear regression coefficients than the hyperparameters of a GP regression. Also, I think for spatial data people use the Matérn 5/2 covariance function, which is not as smooth as the exponentiated quadratic one; it all depends on the application, though. It just got merged into stan/math, and it has ARD “built-in”.

Stan can be less stable for higher-dimensional data (autodiff consumes lots of memory with large covariance matrices, and the posterior induced by a GP prior can be nasty, so HMC sometimes doesn’t behave nicely if the model isn’t well specified or has a complex geometry), but let us know how it goes!