Gaussian process regression error (rejecting initial value)

Hello Stan users,

I am trying to re-purpose some of Rob Trangucci’s excellent code on Gaussian processes, but I am getting some errors and I am not sure what to do: https://github.com/rtrangucci/multi_output_gps

Rejecting initial value:
Error evaluating the log probability at the initial value.

Here is the modified code. I originally commented out some of the program, thinking that was the issue, but I am still getting the same error. The big change I am working toward is a Poisson model where I can include covariates observed at the observation level, so that I end up with observation-level effects that are independent of the GP grid.

functions {
  matrix gp_exp_quad_chol(real[] x, real alpha, real len, real jitter) {
    int dim_x = size(x);
    matrix[dim_x, dim_x] L_K_x;
    matrix[dim_x, dim_x] K_x = cov_exp_quad(x, alpha, len);
    for (n in 1:dim_x)
      K_x[n,n] = K_x[n,n] + jitter;
    L_K_x = cholesky_decompose(K_x);
    return L_K_x;
  }
}
data {
  // Model is for outcomes that have been
  // observed on a 2 dimensional grid
  int<lower=1> N; // Number of observations
  int<lower=1> K; // Number of predictors
  int<lower=1> dim_x_1; // size of grid dimension 1
  int<lower=1> dim_x_2; // size of grid dimension 2
  real x_1[dim_x_1]; // locations - 1st dimension
  real x_2[dim_x_2]; // locations - 2nd dimension
  int<lower=0> y[N];  // Outcome
                      //matrix[N, K] X; // predictor matrix

  int row_i[N]; // row position of each observation
  int col_j[N]; // col position of each observation
  

}
parameters {
                                //real w0; //intercept
                                //vector[K] beta; //PH predictors
  real<lower=0> len_scale_x_1; // length-scale - 1st dimension
  real<lower=0> len_scale_x_2; // length-scale - 2nd dimension
  real<lower=0> alpha; // Scale of outcomes

  // Standardized latent GP
  matrix[dim_x_1, dim_x_2] y_tilde;
}
model {
  matrix[dim_x_1, dim_x_2] latent_gp; 
  {
    matrix[dim_x_1, dim_x_1] L_K_x_1 = gp_exp_quad_chol(x_1, 1.0, len_scale_x_1, 1e-12);
    matrix[dim_x_2, dim_x_2] L_K_x_2 = gp_exp_quad_chol(x_2, alpha, len_scale_x_2, 1e-12);

    // latent_gp is matrix-normal with among-column covariance K_x_1
    // among-row covariance K_x_2
    
    latent_gp = L_K_x_1 * y_tilde * L_K_x_2';
  }
  // priors
  len_scale_x_1 ~ gamma(8, 2);
  len_scale_x_2 ~ gamma(8, 2);
  alpha ~ normal(0, 1);
  to_vector(y_tilde) ~ normal(0, 1);
                                    // weakly informative prior for the intercept
                                    //w0 ~ normal(0,10);
                                    //Weakly informative prior for the other variables in data
                                    //beta ~  normal(0,10); //w0 + X * beta +

  // likelihood 
   y ~ poisson_log(to_vector(latent_gp[row_i, col_j]'));
}

Thanks for any help you can provide!

-Patrick

I think your problem is with this:

y ~ poisson_log(to_vector(latent_gp[row_i, col_j]'));

Multiple indexing like that returns a matrix, not an array: latent_gp[row_i, col_j] gives you the full N-by-N submatrix of every row/column combination, not the N element-wise pairs you want (I think this follows how R does its matrix multiple indexing). I think the thing you want is:

for(n in 1:N)
  y[n] ~ poisson_log(latent_gp[row_i[n], col_j[n]]);
}
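
To make the difference concrete, here is a rough, untested sketch with placeholder names (m, rows_idx, cols_idx are not part of your model):

matrix[3, 3] m;
int rows_idx[2] = {1, 3};
int cols_idx[2] = {2, 3};
// m[rows_idx, cols_idx] picks out every row/column combination,
// i.e. the 2 x 2 block {{m[1,2], m[1,3]}, {m[3,2], m[3,3]}},
// not just the two elements m[1,2] and m[3,3].
matrix[2, 2] sub = m[rows_idx, cols_idx];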

Hope that helps! (tried it on my computer with fake data and it ran)

edit: changed it’s to its :D

That did it! It is running really slowly, but that is probably because of the size of the data set (I have only worked on smaller fake data so far). I will have to do some additional testing.

Thanks for your help!


I am still having a hard time sampling this model. I talked to some folks and they mentioned that the number of observations I have is what is driving the slowdown.

I have roughly 800,000 observations on a 50 by 50 grid. Is there any way to vectorize the code that you suggested?

Thanks again for any help you can provide!

vector[N] log_lambda;
for (n in 1:N) log_lambda[n] = latent_gp[row_i[n], col_j[n]];
y ~ poisson_log(log_lambda);
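
If you want to skip the per-element copy entirely, another option (untested sketch; cell_idx is just a placeholder name) is to precompute a linear index into the flattened grid in transformed data. to_vector() flattens a matrix in column-major order, so element (r, c) of a dim_x_1 by dim_x_2 matrix lands at position (c - 1) * dim_x_1 + r:

transformed data {
  int cell_idx[N];
  for (n in 1:N)
    cell_idx[n] = (col_j[n] - 1) * dim_x_1 + row_i[n]; // column-major position of cell (row_i[n], col_j[n])
}

and then in the model block:

  // index the flattened GP once; poisson_log accepts the resulting vector
  y ~ poisson_log(to_vector(latent_gp)[cell_idx]);

I have not checked how much that buys you over the loop above.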

Hi everyone,

I am new to Stan and I am having a hard time running my code in MATLAB. Every time I run it, I get "Rejecting initial value: Error evaluating the log probability at the initial value." I have put part of my code here. Any suggestions? Thank you in advance.

code = {
'data {'
'  int<lower=0> T;'
'  int<lower=0> N;'
'  real<lower=0,upper=1> Fhat[N,T];'
'  int<lower=0,upper=1> U[N,N];'
'}'
'parameters {'
'  simplex[N] M[T];'
'}'
'transformed parameters {'
'  real<lower=0,upper=1> F[N,T];'
'  for (i in 1:T){'
'    for (n in 1:N){'
'      F[n,i] = 0.0;'
'      for (k in 1:N){'
'        F[n,i] = F[n,i] + U[i,k]*M[i][k];'
'      }'
'    }'
'  }'
'}'
'model {'
'  for (i in 1:T){'
'    M[i] ~ dirichlet([0.5, 0.5, 0.5, 0.5, 0.5, 0.5]'');'
'  }'
'  for (n in 1:N) {'
'    for (i in 1:T){'
'      Fhat[n,i] ~ normal(F[n,i], 0.01);'
'    }'
'  }'
'}'
};
evol_dat = struct('T',11,'N',6,...
    'Fhat',Fdata,...
    'U',Udata);
%%
model = StanModel('verbose',true,'model_code',code,'data',evol_dat);
model.compile();

fit_vb = model.vb();

print(fit_vb);

Thanks Ben! I just had the slower run finish, so I will implement your suggestions. I tested your code with 100 draws. The gradient eval was cut by about half!


I am nearing the end of this GP model and I wanted to check my Stan logic because I am getting some results I just do not understand. I am trying to add additional covariates that are independent of the GP. However, when I add them to the model, I get a syntax error depending on how I write it. Here is the fully updated code:

functions {
  matrix gp_exp_quad_chol(real[] x, real alpha, real len, real jitter) {
    int dim_x = size(x);
    matrix[dim_x, dim_x] L_K_x;
    matrix[dim_x, dim_x] K_x = cov_exp_quad(x, alpha, len);
    for (n in 1:dim_x)
      K_x[n,n] = K_x[n,n] + jitter;
    L_K_x = cholesky_decompose(K_x);
    return L_K_x;
  }
}
data {
  // Model is for outcomes that have been
  // observed on a 2 dimensional grid
  int<lower=1> N; // Number of observations
  int<lower=1> K; // Number of predictors
  int<lower=1> dim_x_1; // size of grid dimension 1
  int<lower=1> dim_x_2; // size of grid dimension 2
  real x_1[dim_x_1]; // locations - 1st dimension
  real x_2[dim_x_2]; // locations - 2nd dimension
  int<lower=0> y[N];  // Outcome
  matrix[N, K] X; // predictor matrix

  int row_i[N]; // row position of each observation
  int col_j[N]; // col position of each observation
  

}
parameters {
  real w0; //intercept
  vector[K] beta; //PH predictors
  real<lower=0> len_scale_x_1; // length-scale - 1st dimension
  real<lower=0> len_scale_x_2; // length-scale - 2nd dimension
  real<lower=0> alpha; // Scale of outcomes
  // Standardized latent GP
  matrix[dim_x_1, dim_x_2] y_tilde;
}

transformed parameters {
matrix[dim_x_1, dim_x_2] latent_gp; 
  {
    matrix[dim_x_1, dim_x_1] L_K_x_1 = gp_exp_quad_chol(x_1, 1.0, len_scale_x_1, 1e-12);
    matrix[dim_x_2, dim_x_2] L_K_x_2 = gp_exp_quad_chol(x_2, alpha, len_scale_x_2, 1e-12);

    // latent_gp is matrix-normal with among-column covariance K_x_1
    // among-row covariance K_x_2
    
    latent_gp = L_K_x_1 * y_tilde * L_K_x_2';
  }
}
model {
  vector[N] log_lambda;
  
  // priors
  len_scale_x_1 ~ gamma(8, 2);
  len_scale_x_2 ~ gamma(8, 2);
  alpha ~ normal(0, 1);
  to_vector(y_tilde) ~ normal(0, 1);
  // weakly informative prior for the intercept
  w0 ~ normal(0,10);
  //Weakly informative prior for the other variables in data
  beta ~  student_t(7, 0, 2.5); 

  // likelihood 
   
  for (n in 1:N) log_lambda[n] = latent_gp[row_i[n], col_j[n]]  +  w0  +  X[n] * beta;
  y ~ poisson_log(log_lambda);
}

However, in the log_lambda loop, if I write the code this way I get an error:

for (n in 1:N) log_lambda[n] = latent_gp[row_i[n], col_j[n]]  +  w0  + beta * X[n];

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

base type mismatch in assignment; variable name = log_lambda, type = real; right-hand side type=matrix

ERROR at line 68
67:
68: for (n in 1:N) log_lambda[n] = latent_gp[row_i[n], col_j[n]] + w0 + beta * X[n];
^
69: y ~ poisson_log(log_lambda);

PARSER EXPECTED:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
one node produced an error: failed to parse Stan model ‘lgcp_testben’ due to the above error.

Am I missing something? I don’t see how the first formulation would run while the other throws an error.

Thanks!

X[n] is a row vector, and beta is a column vector, so if you want the dot product of these two it’s X[n] * beta.

beta * X[n] is a column vector times a row vector, which gives a K x K matrix; that is why the parser complains about assigning a matrix to a real.
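
As an untested side note, once the X[n] * beta form is in place you can also drop the loop and build the whole linear predictor with matrix algebra (gp_at_obs is just a placeholder name):

  vector[N] gp_at_obs;
  for (n in 1:N)
    gp_at_obs[n] = latent_gp[row_i[n], col_j[n]];
  // real + vector + vector: w0 is broadcast across all N observations
  y ~ poisson_log(w0 + X * beta + gp_at_obs);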

Hope that helps!

Oops, simple linear algebra 101 mistake. Shouldn’t code on an empty stomach. Thanks, Ben!