Gaussian Process fit for Poisson Distribution

Hi There,

I am trying to fit a poisson distribution with GP model in Stan. I have done it with Normal likelihood sucessfully. However when I change the likelihood with Poisson_Log then I get receive for that part.
The stan gaves me the error that:

Real return type required for probability function.
error in ‘model3b084954e90_GP_Zacc_LikeLihood_Change’ at line 42, column 30

40:   sigma_f ~ normal(4, 1);
41:    
42:   yn ~ poisson_log(f_predict);
                                 ^
43: }

here is my stan code:

  int<lower=1> N; // number of observation
  vector[N] x;    // univariant Covariate
  vector[N] y;    // Target Variable
  int<lower=1, upper=N> observed_idx[N];

}

transformed data {
  // normalize data
  real xmean = mean(x);
  real ymean = mean(y);
  real xsd = sd(x);
  real ysd = sd(y);
  real xn[N] = to_array_1d((x - xmean)/xsd);
  vector[N] yn = (y - ymean)/ysd;
  real sigma_intercept = 0.15;   // Changed from 0.1
  vector[N] jitter = rep_vector(1e-8, N);
  //vector[N] jitter = rep_vector(1e-7, N);  // 1e-e9
  }

parameters {
     real<lower=0> lengthscale_f; // lenght scale of f
    real<lower=0> sigma_f;       //  scale of f
    vector[N] z_f;
}

transformed parameters {
    matrix[N, N] cov =   cov_exp_quad(xn, lengthscale_f, sigma_f)
                                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
  vector[N] f_predict = L_cov * z_f;
}

model {
  z_f ~ normal(0, 1);
  lengthscale_f ~ lognormal(-3.0, 0.2);
  sigma_f ~ normal(4, 1);
   
  yn ~ poisson_log(f_predict[observed_idx]);
}

generated quantities {
  int y_predict[N] = poisson_log_rng(f_predict);
}

Can anyone give me an insight of this error?

You need to define y as an integer array

array[N] int y;

and do not standardize that, ie remove

vector[N] yn = (y - ymean)/ysd;

and use y in the model.

For univariate X, you can get much faster sampling by using Hilbert space basis function approximation of Gaussian processes

1 Like

Thank you very much for your help. I change the code and simplified it more, now it is synthatically is correct. however it cannot go thorugh the sampling step when I ask rstan or cmdstan to proceed. here is my new version code:

data {
  int<lower=1> N;
  vector[N] x[N];
  int y_observed[N];
}

parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  vector[N] f_tilde;
}

transformed parameters {
  vector[N] log_f_predict;
{
  matrix[N, N] cov;
  matrix[N, N] L_cov;
  cov =  cov_exp_quad(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N));
  L_cov = cholesky_decompose(cov);
  log_f_predict = L_cov * f_tilde;
}
}

model {
  f_tilde ~ normal(0, 1);
  rho ~ inv_gamma(6.8589, 103.582);
  alpha ~ normal(0, 10);
  y_observed ~ poisson_log(log_f_predict);
}

generated quantities {
  vector[N] f_predict = exp(log_f_predict);
  vector[N] y_predict;
  for (n in 1:N)
    y_predict[n] = poisson_log_rng(log_f_predict[n]);
}

the problem is that, stan told me "Your x dimension is x[N,N], however my data is 1N dimension. when I change the line 3 of the code as:

vector [N] x;

then the stan code is not synthatically correct and give me error:

error in ‘model4cac13fdf77_GP_Poisson_Simplified’ at line 21, column 36

19:   matrix[N, N] cov;
20:   matrix[N, N] L_cov;
21:   cov =  cov_exp_quad(x, alpha, rho)
                                       ^
22:                      + diag_matrix(rep_vector(1e-10, N));

I am not sure why the stan force me define x as [N,N] dimension. but my data is 1N dimension. I have not this problem when I am running code for Normal likelihood.

Can you show the whole error? The cov_exp_quad() function reference is at 6.13 Gaussian Process Covariance Functions | Stan Functions Reference