Gaussian Process Regression with Student likelihood

Hello!
I am new to Stan and i was trying to see how it actually work.
I bumped into a paper (maybe relevant) that was speaking about Gaussian Process Regression with
Student likelihood.(many outliers)
I am wondering if this is possible to be implemented in rstan because i was trying and couldnt find
how to generate sample from the non conjugate posterior regarding the parameters .
I read an interpretation with non Gaussian observations but still it is unclear for me.(https://betanalpha.github.io/assets/case_studies/gp_part1/part1.html#33_simulating_from_a_gaussian_process_conditional_on_non-gaussian_observations)
Any thoughts?

Something like this would work. (Basically replace the poisson with the student t everywhere). The big thing for GPs with non-gaussian observations is that there are no fancy tricks so it just looks a lot like a GLM but with a more fancy prior distribution on the (transformed) mean.

data {
  int<lower=1> N_predict;
  real x_predict[N_predict];

  int<lower=1> N_observed;
  int<lower=1, upper=N_predict> observed_idx[N_observed];
  real  y_observed[N_observed];

  real<lower=0> rho;
  real<lower=0> alpha;
}

transformed data {
  matrix[N_predict, N_predict] cov =   cov_exp_quad(x_predict, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N_predict));
  matrix[N_predict, N_predict] L_cov = cholesky_decompose(cov);
}

parameters {
  vector[N_predict] f_tilde;
  real<lower=0> dof; // Degrees of freedom for t
  real<lower=0> s; // scale parametr for T
}

transformed parameters {
  vector[N_predict] f_predict = L_cov * f_tilde;
}

model {
  f_tilde ~ normal(0, 1);
  for (n in 1:N_observed)
    y_observed[n] ~ student_t(dof, f_predict[observed_idx[n] ], s);
}

generated quantities {
  vector[N_predict] y_predict;
  for (n in 1:N_predict)
    y_predict[n] = student_t_rng(dof, f_predict[observed_idx[n] ], s);
}
2 Likes

Thank you!
The compiler outputs me this below:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for:

student_t(real, vector, real)

Function student_t not found.
error in ‘model1c1844353f43_student_stan’ at line 37, column 61

35:   vector[N_predict] y_predict;
36:   for (n in 1:N_predict)
37:     y_predict[n] = student_t(dof, f_predict[observed_idx], s);
                                                                ^
38: }

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘student.stan’ due to the above error.

Oops! The funciton isn’t vectorized. Hopefully fixed above. (Sorry - on my phone so i can’t compile)

hmm… i am still getting this which is odd…
Function student_t not found.
error in ‘model1c18541e5396_student_stan’ at line 38, column 63

36:   vector[N_predict] y_predict;
37:   for (n in 1:N_predict)
38:     y_predict[n] = student_t(dof, f_predict[observed_idx[n]],s);
                                                                  ^
39: }

ARgh! stupid typo - that should be student_t_rng because it’s drawing a random number. fixed above. sorry

2 Likes

Yeah you 're right!
Thank you :)

I tried to implement the algorithm above to the below schema which generates the target values with a bit of noise in R. (based on an example in this paper " Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification",Neil,1997.

Target Distribution with adjusted deviations

f.x = function(x){
s = sample(c(1, 2), size = 1, replace = TRUE, prob = c(0.95, 0.05))
ifelse(s==1,0.3+0.4x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,0.1),
0.3+0.4
x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,1))

}
f.x=Vectorize(f.x)

#Generate train and test set

#set.seed(123)
x = seq(-3,3,length=200)
targets = f.x(x)
dataset = data.frame(cbind(x,targets))
train = (data.frame(cbind(train=dataset$x[1:100],targets=dataset$targets[1:100])))
test = (data.frame(cbind(test=dataset$x[101:200],targets=dataset$targets[101:200])))

I assumed Student distributed noise and then i wanted to fit the model above but the fitting wasn’t good…
Because maybe i am missing something,how student_t_rng comes in the pictureand how i should approach this problem?
Thank you