# Gaussian Process Regression with Student likelihood

Hello!
I am new to Stan and i was trying to see how it actually work.
I bumped into a paper (maybe relevant) that was speaking about Gaussian Process Regression with
Student likelihood.(many outliers)
I am wondering if this is possible to be implemented in rstan because i was trying and couldnt find
how to generate sample from the non conjugate posterior regarding the parameters .
I read an interpretation with non Gaussian observations but still it is unclear for me.(https://betanalpha.github.io/assets/case_studies/gp_part1/part1.html#33_simulating_from_a_gaussian_process_conditional_on_non-gaussian_observations)
Any thoughts?

Something like this would work. (Basically replace the poisson with the student t everywhere). The big thing for GPs with non-gaussian observations is that there are no fancy tricks so it just looks a lot like a GLM but with a more fancy prior distribution on the (transformed) mean.

``````data {
int<lower=1> N_predict;
real x_predict[N_predict];

int<lower=1> N_observed;
int<lower=1, upper=N_predict> observed_idx[N_observed];
real  y_observed[N_observed];

real<lower=0> rho;
real<lower=0> alpha;
}

transformed data {
matrix[N_predict, N_predict] cov =   cov_exp_quad(x_predict, alpha, rho)
+ diag_matrix(rep_vector(1e-10, N_predict));
matrix[N_predict, N_predict] L_cov = cholesky_decompose(cov);
}

parameters {
vector[N_predict] f_tilde;
real<lower=0> dof; // Degrees of freedom for t
real<lower=0> s; // scale parametr for T
}

transformed parameters {
vector[N_predict] f_predict = L_cov * f_tilde;
}

model {
f_tilde ~ normal(0, 1);
for (n in 1:N_observed)
y_observed[n] ~ student_t(dof, f_predict[observed_idx[n] ], s);
}

generated quantities {
vector[N_predict] y_predict;
for (n in 1:N_predict)
y_predict[n] = student_t_rng(dof, f_predict[observed_idx[n] ], s);
}
``````
2 Likes

Thank you!
The compiler outputs me this below:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for:

student_t(real, vector, real)

## Function student_t not found. error in ‘model1c1844353f43_student_stan’ at line 37, column 61

``````35:   vector[N_predict] y_predict;
36:   for (n in 1:N_predict)
37:     y_predict[n] = student_t(dof, f_predict[observed_idx], s);
^
38: }
``````

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘student.stan’ due to the above error.

Oops! The funciton isn’t vectorized. Hopefully fixed above. (Sorry - on my phone so i can’t compile)

## hmm… i am still getting this which is odd… Function student_t not found. error in ‘model1c18541e5396_student_stan’ at line 38, column 63

``````36:   vector[N_predict] y_predict;
37:   for (n in 1:N_predict)
38:     y_predict[n] = student_t(dof, f_predict[observed_idx[n]],s);
^
39: }``````

ARgh! stupid typo - that should be `student_t_rng` because it’s drawing a random number. fixed above. sorry

2 Likes

Yeah you 're right!
Thank you :)

I tried to implement the algorithm above to the below schema which generates the target values with a bit of noise in R. (based on an example in this paper " Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification",Neil,1997.

f.x = function(x){
s = sample(c(1, 2), size = 1, replace = TRUE, prob = c(0.95, 0.05))
ifelse(s==1,0.3+0.4x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,0.1),
0.3+0.4
x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,1))

}
f.x=Vectorize(f.x)

#Generate train and test set

#set.seed(123)
x = seq(-3,3,length=200)
targets = f.x(x)
dataset = data.frame(cbind(x,targets))
train = (data.frame(cbind(train=dataset\$x[1:100],targets=dataset\$targets[1:100])))
test = (data.frame(cbind(test=dataset\$x[101:200],targets=dataset\$targets[101:200])))

I assumed Student distributed noise and then i wanted to fit the model above but the fitting wasn’t good…
Because maybe i am missing something,how student_t_rng comes in the pictureand how i should approach this problem?
Thank you