Hello!
I am new to Stan and i was trying to see how it actually work.
I bumped into a paper (maybe relevant) that was speaking about Gaussian Process Regression with
Student likelihood.(many outliers)
I am wondering if this is possible to be implemented in rstan because i was trying and couldnt find
how to generate sample from the non conjugate posterior regarding the parameters .
I read an interpretation with non Gaussian observations but still it is unclear for me.(https://betanalpha.github.io/assets/case_studies/gp_part1/part1.html#33_simulating_from_a_gaussian_process_conditional_on_non-gaussian_observations)
Any thoughts?
Something like this would work. (Basically replace the poisson with the student t everywhere). The big thing for GPs with non-gaussian observations is that there are no fancy tricks so it just looks a lot like a GLM but with a more fancy prior distribution on the (transformed) mean.
data {
int<lower=1> N_predict;
real x_predict[N_predict];
int<lower=1> N_observed;
int<lower=1, upper=N_predict> observed_idx[N_observed];
real y_observed[N_observed];
real<lower=0> rho;
real<lower=0> alpha;
}
transformed data {
matrix[N_predict, N_predict] cov = cov_exp_quad(x_predict, alpha, rho)
+ diag_matrix(rep_vector(1e-10, N_predict));
matrix[N_predict, N_predict] L_cov = cholesky_decompose(cov);
}
parameters {
vector[N_predict] f_tilde;
real<lower=0> dof; // Degrees of freedom for t
real<lower=0> s; // scale parametr for T
}
transformed parameters {
vector[N_predict] f_predict = L_cov * f_tilde;
}
model {
f_tilde ~ normal(0, 1);
for (n in 1:N_observed)
y_observed[n] ~ student_t(dof, f_predict[observed_idx[n] ], s);
}
generated quantities {
vector[N_predict] y_predict;
for (n in 1:N_predict)
y_predict[n] = student_t_rng(dof, f_predict[observed_idx[n] ], s);
}
Thank you!
The compiler outputs me this below:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
student_t(real, vector, real)
Function student_t not found.
error in ‘model1c1844353f43_student_stan’ at line 37, column 61
35: vector[N_predict] y_predict;
36: for (n in 1:N_predict)
37: y_predict[n] = student_t(dof, f_predict[observed_idx], s);
^
38: }
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘student.stan’ due to the above error.
Oops! The funciton isn’t vectorized. Hopefully fixed above. (Sorry - on my phone so i can’t compile)
hmm… i am still getting this which is odd…
Function student_t not found.
error in ‘model1c18541e5396_student_stan’ at line 38, column 63
36: vector[N_predict] y_predict;
37: for (n in 1:N_predict)
38: y_predict[n] = student_t(dof, f_predict[observed_idx[n]],s);
^
39: }
ARgh! stupid typo - that should be student_t_rng
because it’s drawing a random number. fixed above. sorry
Yeah you 're right!
Thank you :)
I tried to implement the algorithm above to the below schema which generates the target values with a bit of noise in R. (based on an example in this paper " Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification",Neil,1997.
Target Distribution with adjusted deviations
f.x = function(x){
s = sample(c(1, 2), size = 1, replace = TRUE, prob = c(0.95, 0.05))
ifelse(s==1,0.3+0.4x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,0.1),
0.3+0.4x+0.5sin(2.7x)+1.1/(1+x^2)+rnorm(1,0,1))
}
f.x=Vectorize(f.x)
#Generate train and test set
#set.seed(123)
x = seq(-3,3,length=200)
targets = f.x(x)
dataset = data.frame(cbind(x,targets))
train = (data.frame(cbind(train=dataset$x[1:100],targets=dataset$targets[1:100])))
test = (data.frame(cbind(test=dataset$x[101:200],targets=dataset$targets[101:200])))
I assumed Student distributed noise and then i wanted to fit the model above but the fitting wasn’t good…
Because maybe i am missing something,how student_t_rng comes in the pictureand how i should approach this problem?
Thank you