I’m very new to Stan. I have a data set of x vs. y with estimated errors (sigma_x and sigma_y). I want to estimate a slope and intercept (with a probability distribution) that best explains my data. I wrote the following model and it generates a slope and intercept distribution that makes sense to me but I need some trained eyes to have a look if possible. Particularly, I just want to make sure there isn’t anything improper about the way I tried to incorporate sigma_x into the model by generating the parameter x_new and then sampling y based on x_new. Also, I declare alpha and beta in the model block to have normal distributions but the predictions naturally center around different values (0.48 and 0.001 in this case). Am I doing something unnecessary here?
I use the mean of y_pred to calculate a best estimate line through he data, and std of y_pred to generate confidence bands around it. I also generated y_pred2, which takes into account variance in x. I use y_pred2 to generate a cloud of all realizations behind the data, which in my mind provides a better estimate of the overall uncertainties of the linear regression.
I would like to know if there is anything non-sensical about what I’m doing.
int<lower=0> N; //extract parameter N from data
vector[N] x; //extract x variable from data
vector[N] sigma_x; //extract uncertainty in x from data
vector[N] y; //extract y variable from data
vector[N] sigma_y; //extract uncertainty in y from data
real beta; //declare beta0 as continuous unbounded probability distribution
real alpha; //declare beta1 as continuous unbounded probability distribution
beta ~ normal(0, 100); //prior on beta0
alpha ~ normal(1, 1); //prior on beta1
x_new ~ normal(x, sigma_x); //account for error in x
y ~ normal(beta + alphax_new, sigma_y); //likelihood
vector[N] y_pred; //y_pred vector
vector[N] y_pred2; //y_pred2 vector
y_pred = beta + alphax; //posterior for “true y” given x
y_pred2 = beta + alpha*x_new;