I’m using the measurement error as described in the Stan’s manual. It’s a regression of y ~ x
, with the twist that we don’t observe x
directly. Instead, we only observe x_meas
, which is a noisy measurement of x
, i.e. N(x, tau)
.
data {
...
real x_meas[N]; // measurement of x
real<lower=0> tau; // measurement noise
}
parameters {
real x[N]; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x); // prior
x_meas ~ normal(x, tau); // measurement model
y ~ normal(alpha + beta * x, sigma);
...
}
Goal: Having estimated this model, how do I make predictions for y_new
given x_meas_new
?
My thoughts: In order to predict y_new
, I would need x_new
to plug into the regression. However, it’s unclear to me how to get x_new
given x_meas_new
?
Mathematically, x_new
should be a combination of x_meas_new
(data) and mu_x
(hierarchical mean). However, I can’t figure out what code I should write to get x_new
.
2 Likes
Because at the latent x_new
depend on the data you have to infer them along with the nominal covariates.
data {
...
real x_meas[N]; // measurement of x
real x_meas_new[N_new]; // measurement of x
real<lower=0> tau; // measurement noise
}
parameters {
real x[N]; // unknown true value
real x_new[N_new]; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x); // prior
x_meas ~ normal(x, tau); // measurement model
x_new ~ normal(mu_x, sigma_x); // prior
x_meas_new ~ normal(x_new, tau); // measurement model
y ~ normal(alpha + beta * x, sigma);
...
}
generated quantities {
real y_new[N_new] = normal_rng(alpha + beta * x_new, sigma);
}
1 Like
Thanks @betanalpha, I thought about doing this, but worried that doing so means x_meas_new
is contributing to the estimation of mu_x
.
Is my worry incorrect? I want to make sure that mu_x
is being estimated based on x_meas
only, not x_meas_new
.
That’s not possible in a self-consistent Bayesian model; you either presume that the population of x
is stationary in which case the old and new x
all inform inferences or you presume that the population is changing and model the change.
Trying to have some only some data inform some parameters isn’t self-consistent mathematically (and I don’t personally recommend it), but there are ways of enforcing it with some methods. It’s somewhat common in JAGS, for exampling, using the cut
functionality. There isn’t currently an easy way of achieving this functionality in Stan.
2 Likes