# How to make predictions when the predictor is latent?

I’m using the measurement error as described in the Stan’s manual. It’s a regression of `y ~ x`, with the twist that we don’t observe `x` directly. Instead, we only observe `x_meas`, which is a noisy measurement of `x`, i.e. `N(x, tau)`.

``````data {
...
real x_meas[N];     // measurement of x
real<lower=0> tau;  // measurement noise
}
parameters {
real x[N];          // unknown true value
real mu_x;          // prior location
real sigma_x;       // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x);  // prior
x_meas ~ normal(x, tau);    // measurement model
y ~ normal(alpha + beta * x, sigma);
...
}
``````

Goal: Having estimated this model, how do I make predictions for `y_new` given `x_meas_new`?

My thoughts: In order to predict `y_new`, I would need `x_new` to plug into the regression. However, it’s unclear to me how to get `x_new` given `x_meas_new`?

Mathematically, `x_new` should be a combination of `x_meas_new` (data) and `mu_x` (hierarchical mean). However, I can’t figure out what code I should write to get `x_new`.

2 Likes

Because at the latent `x_new` depend on the data you have to infer them along with the nominal covariates.

``````data {
...
real x_meas[N];     // measurement of x
real x_meas_new[N_new];     // measurement of x
real<lower=0> tau;  // measurement noise
}
parameters {
real x[N];          // unknown true value
real x_new[N_new];          // unknown true value
real mu_x;          // prior location
real sigma_x;       // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x);  // prior
x_meas ~ normal(x, tau);    // measurement model

x_new ~ normal(mu_x, sigma_x);  // prior
x_meas_new ~ normal(x_new, tau);    // measurement model

y ~ normal(alpha + beta * x, sigma);
...
}

generated quantities {
real y_new[N_new] = normal_rng(alpha + beta * x_new, sigma);
}
``````
1 Like

Thanks @betanalpha, I thought about doing this, but worried that doing so means `x_meas_new` is contributing to the estimation of `mu_x`.

Is my worry incorrect? I want to make sure that `mu_x` is being estimated based on `x_meas` only, not `x_meas_new`.

That’s not possible in a self-consistent Bayesian model; you either presume that the population of `x` is stationary in which case the old and new `x` all inform inferences or you presume that the population is changing and model the change.

Trying to have some only some data inform some parameters isn’t self-consistent mathematically (and I don’t personally recommend it), but there are ways of enforcing it with some methods. It’s somewhat common in JAGS, for exampling, using the `cut` functionality. There isn’t currently an easy way of achieving this functionality in Stan.

2 Likes