How to make predictions when the predictor is latent?

I’m using the measurement error model described in the Stan manual. It’s a regression of y on x, with the twist that we don’t observe x directly. Instead, we only observe x_meas, a noisy measurement of x, i.e. x_meas ~ N(x, tau).
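
Written out (with the second argument of each normal a standard deviation, as in Stan), the model is

$$
x_i \sim \mathcal{N}(\mu_x,\ \sigma_x), \qquad
x^{\mathrm{meas}}_i \sim \mathcal{N}(x_i,\ \tau), \qquad
y_i \sim \mathcal{N}(\alpha + \beta\, x_i,\ \sigma).
$$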

data {
  ...
  array[N] real x_meas;   // measurement of x
  real<lower=0> tau;      // measurement noise
}
parameters {
  array[N] real x;        // unknown true value
  real mu_x;              // prior location
  real<lower=0> sigma_x;  // prior scale
  ...
}
model {
  x ~ normal(mu_x, sigma_x);  // prior
  x_meas ~ normal(x, tau);    // measurement model
  y ~ normal(alpha + beta * to_vector(x), sigma);
  ...
}

Goal: Having estimated this model, how do I make predictions for y_new given x_meas_new?

My thoughts: In order to predict y_new, I would need x_new to plug into the regression. However, it’s unclear to me how to get x_new from x_meas_new.

Mathematically, x_new should be a combination of x_meas_new (data) and mu_x (hierarchical mean). However, I can’t figure out what code I should write to get x_new.
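
For concreteness: if mu_x, sigma_x, and tau were known exactly, the standard normal–normal conjugacy result would give exactly that precision-weighted combination,

$$
\mathbb{E}[x_{\mathrm{new}} \mid x^{\mathrm{meas}}_{\mathrm{new}}]
  = \frac{x^{\mathrm{meas}}_{\mathrm{new}}/\tau^2 + \mu_x/\sigma_x^2}{1/\tau^2 + 1/\sigma_x^2},
\qquad
\operatorname{Var}[x_{\mathrm{new}} \mid x^{\mathrm{meas}}_{\mathrm{new}}]
  = \frac{1}{1/\tau^2 + 1/\sigma_x^2},
$$

but mu_x and sigma_x are only known through their posterior, so I don’t see how to turn this into code.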


Because the latent x_new depend on the data, you have to infer them along with the nominal covariates.

data {
  ...
  array[N] real x_meas;          // measurement of x
  array[N_new] real x_meas_new;  // measurement of x for the new observations
  real<lower=0> tau;             // measurement noise
}
parameters {
  array[N] real x;               // unknown true value
  array[N_new] real x_new;       // unknown true value for the new observations
  real mu_x;                     // prior location
  real<lower=0> sigma_x;         // prior scale
  ...
}
model {
  x ~ normal(mu_x, sigma_x);        // prior
  x_meas ~ normal(x, tau);          // measurement model

  x_new ~ normal(mu_x, sigma_x);    // prior
  x_meas_new ~ normal(x_new, tau);  // measurement model

  y ~ normal(alpha + beta * to_vector(x), sigma);
  ...
}

generated quantities {
  array[N_new] real y_new = normal_rng(alpha + beta * to_vector(x_new), sigma);
}
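
For completeness, here is a minimal sketch of running this from Python with CmdStanPy and pulling out the predictions. The file name meas_err_pred.stan and the Python variables holding the data are placeholders, and the elided ... parts of the program (y, N, N_new, the regression parameters, and their priors) still need to be supplied:

from cmdstanpy import CmdStanModel

# data for both the original and the new measurements (variable names are placeholders)
data = {
    "N": len(x_meas), "x_meas": x_meas, "y": y,
    "N_new": len(x_meas_new), "x_meas_new": x_meas_new,
    "tau": tau,
    # ... plus whatever else the elided parts of the data block declare
}

model = CmdStanModel(stan_file="meas_err_pred.stan")  # the program above
fit = model.sample(data=data)

y_new_draws = fit.stan_variable("y_new")  # posterior predictive draws, shape (draws, N_new)
y_new_mean = y_new_draws.mean(axis=0)     # point predictions for the new observations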

Thanks @betanalpha, I thought about doing this, but worried that it would mean x_meas_new contributes to the estimation of mu_x.

Is my worry incorrect? I want to make sure that mu_x is being estimated based on x_meas only, not x_meas_new.

That’s not possible in a self-consistent Bayesian model: either you presume that the population of x is stationary, in which case the old and new x all inform inferences, or you presume that the population is changing and model the change.

Trying to have only some of the data inform some of the parameters isn’t self-consistent mathematically (and I don’t personally recommend it), but there are ways of enforcing it in some frameworks. It’s somewhat common in JAGS, for example, using the cut functionality. There isn’t currently an easy way to achieve this in Stan.
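
If you really do want to enforce it, one rough workaround (and again, not something I’d recommend) is a two-stage, cut-style approximation done outside of Stan: fit the original model on the old data only, then for each posterior draw of the hyperparameters combine the prior and the new measurement analytically via the same precision-weighting discussed above. A sketch in Python, assuming fit is a CmdStanPy fit of the original model and x_meas_new and tau are available:

import numpy as np

rng = np.random.default_rng(1)

# posterior draws from the fit to the old data only; shapes (S,)
mu_x    = fit.stan_variable("mu_x")
sigma_x = fit.stan_variable("sigma_x")
alpha   = fit.stan_variable("alpha")
beta    = fit.stan_variable("beta")
sigma   = fit.stan_variable("sigma")

x_meas_new = np.asarray(x_meas_new)  # shape (N_new,)

# per-draw conditional posterior of x_new given x_meas_new (normal-normal conjugacy);
# the new measurements never feed back into mu_x or sigma_x
prec      = 1.0 / tau**2 + 1.0 / sigma_x[:, None]**2                        # (S, 1)
post_mean = (x_meas_new[None, :] / tau**2
             + mu_x[:, None] / sigma_x[:, None]**2) / prec                  # (S, N_new)
x_new     = rng.normal(post_mean, np.sqrt(1.0 / prec))                      # (S, N_new)

# posterior predictive draws for y_new under this cut
y_new = rng.normal(alpha[:, None] + beta[:, None] * x_new, sigma[:, None])  # (S, N_new)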
