Out-of-sample cross validation for response variable with measurement error

I would ideally like to use OOS-CV to test the predictive power of a model by comparing its posterior predictive distributions to withheld values. However, the response values have measurement error associated with them, so directly comparing the withheld point estimates to the posterior predictive intervals doesn’t seem to fully address the problem. I checked a couple of FAQ pages but haven’t found anything satisfying. I’m sure there is a method for addressing this, and if someone could point me in that direction, I would be grateful.

Can you post the model description and your Stan model code, so that I can provide a more concrete answer?

Do the held out values have a different measurement error process (i.e. observational model) than the training data?

Certainly, here is my code. The model is a gamma model with mean mu and shape sigma. I modeled the shape parameter directly as a function of seasonal covariates. It’s a fairly vanilla GLM without many bells or whistles, but each observation comes with an estimate of its observation error, which I included as measurement error in the model. I’ve worked with withheld data in the past, but usually as withheld point estimates. In addition to information criteria, I like to assess predictive power by checking whether the posterior predictive intervals for the withheld values contain those values at approximately the nominal rate of the credible interval (I know this is a frequentist perspective). I just don’t know how to perform that procedure when each withheld observation is not known with certainty.

data {
  int<lower=0> T;                                                               // Total samples : num Months x num Years
  int<lower=0> P;                                                               // Total number of Effort mean (mu) predictors
  int<lower=0> Q;                                                               // Total number of Effort uncertainty (sigma) predictors
  vector[T] Effort;                                                             // Observed effort (angler-trips) 
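  vector<lower=0>[T] sigma_Effort;                                              // Observed effort measurement error; used as the SD in normal() below, so take the square root first if these are variances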
  matrix[T, P] X;                                                               // Effort mean design matrix
  matrix[T, Q] Z;                                                               // Effort uncertainty design matrix
}

parameters {
  vector[P] beta;                                                               // Coefficients for mu
  vector[Q] rho;                                                                // Coefficients for sigma (log-scale)
  vector<lower=0>[T] Effort_hat;                                                // Vector of estimated 'true' effort
}

transformed parameters {
  vector[T] alpha;                                                              // Effort inverse-scale
  vector[T] mu;                                                                 // Effort mean 
  vector[T] sigma;                                                              // Effort shape
  for (t in 1:T){
    mu[t] = exp(X[t,]*beta);
    sigma[t] = exp(Z[t,]*rho);
    alpha[t] = sigma[t]/mu[t];
  }
}

model {
  for (t in 1:T){
    Effort_hat[t] ~ gamma(sigma[t], alpha[t]);
    Effort[t] ~ normal(Effort_hat[t], sigma_Effort[t]);                         // Assume observed value is an imprecise but unbiased estimate of true value
  }
  // Priors
  beta ~ normal(0,2);
  rho ~ normal(0,2);
}

generated quantities {
  vector[T] pred_Effort;
  for (t in 1:T){
    pred_Effort[t] = gamma_rng(sigma[t], alpha[t]);
  }
}

The withheld observations would conceivably have the same measurement error process. Each observation comes with its own observation variance as estimated using the same stratified random sampling design.
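
If the withheld observations share that observation model, one option is to compare each withheld observation to the posterior predictive distribution of the observed value rather than of the latent true effort, by adding the normal measurement noise on top of the gamma draw. Below is a minimal sketch of that idea, not a tested implementation. The held-out inputs `T_new`, `X_new`, `Z_new`, and `sigma_Effort_new` are hypothetical names that are not in the model above, and `sigma_Effort` is treated as a standard deviation, as in the existing `normal()` call.

```stan
data {
  int<lower=0> T;
  int<lower=0> P;
  int<lower=0> Q;
  vector[T] Effort;
  vector<lower=0>[T] sigma_Effort;
  matrix[T, P] X;
  matrix[T, Q] Z;
  // Held-out set (hypothetical names, not part of the original model)
  int<lower=0> T_new;
  matrix[T_new, P] X_new;
  matrix[T_new, Q] Z_new;
  vector<lower=0>[T_new] sigma_Effort_new;
}
parameters {
  vector[P] beta;
  vector[Q] rho;
  vector<lower=0>[T] Effort_hat;
}
model {
  vector[T] mu = exp(X * beta);
  vector[T] shape = exp(Z * rho);
  Effort_hat ~ gamma(shape, shape ./ mu);       // latent true effort
  Effort ~ normal(Effort_hat, sigma_Effort);    // noisy observation of it
  beta ~ normal(0, 2);
  rho ~ normal(0, 2);
}
generated quantities {
  // Predictive draws of the *observed* held-out effort:
  // latent gamma draw plus that observation's measurement noise
  vector[T_new] pred_obs_new;
  for (t in 1:T_new) {
    real mu_new = exp(X_new[t] * beta);
    real shape_new = exp(Z_new[t] * rho);
    real true_new = gamma_rng(shape_new, shape_new / mu_new);
    pred_obs_new[t] = normal_rng(true_new, sigma_Effort_new[t]);
  }
}
```

The share of withheld `Effort` values that fall inside, say, the central 90% interval of the corresponding `pred_obs_new[t]` draws can then be compared to the nominal 90%. Because the normal noise can push draws below zero, these intervals describe the noisy estimate, not the true effort.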

The log_lik computation in the generated quantities block would be

  vector[T] log_lik;
  for (t in 1:T){
    log_lik[t] = normal_lpdf(Effort[t] | Effort_hat[t], sigma_Effort[t]);
  }

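However, since each observation has its own parameter Effort_hat[t], PSIS-LOO is likely to fail: leaving out observation t changes the posterior of Effort_hat[t] substantially, so the importance weights become unreliable. See the Roaches case study for how to do integrated LOO to get reliable results.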
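
For this model, integrating out Effort_hat[t] means computing the log of the integral over x of gamma(x | sigma[t], alpha[t]) times normal(Effort[t] | x, sigma_Effort[t]). Here is a rough sketch of how that could look with Stan's `integrate_1d`, written as additions to the model posted above. It assumes a Stan version that supports the `array[]` syntax, the function name `joint_density` is made up for illustration, and I have not run it against this data, so the integrand may need rescaling if it underflows.

```stan
functions {
  // Integrand: density of latent true effort x times the density of
  // the observed effort given x.
  // theta = {shape, rate}, x_r = {observed effort, measurement SD}
  real joint_density(real x, real xc, array[] real theta,
                     array[] real x_r, array[] int x_i) {
    return exp(gamma_lpdf(x | theta[1], theta[2])
               + normal_lpdf(x_r[1] | x, x_r[2]));
  }
}

// ... data, parameters, transformed parameters, model as posted ...

generated quantities {
  // Merge into the existing generated quantities block.
  // log_lik[t] no longer depends on Effort_hat[t]; the latent value is
  // integrated out numerically over (0, infinity).
  vector[T] log_lik;
  for (t in 1:T) {
    log_lik[t] = log(integrate_1d(joint_density,
                                  0, positive_infinity(),
                                  {sigma[t], alpha[t]},
                                  {Effort[t], sigma_Effort[t]},
                                  {0}));  // dummy integer data, unused
  }
}
```

The resulting `log_lik` can be passed to PSIS-LOO as usual. It is still worth checking the Pareto k diagnostics afterwards, since integration only removes the problem caused by the per-observation latent parameter.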

Thanks!