Propagate posterior uncertainty as measurement error?

I have a multivariate model in which two responses (A, B) are jointly modeled using a set of predictors (x1, x2). With brms, the model formula is:

bf(A ~ x1 + x2) + bf(B ~ x1 + x2) + gaussian() + set_rescor(FALSE)

Using the fitted model, posterior expectations of A and B are computed for a separate data set, and are then used to calculate a derived quantity X = A*B. This X is then used to predict Y in another model, but I want to accommodate the posterior distribution of X in doing so. In the absence of implementing a fully joint model (which is a little tricky here because the data sets for A/B and Y are separate), I was thinking I might be able to propagate the posterior uncertainty of X to the second model by treating X as measured with ‘measurement error’. Here’s the brms model formula that I have right now to do this:

bf(Y ~ mi(Xmean)) + bf(Xmean | mi(Xsd) ~ 1) + gaussian() + set_rescor(FALSE)

where Xmean and Xsd are the posterior mean and standard deviation of X, respectively. While this model can be readily fit, I have a couple (mostly conceptual) questions/concerns about this procedure that I was hoping to get some input on.
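For context, here's roughly how Xmean and Xsd are computed from the first model. This is just a sketch: fit_AB, newdata_X, and dat_Y stand in for my actual fitted multivariate model, the separate data set I predict A and B for, and the data set containing Y.

# posterior draws of the expectations of A and B for the new data
epred <- posterior_epred(fit_AB, newdata = newdata_X)  # draws x observations x responses
# derived quantity X = A*B, one posterior draw per row
X_draws <- epred[, , "A"] * epred[, , "B"]
# per-observation posterior summaries passed to the second model
dat_Y$Xmean <- apply(X_draws, 2, mean)
dat_Y$Xsd   <- apply(X_draws, 2, sd)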

(1) Digging into the Stan code for the brms measurement error model, I see that it uses roughly the following specification for imputing the latent ‘true’ X:

Y_i \sim \text{Normal}(\alpha + \beta X_i, \sigma_Y) \quad [1]

X_i \sim \text{Normal}(\mu_X, \sigma_X) \quad [2]

X_{\text{mean},i} \sim \text{Normal}(X_i, \sigma_{X_{\text{mean}},i}) \quad [3]

Although I take it that this is a common specification for Bayesian imputation, my intuition was to expect something like X_i \sim \text{Normal}(X_{\text{mean},i}, \sigma_{X_{\text{mean}},i}) for [3], given that I'm trying to parametrically represent the posterior distribution of X here. Does what I'm thinking make sense, and if so, is there a way to specify the model this way with brms? I'm also not sure what [2] is meant to be doing here. I don't think I'm interested in the quantities \mu_X and \sigma_X; if anything, I would expect these quantities to vary by case in the data, as in the alternative expression above.
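To spell that out, what I have in mind would drop [2] and replace [3] with a per-observation prior on the latent X, so the whole model would read (my notation, continuing from above; I don't know whether brms can generate this directly):

Y_i \sim \text{Normal}(\alpha + \beta X_i, \sigma_Y) \quad [1]

X_i \sim \text{Normal}(X_{\text{mean},i}, \sigma_{X_{\text{mean}},i}) \quad [3']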

(2) Another consequence of (1) seems to be that the posterior distribution of each imputed X_i deviates from the actual posterior distribution of X_i that I'm trying to build into the second model. Again, I realize this is probably expected behavior of Bayesian imputation, but it doesn't seem to be achieving exactly what I want it to.
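To illustrate what I mean, here's a sketch of how I've been comparing the two (fit2 is the measurement-error model above; the Yl_Xmean naming follows the latent variable in the Stan code further down, so it may need adjusting):

library(posterior)
draws <- as_draws_df(fit2)
i <- 1  # pick one observation
imputed_Xi <- draws[[paste0("Yl_Xmean[", i, "]")]]  # imputed latent X_i
c(mean(imputed_Xi), sd(imputed_Xi))                 # posterior of the imputed X_i
c(dat_Y$Xmean[i], dat_Y$Xsd[i])                     # the summary it was fed as data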

(3) I think that introducing posterior uncertainty as measurement error in this way will break any posterior correlations that might exist among elements of X. Is this thinking correct? While I don't expect this to be a major issue in this case, it might be something worth keeping in mind.
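For what it's worth, how strong those correlations actually are can be checked directly from the draws of X computed earlier (sketch):

# posterior correlations among elements of X from the first model
X_cor <- cor(X_draws)                 # observations x observations
summary(X_cor[upper.tri(X_cor)])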

With all that said, I wanted to know what others think of this approach. The alternative approach I can think of is multiple imputation, where I fit the second model to multiple realizations of X using brm_multiple() (sketched below), but this has the downside of taking a very long time to iterate over even a small portion (e.g. 10% = 400 samples) of the posterior for X. It looks like there has been a similar discussion here, but I still had these questions/concerns after reading through that thread. Any thoughts or input would be much appreciated!
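For reference, the brm_multiple() alternative would look roughly like this (a sketch; the 400-draw subsample and the bare Y ~ X formula are placeholders):

# fit the second model to multiple realizations of X
draw_ids <- sample(nrow(X_draws), 400)  # ~10% of the posterior draws of X
imputed_data <- lapply(draw_ids, function(d) {
  out <- dat_Y
  out$X <- X_draws[d, ]
  out
})
fit_mi <- brm_multiple(Y ~ X, data = imputed_data, family = gaussian())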

Here’s the Stan code for the brms measurement error model, in case it’s helpful to have a look at:

// generated with brms 2.20.4
functions {
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=1> N_Y;  // number of observations
  vector[N_Y] Y_Y;  // response variable
  int<lower=1> Ksp_Y;  // number of special effects terms
  int<lower=1> N_Xmean;  // number of observations
  vector[N_Xmean] Y_Xmean;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_Xmean] noise_Xmean;
  // information about non-missings
  int<lower=0> Nme_Xmean;
  array[Nme_Xmean] int<lower=1> Jme_Xmean;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  real Intercept_Y;  // temporary intercept for centered predictors
  vector[Ksp_Y] bsp_Y;  // special effects coefficients
  real<lower=0> sigma_Y;  // dispersion parameter
  vector[N_Xmean] Yl_Xmean;  // latent variable
  real Intercept_Xmean;  // temporary intercept for centered predictors
  real<lower=0> sigma_Xmean;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept_Y | 3, 0.1, 2.5);
  lprior += student_t_lpdf(sigma_Y | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(Intercept_Xmean | 3, 0, 2.5);
  lprior += student_t_lpdf(sigma_Xmean | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N_Y] mu_Y = rep_vector(0.0, N_Y);
    // initialize linear predictor term
    vector[N_Xmean] mu_Xmean = rep_vector(0.0, N_Xmean);
    mu_Y += Intercept_Y;
    mu_Xmean += Intercept_Xmean;
    for (n in 1:N_Y) {
      // add more terms to the linear predictor
      mu_Y[n] += (bsp_Y[1]) * Yl_Xmean[n];
    }
    target += normal_lpdf(Y_Y | mu_Y, sigma_Y);
    target += normal_lpdf(Yl_Xmean | mu_Xmean, sigma_Xmean);
  }
  // priors including constants
  target += lprior;
  target += normal_lpdf(Y_Xmean[Jme_Xmean] | Yl_Xmean[Jme_Xmean], noise_Xmean[Jme_Xmean]);
}
generated quantities {
  // actual population-level intercept
  real b_Y_Intercept = Intercept_Y;
  // actual population-level intercept
  real b_Xmean_Intercept = Intercept_Xmean;
}

This is not a measurement error model; it is a missing data model. Measurement error models are specified in brms with me(), not mi().

See Predictors with Measurement Error in brms Models — me • brms for examples. I’d guess this will match your use case reasonably well.
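Roughly (untested, reusing your variable names), that would look something like:

bf(Y ~ me(Xmean, Xsd)) + gaussian()

where Xsd is supplied as the known per-observation standard deviation.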

There are a bunch of forum posts here, e.g. Using posteriors as new priors, discussing potential drawbacks of this approximation (compared to a full joint model), so consider yourself warned :-)

Best of luck with your project!

Thank you for the response @martinmodrak! I appreciate that a full joint model is ideal, but was exploring a sequential approach because I’m unsure of how to specify a joint model when a predictor in one model is a posterior prediction (as opposed to just a parameter) from the other model. If you have any pointers for how to achieve this (especially with brms), that would be much appreciated.

As for me() vs. mi(), I was under the impression that me() is (soft) deprecated in favor of mi() for most use cases including measurement error, but perhaps I am misunderstanding something?
