Avoiding identification issues for multiple instances of imputed data in measurement model

I’m working on a model of advice taking. The full model is fairly complex, so I’m going to start with a simplified version and can include the more complicated version if it comes to it.

The basic idea is I am testing how people take advice based on imputed values for their subjective prior estimates. When everything is observed (no missing data), the model is as follows:

b_2 = (1 - w)b_1 + (w)a

Where b_1 is a person’s prior belief, a is the advice they receive, b_2 is their posterior belief, and w is a weighting parameter.
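For concreteness, a minimal sketch of this weighted-average rule (plain Python, just illustrating the equation, not part of the model code):

```python
# Sketch of the advice-taking rule b2 = (1 - w) * b1 + w * a.
def posterior_belief(b1, a, w):
    """Posterior belief as a weighted average of prior belief and advice."""
    return (1 - w) * b1 + w * a

# w = 0 ignores the advice entirely; w = 1 adopts it wholesale.
print(posterior_belief(100.0, 140.0, 0.25))  # 110.0
```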

In my version of this model, b_1 is missing data that is imputed. For each person, there are 25 cases where b_{1,x} is observed and 25 cases where b_{1,y} is missing (I’m mostly interested in the missing cases). I have a separate sample where all 50 items are observed, and because the items are relatively highly correlated, I can fit an imputation model

\pmb{b_{1,y}} = \pmb{\alpha Z} + \pmb{b_{1,x}\beta} + \pmb{\epsilon}

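As a rough illustration of this kind of imputation regression (the simulated data and variable names below are mine, not from the post), each sometimes-missing item is regressed on all of the always-observed items in the fully observed sample:

```python
import numpy as np

rng = np.random.default_rng(0)
N_obs, K_obs, K_mis = 200, 25, 25

# Simulated fully observed sample: the to-be-imputed items are
# correlated with the always-observed items by construction.
b1_x_obs = rng.normal(size=(N_obs, K_obs))
true_B = rng.normal(scale=0.3, size=(K_obs, K_mis))
b1_y_obs = b1_x_obs @ true_B + rng.normal(scale=0.1, size=(N_obs, K_mis))

# Least-squares fit of each missing item on all observed items (with intercept).
X = np.column_stack([np.ones(N_obs), b1_x_obs])
coef, *_ = np.linalg.lstsq(X, b1_y_obs, rcond=None)

# Point imputations for a new sample where b1_y is unobserved.
b1_x_mis = rng.normal(size=(10, K_obs))
b1_y_imputed = np.column_stack([np.ones(10), b1_x_mis]) @ coef
print(b1_y_imputed.shape)  # (10, 25)
```

In the Stan version below this is done jointly, with the imputed values treated as parameters rather than point predictions.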
I then estimate a measurement model for each person’s advice-taking tendency for the items where their prior beliefs were unobserved, which is given a hierarchical prior for an overall population-level w parameter. In Stan the code looks like this

data{
  int<lower=0> N_obs; //N for fully observed sample
  int<lower=0> N_mis; //N for partially observed sample
  int<lower=0> K_obs; //Number of items always observed
  int<lower=0> K_mis; //Number of items partially observed
  matrix[N_obs, K_mis] b1_y_obs; //Prior beliefs for partially observed items and fully observed sample
  matrix[N_obs, K_obs] b1_x_obs;  //Prior beliefs for fully observed items and fully observed sample
  matrix[N_mis, K_obs] b1_x_mis;  //Prior beliefs for fully observed items and partially observed sample
  matrix[N_mis, K_mis] advice_mis; //advice values for partially observed sample
  matrix[N_mis, K_mis] b2_mis; //posterior beliefs for partially observed sample
} 

parameters{
  //imputation model pars
  matrix[K_mis, K_obs] b; //coefficients for imputation model
  vector[K_mis] b0; //intercepts for imputation model
  vector<lower=0>[K_mis] sd_imputation; //standardized SD for imputation
  matrix[N_mis, K_mis] b1_y_mis; //imputed values

  //measurement model pars
  real w; //population parameter for advice weight
  real<lower = 0> sd_w; //variance of judge level advice weight parameters
  vector[N_mis] j_w; //judge level advice weight parameters
  real<lower = 0> sd_post; //error variance for advice weighting model
}

model {
  matrix[N_obs, K_mis] mu_obs; //imputation-model means, fully observed sample
  matrix[N_mis, K_mis] mu_mis; //imputation-model means, partially observed sample
  target += normal_lpdf(j_w | w, sd_w); //hierarchical prior for judge level parameters
  //(other priors left out for simplicity)
  for (k in 1:K_mis){
    //imputation model
    mu_obs[,k] = b0[k] + (b[k,] * b1_x_obs')';
    mu_mis[,k] = b0[k] + (b[k,] * b1_x_mis')';
    target += normal_lpdf(b1_y_obs[,k] | mu_obs[,k], sd_imputation[k]);
    target += normal_lpdf(b1_y_mis[,k] | mu_mis[,k], sd_imputation[k]);

    //measurement model
    for (n in 1:N_mis){
      target += normal_lpdf(b2_mis[n,k] | (1 - j_w[n]) * b1_y_mis[n,k] + j_w[n] * advice_mis[n,k], sd_post);
    }
  }
}

I can get this model to work just fine. The problem is I also want to include a version that references the absolute difference between the imputed prior beliefs and the advice, because there’s a large literature on how the distance between prior and advice influences advice weighting (see things like “egocentric discounting”). The new model looks something like this (I don’t have enough data to fit the extra parameters hierarchically, particularly in the complex version of the model, but this data is experimental and the judges are exchangeable):

parameters{
  //imputation model pars
  matrix[K_mis, K_obs] b; //coefficients for imputation model
  vector[K_mis] b0; //intercepts for imputation model
  vector<lower=0>[K_mis] sd_imputation; //standardized SD for imputation
  matrix[N_mis, K_mis] b1_y_mis; //imputed values

  //measurement model pars
  real w; //population parameter for advice weight intercept
  real<lower = 0> sd_w; //variance of judge level advice weight intercepts
  vector[N_mis] j_w; //judge level advice weight intercept
  real<lower = 0> sd_post; //error variance for advice weighting model

  real bdif; //slope for difference between imputed prior belief and advice
}

transformed parameters {
  abs_dif = fabs(b1_y_mis - advice_mis);
  matrix[N_mis, K_mis] w_adv;
  for (k in 1:K_mis){
    for (n in 1:N_mis){
         w_adv[n,k] = j_w[n] + bdif * abs_dif[n,k];
         }
    }
}

model {
  matrix[N_obs, K_mis] mu_obs; //imputation-model means, fully observed sample
  matrix[N_mis, K_mis] mu_mis; //imputation-model means, partially observed sample
  target += normal_lpdf(j_w | w, sd_w); //hierarchical prior for judge level parameters
  //(other priors left out for simplicity)
  for (k in 1:K_mis){
    //imputation model
    mu_obs[,k] = b0[k] + (b[k,] * b1_x_obs')';
    mu_mis[,k] = b0[k] + (b[k,] * b1_x_mis')';
    target += normal_lpdf(b1_y_obs[,k] | mu_obs[,k], sd_imputation[k]);
    target += normal_lpdf(b1_y_mis[,k] | mu_mis[,k], sd_imputation[k]);

    //measurement model
    for (n in 1:N_mis){
      target += normal_lpdf(b2_mis[n,k] | (1 - w_adv[n,k]) * b1_y_mis[n,k] + w_adv[n,k] * advice_mis[n,k], sd_post);
    }
  }
}

The problem, as far as I can tell, is that the imputed missing values, b1_y_mis, are now showing up in the measurement model more than once, because w_adv is also a function of b1_y_mis. Because the likelihood of this model is also informative for estimating b1_y_mis, the model is not identified (at least, I think this is the problem, and it appears to be a reasonable explanation for what’s happening when I check the pairs plots). Ideally, the likelihood of the measurement model would not influence b1_y_mis, only the imputation model.

This thread seemed to suggest one approach would be to avoid the “fully Bayesian” estimation of doing everything in a single joint likelihood, and instead do something more like multiple imputation: first draw random values from the imputation model, then fit the measurement model separately on those draws to account for the uncertainty in the estimates. That seems feasible, though it could be slow and time-consuming to implement (again, especially because the “true” model I’m working with is quite a bit more complex than this, though the problem shows up when I try this version as well).
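As a rough sketch of what that multiple-imputation workflow could look like (draw_imputation and fit_w below are toy stand-ins for the two separate Stan fits; all names and data are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_imputation(b1_x):
    """Toy stand-in for one posterior draw of the missing prior beliefs
    from the imputation model (observed-item mean plus noise)."""
    return b1_x.mean(axis=1) + rng.normal(scale=0.5, size=b1_x.shape[0])

def fit_w(b1_imp, advice, b2):
    """Toy stand-in for the measurement fit: least-squares w in
    b2 = (1 - w) * b1 + w * a, i.e. regress (b2 - b1) on (a - b1)."""
    x, y = advice - b1_imp, b2 - b1_imp
    return float(x @ y / (x @ x))

# Simulated data with a true advice weight of 0.3.
b1_x = rng.normal(size=(50, 25))
b1_true = b1_x.mean(axis=1)
advice = b1_true + rng.normal(scale=1.0, size=50)
b2 = 0.7 * b1_true + 0.3 * advice

# Fit the measurement model once per imputed dataset, then pool the draws.
M = 20
w_draws = [fit_w(draw_imputation(b1_x), advice, b2) for _ in range(M)]
w_pooled = float(np.mean(w_draws))
```

Pooling across the M imputed datasets is what propagates the imputation uncertainty into w here, which is the role the single joint likelihood would otherwise play, but without letting the measurement likelihood feed back into the imputations.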

But this post and reply seemed to be more along the lines of what I originally had in mind, though the follow-up thread seemed to lead somewhere quite different.

Appreciate any advice anyone has, and happy to provide more info! Just figured this would be a good starting point to avoid (hopefully) unnecessary complexity.

What you’re talking about is called a “cut” in BUGS, but we don’t support it in Stan. You’d have to literally move to a multiple imputation model, which is what cut often boils down to. The usual problem leading people to want to use non-Bayesian approaches with cuts is that there’s a “good” model that specifies a variable (often a metabolic or PK model) and then a “bad” model that specifies the variable (often a disease or PD model), and if you put them together, the estimate gets pulled away from known good values.

So I’m assuming your problem is that you’re adding two effects and only their sum is identified. In this case, you can often impose a stronger prior or figure out a way to parameterize in terms of a base value and a difference, where the differences can be modeled hierarchically often with much tighter variances (e.g., for a differential expression model in RNA-seq, modeling base expression plus difference in expression versus modeling expression separately in each experiment leads to much lower variances on the difference parameters and is easier to identify; or reparameterizing beta(alpha, beta) in terms of alpha/(alpha + beta) and (alpha + beta) is much easier to identify).
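For the beta example at the end, the mean/concentration reparameterization can be written out directly (a small illustration, not code from the thread):

```python
# Reparameterize beta(alpha, beta) as mean phi = alpha / (alpha + beta)
# and concentration kappa = alpha + beta, which tend to be far less
# correlated in the posterior than alpha and beta themselves.
def to_mean_concentration(alpha, beta):
    return alpha / (alpha + beta), alpha + beta

def to_shapes(phi, kappa):
    """Invert the map: recover (alpha, beta) from (phi, kappa)."""
    return phi * kappa, (1.0 - phi) * kappa

phi, kappa = to_mean_concentration(2.0, 6.0)
print(phi, kappa)             # 0.25 8.0
print(to_shapes(phi, kappa))  # (2.0, 6.0)
```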

There’s something missing in your transformed parameters block because the variable abs_dif is declared without a type.

Also, you can speed up loops with vectorization, for example, replacing the double loop in transformed parameters with

for (n in 1:N_mis) {
  w_adv[n] = j_w[n] + bdif * abs_dif[n];
}

or perhaps better for memory locality,

for (k in 1:K_mis) {
  w_adv[ , k] = j_w + bdif * abs_dif[ , k];
}

Although it’s fiddly indexing, you could also speed up the target increments in the model block with vectorization, and that’s a big speedup.


Thank you! Appreciate the feedback, it’s very helpful.

Re the “cut”/multiple imputation issue, more or less what I thought based on your answers in the prior thread.

In terms of the specific issue here, I think you’re right that the sum is identified but the separate effects are not. I had been trying to think of a similar way of reparameterizing, where imposing a strong prior might both make sense conceptually and solve the problem. I’m going to think about your examples some more and see if I can come up with something, as it would be cleaner and save a lot of time over writing a program to do the multiple imputation.

Re the transformed parameters block, you’re right, it should be

transformed parameters {
  matrix[N_mis, K_mis] abs_dif = fabs(b1_y_mis - advice_mis);
  matrix[N_mis, K_mis] w_adv;
  for (k in 1:K_mis){
    for (n in 1:N_mis){
      w_adv[n,k] = j_w[n] + bdif * abs_dif[n,k];
    }
  }
}

I had tweaked it from the actual code for simplicity.

Thanks again!