Marginalising out missing categorical response variable cases provides inaccurate predictor estimates

Christopher-Peterson · July 11, 2019, 10:34pm

It looks theta (in the original model; phi in your snippet) is acting as both the expected value of y and as your mixing parameter; in pretty much every other discrete missing data / mixture model I’ve seen, these are handled by two different sets of parameters.

Try adding some of the following:

data {
  int n_missing; // total number of missing observations
  int<lower = 1, upper = N> which_missing[n_missing]; // which of your observations are missing?  Easier to compute in R and pass as data than do in stan.
}
parameters {
  simplex[3] mixing_probs[n_missing]; // This gets a dirichlet(1,1,1) prior by default.
}
model {
  ...
  for(n in 1:N){
    ...
      // 
      logP_y_eta[1] = categorical_lpmf(1 | theta); 
      logP_y_eta[2] = categorical_lpmf(2 | theta); 
      logP_y_eta[3] = categorical_lpmf(3 | theta);
      target += log_sum_exp(logP_y_eta + log(mixing_probs[which_missing[n]]));
  }
}

Topic		Replies	Views
Missing data in categorical data models Modeling rstan	7	1370	August 12, 2023
Log_mix for missing categorical data Modeling	5	1844	July 8, 2019
Marginalize missing binary outcome variable for GLM Modeling	11	1144	January 31, 2020
Marginalising out missing data Modeling	5	937	June 22, 2018
Marginalize out missing continuous data Modeling techniques , specification	7	1216	February 25, 2023

Marginalising out missing categorical response variable cases provides inaccurate predictor estimates

Related topics