Marginalising out missing categorical response variable cases provides inaccurate predictor estimates

It looks theta (in the original model; phi in your snippet) is acting as both the expected value of y and as your mixing parameter; in pretty much every other discrete missing data / mixture model I’ve seen, these are handled by two different sets of parameters.

Try adding some of the following:

data {
  int n_missing; // total number of missing observations
  int<lower = 1, upper = N> which_missing[n_missing]; // which of your observations are missing?  Easier to compute in R and pass as data than do in stan.
}
parameters {
  simplex[3] mixing_probs[n_missing]; // This gets a dirichlet(1,1,1) prior by default.
}
model {
  ...
  for(n in 1:N){
    ...
      // 
      logP_y_eta[1] = categorical_lpmf(1 | theta); 
      logP_y_eta[2] = categorical_lpmf(2 | theta); 
      logP_y_eta[3] = categorical_lpmf(3 | theta);
      target += log_sum_exp(logP_y_eta + log(mixing_probs[which_missing[n]]));
  }
}
2 Likes