It looks theta (in the original model; phi in your snippet) is acting as both the expected value of y and as your mixing parameter; in pretty much every other discrete missing data / mixture model I’ve seen, these are handled by two different sets of parameters.
Try adding some of the following:
data {
int n_missing; // total number of missing observations
int<lower = 1, upper = N> which_missing[n_missing]; // which of your observations are missing? Easier to compute in R and pass as data than do in stan.
}
parameters {
simplex[3] mixing_probs[n_missing]; // This gets a dirichlet(1,1,1) prior by default.
}
model {
...
for(n in 1:N){
...
//
logP_y_eta[1] = categorical_lpmf(1 | theta);
logP_y_eta[2] = categorical_lpmf(2 | theta);
logP_y_eta[3] = categorical_lpmf(3 | theta);
target += log_sum_exp(logP_y_eta + log(mixing_probs[which_missing[n]]));
}
}