Thanks very much for your reply! I will check that Guide later.
Let me provide some more details about the example data and model code. I have many observations of multinomial counts (3 categories), e.g., row 1: (10, 10, 10), row 2: (12, 8, 10), …, row S: (9, 9, 12). My primary goal is to estimate the 3 probability parameters corresponding to the 3 multinomial categories. The first S multivariate observations are fully observed and can be fed into the usual multinomial likelihood (with N=30, p=p[1:3]).
For certain reasons, I also have 5 extra observations recorded as (NA, NA, NA), but the missingness is informative: I know that the missing value for category 1 is less than or equal to 10, and that the missing value for category 2 is less than or equal to 8. Because of this, the missing observations should still contribute to the model.
The strategy I have planned is to treat the missing category 1 value as a parameter called Y_miss and put a truncated binomial prior, Binomial(N=30, p=p[1]) T[0,10], on it. Then, conditional on the category 1 value (Y_miss), the category 2 count follows Binomial(N-Y_miss, p=p[2]/(p[2]+p[3])). To also incorporate the information that the category 2 value is <= 8, we can add the cumulative distribution function of this binomial, evaluated at 8, to the likelihood to help improve the estimation of p[1:3] (see the sample code below).
data {
  int S;
  int<lower=0> Y[S,3];
}
parameters {
  simplex[3] p; // the 3 category probabilities, constrained to sum to 1 as the multinomial requires
  real Y_miss[5]; // the type here is problematic (Stan has no integer parameters), but I do not know how to deal with this given the request below
}
model {
  // some priors can be put on p[1:3], e.g., Dirichlet
  for (s in 1:S) { // for the S rows of complete data
    target += multinomial_lupmf(Y[s] | p);
  }
  for (i in 1:5) { // for the 5 rows of informative NA's
    Y_miss[i] ~ binomial(30, p[1]) T[0, 10]; // I want to put a truncated binomial prior on this Y_miss parameter
    target += binomial_lcdf(8 | 30 - Y_miss[i], p[2] / (p[2] + p[3])); // this is my planned likelihood term to handle the informative missingness; not 100% sure it is appropriate
  }
}
I hope the explanation and the example code clarify my question. The model may look a little overparameterized because of the missing-value parameters, but they are only nuisance (ancillary) parameters.
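In case it helps the discussion, here is a minimal sketch of one alternative I can think of: instead of declaring Y_miss as a parameter, sum it out over its possible values 0 to 10. Each row of NA's would then contribute the sum over y of TruncatedBinomial(y | 30, p[1]; 0..10) times BinomialCDF(8 | 30 - y, p[2]/(p[2]+p[3])). This is only my reading of the plan described above, not a verified solution. The names N, M, U1, U2, lp and log_trunc are placeholders I made up, the constants are hard-coded from my description, and the block assumes the array[...] declaration syntax of newer Stan versions rather than the older syntax in my code above.

```stan
data {
  int<lower=1> S;              // number of fully observed rows
  array[S, 3] int<lower=0> Y;  // observed counts, each row summing to 30
}
transformed data {
  int N = 30;   // total count per row (taken from the description)
  int M = 5;    // number of rows recorded as (NA, NA, NA)
  int U1 = 10;  // known upper bound on the missing category 1 count
  int U2 = 8;   // known upper bound on the missing category 2 count
}
parameters {
  simplex[3] p; // category probabilities
}
model {
  vector[U1 + 1] lp; // lp[y + 1]: log contribution when the category 1 count equals y
  real log_trunc = binomial_lcdf(U1 | N, p[1]); // log normalizer of the T[0, 10] truncation
  for (y in 0:U1) {
    // binomial_lpmf (not lupmf) keeps the binomial coefficient, which differs across y
    lp[y + 1] = binomial_lpmf(y | N, p[1]) - log_trunc
                + binomial_lcdf(U2 | N - y, p[2] / (p[2] + p[3]));
  }
  // a prior on p (e.g., Dirichlet) could be added here; with none, p is uniform on the simplex
  for (s in 1:S) {
    target += multinomial_lupmf(Y[s] | p);
  }
  // every NA row carries the same information, so the same term is added M times
  target += M * log_sum_exp(lp);
}
```

With the log_trunc term included, each NA row contributes P(Y2 <= 8 | Y1 <= 10); if that term is dropped, the contribution becomes the joint probability P(Y1 <= 10, Y2 <= 8). I am not sure which of the two is the right likelihood for my data.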
Thanks in advance for any comments on the Stan implementation of this idea, or any critique of my modeling approach to this problem.
Sincerely,
Terrence