Treating a missing count data as a parameter and putting truncated binomial prior for it

Dear Stan users and developers,

I am a new Stan user transitioning from JAGS. In my model, I want to treat a few missing count data points as parameters and put truncated binomial priors on them (some example code is shown below). The “p” parameter in the truncated binomial will be shared in some other places in the model system, and the “N” and truncated boundary “upper” are known data. It seems that Stan does not support this kind of integer parameter specification (i.e., Y_miss) since in the “parameter” section, only “real” type is supported (the code below will result in the error “No distribution ‘binomial’ was found with the correct signature.”).

Does anyone have similar experience or have any suggestions for this issue when using Stan? Thanks very much for any discussions. (In JAGS, this kind of prior can be specified and this model can work.)

data {
 int<lower=1> N;
 int<lower=1, upper=N> upper;
// other data are left out
}
parameter {
 real Y_miss;
 // other parameters are left out
}
model {
  Y_miss ~ binomial(N, p) T[0, upper]; //I want to put truncated binomial prior for this Y_miss parameter
// likelihood function depending on Y_miss (left out)
} 

Sincerely,
Terrence

Welcome! Could you say a bit more about the other aspects of your model, or share the full code? It’s possible that your specific model might have some simplified form for the missing data that others will be able to identify.

At the very least, you could create a model for Y and use that to infer the probabilities of each value of Y_miss. You could then marginalize over those probabilities to account for the uncertainty in the true value. This would be pretty easy and efficient if the rest of your model is straight-forward and upper is modest. It would get inefficient if, for example, upper = 10000.

You should be able to find a handful of topics here on marginalizing unobserved discrete variables. The Stan User Guide’s discussion of finite mixture models is also helpful. 5 Finite Mixtures | Stan User’s Guide

Thanks very much for your reply! I will go to check that Guide later.

Let me provide some more details about the example data and model code. I have many observations of multinomial counts (3 categories), e.g., row 1: (10, 10, 10), row 2: (12, 8, 10), …, row S: (9, 9, 12). My primary target is to estimate the 3 probability parameters corresponding to the 3 multinomial categories. The first S multivariate observations are fully observed and can be fed into the common multinomial likelihood (with N=30, p=p[1:3]). Due to some special reasons, I also collect 5 extra observations with (NA, NA, NA), with informative missingness. The informative missingness means, I know the missing value for category 1 should be smaller than or equal to 10, and the missing value for category 2 is affirmed to be smaller than or equal to 8. Due to this informative missingness, the missing observations should contribute in the model. The strategy I planned for this is, treating the missing category 1 value as a parameter called Y_miss and putting prior, Truncated Binomial(N=30, p=p[1])T[0,10] on it. And then, conditional on the category 1 value (Y_miss), the category 2 count’s corresponding random variable will follow Binomial(N-Y_miss, p=p[2]/(p[2]+p[3])). For also incorporating the information that “category 2 value should be <=8”, we can add the cumulative version of the binomial distribution function above into the likelihood function to help enhance the estimation of p[1:3] (see the sample code below).

data {
 int S;
 int<lower=0> Y[S,3];
}
parameters {
 real<lower=0, upper=1> p[3]; 
 real Y_miss[5]; //the type here is problematic, but I do not know how to deal with this given the request below
}
model {
  // some priors can be put for p[1:3], e.g., Dirichlet

  for (s in 1:S){ //for those S rows of complete data
    target += multinomial_lupmf(Y[s,]|to_vector(p));
  }
  for (i in 1:5){ //for those 5 rows of informative NA's
    Y_miss[i] ~ binomial(30, p[1]) T[0, 10]; //I want to put truncated binomial prior for this Y_miss parameter
    target += binomial_lcdf(8 | 30-Y_miss[i], p[2]/(p[2]+p[3])) //this is my planned likelihood function to handle this informative missing problem, not 100% sure this is appropriate
}

I hope the explanation and the example code clarifies my inquiry better. This model looks a little overparameterized by involving those missing value parameters, but they are nuisance and ancillary ones.

Thanks for any comments on the Stan implementation of this idea or any critiques on my model thoughts for this problem.

Sincerely,
Terrence

Hi Terrence, I would interpret the observation model you describe as “censoring” rather than truncation. The “informative missingness” tells you that the true number of counts in some bin is below some value (i.e., left-censored data). This is for instance analogous to survival analysis, in which right-censoring occurs (we know persons’ or items’ survival up to point in time, after which we stopped observing them). Truncation is a different observation process, i.e., events below/under some threshold do not appear in the data at all.

So, the likelihood for these informatively missing counts in your model should be a cumulative probability of that count being under the thresholds you mention. As explained by @simonbrauer, integrating over all possible counts by means of the HMC itself is not possible in Stan as integer parameters are not allowed. So integration by marginalising over all possible counts is the way to go. Now, I’m not sure that using the binomial distribution here is correct, as counts in any of two of the three categories are correlated with one another. The binomial only applies when if missingness is informative in only one of categories. So, you need a cumulative probability function for a multinomial distribution. I don’t what this is from the top of my head, but imagine it’s manageable. Maybe someone else here or the interweb itself knows.

[EDIT: typo correction]

A quick search for multinomial cumulative distribution function yielded this 1981 (pay walled from the train so haven’t read):

https://www.jstor.org/stable/2240628

The multinomial CDF came up over in the Julia forums a while back. There are a handful of methods available but some are way over my head. Frey wrote some R code for one approach that I think could be re-written in Stan–though it might not be efficient enough to be practically useful in this case. Lebrun describes a more efficient method, as well as a nice summary of other methods and their limitations.

1 Like

With at most a \times b = 80 possible combinations of censoring (a <8 counts in one category and b<10 counts in another), a brute force integration by marginalising may not even be so bad, @simonbrauer?

1 Like

That does seem likely. Given the current model has no additional predictors of the outcome, the probabilities will be the same for all unobserved cases. So @TerrenceT could calculate those probabilities once rather than 5 times.

1 Like