Summing out discrete parameters in a mixture: how can I derive that it works for individual observations?


I am having trouble specifying a model, and the models listed in the User's Guide weren't quite applicable, or at least I could not see how they were. Here is my problem:

  1. I observe discrete timepoints of death of some test organisms. I assume that the observed survival times (i.e. the time of death, ToD) follow a Gamma distribution.

  2. Additionally, I observe that the distribution has three modes, which I can explain by three predictors that produce different effects.

  3. Furthermore, the data are censored at the end of the experiment, since some individuals are still alive at that point.

So far I have been working with a finite mixture model of Gamma distributions, which I adapted mainly from the finite mixtures and censoring sections of the User's Guide.

data {
  int<lower=1> K;                // number of mixture components
  int<lower=1> N;                // number of data points
  real y[N];                     // observations (day of death)
  vector[N] UL;                  // upper limits (end of experiment)
  matrix[N, K] weights;          // matrix of K predictor weights (N K-simplexes)
}
parameters {
  vector<lower=0>[K] mu;         // locations of mixture components
  vector<lower=0>[K] sigma;      // scales of mixture components
}
transformed parameters {
  vector[K] shape;
  vector[K] rate;
  shape = square(mu) ./ square(sigma);
  rate = mu ./ square(sigma);
}
model {
  matrix[N, K] log_weight;       // cache log-weights calculation

  mu ~ normal(0, 10);
  sigma ~ normal(0, 10);
  log_weight = log(weights);
  for (n in 1:N) {
    // weighting
    row_vector[K] lps = log_weight[n];
    for (k in 1:K) {
      // censoring
      if (y[n] < UL[n]) {
        lps[k] += gamma_lpdf(y[n] | shape[k], rate[k]);
      } else {
        lps[k] += gamma_lccdf(UL[n] | shape[k], rate[k]);
      }
    }
    target += log_sum_exp(lps);
  }
}

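As a sanity check of what the model block computes, here is the per-observation log-likelihood contribution recomputed outside Stan in Python (not Stan code; all parameter values are made-up illustration numbers, and SciPy's gamma uses scale = 1/rate):

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import logsumexp

shape = np.array([4.0, 25.0, 100.0])   # hypothetical component shapes
rate = np.array([1.0, 2.0, 4.0])       # hypothetical component rates
w = np.array([0.2, 0.5, 0.3])          # one row of `weights` (a K-simplex)
UL = 30.0                              # end of the experiment (upper limit)

def lp_obs(y):
    """log p(y) = log_sum_exp over k of [log w_k + censored gamma log-density]."""
    if y < UL:   # observed death: gamma_lpdf
        lps = np.log(w) + gamma.logpdf(y, a=shape, scale=1.0 / rate)
    else:        # censored at the limit: gamma_lccdf, i.e. log survival function
        lps = np.log(w) + gamma.logsf(UL, a=shape, scale=1.0 / rate)
    return logsumexp(lps)

print(lp_obs(10.0))   # uncensored contribution
print(lp_obs(30.0))   # censored contribution
```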
This worked satisfactorily until I realized that each individual time of death is better modeled as being caused by one single predictor, not by a weighted mixture.

After some more reading in the forum, I realized that this is a general issue: discrete parameters can't (yet) be sampled in Stan and are instead better marginalized out, which is also more efficient. So far, so good. What I don't understand at this point is the following:

5.2 Latent Discrete Parameterization

This model is not directly supported by Stan because it involves discrete parameters z_n, but Stan can sample μ and σ by summing out the z parameter as described in the next section.
5.3 Summing out the Responsibility Parameter
To implement the normal mixture model outlined in the previous section in Stan, the discrete parameters can be summed out of the model. If Y is a mixture of K normal distributions with locations μk and scales σk with mixing proportions λ in the unit K-simplex, then
p_Y(y \mid \lambda, \mu, \sigma) = \sum_{k=1}^K \lambda_k \, \mathrm{normal}(y \mid \mu_k, \sigma_k)

The way this is coded in Stan appears to model each observation y_n as the sum of all K weighted probabilities. Is this an approximation that works well for mixtures? I can see that the full set of Y observations can be represented by the quoted formula, but I seem to be missing how it also applies to any single observation. Could someone explain this to me? I'm sure I missed something.
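To make concrete what I'm puzzling over, here is a quick numerical check I ran (made-up parameters): simulating the generative process with an explicit discrete z for each observation, and comparing against the summed-out density p(y) = Σ_k λ_k · normal(y | μ_k, σ_k):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
lam = np.array([0.3, 0.5, 0.2])        # mixing proportions (a simplex)
mu = np.array([-2.0, 0.0, 3.0])
sigma = np.array([0.5, 1.0, 0.8])

# simulate with the latent discrete parameter z
n = 200_000
z = rng.choice(3, size=n, p=lam)
y = rng.normal(mu[z], sigma[z])

# empirical CDF at a point vs. the marginalized mixture CDF
x = 1.0
emp = np.mean(y < x)
marg = np.sum(lam * norm.cdf(x, mu, sigma))
print(emp, marg)   # these agree up to Monte Carlo error
```

So empirically the summed-out form matches the process with a discrete z per observation; what I'm asking about is the derivation.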

At the moment I am trying to implement this model with latent discrete parameters, as in the change point model (7.2, Stan User's Guide). Is this generally advisable for my problem, or is there a better alternative?

Thank you, Florian

Meanwhile, I have tried to recover the posterior probability of the latent responsibility parameter as described in the corresponding section of the User's Guide.

generated quantities {
  matrix[N, K] z;                // log posterior responsibilities
  for (n in 1:N) {
    for (k in 1:K) {
      z[n, k] = gamma_rcensed(y[n], shape[k], rate[k], UL[n])
                + categorical_lpmf(k | to_vector(weights[n]));
    }
    // normalize
    z[n] = z[n] - log_sum_exp(z[n]);
  }
}

where gamma_rcensed is a user-defined function that returns gamma_lpdf or gamma_lccdf depending on whether y[n] < UL[n] (the same logic as in the model block above).
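To convince myself the normalization step does what I intend, I checked it numerically in Python (made-up log values for one row of z): after subtracting log_sum_exp, exponentiating should give membership probabilities that sum to 1.

```python
import numpy as np
from scipy.special import logsumexp

z_row = np.array([-3.2, -1.1, -5.7])    # unnormalized log p(y_n, k), hypothetical
z_norm = z_row - logsumexp(z_row)       # the `z[n] - log_sum_exp(z[n])` step
probs = np.exp(z_norm)
print(probs, probs.sum())
```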

The values I get out of it seem plausible to me. Could someone help me check whether I coded this correctly?