Summing out discrete parameters in a mixture: how can I derive that it works for individual observations?

Hi,

I am having trouble specifying a model, and the models listed in the user's guide weren't quite applicable, or at least I could not see how they were. Here is my problem:

  1. I observe discrete time points of death of some test organisms. I assume that the observed survival times (i.e. the time of observed death, ToD) follow a Gamma distribution.

  2. Additionally, I observe that the distribution has three modes, which I can explain by three predictors that produce different effects.

  3. Furthermore, the observations are right-censored at the end of the experiment, since some individuals are still alive at that point.

So far I have been working with a finite mixture model of Gamma distributions, which I adapted mainly from the finite mixtures and censoring sections of the user's guide.

data {
  int<lower=1> K;                // number of mixture components
  int<lower=1> N;                // number of data points
  real y[N];                     // observations (day of observed death)
  vector[N] UL;                  // upper limit: censoring time per observation
  matrix[N,K] weights;           // matrix of K predictor weights (N K-simplexes)
}
parameters {    
  vector<lower=0>[K] mu;         // locations of mixture components
  vector<lower=0>[K] sigma;      // scales of mixture components
}
transformed parameters {
  // convert mean/sd to shape/rate by moment matching
  vector<lower=0>[K] shape;
  vector<lower=0>[K] rate;
  shape = square(mu) ./ square(sigma);
  rate = mu ./ square(sigma);
}
model {
  matrix[N,K] log_weight;        // cache log-weights calculation

  mu ~ normal(0, 10);
  sigma ~ normal(0, 10);
  
  log_weight = log(weights);
  for (n in 1:N) {
    // mixture weights for observation n
    row_vector[K] lps = log_weight[n];
    for (k in 1:K) {
      // censoring: observed death vs. still alive at the upper limit
      if (y[n] < UL[n]) {
        lps[k] += gamma_lpdf(y[n] | shape[k], rate[k]);
      } else {
        lps[k] += gamma_lccdf(UL[n] | shape[k], rate[k]);
      }
    }
    target += log_sum_exp(lps);
  }
}
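A note on the parameterization, in case it matters: I specify each component by its mean μ_k and standard deviation σ_k and convert to shape and rate by moment matching, using that a Gamma(shape, rate) distribution has mean shape/rate and variance shape/rate^2:

\textrm{shape}_k = \frac{\mu_k^2}{\sigma_k^2}, \qquad \textrm{rate}_k = \frac{\mu_k}{\sigma_k^2}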

This has been working satisfactorily until I realized that each individual time of death is better modeled as being caused by one single predictor rather than by a weighted mixture of all of them.

After some more reading in the forum I realized that this is a general issue: discrete parameters can't (yet) be sampled in Stan and are instead better marginalized out, which is also more efficient. So far so good. What I don't understand at this point is the following passage from the user's guide:

5.2 Latent Discrete Parameterization

This model is not directly supported by Stan because it involves discrete parameters z_n, but Stan can sample μ and σ by summing out the z parameter as described in the next section.

5.3 Summing out the Responsibility Parameter

To implement the normal mixture model outlined in the previous section in Stan, the discrete parameters can be summed out of the model. If Y is a mixture of K normal distributions with locations μ_k and scales σ_k with mixing proportions λ in the unit K-simplex, then

p_Y(y \mid \lambda, \mu, \sigma) = \sum_{k=1}^K \lambda_k \, \textrm{normal}(y \mid \mu_k, \sigma_k)

The way this is coded in Stan appears to model each observation y_n as the sum of all K weighted probabilities. Is this an approximation that happens to work well for mixtures? I can see that the collection of all observations Y can be represented by the quoted formula, but I seem to be missing how it also applies to any single observation. Could someone explain this to me? I'm sure I missed something.
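To spell out my (possibly wrong) reading: applying the law of total probability to a single observation would give

p(y_n \mid \lambda, \mu, \sigma) = \sum_{k=1}^K p(z_n = k \mid \lambda) \, p(y_n \mid z_n = k, \mu, \sigma) = \sum_{k=1}^K \lambda_k \, \textrm{normal}(y_n \mid \mu_k, \sigma_k)

but I am not sure this is the justification the user's guide has in mind.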

At the moment I am trying to implement this model with latent discrete parameters as in the change point model (section 7.2 of the Stan User's Guide). Is this generally advisable for my problem, or is there a better alternative?

Thank you, Florian

Meanwhile, I have tried to recover the posterior probability of the latent responsibility parameter as described in the corresponding section of the user's guide.

generated quantities {
  matrix[N,K] z;                 // log posterior responsibilities
  for (n in 1:N) {
    for (k in 1:K) {
      // log joint: log p(y_n | z_n = k) + log p(z_n = k)
      z[n,k] = gamma_rcensed(y[n], shape[k], rate[k], UL[n]) +
               categorical_lpmf(k | to_vector(weights[n]));
    }
    // normalize so each row holds log p(z_n = k | y_n)
    z[n] = z[n] - log_sum_exp(z[n]);
  }
}

where gamma_rcensed is just gamma_lpdf or gamma_lccdf, depending on whether y_n < UL_n (the same case distinction as in the model block above).
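For completeness, this is roughly how I define it in the functions block (gamma_rcensed is my own helper, not a built-in):

functions {
  // my helper, not a built-in: log density of an observed death,
  // or log complementary CDF if right-censored at the upper limit ul
  real gamma_rcensed(real y, real shape, real rate, real ul) {
    if (y < ul)
      return gamma_lpdf(y | shape, rate);   // observed death
    else
      return gamma_lccdf(ul | shape, rate); // still alive at end of experiment
  }
}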

The values I get out of it seem plausible to me. Could someone check whether I have coded this correctly?