 # Summing out discrete paramters in a mixture. How can I derive that it works for individual observations?

Hi,

I have some problems with specifying a model and the models listed in the user guide weren’t quite applicable or at least I could not see that they were. Here is my problem:

1. I observe discrete timepoints of death of some test organisms. I assume that the observed survival times (or. time of observed death - ToD) follow a Gamma distribution.

2. Additionally I observe that the distribution has three modes, which I can explain by three predictors which produce the different effects.

3. Furthermore it is censored after the end of the experiment, although some individuals are still alive.

So far I have been working with a finite mixturemodel of Gamma distributions. Which I adapted mainly from the finite mixtures section in the user’s guide and the censoring section.

data {
int<lower=1> K;                // number of mixture components
int<lower=1> N;                // number of data points
real y[N];                     // observations (death counts at day)
vector[N] UL;
matrix[N,K] weights;           // matrix of K predictor weights (N K-simplexes)

}
parameters {
vector<lower=0>[K] mu;         // locations of mixture components
vector<lower=0>[K] sigma;      // scales of mixture components
}
transformed parameters {
vector[K] shape;
vector[K] rate;
shape = square(mu) ./ square(sigma);
rate = mu ./ square(sigma);
}
model {
matrix[N,K] log_weight;        // cache log-weights calculation

mu ~ normal(0, 10);
sigma ~ normal(0, 10);

log_weight = log(weights);
for (n in 1:N) {
// weighing
row_vector[K] lps = log_weight[n];
for (k in 1:K)
// censoring
if(y[n] < UL[n]){
lps[k] += gamma_lpdf(y[n] | shape[k], rate[k]);
} else {
lps[k] += gamma_lccdf(UL[n]  | shape[k], rate[k]);
}

target += log_sum_exp(lps);
}
}


This has been working satisfactory until I realized that that each individual time of death is better modeled if it would be caused by one single predictor and not a weighed mixture.

After some more reading in the forum I realized that this is a general issue that discrete parameters can’t (yet) be sampled in Stan and are instead better marginalized out, which is more efficient. So far so good. What I don’t understand at this point is the following:

5.2 Latent Discrete Parameterization

This model is not directly supported by Stan because it involves discrete parameters zn, but Stan can sample μ and σ by summing out the z parameter as described in the next section.
.
5.3 Summing out the Responsibility Parameter
To implement the normal mixture model outlined in the previous section in Stan, the discrete parameters can be summed out of the model. If Y is a mixture of K normal distributions with locations μk and scales σk with mixing proportions λ in the unit K-simplex, then
p_Y(y~ |~ \lambda, \mu, \sigma) = \sum_{k=1}^K \lambda_k~ normal(y~|~\mu_k, \sigma_k)

The way this is coded in stan appears to model each observation y_n as the sum of all K weighed probabilities. Is this an approximation, which works well for mixtures? I can see that all Y observations can be represented as the quoted formula but I seem to be missing how this also applies to any single observation. Could someone explain this to me? I’m sure I missed something.

At the moment I am trying to implement this model with latent discrete paramters like in the change point model (7.2 - Stan User’s Guide). Is this generally recommendable for my problem or is there a better alternative?

Thank you, Florian

Meanwhile, I have tried to recover the posterior probability of the latent responsibility paramter as described in the corresponding section in the user guide.

generated quantities {
for ( n in 1:N){
for (k in 1:K) {
z[n,k] = gamma_rcensed(y[n], shape[k], rate[k], UL[n]) +
categorical_lpmf(k | to_vector(weights[n]));
}
// normalize
z[n] = z[n] - log_sum_exp(z[n]);
}
}


where gamma_rcensed is just gamma_lpdf or gamma_lccdf depending on whether y_n < upper~limit (the same as the probability function above)

The values I get out of it seem plausible to me. Could someone help me and check whether I coded this correctly?