Mixture-Model Inference: how to report hidden states, rather than state proportions?

Dear wonderful Stan community,

I’m conducting a simulation study of a simple mixture model of two Poisson distributions, parameterized \lambda_1 and \lambda_2. Each of the N observations comes from one of these two distributions, according to the hidden mixture state z_n. The weight of these states is controlled by a simplex, p (because there are only two mixture states, this is like the weight of a coin toss). That is, z_n \sim \text{categorial}(p), and y_n \sim \text{Poisson}(\lambda_{z[n]}).

The following program, borrowing heavily from the user manual, successfully infers the rate parameters \lambda and the proportion of the mixtures p. However, I would also like it to infer the state-specific posteriors for those mixtures, the z_n estimates. How could I alter my code to include these estimates in the output?

I tried adding an “ordered[K] z” line to the parameters block, but that crashed everything. I’ve built a similar project in base R, using MCMC, and relied on an explicit z-posterior calculation, using the Law of Total Probability (I think the user manual mentions a similar thing in section 5.3). Perhaps I need to explicitly code this function in Stan, and tell it when to use it? That seems like overkill, hence my question.

Thank you kindly in advance!
~ Max


The following is Stan code:

data {
   int<lower = 1> K; 		// number of mixture components
   int<lower = 1> N; 		// number of data points
   int y[N]; 			// observations
}

parameters {
  simplex[K] p;       		// mixture proportions
  ordered[K] lambda;  		// rates of mixture components
}

model {
    vector[K] log_p = log(p); // cache log calculation
    lambda ~ normal(5, 8);
	for(n in 1:N){
	    vector[K] lps = log_p;
		for(k in 1:K)
		   lps[k] += poisson_lpmf(y[n] | lambda[k]);
		   target += log_sum_exp(lps);
	}
}

The following is R code:

# Generate fake data
lambdas.true <- c(1,13) # god parameters
p <- c(0.3, 0.7) # proportion of mixtures
N <- 150 # number of data points
Zs <- sample(c(1,2), size=N, replace=TRUE, prob=p) # hidden latent state
Y <- rep(NA, N) # data points
for(n in 1:N){
  Y[n] <- rpois(1, lambda=lambdas.true[Zs[n]])
}
library("rstan")
model <- stan_model('simple_mixture.stan')

# pass data to stan and run model
options(mc.cores=4)
fit <- sampling(model, list(K=2, N=N, y=Y, iter=200, chains=4))

# diagnosis of fit
print(fit)

Check out 5.3 Summing out the responsibility parameter | Stan User’s Guide and also 7.2 Change point models | Stan User’s Guide for a concrete example (uses categorical_rng in the generated quantities block to simulate latent discrete values from the posterior).

I don’t think those discuss deriving the point-wise class probabilities of the input data though, do they?

The way I’ve done this in the past is, for each point and class, get the log-probability (across-points) of the class and add the log-likelihood of the point given that class’ distribution, then exponentiate and you have your probability of that point in each class and can apply whatever label decision rule you want.

Great point @mike-lawrence - that could be confusing. Here’s how to compute point-wise class probabilities and sample from the posterior for the latent mixture component in this case:

data {
   int<lower = 1> K; 		// number of mixture components
   int<lower = 1> N; 		// number of data points
   int y[N]; 			// observations
}

parameters {
  simplex[K] p;       		// mixture proportions
  ordered[K] lambda;  		// rates of mixture components
}

transformed parameters {
  vector[K] log_p = log(p);
}

model {
    lambda ~ normal(5, 8);
	for(n in 1:N){
	    vector[K] lps = log_p;
		for(k in 1:K)
		   lps[k] += poisson_lpmf(y[n] | lambda[k]);
		   target += log_sum_exp(lps);
	}
}

generated quantities {
  int<lower = 1, upper = K> z[N];
  
  {
    vector[K] tmp;
    for (i in 1:N) {
      for (k in 1:K) {
        tmp[k] = poisson_lpmf(y[i] | lambda[k]) + log_p[k];
      }
      z[i] = categorical_logit_rng(tmp);
      // or if you want, equivalently
      // z[i] = categorical_rng(exp(tmp - log_sum_exp(tmp)));
    }
  }
}

One note here is that lambda is not constrained to be positive, but probably should be. You could specify a prior for \log(\lambda) that is ordered and unbounded as an alternative, replacing poisson_lpmf with poisson_log_lpmf.

The call to sampling also is probably not doing exactly what you intend – I think you may be looking for something more like this, where iter and chains are their own arguments, instead of being in the list of data:

fit <- sampling(model, list(K = 2, N = N, y = Y), 
                iter=200, chains=4, cores = 4)

mbjoseph and mike-lawrence,

Wow—once again the Stan forums have fortified my faith in humanity! Your solution works wonderfully, and accomplishes what I did in base R with far greater efficiency. While we’re here, maybe I can pick your brain on a similar concept: what if I wanted to add an include-these-members prior, whereby the algorithm deterministically knows the underlying identity of certain members of the mixture. That is, what if we know the identity of certain z[i]?

This prior machinery constitutes the underlying motivation for my original question. I figure we need to talk about certain fixed z[i], which the generated quantities block accomplishes, in order to include them as priors. But the block comes afterward…hardly conducive to a prior. Does that make sense?

(Also, nice catch on the Poisson prior. I changed it from a normal to a gamma(0.3, 3), which I believe also fixes the problem.)

Many, many thanks!

Sure - if you know the true mixture component for some z[i] you can use those observed values in the likelihood and generated quantities, rather than summing over all of the mixture components. I usually do a little hack like this, where you add some data Z_obs and treat 0 as unknown/unobserved:

data {
   int<lower = 1> K; 		// number of mixture components
   int<lower = 1> N; 		// number of data points
   int y[N]; 			// observations
   int<lower = 0, upper = K> Z_obs[N]; // 0 represents NA (unknown Z values)
}

transformed data {
  int<lower = 0, upper = 1> Z_known[N];
  
  for (i in 1:N)
    Z_known[i] = Z_obs[i] > 0;
}

parameters {
  simplex[K] p;       		// mixture proportions
  ordered[K] lambda;  		// rates of mixture components
}

transformed parameters {
  vector[K] log_p = log(p);
}

model {
  lambda ~ lognormal(5, 8);
  
	for(i in 1:N){
    if (Z_known[i]) {
      target += poisson_lpmf(y[i] | lambda[Z_obs[i]]) + log_p[Z_obs[i]];
    } else {
      // Z is unknown, so we sum over all possible values
      vector[K] lps = log_p;
      for (k in 1:K) {
        lps[k] += poisson_lpmf(y[i] | lambda[k]);
      }
      target += log_sum_exp(lps);
    }
	}
}

generated quantities {
  int<lower = 1, upper = K> z[N];
  
  {
    vector[K] tmp;
    for (i in 1:N) {
      if (Z_known[i]) {
        z[i] = Z_obs[i];
      } else {
        for (k in 1:K) {
          tmp[k] = poisson_lpmf(y[i] | lambda[k]) + log_p[k];
        }
        z[i] = categorical_logit_rng(tmp);
      }
    }
  }
}

Ah — how elegant! In essence, you’re helping Stan restrict the likelihood calculation using conditional statements. A good example of the transformed data block, too. I’m glad I asked.

Once again, I deeply appreciate your helping me learn this beautiful language (: