Mixture-Model Inference: how to report hidden states, rather than state proportions?

Dear wonderful Stan community,

I’m conducting a simulation study of a simple mixture model of two Poisson distributions, with rates \lambda_1 and \lambda_2. Each of the N observations comes from one of these two distributions, according to its hidden mixture state z_n. The weights of the states are controlled by a simplex p (because there are only two mixture states, this is like the weight of a coin toss). That is, z_n \sim \text{categorical}(p) and y_n \sim \text{Poisson}(\lambda_{z_n}).

The following program, borrowing heavily from the user manual, successfully infers the rate parameters \lambda and the mixture proportions p. However, I would also like it to infer the per-observation posteriors over the mixture states, i.e. estimates of the z_n. How could I alter my code to include these estimates in the output?

I tried adding an “ordered[K] z” line to the parameters block, but that crashed everything (and I gather Stan can’t sample discrete latent variables directly anyway). I’ve built a similar project in base R using MCMC, where I relied on an explicit z-posterior calculation via the Law of Total Probability (I think the user manual mentions a similar thing in section 5.3). Perhaps I need to code this calculation explicitly in Stan and tell it when to use it? That seems like overkill, hence my question.
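
(For concreteness, the per-observation quantity I compute in R via the Law of Total Probability is

\Pr(z_n = k \mid y_n, \lambda, p) = \frac{p_k \, \text{Poisson}(y_n \mid \lambda_k)}{\sum_{j=1}^{K} p_j \, \text{Poisson}(y_n \mid \lambda_j)},

and that is what I’d like Stan to report for each n.)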

Thank you kindly in advance!
~ Max


The following is Stan code:

data {
  int<lower = 1> K;            // number of mixture components
  int<lower = 1> N;            // number of data points
  int y[N];                    // observations
}

parameters {
  simplex[K] p;                // mixture proportions
  ordered[K] lambda;           // rates of mixture components
}

model {
  vector[K] log_p = log(p);    // cache log calculation
  lambda ~ normal(5, 8);
  for (n in 1:N) {
    vector[K] lps = log_p;
    for (k in 1:K) {
      lps[k] += poisson_lpmf(y[n] | lambda[k]);
    }
    target += log_sum_exp(lps);
  }
}

The following is R code:

# Generate fake data
lambdas.true <- c(1, 13) # true data-generating rates
p <- c(0.3, 0.7) # proportion of mixtures
N <- 150 # number of data points
Zs <- sample(c(1,2), size=N, replace=TRUE, prob=p) # hidden mixture states
Y <- rep(NA, N) # data points
for(n in 1:N){
  Y[n] <- rpois(1, lambda=lambdas.true[Zs[n]])
}
library("rstan")
model <- stan_model('simple_mixture.stan')

# pass data to stan and run model
options(mc.cores=4)
fit <- sampling(model, list(K=2, N=N, y=Y, iter=200, chains=4))

# diagnosis of fit
print(fit)


Check out 5.3 Summing out the responsibility parameter | Stan User’s Guide and also 7.2 Change point models | Stan User’s Guide for a concrete example (the change-point model uses categorical_rng in the generated quantities block to simulate latent discrete values from the posterior).


I don’t think those discuss deriving the point-wise class probabilities of the input data though, do they?

The way I’ve done this in the past is, for each point and each class, to take the log of the (across-points) class probability, add the log-likelihood of the point under that class’s distribution, then exponentiate and normalize; that gives you the probability of the point belonging to each class, and you can apply whatever label-decision rule you want.
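
In R, post-processing the draws from the original model might look something like this (a sketch, assuming the fit and Y objects from the original post):

# Sketch: point-wise class probabilities from the posterior draws of p and lambda.
# Assumes `fit` is the rstan fit from the original post and `Y` is the data vector.
draws <- rstan::extract(fit, pars = c("p", "lambda"))
S <- nrow(draws$p)   # number of posterior draws
K <- ncol(draws$p)   # number of mixture components
N <- length(Y)

prob_z <- array(NA_real_, dim = c(S, N, K))
for (s in 1:S) {
  for (n in 1:N) {
    # log mixture weight + Poisson log-likelihood for each class
    lps <- log(draws$p[s, ]) + dpois(Y[n], lambda = draws$lambda[s, ], log = TRUE)
    prob_z[s, n, ] <- exp(lps - max(lps)) / sum(exp(lps - max(lps)))  # normalize
  }
}

# Posterior mean membership probability of each observation in each class
round(apply(prob_z, c(2, 3), mean), 3)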


Great point @mike-lawrence - that could be confusing. Here’s how to compute point-wise class probabilities and sample from the posterior for the latent mixture component in this case:

data {
  int<lower = 1> K;            // number of mixture components
  int<lower = 1> N;            // number of data points
  int y[N];                    // observations
}

parameters {
  simplex[K] p;                // mixture proportions
  ordered[K] lambda;           // rates of mixture components
}

transformed parameters {
  vector[K] log_p = log(p);
}

model {
  lambda ~ normal(5, 8);
  for (n in 1:N) {
    vector[K] lps = log_p;
    for (k in 1:K) {
      lps[k] += poisson_lpmf(y[n] | lambda[k]);
    }
    target += log_sum_exp(lps);
  }
}

generated quantities {
  int<lower = 1, upper = K> z[N];
  
  {
    vector[K] tmp;
    for (i in 1:N) {
      for (k in 1:K) {
        tmp[k] = poisson_lpmf(y[i] | lambda[k]) + log_p[k];
      }
      z[i] = categorical_logit_rng(tmp);
      // or if you want, equivalently
      // z[i] = categorical_rng(exp(tmp - log_sum_exp(tmp)));
    }
  }
}
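
On the R side, the sampled z can then be summarized into per-observation membership probabilities, e.g. (a quick sketch, assuming the fit from this program):

# Each posterior draw of z assigns every observation to a component;
# averaging over draws gives empirical membership probabilities.
z_draws <- rstan::extract(fit, pars = "z")$z  # matrix: posterior draws x N
prob_class_2 <- colMeans(z_draws == 2)        # empirical Pr(z_n = 2 | y) for each n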

One note here is that lambda is not constrained to be positive, but probably should be. As an alternative, you could parameterize the model in terms of \log(\lambda), which can be ordered and unbounded, and replace poisson_lpmf with poisson_log_lpmf.
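
A minimal sketch of that alternative (the normal(1, 2) prior on the log scale is just a placeholder, not a recommendation):

parameters {
  simplex[K] p;               // mixture proportions
  ordered[K] log_lambda;      // log rates: ordered and unbounded, so no positivity issue
}

model {
  log_lambda ~ normal(1, 2);  // placeholder prior on the log scale
  for (n in 1:N) {
    vector[K] lps = log(p);
    for (k in 1:K) {
      lps[k] += poisson_log_lpmf(y[n] | log_lambda[k]);  // takes the log rate directly
    }
    target += log_sum_exp(lps);
  }
}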

The call to sampling is also probably not doing exactly what you intend – I think you are looking for something more like this, where iter and chains are their own arguments instead of being inside the data list:

fit <- sampling(model, list(K = 2, N = N, y = Y), 
                iter=200, chains=4, cores = 4)

mbjoseph and mike-lawrence,

Wow—once again the Stan forums have fortified my faith in humanity! Your solution works wonderfully, and accomplishes what I did in base R with far greater efficiency. While we’re here, maybe I can pick your brains on a related idea: what if I wanted to add an “include these members” prior, whereby the model knows the underlying identity of certain members of the mixture with certainty? That is, what if we know the true values of certain z[i]?

This machinery is the underlying motivation for my original question. I figure we need a way to refer to particular fixed z[i] in order to include them as prior information, and the generated quantities block does give us the z[i], but that block comes after the model is fit… hardly conducive to a prior. Does that make sense?

(Also, nice catch on the Poisson prior. I changed it from a normal to a gamma(0.3, 3), which I believe also fixes the problem.)
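
(For the record, I believe the positivity can also be made explicit in the declaration itself with Stan’s positive_ordered type, something like:

parameters {
  simplex[K] p;                  // mixture proportions
  positive_ordered[K] lambda;    // rates constrained to be positive and ordered
}

model {
  lambda ~ gamma(0.3, 3);        // the gamma prior mentioned above
  // ... rest of the model unchanged
}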

Many, many thanks!


Sure - if you know the true mixture component for some z[i], you can use those observed values in the likelihood and generated quantities rather than summing over all of the mixture components. I usually do a little hack like this, where you add some data Z_obs and treat 0 as unknown/unobserved:

data {
  int<lower = 1> K;                    // number of mixture components
  int<lower = 1> N;                    // number of data points
  int y[N];                            // observations
  int<lower = 0, upper = K> Z_obs[N];  // 0 represents NA (unknown Z values)
}

transformed data {
  int<lower = 0, upper = 1> Z_known[N];

  for (i in 1:N)
    Z_known[i] = Z_obs[i] > 0;
}

parameters {
  simplex[K] p;                // mixture proportions
  ordered[K] lambda;           // rates of mixture components
}

transformed parameters {
  vector[K] log_p = log(p);
}

model {
  lambda ~ lognormal(5, 8);

  for (i in 1:N) {
    if (Z_known[i]) {
      target += poisson_lpmf(y[i] | lambda[Z_obs[i]]) + log_p[Z_obs[i]];
    } else {
      // Z is unknown, so we sum over all possible values
      vector[K] lps = log_p;
      for (k in 1:K) {
        lps[k] += poisson_lpmf(y[i] | lambda[k]);
      }
      target += log_sum_exp(lps);
    }
  }
}

generated quantities {
  int<lower = 1, upper = K> z[N];
  
  {
    vector[K] tmp;
    for (i in 1:N) {
      if (Z_known[i]) {
        z[i] = Z_obs[i];
      } else {
        for (k in 1:K) {
          tmp[k] = poisson_lpmf(y[i] | lambda[k]) + log_p[k];
        }
        z[i] = categorical_logit_rng(tmp);
      }
    }
  }
}
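
On the R side, Z_obs can be constructed from your simulation by revealing the true state for a subset of observations and coding the rest as 0, e.g. (a sketch; the file name is just a placeholder):

# Reveal the true component for a random ~20% of observations; 0 means unknown
Z_obs <- ifelse(runif(N) < 0.2, Zs, 0L)

model_partial <- stan_model('partial_mixture.stan')  # placeholder name for the program above
fit_partial <- sampling(model_partial, list(K = 2, N = N, y = Y, Z_obs = Z_obs),
                        iter = 200, chains = 4, cores = 4)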


Ah — how elegant! In essence, you’re helping Stan restrict the likelihood calculation using conditional statements. A good example of the transformed data block, too. I’m glad I asked.

Once again, I deeply appreciate your helping me learn this beautiful language (:
