Dear wonderful Stan community,
I’m conducting a simulation study of a simple mixture model of two Poisson distributions, parameterized \lambda_1 and \lambda_2. Each of the N observations comes from one of these two distributions, according to the hidden mixture state z_n. The weight of these states is controlled by a simplex, p (because there are only two mixture states, this is like the weight of a coin toss). That is, z_n \sim \text{categorial}(p), and y_n \sim \text{Poisson}(\lambda_{z[n]}).
The following program, borrowing heavily from the user manual, successfully infers the rate parameters \lambda and the proportion of the mixtures p. However, I would also like it to infer the state-specific posteriors for those mixtures, the z_n estimates. How could I alter my code to include these estimates in the output?
I tried adding an “ordered[K] z” line to the parameters block, but that crashed everything. I’ve built a similar project in base R, using MCMC, and relied on an explicit z-posterior calculation, using the Law of Total Probability (I think the user manual mentions a similar thing in section 5.3). Perhaps I need to explicitly code this function in Stan, and tell it when to use it? That seems like overkill, hence my question.
Thank you kindly in advance!
~ Max
The following is Stan code:
data {
int<lower = 1> K; // number of mixture components
int<lower = 1> N; // number of data points
int y[N]; // observations
}
parameters {
simplex[K] p; // mixture proportions
ordered[K] lambda; // rates of mixture components
}
model {
vector[K] log_p = log(p); // cache log calculation
lambda ~ normal(5, 8);
for(n in 1:N){
vector[K] lps = log_p;
for(k in 1:K)
lps[k] += poisson_lpmf(y[n] | lambda[k]);
target += log_sum_exp(lps);
}
}
The following is R code:
# Generate fake data
lambdas.true <- c(1,13) # god parameters
p <- c(0.3, 0.7) # proportion of mixtures
N <- 150 # number of data points
Zs <- sample(c(1,2), size=N, replace=TRUE, prob=p) # hidden latent state
Y <- rep(NA, N) # data points
for(n in 1:N){
Y[n] <- rpois(1, lambda=lambdas.true[Zs[n]])
}
library("rstan")
model <- stan_model('simple_mixture.stan')
# pass data to stan and run model
options(mc.cores=4)
fit <- sampling(model, list(K=2, N=N, y=Y, iter=200, chains=4))
# diagnosis of fit
print(fit)