Dear Stan users,
I want to fit a two-component mixture model to a sample of antibody count data, assuming that individual antibody counts arise from a mixture of two subpopulations, one for the individuals still susceptible to the infection, the other for individuals who show immunity due to past infection or vaccination.
Below, I attach my code. My problem is related to how to generate, in the generated quantities block, the class index Z for each individual, which tells me to which mixture component each individual, based on his or her antibody count, is assigned.
I guess that the current way of calculating Z, as shown below, is not correct. My estimate of \theta, the vector containing the mixture weights, is (0.25, 0.75), for the susceptible and the immune component, respectively. When generating Z using the categorical_rng function, I obtain that all individuals are assigned to the component 2, as its mixture weight is higher than 0.5.
Any guess on how to correctly estimate Z?
Thanks in advance,
Emanuele
data {
int<lower=0> K; // number of mixture components
int<lower=0> N; // number of data points
vector[N] logod; // observations
}
parameters {
simplex[K] theta; // mixing weight of immune component
ordered[K] mu; // locations of mixture components
vector<lower=0>[K] sigma; // scales of mixture components
}
model {
// Priors
mu ~ normal(0, 10);
sigma ~ normal(0,1);
theta ~ dirichlet(rep_vector(1,K));
// Log joint density f(data, parameters)
for (n in 1:N){
target += log_sum_exp(log(theta[1])
+ normal_lpdf(logod[n] | mu[1], sigma[1]),
log(theta[2])
+ normal_lpdf(logod[n] | mu[2], sigma[2]));
}
}
generated quantities{
vector[N] log_lik; // log-likelihood for calculation of LOO
int<lower=1> Z[N]; // group index
for (n in 1:N){
for(k in 1:K){
log_lik[n] = log(theta[k]) + normal_lpdf(logod[n] | mu[k], sigma[k]);
}
Z[n] = categorical_rng(theta);
}
}