Recovering the class index in a mixture model using generated quantities

Dear Stan users,

I want to fit a two-component mixture model to a sample of antibody count data, assuming that individual antibody counts arise from a mixture of two subpopulations: one for individuals still susceptible to the infection, and one for individuals who are immune due to past infection or vaccination.

My code is attached below. My problem is how to generate, in the generated quantities block, the class index Z for each individual, which tells me to which mixture component each individual is assigned based on his or her antibody count.
I suspect that the current way of calculating Z, shown below, is not correct. My estimate of \theta, the vector of mixture weights, is (0.25, 0.75) for the susceptible and immune components, respectively. When I generate Z using the categorical_rng function, all individuals are assigned to component 2, since its mixture weight is greater than 0.5.

Any suggestions on how to correctly estimate Z?

Thanks in advance,
Emanuele

data {
  int<lower=0> K;        // number of mixture components
  int<lower=0> N;        // number of data points
  vector[N] logod;           // observations
  
}

parameters {
  simplex[K] theta;          // mixing weights of the components
  ordered[K] mu;             // locations of mixture components
  vector<lower=0>[K] sigma;  // scales of mixture components
  
}

model {
  
  // Priors
  mu ~ normal(0, 10);
  sigma ~ normal(0,1);
  theta ~ dirichlet(rep_vector(1,K)); 
  
  // Log joint density f(data, parameters)
  for (n in 1:N){
    target += log_sum_exp(log(theta[1])
                          + normal_lpdf(logod[n] | mu[1], sigma[1]),
                          log(theta[2]) 
                          + normal_lpdf(logod[n] | mu[2], sigma[2]));
  }
  
}
generated quantities{
  vector[N] log_lik; // log-likelihood for calculation of LOO
  int<lower=1> Z[N];          // group index

  for (n in 1:N){
    for(k in 1:K){
      log_lik[n] = log(theta[k]) + normal_lpdf(logod[n] | mu[k], sigma[k]);      
    }
    Z[n] = categorical_rng(theta);
  }
  
}

That doesn’t sound right. categorical_rng does not just select the maximum probability category.
If \theta is (0.25,0.75) then one-in-four individuals are assigned to the first category.
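For instance, simulating draws with those weights (a quick numeric check in Python rather than Stan, with `random.choices` standing in for categorical_rng; seed and sample size are arbitrary):

```python
import random

random.seed(1)
theta = [0.25, 0.75]  # the mixture weights from the post
# random.choices plays the role of categorical_rng here
draws = random.choices([1, 2], weights=theta, k=100_000)
frac_cat1 = draws.count(1) / len(draws)
print(frac_cat1)  # close to 0.25, not 0
```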

Anyway, the mixture weight theta is essentially a prior on the category assignment, so categorical_rng(theta) samples the category of a new, unobserved individual.
You can recover the categories of the observed individuals by applying Bayes’ theorem and sampling from the posterior assignment probabilities. This is straightforward: the posterior probability of each component is its weighted likelihood, normalized so the probabilities sum to one.

generated quantities{
  vector[N] log_lik; // log-likelihood for calculation of LOO
  int<lower=1, upper=K> Z[N]; // group index

  for (n in 1:N) {
    vector[K] ll;
    for(k in 1:K) {
      ll[k] = log(theta[k]) + normal_lpdf(logod[n] | mu[k], sigma[k]);      
    }
    log_lik[n] = log_sum_exp(ll);
    Z[n] = categorical_rng(exp(ll - log_lik[n])); // normalized probabilities, sum to 1
  }
}
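If it helps, the same normalization can be checked numerically outside Stan. A small Python sketch of the computation inside the loop (the parameter values here are made up for illustration, not from a fitted model):

```python
import math

# Hypothetical values, for illustration only (not from a fitted model):
theta = [0.25, 0.75]   # mixture weights
mu = [0.0, 2.0]        # component means
sigma = [1.0, 1.0]     # component sds
y = 0.5                # one observation (log OD)

def normal_lpdf(x, m, s):
    # log density of Normal(m, s) at x
    return -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - m) / s) ** 2

# log of weighted likelihood for each component
ll = [math.log(theta[k]) + normal_lpdf(y, mu[k], sigma[k]) for k in range(2)]
log_lik = math.log(sum(math.exp(v) for v in ll))  # log_sum_exp
post = [math.exp(v - log_lik) for v in ll]        # posterior P(Z = k | y), sums to 1
print(post)
```

The vector `post` is what gets passed to categorical_rng in the generated quantities block above.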

Thank you very much for showing me how to adjust my code to correctly compute the group index of each observation, and for the explanation.