# Non-identifiability of mixing weights in a 2-component Gaussian model

Rplot.pdf (5.3 KB)
Dear Stan users,

I am currently building a 2-component time-varying Gaussian mixture model in RStan. The full model is fairly complex, so for now I have constructed a smaller model to examine its behaviour.
The model is:

y_{it} ~ \lambda_t * N(\beta_{t1}, \sigma) + (1 - \lambda_t) * N(\beta_{t2}, \sigma)

for all observations i = 1:n (in the real data n = 12 with some missing observations; for the simulation study n is set to 50 with the same missing patterns as in the real data), where t is the time index, currently t = 1:72.
The mixing weights and mixture locations are all time-varying, with a time-invariant, component-shared scale \sigma. Right now I focus only on estimating \lambda_t and \sigma, supplying the “true” values of \beta_{t1} and \beta_{t2} as data (to ensure identifiability, all \beta_{t2} > \beta_{t1}).
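For concreteness, this simulation setup can be sketched in NumPy (the actual simulation used R; the particular \lambda_t curve and component means below are made-up illustrative values, not the ones from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 72, 50          # time points, observations per time point
sigma = 0.3            # shared, time-invariant scale

# Hypothetical smooth mixing weights and ordered component means
t = np.arange(T)
lam = 1 / (1 + np.exp(-np.sin(2 * np.pi * t / T)))  # slowly varying, in (0, 1)
beta1 = -1 + 0.01 * t                               # component 1 mean
beta2 = beta1 + 2.0                                 # component 2 mean (beta2 > beta1)

# Draw component membership per observation, then the observation itself
z = rng.random((T, n)) < lam[:, None]               # True -> component 1
mu = np.where(z, beta1[:, None], beta2[:, None])
y = rng.normal(mu, sigma)
```

Note that here \lambda_t is explicitly the probability of the *first* (lower-mean) component; as the thread later shows, getting this pairing wrong in the simulator is exactly what produces the apparent flip.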
The mixing weights are believed to change slowly and smoothly over time, so I model them with a B-spline under an RW(1) penalty prior (see the case study at http://mc-stan.org/users/documentation/case-studies/splines_in_stan.html).
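The non-centered RW(1) construction on the spline coefficients (used in the transformed parameters block of the model below) can be sketched in NumPy: the first coefficient is the first raw draw, and each subsequent one adds a tau-scaled standard-normal increment.

```python
import numpy as np

def rw1_coefficients(beta_eta_raw, tau):
    """Non-centered RW(1): beta[0] = raw[0], beta[i] = beta[i-1] + tau * raw[i]."""
    beta = np.empty_like(beta_eta_raw, dtype=float)
    beta[0] = beta_eta_raw[0]
    for i in range(1, len(beta_eta_raw)):
        beta[i] = beta[i - 1] + tau * beta_eta_raw[i]
    return beta

# Illustrative values only
raw = np.array([0.5, -1.0, 2.0, 0.3])
tau = 0.7
beta = rw1_coefficients(raw, tau)
```

Equivalently, `beta` is `raw[0]` followed by `raw[0] + tau * np.cumsum(raw[1:])`, which makes clear that standard-normal priors on the raw coefficients imply an RW(1) with innovation scale tau on the actual coefficients.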
Right now the model estimates \sigma well, but the “true” \lambda_t corresponds to 1 - \lambda_t in the posterior samples. I do not understand why: since I already supply the mixture locations \beta_{t1} and \beta_{t2} as data, why is there still label switching? \lambda_t is modelled as the inverse logit of \eta_t, where \eta_t is the product of the B-spline basis matrix eta_bspline and the coefficients \beta_{\eta}. Can someone provide some insight? The posterior mean of 1 - \lambda_t is attached; the red line shows the simulated “true” values and the black line the posterior means of 1 - \lambda_t.
The Stan code is attached below.

```stan
data {
  int<lower=1> N;                  // number of observations that are defined
  int<lower=1> K;                  // number of columns == 72 hours in a week
  int<lower=1> D;                  // number of days fitted
                                   // (simulation: D = 3; real full 2-week data: D = 14)
  int<lower=1> Ncol;
  int<lower=1, upper=Ncol> kk[N];  // hour-of-week index, 1:72
  vector[N] dat_complete;          // defined LOG-TRANSFORMED observations
  matrix[K, 2] ordered_overall_mean;
  int<lower=0> num_basis;
  matrix[K, num_basis - 1] eta_bspline;
}
parameters {
  vector[num_basis - 1] beta_eta_raw;
  real<lower=0> tau;               // RW penalty parameter for eta
  real<lower=0> sigma;
}
transformed parameters {
  vector[num_basis - 1] beta_eta;
  vector[K] eta;
  vector[K] lambda;
  beta_eta = beta_eta_raw;
  for (i in 2:(num_basis - 1)) {
    beta_eta[i] = beta_eta[i - 1] + beta_eta_raw[i] * tau;
  }
  eta = eta_bspline * beta_eta;
  lambda = inv_logit(eta);
}
model {
  beta_eta_raw ~ normal(0, 1);
  tau ~ normal(0, 1);
  sigma ~ cauchy(0, 2.5);
  // likelihood
  for (n in 1:N) {
    target += log_mix(lambda[kk[n]],
                      normal_lpdf(dat_complete[n] | ordered_overall_mean[kk[n], 1], sigma),
                      normal_lpdf(dat_complete[n] | ordered_overall_mean[kk[n], 2], sigma));
  }
}
```



Can you plot a bunch of posterior samples of 1 - \lambda_t? What about this seems bad, though? They look like they’re doing the same thing.

I plotted posterior samples for \lambda_t and 1 - \lambda_t; the red line is the simulated truth.

I think the posterior samples of 1 - \lambda_t are quite good and estimate the “truth” fairly well.

I am quite confused why it is 1 - \lambda_t, rather than \lambda_t, that does a good job.

The first plot shows 101 posterior samples of 1 - \lambda_t; the second plot shows \lambda_t.
post_sample1.pdf (36.7 KB)
post_sample2.pdf (36.3 KB)

My gut feeling is that there is an indexing problem somewhere. You can use print statements in Stan code, so maybe print a few values of ordered_overall_mean and see if they’re what you’d expect?

If that doesn’t get you anywhere, maybe try generating the simulated data with the Stan model itself? Maybe there’s a difference between the splines you used to generate your data and the ones you’re fitting with?

Thank you for your suggestions. I figured it out: the error actually comes from the way I simulated the data. In my simulated data, \lambda_t corresponds to the second component, so the model is doing the right thing.
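The symmetry behind this diagnosis: a two-component mixture density is unchanged if the components are swapped and the weight is replaced by its complement, so a simulator that pairs \lambda_t with the second component produces data for which the model’s \lambda_t estimates 1 - \lambda_t, even with the locations fixed. A minimal NumPy check of that identity, with Stan’s `log_mix` and `normal_lpdf` re-implemented here (the particular numbers are arbitrary):

```python
import numpy as np

def normal_lpdf(y, mu, sigma):
    # Log density of N(mu, sigma) at y
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_mix(lam, lp1, lp2):
    # Stan's log_mix: log(lam * exp(lp1) + (1 - lam) * exp(lp2)), computed stably
    return np.logaddexp(np.log(lam) + lp1, np.log(1 - lam) + lp2)

y, b1, b2, sigma, lam = 0.4, -1.0, 1.0, 0.3, 0.7
lp1 = normal_lpdf(y, b1, sigma)
lp2 = normal_lpdf(y, b2, sigma)

# Swapping the components while complementing the weight leaves the density unchanged
assert np.isclose(log_mix(lam, lp1, lp2), log_mix(1 - lam, lp2, lp1))
```

So the fit was never wrong; the “truth” being compared against was the complement of the weight the model parameterizes.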


There’s also a nice case study on mixture models by Michael Betancourt on the Stan website.