# Non-identifiability of mixing weights in a 2-component Gaussian model

Rplot.pdf (5.3 KB)
Dear Stan users,

I am currently building a 2-component time-varying Gaussian mixture model in RStan. The full model is fairly complex, so for now I have constructed a smaller model to examine its behaviour.
The model is:

y_{it} ~ \lambda_t * N(\beta_{t1}, \sigma) + (1 - \lambda_t) * N(\beta_{t2}, \sigma)

for all observations i = 1:n (in the real data n = 12 with some missing observations; for the simulation study n is set to 50 with the same missing patterns as in the real data), where t is the time index, currently t = 1:72.
The mixing weights and mixture locations are all time-varying, with a time-invariant, component-shared scale \sigma. Right now I focus only on estimating \lambda_t and \sigma, supplying the “true” values of \beta_{t1} and \beta_{t2} as data (to ensure identifiability, all \beta_{t2} > \beta_{t1}).
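For concreteness, this simulation setup can be sketched in NumPy (the actual simulation used R; the particular \lambda_t curve and component means below are made-up illustrative values, not the ones from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 72, 50          # time points, observations per time point
sigma = 0.3            # shared, time-invariant scale

# Hypothetical smooth mixing weights and ordered component means
t = np.arange(T)
lam = 1 / (1 + np.exp(-np.sin(2 * np.pi * t / T)))  # slowly varying, in (0, 1)
beta1 = -1 + 0.01 * t                               # component 1 mean
beta2 = beta1 + 2.0                                 # component 2 mean (beta2 > beta1)

# Draw component membership per observation, then the observation itself
z = rng.random((T, n)) < lam[:, None]               # True -> component 1
mu = np.where(z, beta1[:, None], beta2[:, None])
y = rng.normal(mu, sigma)
```

Note that here \lambda_t is explicitly the probability of the *first* (lower-mean) component; as the thread later shows, getting this pairing wrong in the simulator is exactly what produces the apparent flip.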
The mixing weights are believed to change slowly and smoothly over time, so I model them with a B-spline under an RW(1) penalty prior (see the case study at http://mc-stan.org/users/documentation/case-studies/splines_in_stan.html).
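The non-centered RW(1) construction on the spline coefficients (used in the transformed parameters block of the model below) can be sketched in NumPy: the first coefficient is the first raw draw, and each subsequent one adds a tau-scaled standard-normal increment.

```python
import numpy as np

def rw1_coefficients(beta_eta_raw, tau):
    """Non-centered RW(1): beta[0] = raw[0], beta[i] = beta[i-1] + tau * raw[i]."""
    beta = np.empty_like(beta_eta_raw, dtype=float)
    beta[0] = beta_eta_raw[0]
    for i in range(1, len(beta_eta_raw)):
        beta[i] = beta[i - 1] + tau * beta_eta_raw[i]
    return beta

# Illustrative values only
raw = np.array([0.5, -1.0, 2.0, 0.3])
tau = 0.7
beta = rw1_coefficients(raw, tau)
```

Equivalently, `beta` is `raw[0]` followed by `raw[0] + tau * np.cumsum(raw[1:])`, which makes clear that standard-normal priors on the raw coefficients imply an RW(1) with innovation scale tau on the actual coefficients.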
Right now the model estimates \sigma well, but the “true” \lambda_t corresponds to 1 - \lambda_t in the posterior samples. I do not understand why: since I already supply the mixture locations \beta_{t1} and \beta_{t2} as data, why is there still label switching? \lambda_t is modelled as the inverse logit of \eta_t, where \eta_t is the product of the B-spline basis matrix eta_bspline and the coefficients \beta_{\eta}. Can someone provide some insight? The posterior mean of 1 - \lambda_t is attached; the red line shows the simulated “true” values and the black line the posterior means of 1 - \lambda_t.
The Stan code is attached below.

```stan
data {
  int<lower=1> N;                  // number of observations that are defined
  int<lower=1> K;                  // number of columns == 72 hours in a week
  int<lower=1> D;                  // number of days fitted
                                   // (simulation: D = 3; real full 2-week data: D = 14)
  int<lower=1> Ncol;
  int<lower=1, upper=Ncol> kk[N];  // hour-of-week index, 1:72
  vector[N] dat_complete;          // defined LOG-TRANSFORMED observations
  matrix[K, 2] ordered_overall_mean;
  int<lower=0> num_basis;
  matrix[K, num_basis - 1] eta_bspline;
}
parameters {
  vector[num_basis - 1] beta_eta_raw;
  real<lower=0> tau;               // RW penalty parameter for eta
  real<lower=0> sigma;
}
transformed parameters {
  vector[num_basis - 1] beta_eta;
  vector[K] eta;
  vector[K] lambda;
  beta_eta = beta_eta_raw;
  for (i in 2:(num_basis - 1)) {
    beta_eta[i] = beta_eta[i - 1] + beta_eta_raw[i] * tau;
  }
  eta = eta_bspline * beta_eta;
  lambda = inv_logit(eta);
}
model {
  beta_eta_raw ~ normal(0, 1);
  tau ~ normal(0, 1);
  sigma ~ cauchy(0, 2.5);
  // likelihood
  for (n in 1:N) {
    target += log_mix(lambda[kk[n]],
                      normal_lpdf(dat_complete[n] | ordered_overall_mean[kk[n], 1], sigma),
                      normal_lpdf(dat_complete[n] | ordered_overall_mean[kk[n], 2], sigma));
  }
}
```



Can you plot a bunch of posterior samples of 1 - \lambda_t? What about this seems bad, though? They look like they’re doing the same thing.

I plotted posterior samples for \lambda_t and 1 - \lambda_t; the red line is the simulated truth.

I think the posterior samples of 1 - \lambda_t are quite good and estimate the “truth” fairly well.

I am quite confused why it is 1 - \lambda_t, rather than \lambda_t, that does a good job.

The first plot shows 101 posterior samples of 1 - \lambda_t; the second plot shows \lambda_t.
post_sample1.pdf (36.7 KB)
post_sample2.pdf (36.3 KB)

My gut feeling is that there is an indexing problem somewhere. You can use print statements in Stan code, so maybe print a few values of ordered_overall_mean and see if they’re what you’d expect?

If that doesn’t get you anywhere, maybe try generating the simulated data with the Stan model itself? Maybe there’s a difference between the splines you used to generate your data and the ones you’re fitting with?

Thank you for your suggestions. I figured it out: the error actually comes from the way I simulated the data. In my simulated data, \lambda_t corresponds to the second component, so the model is doing the right thing.
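The symmetry behind this diagnosis: a two-component mixture density is unchanged if the components are swapped and the weight is replaced by its complement, so a simulator that pairs \lambda_t with the second component produces data for which the model’s \lambda_t estimates 1 - \lambda_t, even with the locations fixed. A minimal NumPy check of that identity, with Stan’s `log_mix` and `normal_lpdf` re-implemented here (the particular numbers are arbitrary):

```python
import numpy as np

def normal_lpdf(y, mu, sigma):
    # Log density of N(mu, sigma) at y
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_mix(lam, lp1, lp2):
    # Stan's log_mix: log(lam * exp(lp1) + (1 - lam) * exp(lp2)), computed stably
    return np.logaddexp(np.log(lam) + lp1, np.log(1 - lam) + lp2)

y, b1, b2, sigma, lam = 0.4, -1.0, 1.0, 0.3, 0.7
lp1 = normal_lpdf(y, b1, sigma)
lp2 = normal_lpdf(y, b2, sigma)

# Swapping the components while complementing the weight leaves the density unchanged
assert np.isclose(log_mix(lam, lp1, lp2), log_mix(1 - lam, lp2, lp1))
```

So the fit was never wrong; the “truth” being compared against was the complement of the weight the model parameterizes.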


There’s also a nice case study on mixture models by Michael Betancourt on the Stan website.