Time-varying mixing weights in 2-component Gaussian mixture model


#1

Dear stan users and developers:

I have some problems recovering the ‘known’ parameter values from my proposed model.

My model is a 2-component mixture model where the mixing weights are time-dependent and I model it as an AR(1) process on eta[t] = log(lambda[t] / (1-lambda[t]) and lambda[t] is the mixing weights at time t.
eta[1] = 0
eta[2:TT] = normal( rho * eta[1:(TT-1)], 0.05).

the data y_it ~ lambda[t] * N(mu_t + d_1t, sigma[1] ) + (1-lambda[t]) * N(mu_t + d_2t, sigma[2])
I impose a constraint on the overall mean E(y_it ) = lambda[t] *(mu_t+d_1t) + (1-lambda[t]) * (mu_t + d_2t) = mu_t .

In other words, lambda[t] *d_1t + (1-lambda[t]) * d_2t = 0
mu_t are modeled as B-spline with a cyclic b-spline basis function and b-spline coefficients.
d_1t and d_2t are mixture component deviations and due to the constraint imposed above,
d_2t is modeled as a deterministic term
d_2t = lambda_t /(1-lambda{t) * (-d_1t)

To ensure components are identifiable, I let d_2t > d_1t for all time periods in other words,
d_1t <0 and impose an upper =0 constraint.
The prior on d_1t is another AR(1) process similar to the one imposed on the logit of mixing weights.

I simulate some data from this proposed model and use the data to fit the model,
however, 95% CI does not capture all the parameters.
I suspect it got something to do with the lambda estimation and then I refit the model supplied ‘true’ lambda values and then estimate the rest of the parameters and in this case, all the parameters are well-estimated.
Next, I supply ‘true’ values of all the other parameters and just to estimate eta_t and thus lambda_t, but the model still fails to recover the parameter values.
I know it must be something wrong with my eta_t set up, however, I am kind of lost in where to debug and hoping to gain some insights. Can some one please suggest where the model can potentially go wrong?

I have attached my model and also the simulated data below, thanks in advance.
I also uploaded a graph which illustrates my model, the black line is the underlying overall mean of the 2-component mixture model which is modeled by cyclic b-spline,
the two red lines are mixture component locations and the distance between the red and the black lines are d_1t and d_2t modeled as described above
the mixing weights lambda_t is modeled as inverse_logit eta[t] and follows an AR(1) process

time_varying_mixture_model.r (1.5 KB)
bspline basis construction function.r (3.8 KB)
state_space_true_classical.stan (4.2 KB)the_model.pdf (18.8 KB)

Since, I suspect lambda_t is not estimated well and thus causes overall failure of the model to recover the other parameters.
I changed lambda_t to be just time-independent and simulate from Beta(5,5) distribution
and then in my model I lose AR(1) on eta_t and just estimate lambda directly with prior distribution beta(5,5) -same as I simulated the ‘truth’ and generate the fake data and fit into the model.
This time, surprisingly, all parameters are well-estimated. It seems that I cannot use AR(1) structure on lambda_t, but I would like to model mixing weights are time-dependent as indicated in my real data, can someone please suggest some alternative way of modeling time-dependent mixing weights or my previous AR(1) structure is appropriate but I just have not implemented correctly in Rstan.

Thanks


#2

Since AR(1) processes are imposed on two of the parameters in the model and I came across an old post by Ben Goodrich, where he uses a NCP structure on the parameters that have an AR(1) structure, I implemented this structure in stan model but the poor estimation of mixing weights problem still persist.
In my new model, I simply add eta_0, mu1_0 the initial state and given them a marginal distribution.
define eta_raw, mu1_raw with normal (0,1) and then
eta and mu1 are obtained through

eta[2:TT] = rho[1] * eta[1:(TT-1)] + sigma_ar[1] * eta_raw [1:(TT-1)]
mu1[2:TT] = rho[2] * mu1[1:(TT-1)] + sigma_ar[2] * mu1_raw [1:(TT-1)]

the original post can be found here: https://groups.google.com/forum/#!searchin/stan-users/AR(1)|sort:relevance/stan-users/QnttkCYsqm4/LNMxQAn3CwAJ

It still does not eliminate the poor estimation.
state_space_true_classical_v4.stan (4.8 KB)


#3

It should capture binomial(N, 0.95) of N parameters, but you need lots of computation to get boundaries of 95% intervals right. It’s more efficient to calibrate 50% posterior intervals.

It’s very hard to answer these kinds of diffuse debugging questions quickly, which is why you’re not getting answers.

It’s also easier if you include the model as text with sensible indentation and interline spacing. I opened your model and got this:

//model 1-day 2-component classical data
//b-spline penalty is to be estimated 
data {
  int<lower=0> TT;//# of hours per day=24
  int<lower=0> N;//# obervations per hourly block=12
  int<lower=0> num_basis;//6 for hour of the day effect
  matrix[N, TT] log_obs;// log-transformed obs to fit mixture of 2 normal distn
  matrix[TT,num_basis+1] B_overall; //with 1st column is the intercept, 2:7 bspline coefficients
  matrix[num_basis, num_basis] K_T;
  int<lower=0, upper=1> run_estimation;//if 0 in the simulation mode, if 1 in estimation mode
}

parameters{

vector[TT] eta;//AR(1) logit mixing weights
  
vector<upper=0>[TT] mu1;//AR(1) time-varying mixture location
  
vector<lower=-1, upper=1>[2]      rho;//slope for AR(1) models for logit weights and location1, constrained to ensure stationary time series

vector[num_basis] beta_hrofday;
 
vector<lower=0>[2] sigma_ar;//std dev of AR(1) for eta and mixture deviation 1
 
vector<lower=0>[2] sigma;//std dev of mixture component

real<lower=0> penalty;


}

transformed parameters{
  
  
  
  real deltatheta_hrofday;
  
  vector[num_basis+1] beta_daily;//sum-to-zero hour of the day effect==daily main effect coefficients, where 1st is intercept
  
  deltatheta_hrofday = sum(B_overall[,2:(num_basis+1)] * beta_hrofday)/
    sum(B_overall[,2:(num_basis+1)]);
  
  beta_daily[2:(num_basis+1)] = beta_hrofday - deltatheta_hrofday;
  
  beta_daily[1] = deltatheta_hrofday;
  


}

model{
 vector[TT] lambda;
 vector[TT] mu2;//time-varying mixture location 2;
 vector[TT] y_overall_mean;//u_t in the equation
 vector[TT] mean1;
 vector[TT] mean2;
 
 
 matrix[num_basis+1, num_basis+1] A_hrofday;
 matrix[num_basis+1, num_basis+1] L_hrofday;//cholesky factor of covariance matrix
  
 A_hrofday = diag_matrix(rep_vector(1,num_basis+1));

 A_hrofday[1,1] = penalty;

 A_hrofday[2:(num_basis+1), 2:(num_basis+1)] = penalty * K_T;

 L_hrofday = cholesky_decompose(A_hrofday);//cholesky factor of covariance matrix
  
  
  //prior 
  //prior for AR(1) mixture location
 mu1[1] ~normal(-1,0.05);
 eta[1] ~normal(0,0.05);
 rho~ normal(0,1);

 sigma_ar~cauchy(0,0.5);
 sigma~cauchy(0,0.5);
 
 penalty~gamma(1, 1);
 


 
  //likelihood for AR(1) process
  //a more efficient way to model AR(1) for eta, mu1

eta[2:TT] ~ normal(rho[1] * eta[1:(TT-1)], sigma_ar[1]);
mu1[2:TT] ~ normal(rho[2] * mu1[1:(TT-1)], sigma_ar[2]);

lambda = inv_logit(eta); // if define lambda is a matrix[tt,2]

mu2 = lambda ./ (1-lambda) .* (-mu1);

target += multi_normal_cholesky_lpdf(beta_daily | rep_vector(0, num_basis+1) ,L_hrofday);
  
  
y_overall_mean = B_overall * beta_daily;//u_t in the simulated data
mean1 = y_overall_mean+mu1;
mean2 = y_overall_mean+mu2;
  
//likelihood  
if(run_estimation ==1){
  for(t in 1:TT){
    for(i in 1:N){
  
      
      target +=log_mix(lambda[t], normal_lpdf(log_obs[i,t]| mean1[t], sigma[1]),
                                          normal_lpdf(log_obs[i,t]| mean2[t], sigma[2]));
      
     
    }
  }
}
  
}

generated quantities{
  matrix[TT,2] lambda_gen;
  
  vector[TT] mu2_gen;
  
  vector[TT] y_overall_mean_gen;//u_t in the equation
  
  vector[TT] mean1_gen;
  
  vector[TT] mean2_gen;
  
  vector[TT] log_lik_vec;//loglikelihood per time period
  
  matrix[N,TT] z;
  
  matrix[N,TT] log_obs_sim;
  
  y_overall_mean_gen = (B_overall * beta_daily);
  
  lambda_gen[,1]= inv_logit(eta);
  lambda_gen[,2]= 1-lambda_gen[,1];
  mu2_gen = lambda_gen[,1] ./lambda_gen[,2] .* (-mu1);
    
  mean1_gen = y_overall_mean_gen+mu1;
  mean2_gen = y_overall_mean_gen+mu2_gen;
  
   
   for(t in 1:TT){
      
    for(i in 1:N){
     
    z[i,t] = categorical_rng(to_vector(lambda_gen[t,]));
    
    if(z[i,t]==1){
      log_obs_sim[i,t] = normal_rng(mean1_gen[t],sigma[1]) ;
    }
    else{
      log_obs_sim[i,t] =  normal_rng(mean2_gen[t] ,sigma[2]) ;
    }
  }
   }
    

for(t in 1:TT){
     
log_lik_vec[t] =log_mix(lambda_gen[t,1], normal_lpdf(log_obs[,t]| mean1_gen[t], sigma[1]),
                                          normal_lpdf(log_obs[,t]| mean2_gen[t], sigma[2]));
      
     
    }

  

}