Mixture model with explained variable having non-identifiability issue

Hi @Bob_Carpenter,

Thank you for your insights on this problem. I would like to clarify my issue and ensure that I understand each of the points you mentioned.

I want to use the covariates (de, rei in this case) to inform the group means. I attempted this by using ordered intercepts and calculate group means in likelihood section. However, this approach led to the non-identifiability problem I mentioned earlier. My original code below:

parameter{
  ordered[K] alpha;
...
}
model{
...
// likelihood
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                              lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
                            log(theta[2]) +
                              lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
  }
}

Regarding your suggestion to use a more general regression.

tried a similar approach by using covariates to imply group means. However, I encountered several issues and would like to clarify a few points with you.

  1. I tried the softmax function you suggested. I understand that it transforms a vector of values to sum to 1. However, the X[n]*beta expression returns a real number, resulting in an error saying ‘Ill-typed arguments supplied to function softmax.’ Could you clarify what you meant?

I have seen others use softmax after declaring a theta vector. Is there a difference between directly declaring a theta simplex and using softmax as in my code?

  1. I tried using the covariates matrix (including a column of 1 to represent the intercept), but I don’t understand how to use your trick of appending a row of 0 in the coefficients, beta. Does N refer to the number of observations or the number of covariates in the model?"

The stan code I have at the moment is below. I try the data matrix and beta (as intercept plus number of covariates). However, I am definitely mis-specify something in the code. Please do not hesitate to point out the errors. Thank you very much.

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  int<lower=1> V;  // intercept plus number of covariates
  vector[N] LW;   //measured leaf width
  matrix[N, V] CO; // variates matrix
}

parameters {
  array[K] vector[V-1] beta_prefix; // Coefficients for intercepts and covariates
  simplex[K] theta; // mixture proportion
  
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
}
transformed parameters{
//  simplex[K] theta; // mixture proportion
//  for (n in 1:N){
//    for(i in 1:K){
//    theta = softmax(CO[n, ] * beta[i]);
//    }
//  }
  array[K] vector[V] beta;
  for(i in 1:K){
    beta[i] = append_row(beta_prefix[i], 0); // append a 0 to fix identifiability issue
  }
}

model {
  // priors
  for(i in 1:K){
  target += normal_lpdf(beta[i] | 0, 1);
  }
  
  target += lognormal_lpdf(sigma1 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);

  
  // likelihood
  
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                            lognormal_lpdf(LW[n] | CO * beta[1], sigma1),
                          log(theta[2]) +
                            lognormal_lpdf(LW[n] | CO * beta[2], sigma2));
  }
}

Finally, subtracting a lcdf from a lpdf is a suggestion to deal with non-identifiability from chapter 19.1.1 in this book. I am not using it anymore because I try to solve this with ordered intercept method as applied in brms and in this blog post.

Chieh