Mixture model with explained variable having non-identifiability issue

ChiehTony · May 29, 2024, 3:43am

Dear Stan community,

I am experiencing issues with fitting a mixture regression model with explanatory variables. I believe the problem might be non-identifiability, which becomes more pronounced as the model complexity increases. I’ve tried addressing this by using non-exchangeable priors and pulsive priors, but neither approach has yielded satisfactory results.

There doesn’t seem to be much discussion on mixture models with explanatory variables, so I am posting my problem and a detailed description here. I am relatively new to Stan and some of the mathematical terminology, so please feel free to correct any mistakes. Any suggestions and insights would be greatly appreciated!

Back ground information:
I am trying to fit a mixture regression model on plant functional trait, such as leaf width, biomass, etc. The reason of using mixture regression is because some traits clearly has two or three forms (See histogram below).

We suspect that each form reacts differently to environmental covariates. Therefore, I am trying to fit a mixture regression model composed of two lognormal distributions, each with its own set of coefficients for the environmental covariates.
Here is the data file LW_mix_2var.RData (1.4 KB) his file contains a response variable (LW) and two explanatory variables (de, rei). This data list can be directly fed into the Stan code provided below.

Approaches:
I start from the easiest setting, a model without any explained variable and use order mu to solve the labeling Degeneracy problem. This model provide nice estimation of mu and the clustering proportion is reasonable (around 3:7).

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  real LW[N];   //measured leaf width
}

parameters {
  simplex[K] theta; // mixture proportion
  vector<lower=0>[K] sigma;  // dispersion parameter
  ordered[K] mu;
  }

model {
  // priors
  
  vector[K] log_theta = log(theta);  // cache log calculation
  sigma ~ lognormal(0, 2);
  mu ~ normal(0, 10);

  for (n in 1:N) {
    vector[K] lps = log_theta;
    for (k in 1:K)
    lps[k] += normal_lpdf(LW[n] | mu[k], sigma[k]);
    target += log_sum_exp(lps);
  }
}

However, I found that the ordered mu can’t be easily transform to model with explained variables, at least not with my current skill level. I followed chapter 19 in this book and form the model with 2 explain variables. Here, I use non-exchangeable prior for intercept to break labeling degeneracy:

  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(gamma | 0, 1) - 
    normal_lcdf(alpha | 0, 1);

Full stan code is here

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  vector[N] LW;   //measured leaf width
  vector[N] de;
  vector[N] rei;
}

parameters {
  real bde1;
  real bde2;
  real brei1;
  real brei2;
  
  real alpha;
  real<upper = alpha> gamma;
  
  simplex[K] theta; // mixture proportion
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
}

model {
  // priors
  
  target += normal_lpdf(bde1 | 0, 1);
  target += normal_lpdf(bde2 | 0, 1);
  target += normal_lpdf(brei1 | 0, 1);
  target += normal_lpdf(brei2 | 0, 1);
  
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(gamma | 0, 1) - 
    normal_lcdf(alpha | 0, 1);
  
  target += lognormal_lpdf(sigma1 | 0, 1) - 
    normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 1) -
    normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);
  
  // likelihood
  
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                              lognormal_lpdf(LW[n] | alpha + de[n] * bde1 + rei[n] * brei1, sigma1),
                            log(theta[2]) +
                              lognormal_lpdf(LW[n] | gamma + de[n] * bde2 + rei[n] * brei2, sigma2));
  }
}

This model produced similar clustering proportions, and estimation as previous model (only contain respond variable). Unfortunately, the labeling problem was not resolved by the prior, leading to poor chain behavior, terrible Rhat values, and low effective sample sizes.

                mean         sd        5.5%      94.5%     n_eff     Rhat4
bde1     -0.50078036 0.36323954 -0.89303802 -0.0910946  2.024952  9.140970
bde2      0.56871652 0.37493290  0.07015436  1.1625266  2.025801  8.910180
brei1    -0.04276832 0.08592037 -0.16398533  0.1223121  2.698703  1.941315
brei2     0.02580702 0.11141669 -0.08710628  0.2280988  2.162180  3.529945
alpha     0.56568359 0.40147285 -0.09748634  1.0737749  2.013091 13.228487
gamma    -0.35791068 0.14097606 -0.61514816 -0.2268431  2.107936  4.248952
theta[1]  0.37051511 0.19125896  0.20693312  0.7235836  2.043524  6.719915
theta[2]  0.62948489 0.19125896  0.27641641  0.7930669  2.043524  6.719915
sigma1    0.29554891 0.25886354  0.12517962  0.7719165  2.016440 11.399809

I followed the guidance in this Mixture Models post and add the repulsive prior. This additional step did help the chain behavior but it significantly affected the estimation of clustering proportion and coefficients.
Full stan code with repulsive model

functions {
  real potential(real x, real y) {
    return (1 - exp(-squared_distance(x, y)));
  }
}

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  vector[N] LW;   //measured leaf width
  vector[N] de;
  vector[N] rei;
}

parameters {
  real bde1;
  real bde2;
  real brei1;
  real brei2;
  
  real alpha;
  real<upper = alpha> gamma;
  
  simplex[K] theta; // mixture proportion
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
  
  positive_ordered[K] p;
}

model {
  // priors
  target += normal_lpdf(p | 0, 10);
  
  target += normal_lpdf(bde1 | 0, 1);
  target += normal_lpdf(bde2 | 0, 1);
  target += normal_lpdf(brei1 | 0, 1);
  target += normal_lpdf(brei2 | 0, 1);
  
  //target += normal_lpdf(alpha | 0, 1);
  //target += normal_lpdf(gamma | 0, 1) - 
  // normal_lcdf(alpha | 0, 1);
  target += normal_lpdf(alpha | 0, 1);
  target += normal_lpdf(gamma | 0, 1);
  
  target += lognormal_lpdf(sigma1 | 0, 1) - 
   normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 1) -
   normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);
  
  // likelihood
  vector[K] ps;
  for(n in 1:N){
    ps[1] = log(theta[1]) + lognormal_lpdf(LW[n] | alpha + de[n] * bde1 + rei[n] * brei1, sigma1);
    ps[2] = log(theta[2]) + lognormal_lpdf(LW[n] | gamma + de[n] * bde2 + rei[n] * brei2, sigma2);
  }
   target += log_sum_exp(ps + log(potential(p[1], p[2])));
}

and the result of this version

                  mean        sd        5.5%      94.5%    n_eff     Rhat4
bde1      0.0896689435 0.9367895 -1.40200595  1.6043320 1884.749 0.9986699
bde2      0.0007346021 0.9309193 -1.49240790  1.4591302 2025.558 0.9993420
brei1     0.1362264287 0.9445447 -1.40160450  1.6530227 1519.362 1.0001007
brei2    -0.0668727475 0.9648211 -1.63400785  1.4601342 1937.525 0.9988794
alpha     0.4534608600 0.7597606 -0.68611967  1.7174675 1195.345 1.0008294
gamma    -0.5860839531 0.7354700 -1.78427745  0.5600027 1990.181 0.9996545
theta[1]  0.4896182546 0.2815705  0.05829443  0.9371015 1660.950 0.9986631
theta[2]  0.5103817460 0.2815705  0.06289817  0.9417057 1660.950 0.9986631
sigma1    1.4061470344 1.8500527  0.18286187  4.2627391 1488.600 1.0031409
sigma2    1.3035895523 1.4137183  0.19231260  3.6827207 1328.768 1.0020604
p[1]      4.6711393356 3.7555106  0.36324149 11.7720050 2700.857 0.9982377
p[2]     11.9647803050 5.8779740  3.95653555 21.9628990 2526.463 0.9991870

Thank you for your time and help! Any advice on improving my approach or resolving the non-identifiability issue would be greatly appreciated.

Best regards,
Chieh

desislava · June 9, 2024, 6:44am

Hello @ChiehTony. Your example is interesting as the two mixture components are clearly identifiable “by eye”. So I took a look at the explanatory variables, de and rei. It seems they may not contain much information to discriminate between the locations of the components. I’ve plotted de and rei against the outcome LW to illustrate.

Aside: There is additional structure to the data as there are only 29 unique combinations of (de, rei) values in a dataset of 364 observations.

So I suspect the explanatory variables lead to the poor sampling in approaches #2 and #3. The priors may suggests that one component is centered at 0.5 and the other at 3 but the data indicates we have little idea where the components are located based on de and rei.

You can test your code first by adding a fake predictor X that discriminates between the clusters. Once you have that working, replace X with de and rei.

ChiehTony · July 26, 2024, 1:27am

Hello @desislava,
Thank you so much for your reply. Sorry for the late response, I was in a conference and just made a some tests according to your suggestions. Please correct me if I mis understood any part.

Here are the three things I tried:

Use the site average instead of each individual measurement to make reduce the problem with too few unique combination of environmental factors.
Create dummy variables that have 2 clusters according to the visually identified growth form (abb) to discriminate the Leaf width

for(i in 1:nrow(HU_GLM)){
  if(HU_GLM$abb[i] == "HUN"){
    HU_GLM$Dummy[i] = rnorm(1, mean = 0.5, sd = 0.2)
    HU_GLM$Dummy2[i] = rnorm(1, mean = 0.7, sd = 0.2)
  } else{
    HU_GLM$Dummy[i] = rnorm(1, mean = -0.5, sd = 0.2)
    HU_GLM$Dummy2[i] = rnorm(1, mean = -0.7, sd = 0.2)
  }
}

Re-run the model with 2 dummy, 1 real factor + 1 dummy, or 2 real factors with the stan code below. I just replace the de or rei according with the dummy variables created above and this model use ordered intercept. Model runs with cmdstanr with 30000 warm up and 2000 samples.

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  vector[N] LW;   //measured leaf width
  vector[N] de;
  vector[N] rei;
}

parameters {
  real bde1;
  real bde2;
  real brei1;
  real brei2;
  
  //real alpha;
  //real<upper = alpha> gamma;
  ordered[K] alpha;
  
  simplex[K] theta; // mixture proportion
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
}

model {
  // priors
  
  target += normal_lpdf(bde1 | 0, 1);
  target += normal_lpdf(bde2 | 0, 1);
  target += normal_lpdf(brei1 | 0, 1);
  target += normal_lpdf(brei2 | 0, 1);
  
  target += normal_lpdf(alpha | 0, 1);
  //target += normal_lpdf(gamma | 0, 1) - 
  //  normal_lcdf(alpha | 0, 1);
  
  target += lognormal_lpdf(sigma1 | 0, 1) - 
    normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 1) -
    normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);
  
  // likelihood
  
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                              lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
                            log(theta[2]) +
                              lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
  }
}

The model with two dummy variables (de and rei) has a funny behavior. The MCMC trace plots have a lots of pointed peak. The mixture proportion (theta) are close to 0.5 and 0.5.

          mean   sd  5.5% 94.5% rhat ess_bulk
bde1     -0.62 0.55 -1.37  0.26 1.00  1085.04
bde2     -0.51 0.42 -1.23  0.00 1.00  1628.43
brei1    -0.74 0.49 -1.30  0.08 1.00  1377.25
brei2    -0.54 0.36 -1.02  0.04 1.01  1140.95
alpha[1] -0.03 0.34 -0.73  0.24 1.00   471.93
alpha[2]  0.37 0.18  0.19  0.55 1.01   471.84
theta[1]  0.47 0.23  0.03  0.87 1.01   328.66
theta[2]  0.53 0.23  0.13  0.97 1.01   328.66
sigma1    0.45 0.54  0.20  0.74 1.00  1312.14
sigma2    0.28 0.55  0.10  0.46 1.01   465.60

The model with 1 real factor (rei) and 1 dummy (de). The trace plot show the chain doesn’t converge at all parameters. The samples seems to be harder for the

          mean   sd  5.5% 94.5% rhat ess_bulk
bde1     -0.63 0.91 -1.65  1.04 1.01   383.67
bde2     -1.10 0.56 -1.46  0.16 1.03   186.25
brei1    -0.16 0.68 -1.23  1.02 1.02  2648.50
brei2    -0.11 0.38 -0.36  0.40 1.04  1963.85
alpha[1] -0.46 0.55 -1.40  0.22 1.04   104.94
alpha[2]  0.37 0.30  0.14  0.91 1.04   124.00
theta[1]  0.31 0.33  0.01  0.99 1.06    68.26
theta[2]  0.69 0.33  0.01  0.99 1.06    68.26
sigma1    0.72 1.03  0.20  1.92 1.02  1373.89
sigma2    0.52 0.78  0.18  1.38 1.04   226.75

Surprisingly, the model with 2 real factors (rei and de) works really well. The chains mix well and the mixture proportion is estimated correctly. However, if I change these 2 variables to other environmental factors I collected, the model behavior complete back to worse condition as above.

          mean   sd  5.5% 94.5% rhat ess_bulk
bde1      0.01 0.08 -0.11  0.14    1  6700.50
bde2     -0.08 0.04 -0.14 -0.02    1  6704.45
brei1    -0.10 0.07 -0.21  0.01    1  7358.58
brei2    -0.10 0.05 -0.18 -0.03    1  6745.04
alpha[1] -0.53 0.08 -0.65 -0.41    1  4572.48
alpha[2]  1.06 0.04  1.00  1.12    1  8651.24
theta[1]  0.63 0.07  0.52  0.74    1  8214.64
theta[2]  0.37 0.07  0.26  0.48    1  8214.64
sigma1    0.40 0.06  0.32  0.51    1  6006.12
sigma2    0.15 0.03  0.11  0.21    1  4855.24

I couldn’t understand all the strange difference between models. Especially, I don’t understand why dummy variables provide worse estimation than others. I suspect this might be due to the way I generated dummy variables makes them highly correlated and caused a back door effect. Related to other environmental variables I tired, it could be the same resolution problem causing model to fail (Ave_temp and Air in the
dataforforum.RData (1.9 KB)
). Although I felt all the variables are equally bad so I am not sure why combination of depth and rei works.
forum

Currently, my work flow is use a mixture model with no explained variables to confirm the mixture proportion. After this, I ran separate generalized linear model on each growth form to see if environmental factors has different effect on them. I think this is process is not optimal. Any suggestion on this is super appreciated.

Thank you for reading this lengthy try and error process.

Bob_Carpenter · July 30, 2024, 7:08pm

Do you mean when there are covariates informing the mixture component? That’s a not uncommon situation and we see it all the time. The only thing you have to do is replace the intercept only regression

simplex[K] theta;
vector[K] log_theta = log(theta);
...

with a more general regression that now has to be nested within the loop

for (n in 1:N) {
  simplex[K] theta = softmax(X[n] * beta);
...

where `X` is the data matrix and `beta` the regression coefficients.  This assumes you've rolled the intercept into a column of 1 values in `X`.  This can run into its own identifiability issue in that `X[n] * beta` formulated this way is only identified up to an additive constant.  You can fix that by fixing one of the `beta` entries to be zero.  

```stan
parameters {
  vector[N - 1] beta_prefix;
  ...
transformed parameters {
  vector[N] beta = append_row(beta_prefix, 0);
...

Can you explain why you’re subtracting an lcdf from an lpdf?

ChiehTony · July 31, 2024, 5:05am

Hi @Bob_Carpenter,

Thank you for your insights on this problem. I would like to clarify my issue and ensure that I understand each of the points you mentioned.

I want to use the covariates (de, rei in this case) to inform the group means. I attempted this by using ordered intercepts and calculate group means in likelihood section. However, this approach led to the non-identifiability problem I mentioned earlier. My original code below:

parameter{
  ordered[K] alpha;
...
}
model{
...
// likelihood
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                              lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
                            log(theta[2]) +
                              lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
  }
}

Regarding your suggestion to use a more general regression.

tried a similar approach by using covariates to imply group means. However, I encountered several issues and would like to clarify a few points with you.

I tried the softmax function you suggested. I understand that it transforms a vector of values to sum to 1. However, the X[n]*beta expression returns a real number, resulting in an error saying ‘Ill-typed arguments supplied to function softmax.’ Could you clarify what you meant?

I have seen others use softmax after declaring a theta vector. Is there a difference between directly declaring a theta simplex and using softmax as in my code?

I tried using the covariates matrix (including a column of 1 to represent the intercept), but I don’t understand how to use your trick of appending a row of 0 in the coefficients, beta. Does N refer to the number of observations or the number of covariates in the model?"

The stan code I have at the moment is below. I try the data matrix and beta (as intercept plus number of covariates). However, I am definitely mis-specify something in the code. Please do not hesitate to point out the errors. Thank you very much.

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  int<lower=1> V;  // intercept plus number of covariates
  vector[N] LW;   //measured leaf width
  matrix[N, V] CO; // variates matrix
}

parameters {
  array[K] vector[V-1] beta_prefix; // Coefficients for intercepts and covariates
  simplex[K] theta; // mixture proportion
  
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
}
transformed parameters{
//  simplex[K] theta; // mixture proportion
//  for (n in 1:N){
//    for(i in 1:K){
//    theta = softmax(CO[n, ] * beta[i]);
//    }
//  }
  array[K] vector[V] beta;
  for(i in 1:K){
    beta[i] = append_row(beta_prefix[i], 0); // append a 0 to fix identifiability issue
  }
}

model {
  // priors
  for(i in 1:K){
  target += normal_lpdf(beta[i] | 0, 1);
  }
  
  target += lognormal_lpdf(sigma1 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);

  
  // likelihood
  
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                            lognormal_lpdf(LW[n] | CO * beta[1], sigma1),
                          log(theta[2]) +
                            lognormal_lpdf(LW[n] | CO * beta[2], sigma2));
  }
}

Finally, subtracting a lcdf from a lpdf is a suggestion to deal with non-identifiability from chapter 19.1.1 in this book. I am not using it anymore because I try to solve this with ordered intercept method as applied in brms and in this blog post.

Chieh

Bob_Carpenter · July 31, 2024, 3:07pm

[edit: finished after prematurely submitting first version]

Sorry about that—I thought you were using the covariates to inform the mixture component (theta below). Now that I see your original model and the non-identifiability problem:

parameter{
  ordered[K] alpha;
...
}
model{
...
// likelihood
  for(n in 1:N){
    target += log_sum_exp(log(theta[1]) + 
                              lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
                            log(theta[2]) +
                              lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
  }
}

This isn’t using the covariates to inform the mixture component, it’s to define a regression within each mixture component. You’re also tying the parameters bde1 and brei1 across the two mixture components, which can be tricky because then the only thing that varies is the intercept term. Usually for identifiability here, the parameters bde1 and brei1 should be defined so that one of the values is pinned to 0 (the classical approach, which induces an asymmetric prior on the effects) or so that they sum to zero (which allows a symmetric prior, but can be harder to understand).

Here’s more efficient code for the same thing:

parameter{
  ordered[K] alpha;
  real<lower=0, upper=1> theta;
...
}
model{
...
// likelihood
  vector[N] fixed = de * bde1 + rei * brei1;
  vector[N] mu1 = alpha[1] + fixed;
  vector[N] mu2 = alpha[2] + fixed;
  for(n in 1:N){
    target += log_mix(theta, 
                      lognormal_lpdf(LW[n] | mu[1] , sigma1),
                      lognormal_lpdf(LW[n] | mu[2] , sigma2));
  }
}

This is assuming K=2 so that theta is a simplex[2]—if it’s not, then there’s an issue with the model. The way I’ve coded it is more efficient because it uses vector operations rather than a sequence of scalar operations. It’d probably be even more efficient to precompute log(theta) and log1m(theta) and use log_sum_exp, but I find the log_mix approach much more readable and there are savings in autodiff calculating the mixture as a single operation.

ChiehTony · August 1, 2024, 3:45am

Hi @Bob_Carpenter,
Here’s an improved version of your reply to make it clearer and more precise:

Thank you for your suggestions. Here is an update on my approach and the challenges I’m facing:

I am trying to define a regression for each mixture component in a model where the leaf width (or other plant traits) of a population exhibits a dimorphic pattern (inferred by the two mixture components). Each group reacts differently to environmental covariates, which I attempt to model by declaring separate coefficients for each covariate in each mixture component (bde1, bde2, brei1, brei2). The differences in the reactions of each group are expected to be reflected by the differences between bde1 and bde2 or brei1 and brei2.

If my model does not correctly reflect the concept I am trying to test, please point out any discrepancies.

I have adopted your improved code and ran it with cmdstanr:

data {
  int<lower=1> K;  //number of mixture component (clusters) 
  int<lower=1> N;  //      number of observations
  vector[N] LW;   //measured leaf width
  vector[N] de;   // covariate1 to inform regression in each mixture component
  vector[N] rei;  // covariate2 to inform regression in each mixture component
}

parameters {
  positive_ordered[K] alpha;  // ordered intercept to identify
  real<lower = 0, upper = 1> theta; // mixture proportion
  array[K] real bde; // one coefficient for each component
  array[K] real brei; // one coefficient for each component
  
  real<lower=0> sigma1;  // dispersion parameter
  real<lower=0> sigma2;  // dispersion parameter
}
transformed parameters{
}

model {
  // priors
  for(i in 1:K){
  target += normal_lpdf(bde[i] | 0, 1);
  target += normal_lpdf(brei[i] | 0, 1);
  target += normal_lpdf(alpha[i] | 0, 2);
  }
  
  
  target += lognormal_lpdf(sigma1 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  target += lognormal_lpdf(sigma2 | 0, 2)- 
    normal_lccdf(0 | 0, 1);
  
  target += beta_lpdf(theta | 1, 1);

  
  // likelihood
  array[K] vector[N] fixed;
  for(k in 1:K){
    fixed[k] = bde[k] * de + brei[k] * rei;
  }
  vector[N] mu1 = alpha[1] + fixed[1];
  vector[N] mu2 = alpha[2] + fixed[2];

  
  for(n in 1:N){
    target += log_mix(theta,
                      lognormal_lpdf(LW[n] | mu1[n], sigma1),
                      lognormal_lpdf(LW[n] | mu2[n], sigma2));
  }
}

However, the model still encounters non-identifiability issues when tested with different environmental covariates. Sometimes, the model converges but assigns theta to be 0.5, which is not informative at all.

Could you please explain more about the setting of one coefficient to zero? I think this could be really helpful! Does this mean using a different set of covariates to inform the regression in each mixture component (see stan chunk below)? Any further insights or suggestions would be greatly appreciated. Thank you very much!

parameters{
  real bde;
  real brei
  ...
}
model{
  ...
  // likelihood
  array[K] vector[N] fixed;
  fixed[1] = bde * de 
  fixed[2] = brei * rei;
  
  vector[N] mu1 = alpha[1] + fixed[1];
  vector[N] mu2 = alpha[2] + fixed[2];
  
  for(n in 1:N){
    target += log_mix(theta,
                      lognormal_lpdf(LW[n] | mu1[n], sigma1),
                      lognormal_lpdf(LW[n] | mu2[n], sigma2));
  }
}

Chieh

Bob_Carpenter · August 1, 2024, 8:14pm

For a group of K varying effects, you can pin one of them to zero as follows.

parameters {
  vector[K - 1] alpha_non_zero;
}
transformed parameters {
  vector[K] alpha = append_row(0, alpha_non_zero);
}

If you replace 0 with -sum(alpha_non_zero) you get the (hard) sum-to-zero constraint. There’s a discussion in the User’s Guide around what the latter does to the variance and also some alternatives with better geometry. See:

https://mc-stan.org/docs/stan-users-guide/regression.html#parameterizing-centered-vectors

Topic		Replies	Views
Bayesian mixture model with multiple covariates Modeling	1	581	June 28, 2020
Mixture modeling RStan	1	491	July 2, 2021
Non-identifiability of mixing weights in a 2-component Gaussian model Modeling	5	546	July 20, 2018
Multimodality issues in regression model with mixture prior Modeling techniques , fitting-issues	4	1014	August 29, 2019
Mixture model with unknown k Modeling	7	1458	November 14, 2017

Mixture model with explained variable having non-identifiability issue

Related topics