# Mixture model with explained variable having non-identifiability issue

Dear Stan community,

I am experiencing issues with fitting a mixture regression model with explanatory variables. I believe the problem might be non-identifiability, which becomes more pronounced as the model complexity increases. I’ve tried addressing this by using non-exchangeable priors and pulsive priors, but neither approach has yielded satisfactory results.

There doesn’t seem to be much discussion on mixture models with explanatory variables, so I am posting my problem and a detailed description here. I am relatively new to Stan and some of the mathematical terminology, so please feel free to correct any mistakes. Any suggestions and insights would be greatly appreciated!

Back ground information:
I am trying to fit a mixture regression model on plant functional trait, such as leaf width, biomass, etc. The reason of using mixture regression is because some traits clearly has two or three forms (See histogram below).

We suspect that each form reacts differently to environmental covariates. Therefore, I am trying to fit a mixture regression model composed of two lognormal distributions, each with its own set of coefficients for the environmental covariates.
Here is the data file LW_mix_2var.RData (1.4 KB) his file contains a response variable (LW) and two explanatory variables (de, rei). This data list can be directly fed into the Stan code provided below.

Approaches:
I start from the easiest setting, a model without any explained variable and use order mu to solve the labeling Degeneracy problem. This model provide nice estimation of mu and the clustering proportion is reasonable (around 3:7).

``````data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
real LW[N];   //measured leaf width
}

parameters {
simplex[K] theta; // mixture proportion
vector<lower=0>[K] sigma;  // dispersion parameter
ordered[K] mu;
}

model {
// priors

vector[K] log_theta = log(theta);  // cache log calculation
sigma ~ lognormal(0, 2);
mu ~ normal(0, 10);

for (n in 1:N) {
vector[K] lps = log_theta;
for (k in 1:K)
lps[k] += normal_lpdf(LW[n] | mu[k], sigma[k]);
target += log_sum_exp(lps);
}
}
``````

However, I found that the ordered mu can’t be easily transform to model with explained variables, at least not with my current skill level. I followed chapter 19 in this book and form the model with 2 explain variables. Here, I use non-exchangeable prior for intercept to break labeling degeneracy:

``````  target += normal_lpdf(alpha | 0, 1);
target += normal_lpdf(gamma | 0, 1) -
normal_lcdf(alpha | 0, 1);
``````

Full stan code is here

``````data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
vector[N] LW;   //measured leaf width
vector[N] de;
vector[N] rei;
}

parameters {
real bde1;
real bde2;
real brei1;
real brei2;

real alpha;
real<upper = alpha> gamma;

simplex[K] theta; // mixture proportion
real<lower=0> sigma1;  // dispersion parameter
real<lower=0> sigma2;  // dispersion parameter
}

model {
// priors

target += normal_lpdf(bde1 | 0, 1);
target += normal_lpdf(bde2 | 0, 1);
target += normal_lpdf(brei1 | 0, 1);
target += normal_lpdf(brei2 | 0, 1);

target += normal_lpdf(alpha | 0, 1);
target += normal_lpdf(gamma | 0, 1) -
normal_lcdf(alpha | 0, 1);

target += lognormal_lpdf(sigma1 | 0, 1) -
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 1) -
normal_lccdf(0 | 0, 1);

target += beta_lpdf(theta | 1, 1);

// likelihood

for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | alpha + de[n] * bde1 + rei[n] * brei1, sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | gamma + de[n] * bde2 + rei[n] * brei2, sigma2));
}
}

``````

This model produced similar clustering proportions, and estimation as previous model (only contain respond variable). Unfortunately, the labeling problem was not resolved by the prior, leading to poor chain behavior, terrible Rhat values, and low effective sample sizes.

``````                mean         sd        5.5%      94.5%     n_eff     Rhat4
bde1     -0.50078036 0.36323954 -0.89303802 -0.0910946  2.024952  9.140970
bde2      0.56871652 0.37493290  0.07015436  1.1625266  2.025801  8.910180
brei1    -0.04276832 0.08592037 -0.16398533  0.1223121  2.698703  1.941315
brei2     0.02580702 0.11141669 -0.08710628  0.2280988  2.162180  3.529945
alpha     0.56568359 0.40147285 -0.09748634  1.0737749  2.013091 13.228487
gamma    -0.35791068 0.14097606 -0.61514816 -0.2268431  2.107936  4.248952
theta[1]  0.37051511 0.19125896  0.20693312  0.7235836  2.043524  6.719915
theta[2]  0.62948489 0.19125896  0.27641641  0.7930669  2.043524  6.719915
sigma1    0.29554891 0.25886354  0.12517962  0.7719165  2.016440 11.399809
``````

I followed the guidance in this Mixture Models post and add the repulsive prior. This additional step did help the chain behavior but it significantly affected the estimation of clustering proportion and coefficients.
Full stan code with repulsive model

``````functions {
real potential(real x, real y) {
return (1 - exp(-squared_distance(x, y)));
}
}

data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
vector[N] LW;   //measured leaf width
vector[N] de;
vector[N] rei;
}

parameters {
real bde1;
real bde2;
real brei1;
real brei2;

real alpha;
real<upper = alpha> gamma;

simplex[K] theta; // mixture proportion
real<lower=0> sigma1;  // dispersion parameter
real<lower=0> sigma2;  // dispersion parameter

positive_ordered[K] p;
}

model {
// priors
target += normal_lpdf(p | 0, 10);

target += normal_lpdf(bde1 | 0, 1);
target += normal_lpdf(bde2 | 0, 1);
target += normal_lpdf(brei1 | 0, 1);
target += normal_lpdf(brei2 | 0, 1);

//target += normal_lpdf(alpha | 0, 1);
//target += normal_lpdf(gamma | 0, 1) -
// normal_lcdf(alpha | 0, 1);
target += normal_lpdf(alpha | 0, 1);
target += normal_lpdf(gamma | 0, 1);

target += lognormal_lpdf(sigma1 | 0, 1) -
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 1) -
normal_lccdf(0 | 0, 1);

target += beta_lpdf(theta | 1, 1);

// likelihood
vector[K] ps;
for(n in 1:N){
ps[1] = log(theta[1]) + lognormal_lpdf(LW[n] | alpha + de[n] * bde1 + rei[n] * brei1, sigma1);
ps[2] = log(theta[2]) + lognormal_lpdf(LW[n] | gamma + de[n] * bde2 + rei[n] * brei2, sigma2);
}
target += log_sum_exp(ps + log(potential(p[1], p[2])));
}

``````

and the result of this version

``````                  mean        sd        5.5%      94.5%    n_eff     Rhat4
bde1      0.0896689435 0.9367895 -1.40200595  1.6043320 1884.749 0.9986699
bde2      0.0007346021 0.9309193 -1.49240790  1.4591302 2025.558 0.9993420
brei1     0.1362264287 0.9445447 -1.40160450  1.6530227 1519.362 1.0001007
brei2    -0.0668727475 0.9648211 -1.63400785  1.4601342 1937.525 0.9988794
alpha     0.4534608600 0.7597606 -0.68611967  1.7174675 1195.345 1.0008294
gamma    -0.5860839531 0.7354700 -1.78427745  0.5600027 1990.181 0.9996545
theta[1]  0.4896182546 0.2815705  0.05829443  0.9371015 1660.950 0.9986631
theta[2]  0.5103817460 0.2815705  0.06289817  0.9417057 1660.950 0.9986631
sigma1    1.4061470344 1.8500527  0.18286187  4.2627391 1488.600 1.0031409
sigma2    1.3035895523 1.4137183  0.19231260  3.6827207 1328.768 1.0020604
p[1]      4.6711393356 3.7555106  0.36324149 11.7720050 2700.857 0.9982377
p[2]     11.9647803050 5.8779740  3.95653555 21.9628990 2526.463 0.9991870
``````

Thank you for your time and help! Any advice on improving my approach or resolving the non-identifiability issue would be greatly appreciated.

Best regards,
Chieh

Hello @ChiehTony. Your example is interesting as the two mixture components are clearly identifiable “by eye”. So I took a look at the explanatory variables, `de` and `rei`. It seems they may not contain much information to discriminate between the locations of the components. I’ve plotted `de` and `rei` against the outcome `LW` to illustrate.

Aside: There is additional structure to the data as there are only 29 unique combinations of `(de, rei)` values in a dataset of 364 observations.

So I suspect the explanatory variables lead to the poor sampling in approaches #2 and #3. The priors may suggests that one component is centered at 0.5 and the other at 3 but the data indicates we have little idea where the components are located based on `de` and `rei`.

You can test your code first by adding a fake predictor `X` that discriminates between the clusters. Once you have that working, replace `X` with `de` and `rei`.

1 Like

Hello @desislava,
Thank you so much for your reply. Sorry for the late response, I was in a conference and just made a some tests according to your suggestions. Please correct me if I mis understood any part.

Here are the three things I tried:

1. Use the site average instead of each individual measurement to make reduce the problem with too few unique combination of environmental factors.

2. Create dummy variables that have 2 clusters according to the visually identified growth form (abb) to discriminate the Leaf width

``````for(i in 1:nrow(HU_GLM)){
if(HU_GLM\$abb[i] == "HUN"){
HU_GLM\$Dummy[i] = rnorm(1, mean = 0.5, sd = 0.2)
HU_GLM\$Dummy2[i] = rnorm(1, mean = 0.7, sd = 0.2)
} else{
HU_GLM\$Dummy[i] = rnorm(1, mean = -0.5, sd = 0.2)
HU_GLM\$Dummy2[i] = rnorm(1, mean = -0.7, sd = 0.2)
}
}
``````
1. Re-run the model with 2 dummy, 1 real factor + 1 dummy, or 2 real factors with the stan code below. I just replace the de or rei according with the dummy variables created above and this model use ordered intercept. Model runs with cmdstanr with 30000 warm up and 2000 samples.
``````data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
vector[N] LW;   //measured leaf width
vector[N] de;
vector[N] rei;
}

parameters {
real bde1;
real bde2;
real brei1;
real brei2;

//real alpha;
//real<upper = alpha> gamma;
ordered[K] alpha;

simplex[K] theta; // mixture proportion
real<lower=0> sigma1;  // dispersion parameter
real<lower=0> sigma2;  // dispersion parameter
}

model {
// priors

target += normal_lpdf(bde1 | 0, 1);
target += normal_lpdf(bde2 | 0, 1);
target += normal_lpdf(brei1 | 0, 1);
target += normal_lpdf(brei2 | 0, 1);

target += normal_lpdf(alpha | 0, 1);
//target += normal_lpdf(gamma | 0, 1) -
//  normal_lcdf(alpha | 0, 1);

target += lognormal_lpdf(sigma1 | 0, 1) -
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 1) -
normal_lccdf(0 | 0, 1);

target += beta_lpdf(theta | 1, 1);

// likelihood

for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
}
}
``````

The model with two dummy variables (de and rei) has a funny behavior. The MCMC trace plots have a lots of pointed peak. The mixture proportion (theta) are close to 0.5 and 0.5.

``````          mean   sd  5.5% 94.5% rhat ess_bulk
bde1     -0.62 0.55 -1.37  0.26 1.00  1085.04
bde2     -0.51 0.42 -1.23  0.00 1.00  1628.43
brei1    -0.74 0.49 -1.30  0.08 1.00  1377.25
brei2    -0.54 0.36 -1.02  0.04 1.01  1140.95
alpha[1] -0.03 0.34 -0.73  0.24 1.00   471.93
alpha[2]  0.37 0.18  0.19  0.55 1.01   471.84
theta[1]  0.47 0.23  0.03  0.87 1.01   328.66
theta[2]  0.53 0.23  0.13  0.97 1.01   328.66
sigma1    0.45 0.54  0.20  0.74 1.00  1312.14
sigma2    0.28 0.55  0.10  0.46 1.01   465.60
``````

The model with 1 real factor (rei) and 1 dummy (de). The trace plot show the chain doesn’t converge at all parameters. The samples seems to be harder for the

``````          mean   sd  5.5% 94.5% rhat ess_bulk
bde1     -0.63 0.91 -1.65  1.04 1.01   383.67
bde2     -1.10 0.56 -1.46  0.16 1.03   186.25
brei1    -0.16 0.68 -1.23  1.02 1.02  2648.50
brei2    -0.11 0.38 -0.36  0.40 1.04  1963.85
alpha[1] -0.46 0.55 -1.40  0.22 1.04   104.94
alpha[2]  0.37 0.30  0.14  0.91 1.04   124.00
theta[1]  0.31 0.33  0.01  0.99 1.06    68.26
theta[2]  0.69 0.33  0.01  0.99 1.06    68.26
sigma1    0.72 1.03  0.20  1.92 1.02  1373.89
sigma2    0.52 0.78  0.18  1.38 1.04   226.75
``````

Surprisingly, the model with 2 real factors (rei and de) works really well. The chains mix well and the mixture proportion is estimated correctly. However, if I change these 2 variables to other environmental factors I collected, the model behavior complete back to worse condition as above.

``````          mean   sd  5.5% 94.5% rhat ess_bulk
bde1      0.01 0.08 -0.11  0.14    1  6700.50
bde2     -0.08 0.04 -0.14 -0.02    1  6704.45
brei1    -0.10 0.07 -0.21  0.01    1  7358.58
brei2    -0.10 0.05 -0.18 -0.03    1  6745.04
alpha[1] -0.53 0.08 -0.65 -0.41    1  4572.48
alpha[2]  1.06 0.04  1.00  1.12    1  8651.24
theta[1]  0.63 0.07  0.52  0.74    1  8214.64
theta[2]  0.37 0.07  0.26  0.48    1  8214.64
sigma1    0.40 0.06  0.32  0.51    1  6006.12
sigma2    0.15 0.03  0.11  0.21    1  4855.24
``````

I couldn’t understand all the strange difference between models. Especially, I don’t understand why dummy variables provide worse estimation than others. I suspect this might be due to the way I generated dummy variables makes them highly correlated and caused a back door effect. Related to other environmental variables I tired, it could be the same resolution problem causing model to fail (Ave_temp and Air in the
dataforforum.RData (1.9 KB)
). Although I felt all the variables are equally bad so I am not sure why combination of depth and rei works.

Currently, my work flow is use a mixture model with no explained variables to confirm the mixture proportion. After this, I ran separate generalized linear model on each growth form to see if environmental factors has different effect on them. I think this is process is not optimal. Any suggestion on this is super appreciated.

Thank you for reading this lengthy try and error process.

Do you mean when there are covariates informing the mixture component? That’s a not uncommon situation and we see it all the time. The only thing you have to do is replace the intercept only regression

``````simplex[K] theta;
vector[K] log_theta = log(theta);
...
``````

with a more general regression that now has to be nested within the loop

``````for (n in 1:N) {
simplex[K] theta = softmax(X[n] * beta);
...

where `X` is the data matrix and `beta` the regression coefficients.  This assumes you've rolled the intercept into a column of 1 values in `X`.  This can run into its own identifiability issue in that `X[n] * beta` formulated this way is only identified up to an additive constant.  You can fix that by fixing one of the `beta` entries to be zero.

```stan
parameters {
vector[N - 1] beta_prefix;
...
transformed parameters {
vector[N] beta = append_row(beta_prefix, 0);
...
``````

Can you explain why you’re subtracting an lcdf from an lpdf?

Thank you for your insights on this problem. I would like to clarify my issue and ensure that I understand each of the points you mentioned.

I want to use the covariates (de, rei in this case) to inform the group means. I attempted this by using ordered intercepts and calculate group means in likelihood section. However, this approach led to the non-identifiability problem I mentioned earlier. My original code below:

``````parameter{
ordered[K] alpha;
...
}
model{
...
// likelihood
for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
}
}

``````

Regarding your suggestion to use a more general regression.

tried a similar approach by using covariates to imply group means. However, I encountered several issues and would like to clarify a few points with you.

1. I tried the softmax function you suggested. I understand that it transforms a vector of values to sum to 1. However, the X[n]*beta expression returns a real number, resulting in an error saying ‘Ill-typed arguments supplied to function softmax.’ Could you clarify what you meant?

I have seen others use softmax after declaring a theta vector. Is there a difference between directly declaring a theta simplex and using softmax as in my code?

1. I tried using the covariates matrix (including a column of 1 to represent the intercept), but I don’t understand how to use your trick of appending a row of 0 in the coefficients, beta. Does N refer to the number of observations or the number of covariates in the model?"

The stan code I have at the moment is below. I try the data matrix and beta (as intercept plus number of covariates). However, I am definitely mis-specify something in the code. Please do not hesitate to point out the errors. Thank you very much.

``````data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
int<lower=1> V;  // intercept plus number of covariates
vector[N] LW;   //measured leaf width
matrix[N, V] CO; // variates matrix
}

parameters {
array[K] vector[V-1] beta_prefix; // Coefficients for intercepts and covariates
simplex[K] theta; // mixture proportion

real<lower=0> sigma1;  // dispersion parameter
real<lower=0> sigma2;  // dispersion parameter
}
transformed parameters{
//  simplex[K] theta; // mixture proportion
//  for (n in 1:N){
//    for(i in 1:K){
//    theta = softmax(CO[n, ] * beta[i]);
//    }
//  }
array[K] vector[V] beta;
for(i in 1:K){
beta[i] = append_row(beta_prefix[i], 0); // append a 0 to fix identifiability issue
}
}

model {
// priors
for(i in 1:K){
target += normal_lpdf(beta[i] | 0, 1);
}

target += lognormal_lpdf(sigma1 | 0, 2)-
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 2)-
normal_lccdf(0 | 0, 1);

target += beta_lpdf(theta | 1, 1);

// likelihood

for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | CO * beta[1], sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | CO * beta[2], sigma2));
}
}
``````

Finally, subtracting a lcdf from a lpdf is a suggestion to deal with non-identifiability from chapter 19.1.1 in this book. I am not using it anymore because I try to solve this with ordered intercept method as applied in brms and in this blog post.

Chieh

[edit: finished after prematurely submitting first version]

Sorry about that—I thought you were using the covariates to inform the mixture component (`theta` below). Now that I see your original model and the non-identifiability problem:

``````parameter{
ordered[K] alpha;
...
}
model{
...
// likelihood
for(n in 1:N){
target += log_sum_exp(log(theta[1]) +
lognormal_lpdf(LW[n] | alpha[1] + de[n] * bde1 + rei[n] * brei1 , sigma1),
log(theta[2]) +
lognormal_lpdf(LW[n] | alpha[2] + de[n] * bde2 + rei[n] * brei2 , sigma2));
}
}
``````

This isn’t using the covariates to inform the mixture component, it’s to define a regression within each mixture component. You’re also tying the parameters `bde1` and `brei1` across the two mixture components, which can be tricky because then the only thing that varies is the intercept term. Usually for identifiability here, the parameters `bde1` and `brei1` should be defined so that one of the values is pinned to 0 (the classical approach, which induces an asymmetric prior on the effects) or so that they sum to zero (which allows a symmetric prior, but can be harder to understand).

Here’s more efficient code for the same thing:

``````parameter{
ordered[K] alpha;
real<lower=0, upper=1> theta;
...
}
model{
...
// likelihood
vector[N] fixed = de * bde1 + rei * brei1;
vector[N] mu1 = alpha[1] + fixed;
vector[N] mu2 = alpha[2] + fixed;
for(n in 1:N){
target += log_mix(theta,
lognormal_lpdf(LW[n] | mu[1] , sigma1),
lognormal_lpdf(LW[n] | mu[2] , sigma2));
}
}
``````

This is assuming `K=2` so that `theta` is a `simplex[2]`—if it’s not, then there’s an issue with the model. The way I’ve coded it is more efficient because it uses vector operations rather than a sequence of scalar operations. It’d probably be even more efficient to precompute `log(theta)` and `log1m(theta)` and use `log_sum_exp`, but I find the `log_mix` approach much more readable and there are savings in autodiff calculating the mixture as a single operation.

Hi @Bob_Carpenter,
Here’s an improved version of your reply to make it clearer and more precise:

Thank you for your suggestions. Here is an update on my approach and the challenges I’m facing:

I am trying to define a regression for each mixture component in a model where the leaf width (or other plant traits) of a population exhibits a dimorphic pattern (inferred by the two mixture components). Each group reacts differently to environmental covariates, which I attempt to model by declaring separate coefficients for each covariate in each mixture component (`bde1`, `bde2`, `brei1`, `brei2`). The differences in the reactions of each group are expected to be reflected by the differences between `bde1` and `bde2` or `brei1` and `brei2`.

If my model does not correctly reflect the concept I am trying to test, please point out any discrepancies.

I have adopted your improved code and ran it with `cmdstanr`:

``````data {
int<lower=1> K;  //number of mixture component (clusters)
int<lower=1> N;  //      number of observations
vector[N] LW;   //measured leaf width
vector[N] de;   // covariate1 to inform regression in each mixture component
vector[N] rei;  // covariate2 to inform regression in each mixture component
}

parameters {
positive_ordered[K] alpha;  // ordered intercept to identify
real<lower = 0, upper = 1> theta; // mixture proportion
array[K] real bde; // one coefficient for each component
array[K] real brei; // one coefficient for each component

real<lower=0> sigma1;  // dispersion parameter
real<lower=0> sigma2;  // dispersion parameter
}
transformed parameters{
}

model {
// priors
for(i in 1:K){
target += normal_lpdf(bde[i] | 0, 1);
target += normal_lpdf(brei[i] | 0, 1);
target += normal_lpdf(alpha[i] | 0, 2);
}

target += lognormal_lpdf(sigma1 | 0, 2)-
normal_lccdf(0 | 0, 1);
target += lognormal_lpdf(sigma2 | 0, 2)-
normal_lccdf(0 | 0, 1);

target += beta_lpdf(theta | 1, 1);

// likelihood
array[K] vector[N] fixed;
for(k in 1:K){
fixed[k] = bde[k] * de + brei[k] * rei;
}
vector[N] mu1 = alpha[1] + fixed[1];
vector[N] mu2 = alpha[2] + fixed[2];

for(n in 1:N){
target += log_mix(theta,
lognormal_lpdf(LW[n] | mu1[n], sigma1),
lognormal_lpdf(LW[n] | mu2[n], sigma2));
}
}
``````

However, the model still encounters non-identifiability issues when tested with different environmental covariates. Sometimes, the model converges but assigns `theta` to be 0.5, which is not informative at all.

Could you please explain more about the setting of one coefficient to zero? I think this could be really helpful! Does this mean using a different set of covariates to inform the regression in each mixture component (see stan chunk below)? Any further insights or suggestions would be greatly appreciated. Thank you very much!

``````parameters{
real bde;
real brei
...
}
model{
...
// likelihood
array[K] vector[N] fixed;
fixed[1] = bde * de
fixed[2] = brei * rei;

vector[N] mu1 = alpha[1] + fixed[1];
vector[N] mu2 = alpha[2] + fixed[2];

for(n in 1:N){
target += log_mix(theta,
lognormal_lpdf(LW[n] | mu1[n], sigma1),
lognormal_lpdf(LW[n] | mu2[n], sigma2));
}
}
``````

Chieh

For a group of `K` varying effects, you can pin one of them to zero as follows.

``````parameters {
vector[K - 1] alpha_non_zero;
}
transformed parameters {
vector[K] alpha = append_row(0, alpha_non_zero);
}
``````

If you replace `0` with `-sum(alpha_non_zero)` you get the (hard) sum-to-zero constraint. There’s a discussion in the User’s Guide around what the latter does to the variance and also some alternatives with better geometry. See:

https://mc-stan.org/docs/stan-users-guide/regression.html#parameterizing-centered-vectors