Constraining the cluster probabilities in mixture models

Hi Everyone,

I have developed a hierarchical bivariate Beta mixture model with a Gaussian copula, and I need to constrain it so that the only possible values are between 0.01 and 0.99. However, it does not work. Can someone see why?

Model below.

functions {
  real gaussian_copula_density(real u_x, real u_y, real rho) {
    real constrained_u_x = fmin(fmax(u_x, 1e-10), 1 - 1e-10);
    real constrained_u_y = fmin(fmax(u_y, 1e-10), 1 - 1e-10);
    real constrained_rho = fmin(fmax(rho, -0.999), 0.999);

    real rho_sq = square(constrained_rho);
    real one_m_rho_sq = 1 - rho_sq;

    real z_x = inv_Phi(constrained_u_x);
    real z_y = inv_Phi(constrained_u_y);

    real log_term1 = -log(2 * pi()) - 0.5 * log1m(rho_sq);
    real log_term2 = -0.5 * (square(z_x) + square(z_y) - 2 * constrained_rho * z_x * z_y) / one_m_rho_sq;

    real copula_density = exp(log_term1 + log_term2);
    return fmax(copula_density, 1e-12);
  }
}

data {
  int<lower=1> K;
  int<lower=1> N;
  int<lower=1> I;
  array[I, N] real<lower=0> y;
  array[I, N] real<lower=0> x;
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  array[I] row_vector<lower=0.04>[K] theta;
  array[I, K] real<lower=-1, upper=1> rho;
}

transformed data {
  real w_min = 0.01;
  real w_max = 1 - (K-1) * w_min;
}

parameters {
  row_stochastic_matrix[I, K] v;
}

transformed parameters {
  row_stochastic_matrix[I, K] w = w_min + w_max * v;
}

model {
  for (i in 1:I) {
    v[i] ~ dirichlet(theta[i]);
  }

  for (i in 1:I) {
    for (n in 1:N) {
      row_vector[K] log_w = log(w[i]);
      row_vector[K] lps = log_w;

      for (k in 1:K) {
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

        lps[k] += beta_x + beta_y + log(copula_density);
      }
      target += log_sum_exp(lps);
    }
  }
}

generated quantities {
  array[I, N] real x_rep;
  array[I, N] real y_rep;
  array[I, N] real log_lik;
  array[I, N, K] real w_prob;

  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] ps;
      vector[K] lps;

      for (k in 1:K) {
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

        lps[k] = log(w[i, k]) + beta_x + beta_y + log(copula_density);
        ps[k] = w[i, k] * exp(beta_x + beta_y) * copula_density;
      }

      ps = ps / sum(ps);
      log_lik[i, n] = log_sum_exp(lps);

      for (k in 1:K) {
        w_prob[i, n, k] = ps[k];
      }

      int comp = categorical_rng(ps);
      x_rep[i, n] = beta_rng(alphax[i, comp], betax[i, comp]);
      y_rep[i, n] = beta_rng(alphay[i, comp], betay[i, comp]);
    }
  }
}

I didn’t have time to study your code carefully, and I’ve never tried to program a copula, but I think the problem is here:

 real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
 real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);      

beta_lpdf() requires that x[i,n] and y[i,n] be in [0,1], but you’ve defined x and y as

  array[I, N] real<lower=0> y;
  array[I, N] real<lower=0> x;

Both have a lower bound of 0, but no upper bound. Changing those to

  array[I, N] real<lower=0, upper=1> y;
  array[I, N] real<lower=0, upper=1> x;

might fix it. I emphasize “might” because I don’t know what impact those bounds might have on other parts of your model.

It really helps us if you can indicate how a Stan program fails. I verified that your transform will preserve a stochastic matrix, so that’s not the problem. Maybe you can fix the issue found by @kholsinger and report back on what problems remain.

You’ll find inv_Phi is not very efficient or very stable.
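If you need the normal quantile, a more stable route (just a sketch, and it assumes a recent Stan, 2.31 or later, for std_normal_log_qf) is to stay on the log scale all the way from the Beta CDF to the normal quantile, so the uniforms never have to be clamped near 0 or 1:

  // sketch: keep the CDF values on the log scale so nothing has to be clamped
  real log_u_x = beta_lcdf(x[i, n] | alphax[i, k], betax[i, k]);
  real log_u_y = beta_lcdf(y[i, n] | alphay[i, k], betay[i, k]);
  real z_x = std_normal_log_qf(log_u_x);  // same value as inv_Phi(exp(log_u_x)), but stable in the tails
  real z_y = std_normal_log_qf(log_u_y);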

fmax(copula_density, 1e-12) will break derivatives when copula_density < 1e-12, which can cause the sampler to devolve to a random walk.

I would remove all of the conditioning you are doing until you are sure that’s where the problem is, e.g.,

fmax(copula_density, 1e-12)
...
real constrained_u_x = fmin(fmax(u_x, 1e-10), 1 - 1e-10);
real constrained_u_y = fmin(fmax(u_y, 1e-10), 1 - 1e-10);
real constrained_rho = fmin(fmax(rho, -0.999), 0.999);

This kind of patching breaks derivatives in all but the second case. This one’s OK for derivatives,

row_stochastic_matrix[I, K] w = w_min + w_max * v;

but it can run afoul of the problem that the data’s consistent with values outside of your constrained range, so probability can bunch up at the boundaries and cause computational problems.
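Going back to the copula term: one way to avoid both the clamping and the fmax entirely is to return the copula contribution on the log scale and add it straight into lps[k]. A minimal sketch (note this is the standard Gaussian copula log density, i.e. the bivariate normal log density minus the two standard normal marginal log densities, which is not quite the quadratic form in your code, and it still assumes u_x and u_y stay strictly inside (0, 1)):

  real gaussian_copula_log_density(real u_x, real u_y, real rho) {
    // log copula density: bivariate normal log density at (z_x, z_y)
    // minus the two standard normal marginal log densities
    real z_x = inv_Phi(u_x);
    real z_y = inv_Phi(u_y);
    real rho_sq = square(rho);
    return -0.5 * log1m(rho_sq)
           - 0.5 * (rho_sq * (square(z_x) + square(z_y)) - 2 * rho * z_x * z_y)
             / (1 - rho_sq);
  }

and then in the model block:

  lps[k] += beta_x + beta_y + gaussian_copula_log_density(u_x, u_y, rho[i, k]);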

Hi both (@kholsinger, @Bob_Carpenter),

I have made the changes you have suggested, or at least I think so (code below).

I do not know what to substitute for inv_Phi.

The code fails to constrain the w to between 0.01 and 0.99.

range(summarydf09$w_prob)
[1] 3.85487e-198  1.00000e+00

I am not an expert in statistics, so this code is the best I can do in combination with LLMs and this forum’s older posts.

I cannot change the values I receive for alpha and beta.

functions {
  real gaussian_copula_density(real u_x, real u_y, real rho) {
    // Numerical stabilization parameters
    real epsilon = 1e-12;
    
    // Safely clamp u values to (epsilon, 1-epsilon) with gradient preservation
    real clamped_u_x = u_x * (1 - 2*epsilon) + epsilon;
    real clamped_u_y = u_y * (1 - 2*epsilon) + epsilon;
    
    // Stable inverse normal approximation with bounded outputs
    real z_x = inv_Phi(clamped_u_x);
    real z_y = inv_Phi(clamped_u_y);

    // Quadratic form computation with stabilized denominator
    real rho_sq = square(rho);
    real inv_one_m_rho_sq = 1 / (1 - rho_sq);
    
    return -log(2 * pi()) 
           - 0.5 * log1m(rho_sq)
           - 0.5 * inv_one_m_rho_sq * (z_x*z_x + z_y*z_y - 2*rho*z_x*z_y);
  }
}

data {
  int<lower=1> K;
  int<lower=1> N;
  int<lower=1> I;
  
  array[I, N] real<lower=0, upper=1> x;
  array[I, N] real<lower=0, upper=1> y;
  
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  
  array[I] row_vector<lower=0.04>[K] theta;
  array[I, K] real<lower=-0.99, upper=0.99> rho;
}

transformed data {
  real w_min = 0.01;
  real w_max = 1 - K * w_min;  // Corrected weight calculation
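  // note: with w = w_min + w_max * v below, each row of w sums to
  // K * w_min + w_max * 1 = K * w_min + (1 - K * w_min) = 1,
  // so the rescaled w is still a valid row-stochastic matrix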
}

parameters {
  row_stochastic_matrix[I, K] v;
}

transformed parameters {
  row_stochastic_matrix[I, K] w = w_min + w_max * v;
}

model {
  // Prior for mixture weights
  for (i in 1:I) {
    v[i] ~ dirichlet(theta[i]);
  }

  // Likelihood calculation
  for (i in 1:I) {
    for (n in 1:N) {
      row_vector[K] lps = log(w[i]);
      
      for (k in 1:K) {
        // Beta component log-probs
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        
        // Copula term
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] += beta_x + beta_y + log_copula;
      }
      target += log_sum_exp(lps);
    }
  }
}

generated quantities {
  array[I, N] real x_rep;
  array[I, N] real y_rep;
  array[I, N] real log_lik;
  array[I, N, K] real w_prob;

  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] lps;
      vector[K] ps;

      for (k in 1:K) {
        // Component probabilities
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] = log(w[i, k]) + beta_x + beta_y + log_copula;
      }
      
      // Normalize probabilities
      ps = softmax(lps);
      log_lik[i, n] = log_sum_exp(lps);
      w_prob[i, n] = to_array_1d(ps);
      
      // Generate posterior predictions
      int comp = categorical_rng(ps);
      x_rep[i, n] = beta_rng(alphax[i, comp], betax[i, comp]);
      y_rep[i, n] = beta_rng(alphay[i, comp], betay[i, comp]);
    }
  }
}


I didn’t understand why you were using this:

real w_max = 1 - K * w_min;

rather than

real w_max = 0.99;

If you use w_min = 0.01 and w_max = 0.99, in this:

transformed parameters {
  row_stochastic_matrix[I, K] w = w_min + w_max * v;
}

it will force all the entries in the matrix to fall between 0.01 and 0.99 because the values of v fall between 0 and 1. If you have an example where you do this and you think it’s not constraining properly, please share.

Usually Stan doesn’t need the kind of numerical stabilization you are trying to provide if priors are reasonable. The clamping you are doing on u_x and u_y is going to be challenging, as it uses a very small range near machine precision, meaning that there will be what’s known as “catastrophic cancellation” when performing arithmetic with other numbers. You lose all but four or five digits of precision in 1 - 2 * epsilon, for example, because 2 * epsilon is very close to 0, which means it doesn’t play nicely when added.

In the gaussian_copula_density you can drop the constant term. I think that should be -log(sqrt(2 * pi())), or equivalently, -0.5 * log(2 * pi()) (you can pull the multiplication by -0.5 out, but that’s not going to be a bottleneck).
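In other words, for sampling you can return just the rho-dependent terms, something like (a sketch using the variable names from your function, with the denominator written out explicitly):

  return -0.5 * log1m(rho_sq)
         - 0.5 * (z_x * z_x + z_y * z_y - 2 * rho * z_x * z_y) / (1 - rho_sq);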

P.S. This code looks reasonable, but if you’re not an expert in stats, a multivariate mixture combined with a copula is a rough place to start. Mixtures can be very difficult to fit because of the inherent non-identifiability of the mixture component IDs.


So, I have changed epsilon and replaced w_max with 0.96. However, the results are still not constrained. I tried with and without the copula, but it still does not constrain them properly. As a disclaimer, I am trying a novel approach to cognitive modelling where I use participants’ elicited beliefs as participant-level priors, so I cannot change the parameters of the distributions. I am only interested in replicating their w. Since, when I elicit them, they have to be within 0.01 and 0.96 (I fixed the number of clusters), I would like my model to do the same.

functions {
  real gaussian_copula_density(real u_x, real u_y, real rho) {
    // Numerical stabilization parameters
    real epsilon = 0.01;
    
    // Safely clamp u values to (epsilon, 1-epsilon) with gradient preservation
    real clamped_u_x = u_x * (1 - 2*epsilon) + epsilon;
    real clamped_u_y = u_y * (1 - 2*epsilon) + epsilon;
    
    // Stable inverse normal approximation with bounded outputs
    real z_x = inv_Phi(clamped_u_x);
    real z_y = inv_Phi(clamped_u_y);

    // Quadratic form computation with stabilized denominator
    real rho_sq = square(rho);
    real inv_one_m_rho_sq = 1 / (1 - rho_sq);
    
    return -0.5 * log(2 * pi())
           - 0.5 * log1m(rho_sq)
           - 0.5 * inv_one_m_rho_sq * (z_x*z_x + z_y*z_y - 2*rho*z_x*z_y);
  }
}

data {
  int<lower=1> K;
  int<lower=1> N;
  int<lower=1> I;
  
  array[I, N] real<lower=0, upper=1> x;
  array[I, N] real<lower=0, upper=1> y;
  
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  
  array[I] row_vector<lower=0.04>[K] theta;
  array[I, K] real<lower=-0.99, upper=0.99> rho;
}

transformed data {
  real w_min = 0.01;
  real w_max = 0.96;  // fixed upper bound for the weights
}

parameters {
  row_stochastic_matrix[I, K] v;
}

transformed parameters {
  row_stochastic_matrix[I, K] w = w_min + w_max * v;
}

model {
  // Prior for mixture weights
  for (i in 1:I) {
    v[i] ~ dirichlet(theta[i]);
  }

  // Likelihood calculation
  for (i in 1:I) {
    for (n in 1:N) {
      row_vector[K] lps = log(w[i]);
      
      for (k in 1:K) {
        // Beta component log-probs
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        
        // Copula term
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] += beta_x + beta_y + log_copula;
      }
      target += log_sum_exp(lps);
    }
  }
}

generated quantities {
  array[I, N] real x_rep;
  array[I, N] real y_rep;
  array[I, N] real log_lik;
  array[I, N, K] real w_prob;

  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] lps;
      vector[K] ps;

      for (k in 1:K) {
        // Component probabilities
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] = log(w[i, k]) + beta_x + beta_y + log_copula;
      }
      
      // Normalize probabilities
      ps = softmax(lps);
      log_lik[i, n] = log_sum_exp(lps);
      w_prob[i, n] = to_array_1d(ps);
      
      // Generate posterior predictions
      int comp = categorical_rng(ps);
      x_rep[i, n] = beta_rng(alphax[i, comp], betax[i, comp]);
      y_rep[i, n] = beta_rng(alphay[i, comp], betay[i, comp]);
    }
  }
}



I fixed it: I was constraining w, but I had to constrain ps. All fixed.

functions {
  vector bounded_softmax(vector logits, real ps_min, real ps_max) {
    int K = num_elements(logits);
    vector[K] ps = softmax(logits);
    
    // Linearly rescale to [ps_min, ps_max] while preserving relative probabilities
    real min_ps = min(ps);
    real max_ps = max(ps);
    if (max_ps - min_ps > 0) {
      ps = ps_min + (ps_max - ps_min) * (ps - min_ps) / (max_ps - min_ps);
    }
    ps = ps / sum(ps);  // Renormalize to sum to 1
    return ps;
  }

  real gaussian_copula_density(real u_x, real u_y, real rho) {
    // Numerical stabilization parameters
    real epsilon = 0.01;
    
    // Safely clamp u values to (epsilon, 1-epsilon) with gradient preservation
    real clamped_u_x = u_x * (1 - 2*epsilon) + epsilon;
    real clamped_u_y = u_y * (1 - 2*epsilon) + epsilon;
    
    // Stable inverse normal approximation with bounded outputs
    real z_x = inv_Phi(clamped_u_x);
    real z_y = inv_Phi(clamped_u_y);

    // Quadratic form computation with stabilized denominator
    real rho_sq = square(rho);
    real inv_one_m_rho_sq = 1 / (1 - rho_sq);
    
    return -0.5 * log(2 * pi())
           - 0.5 * log1m(rho_sq)
           - 0.5 * inv_one_m_rho_sq * (z_x*z_x + z_y*z_y - 2*rho*z_x*z_y);
  }
}

data {
  int<lower=1> K;
  int<lower=1> N;
  int<lower=1> I;
  
  array[I, N] real<lower=0, upper=1> x;
  array[I, N] real<lower=0, upper=1> y;
  
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  
  array[I] row_vector<lower=0.04>[K] theta;
  array[I, K] real<lower=-0.99, upper=0.99> rho;
}

transformed data {
  real ps_min = 0.01;
  real ps_max = 0.96;
}

parameters {
  row_stochastic_matrix[I, K] w;
}

model {
  // Prior for mixture weights
  for (i in 1:I) {
    w[i] ~ dirichlet(theta[i]);
  }

  // Likelihood calculation
  for (i in 1:I) {
    for (n in 1:N) {
      row_vector[K] lps = log(w[i]);
      
      for (k in 1:K) {
        // Beta component log-probs
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        
        // Copula term
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] += beta_x + beta_y + log_copula;
      }
      target += log_sum_exp(lps);
    }
  }
}

generated quantities {
  array[I, N] real x_rep;
  array[I, N] real y_rep;
  array[I, N] real log_lik;
  array[I, N, K] real w_prob;

  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] lps;
      vector[K] ps;

      for (k in 1:K) {
        // Component probabilities
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real log_copula = gaussian_copula_density(u_x, u_y , rho[i, k]);
        
        lps[k] = log(w[i, k]) + beta_x + beta_y + log_copula;
      }
      
      // Normalize probabilities
      ps = softmax(lps);
      log_lik[i, n] = log_sum_exp(lps);
      w_prob[i, n] = to_array_1d(bounded_softmax(lps, ps_min, ps_max));
      
      // Generate posterior predictions
      int comp = categorical_rng(ps);
      x_rep[i, n] = beta_rng(alphax[i, comp], betax[i, comp]);
      y_rep[i, n] = beta_rng(alphay[i, comp], betay[i, comp]);
    }
  }
}

Thanks for reporting back. I’m going to mark your reply as the solution. If that’s not right, please uncheck it and clarify what’s left to answer.