Loglikelihood in bivariate beta mixture model with copula

Hi,

I am trying to implement a hierarchical beta mixture model with fixed alphas and betas and a prior on the component weights. Everything runs, but I am having a hard time understanding how to get the values to use in the loo package afterwards. Can someone help me?

functions{
  // Function to compute the Gaussian copula density
  real gaussian_copula_density(real u_x, real u_y, real rho) {
    real rho_sq = square(rho);
    real one_m_rho_sq = 1 - rho_sq;
    real z_x = inv_Phi(u_x); // inverse normal CDF
    real z_y = inv_Phi(u_y); // inverse normal CDF
    real log_term1 = -log(2 * pi()) - 0.5 * log1m(rho_sq);
    real log_term2 = -0.5 * (square(z_x) + square(z_y) - 2 * rho * z_x * z_y) / one_m_rho_sq;

    return exp(log_term1 + log_term2);
  }
}

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  int<lower=1> I; // number of individuals
  array[N] int<lower=1, upper=I> participant; // participant index for each observation
  array[N] real<lower=0, upper=1> y; // observations
  array[N] real<lower=0, upper=1> x; // observations
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  array[I] row_vector<lower=0.0001>[K] theta; // Dirichlet prior parameters for each participant
  array[I, K] real<lower=-1, upper=1> rho;
}
parameters {
  array[I] simplex[K] w;
}

//transformed parameters {
//  array[I] vector[K] log_w  = log(w);
//}

model {
  
  // priors
  for(i in 1:I){
    w[i] ~ dirichlet(theta[i]);
  }

  // individual likelihoods, as sum of component contributions
  for (n in 1:N) {
    int i = participant[n]; // retrieve participant index for nth data point
    vector[K] log_w = log(fmax(1e-10, w[i]));
    vector[K] lps = log_w; // initialize with log weights

    for (k in 1:K) {
      real beta_x = beta_lpdf(x[n] | alphax[i, k], betax[i, k]);
      real beta_y = beta_lpdf(y[n] | alphay[i, k], betay[i, k]);
      real u_x = beta_cdf(x[n] | alphax[i, k], betax[i, k]);
      real u_y = beta_cdf(y[n] | alphay[i, k], betay[i, k]);
      real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

      lps[k] += beta_x + beta_y + log(fmax(1e-10, copula_density));
    }
    target += log_sum_exp(lps);
  }
}
generated quantities {
  array[N] real x_rep;
  array[N] real y_rep;

  for (n in 1:N) {
    int i = participant[n]; // Retrieve participant index for nth data point
    vector[K] ps;

    for (k in 1:K) {
      real u_x = beta_cdf(x[n] | alphax[i, k], betax[i, k]);
      real u_y = beta_cdf(y[n] | alphay[i, k], betay[i, k]);
      real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

      ps[k] = w[i, k] * copula_density;
    }
    
    // Check for NaNs in ps
    for (k in 1:K) {
      if (is_nan(ps[k])) {
        print("ps[", k, "] is NaN at iteration ", n);
      }
    }
    // Normalize ps vector
    ps = ps / sum(ps);

    // Debug print to check if ps sums to 1
    real ps_sum = sum(ps);
    print("ps_sum: ", ps_sum);
    
    // Ensure ps is a valid simplex
    if (ps_sum <= 0) {
      print("ps_sum is not positive at iteration ", n);
    } else if (is_nan(ps_sum)) {
      print("ps_sum is NaN at iteration ", n);
    } else {
      int comp = categorical_rng(ps); // Sample a component
      x_rep[n] = beta_rng(alphax[i, comp], betax[i, comp]); // Generate new x data
      y_rep[n] = beta_rng(alphay[i, comp], betay[i, comp]); // Generate new y data
    }
  }
}

That is, I am not sure how to get the pointwise log-likelihood when the model block looks like the one above instead of something simple like

model {
  vector[N] mu = alpha + beta * x;
  y ~ normal(mu, sigma);
}

Sorry this didn’t get answered sooner as it’s not a hard question.

P.S. I don’t think your check is quite right: you need to make sure that each entry in ps is finite and non-negative, and then ps / sum(ps) will be a simplex. Testing the sum isn’t enough, because you could have something like ps = (2, -1), where the sum is positive but the normalized vector is not a simplex, and nothing blows up into NaN along the way.
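For instance, inside your generated quantities loop, a per-entry check could look something like this (just a sketch reusing your variable names; all_ok is an arbitrary flag I'm introducing):

int all_ok = 1;
for (k in 1:K) {
  if (is_nan(ps[k]) || is_inf(ps[k]) || ps[k] < 0) {
    all_ok = 0;
  }
}
if (all_ok == 1 && sum(ps) > 0) {
  vector[K] probs = ps / sum(ps); // guaranteed to be a simplex here
  int comp = categorical_rng(probs); // sample a component
  x_rep[n] = beta_rng(alphax[i, comp], betax[i, comp]);
  y_rep[n] = beta_rng(alphay[i, comp], betay[i, comp]);
}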

For loo, you just need the individual data points' log-likelihoods defined in a generated quantities variable named log_lik. And I’m pretty sure the data has to be conditionally i.i.d. given the parameters to be usable for loo.

I’m afraid you need to duplicate the loop in your model (you can pull it out into a function to avoid code duplication) and calculate

vector[N] log_lik;
for (n in 1:N) {
  ...
  log_lik[n] = log_sum_exp(lps);
}

The recomputation is super fast because there’s no autodiff and you only have to do it once per draw, not once per leapfrog step. But if you really want to avoid duplicated code, declare log_lik as a transformed parameter, define it in that block, and then replace the likelihood portion of the model block with

target += sum(log_lik);
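For concreteness, here is a rough sketch of that second approach, using the structure of your first model (obs_log_lik is just an illustrative name for the pulled-out helper):

functions {
  // per-observation mixture log density; assumes the gaussian_copula_density()
  // function from your model is defined in this same functions block
  real obs_log_lik(real x, real y, vector w,
                   array[] real ax, array[] real bx,
                   array[] real ay, array[] real by,
                   array[] real rho) {
    int K = num_elements(w);
    vector[K] lps = log(fmax(1e-10, w));
    for (k in 1:K) {
      real u_x = beta_cdf(x | ax[k], bx[k]);
      real u_y = beta_cdf(y | ay[k], by[k]);
      real copula_density = gaussian_copula_density(u_x, u_y, rho[k]);
      lps[k] += beta_lpdf(x | ax[k], bx[k]) + beta_lpdf(y | ay[k], by[k])
                + log(fmax(1e-10, copula_density));
    }
    return log_sum_exp(lps);
  }
}
transformed parameters {
  vector[N] log_lik;
  for (n in 1:N) {
    int i = participant[n];
    log_lik[n] = obs_log_lik(x[n], y[n], w[i],
                             alphax[i], betax[i], alphay[i], betay[i], rho[i]);
  }
}
model {
  for (i in 1:I) {
    w[i] ~ dirichlet(theta[i]);
  }
  target += sum(log_lik);
}

The data, parameters, and generated quantities blocks stay as they are, and log_lik comes out in the fit ready for loo.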

Hi, sorry I just saw this; I had not received a notification.

This is my current model:

functions {
  // Function to compute the Gaussian copula density
  real gaussian_copula_density(real u_x, real u_y, real rho) {
    real rho_sq = square(rho);
    real one_m_rho_sq = 1 - rho_sq;
    real z_x = inv_Phi(u_x); // inverse normal CDF
    real z_y = inv_Phi(u_y); // inverse normal CDF
    real log_term1 = -log(2 * pi()) - 0.5 * log1m(rho_sq);
    real log_term2 = -0.5 * (square(z_x) + square(z_y) - 2 * rho * z_x * z_y) / one_m_rho_sq;

    return exp(log_term1 + log_term2);
  }
}

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  int<lower=1> I; // number of individuals
  array[I, N] real<lower=0, upper=1> y; // observations
  array[I, N] real<lower=0, upper=1> x; // observations
  array[I, K] real<lower=0> alphax;
  array[I, K] real<lower=0> betax;
  array[I, K] real<lower=0> alphay;
  array[I, K] real<lower=0> betay;
  array[I] row_vector<lower=0.0001>[K] theta; // Dirichlet prior parameters for each participant
  array[I, K] real<lower=-1, upper=1> rho; // Correlation parameters for each participant and component
}

parameters {
  array[I] simplex[K] w; // Mixture weights for each participant
}

model {
  // Priors for w
  for (i in 1:I) {
    w[i] ~ dirichlet(theta[i]);
  }
  
  
  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] log_w = log(fmax(1e-10, w[i]));
      vector[K] lps = log_w;

      for (k in 1:K) {
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

        lps[k] += beta_x + beta_y + log(fmax(1e-10, copula_density));
      }

      // Use log_sum_exp to sum the log-probabilities in a stable way
      target += log_sum_exp(lps);
    }
  }
}
generated quantities {
  array[I, N] real x_rep;
  array[I, N] real y_rep;
  array[I, N] real log_lik; // log-likelihood for each observation
  array[I, N, K] real w_prob; // mixing probabilities for each participant, trial, and cluster

  for (i in 1:I) {
    for (n in 1:N) {
      vector[K] ps;
      vector[K] lps;

      for (k in 1:K) {
        real beta_x = beta_lpdf(x[i, n] | alphax[i, k], betax[i, k]);
        real beta_y = beta_lpdf(y[i, n] | alphay[i, k], betay[i, k]);
        real u_x = beta_cdf(x[i, n] | alphax[i, k], betax[i, k]);
        real u_y = beta_cdf(y[i, n] | alphay[i, k], betay[i, k]);
        real copula_density = gaussian_copula_density(u_x, u_y, rho[i, k]);

        lps[k] = log(fmax(1e-10, w[i, k])) + beta_x + beta_y + log(fmax(1e-10, copula_density));
        ps[k] = w[i, k] * exp(beta_x + beta_y) * copula_density;
      }

      // Compute log-likelihood for observation n using log_sum_exp
      log_lik[i, n] = log_sum_exp(lps);

      // Normalize ps vector
      ps = ps / sum(ps);

      // Save the normalized mixing probability of each cluster k
      // in trial n for participant i
      for (k in 1:K) {
        w_prob[i, n, k] = ps[k];
      }

      // Sample a component and generate replicated data
      int comp = categorical_rng(ps);
      x_rep[i, n] = beta_rng(alphax[i, comp], betax[i, comp]);
      y_rep[i, n] = beta_rng(alphay[i, comp], betay[i, comp]);
    }
  }
}

Was this what you were suggesting?

I have another question: I am getting divergent transitions; would reparameterizing help?

[edit: escaped Stan code]

Yes, at least in terms of the placement of the log-sum-exp. But this is way too much for me to understand just from looking at it briefly.

There are some red flags in here, like fmax(1e-10, w[i, k]). The reason this is a problem is that if w[i, k] < 1e-10, then derivative information gets cut off from propagating back through w and the model won’t fit. If you want to constrain w away from zero, I would just do that directly, for example by assigning 1e-10 to each entry and adding a simplex scaled by (1 - 1e-10 * K), where K is the number of entries in the simplex. This will keep the information flowing through w.

Thank you for your help.
Is this what you were suggesting?


parameters {
  array[I] simplex[K] v; // Base simplex for mixture weights
}

transformed parameters {
  array[I] vector[K] w; // Adjusted mixture weights
  for (i in 1:I) {
    w[i] = 1e-10 + (1 - K * 1e-10) * v[i];
  }
}

Yes, exactly. We just introduced (in version 2.36, January 2025) a row-stochastic matrix type whose rows are simplexes, which lets you do this as a one-liner.

transformed data {
  real p_min = 1e-10;
  real p_mult = 1 - K * p_min;
}
parameters {
  row_stochastic_matrix[I, K] v;
}
transformed parameters {
  row_stochastic_matrix[I, K] w = p_min + p_mult * v;
}

You might want to evaluate sensitivity to the value of p_min by moving it to the data block and trying different values.
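For example, a sketch with p_min supplied as data (only the relevant pieces shown):

data {
  int<lower=1> I;
  int<lower=1> K;
  real<lower=0> p_min; // e.g., 1e-10; must satisfy K * p_min < 1
  // ... rest of the data block as before
}
transformed data {
  real p_mult = 1 - K * p_min;
}
parameters {
  row_stochastic_matrix[I, K] v;
}
transformed parameters {
  row_stochastic_matrix[I, K] w = p_min + p_mult * v;
}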
