Dealing with dependencies between parameters?

Hello,

I am making progress with my mixture model: it now fits the data pretty nicely:

(the histogram shows the data, while the lines show samples from the posterior)

I still have some warnings from the sampler though:

Warning: 83 of 8000 (1.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

Warning: 2008 of 8000 (25.0%) transitions hit the maximum treedepth limit of 10.
See https://mc-stan.org/misc/warnings for details.

Warning: 4 of 4 chains had an E-BFMI less than 0.3.
See https://mc-stan.org/misc/warnings for details.

My current guess is that this is due to the dependency between two of my parameters:

  • theta - the weight of each component (i.e., I am running a mixture model)
  • kappa - von Mises concentration.

As can be seen in the plot below, if kappa is very small, then theta is relatively unconstrained, and vice versa.

Are there any “standard tricks” I should know about for dealing with such situations?

For now, I am experimenting with different priors to see if that will help.

As always, any hints, tips and advice are very welcome.

Alasdair


Model below:

data {
  int<lower = 0> N;
  int <lower = 1> K; // number of experimental conditions  
  array[N] int <lower = 1, upper = K> X; //  which condition are we in
  array[N] real phi;
  int<lower = 0> L; // number of participants
  array[N] int<lower = 1, upper = L> Z;
  real prior_theta;
  real prior_sigma_u;
}

transformed data {

  array[8] real mu;
  
  mu[1] = 0 * pi()/4;
  mu[2] = 1 * pi()/4;
  mu[3] = 2 * pi()/4;
  mu[4] = 3 * pi()/4;
  mu[5] = 4 * pi()/4;
  mu[6] = 5 * pi()/4;
  mu[7] = 6 * pi()/4;
  mu[8] = 7 * pi()/4;
}

parameters {
  // mixing proportions
  // 8 von Mises + 1 uniform 

  // fixed effects
  array[K] simplex[9] theta; 
  array[K] vector[8] log_kappa;

  // random effects
  array[K, L] vector[9] u; 
  array[K, 9] real<lower = 0> sigma_u;
}

transformed parameters {
  
  // create a vector for each participant's theta values
  array[K, L] vector[9] log_theta_u;
  // transform kappa
  array[K] vector[8] kappa = exp(log_kappa);

  for (k in 1:K) {

    for (l in 1:L) {

      log_theta_u[k, l] = log_softmax(log(theta[k]) + u[k, l]);

    }
  }  
}

model {

  //////////////////////////////////////////////
  // setting priors/////////////////////////////
  //////////////////////////////////////////////

  // priors for theta

  for (kk in 1:K) {

    theta[kk] ~ dirichlet(rep_vector(prior_theta, 9));
    log_kappa[kk] ~ normal(4, 0.5);

    for (psi in 1:9) {

      sigma_u[kk, psi] ~ exponential(prior_sigma_u);

      for (ll in 1:L) {

        u[kk, ll][psi] ~ normal(0, sigma_u[kk, psi]);

      }
    }
  }

  //////////////////////////////////////////////
  // fitting model /////////////////////////////
  //////////////////////////////////////////////
  
  int kk; // which condition are we in
  int ll; // which person is it
 
  for (n in 1:N) {

    kk = X[n];
    ll = Z[n];

    vector[9] lps;
    //print(log_theta_u[kk, ll]);

    for (psi in 1:8) {
      //lps[psi] = log(theta[kk, psi]) + von_mises_lpdf(phi[n] | mu[psi], kappa);
      lps[psi] = log_theta_u[kk, ll][psi] + von_mises_lpdf(phi[n] | mu[psi], kappa[kk, psi]);
    }

    lps[9] = log_theta_u[kk, ll][9] + uniform_lpdf(phi[n] | -pi(), pi());

    target += log_sum_exp(lps);
  }
  
}

The likely culprit is the centered parameterization of u:

It’s generally better to define a u_std ~ normal(0, 1) and then define u as a transformed parameter that multiplies by the relevant scale.
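
Roughly, a sketch of the non-centred version, adapting the names from your model (and assuming sigma_u is re-declared as an array of positive-constrained vectors so the elementwise multiply works):

parameters {
  array[K, L] vector[9] u_std;              // standard normal draws
  array[K] vector<lower = 0>[9] sigma_u;    // per-condition scales
}

transformed parameters {
  array[K, L] vector[9] u;
  for (k in 1:K) {
    for (l in 1:L) {
      u[k, l] = sigma_u[k] .* u_std[k, l];  // scale the standard normal draws
    }
  }
}

model {
  for (k in 1:K) {
    sigma_u[k] ~ exponential(prior_sigma_u);
    for (l in 1:L) {
      u_std[k, l] ~ std_normal();           // implies u[k, l][i] ~ normal(0, sigma_u[k][i])
    }
  }
}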

You can clean up a lot of the code with vectorization to improve speed.

theta ~ dirichlet(rep_vector(prior_theta, 9));
log_kappa ~ normal(4, 0.5);  

Then you can make it faster by moving the rep_vector to define a variable in transformed data that only gets computed once.
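
For example, something along these lines (a sketch, reusing prior_theta from your data block):

transformed data {
  vector[9] theta_prior_alpha = rep_vector(prior_theta, 9);  // computed once, not on every leapfrog step
}

model {
  theta ~ dirichlet(theta_prior_alpha);
}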

You could define kappa with a lower bound of 0 and give it a lognormal prior to avoid having to manually transform.
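
Something like this, keeping the same scale as your normal(4, 0.5) prior on the log (a sketch):

parameters {
  array[K] vector<lower = 0>[8] kappa;
}

model {
  for (k in 1:K) {
    kappa[k] ~ lognormal(4, 0.5);  // equivalent to normal(4, 0.5) on log(kappa), with the Jacobian handled for you
  }
}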

It’s not easy to find, but you can code that evenly spaced array as a vector,

vector[8] mu = linspaced_vector(8, 0, 7) * pi() / 4;

or

vector[8] mu  = [0, 1, 2, 3, 4, 5, 6, 7]' * pi() / 4;

Thanks a lot!

That has helped a lot - I continually undervalue the power of centred vs. non-centred parameterizations (and find the terminology confusing, as I think of centring as being about the mean, while scaling is about the standard deviation). Anyway, my bad for forgetting about this!

Most of the warnings have evaporated and I now have just two transitions ending with a divergence, which I suspect will be easy enough to tackle. I have plotted the posterior below.

Overall, things seem to work pretty well but it would be nice to somehow build in the knowledge that theta → 0 implies that kappa → 0. I suppose there might be a way to do this with a multivariate prior?

Or define kappa as b x theta?

I don’t think it’s a big deal for what I am trying to do with this model, but it would be quite satisfying to learn the best way to implement this.

Anyway, thanks a huge amount.

Glad it helped.

If you really have \kappa = b \times \theta, then by all means build that in. If it’s a soft relationship and you have a reasonable prior on \theta, you can define a bivariate prior conditionally as something like p(\theta, \kappa) = p(\theta) \cdot p(\kappa \mid \theta) where p(\kappa \mid \theta) is something like a lognormal with location \log \theta (for median \theta) and scale \sigma \ll 1. That’ll keep the scales in line because the error on lognormal is multiplicative.
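
A rough sketch of that conditional prior for a single component, with placeholder names (theta_w for one component's weight, and a beta prior standing in for whatever marginal prior you already use), not the full mixture:

parameters {
  real<lower = 0, upper = 1> theta_w;       // weight of one von Mises component (illustrative scalar version)
  real<lower = 0> kappa;                    // its concentration
}

model {
  theta_w ~ beta(2, 2);                     // stand-in for the marginal prior p(theta)
  kappa ~ lognormal(log(theta_w), 0.1);     // p(kappa | theta): median theta_w, small multiplicative error
}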