Simple Normal multilevel model, failure to converge


I’m trying to build a series of models to compare on my dataset. My data is structured in observational units (electrodes), that are subjected to repetitions of different events types. Some electrodes have been recorded on the same events as others, some haven’t. Either way, they have all been subjected to all event types. In my simplest model, I’d like to model the data with a Normal distribution, foregoing event type and event number, but accounting for electrode ID.

I thought this would be easy, but it turns out my model doesn’t converge well (divergent transitions). I’m assuming moving to more complex models will go poorly if I don’t fix this now.

I’m at a loss, what am I doing wrong?

Thank you.

data {
  int Nobs; // number of data obervations
  vector[Nobs] dat; // data, one per row
  int Nelecs; // ID of electrodes
  int elec[Nobs]; // number of electrodes

parameters {
  // population parameters
  vector[1] b;
  vector<lower=0>[1] pos_b;
// individual parameters
  vector<lower=0>[2] sigma_u;      
  cholesky_factor_corr[2] L_u;
  matrix[2,Nelecs] z_u;       

transformed parameters{
  matrix[2,Nelecs] u;
  u = diag_pre_multiply(sigma_u, L_u) * z_u;

model {
  real M;
  real S;
  // population
  b ~ normal(0, 5); 
  pos_b ~ normal(0,2);
  // individual
  L_u ~ lkj_corr_cholesky(2);
  to_vector(z_u) ~ normal(0,1);
  sigma_u ~ cauchy(0, 2);
  // sampling
  for(n in 1:Nobs){
    M = b[1] + u[1,elec[n]];
    S = pos_b[1] + u[2,elec[n]];
    dat[n] ~ normal(M, S);

Not sure it’s the source of your woes, but there’s a reparameterization of the Cauchy that you might try. Frankly I find the use of cauchy tends to be folks’ default from seeing examples rather than something strongly theory-motivated and suspect that normal() would usually suffice. I think original examples used cauchy under the intent to make things more “robust” amid the possibility of large outliers, but nothing comes for free (cauchy amid no such outliers induces lower informativity from the data) and I’d suggest only doing this if your posterior predictive checks suggest it’s necessary rather than defaulting to it off the bat. I also think folks need to consider a not-peaked-at-zero prior for variance terms more often, where I like the weibull personally as a peaked-away-from-zero alternative (esp. for measurement noise parameters).

Also, note that you can vectorize your observation-level structure:

  dat ~ normal(b[1] + u[1,elec], pos_b[1] + u[2,elec]);
1 Like

Thank you!

I picked Cauchy priors from an example as you suspected, changing my priors worked like a charm.

1 Like