How to avoid divergent transitions for a hierarchical logistic regression model


Hi,
I want to fit a hierarchical model where the response variable is conditioned on another variable (group status), as in this model description:

The sample n can be split into two groups (n = n_1 + n_2) depending on the value of the group status. The response variable is binary, and in the second group it is highly unbalanced. For simplicity, I used ridge priors for the beta coefficients.
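Written out, this is what the code below implements (my reconstruction from the code; the split point $N_{11} = N_1 + 1$ is an assumption):

$$y_i \sim \mathrm{Bernoulli}\!\left(\mathrm{logit}^{-1}(\alpha_g + x_i^\top \beta_g)\right), \quad g = 1 \text{ for } i \le N_1,\; g = 2 \text{ for } i \ge N_{11},$$
$$\beta_g \sim \mathrm{Normal}\!\left(0,\, 1/\sqrt{2\lambda_g}\right), \quad \lambda_g \sim \mathrm{Cauchy}^+(0,1), \quad \alpha_g \sim \mathrm{Normal}(0,10).$$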

My code looks like this:

data {
  int<lower=1> N;                 // total sample size
  int<lower=1> N1;                // size of group 1
  int<lower=1> N2;                // size of group 2 (not used below)
  int<lower=1> N11;               // first row of group 2 (presumably N1 + 1)
  int<lower=1> w;                 // number of groups (2)
  int<lower=0,upper=1> y[N];      // binary response

  int<lower=1> K1;                // number of predictors
  matrix[N,K1] x11;               // design matrix

  real<lower=0,upper=1> pi1;      // group probabilities, fixed as data
  real<lower=0,upper=1> pi2;
}

parameters {
  vector[w] alpha1;               // group intercepts
  matrix[K1,w] beta1;             // group-specific coefficients
  vector<lower=0>[w] lambda;      // ridge penalty per group
}

transformed parameters {
  real logit_pi1 = log(pi1/(1.0-pi1));
  real logit_pi2 = log(pi2/(1.0-pi2));
}

model {
  vector[N] mu1;

  // linear predictor for group 1 (rows 1:N1)
  for (i in 1:N1)
    mu1[i] = alpha1[1] + x11[i,] * beta1[,1];

  // linear predictor for group 2 (rows N11:N)
  for (i in N11:N)
    mu1[i] = alpha1[2] + x11[i,] * beta1[,2];

  // priors
  for (k in 1:w) {
    target += normal_lpdf(beta1[,k] | 0, 1/sqrt(2*lambda[k]));  // ridge prior
    target += normal_lpdf(alpha1[k] | 0, 10);
  }
  target += cauchy_lpdf(lambda[1] | 0, 1);
  target += cauchy_lpdf(lambda[2] | 0, 1);

  // note: pi1 and pi2 are data, so the next four terms are constants
  // and have no effect on the posterior
  target += normal_lpdf(logit_pi1 | 0.85, 0.21);
  target += normal_lpdf(logit_pi2 | -0.85, 0.21);
  target += log(pi1);
  target += log(pi2);

  // likelihood
  for (i in 1:N1)
    target += bernoulli_logit_lpmf(y[i] | mu1[i]);
  for (i in N11:N)
    target += bernoulli_logit_lpmf(y[i] | mu1[i]);
}

After running this model, I got some divergent transitions after the warm-up period. Is there a way to optimize this code to avoid divergent transitions? Any suggestion would be great. There may be more efficient ways to code or reparameterize the variables that I am not using here.

You currently have things coded with a “non-centered” parameterization of mu1, which usually works best when the data are sparse relative to the priors; in my experience that’s usually the case with binomial data. But if you have lots and lots of data, you might try the centered parameterization instead and see if that behaves better.
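For concreteness, the non-centered form samples standard-normal draws and scales them afterwards, while the centered form draws beta1 directly from normal(0, 1/sqrt(2*lambda[k])). An untested sketch of the non-centered form, for comparison (beta1_raw is a helper parameter I'm introducing; the rest of the model stays the same):

parameters {
  vector[w] alpha1;
  matrix[K1,w] beta1_raw;         // standard-normal draws (helper)
  vector<lower=0>[w] lambda;
}

transformed parameters {
  matrix[K1,w] beta1;
  for (k in 1:w)
    beta1[,k] = beta1_raw[,k] / sqrt(2*lambda[k]);  // scale applied outside the prior
}

model {
  // implies beta1[,k] ~ normal(0, 1/sqrt(2*lambda[k])), but with geometry
  // that is usually easier for the sampler when the likelihood is weak
  target += std_normal_lpdf(to_vector(beta1_raw));
  // ... remainder of the model block unchanged
}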


@mike-lawrence Thank you for your suggestion, but it didn’t work. I am just wondering whether I can write the Bernoulli likelihood using bernoulli_logit_glm_lpmf instead. Would that be possible?
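Something like this is what I had in mind (untested sketch, assuming N11 == N1 + 1 so that rows N11:N are exactly the second group); it would replace both the mu1 loops and the likelihood loops:

model {
  // priors as before ...

  // likelihood via the GLM primitive, one call per group
  target += bernoulli_logit_glm_lpmf(head(y, N1) |
              block(x11, 1, 1, N1, K1), alpha1[1], beta1[,1]);
  target += bernoulli_logit_glm_lpmf(segment(y, N11, N - N1) |
              block(x11, N11, 1, N - N1, K1), alpha1[2], beta1[,2]);
}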