# Choosing the correct non-centered parameterization

Hi, I am trying to fit a hierarchical prior distribution for regression coefficients, as follows.

My response variable is binary. I have tried several different parameterizations, but I keep getting the divergent-transitions warning:

There were 38 divergent transitions after warmup

parameterization 1

```stan
parameters {
  real alpha1;
  vector[K1] beta1_tilde;
  vector<lower=0>[K1] tau_tilde;
  real<lower=0> lambda;
}
transformed parameters {
  vector[K1] tau = tau_tilde / lambda;
  vector[K1] beta1 = beta1_tilde .* tau;
}
model {
  beta1_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ inv_gamma(0.5, 0.5);
  alpha1 ~ normal(0, 100);

  y1 ~ bernoulli_logit_glm(x1, alpha1, beta1);
}
```


parameterization 2

```stan
parameters {
  real alpha1;
  real<lower=0> lambda;
  vector<lower=0>[K1] tau_tilde;
  vector[K1] beta1_tilde;
}
transformed parameters {
  vector[K1] beta1 = beta1_tilde .* tau_tilde / lambda;
}
model {
  beta1_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ inv_gamma(0.5, 0.5);
  alpha1 ~ normal(0, 100);

  y1 ~ bernoulli_logit_glm(x1, alpha1, beta1);
}
```


parameterization 3

```stan
parameters {
  real alpha1;
  real<lower=0> lambda;
  vector<lower=0>[K1] tau_tilde;
  vector<multiplier=1/lambda>[K1] beta1_tilde;
}
transformed parameters {
  vector[K1] beta1 = beta1_tilde .* tau_tilde;
}
model {
  beta1_tilde ~ normal(0, 1 / lambda);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ inv_gamma(0.5, 0.5);
  alpha1 ~ normal(0, 100);

  y1 ~ bernoulli_logit_glm(x1, alpha1, beta1);
}
```


I tried the three parameterizations above, but I still get the same warning message. In addition, I used the following statement when running the models.

Can anyone give advice on how to get rid of this warning?
Any help would be highly appreciated.

Thank you

Is it possible to tighten up the prior on alpha1?


I tried that, but it didn’t work :(

What are the possible values for alpha1? And can you share some of the data or simulated data?


@Ara_Winter Thank you for reaching out again. I played with a toy data set and I am getting that error for it as well.
Now I have a feeling that my parameterization is incorrect. What do you think?
Especially this section:

```stan
transformed parameters {
  vector[K1] tau = tau_tilde / lambda;
  vector[K1] beta1 = beta1_tilde .* tau;
}
```


Because \tau^2 should have an inverse-gamma distribution, but the way I specified the model, \tau itself has the inverse-gamma distribution. Please share your thoughts on this if possible.
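For what it’s worth, the distinction matters: if \tau^2 ~ inv-gamma(0.5, 0.5) and \beta = \tilde\beta \cdot \sqrt{\tau^2} with \tilde\beta ~ N(0, 1), the marginal of \beta is Student-t with 1 degree of freedom (a Cauchy), whereas putting the same inverse-gamma on \tau itself gives a different marginal. A quick Monte Carlo sketch (Python with numpy/scipy, purely illustrative and not part of the model code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000

z = rng.standard_normal(n)

# Intended: inverse-gamma prior on tau^2, then beta = z * sqrt(tau^2).
tau_sq = stats.invgamma(a=0.5, scale=0.5).rvs(n, random_state=rng)
beta_correct = z * np.sqrt(tau_sq)

# Mis-specified: the same inverse-gamma prior on tau itself.
tau = stats.invgamma(a=0.5, scale=0.5).rvs(n, random_state=rng)
beta_wrong = z * tau

# The scale mixture with tau^2 ~ InvGamma(1/2, 1/2) is exactly Student-t(1).
ks_correct = stats.kstest(beta_correct, stats.t(df=1).cdf).statistic
ks_wrong = stats.kstest(beta_wrong, stats.t(df=1).cdf).statistic
print(ks_correct, ks_wrong)  # ks_correct is tiny; ks_wrong is noticeably larger
```

The first KS statistic against a t(1) is near zero while the second is clearly not, which supports putting the prior on \tau^2 rather than \tau.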

If that is the case, would it be possible to guide me in specifying the parameters correctly? Thank you.
I am actually following the local Student’s t distribution in the attached paper.
Review_Shrinkage_Priors.pdf (1.1 MB)

Thank you once again.

@Ara_Winter Attached is my updated code with a toy data set.
diabetes.csv (23.3 KB) logistic_model_reg_t4.stan (683 Bytes)

```r
library(dplyr)
library(rstan)

student.mat <- read.csv("C:/Users/diabetes.csv")
stu_data <- student.mat %>% select(c(Outcome, Glucose, BMI))
stu_data_stan_ridge <- list(N = dim(stu_data)[1], K1 = 2,
                            y1 = stu_data$Outcome,
                            x1 = stu_data %>% select(-c(Outcome)))
stufit_reg_t <- stan(file = "logistic_model_reg_t4.stan",
                     data = stu_data_stan_ridge,
                     control = list(max_treedepth = 15),
                     iter = 2500, chains = 4)
```


Check this if you have time. Thanks.


Sorry for the low-effort response, I’m not near a computer right now, but it looks like the divergent transitions are coming from the Cauchy prior on lambda.

You could try one of the reparameterizations described here: https://betanalpha.github.io/assets/case_studies/fitting_the_cauchy.html#4_second_alternative_implementation (I usually use the second one).
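The idea behind that kind of reparameterization is to write the Cauchy as a Gaussian scale mixture: with a ~ N(0, 1) and b ~ inv-gamma(0.5, 0.5), the product |a| * sqrt(b) is distributed half-Cauchy(0, 1). A quick numerical check (Python with numpy/scipy, just as an illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000

# Scale-mixture representation of a half-Cauchy(0, 1):
# lambda = |a| * sqrt(b), with a ~ N(0, 1) and b ~ InvGamma(1/2, 1/2).
a = np.abs(rng.standard_normal(n))
b = stats.invgamma(a=0.5, scale=0.5).rvs(n, random_state=rng)
lam = a * np.sqrt(b)

# Compare the mixture draws against the half-Cauchy distribution.
ks = stats.kstest(lam, stats.halfcauchy().cdf).statistic
print(ks)  # close to 0
```

Sampling the two well-behaved auxiliary variables instead of the heavy-tailed lambda directly is what removes the geometry that HMC struggles with.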

Edit: If that also fails, another reason for the divergent transitions could be that the unshrunk coefficients are too weakly identified. This is a common problem with these global-local shrinkage priors (essentially by design, because they are built not to regularize the coefficients far away from zero).
In logistic regression, this problem becomes even more apparent in the presence of strong collinearity. The best solution would probably be to use the regularized horseshoe prior, which applies some regularization even to the coefficients far from zero.


@daniel_h Thank you for the source. This will be really helpful now and in the future. However, for this analysis, after doing the proposed reparameterization the results seem much worse than before. I am not confident about the transformed parameters section; I feel like I have done a wrong transformation. (The priors are specified on \tau^2.)

```stan
transformed parameters {
  real<lower=0> lambda = xa * sqrt(xb);
  vector[K1] tau = sqrt(tau_tilde_sqr) * sqrt(lambda);
  vector[K1] beta1 = beta1_tilde .* tau;
}
model {
  beta1_tilde ~ normal(0, 1);
  tau_tilde_sqr ~ inv_gamma(0.5, 0.5);

  xa ~ normal(0, 1);
  xb ~ inv_gamma(0.5, 0.5);

  alpha1 ~ normal(0, 1);

  y1 ~ bernoulli_logit_glm(x1, alpha1, beta1);
}
```


I am not sure I have defined the transformed parameter tau correctly. Please have a look at that if you have time.

Thank you once again

Hi, finally had time to look into this (who am I kidding, I’m just procrastinating).
Here is an implementation of the model in your first post that samples well on your toy data:

```stan
data {
  int<lower=1> N;
  int<lower=1> K1;
  int<lower=0, upper=1> y1[N];
  matrix[N, K1] x1;
}
parameters {
  vector<lower=0>[K1] tau_j_sq;
  vector[K1] beta_j;
  real alpha;
  real<lower=0> aux1_lambda;
  real<lower=0> aux2_lambda;
}
transformed parameters {
  real<lower=0> lambda = aux1_lambda * sqrt(aux2_lambda); // implies lambda ~ C+(0, 1)
}
model {
  beta_j ~ normal(0, sqrt(tau_j_sq));
  tau_j_sq ~ inv_gamma(0.5, 1 / (2 * lambda));
  alpha ~ normal(0, 100);

  aux1_lambda ~ std_normal();
  aux2_lambda ~ inv_gamma(0.5, 0.5);

  y1 ~ bernoulli_logit_glm(x1, alpha, beta_j);
}
```


It’s a direct translation of the model in the first post, with only lambda being reparameterized.
Please keep in mind that your predictors should probably have equal variance, so that they all receive the same amount of shrinkage.
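For instance, a small sketch of that standardization (Python/pandas here purely for illustration, with made-up toy values; in R you would use `scale()` on the predictor columns before building the data list):

```python
import pandas as pd

# Hypothetical values mirroring the two predictors in the toy diabetes set.
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Center and scale each predictor so every coefficient sees the same
# prior scale, and hence the same amount of shrinkage.
x1 = (df - df.mean()) / df.std()
print(x1.std())  # each column now has sample standard deviation 1
```

After scaling, a single global prior scale treats Glucose and BMI symmetrically despite their very different raw units.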

Again, I would also suggest going with the regularized horseshoe (https://arxiv.org/pdf/1707.01694.pdf).

In another post you mentioned that you tried the vanilla horseshoe and had divergent transitions, even with the reparameterization for the Cauchy. This probably means that some of your beta coefficients are weakly identified through the likelihood, and there is a pretty good chance that you will also see divergent transitions with the model here. By putting some degree of regularization on the coefficients that are far away from 0 you get much better sampling behavior; it’s really worth a try.
The paper linked above even provides Stan code with the changes required for the logistic regression model.