How to Reduce Divergent Transitions for Non-linear Hierarchical Model?

Hi,

I fitted a non-linear hierarchical model to simple gamble data using rstan.

The estimated parameter values look almost right, but rstan reported warnings about divergent transitions after warmup.

> fit_stan <- rstan::sampling(object = model_stan,
+                             data = data_stan,
+                             iter = 3000,
+                             warmup = 1000,
+                             chains = 4,
+                             seed = 123,
+                             control = list(adapt_delta = 0.80, max_treedepth = 10))
Warnings: 
1: There were 908 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems

I increased adapt_delta as the warning suggested, but it did not help.
The pairs plot shows the divergent transitions concentrated below the diagonal, so I suspect that increasing adapt_delta alone will not resolve the problem.

I’m going to try reparameterizing the model, but I don’t know where to start.
Does anyone have advice on reparameterizing the model below, or know of other solutions?
Or does anyone know a good reference on reparameterization?

Thank you.


Model:

BetRatio = 1 / (1 + \eta_i * OddsAgainst),

where BetRatio is the response data (the bet size as a proportion of the person's current chips), OddsAgainst is the input (the odds against winning, (1 − P)/P, computed from the gamble's winning probability P), \eta_i is a shape parameter, and i indexes the person.
This model is called the “Probability Discounting Function” in psychology.
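
For example, at P = 0.2 the odds against winning are (1 − 0.2)/0.2 = 4, so a person with \eta_i = 1 is predicted to bet 1/(1 + 1 × 4) = 0.2 of their chips; larger \eta_i means steeper discounting of long odds.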

Here is the Stan code:

data {
  int<lower=0> N;                      // number of observations
  int<lower=0> Person;                 // number of people
  int<lower=1, upper=Person> ID[N];    // person ID for each observation
  vector<lower=0, upper=1>[N] P;       // gamble's winning probability (input)
  vector<lower=0, upper=1>[N] BR;      // bet ratio (response)
}

parameters {
  vector<lower=0>[Person] eta;         // shape parameter of the nonlinear function (per person)
  // Declared bounds must match the uniform priors below; otherwise proposals
  // above the prior's upper limit get zero density and show up as divergences.
  real<lower=0, upper=2> sigma;        // error SD, shared across people
  real<lower=0, upper=100> mu_eta;     // hyperparameter for eta
  real<lower=0, upper=100> sigma_eta;  // hyperparameter for eta
}

model {
  vector[N] pred;   // BR predicted by the nonlinear function
  vector[N] theta;  // odds against (transformed from P)

  // Priors
  sigma ~ uniform(0, 2);
  mu_eta ~ uniform(0, 100);
  sigma_eta ~ uniform(0, 100);

  // Hierarchy: eta is constrained positive, so the normal is explicitly
  // truncated at zero (the normalizing term depends on the hyperparameters
  // and cannot be dropped).
  for (i in 1:Person) {
    eta[i] ~ normal(mu_eta, sigma_eta) T[0, ];
  }

  // Likelihood
  for (j in 1:N) {
    if (P[j] == 0) {
      pred[j] = 0;  // when P = 0, the odds against are infinite
    } else {
      theta[j] = (1 - P[j]) / P[j];
      pred[j] = 1 / (1 + eta[ID[j]] * theta[j]);  // nonlinear function
    }
    BR[j] ~ normal(pred[j], sigma) T[0, 1];
  }
}

Data:
I obtained data from 8 people. Each person was presented a set of gambles (winning rates of 0%, 10%, 20%, … , 100%) and asked how much they would bet on each gamble; the answer was the bet size.
Each person went through the set of 11 gambles twice, giving 22 observations per person, so the total is 8 × 22 = 176.

Try a non-centered parameterization for eta.
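
A minimal sketch of what that could look like for this model (this version moves the hierarchy to the log scale so that eta stays positive, which changes the population distribution from a truncated normal to a lognormal; the names eta_raw, mu_log_eta, and sigma_log_eta, and the priors on the latter two, are placeholders introduced here):

data {
  int<lower=0> N;
  int<lower=0> Person;
  int<lower=1, upper=Person> ID[N];
  vector<lower=0, upper=1>[N] P;
  vector<lower=0, upper=1>[N] BR;
}

parameters {
  vector[Person] eta_raw;        // standardized, unconstrained offsets
  real mu_log_eta;               // population mean of log(eta)
  real<lower=0> sigma_log_eta;   // population SD of log(eta)
  real<lower=0, upper=2> sigma;  // error SD (bounds match the uniform prior)
}

transformed parameters {
  // Non-centered: eta is a deterministic function of the hyperparameters
  // and standard-normal offsets, which removes the funnel-shaped coupling
  // between eta and sigma_log_eta.
  vector<lower=0>[Person] eta = exp(mu_log_eta + sigma_log_eta * eta_raw);
}

model {
  vector[N] pred;
  vector[N] theta;

  eta_raw ~ normal(0, 1);        // implies log(eta) ~ normal(mu_log_eta, sigma_log_eta)
  mu_log_eta ~ normal(0, 2);     // weakly informative placeholder
  sigma_log_eta ~ normal(0, 1);  // half-normal via the lower bound; placeholder
  sigma ~ uniform(0, 2);

  for (j in 1:N) {
    if (P[j] == 0) {
      pred[j] = 0;
    } else {
      theta[j] = (1 - P[j]) / P[j];
      pred[j] = 1 / (1 + eta[ID[j]] * theta[j]);
    }
    BR[j] ~ normal(pred[j], sigma) T[0, 1];
  }
}

The key idea is that eta_raw is a priori independent of the hyperparameters, so the sampler no longer has to traverse the funnel that forms when the group-level scale gets small; the reparameterization chapter of the Stan User's Guide covers this in detail.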

It’s probably not related to divergences, but I find that uniform priors rarely actually make sense.

Thanks, Mike.

I have never used the non-centered reparameterization before, but I will keep it in mind for future reference.

Instead, I would like to report that widening the ranges of the noninformative priors on the hyperparameters (i.e., mu_eta and sigma_eta) worked for me.

I changed them (together with the matching declared upper bounds) to:

mu_eta ~ uniform(0, 10000);
sigma_eta ~ uniform(0, 10000);

Divergent transitions no longer appeared, even when I used other seeds for MCMC sampling.
The fitted nonlinear curves look good.
I’m going to test parameter recovery for confirmation.

I have no idea why widening the ranges of the noninformative priors improved the MCMC sampling, but my guess is that flatter priors produced less steep posterior geometry.

Please let me know if anyone else has a valid idea about that.

Thanks.

I suggest you do some prior predictive checks.
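
One way to do that here is a generated-quantities-only Stan program run with rstan::sampling(..., algorithm = "Fixed_param"). A rough sketch under the widened priors (BR_sim is an illustrative name, and the rejection-sampling loops are a crude stand-in for the truncated normals):

data {
  int<lower=0> N;
  int<lower=0> Person;
  int<lower=1, upper=Person> ID[N];
  vector<lower=0, upper=1>[N] P;
}

generated quantities {
  // Draw everything from the priors; no data enter the simulation.
  real mu_eta = uniform_rng(0, 10000);
  real sigma_eta = uniform_rng(0, 10000);
  real sigma = uniform_rng(0, 2);
  vector[Person] eta;
  vector[N] BR_sim;  // bet ratios simulated from the prior alone

  for (i in 1:Person) {
    real draw = -1;  // rejection sampling as a stand-in for normal(...) T[0, ]
    while (draw < 0)
      draw = normal_rng(mu_eta, sigma_eta);
    eta[i] = draw;
  }

  for (j in 1:N) {
    real pred;
    real br = -1;
    if (P[j] == 0)
      pred = 0;  // odds against are infinite when P = 0
    else
      pred = 1 / (1 + eta[ID[j]] * (1 - P[j]) / P[j]);
    while (br < 0 || br > 1)  // stand-in for normal(pred, sigma) T[0, 1]
      br = normal_rng(pred, sigma);
    BR_sim[j] = br;
  }
}

With priors this wide, eta will almost always be enormous, so the predicted curve collapses to essentially 0 for every P < 1 — exactly the kind of implausibility a prior predictive check is meant to expose.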

Okay, I will conduct prior predictive checks and parameter recovery checks.
I found a page explaining prior predictive checks in the Stan manual, along with a paper it references, and will start by following them.

Thank you for your helpful advice.