How to Reduce Divergent Transitions for Non-linear Hierarchical Model?


I fitted a non-linear hierarchical model to simple gamble data using rstan.

The estimated parameter values are almost right, but rstan reported warnings about divergent transitions after warmup.

> fit_stan <- rstan::sampling(object = model_stan,
+                             data = data_stan,
+                             iter = 3000,
+                             warmup = 1000,
+                             chains = 4,
+                             seed = 123,
+                             control = list(adapt_delta = 0.80, max_treedepth = 10))
1: There were 908 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See 
2: Examine the pairs() plot to diagnose sampling problems

I increased adapt_delta as rstan suggested, but it did not help.
The pairs plot shows the divergent transitions concentrated below the diagonal, so I suspect that increasing adapt_delta alone will not resolve the problem.

I'm going to try reparameterizing the model, but I'm not sure how.
Does anyone have advice on reparameterizing the model below, or other solutions?
Alternatively, does anyone know a good reference on reparameterization?

Thank you.


BetRatio = 1 / (1 + \eta_i * OddsAgainst)

where BetRatio is the response (the bet size as a proportion of current chips), OddsAgainst = (1 - P) / P is computed from the gamble's winning probability P (the input), \eta is a shape parameter, and i indexes people.
This model is called the "Probability Discounting Function" in psychology.

The Stan code is below.

data {
  int<lower=0> N;                   // number of observations
  int<lower=0> Person;              // number of people
  int<lower=0> ID[N];               // person ID for each observation
  vector<lower=0,upper=1>[N] P;     // gamble's winning probability (input)
  vector<lower=0,upper=1>[N] BR;    // bet ratio (response)
}

parameters {
  vector<lower=0>[Person] eta;      // shape parameter of the nonlinear function (per person)
  real<lower=0> sigma;              // error standard deviation (shared across people)
  real<lower=0> mu_eta;             // hyperparameter for eta
  real<lower=0> sigma_eta;          // hyperparameter for eta
}

model {
  vector[N] pred;                   // BR predicted by the nonlinear function
  vector[N] theta;                  // odds against (transformed from P)
  sigma ~ uniform(0, 2);
  mu_eta ~ uniform(0, 100);
  sigma_eta ~ uniform(0, 100);
  for (i in 1:Person) {
    eta[i] ~ normal(mu_eta, sigma_eta);
  }
  for (j in 1:N) {
    if (P[j] == 0) {
      pred[j] = 0;                  // when P = 0, the odds against are infinite
    } else {
      theta[j] = (1 - P[j]) / P[j];
      pred[j] = 1 / (1 + eta[ID[j]] * theta[j]);  // nonlinear function
    }
    BR[j] ~ normal(pred[j], sigma) T[0, 1];       // truncated-normal likelihood
  }
}

I obtained data from 8 people. Each person was presented gambles with varying winning rates (0%, 10%, 20%, ..., 100%) and asked how much they would bet on each gamble; the answer was the bet size.
Each person went through the set of questions twice, yielding 22 data points per person and 176 in total.

Try a non-centered parameterization for eta.
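In sketch form, it could look like the block below. The names here are illustrative, and since eta is constrained positive this version moves the hierarchy to the log scale (so the implied prior on eta becomes lognormal rather than a truncated normal):

```stan
parameters {
  vector[Person] eta_raw;           // standard-normal "raw" parameters
  real mu_log_eta;                  // hyperparameters, now on the log scale
  real<lower=0> sigma_log_eta;
}
transformed parameters {
  // Non-centered: eta is a deterministic transform of eta_raw, so the
  // sampler explores a geometry without the hierarchical "funnel".
  vector[Person] eta = exp(mu_log_eta + sigma_log_eta * eta_raw);
}
model {
  eta_raw ~ normal(0, 1);  // implies log(eta[i]) ~ normal(mu_log_eta, sigma_log_eta)
  // ... priors for mu_log_eta and sigma_log_eta, and the likelihood as before
}
```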

It’s probably not related to divergences, but I find that uniform priors rarely actually make sense.

Thanks, Mike.

I have never used the non-centered reparameterization method, but I will keep it in mind for future reference.

Instead, I would like to report that widening the ranges of the noninformative priors on the hyperparameters (i.e., mu_eta and sigma_eta) worked for me.

I changed them to:

mu_eta ~ uniform(0, 10000);
sigma_eta ~ uniform(0, 10000);

Divergent transitions no longer appeared, even when I used other seeds for MCMC sampling.
The fitted nonlinear curves look good.
I'm going to test parameter recovery for confirmation.

I have no idea why widening the ranges of the noninformative priors improved MCMC sampling, but I guess that flatter priors may have produced less steep posterior geometry.

Please let me know if anyone else has a valid idea about that.


I suggest you do some prior predictive checks.

Okay, I will conduct prior predictive checks and parameter recovery checks.
I also found a page explaining prior predictive checks in the Stan manual, along with a paper it references, and I will follow them to get started.
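In case it helps someone later: one way to sketch a prior predictive check for this model is a separate Stan program that uses only the priors, no likelihood, run with algorithm = "Fixed_param" in rstan. The block below is illustrative (the rejection loop is a crude stand-in for the `<lower=0>` constraint on eta), not code from the original post:

```stan
data {
  int<lower=0> N;
  vector<lower=0,upper=1>[N] P;
}
generated quantities {
  real mu_eta;
  real sigma_eta;
  real sigma;
  real eta;
  vector[N] BR_sim;
  // Draw everything from the priors, then simulate responses to see
  // whether the implied bet ratios look like plausible data.
  mu_eta = uniform_rng(0, 100);
  sigma_eta = uniform_rng(0, 100);
  sigma = uniform_rng(0, 2);
  eta = -1;
  while (eta <= 0)                          // crude rejection to mimic <lower=0>
    eta = normal_rng(mu_eta, sigma_eta);
  for (j in 1:N) {
    real pred = P[j] == 0 ? 0 : 1 / (1 + eta * (1 - P[j]) / P[j]);
    BR_sim[j] = pred + normal_rng(0, sigma);  // note: not truncated to [0, 1] here
  }
}
```

If most simulated BR_sim values fall far outside [0, 1], that is a sign the priors (or the untruncated error draw) put a lot of mass on implausible data.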

Thank you for your helpful advice.