Divergent transitions after warmup to be solved

Hi all,

I have no idea why the following code produces the warnings below; if anyone can help with this, I'd really appreciate it.

dat<-read.table(header = TRUE, text = "
id[]  dose[]	n[]	y[,1]	y[,2]   y[,3]	y[,4]	y[,5]	y[,6]
1	10	3	-0.43	-0.64	0	NA	NA	NA
2	10	3	0.35	-2	-0.92	NA	NA	NA
3	10	3	0.07	-0.07	0.15	NA	NA	NA
4	10	3	0.29	-1.42	0.14	NA	NA	NA
5	50	3	0.57	-2.57	-1.07	NA	NA	NA
6	10	3	0.08	0.43	-1.07	NA	NA	NA
7	50	3	-0.36	-0.14	0.64	NA	NA	NA
8	10	3	-0.21	1.28	-1.5	NA	NA	NA
9	10	3	0.68	-0.25	-0.09	NA	NA	NA
10	10	3	-0.64	1	-0.29	NA	NA	NA
11	10	4	0.05	-0.22	0.57	0.36	NA	NA
12	25	6	0.64	1.08	-0.36	0.79	-0.64	1.5
13	5	3	-0.29	1.08	1.5	NA	NA	NA
14	20	4	4.29	3.15	0.78	4.49	NA	NA
15	20	3	0.64	0	0.5	NA	NA	NA
16	10	4	1.22	1.07	-0.08	0.5	NA	NA
17	25	4	-0.08	0.86	1.07	1.15	NA	NA
18	10	3	1.85	0.78	0.54	NA	NA	NA
19	50	4	0.86	1.43	0.65	1.86	NA	NA
20	20	3	0.86	1.39	1.85	NA	NA	NA
21	10	3	1.35	1.36	0.79	NA	NA	NA
22	10	3	0.47	0.93	0.86	NA	NA	NA
23	25	3	0.42	0.71	0.66	NA	NA	NA
")
names(dat) <- gsub(".", "", names(dat), fixed = TRUE)

# Collect the observations in subject-major order (all of subject 1's values,
# then subject 2's, ...), since the Stan model below walks y with segment()
# one subject at a time. Note that c(y1, y2, ..., y6) would instead give
# measurement-major order and misalign subjects with their segments.
ymat <- as.matrix(dat[, paste0("y", 1:6)])
y <- as.array(as.numeric(na.omit(as.vector(t(ymat)))))
s <- dat$n
  
library(rstan)  # provides stan(); not loaded anywhere above
options(mc.cores = parallel::detectCores())

dataList = list(
  Ntotal = length(y), 
  s = s,    
  K= length(s), 
  y= y
  
)
modelString1 = "
data{
  int<lower=0>  Ntotal;
  int<lower=0>  K;
  vector[Ntotal] y;
  int s[K];
}
parameters {
  real theta[K]; 
  real<lower=-10,upper=10> mutheta;
  real<lower=0> sigma2[K];
  real<lower=-10,upper=10>  musigma;
  real<lower=0,upper=10> tausigma;
  real<lower=0,upper=50> tautheta;
}

model {
  int pos;
  pos = 1;

for (k in 1:K) {
  segment(y, pos,s[k])  ~ normal(theta[k],sqrt(sigma2[k])); 
  pos = pos + s[k];
  theta[k]  ~ normal(mutheta, tautheta) ;
  sigma2[k]  ~ lognormal(musigma,tausigma);
} 
  mutheta  ~ uniform(-10,10);   
  musigma  ~ uniform(-10,10);   
  tausigma ~ uniform(0,10);
  tautheta ~ uniform (0,50);
}

generated quantities {
    real<lower=0,upper=1> Ppop ;
    real<lower=0,upper=1> Ppopdiff;
    real<lower=0> meanvar;

    Ppop = mutheta>0;
    Ppopdiff = mutheta>0.5;
    meanvar = exp(-musigma+tausigma/2);
}
"

parameters = c( "mutheta", "Ppop", "Ppopdiff", "tautheta","tausigma","meanvar")  

writeLines( modelString1 , con="TEMPmodel.stan" )
stanModel <- stan(
  data = dataList ,
  file = "TEMPmodel.stan",  
  chains = 4,            
  warmup = 5000,
  pars = parameters ,
  iter = 25000,
 control = list(adapt_delta = 0.99))

Warnings are as follows:
Warning messages:
2: There were 99 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
3: Examine the pairs() plot to diagnose sampling problems

Hi,

for #1 you can disregard it. For #2, have you read:
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

?

I think the model will need to be reimplemented to avoid biases, but I'm not sure how to adjust it.

Hierarchical models often run into divergences, see this section of the User’s Guide for more background: 22.7 Reparameterization | Stan User’s Guide

Hierarchical models are ones in which the parameters of a prior themselves have priors, like:

theta[k]  ~ normal(mutheta, tautheta);

A good fix for this is (often) the non-centered parameterisation (discussed in the chapter above). You can use this simply via the offset and multiplier syntax. Change the declaration of theta to:

vector<offset=mutheta,multiplier=tautheta>[K] theta; 

You’ll need to move this after the declaration of mutheta and tautheta, so your parameters block should look like:

parameters {
  real<lower=-10,upper=10> mutheta;
  vector<lower=0>[K] sigma2;
  real<lower=-10,upper=10> musigma;
  real<lower=0,upper=10> tausigma;
  real<lower=0,upper=50> tautheta;
  vector<offset=mutheta,multiplier=tautheta>[K] theta;
}
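In case the shorthand is unfamiliar: offset/multiplier makes Stan sample a standardized version of theta under the hood. Written out by hand (a sketch, with `theta_raw` as a made-up helper name), the same non-centered parameterisation looks like:

```stan
parameters {
  real<lower=-10,upper=10> mutheta;
  real<lower=0,upper=50> tautheta;
  vector[K] theta_raw;  // standardized group effects
}
transformed parameters {
  // shift and scale back to the original parameterization
  vector[K] theta = mutheta + tautheta * theta_raw;
}
model {
  theta_raw ~ std_normal();  // implies theta ~ normal(mutheta, tautheta)
}
```

The offset/multiplier form is usually preferable because the rest of the model can keep referring to theta directly.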

Thanks for helping to look at this.

I tried your suggestion of updating theta and sigma2 to vectors, but there are still some divergences:

2: There were 8 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
3: Examine the pairs() plot to diagnose sampling problems

Try using a non-centered parameterisation for sigma2 as well:

parameters {
  real<lower=-10,upper=10> mutheta;
  real<lower=-10,upper=10>  musigma;
  real<lower=0,upper=10> tausigma;
  real<lower=0,upper=50> tautheta;
  vector<offset=musigma,multiplier=tausigma>[K] sigma2;
  vector<offset=mutheta,multiplier=tautheta>[K] theta; 
}

Even more issues after changing sigma2:

2: There were 490 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
3: There were 47638 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
4: Examine the pairs() plot to diagnose sampling problems

5: The largest R-hat is 11.34, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

It looks like you are running MANY more iterations than is typical in Stan. You almost certainly don't need to run this many. The good news is that if your model with 8 divergent transitions after warmup also had 40000 or more sampling iterations (i.e. only something on the order of 1 in 5000 iterations was divergent), then your divergences probably don't indicate a huge problem. You might increase adapt_delta a bit just to feel comfortable and get no divergences.

Ah, looks like that was the wrong way to non-center the lognormal, according to this post from Bob.
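(The algebra here is a standard change of variables, not anything specific to the linked post: the lognormal non-centers on the log scale, since

$$
z_k \sim \mathcal{N}(0,1), \quad \log\sigma^2_k = \mu_\sigma + \tau_\sigma z_k
\;\Longrightarrow\;
\log\sigma^2_k \sim \mathcal{N}(\mu_\sigma, \tau_\sigma)
\;\Longleftrightarrow\;
\sigma^2_k \sim \mathrm{Lognormal}(\mu_\sigma, \tau_\sigma),
$$

so a std_normal prior on the standardized variable is all that is required.)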

Try with the following model:

data{
  int<lower=0>  Ntotal;
  int<lower=0>  K;
  vector[Ntotal] y;
  int s[K];
}
parameters {
  real<lower=-10,upper=10> mutheta;
  real<lower=-10,upper=10>  musigma;
  real<lower=0,upper=10> tausigma;
  real<lower=0,upper=50> tautheta;
  vector<offset=mutheta,multiplier=tautheta>[K] theta; 
  vector[K] sigma2_raw;
}

transformed parameters {
  // non-centered lognormal: shift and scale a std_normal variable on the
  // log scale, so that sigma2 ~ lognormal(musigma, tausigma)
  vector[K] log_sigma2 = musigma + sigma2_raw * tausigma;
  vector<lower=0>[K] sigma2 = exp(log_sigma2);
}

model {
  int pos;
  pos = 1;

  sigma2_raw ~ std_normal();

  for (k in 1:K) {
    segment(y, pos, s[k]) ~ normal(theta[k], sqrt(sigma2[k]));
    pos = pos + s[k];
    theta[k] ~ normal(mutheta, tautheta);
  }
  mutheta ~ uniform(-10,10);
  musigma ~ uniform(-10,10);
  tausigma ~ uniform(0,10);
  tautheta ~ uniform(0,50);
  // note: no Jacobian adjustment is needed; sigma2 never appears on the left
  // of a sampling statement, and the std_normal prior on sigma2_raw already
  // implies the intended lognormal prior on sigma2
}

generated quantities {
    real<lower=0,upper=1> Ppop ;
    real<lower=0,upper=1> Ppopdiff;
    real<lower=0> meanvar;

    Ppop = mutheta>0;
    Ppopdiff = mutheta>0.5;
    meanvar = exp(-musigma+tausigma/2);
}
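A more compact variant (my sketch, same idea as above) keeps the offset/multiplier shorthand but applies it on the log scale, where the lognormal is just a normal:

```stan
parameters {
  real<lower=-10,upper=10> musigma;
  real<lower=0,upper=10> tausigma;
  vector<offset=musigma,multiplier=tausigma>[K] log_sigma2;
}
transformed parameters {
  vector<lower=0>[K] sigma2 = exp(log_sigma2);
}
model {
  // a normal on the log scale is exactly sigma2 ~ lognormal(musigma, tausigma)
  log_sigma2 ~ normal(musigma, tausigma);
}
```

Because the distribution statement is on the declared parameter log_sigma2 itself, there is no manual standardization to keep track of.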


Cooooool! Thank you so much!!!