Hierarchical model suddenly gets divergent transitions

I worked on a hierarchical logistic model in January and posted a question about it here on the Stan Discourse; see: Fitting hierarchical logistic regression to large dataset. I fitted a model with 2000 data points and 710 “slopes” without any errors or warnings. Now that I have taken up Stan modelling again, I run into some issues that weren’t there the last time I worked on the model. There are no problems compiling the model. I have tried to fit the model on a shortened version of the dataset, with 100 data points and 20 “slopes”. This results in the dreaded warnings:

Warning messages:
1: There were 52 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 50 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 1.06, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Increasing adapt_delta to 0.99 or increasing the number of iterations did not help. Besides this, Stan also seems off in other ways. Stan model files take a really long time to save (simply Ctrl+S), and the model often complains about a missing newline at the end of the file, even though there is a newline at the end of the model. Fitting the model to the entire dataset yields this:

Warning message:
Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess

My model looks like this.

// h_logreg.stan

data{
  int<lower=1> N; // Rows
  int<lower=1> M; // Columns
  
  int<lower=0, upper=1> y[N]; // Outcome variables
  matrix<lower=0, upper=2>[N, M] x; // Predictor variables
}

parameters{
  // Hyper priors
  real mu;
  real<lower=0> sigma;
  
  // Priors
  real a;
  vector[M] b;
}

model{
  // Hyper-priors
  mu ~ normal(0, 5);
  sigma ~ cauchy(0, 5);
  
  // Priors
  a ~ normal(0, 5);
  b ~ normal(mu, sigma);
  // Likelihood
  //y ~ bernoulli_logit(a + x * b);
  y ~ bernoulli_logit_glm(x, a, b);
}


My rstan code looks like this:

library(BEDMatrix)
library(rstan)
library(scales)
library(shinystan)
library(devtools)
rstan_options(auto_write=TRUE)
options(mc.cores = parallel::detectCores())

y <- read.delim("file.txt", header=FALSE)
y <- y$V1
y <- ifelse(y=="2", 1, 0)
path <- "C:/Users/Documents/Stan/sim1.bed"
m <- BEDMatrix(path, n=2000, p=710)
x <- m[1:100, 1:20]
y <- y[1:100]

logModel <- stan_model("h_logreg.stan")
logFit <- sampling(logModel,
                   list(N = 100, M = 20, y=y, x=x),
                   iter=2000,
                   chains = 4,
                   save_warmup=FALSE)


My computer specifications are:

  • Processor: Intel® Core™ i7-10510U CPU @ 1.80 GHz 2.30 GHz
  • Installed RAM: 16.0 GB (15.8 GB usable)
  • System type: 64-bit operating system, x64 based processor

I hope some of you can shed light on this issue, since last January I didn’t have these issues with the same model and the same data.

Just to be clear: you are using the same model on the same dataset, and the only thing that changed is the Stan version? If so, this might be a bug and we would want to check it out in more detail…

This is very weird - maybe your installation is corrupted or this is an IDE bug? Some people had issues when moving to R 4.0…
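
To separate the two possibilities raised above (a changed Stan version versus an installation or R upgrade problem), it may help to post the versions involved. A minimal sketch, assuming the rstan and StanHeaders packages are installed; the name `versions` is only illustrative:

```r
# Collect the version information that helps tell a Stan change apart from
# an installation or IDE problem. Assumes rstan and StanHeaders are installed.
versions <- list(
  r           = R.version.string,
  rstan       = as.character(packageVersion("rstan")),
  stanheaders = as.character(packageVersion("StanHeaders")),
  stan        = rstan::stan_version()
)
print(versions)
```

Comparing this output with the setup used in January would show whether anything besides the model and data has changed.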

It is quite possible your data cannot inform all of your parameters, especially in the 100 × 20 subset; see the case study at Underdetermined Linear Regression for an example of how this would manifest.
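
One rough way to check whether a subset of predictors can inform every slope is to look for constant columns, missing values, and rank deficiency. The sketch below uses simulated 0/1/2 values as a hypothetical stand-in for the real subset (in the post that would be `x <- m[1:100, 1:20]`), with one column made constant on purpose; the names `col_sd`, `constant_cols`, `n_missing` and `x_rank` are illustrative, not from the original code.

```r
# Hypothetical stand-in for the 100 x 20 predictor subset (values 0, 1, 2).
set.seed(1)
x <- matrix(rbinom(100 * 20, size = 2, prob = 0.3), nrow = 100, ncol = 20)
x[, 5] <- 0  # deliberately constant column, for illustration only

# A column with zero standard deviation carries no information about its slope.
col_sd <- apply(x, 2, sd, na.rm = TRUE)
constant_cols <- which(col_sd == 0)

# Missing values would need handling before the matrix is passed to Stan.
n_missing <- sum(is.na(x))

# Rank of the design matrix including the intercept; a rank below
# ncol(x) + 1 means some coefficients are not identified by the data alone.
# (qr() requires a matrix without missing values.)
x_rank <- qr(cbind(1, x))$rank

cat("Constant columns:", constant_cols, "\n")
cat("Missing values:", n_missing, "\n")
cat("Rank:", x_rank, "of", ncol(x) + 1, "\n")
```

If the real subset shows constant columns or a deficient rank, the affected slopes are informed only by the prior, which is the kind of situation the case study describes.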