Sampling from the prior - why am I seeing divergent transitions?

I’m trying to work with the following survival model. My goal is to sample from the prior distributions using Stan, but I’m getting many divergent transitions (see below). I saw this behavior before with a hierarchical model and was able to solve it with a non-centered parameterization. I’m puzzled because this model is not (I think) hierarchical.

Why is this happening and how can I solve this issue? I can share data if necessary.

data {
  int<lower=1> N;
  int<lower=1> N_site;
  int<lower=1> N_plot;
  int<lower=1> N_cover;
  //Survival stuff
  vector[N] y;
  vector[N] y1;
  vector[N] y2;
  int<lower=1,upper=2> cens[N];

  //Explanatory variables
  vector[N] LAI;
  vector[N] OM;
  vector[N] pH;
  vector[N] SM;
  vector[N] ST;
  int<lower=1,upper=5> cover_id[N];
  //group variables
  int<lower=1> site_id[N];
  int<lower=1> plot_id[N];
}

parameters {
  real a;
//varying intercepts
  vector[N_site] a_site;
  vector[N_plot] a_plot;
  vector[N_cover] a_cover;

   real b_om;
   real b_ph;
   real b_sm;
   real b_st;
   real<lower=0> k;
}
transformed parameters {
    vector[N] mu=exp(a + a_site[site_id]+a_plot[plot_id]+ a_cover[cover_id]+b_sm*SM+b_st*ST+
   vector[N] lambda= mu / tgamma(1 + 1 /k);
}
model {
  //for (i in 1:N)
  //if ( cens[i] == 1 ) target += weibull_lccdf(y[i] | k, lambda[i]);
  //else target += log_diff_exp(weibull_lcdf(y2[i] | k, lambda[i]),weibull_lcdf(y1[i] | k, lambda[i]));
  a ~ normal(0, 1);
  k ~ gamma(0.5, 0.5);
  a_site ~ normal(0,1);
  a_plot ~ normal(0,1);
  a_cover  ~ normal(0,1);
  b_om ~ normal(0,1);
  b_ph ~ normal(0,1);
  b_sm ~ normal(0,1);
  b_st ~ normal(0,1);
}
generated quantities {
  //vector[N] log_lik;
  vector[N] surv;
  for (i in 1:N) {
    surv[i] = weibull_rng(k, lambda[i]);
  }
}

mod <- cmdstan_model("survival_prior.stan")
fit <- mod$sample(
  data = data_list,
  chains = 4,
  parallel_chains = 4
)


Warning: 385 of 4000 (10.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions

Changing the prior for “k” seems to solve the issue of the divergent transitions.

k ~ gamma(2, 0.5);

Divergent transitions can happen with any posterior; the model doesn’t have to be hierarchical. Even though the parameters are a priori independent, Stan is still sampling the full (6 + N_site + N_plot + N_cover)-dimensional distribution jointly.


In this case (and I think this is broadly true when parameters are completely independent, in the sense that the gradient for a given parameter never depends on any other parameter), the global problems are no worse than the problems for the margins. Here, we get divergences even if we restrict the model to a single tricky margin:

parameters {
  real<lower=0> y;
}
model {
  y ~ gamma(0.5, 0.5);
}


Warning: 8 of 4000 (0.0%) transitions ended with a divergence.
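
One way to see why this margin is tricky is to write it on the scale Stan actually samples. This is only a sketch: it assumes the usual log transform for a `<lower=0>` parameter, and the symbols u, α and β are my own notation for the log of y and the gamma shape and rate.

```latex
\begin{align*}
p(y) &\propto y^{\alpha-1} e^{-\beta y}, \qquad y > 0,\\
p(u) &= p(y)\left|\frac{dy}{du}\right|
      \propto e^{(\alpha-1)u}\, e^{-\beta e^{u}}\, e^{u}
      = \exp\!\left(\alpha u - \beta e^{u}\right), \qquad u = \log y,\\
\frac{d}{du}\log p(u) &= \alpha - \beta e^{u}, \qquad
\frac{d^{2}}{du^{2}}\log p(u) = -\beta e^{u}.
\end{align*}
```

The left tail decays only like exp(αu), so with α = 0.5 the sampler has to travel far into a region where the curvature is close to zero, while on the right the curvature grows exponentially. A single step size that suits one region can be too large for the other, which is plausibly where the divergences come from. With α = 2, as in gamma(2, 0.5), the left tail is lighter, which is consistent with the divergences going away.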

Given that the remainder of the model doesn’t yield divergences on its own, the number of divergences can still be larger or smaller depending on what else is in the multivariate model. This is less because the rest of the model makes the posterior fundamentally easier or harder to sample from, and more because it influences the step-size adaptation.
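
Since the divergence count is tied to step-size adaptation, one thing to try is a higher adapt_delta, which pushes adaptation toward a smaller step size. A minimal sketch, reusing the file name and data_list from the question; the value 0.95 is an arbitrary choice for illustration, and diagnostic_summary() assumes a reasonably recent cmdstanr:

```r
library(cmdstanr)

# Same model file and data list as in the question
mod <- cmdstan_model("survival_prior.stan")

fit <- mod$sample(
  data = data_list,
  chains = 4,
  parallel_chains = 4,
  adapt_delta = 0.95  # default is 0.8; 0.95 is an illustrative value, not a recommendation
)

# Per-chain counts of divergences, max-treedepth hits, and E-BFMI
fit$diagnostic_summary()
```

This may reduce the divergences at the cost of slower sampling, but it doesn’t change the shape of the k margin, so changing the prior on k is the more direct fix.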