Why do chains get stuck, and how can I diagnose this behaviour?

Consider the following beta-binomial model:

data{
  int n;
  array[n] int N;
  array[n] int y;
}
parameters{
  real<lower=0, upper=1> mu;
  real<lower=0> kappa;
}
model{
  kappa ~ cauchy(0, 1);
  mu ~ beta(1, 1);
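  // alpha + beta = 1/kappa, so kappa -> 0 recovers binomial(N, mu)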
  y ~ beta_binomial(N, mu/kappa, (1-mu)/kappa);
}
generated quantities{
  array[n] int ppc = beta_binomial_rng(N, mu/kappa, (1-mu)/kappa);
}

Suppose I had 100 separate binomial experiments, each with a Poisson-distributed number of trials. Unbeknownst to me, every experiment had an identical probability of success:


N <- rpois(100, 50)
y <- rbinom(100, N, 0.1)

Given these data, I know that kappa should be 0 (as kappa goes to 0, the beta-binomial collapses to a plain binomial under this parameterization), and so I expect divergences from this model.

library(cmdstanr)
library(tidyverse)
library(marginaleffects)
library(bayesplot)
library(tidybayes)

N <- rpois(100, 50)
y <- rbinom(100, N, 0.1)


code <- "
data{
  int n;
  array[n] int N;
  array[n] int y;
}
parameters{
  real<lower=0, upper=1> mu;
  real<lower=0> kappa;
}
model{
  kappa ~ cauchy(0, 1);
  mu ~ beta(1, 1);
  y ~ beta_binomial(N, mu/kappa, (1-mu)/kappa);
}
generated quantities{
  array[n] int ppc = beta_binomial_rng(N, mu/kappa, (1-mu)/kappa);
}
"

data <- list(N=N, y=y, n=length(N))
model <- code %>% write_stan_file() %>% cmdstan_model()
fit <- model$sample(
  data=data, 
  parallel_chains = parallel::detectCores(),
  seed = 0)



fit$diagnostic_summary()
Warning: 1123 of 4000 (28.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

Warning: 2877 of 4000 (72.0%) transitions hit the maximum treedepth limit of 10.
See https://mc-stan.org/misc/warnings for details.

$num_divergent
[1] 131  98 710 184

$num_max_treedepth
[1] 869 902 290 816

$ebfmi
[1] 2.009023 1.993664 1.970364 2.044654


Not surprised. However, when I visualize the chains, it's clear that some of them get “stuck”:
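
For reference, here is how I draw the trace plots, using mcmc_trace() from bayesplot (loaded above):

# Trace plots for mu and kappa; a stuck chain shows up as a long flat segment
mcmc_trace(fit$draws(c("mu", "kappa")))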

How would I diagnose why these chains get stuck? The beta-binomial model above is simple and only an example: I know why this model is pathological, but in some other model exhibiting this behaviour I might not.

When I see chains do this, how can I diagnose what is happening and where in my model I should consider reparameterizing?

EDIT: The same behaviour occurs even when I specify the beta-binomial parameters as mu*kappa, (1-mu)*kappa.


They almost always get stuck when they enter a region of high curvature and the step size is too large to escape easily. Usually when they’re stuck, you get divergent transitions.
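
You can see this directly in the sampler diagnostics, e.g. by comparing each chain's adapted step size to its divergence count (a sketch using cmdstanr's sampler_diagnostics() plus dplyr, which you already load via tidyverse):

# Per-chain adapted step size and divergence count; stuck chains tend to
# pair a relatively large step size with many divergent transitions
fit$sampler_diagnostics(format = "df") %>%
  group_by(.chain) %>%
  summarise(step_size   = mean(stepsize__),
            divergences = sum(divergent__))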

I’m used to seeing this done as beta_binomial(N, mu * kappa2, (1 - mu) * kappa2) with a prior that penalizes large values of kappa2. You’ve done it the other way around, which I understand is more consistent with your setting of expecting kappa = 0. I suspect that’ll be more stable than trying to run kappa down to zero, which is likely to introduce high curvature.
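
Concretely, the model block would look something like this, where the exponential is just one example of a prior that penalizes large kappa2:

model{
  kappa2 ~ exponential(1);  // one possible penalizing prior; pick to taste
  mu ~ beta(1, 1);
  y ~ beta_binomial(N, mu * kappa2, (1 - mu) * kappa2);
}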

This is a general problem that @andrewgelman has been trying to solve since before we started working on Stan—he calls it the “unfolding flower”. The idea is that you want your model to expand as the data expands—from, say, a binomial to a beta-binomial. But you want the more general model to fall back to the simpler model when the data's simpler. For that, I like the penalized complexity priors from the INLA gang, but they're done one-off for particular problems in that setting.


@Bob_Carpenter

I’m used to seeing this done as beta_binomial(N, mu * kappa2, (1 - mu) * kappa2) with a prior that penalizes large values of kappa2

Right. I wrote the model this way specifically to create this type of behaviour. However, I can replicate the behaviour even under the parameterization you provide. The problem then is the prior: if I switch to something less heavy-tailed than a Cauchy, the model samples well.

So the problem is curvature and step size. When I encounter this, do you recommend I try decreasing the step size (via adapt_delta, I believe?), or is reparameterization the better approach? I anticipate the folk theorem of statistical computing applies.
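
For context, this is the kind of adjustment I mean; cmdstanr exposes both settings as arguments to $sample():

# A higher acceptance target forces a smaller adapted step size; raising
# max_treedepth allows longer trajectories, at extra computational cost
fit2 <- model$sample(
  data = data,
  seed = 0,
  parallel_chains = parallel::detectCores(),
  adapt_delta = 0.99,   # default is 0.8
  max_treedepth = 12    # default is 10
)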

Is there a way to diagnose what part of the model might be a target for reparameterization? Somewhat selfishly, I'm experiencing this behaviour in another model (not in Stan) but could only replicate it in a minimal working example using Stan.

Any time the data want the parameters to lie on the boundary of parameter space, you're going to have problems. Here you see in my parameterization that kappa2 tries to go to +infinity, whereas in yours, kappa tries to go to zero. Both are big warning flags, as that's the boundary of parameter space. It usually means you want a simpler model that just assumes the boundary; that's what we usually do to mitigate this kind of problem.
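
In this example, that simpler model is just the plain binomial, which is what the beta-binomial collapses to at the boundary:

model{
  mu ~ beta(1, 1);
  y ~ binomial(N, mu);
}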

Not directly that I know of, but you can plot where the divergences arise in parameter space. Those are often a clue as to where the problems are.
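
For example, with bayesplot (this assumes a cmdstanr fit; nuts_params() pulls out the divergence flags):

# Scatter the posterior draws with divergent transitions highlighted;
# divergences clustering in one region point to the problematic geometry
np <- nuts_params(fit)
mcmc_scatter(
  fit$draws(c("mu", "kappa")),
  np = np,
  transformations = list(kappa = "log")  # log scale helps reveal funnels
)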


Appreciated!