Odd behavior of Stan model

I’m attempting to run a hierarchical model where the outcome is a binary variable. My model is:

y ~ bernoulli§
p ~ beta(alpha * pi, alpha * (1-pi))
alpha ~ gamma(1,1)
pi ~ beta(1,1)

When running just the prior models in JAGS and Stan (i.e. no data), I would expect the distribution of alpha to just be its prior distribution of gamma(1,1). However, in Stan, this does not happen. I’ve attached R code that compares the output of Stan and JAGS to just sampling from a gamma(1,1) distribution. The distribution of alpha from JAGS is correct, while the output from Stan is not. Is there something odd going on with Stan here?stan_vs_jags_repex.R (1.1 KB)

Hi jfiksel,

This is because the bounds aren’t actually consistent between the Stan and JAGS models.

Where the Stan model is:

parameters {
  real<lower=.001> alpha;
  real<lower=.001, upper = .999> p0;
  real<lower=.001, upper = .999> p;
model {
  alpha ~ gamma(1, 1);
  p0 ~ beta(1, 1);
  p ~ beta(alpha * p0, alpha * (1-p0));

And the JAGS model is:

 alpha ~ dgamma(1, 1)T(1.0E-3,)
 pi ~ dbeta(1, 1)T(1.0E-3, 1-1.0E-3)
 p ~ dbeta(alpha * pi, alpha * (1-pi))

The Stan lower/upper bounds are resulting in the constraint:

.001 \lt \alpha

Whereas the truncation operator in JAGS is resulting in the constraint:

.001 \leq \alpha

If I change the alpha lower bound to 0.0009 (i.e., real<lower=.0009> alpha), this gives approximately the same constraint as JAGS, and gives consistent results:


@andrjohns Thanks for the response. However, I made the change you suggested and was not able to reproduce the figure you made. I am using rstan 2.21.2.

@andrjohns I take back my last comment–setting the Stan code to

parameters {
  real<lower=.0009> alpha;
  real<lower=.0009, upper = 0.9991> p0;
  real<lower=0, upper = 1> p;
model {
  alpha ~ gamma(1, 1);
  p0 ~ beta(1, 1);
  p ~ beta(alpha * p0, alpha * (1-p0));

Worked. I find it odd that the samplers are that sensitive to a \\\geq vs a > sign.

1 Like

@andrjohns sorry for adding another response here, but when I change the model (code attached) so that it’s like a hierarchical logistic regression model (no need to worry about the model itself), it’s clear that the samples for \\\alpha don’t match the gamma prior distribution. I also removed all truncation notation, so that shouldn’t be affecting things.stan_vs_jags_repex.R (1.7 KB)

I think something strange is going on in your model. I don’t think this is quite as simple as ‘without data it should just be the prior for alpha’, but I can’t quite put my finger on why just yet.

If we boil this down to the simplest example, we see that Stan and the naive Monte Carlo estimates are identical (I think JAGS might be clever enough to sample from a gamma directly in this instance, so no need to double check It is doing slice, JAGS does slice for everything).


model_txt <- "parameters {real <lower = 0> alpha;} model {alpha ~ gamma(1.0, 1.0);}"
gamma_model <- stan_model(model_code = model_txt)
stan_samples <- sampling(gamma_model, iter = 5e4) %>% 
  rstan::extract("alpha") %>% 
mc_samples <- rgamma(1e5, 1.0, 1.0)

plot_df <- data.frame(
  x = c(stan_samples, mc_samples),
  grp = rep(c("stan", "mc"), each = 1e4)

quantiles <- seq(from = 0.001, to = 0.999, length.out = 500)
qq_df <- data.frame(
  stan_quantile <- quantile(stan_samples, quantiles),
  mc_quantiles <- quantile(mc_samples, quantiles)

# Indistinguishable
ggplot(qq_df, aes(x = stan_quantile, y = mc_quantiles)) +
  geom_point(col = "red") +
  geom_abline(slope = 1, intercept = 0)

ggplot(plot_df, aes(x = x, color = grp)) +

Looking at the pairs plot of all the parameters from your latest stan_vs_jags_reprex.R suggests the prior on p is inappropriate (far too much mass at 0 and 1), which is breaking the adaptation in some way, and makes the samples of \alpha non-representative. Indeed, if I turn up adapt_delta on your original example, I get warnings about divergences and max_treedepth.

I suspect this is making the adaptation break in a strange way, and you end up with an inappropriate stepsize, which can’t move far enough per iteration (the gamma prior has a lot of mass at the asymptote at zero, and when this gets log-transformed it can’t move far enough towards -\infty to correctly sample the prior).

1 Like

@hhau thanks for digging into this. The pairs plot is definitely interesting–although I’m not sure why the posterior for alpha in this case shouldn’t be the prior, especially because that is exactly what happens if you run the jags model. If there’s no likelihood, shouldn’t the marginal “posterior” for alpha just be the prior?

Mathematically yes; in practice one is still sampling a density using HMC, so the prior sampling process can break in all the same ways as the posterior sampling process. The following issues seem related to your model/prior:

I’m surprised that the sampler isn’t throwing more warning messages here. It is usually fairly warning-happy with these kinds of things, so I’ve probably missed something somewhere in the model.