Odd behavior of Stan model

I’m attempting to run a hierarchical model where the outcome is a binary variable. My model is:

y ~ bernoulli(p)
p ~ beta(alpha * pi, alpha * (1-pi))
alpha ~ gamma(1,1)
pi ~ beta(1,1)

When running just the prior models in JAGS and Stan (i.e. no data), I would expect the distribution of alpha to just be its prior distribution, gamma(1,1). However, in Stan this does not happen. I’ve attached R code that compares the output of Stan and JAGS to direct samples from a gamma(1,1) distribution. The distribution of alpha from JAGS is correct, while the output from Stan is not. Is there something odd going on with Stan here?

stan_vs_jags_repex.R (1.1 KB)
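For reference, here is a minimal sketch of the kind of comparison in the attached script (an illustrative reconstruction, not the file itself; pi is written as p0, as in the Stan model below):

library(rstan)

# Prior-only version of the model above: with no likelihood, the draws
# of alpha should simply follow the gamma(1, 1) prior
stan_code <- "
parameters {
  real<lower=0> alpha;
  real<lower=0, upper=1> p0;
  real<lower=0, upper=1> p;
}
model {
  alpha ~ gamma(1, 1);
  p0 ~ beta(1, 1);
  p ~ beta(alpha * p0, alpha * (1 - p0));
}
"
fit <- sampling(stan_model(model_code = stan_code), iter = 1e4)
alpha_draws <- unlist(rstan::extract(fit, "alpha"))

# Overlay the prior density: the two should agree if sampling is working
hist(alpha_draws, breaks = 50, freq = FALSE)
curve(dgamma(x, 1, 1), add = TRUE, col = "red")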

Hi jfiksel,

This is because the bounds aren’t actually consistent between the Stan and JAGS models.

Where the Stan model is:

parameters {
  real<lower=.001> alpha;
  real<lower=.001, upper = .999> p0;
  real<lower=.001, upper = .999> p;
}
model {
  alpha ~ gamma(1, 1);
  p0 ~ beta(1, 1);
  p ~ beta(alpha * p0, alpha * (1-p0));
}

And the JAGS model is:

model 
{
 alpha ~ dgamma(1, 1)T(1.0E-3,)
 pi ~ dbeta(1, 1)T(1.0E-3, 1-1.0E-3)
 p ~ dbeta(alpha * pi, alpha * (1-pi))
}

The Stan lower/upper bounds are resulting in the constraint:

0.001 < \alpha

Whereas the truncation operator in JAGS is resulting in the constraint:

0.001 \leq \alpha
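For context, per the Stan reference manual’s constraint transforms, a lower-bounded parameter is sampled on the unconstrained scale and mapped back through an exponential:

\alpha = 0.001 + \exp(\alpha_{\text{unc}}), \qquad \alpha_{\text{unc}} \in \mathbb{R}

so the bound is strictly open: \alpha can approach 0.001 but never equal it.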

If I change the alpha lower bound to 0.0009 (i.e., real<lower=.0009> alpha), this gives approximately the same constraint as JAGS, and gives consistent results:

[Figure: the Stan and JAGS distributions of alpha now match]


@andrjohns Thanks for the response. However, I made the change you suggested and was not able to reproduce the figure you made. I am using rstan 2.21.2.

@andrjohns I take back my last comment: setting the Stan code to

parameters {
  real<lower=.0009> alpha;
  real<lower=.0009, upper = 0.9991> p0;
  real<lower=0, upper = 1> p;
}
model {
  alpha ~ gamma(1, 1);
  p0 ~ beta(1, 1);
  p ~ beta(alpha * p0, alpha * (1-p0));
}

worked. I find it odd that the samplers are that sensitive to a \geq vs. a > sign.


@andrjohns sorry for adding another response here, but when I change the model (code attached) so that it’s like a hierarchical logistic regression model (no need to worry about the model itself), it’s clear that the samples for \alpha don’t match the gamma prior distribution. I also removed all truncation notation, so that shouldn’t be affecting things.

stan_vs_jags_repex.R (1.7 KB)

I think something strange is going on in your model. I don’t think this is quite as simple as ‘without data it should just be the prior for alpha’, but I can’t quite put my finger on why just yet.

If we boil this down to the simplest example, we see that Stan and the naive Monte Carlo estimates are identical. (I thought JAGS might be clever enough to sample from the gamma directly in this instance, but it is doing slice sampling; JAGS uses slice sampling for everything.)

library(rstan)
library(ggplot2)

# Prior-only model: the "posterior" of alpha should just be gamma(1, 1)
model_txt <- "parameters {real<lower = 0> alpha;} model {alpha ~ gamma(1.0, 1.0);}"
gamma_model <- stan_model(model_code = model_txt)
stan_fit <- sampling(gamma_model, iter = 5e4)
stan_samples <- unlist(rstan::extract(stan_fit, "alpha"))
mc_samples <- rgamma(1e5, 1.0, 1.0)

plot_df <- data.frame(
  x = c(stan_samples, mc_samples),
  grp = rep(c("stan", "mc"), c(length(stan_samples), length(mc_samples)))
)

quantiles <- seq(from = 0.001, to = 0.999, length.out = 500)
qq_df <- data.frame(
  stan_quantile = quantile(stan_samples, quantiles),
  mc_quantiles = quantile(mc_samples, quantiles)
)

# Indistinguishable: the QQ plot sits on the identity line
ggplot(qq_df, aes(x = stan_quantile, y = mc_quantiles)) +
  geom_point(col = "red") +
  geom_abline(slope = 1, intercept = 0)

ggplot(plot_df, aes(x = x, color = grp)) +
  geom_density()
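For completeness, the same prior-only check in JAGS (a sketch assuming rjags is installed):

library(rjags)

# Same gamma(1, 1) prior-only model, sampled with JAGS's slice sampler
jags_txt <- "model { alpha ~ dgamma(1, 1) }"
jags_mod <- jags.model(textConnection(jags_txt), n.chains = 4)
jags_samples <- coda.samples(jags_mod, variable.names = "alpha", n.iter = 25000)
jags_draws <- unlist(lapply(jags_samples, function(x) as.numeric(x[, "alpha"])))

# These quantiles line up with rgamma draws just as closely
quantile(jags_draws, probs = c(0.05, 0.5, 0.95))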

Looking at the pairs plot of all the parameters from your latest stan_vs_jags_repex.R suggests the prior on p is inappropriate (far too much mass at 0 and 1), which is breaking the adaptation in some way and making the samples of \alpha non-representative. Indeed, if I turn up adapt_delta on your original example, I get warnings about divergences and max_treedepth.
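A sketch of the kind of call I mean (compiled_model stands in for the compiled model from your script):

# Raising adapt_delta forces a smaller stepsize during adaptation; on this
# model it surfaces divergence and max_treedepth warnings rather than fixing things
fit <- sampling(compiled_model, iter = 5e4, control = list(adapt_delta = 0.99))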

I suspect this is making the adaptation break in a strange way: you end up with an inappropriate stepsize that can’t move far enough per iteration. The gamma prior has a lot of mass near zero, and once alpha is log-transformed the sampler can’t travel far enough towards -\infty to correctly sample the prior.
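One way to check the stepsize directly (a sketch using rstan’s sampler diagnostics; fit is the fitted object from the call above):

# Per-chain adapted stepsize and divergence counts; a very small stepsize
# together with divergences would support the broken-adaptation explanation
sampler_params <- get_sampler_params(fit, inc_warmup = FALSE)
sapply(sampler_params, function(x) mean(x[, "stepsize__"]))
sapply(sampler_params, function(x) sum(x[, "divergent__"]))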


@hhau thanks for digging into this. The pairs plot is definitely interesting, although I’m not sure why the posterior for alpha in this case shouldn’t be the prior, especially because that is exactly what happens if you run the JAGS model. If there’s no likelihood, shouldn’t the marginal “posterior” for alpha just be the prior?

Mathematically yes; in practice one is still sampling a density using HMC, so the prior sampling process can break in all the same ways as the posterior sampling process. The following issues seem related to your model/prior:

I’m surprised that the sampler isn’t throwing more warning messages here. It is usually fairly warning-happy with these kinds of things, so I’ve probably missed something somewhere in the model.