Problems with divergence for prior predictive checks for Gamma model

So, I am currently searching for suitable priors for my hierarchical gamma regression model of a cognitive task in which human participants retrieve object locations in an area (measured: distance to the true location as Euclidean distance). The Gamma distribution seemed like a good choice for this, but I really struggle to find priors whose prior predictive checks don't come with divergences.

Facts about the data/model

  • y > 0 & y < 153
  • x is scaled to have mean of 0 and SD of 0.5
  • Because I don’t have strong reasons to assume a direction, I want the prior for x to be symmetric around zero. (For a different model, where I would expect y to get lower and lower, I would use a skew-normal distribution; I have the same issues there as in this example.)
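As a quick sanity check on these facts: the intercept prior I use below, normal(log(70.89713), 0.1) on the log scale, implies a median distance of about 70.9 with a 90% interval comfortably inside (0, 153):

```r
# Implied median distance under the intercept prior N(log(70.89713), 0.1),
# back-transformed through the log link
m1  <- log(70.89713)
sd1 <- 0.1
round(exp(qnorm(c(0.05, 0.5, 0.95), mean = m1, sd = sd1)), 1)
# roughly 60.1, 70.9, 83.6 -- all well inside (0, 153)
```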

Because I struggled so much, I just coded up a simple manual version of a prior predictive check that uses this formula:

\mu = \mathrm{exp}(\beta_0 + \beta_1 x)

So, I use this equation to draw what the relationships would look like for a simple regression model with the parameters specified in the priors (note: the intercept was chosen based on the mean of the simulation).


# Translate priors into mean & sd
# Intercept
m1  <- log(70.89713)
sd1 <- 0.1
# beta1
m2 <- 0
sd2 <- 0.5
n <- 100 # data points
percentile <- 0.05

# Create betas based on the percentiles
betas0 <- c(qnorm(percentile, mean = m1, sd = sd1), m1, qnorm(1-percentile, mean = m1, sd = sd1))
betas1 <- c(qnorm(percentile, mean = m2, sd = sd2), m2, qnorm(1-percentile, mean = m2, sd = sd2))
betas0_lab <- paste0("beta0_", c(percentile, 0.5, 1 - percentile))
betas1_lab <- paste0("beta1_", c(percentile, 0.5, 1 - percentile))

# Cross every beta0 value with every beta1 value
params <- expand.grid(betas0, betas1)
names(params) <- c("beta0", "beta1")
params_lab <- expand.grid(betas0_lab, betas1_lab)
names(params_lab) <- c("beta0", "beta1")

# Keep x for each beta; helper scales a vector to mean 0 and SD 0.5
scale_m0_sd0.5 <- function(v) 0.5 * (v - mean(v)) / sd(v)
x     <- scale_m0_sd0.5(runif(n))

# Create empty data.frame
manual_prior_check <- data.frame(beta0 = character(0), beta1 = character(0), x = numeric(0), y = numeric(0))

# Create the data
for(i in 1:nrow(params)){
  # Generate the data and add to the data frame
  # (params_lab column 1 holds the beta0 labels, column 2 the beta1 labels)
  manual_prior_check <- rbind(manual_prior_check, data.frame(beta0 = params_lab[i, 1],
                                                             beta1 = params_lab[i, 2],
                                                             x = x,
                                                             y = exp(params[i, 1] + params[i, 2] * x)))
}

The 9 lines produced this way correspond to the true relationships implied by the beta combinations. For visualisation, I plotted the line for the mean of each prior as well as for its 5th and 95th percentiles. That looks relatively reasonable to me, but when I try to run prior predictive checks for my data via

# Set brms priors
priors2  <- c(prior(normal(4.26123, 0.1), class = "Intercept", lb = 0, ub = log(153)),
              prior(normal(0, 0.5), class = "b")) 

# Fit brms model
currentModel <- brm(euclideanDistance | trunc(ub = 153) ~ s_start2centre + (s_start2centre | subject), 
                    data = OLM_7T_retrieval, family = Gamma(link = "log"),
                    prior = priors2, 
                    sample_prior = "only", seed = 1234,
                    control = list(adapt_delta = 0.99999, max_treedepth = 20),
                    backend = "cmdstanr",
                    silent = 2,
                    refresh = 0)

I always get divergent transitions. Why is that? Also, the conditional_effects plot looks like this:

Why is the blue mean line at the bottom? Am I doing something horribly wrong? I am really confused at the moment.

Things that do not seem to make a difference:

  • Removing the random effects
  • Removing the truncation

Additional information:

  • Operating System: Windows 10
  • brms Version: brms_2.18.0

Is there anything I am missing? In this thread, @bgoodri says

I would say that your priors are putting positive probability on regions of the parameter space with high curvature and / or low numerical accuracy but conditional on the data, those regions have zero probability. So, your priors are probably too weak or otherwise not that consistent with the data, but the data saved you and the posterior draws are fine to make inferences with.

but if anything, my priors are pretty narrow, are they not? I played around with various widths and nothing really made it better.

Could it be that the default link function for Gamma in brms is log, i.e., \mathrm{exp}(0.5) \approx 1.65? That is quite wide, perhaps?

Not sure if I understand your point correctly.

I understand that the log link in the equation above transforms \mu and therefore also makes the \beta values for the slope smaller, but I don’t understand what \mathrm{exp}(0.5) means in this context. FWIW, I tested SDs of 0.1, 0.01 and even 0.0001 (instead of 0.5) and I still get more or less the same number of divergent transitions. Does the log link transform my priors in a way I currently don’t understand (like making the prior essentially N(0, 1.648721))?

Generally, I feel (maybe wrongly) that even when I take extreme \beta values from N(0, 0.5) such as the 95th percentile, i.e. 0.8224268, I still get somewhat realistic numbers that don’t exceed the possible values.
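To make that concrete (a small sketch; taking the 95th-percentile slope and a 1-SD step in x is my own illustration): on the log scale a slope acts multiplicatively on the median distance, so even an extreme slope keeps the implied mean well below 153 within one SD of x:

```r
b1_95 <- qnorm(0.95, mean = 0, sd = 0.5)  # extreme slope, 0.8224268
median_dist <- 70.89713                   # implied by the intercept prior

# Multiplicative effect of moving x by one SD (0.5, since x is scaled to SD 0.5)
exp(b1_95 * 0.5)                # about 1.51
median_dist * exp(b1_95 * 0.5)  # about 107, still below the 153 cap
```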

I finally cracked the case. The problem was the default prior for the shape parameter, which I now set to

shape \sim Gamma(1, 1)

Now I get much more sensible results:
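For intuition on why the default shape prior caused the divergences (a sketch, assuming the default in this brms version is the very diffuse gamma(0.01, 0.01) that get_prior() reports for the shape class): most of its mass sits extremely close to zero, and since a Gamma with mean \mu and shape \alpha has variance \mu^2/\alpha, near-zero shapes imply absurdly overdispersed, numerically degenerate prior predictions. shape \sim Gamma(1, 1) keeps the shape in a sane range:

```r
set.seed(1234)
n  <- 1e4
mu <- 70.89713  # prior median distance implied by the intercept prior

# Diffuse default-style shape prior: the bulk of the draws are essentially zero
shape_default <- rgamma(n, shape = 0.01, rate = 0.01)
mean(shape_default < 1e-5)  # well over half the draws

# Prior predictive y with mean mu: Gamma(shape, rate = shape / mu).
# Near-zero shapes make the draws numerically degenerate (underflow to 0
# or explode), which is exactly the high-curvature region Stan struggles with.
y_default <- rgamma(n, shape = shape_default, rate = shape_default / mu)
mean(y_default == 0)        # large fraction of degenerate draws

# shape ~ Gamma(1, 1) avoids the near-zero region almost entirely
shape_new <- rgamma(n, shape = 1, rate = 1)
y_new     <- rgamma(n, shape = shape_new, rate = shape_new / mu)
mean(y_new > 0)             # essentially all draws are usable
```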