I am having difficulties setting custom priors for the sdgp and lscale parameters of a gaussian process and would appreciate some help figuring this out.
The statistical problem I am trying to solve is this:
We measure the values of a response variable (Response) yearly for 16 years and are interested in detecting a trend over time in these values. But the Response values are (small) proportions so we will use a beta regression with Year as a predictor. (Year is coded as 1, 2, …, 16 for simplicity.) To capture the effect of Year on the logit-transformed expected value of the Response, we will use a gaussian process via gp(Year) in brms. However, when checking the fit provided by the model to the data, it is clear we need to make the length-scale parameter (lscale) of the gaussian process smaller.
Here is the data:
Y <- c(0.5,0.3,0.4,0.7,0.3,0.8,0.2,0.4,0.4,0.1,0.3,0.9,0.4,1.0,0.9,0.5)
Y <- Y/100
X <- 1:length(Y)
Data <- data.frame(Year = X, Response = Y)
Here is a plot of the data, showing how the Response values vary from one year to the next; note the dip in the middle of the plot, which is impossible to capture via s smooth term of year, s(Year), given the relatively small number of years available:
library(ggplot2)
library(magrittr)
Data %>%
ggplot(aes(x = Year, y = Response)) +
geom_line(colour = "dodgerblue", size = 0.7) +
geom_point(size = 3, colour = "dodgerblue")
which looks like this:
Here is the fit of the beta regression model with default priors chosen by brms:
library(brms)
fit_default_priors <-
brm(bf(Response ~ gp(Year)),
family = "beta",
data = Data,
seed = 1234,
iter = 4000,
warmup = 2000,
chains = 4,
cores = 4,
control = list(adapt_delta = 0.99,
max_treedepth = 12))
fit_default_priors
The model summary looks like this and indicates that there are two Gaussian Process Terms, denoted as sdgp(gpYear) and lscale(gpYear) :
Here is a plot of the beta regression model fit corresponding to the default priors used by brms:
library(emmeans)
library(tidybayes)
library(ggdist)
model_fit_default_priors <-
fit_default_priors %>%
emmeans(~ Year,
at = list(Year = seq(1, 16, by = 1)),
epred = TRUE) %>%
gather_emmeans_draws()
model_fit_default_priors
plot_model_fit_default_priors <-
model_fit_default_priors %>%
ggplot() +
aes(x = Year, y = .value) +
stat_lineribbon() +
scale_fill_brewer(palette = "Reds") +
labs(x = "Year",
y = "Response (Expressed as Proportion)",
fill_ramp = "Credible interval") +
geom_line(data = Data, aes(x = Year, y = Response),
inherit.aes = FALSE, col = "dodgerblue", size = 0.7, linetype = 2) +
geom_point(data = Data, aes(x = Year, y = Response),
inherit.aes = FALSE, size = 3, col = "dodgerblue")
plot_model_fit_default_priors
The plot of the model fit is shown below - as can be seen, the plot illustrates that we need to tweak the choice of priors to help the model capture the pattern of the data located in the middle of the plot.
Getting the default priors used by brms is easy:
get_prior(data = Data,
family = "beta",
Response ~ gp(Year))
This is what brms uses for the priors:
My first set of questions is this: If the gaussian process used to capture the effect of Year has two parameters, denoted by lscale and sdgp, why does brms report 4 priors instead of 2 priors? Where do the other 2 priors come from and why are they needed? If we want to change the default priors used by brms for the two parameters, which of the 4 priors do we need to change?
In an attempt to change the priors, I referred to a related example by Solomon Kurz, which is available here: https://bookdown.org/ajkurz/Statistical_Rethinking_recoded/adventures-in-covariance.html#continuous-categories-and-the-gaussian-process. Here’s a snapshot of the related example, which shows that we would only need to change one prior per gaussian process term:
But brms throws warnings when I try to follow the technique for setting priors illustrated in this example:
fit_custom_priors <-
brm(bf(Response ~ gp(Year)),
family = "beta",
prior = c(
set_prior("inv_gamma(1.743402, 0.307202)", class = "lscale"),
set_prior("student_t(3, 0, 2.5)", class = "sdgp")),
data = Data,
seed = 1234,
iter = 4000,
warmup = 2000,
chains = 4,
cores = 4,
control = list(adapt_delta = 0.99,
max_treedepth = 12))
fit_custom_priors
The warnings posted by brms are listed below, indicating that “The global prior ‘inv_gamma(1.743402, 0.307202)’ of class ‘lscale’ will not be used in the model as all related coefficients have individual priors already.”:
My second set of questions is as follows: What does the first warning mean, why is it posted and how do I overcome it? Why is brms mentioning a global prior for the lscale parameter? Is there a local prior for this parameter as well? If yes, do we need to change both the global and the local prior (as per my first set of questions)? Are there also a global prior and a local prior for the sdgp parameter?
Thank you in advance for any light you can shed on these issues.
Isabella