Specifying bernoulli prior for zero-inflated beta response variable in brm() HGAM

Hi Stan community,

I am new to this forum so please bear with me. For some context, in my work I would like to examine individual-level differences in the response of my study species toward a more frugivorous diet. My response variable is “prop.fruit” which represents the proportion (in volume) of the fecal sample that is comprised of fruit. This value can range between 0.00 and 1.00 and after plotting my data, it is clear that it follows a zero-inflated beta distribution. I believe that using the brms package to fit an HGAMM will provide the necessary flexibility I need to appropriately model the various variables in my system.

My model is currently specified as the following:

library(mgcv)
library(brms)
library(readxl)
library(cmdstanr)

data ← read_excel(“~/Directory/data.xlsx”, sheet = “data1”)
data$ind ← as.factor(data$ind)
data$town ← as.factor(data$town)
data$town_ind ← paste0(data$town, “_”, data$ind)

fit ← brm(bf(prop.fruit ~ s(time.index, by=ind, m=1, bs=“tp”) +
s(ind, bs=“re”) +
(1|town) +
(1|town_ind),
phi ~ 1 + (1|town) + (1|town_ind) + bls + cas + oas + sps + srs,
zi ~ 1 + s(time.index) + (1|town) + (1|town_ind) + blp + cap + oap + spp + srp),
family = zero_inflated_beta(),
data=data,
chains=4,
iter=2000,
warmup=1000,
cores=14,
seed=1234,
backend=“cmdstanr”)

time.index = integer variable representing the sampling day (from 1 through 548).
town = the population of this species I am working with. Categorical random effect with three levels.
ind = categorical random effect of the specific individual I collected the data from.
town_ind = a concatenated category representing each given individual in its own “town.” None of the individuals moved from one town to another over the course of the study.
bls:srs = 5 different focal plant species whose seeds were recovered from fecal samples. This is an integer count of seeds recovered in the fecal sample where prop.fruit is collected from. I wanted to use this data to model the dispersion of the phi parameter in the ZIB distribution.
blp:srp = the same 5 focal plant species as above, but this is a record of the ripe fruit phenology for each of the 12 calendar months in a year. The phenology data was recorded for each plant species by summing the presence/absence of ripe fruits throughout the course of multiple years in a particular month and then dividing that number by the total number of observations made for that species in that month. This was used to estimate the proportion of observations that contained ripe fruit for each species in each calendar month.

I want to note that this is my first experience with any Bayesian analysis, and I’ve learned very quickly that this is a pretty complex problem I have here, especially because I think it would be important to account for time-lags in the blp:srp variables, but I’ll save that for later.

For now, I wanted to ask how could I specify the prior of the zi() portion of my model, when I expect the presence of zero values to follow a Bernoulli distribution where the probability of obtaining a prop.fruit value of 0 is about 75%? I can’t seem to find any guidance on how to specify a Bernoulli prior within brm().

Thank you for your help and in welcoming me to the Stan community in my first post.

Hey @AdrianEcology, welcome to the Stan forums!

If this is first experience with Bayesian analysis, I would back up and start with a simpler version of your model, and build up from there. Otherwise, it’s just too many moving parts. With that in mind, consider:

fit0.0 <- brm(
  prop.fruit ~ s(time.index, by=ind, m=1, bs="tp") + s(ind, bs="re") + (1|town) + (1|town_ind),
  family = zero_inflated_beta(),
  data = data,
  chains = 4, iter = 2000, warmup = 1000, cores = 14, seed = 1234,
  backend = "cmdstanr")

where you temporarily drop the linear models for phi and zi. In this case, the brm() default is to use the identity link for zi, with a default prior of beta(1, 1). With the conventional \alpha \beta parameterization, the mean of the beta distribution is \alpha / (\alpha + \beta). So a beta prior with a 3-to-1 \alpha-to-\beta ratio will put your prior mean for zi at .75. Here’s a quick example in code:

a <- 3 
b <- 1

a / (a + b)
[1] 0.75

If you wanted a stronger prior, use something like beta(30, 10).

Now if you add in a reduced version of your full zi model, like

fit0.1 <- brm(
  bf(prop.fruit ~ s(time.index, by=ind, m=1, bs="tp") + s(ind, bs="re") + (1|town) + (1|town_ind),
     zi ~ 1 + (1|town) + (1|town_ind)),
  family = zero_inflated_beta(),
  data = data,
  chains = 4, iter = 2000, warmup = 1000, cores = 14, seed = 1234,
  backend = "cmdstanr")

the brm() function will automatically switch to the logit link for zi. On the logit scale, 0.75 is about 1.1.

qlogis(0.75)
[1] 1.098612

So in this case, you could use something like \mathcal N(1.1, 1) for the intercept of the zi model. Here’s what that could look like in code:

fit0.2 <- brm(
  bf(prop.fruit ~ s(time.index, by=ind, m=1, bs="tp") + s(ind, bs="re") + (1|town) + (1|town_ind),
     zi ~ 1 + (1|town) + (1|town_ind)),
  # this is where you set your prior
  prior = prior(normal(1.1, 1), class = Intercept, dpar = zi),
  family = zero_inflated_beta(),
  data = data,
  chains = 4, iter = 2000, warmup = 1000, cores = 14, seed = 1234,
  backend = "cmdstanr")

The difficulty in your case is you have other predictors in the full version of your zi model. So you’ll need to consider whether they’re mean centered and so on. But this should get you moving.