Hi everyone,
I would like to specify a prior distribution for my response variable, which follows a zero-inflated beta (ZIB) distribution. For some context, I want to create a Hierarchical Generalized Additive Mixed Model in brms().
After specifying the ZIB family in my code, I wanted to specify my prior for the “Intercept” class of dpar “mu.” Since my understanding is that mu considers only the non-zero values in the ZIB model, I wanted to specify a weakly informative prior that matches the distribution of my non-zero data values without being too informative.
After performing some prior predictive simulation, a weakly informative prior distribution is the Bimodal Skew Symmetric Normal Distribution (BSSN) which I have specified below. Below is the code from my simulation:
library(brms)
library(readxl)
library(cmdstanr)
library(tidyverse)
library(gamlssbssn)
data <- read_excel("~/data.xlsx", sheet = "data1")
# Filter out non-zero values of response variable
response.nz <- data %>% filter(response>0)
response.nz <- data.nz$response
### Perform prior predictive simulation with user-specified values
sample.prior <- rBSSN(n = 15000, mu = 0.01, sigma = 0.125, nu = 0.75, tau = 0.40)
sample.prior <- subset(sample.prior, sample.prior >= 0 & sample.prior <= 1)
# Plot histogram, density, and qq plots to compare the distribution of prior sample data vs. actual non-zero data being analyzed
par(mfrow=c(1,2))
{
hist(sample.prior, xlab = 'response.prior values', main = '')
hist(response.nz, xlab = 'response non-zero values o
![prior vs. reponse histo|690x399](upload://aVXOWlG0x79u48p65zkY215GqRV.png)
nly', main = '')
}
{
plot(density(sample.prior), xlab = 'response.prior values', main = '')
plot(density(response.nz), xlab = 'response non-zero values only', main = '')
}
{
car::qqPlot(sample.prior, xlab = 'response.prior values', main = '')
car::qqPlot(response.nz, xlab = 'response non-zero values only', main = '')
}
This distribution allows the user to specify two means where the bimodality takes place, and in my case, I set mu (first peak) to 0.01 and nu (second peak) to 0.75. My sigma value is pretty wide (0.125), considering the response is bounded by 0 and 1, and I set tau (which determines the amount of bimodality in the distribution) to 0.40.
After I sampled from the BSSN distribution, I subsetted my prior distribution samples to lie between 0 and 1, to match the nature of my response variable. The subsequent plots matched my data well enough to be informative, but weak enough to (I believe) allow the data to prove me wrong (see below).
Below I have written the code for my HGAMM model, which I call fit.
fit <- brm(bf(response ~ s(x, m=2, bs="tp") +
s(x, by=fac, m=1, bs="tp") +
(1|fac) +
(1|ref),
phi ~ 1 + s(x) + (1|fac) + (1|ref),
zi ~ 1 + s(x) + (1|fac) + (1|ref)),
family = zero_inflated_beta(link = "logit"),
prior = c(
set_prior("**insert BSSN distribution here**", class = "Intercept", dpar = "mu"),
set_prior("normal(0,1.6)", class = "sds", dpar = "mu"),
),
data=data,
chains=4,
iter=8000,
warmup=4000,
cores=4,
seed=1234,
backend="cmdstanr")
response = ZIB distributed response variable. High incidence of zero values ~80%, non-zero values all <1.00.
x = discrete integer predictor
fac = categorical factor with 14 levels
ref = random effect with 3 levels
In my fit model, I would like to specify my BSSN prior for the Intercept value of “mu,” but as I mentioned, I’m having trouble finding a way to do so. Any suggestions on how to do this would be greatly appreciated.
Thank you Stan, rStan, and brms() community!