Specifying Bimodal Skew Symmetric Normal Distribution as Prior in brms()

Hi everyone,

I would like to specify a prior distribution for my response variable, which follows a zero-inflated beta (ZIB) distribution. For some context, I want to create a Hierarchical Generalized Additive Mixed Model in brms().

After specifying the ZIB family in my code, I wanted to specify my prior for the “Intercept” class of dpar “mu.” Since my understanding is that mu considers only the non-zero values in the ZIB model, I wanted to specify a weakly informative prior that matches the distribution of my non-zero data values without being too informative.

After performing some prior predictive simulation, a weakly informative prior distribution is the Bimodal Skew Symmetric Normal Distribution (BSSN) which I have specified below. Below is the code from my simulation:

library(brms)
library(readxl)
library(cmdstanr)
library(tidyverse)
library(gamlssbssn)

data <- read_excel("~/data.xlsx", sheet = "data1")

# Filter out non-zero values of response variable
response.nz <- data %>% filter(response>0)
response.nz <- data.nz$response

### Perform prior predictive simulation with user-specified values
sample.prior <- rBSSN(n = 15000, mu = 0.01, sigma = 0.125, nu = 0.75, tau = 0.40)
sample.prior <- subset(sample.prior, sample.prior >= 0 & sample.prior <= 1)

# Plot histogram, density, and qq plots to compare the distribution of prior sample data vs. actual non-zero data being analyzed
par(mfrow=c(1,2))
{
  hist(sample.prior, xlab = 'response.prior values', main = '')
  hist(response.nz, xlab = 'response non-zero values o
![prior vs. reponse histo|690x399](upload://aVXOWlG0x79u48p65zkY215GqRV.png)
nly', main = '')
}

{
  plot(density(sample.prior), xlab = 'response.prior values', main = '')
plot(density(response.nz), xlab = 'response non-zero values only', main = '')
}

{
  car::qqPlot(sample.prior, xlab = 'response.prior values', main = '')
  car::qqPlot(response.nz, xlab = 'response non-zero values only', main = '')
}

This distribution allows the user to specify two means where the bimodality takes place, and in my case, I set mu (first peak) to 0.01 and nu (second peak) to 0.75. My sigma value is pretty wide (0.125), considering the response is bounded by 0 and 1, and I set tau (which determines the amount of bimodality in the distribution) to 0.40.

After I sampled from the BSSN distribution, I subsetted my prior distribution samples to lie between 0 and 1, to match the nature of my response variable. The subsequent plots matched my data well enough to be informative, but weak enough to (I believe) allow the data to prove me wrong (see below).



Below I have written the code for my HGAMM model, which I call fit.

fit <- brm(bf(response ~ s(x, m=2, bs="tp") + 
              s(x, by=fac, m=1, bs="tp") + 
              (1|fac) + 
              (1|ref),
              phi ~ 1 + s(x) + (1|fac) + (1|ref),
              zi ~ 1 + s(x) + (1|fac) + (1|ref)),
           family = zero_inflated_beta(link = "logit"),
           prior = c(
             set_prior("**insert BSSN distribution here**", class = "Intercept", dpar = "mu"),
             set_prior("normal(0,1.6)", class = "sds", dpar = "mu"),
             ),
           data=data,
           chains=4, 
           iter=8000, 
           warmup=4000, 
           cores=4, 
           seed=1234, 
           backend="cmdstanr")

response = ZIB distributed response variable. High incidence of zero values ~80%, non-zero values all <1.00.
x = discrete integer predictor
fac = categorical factor with 14 levels
ref = random effect with 3 levels

In my fit model, I would like to specify my BSSN prior for the Intercept value of “mu,” but as I mentioned, I’m having trouble finding a way to do so. Any suggestions on how to do this would be greatly appreciated.

Thank you Stan, rStan, and brms() community!

2 Likes

I think the easiest way to inject a prior like this into a brms model will be to set an improper flat prior via set_priorand then to pass a stan implementation of the desired prior density via stanvar. See User-defined variables passed to Stan — stanvar • brms

Couple of quick comments though:

  • You don’t need your prior on the intercept to match the data. It’s possible that you residuals, after regressing out the covariates, won’t be bimodal at all.
  • On the other hand, even with this bimodal prior, you will not get a bimodal posterior as long as the data are informative. That is, the data will (hopefully) tell you which of your two prior modes the intercept falls into, and your posterior for the intercept won’t be bimodal. Hopefully, your posterior predictive distribution for the observed data will still be bimodal, but that will be thanks to the covariates.
  • If the covariates don’t contain enough information to distinguish which mode your data points fall into, then you will likely end up with a posterior predictive distribution that is smeared out and not bimodal, regardless of whether you prior is bimodal. The remedy, if you covariates are not sufficiently predictive, is not to change the priors on the intercept, but rather to change the likelihood function such that it predicts a bimodal response, conditional on the parameters and covariates. For example, your response could be a mixture of two betas. This is possible to implement with brms but requires constructing a custom family. See Define Custom Response Distributions with brms to get started.