Zero-one beta regression with only categorical predictors

Helsinki-Ronan · September 28, 2023, 9:18am

Hi all,

I am working with frequency data where the response variable consists of 500 observations for the affinity of the population of a town to one of two, larger population clusters. There are 500 different towns. Values at or closer to zero represent greater affinity to the larger “A” population. Values at or closer to one represent greater affinity to the larger “B” population. The explanatory variables consist of four two-level categorical variables. These variables represent administrative borders where a “0” value means that the given town was on the A side of the administrative border. A value of “1” means that the given town was on the B side of the administrative border. I am interested in whether which side of the border a town falls on can explain the population affinity of that town.

I have fit a zero-one beta regression model to the data. Initial model runs using default brms priors had very long computation times and a lot of divergent transitions. I “solved” this by setting a normal prior to the condition-one inflation parameter. Given that the data mostly consists of counts of 0s and 1s, I specified an informed prior on the mu parameter of the “non-zero” and “none-one” frequencies. This was a normal prior with a mean and standard deviation equal to the mean and standard deviation of the actual “non-zero” and “non-one” frequencies.

My questions are these: (1) is this an appropriate statistical model for data such as these? (2) if it is appropriate, are the priors correctly specified (I am not a statistician but I have the feeling that the priors I used are inappropriate), and (3) how can I evaluate model performance given that LOO isn’t appropriate (as far as I know) for models with a zero-inflation component.

The below R code generates data and fits a model that replicates my actual data and model. I am not in a position to share the actual data.

Thank you in advance!

Operating System: Windows 10 Enterprise LTSC, version 21H2, OS Build 19044.3324
brms Version: 2.20.1

library(brms)

# 1.0 Create dummy data where the response variable is both zero-inflated and one-conditional inflated----
one_inflation<- rbinom(n = 225, size = 1, prob = 0.97)
zero_inflation<- rbinom(n = 225, size = 1, prob = 0.03)
frequencies<- runif(50, 0.01, 0.99)
data<- rbind(c(one_inflation, zero_inflation, frequencies))

data_1<- as.data.frame(t(data))
names(data_1)[1]<- "pop_frequency"

data_1$border_1<- rbinom(n = 500, size = 1, prob = 0.80)
data_1$border_2<- rbinom(n = 500, size = 1, prob = 0.70)
data_1$border_3<- rbinom(n = 500, size = 1, prob = 0.50)
data_1$border_4<- rbinom(n = 500, size = 1, prob = 0.60)

# 2.0 Run model----
model_formulae<- bf(pop_frequency ~ border_1 + border_2 + border_3 + border_4,
                            phi ~ border_1 + border_2 + border_3 + border_4, 
                            zoi ~ border_1 + border_2 + border_3 + border_4,
                            coi ~ border_1 + border_2 + border_3 + border_4)

model_1<- brm(formula = model_formulae,
                      
                      data = data_1,
                      
                      chains = 4, cores = 8, iter = 11000, warmup = 1000,
                      
                      family = zero_one_inflated_beta(),
                      
                      # Set a normal prior on the conditional one-inflation parameter as the beta prior
                      # leads to non-convergence and very long computation times----
                      prior = c(prior(normal(0, 1), class = "b", dpar = "coi"), 
                                prior(normal(0.4828837,0.2706979), class = "b", dpar = "mu")), 
                      
                      control=list(max_treedepth=15, adapt_delta = 0.99),
                      
                      sample_prior = "yes")


# Model checks----
mcmc_plot(model_1, type = "acf")
pp_check(model_1)
plot(model_1)
prior_summary(model_1)```

Solomon · September 29, 2023, 5:42pm

Hey @Helsinki-Ronan, it sounds like you might want to refresh yourself with the ZOIB parameters and their default link functions. Those will give you a better sense of how to interpret the unit normal prior. If you haven’t seen it, Andrew Heiss has a nice blog post on the ZOIB here. For what it’s worth, I really like the unit normal prior for \beta coefficients on the log-odds scale. It makes for a nice weakly-regularizing default.

Topic		Replies	Views
Interpretation of the a zero-one inflated beta regression model estimated using brms General	4	1461	November 15, 2021
Improving model fit with zero_one_inflated_beta with a specific case study Modeling brms	4	689	August 24, 2022
Zero-one-inflated beta regression with brms not finishing the chains Modeling techniques , fitting-issues , mcmc , brms	5	931	October 18, 2023
Inclusion of Conditional Autoregressive priors to a zero and one inflated beta model in BRMS brms	1	160	February 23, 2024
Zero-One Inflated Beta Model and Syntax brms General brms	7	2538	September 19, 2021

Zero-one beta regression with only categorical predictors

Related topics