Issue with group-level effects using 0-1-inflated beta family


#1

Hi,

I have been working on a model that on the surface seems easy but has proved challenging for me. I have a proportional response variable y that represents proportion of presence of a species in a habitat, and a predictor x which represents a body measurement. There are 112 measurements from 20 species. The data are included in the .csv below. It is important to note that, though not shown here, this is a phylo regression model that will hopefully be used to predict y for species of unknown habitat affinity.

As the response includes ones and zeros, I am trying out the zero-one-inflated beta family. The simplest model seems to work ok.

mod1 <- brm(
  y ~  x , data = data, 
  family = zero_one_inflated_beta("logit"),
  iter = 4000, 
  prior = c(
    set_prior("normal(0, 50)", class = "b", coef = "x"),
    set_prior("normal(0, 50)", class = "Intercept")
  ),
  control = list(adapt_delta = 0.95, max_treedepth = 15),
  cores = parallel::detectCores()
)

Summary and marginal effects plot:

 Family: zero_one_inflated_beta 
  Links: mu = logit; phi = identity; zoi = identity; coi = identity 
Formula: y ~ x 
   Data: data (Number of observations: 112) 
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup samples = 8000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept   -10.18      1.16   -12.45    -7.93       3639 1.00
x            14.26      1.74    10.89    17.68       4161 1.00

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
phi     3.30      0.58     2.27     4.54       3901 1.00
zoi     0.25      0.04     0.17     0.33       6862 1.00
coi     0.24      0.08     0.10     0.41       6243 1.00

However when I apply the group effect (1 | Species) to account for within-species measurement error, my results seem off

mod2 <- brm(
  y ~  x + (1 | Species), data = data, 
  family = zero_one_inflated_beta("logit"),
  iter = 4000, 
  prior = c(
    set_prior("normal(0, 50)", class = "b", coef = "x"),
    set_prior("normal(0, 50)", class = "Intercept")
  ),
  control = list(adapt_delta = 0.95, max_treedepth = 15),
  cores = parallel::detectCores()
)

Summary:

Group-Level Effects: 
~Species (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     2.80      0.67     1.85     4.44       1553 1.00

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -1.89      0.82    -3.54    -0.23        897 1.00
x             0.01      0.19    -0.35     0.39       8000 1.00

Family Specific Parameters: 
    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
phi  3552.77    589.72  2502.62  4813.25       8000 1.00
zoi     0.25      0.04     0.17     0.33       8000 1.00
coi     0.24      0.08     0.11     0.40       8000 1.

You cans see that the mean for x goes to 0, and the phi error is very large. I’ve tried a few reasonable group-level priors and nothing changes. I would expect some variation in the results, as there is certainly intraspecific variation in the measurements, but this seems like a bit much.

  1. With my limited experience modeling, I am not sure if this is “working” and the variance cancels out any population-level effect, or if this is a bug, or if I am just not understanding this inflated beta family.

  2. Perhaps the problem is bigger. Looking at the data in the plot, does zero-one-inflated beta seem like a good family selection? My most extreme x values are not ones or zeros. I have tried out ordinal models but the strong bimodal distribution in the data leaves few intermediate values. Any advice on this would be very helpful!

Thanks for your time

Test_Data.csv (3.2 KB)

  • Operating System: OSX 10.11.6
  • brms Version: 2.3.4

#2

Phi is a precision parameter which means that higher values imply less variation. Perhaps this answers your question already?


#3

Thanks for responding Paul.

I’m not sure I understand, but my guess is that the large within-group variation decreases the between group variation, resulting in a large phi and a bx value close to 0, and suggesting no effect of x on y.

It looks like I have more research to do. Thanks again for the help