Hi,
I have been working on a model that on the surface seems easy but has proved challenging for me. I have a proportional response variable y that represents proportion of presence of a species in a habitat, and a predictor x which represents a body measurement. There are 112 measurements from 20 species. The data are included in the .csv below. It is important to note that, though not shown here, this is a phylo regression model that will hopefully be used to predict y for species of unknown habitat affinity.
As the response includes ones and zeros, I am trying out the zero-one-inflated beta family. The simplest model seems to work ok.
mod1 <- brm(
y ~ x , data = data,
family = zero_one_inflated_beta("logit"),
iter = 4000,
prior = c(
set_prior("normal(0, 50)", class = "b", coef = "x"),
set_prior("normal(0, 50)", class = "Intercept")
),
control = list(adapt_delta = 0.95, max_treedepth = 15),
cores = parallel::detectCores()
)
Summary and marginal effects plot:
Family: zero_one_inflated_beta
Links: mu = logit; phi = identity; zoi = identity; coi = identity
Formula: y ~ x
Data: data (Number of observations: 112)
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup samples = 8000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -10.18 1.16 -12.45 -7.93 3639 1.00
x 14.26 1.74 10.89 17.68 4161 1.00
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
phi 3.30 0.58 2.27 4.54 3901 1.00
zoi 0.25 0.04 0.17 0.33 6862 1.00
coi 0.24 0.08 0.10 0.41 6243 1.00
However when I apply the group effect (1 | Species)
to account for within-species measurement error, my results seem off
mod2 <- brm(
y ~ x + (1 | Species), data = data,
family = zero_one_inflated_beta("logit"),
iter = 4000,
prior = c(
set_prior("normal(0, 50)", class = "b", coef = "x"),
set_prior("normal(0, 50)", class = "Intercept")
),
control = list(adapt_delta = 0.95, max_treedepth = 15),
cores = parallel::detectCores()
)
Summary:
Group-Level Effects:
~Species (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 2.80 0.67 1.85 4.44 1553 1.00
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -1.89 0.82 -3.54 -0.23 897 1.00
x 0.01 0.19 -0.35 0.39 8000 1.00
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
phi 3552.77 589.72 2502.62 4813.25 8000 1.00
zoi 0.25 0.04 0.17 0.33 8000 1.00
coi 0.24 0.08 0.11 0.40 8000 1.
You cans see that the mean for x goes to 0, and the phi error is very large. I’ve tried a few reasonable group-level priors and nothing changes. I would expect some variation in the results, as there is certainly intraspecific variation in the measurements, but this seems like a bit much.
-
With my limited experience modeling, I am not sure if this is “working” and the variance cancels out any population-level effect, or if this is a bug, or if I am just not understanding this inflated beta family.
-
Perhaps the problem is bigger. Looking at the data in the plot, does zero-one-inflated beta seem like a good family selection? My most extreme x values are not ones or zeros. I have tried out ordinal models but the strong bimodal distribution in the data leaves few intermediate values. Any advice on this would be very helpful!
Thanks for your time
Test_Data.csv (3.2 KB)
- Operating System: OSX 10.11.6
- brms Version: 2.3.4