Setting priors for unknown data in zero-one inflated beta

Hello,

I am new to Bayesian statistics and am running brms in R (version 2.21.2) on Windows 10.

I have a fairly simple dataset of seedsets of seven different plant species, which I collected from four different elevations on a mountain. I have created pollen limitation indexes from these seedsets (an index to assess whether a plant has enough pollen to produce the maximum amount of seeds). The index ranges from 0 to 1 (0 means no pollen limitation and 1 indicates maximum possible pollen limitation). All the pollen limitation indexes are elevation and species specific.

The question I set out to answer is whether (and how significantly) the amount of pollen limitation varies between individual elevations, with species set as a random factor. A quick look at a histogram makes it apparent that I would have to use some zero-one inflated beta distribution, since both the 0’s and the 1’s are important in my research question.


I decided to try my luck with brms and, after learning about the joys of Bayesian statistics for a week, the best model I could come up with was this:

zoib_model10 <- brm(
  bf(
    PL.index ~ elevation + (1|species),
    phi ~ elevation + (1|species),
    zoi ~ elevation + (1|species),
    coi ~ elevation + (1|species)
  ),
  data = PL.indexes,
  control = list(adapt_delta = 0.99,
                 max_treedepth = 15),
  chains = 4, iter = 2000, warmup = 1000,
  seed = 1234,
  family = zero_one_inflated_beta(),
  init = 0,
  backend = "cmdstanr",
  file = "zoib_model10"
)

The adapt_delta and max_treedepth are set pretty high, yet I am still getting some warnings:
#*Warning: 2 of 4000 (0.0%) transitions ended with a divergence.
#*Warning message: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help
For the first warning, I have already tried increasing the max_treedepth up to 20, but the same warning still persists. For the second warning, I cannot set the adapt_delta higher.

Are these two warnings of concern, or can I ignore them and analyze my model?

I have understood from the blogs here that, even if I could, setting the delta higher would not necessarily be better and that it would be preferable to set priors. However, I am not sure that I can…


Maybe this is more of a question about how to interpret the priors. I have no expectations about the data, since no similar studies looking at pollen limitation in these conditions exist. And since I have such zero-one inflated data, I cannot set any normal priors, since that would be inaccurate. Or is it best to set priors which would be similar to the histogram, which I uploaded above? If so, how could I do this? I have not found anywhere what would be proper priors for my distribution of data.

Thanks in advance for any suggestions.
Dominik

Hello @Dominik_Anyz, yes I suspect that it will be very difficult to achieve good results from this model without meaningful priors as there is a lot going on and it will want more regularization. You probably should work upwards from the simplest possible representation of this system to understand it fully and implement priors along the way.

If, as you say, there is no substantive knowledge about this system, then it may be more helpful to approach this with weakly informative priors. Presuming that the beta component of your model, for example, uses (link = logit), then the coefficients for this component represent effects on the logistic scale. A potential selection for weakly informative priors for categorical coefficients on that scale, presuming that very large effects are unlikely, is something like N(0,2), i.e. a normal with mean 0 and standard deviation 2. Roughly 95% of that prior's mass lies within two standard deviations, so within ±4 on the logit scale. Given an intercept of 0, which corresponds to 0.5 on the response scale, i.e. plogis(0) = 0.5, this suggests that the effect of the categorical predictor probably wouldn’t result in a transition to beyond plogis(0-4) = 0.018 or plogis(0+4) = 0.98.
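As a rough sketch of the arithmetic above, the following base R code computes the response-scale range implied by an N(0,2) prior and simulates from it. The last part shows one way such a prior might be handed to brms. The object name zoib_priors and the choice of which model components get the prior are assumptions for illustration, not something taken from the original model. phi is left out on purpose because it is normally on a log scale rather than a logit scale, so the same reasoning would not carry over directly.

```r
# Implied range on the response scale for a normal(0, 2) prior on a
# logit-scale coefficient, starting from an intercept of 0.
intercept <- 0
prior_sd  <- 2

# Two prior standard deviations either side of the intercept.
lower <- plogis(intercept - 2 * prior_sd)  # about 0.018
upper <- plogis(intercept + 2 * prior_sd)  # about 0.982

# Simulate from the prior to see the whole implied distribution
# on the response scale, not just its two-SD bounds.
set.seed(1234)
draws <- plogis(intercept + rnorm(10000, mean = 0, sd = prior_sd))
prior_interval <- quantile(draws, probs = c(0.025, 0.5, 0.975))

# Assumption: one way to express this prior for brms. The selection of
# components (mean, zoi, coi) is illustrative only.
if (requireNamespace("brms", quietly = TRUE)) {
  zoib_priors <- c(
    brms::set_prior("normal(0, 2)", class = "b"),
    brms::set_prior("normal(0, 2)", class = "b", dpar = "zoi"),
    brms::set_prior("normal(0, 2)", class = "b", dpar = "coi")
  )
}
```

If this route is taken, zoib_priors would be passed to brm() through its prior argument. Printing prior_interval is a quick way to judge whether the implied range looks plausible for a pollen limitation index before fitting anything.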

It would probably be unwise to try and use the data to define prior information because this risks circular reasoning. The prior information should be justified externally to the extent possible.

A side note is that the zero-one-inflated-beta model here may be overcomplicated, unless the 0 and 1 have a concrete meaning beyond just ‘some arbitrarily extreme value’. You might be interested in the model described here (New paper using Stan/brms to estimate feeling thermometers/VAS scales).

Yes that is a very top heavy ZOIB parameterization if you have the same linear model for all submodels. Ordbetareg will fit that better – see R package ordbetareg on CRAN (uses brms as a back-end).


Hi Andrew,

thanks for the reply. Although the 0’s and 1’s are important in my research question and tell me whether the plants at each elevation were or were not pollen limited, I believe that the ordbetareg package might be exactly what I am looking for. I will give it a go and see.

Hello Robert,

thanks for the reply. After a quick read I believe that your package might be the right choice for me. I will read up more on it and try to fit my data into an ordbetareg model.


Hi @saudiwin Robert,

so I did try to fit my data into an ordbetareg model, but I have some questions about it.

First I would like to ask, is there any way to check the validity/goodness of fit of the model? The reason why I made the original post was that the warnings popped up; without them, however, I would not have known whether the model was a good fit or not. Is there any way of checking this, other than comparing the model visually to other models made with different packages?

Next, I have a question about the results, when I try to interpret them. I believe that what the model summary and the marginal effects are telling me is that there is not a “significant” difference in the PL index between elevations 2300 (Intercept), 2800 and 3500. There is a difference in elevation 4000 compared to the intercept. Would I be correct in drawing these conclusions from the outputs, or would I have to check something else?
[Two attached images (obr, obr2), not reproduced here]

Thank you in advance for any answers.

Hi Dominik -

You can use all brms functions for model evaluation, like loo, so see the brms docs for more info. Also the package has a pp_check_ordbeta function for creating posterior predictions that are specific to this model.
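A minimal sketch of how these checks might be bundled together, assuming a model object that has already been fitted with ordbetareg() and is therefore a brmsfit. The helper name check_ordbeta_fit and the ndraws default are hypothetical and chosen for illustration. The function is only defined here, not run, since running it needs a fitted model.

```r
# Hypothetical helper: collects the model checks mentioned above.
# Assumes `fit` was produced by ordbetareg::ordbetareg().
check_ordbeta_fit <- function(fit, ndraws = 100) {
  stopifnot(inherits(fit, "brmsfit"))
  list(
    # Approximate leave-one-out cross-validation, as for any brms model.
    loo = brms::loo(fit),
    # Generic brms posterior predictive check.
    pp_generic = brms::pp_check(fit, ndraws = ndraws),
    # Model-specific posterior predictive plots from ordbetareg.
    pp_ordbeta = ordbetareg::pp_check_ordbeta(fit, ndraws = ndraws)
  )
}
```

With a fitted model, say one from ordbetareg(PL.index ~ elevation + (1 | species), data = PL.indexes), calling check_ordbeta_fit() on it would return the loo result and the plots for inspection. Posterior predictive draws that reproduce the observed share of 0's and 1's and the shape of the values in between are a reasonable sign that the model fits.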
