Help finding an appropriate distribution for a positive, continuous response variable that includes zeros

Hi,
I am working with a response variable that represents the percentage of trees per sampling plot that have been partially eaten by insects (i.e., % of trees with defoliation signs). This variable is positive and continuous, but it also includes zeros because there are no defoliated trees in some plots. The distribution of the values looks like this:

Initially, I considered the Log-normal distribution and the Gamma distribution for modeling this variable. However, these distributions can’t handle the zeros in the dataset.
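To illustrate why zeros are a problem here, below is a minimal Python sketch of a log-normal log-density. The function name and the example values are hypothetical and are not taken from the actual dataset. Because the density depends on log(y), a single zero drives the log-likelihood of the whole sample to negative infinity.

```python
import math

def lognormal_logpdf(y, mu, sigma):
    """Log-density of a log-normal distribution; only defined for y > 0."""
    if y <= 0:
        # Zero lies outside the support, so its log-density is -infinity.
        return float("-inf")
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

# Hypothetical percentages, including one plot with no defoliated trees.
percent = [0.0, 12.5, 40.0]
loglik = sum(lognormal_logpdf(y, mu=3.0, sigma=1.0) for y in percent)
print(loglik)  # the single zero makes the total log-likelihood -inf
```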

My ultimate goal is to model the percentage of defoliated trees as a function of a treatment.

Does anyone have any suggestions or recommendations for choosing a suitable distribution?

Hi, I think zero-inflated beta regression might be what you’re looking for (divide the values by 100 so that the % is expressed as a proportion). The brms package has good functionality for fitting such models - for example, see this blog post. This paper about modeling proportions might be helpful as well. Good luck!
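As a rough illustration of what a zero-inflated beta model assumes, here is a minimal Python sketch of its log-density for one observation. It assumes a mean/precision parameterisation of the beta part; the names `zib_logpdf`, `mu`, `phi`, and `zi`, as well as the example values, are chosen for illustration only. It is not how any particular package implements the model.

```python
import math

def zib_logpdf(y, mu, phi, zi):
    """Log-density of a zero-inflated beta at a proportion y (0 <= y < 1).

    mu  : mean of the beta part (assumed parameterisation)
    phi : precision of the beta part
    zi  : probability that an observation is a structural zero
    """
    if y == 0:
        return math.log(zi)
    if not 0 < y < 1:
        raise ValueError("y must be 0 or strictly between 0 and 1")
    a = mu * phi          # first beta shape parameter
    b = (1 - mu) * phi    # second beta shape parameter
    log_beta = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))
    return math.log(1 - zi) + log_beta

# Hypothetical percentages, rescaled to proportions as suggested above.
percent = [0.0, 12.5, 40.0, 0.0, 75.0]
proportion = [p / 100 for p in percent]
loglik = sum(zib_logpdf(y, mu=0.4, phi=5.0, zi=0.3) for y in proportion)
```

Zeros contribute through `zi` alone, while the non-zero proportions are handled by the beta part. A value of exactly 100% would fall outside the beta's support in this sketch, so it would need separate handling.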

Thanks. A zero-inflated beta regression seems difficult to fit, and I’m looking for a simpler solution that I can explain to a colleague (who collected the data I’m analysing) who is not at all familiar with Bayesian statistics.

I’m trying to obtain the original count data, but I’m not sure that’s possible.

Would transforming the response variable be an acceptable workaround, i.e., new_y = y + 1?

I’m not sure about the transformation workaround (i.e., adding some small constant to avoid zeros & then using gamma regression). I’ve seen it done but can’t speak to the statistical validity. This post might help.

I think this model would be pretty straightforward to fit with brms - syntax, etc., is very similar to traditional packages like lme4. It seems intimidating but I bet you can figure it out!
