Non linear distribution with some zeros in data

ssalimi · October 19, 2021, 11:18pm

Hello,
I want to model a non-linear outcome and a covariate with age as a random slope. I wanted to use non-linear models in brms; however, the assumption there is that omega and theta are the same across age (in the vignette, across time). The outcome is a proportion (range 0 to 1), as shown below. I would appreciate your advice on a model that suits this distribution. I wanted to choose gamma, but there are zero values in the data. I appreciate any hints.

mike-lawrence · October 20, 2021, 3:52pm

Are you truly observing proportions as your most-raw data? Or do you have access to less-processed data consisting of zeroes and ones?

ssalimi · October 20, 2021, 6:38pm

@mike-lawrence the data I need to look at is a proportion between 0 and 1, as shown in the histogram.

mike-lawrence · October 20, 2021, 6:46pm

Ok, I asked because quite often folks want to achieve inference on the proportion scale and make the mistake of aggregating their binomial 0/1 data to proportions to serve as input to modelling; this is a mistake because inference on the proportion scale can be achieved most accurately by allowing the model to see the raw binomial data.

mike-lawrence · October 20, 2021, 6:58pm

Just double-checking: for each proportion, do you happen to have the number of observations/events that led to that proportion? If so, then it’s trivial to work out the original counts of 1s/0s.
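A minimal sketch of that back-calculation, for illustration only. The function name, the tolerance, and the example values are hypothetical and not taken from the poster's data:

```python
def counts_from_proportion(proportion, n_trials, tol=1e-6):
    """Return (ones, zeros) implied by a proportion observed over n_trials.

    Raises ValueError if the proportion does not correspond to a whole
    number of successes (within tol), which would suggest rounding or a
    wrong n_trials.
    """
    ones = round(proportion * n_trials)
    if abs(ones / n_trials - proportion) > tol:
        raise ValueError("proportion is not a whole count out of n_trials")
    return ones, n_trials - ones


# Hypothetical example: a proportion of 0.3 observed over 10 trials.
print(counts_from_proportion(0.3, 10))
# A proportion of exactly zero is handled like any other value.
print(counts_from_proportion(0.0, 10))
```

If the stored proportions were rounded, the tolerance would need to be loosened accordingly.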

ssalimi · October 20, 2021, 7:08pm

@mike-lawrence I have the original counts. Someone has developed this proportion data and claims it is the best metric to capture health. I am arguing that it is not. So I have to use this proportion as the outcome, develop models of it with its predictors, and compare them with a model based on the raw data. To develop a model on this proportion, I want to make sure that the model is correct so that my argument is valid. Does this make sense?

ssalimi · October 21, 2021, 10:48pm

I think I will use beta regression in brms.

stijn · October 22, 2021, 6:06am

Just to follow up on what Mike said: if you have the counts of observations and events, you can model this data as a binomial, or as a Poisson with an offset. You are still estimating a parameter for the proportion/rate, which is what you are interested in, and you explicitly take into account that 0’s are possible.

ssalimi · October 22, 2021, 7:25am

@stijn This is a metric developed from 30 items scored 1/0 (yes/no), divided by the total number of items (30) for each individual, which results in a proportion between 0 and 1. This is not a binary event.
Now I need to use this developed metric as an outcome in a model. With this distribution, I wonder what family is the best. I used the NL model in brms, and I also set the zeros to a very small number (0.0001) and used the gamma family, but neither of them is performing well. So, the problem is that I am obliged to use this already developed metric as an outcome to address reviewers’ comments. I need to make sure I use the right model for this distribution. Thanks for any input.

stijn · October 22, 2021, 9:04am

That sounds a lot like an outcome variable that follows a binomial distribution with p success and 30 trials, divided by 30 where you are interested in what explains p.
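A small sketch of that reading, using only the Python standard library. The scores are made up for illustration, and the intercept-only estimate stands in for a real regression on p. In brms, the analogous specification would presumably put the count on the left-hand side with `trials(30)` and `family = binomial()`:

```python
import math

N_ITEMS = 30  # number of yes/no items per person, as described in the thread


def binomial_logpmf(k, n, p):
    """Log probability of k successes in n trials, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


# Hypothetical scores (proportion of the 30 items answered "yes"), zeros included.
scores = [0.0, 0.1, 0.2, 0.0, 0.5]

# Multiplying the score by 30 recovers the underlying count.
counts = [round(s * N_ITEMS) for s in scores]

# Intercept-only maximum-likelihood estimate of p: successes over trials.
p_hat = sum(counts) / (N_ITEMS * len(counts))

# Zeros contribute an ordinary, finite term to the binomial log-likelihood.
log_lik = sum(binomial_logpmf(k, N_ITEMS, p_hat) for k in counts)
print(counts, p_hat, log_lik)
```

The point of the sketch is only that a score of zero is an unremarkable observation under a binomial likelihood; no nudging of zeros is needed.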

ssalimi · October 22, 2021, 9:27am

@stijn A beta distribution? Because binomial or Bernoulli didn’t work!

cmcd · October 22, 2021, 2:49pm

@ssalimi, I think that the suggestion of using a binomial model sounds like the right way to go, but it may help if you share your brms code. If the binomial doesn’t work or sounds wrong to you, it could be that we’re missing or misreading some information you have, or it could be a mistake in the brms code.

jd_c · October 22, 2021, 3:30pm

As others have said, it sounds like you can do better than using this aggregated data by using the raw counts. If, however, you are required to use the proportions for some reason, and they contain zeroes, then you could try the `zero_inflated_beta` family in brms. Based on the histogram that you showed, it doesn’t seem like your data contains any ones, but if it does, you could try `zero_one_inflated_beta`.

ssalimi · October 22, 2021, 7:28pm

@jd_c I totally agree with you and others on using the raw data as counts. Indeed, the purpose is to defend this suggestion in my response to the reviewers. I need to use the aggregate method and compare models to show them that the aggregate method is not optimal. I will try the binomial approach again. I will also compare `zero_inflated_beta` versus beta.
Thanks a lot.

avehtari · October 25, 2021, 11:49am

The binomial model was already mentioned, and if the data are over-dispersed compared to the binomial, then the beta-binomial (Beta-binomial distribution - Wikipedia) is also available in Stan.

The histogram looks like there is some zero-inflation, which could be taken into account, too.

I’m not certain if these are available in brms directly, but at least the necessary components are available in Stan.
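To make the two ideas concrete, here is a hedged sketch of a beta-binomial probability mass function and a zero-inflated variant, written with the Python standard library rather than Stan. The shape parameters `a = 2`, `b = 8` and the zero-inflation probability `zi = 0.25` are arbitrary illustrative values, not estimates from the data in this thread:

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient n choose k."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def zero_inflated_beta_binomial_pmf(k, n, a, b, zi):
    """Mixture: with probability zi a structural zero, else beta-binomial."""
    base = (1.0 - zi) * beta_binomial_pmf(k, n, a, b)
    return zi + base if k == 0 else base


def variance(pmf_values):
    """Variance of a count distribution given as [P(0), P(1), ...]."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))


n, a, b, zi = 30, 2.0, 8.0, 0.25  # illustrative values only
bb = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
zibb = [zero_inflated_beta_binomial_pmf(k, n, a, b, zi) for k in range(n + 1)]

# A binomial with the same mean proportion a / (a + b) = 0.2 for comparison.
binomial_variance = n * 0.2 * 0.8

print(variance(bb), binomial_variance)  # beta-binomial variance is larger
print(bb[0], zibb[0])                   # zero inflation adds mass at zero
```

The comparison shows the two features mentioned above: extra spread relative to the binomial at the same mean, and additional probability mass at exactly zero.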

mike-lawrence · October 25, 2021, 5:29pm

And in case you need references to bolster your assertion that a proper treatment of such data would involve a hierarchical model with a Bernoulli likelihood, it’s been discussed extensively in the quantitative methods for psychology literature; here are a few key refs:

Dixon, 2008
Jaeger, 2008
Good blog post on a related psyc-stats-twitter hubbub from last year

ssalimi · October 25, 2021, 6:17pm

@avehtari Thanks for confirming the beta regression approach.

ssalimi · October 25, 2021, 6:18pm

@mike-lawrence Thank you for the references. Much appreciated.

avehtari · October 25, 2021, 6:57pm

But, I didn’t! I said binomial and beta-binomial, which are both models for discrete counts with some maximum. Beta is for continuous data, but based on your description the data is discrete, so beta is the wrong model, and it can be especially bad because there are many zeros.
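A rough illustration of why zeros are a problem for the beta but not for the binomial, assuming illustrative parameter values (`a = 2`, `b = 8`, `p = 0.2`) that are not taken from the thread. The beta density is only defined on the open interval (0, 1), and replacing zeros with a small constant makes the log-likelihood depend on which constant was picked:

```python
import math


def beta_logpdf(y, a, b):
    """Log density of Beta(a, b), defined only for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("beta density requires 0 < y < 1")
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(y) + (b - 1) * math.log1p(-y) - log_norm


def binomial_logpmf(k, n, p):
    """Log probability of k successes in n trials, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


# A score of zero out of 30 items is a perfectly ordinary binomial outcome.
print(binomial_logpmf(0, 30, 0.2))

# The same observation has no beta log density at all.
try:
    beta_logpdf(0.0, 2.0, 8.0)
except ValueError as err:
    print("beta:", err)

# Nudging zeros to a small number makes the answer depend on the nudge.
print(beta_logpdf(1e-4, 2.0, 8.0), beta_logpdf(1e-6, 2.0, 8.0))
```

With these values, the two nudged log densities differ by roughly log(100), so every zero in the data shifts the likelihood by an amount set by an arbitrary choice.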

ssalimi · October 25, 2021, 7:13pm

@avehtari Oops, my bad! Yes, I will use zero_inflated beta-binomial.
Indeed, beta regression was truly unstable, with many Pareto k values > 0.7. I can communicate this in the paper.
I appreciate you correcting me.
