Non-linear binomial model with boundary on the minimum number of cases

Angelo_D_Ambrosio · July 30, 2021, 9:02am

Hello,

I have an ordered dataset in which the probability of the outcome being positive decreases with the id of the observation. Here is a plot of the data:

I am modelling the phenomenon in brms using a Bernoulli model whose success probability follows a power law: a * ID ^ b. The training set consists of the first 1200 observations of a much larger dataset. The goal is to predict the total number of positives in the full dataset and compare the predicted number of matches with the observed one. We want to estimate how many matches in the held-out data could have been missed.
The problem is that the model can also predict a total number of matches below the observed one, which is impossible since the positive matches have been manually reviewed and there cannot be false negatives.

Is it possible to model such a process? I’m not sure it is, partly because the lower bound in the training data is only a fraction of the lower bound for the full data, and that fraction is of course not constant. Any ideas?
For now I have resorted to throwing out predictions with fewer positive cases than the observed ones, which is ~60% of them.

I’d prefer a brms-based solution, but I can also manage with Stan.

jsocolar · August 11, 2021, 3:07am

Sorry nobody got to this question earlier. It seems like your problem is solved by predicting from the model for the holdout set only (instead of training and holdout together), and then adding the number of observed positives in the training set to that holdout prediction. Does that work for you?
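To make the suggestion concrete, here is a minimal sketch in plain Python rather than brms. It assumes you already have, for each posterior draw, a vector of predicted probabilities for the held-out observations (in brms something like `posterior_predict(fit, newdata = ...)` on the held-out rows would play this role). The function name `total_positive_draws` and all the numbers are made up for illustration.

```python
import random


def total_positive_draws(train_positives, holdout_prob_draws, seed=0):
    """Return one predicted total count per posterior draw.

    train_positives: positives actually observed in the training set (fixed).
    holdout_prob_draws: one list of per-observation probabilities per draw.
    """
    rng = random.Random(seed)
    totals = []
    for probs in holdout_prob_draws:
        # Simulate Bernoulli outcomes for the held-out observations only.
        holdout_positives = sum(1 for p in probs if rng.random() < p)
        # The observed training positives are added as a constant, so the
        # total can never fall below what was already observed.
        totals.append(train_positives + holdout_positives)
    return totals


# Made-up example: 3 posterior draws, 200 held-out observations each.
draws = [[0.05] * 200, [0.02] * 200, [0.10] * 200]
totals = total_positive_draws(train_positives=40, holdout_prob_draws=draws)
print(min(totals) >= 40)  # True
```

Because only the held-out part is simulated, no draw has to be thrown away for falling below the observed training count.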

Angelo_D_Ambrosio · August 12, 2021, 5:26pm

Yep! That’s what I did, although I switched to a simple logistic model that uses a probability generated by a black-box model as its predictor.

My issue is that even though the simple logistic model fits very well (Bayes R2 > 95%), once I predict on very large datasets with very low predictor values, the upper end of the posterior interval may reach unrealistically high values. But I guess this is expected, since any non-zero probability predicts an ever-growing number of cases with no upper bound as the dataset grows. Also, this probably should go in another question…
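A rough way to write down that intuition, with symbols introduced here only for illustration: let $T$ be the number of positives among $N$ held-out observations and $p_i$ the predicted probability for observation $i$. Then, for a single posterior draw,

$$
\mathbb{E}[T] = \sum_{i=1}^{N} p_i \;\ge\; N \, p_{\min}, \qquad p_{\min} = \min_i p_i .
$$

A logistic model never returns exactly zero, so $p_{\min} > 0$, and as long as the probabilities stay above some positive floor the expected count grows at least linearly in $N$. The upper posterior interval additionally reflects the draws with the largest $p_i$, which is why it can look extreme on very large datasets.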
