Non-linear binomial model with a lower bound on the number of cases


I have an ordered dataset in which the probability of a positive outcome decreases with the observation ID. Here is a plot of the data:
(plot not reproduced)

I am modelling the phenomenon in brms with a Bernoulli model whose probability follows a power law, a * ID ^ b. The training set consists of the first 1200 observations of a much larger dataset. The goal is to predict the total number of positives in that larger dataset so that we can compare the predicted number of matches with the observed number, and so estimate how many matches in the held-out data could have been missed.
The problem is that the model can also predict a total number of matches below the observed count. That is impossible: every observed positive match has been manually reviewed and confirmed, so the true total cannot be lower than the observed one.
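To make the prediction step concrete, here is a minimal sketch of drawing the predictive total under the power-law model. It is written in Python rather than brms, and it assumes posterior draws of `a` and `b` are already available (in brms these would come from the fitted model, e.g. via `posterior_predict()`); the draws below are invented. The sketch also clips `a * ID^b` to [0, 1] so that it is a valid probability.

```python
import random

def predicted_total(a, b, n_obs, rng):
    """Draw one predictive total of positives over IDs 1..n_obs.

    Each observation i is Bernoulli with probability a * i**b,
    clipped to [0, 1] so it stays a valid probability.
    """
    total = 0
    for i in range(1, n_obs + 1):
        p = min(1.0, max(0.0, a * i ** b))
        if rng.random() < p:
            total += 1
    return total

# Hypothetical posterior draws of (a, b); in practice these come from the fit.
draws = [(0.8, -0.5), (0.7, -0.45), (0.9, -0.55)]
rng = random.Random(42)
totals = [predicted_total(a, b, 10_000, rng) for a, b in draws]
print(totals)
```

Each draw of (a, b) gives one total. Repeating this over all posterior draws produces the predictive distribution of the total, and some of those draws can fall below the observed count.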

Is it possible to model such a process? I'm not sure it is, partly because the lower bound in the training data is only a fraction of the bound for the full dataset, and that fraction is of course not constant. Any ideas?
For now I have resorted to discarding predictions with fewer positive cases than observed, which is about 60% of them.
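Discarding predictions below the observed count is effectively rejection sampling. The draws that survive approximate the predictive distribution conditioned on the total being at least the observed count. Below is a minimal Python sketch using made-up totals and a made-up observed count.

```python
import random

def condition_on_lower_bound(totals, observed):
    """Keep only predictive totals that are >= the observed count.

    This is rejection sampling: the kept draws approximate the predictive
    distribution conditioned on total >= observed.
    """
    kept = [t for t in totals if t >= observed]
    fraction_kept = len(kept) / len(totals) if totals else 0.0
    return kept, fraction_kept

# Hypothetical predictive totals and observed count, for illustration only.
rng = random.Random(0)
totals = [rng.randint(80, 140) for _ in range(1000)]
kept, frac = condition_on_lower_bound(totals, observed=120)
print(f"kept {frac:.0%} of draws; min kept = {min(kept)}")
```

The fraction rejected is the probability the unconditioned model assigns to impossible totals. Rejecting around 60% therefore means most of the model's predictive mass sits below the observed count.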

I'd prefer a brms-based solution, but I can also manage with Stan.

Sorry nobody got to this question earlier. It seems like your problem is solved by applying the model's predictions only to the holdout set (instead of to training and holdout together), and then adding the number of observed positives in the training set to that holdout prediction. Does that work for you?
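Here is a minimal Python sketch of that suggestion. It assumes per-observation probabilities for the holdout set are already available from the fitted model; the values below are invented. Only the holdout rows are simulated. The known training positives are added as a constant, so no draw can fall below that count.

```python
import random

def total_with_observed_training(p_holdout, observed_train_positives, rng):
    """One predictive draw of the full-dataset total.

    Only the holdout observations are simulated; the training positives
    are known, so they are added as a fixed count. The result can never
    fall below observed_train_positives.
    """
    simulated = sum(1 for p in p_holdout if rng.random() < p)
    return observed_train_positives + simulated

# Hypothetical per-observation probabilities for the holdout set
# (in practice, from the fitted model evaluated at IDs > 1200).
p_holdout = [0.05 / (1 + i / 1000) for i in range(5_000)]
rng = random.Random(1)
draws = [total_with_observed_training(p_holdout, 150, rng) for _ in range(200)]
print(min(draws), sum(draws) / len(draws))
```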

Yep, that's what I did! I have since switched to a simple logistic model, though, whose predictor is a probability generated by a black-box model.
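The logistic model described here can be sketched in Python. The coefficients `alpha` and `beta` are hypothetical, and the black-box probability is used directly as the single predictor, as in the post. The model maps it to a probability of a positive outcome through the inverse logit.

```python
import math

def inv_logit(z):
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def positive_probability(alpha, beta, blackbox_prob):
    """P(positive) under a simple logistic model whose single predictor
    is the probability produced by an external black-box model."""
    return inv_logit(alpha + beta * blackbox_prob)

# Hypothetical coefficients, for illustration only.
alpha, beta = -4.0, 8.0
for x in (0.0, 0.25, 0.5, 0.9):
    print(x, round(positive_probability(alpha, beta, x), 4))
```

Even at a predictor value of 0, the probability is `inv_logit(alpha)`, which is small but strictly positive.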

My issue is that even though the simple logistic model fits very well (Bayes R² > 95%), when I predict on very large datasets with very low predictor values, the upper end of the posterior interval can reach unrealistically high values. I guess this is expected: any nonzero probability predicts an ever-growing number of cases, with no upper bound. Also, this should probably go in another question…
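The growth described above follows from the expected count being the sum of per-observation probabilities. If N rows all share a probability p > 0, the expected count is N·p. Uncertainty in the intercept is also multiplied by N. Below is a minimal Python sketch with invented posterior draws for the intercept and a fixed slope. It assumes, for simplicity, that all rows share the same low predictor value.

```python
import math
import random

def inv_logit(z):
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def expected_positives(alpha, beta, x, n):
    """Expected positives among n rows that all share predictor value x."""
    return n * inv_logit(alpha + beta * x)

def quantile(values, q):
    """Simple empirical quantile (no interpolation)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

# Hypothetical posterior draws for the intercept, and a fixed slope.
rng = random.Random(3)
alpha_draws = [rng.gauss(-6.0, 0.5) for _ in range(1000)]
beta = 8.0
x_low = 0.001  # a very low predictor value for every row

for n in (10_000, 100_000, 1_000_000):
    ex = [expected_positives(a, beta, x_low, n) for a in alpha_draws]
    print(n, round(quantile(ex, 0.5), 1), round(quantile(ex, 0.975), 1))
```

Both the median and the upper 97.5% quantile scale linearly with N. The width of the interval therefore grows without bound as the dataset grows, even though the per-row uncertainty stays the same.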