I have an ordered dataset in which the probability of the outcome being positive decreases with the id of the observation. Here is a plot of the data:
I am modelling the phenomenon in brms with a Bernoulli model whose success probability follows a power law: P(positive) = a * ID ^ b. The training set consists of the first 1200 observations of a much larger dataset, and the goal is to predict the total number of positives in the full dataset, so that the predicted number of matches can be compared with the observed one. We want to estimate how many matches in the held-out data could have been missed.
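For reference, a minimal sketch of the kind of model I mean (names are illustrative: `y` is the 0/1 outcome, `id` the observation index, `train` the first 1200 rows; the priors are placeholders, not my actual choices):

```r
library(brms)

# Nonlinear Bernoulli model: P(y = 1) = a * id^b on the identity link.
# a is bounded to (0, 1) and b to negative values so the probability
# decreases with id; both bounds keep a * id^b inside [0, 1] for id >= 1.
fit <- brm(
  bf(y ~ a * id^b, a + b ~ 1, nl = TRUE),
  data = train,
  family = bernoulli(link = "identity"),
  prior = c(
    prior(beta(1, 1), nlpar = "a", lb = 0, ub = 1),
    prior(normal(-1, 1), nlpar = "b", ub = 0)
  ),
  chains = 4, cores = 4
)
```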
The problem is that the model can also predict a total number of matches below the observed count, which is impossible: the positive matches have been manually reviewed, so there can be no false negatives.
Is it possible to model such a process? I'm not sure it is, because the lower bound implied by the training data is only a fraction of the bound for the full dataset, and that fraction is of course not constant. Any ideas?
At the moment I have simply resorted to discarding predictions with fewer positive cases than the observed ones, which removes ~60% of them.
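Concretely, the rejection step looks roughly like this (a sketch under the assumption that `fit` is the fitted brms model, `full_data` the larger dataset, and `train$y` the reviewed 0/1 outcomes):

```r
# Posterior predictive draws for the full dataset:
# a draws-by-observations matrix of 0/1 outcomes.
yrep <- posterior_predict(fit, newdata = full_data)
totals <- rowSums(yrep)

# Current workaround: drop draws implying fewer positives than
# already observed -- this discards ~60% of the draws, which is
# exactly what I'd like to avoid.
n_observed <- sum(train$y)
totals_kept <- totals[totals >= n_observed]
quantile(totals_kept, c(0.05, 0.5, 0.95))
```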
I’d prefer a brms-based solution, but I can also manage with raw Stan.