Predictive Logistic Probability Fitting Advice

Happy New Year! This question could be filed under the category of ‘Stan makes even people with questionable stats knowledge very powerful/dangerous’.

I have a linear logistic predictive probability model that seems to work well (with major caveats).

I’m interested in the probabilities themselves, so I calculate them from the generated y_tildas samples.
I then compare the actual outcome for xx with the predicted probabilities implied by the samples (e.g.) …


When I bin the predictive probabilities and look at the actual outcome, I get the following Z scores (assuming a binomial process/variance at the average probability of the samples in that bin).

Lastly, here are the sample sizes for each percentile bin:

So basically, at very high and very low predicted probabilities, the outcomes do not seem to follow a binomial process with the probabilities implied by the generated samples. I’ve tried nonlinear terms, but they are not significant.
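
For readers who want to reproduce the binned check described above, here is a minimal NumPy sketch. It is not the code used in this post: the function name binned_z_scores, the arguments p_hat (one predicted probability per observation) and y (a 0/1 indicator of whether the event occurred), and the choice of ten percentile bins are all assumptions made for illustration. Each bin's Z score compares the observed event count with the count expected under a binomial at the bin's average predicted probability.

```python
import numpy as np


def binned_z_scores(p_hat, y, n_bins=10):
    """Return (bin, n, mean predicted probability, z) for each percentile bin.

    p_hat and y are hypothetical inputs: predicted probabilities and
    0/1 outcomes, one entry per observation.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)

    # Percentile-based bin edges over the predicted probabilities.
    edges = np.quantile(p_hat, np.linspace(0.0, 1.0, n_bins + 1))
    # Map each prediction to a bin index in 0..n_bins-1.
    idx = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, n_bins - 1)

    rows = []
    for b in range(n_bins):
        mask = idx == b
        n = int(mask.sum())
        if n == 0:
            continue  # ties in p_hat can leave a bin empty
        p_bar = p_hat[mask].mean()
        observed = y[mask].sum()
        expected = n * p_bar
        # Binomial standard deviation at the bin's average probability.
        sd = np.sqrt(n * p_bar * (1.0 - p_bar))
        z = (observed - expected) / sd if sd > 0 else float("nan")
        rows.append((b, n, p_bar, z))
    return rows
```

Large absolute Z scores concentrated in the lowest and highest bins would correspond to the pattern described here.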

I think the process (even at tremendously high/low levels of input) has an inherently random quality that is not captured in the model or type of model. Any advice is deeply appreciated.

Thank you!!

Estimating the probabilities from generated samples is going to be noisy. You could just calculate the probabilities directly in the generated quantities block.

Thanks for the response. So what you’re saying is there is a more direct way to get the probability other than generating samples. Could you elaborate? How do I do that?

The same way ordered_logistic_rng calculates the probability to draw with. The probability of falling in the k-th category is inv_logit(eta - c[k - 1]) - inv_logit(eta - c[k]) where eta = xx[j] * beta, implicitly letting c[0] be negative_infinity() and c[K] be positive_infinity().
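
To check the formula numerically outside Stan, a small NumPy sketch follows. The helper names inv_logit and ordered_logistic_probs are made up for this illustration, and the cutpoints c are assumed to be an increasing vector of length K - 1, with c[0] and c[K] padded as negative and positive infinity as described above. In Stan itself, the same arithmetic would go in the generated quantities block.

```python
import numpy as np


def inv_logit(x):
    # Logistic function written with tanh so that +/-inf map cleanly to 1 and 0.
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(x, dtype=float)))


def ordered_logistic_probs(eta, c):
    """Probabilities of categories 1..K for a scalar linear predictor eta.

    c is the vector of K - 1 interior cutpoints (an assumption of this sketch).
    """
    c = np.asarray(c, dtype=float)
    # Pad with c[0] = -inf and c[K] = +inf.
    cuts = np.concatenate(([-np.inf], c, [np.inf]))
    # P(k) = inv_logit(eta - c[k - 1]) - inv_logit(eta - c[k]) for k = 1..K.
    return inv_logit(eta - cuts[:-1]) - inv_logit(eta - cuts[1:])


# Example with made-up numbers: eta = xx[j] * beta for one row j.
probs = ordered_logistic_probs(0.3, [-1.0, 0.5, 2.0])
```

Because the terms telescope, the K probabilities sum to 1 for any eta, and no random draws are involved, so there is no Monte Carlo noise in the result.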


Thanks so much! Worked.