I’m using PyStan to model upvote data which is produced by a social media site. Specifically, the observed data is total_score where total_score = num_upvotes - num_downvotes.
num_upvotes and num_downvotes are long-tailed, so I’m modeling them as:
num_upvotes ~ neg_binomial_2(mu_up, phi_up)
num_downvotes ~ neg_binomial_2(mu_down, phi_down)
total_score = num_upvotes - num_downvotes
Because num_upvotes and num_downvotes are unobserved, I marginalize them out: for each observed total_score I sum over the possible number of downvotes, truncating the sum at some reasonable value.
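To make the marginalization concrete, here is a rough pure-Python sketch of the likelihood for a single observation. It is not my actual Stan code: the function names and the max_down truncation point are made up for illustration, and I’m assuming the mean/precision parameterization (Stan’s neg_binomial_2).

```python
import math

def nb2_logpmf(n, mu, phi):
    """Log pmf of the mean/precision negative binomial (assumed to match neg_binomial_2)."""
    return (math.lgamma(n + phi) - math.lgamma(n + 1) - math.lgamma(phi)
            + n * math.log(mu / (mu + phi))
            + phi * math.log(phi / (mu + phi)))

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def total_score_logpmf(score, mu_up, phi_up, mu_down, phi_down, max_down=200):
    """Log P(total_score = score), summing over num_downvotes = d up to max_down.

    max_down is a hypothetical truncation point, not a value from my model.
    """
    # num_downvotes = d implies num_upvotes = score + d, which must be >= 0.
    d_min = max(0, -score)
    if d_min > max_down:
        return float("-inf")  # no admissible d inside the truncated range
    terms = [nb2_logpmf(score + d, mu_up, phi_up) + nb2_logpmf(d, mu_down, phi_down)
             for d in range(d_min, max_down + 1)]
    return log_sum_exp(terms)

# Example: log-probability of a net score of 2 under some arbitrary parameters.
example = total_score_logpmf(2, mu_up=3.0, phi_up=2.0, mu_down=1.0, phi_down=2.0)
```

Working on the log scale and combining the terms with log-sum-exp avoids underflow when the individual terms are tiny. The truncation point only needs to be large enough that the downvote tail beyond it is negligible.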
This works pretty well – the sampled histogram of counts looks close to the true histogram of counts, parameter values seem sane, Rhat is close to 1.
But I’m getting some warnings. First, early in sampling (only in the early part of the warmup phase), I get a lot of:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
"Exception: neg_binomial_2_lpmf: Precision parameter is inf, but must be finite! (in ‘unkown file name’ at line 79)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."
I’m not exactly sure why this is happening. Second, Rhat is sometimes nan, even though the parameter values themselves don’t seem to be nan. I’ve tried changing the priors, but that doesn’t fix the problem. I’m worried that the means of the upvote and downvote negative binomials (mu_up and mu_down) are correlated, and that this is making the model unidentifiable, but I’m not sure how to fix that.
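On the nan Rhat: one way it can happen with finite draws is when every draw of a quantity is identical, so the within-chain variance is zero and the ratio is 0/0. I haven’t confirmed that this is what’s going on in my model. The sketch below uses the classic (non-split) Rhat formula as a simplified stand-in, not PyStan’s actual implementation, and classic_rhat is a name I made up.

```python
import math

def classic_rhat(chains):
    """Classic (non-split) potential scale reduction for equal-length chains.

    A simplified stand-in for illustration, not PyStan's own computation.
    """
    m = len(chains)      # number of chains
    n = len(chains[0])   # draws per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    if w == 0.0:
        # Zero within-chain variance: 0/0 is nan, anything else over 0 is inf.
        return float("nan") if var_plus == 0.0 else float("inf")
    return math.sqrt(var_plus / w)

# Every draw identical: the draws are finite, but Rhat is nan.
constant = classic_rhat([[1.0] * 50, [1.0] * 50])

# Two chains covering the same values: Rhat is an ordinary finite number.
mixed = classic_rhat([[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]])
```

If this is the cause, the nan would show up only for quantities that never change across draws, which would be easy to check by looking at which rows of the summary have nan.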