I recently came across this great article discussing predictive accuracy.
Bob Carpenter makes the point that binary classification in an applied setting really needs to account for the costs of false positives/negatives (if I understood correctly). This is just what I need from my model. Fraud investigators use my model output to determine what to investigate. False negatives will cost the company money, but false positives cost investigator time. I’d like the ability to tune the model accordingly.
Is it possible to incorporate these into a logistic regression model using Stan?
I’ve tried a cost-sensitive loss function using xgboost and it works well. But, I really like the interpretability of Stan models. Using Stan, I can not only tell an investigator what to look at, but I can also tell them why they should look at it. My current Stan model (hierarchical logistic regression using brms) has a very large number of false positives.
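For context, one standard way to handle asymmetric costs is to leave the model alone and apply the costs at the decision stage: fit the logistic regression as usual, then flag a case when the expected loss of ignoring it exceeds the expected loss of investigating it. Here is a minimal sketch of that rule. It assumes correct decisions cost nothing, and the cost values and fraud probabilities are made-up illustrations, not output from the model in this thread.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which investigating has the lower expected cost.

    Assumes true positives and true negatives cost nothing.
    """
    return cost_fp / (cost_fp + cost_fn)


def decide(prob_fraud, cost_fp, cost_fn):
    """Flag each case whose fraud probability exceeds the cost-based threshold."""
    # Ignoring a case costs p * cost_fn in expectation;
    # investigating it costs (1 - p) * cost_fp.
    threshold = cost_threshold(cost_fp, cost_fn)
    return [p > threshold for p in prob_fraud]


# Hypothetical posterior mean fraud probabilities for five cases.
probs = [0.02, 0.10, 0.30, 0.55, 0.90]

# Hypothetical costs: a missed fraud is four times as costly as a wasted investigation.
flags = decide(probs, cost_fp=1.0, cost_fn=4.0)
print(flags)
```

With posterior draws instead of point estimates, the same rule could be applied to the posterior mean of each case's probability, since the expected loss is linear in that probability.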
But the computation can be more efficient if they are combined. In importance sampling, it’s more common to consider proposal distributions that take the function into account in addition to the distribution. This has also been proposed for distributional variational approximations. We could also combine MCMC and IS, intentionally sampling more draws where they matter for the decision task and using importance weighting to get the correct expectations in the end.
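To make the idea of a function-aware proposal concrete, here is a toy sketch that estimates a small tail probability under a standard normal. The proposal is centred on the region where the indicator function is nonzero, and importance weights correct the expectation. The standard normal target, the unit-variance shifted proposal, and the function name are all assumptions chosen for illustration. In a real model the draws would come from MCMC rather than direct sampling.

```python
import math
import random


def tail_prob_is(threshold, n_draws, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Proposal N(threshold, 1): about half the draws land in the tail.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Importance weight: N(x | 0, 1) / N(x | threshold, 1).
            total += math.exp(-threshold * x + 0.5 * threshold**2)
    return total / n_draws


estimate = tail_prob_is(4.0, 50_000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
print(estimate, exact)
```

Plain Monte Carlo from the standard normal would see roughly one or two draws beyond 4 in 50,000, so the shifted proposal puts far more of the computation where it matters for this particular expectation.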
I would change the target in Stan. How to change it is probably not trivial before some initial sampling. For example, suppose we would like to estimate some extreme tail quantile. We could do an initial run with the original log density and, after learning the approximate location of that quantile, change the target to have higher values near that location and lower values for the bulk of the distribution. For complex models and decision tasks that depend on, e.g., predictions, this is probably more difficult. If this were easy, people would be doing it more often.
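The two-stage idea for a tail quantile can be sketched on a toy problem. This is only an illustration under strong assumptions: the original target is a standard normal, the modified target is a unit-variance normal recentred at the pilot estimate, and direct sampling stands in for the two Stan runs. The function name and default sizes are hypothetical.

```python
import math
import random
import statistics


def tail_quantile_two_stage(p_tail, n_pilot=2000, n_main=20_000, seed=1):
    """Estimate the (1 - p_tail) quantile of N(0, 1) with a pilot run plus reweighting."""
    rng = random.Random(seed)

    # Stage 1: draws from the original target give a crude quantile location.
    pilot = sorted(rng.gauss(0.0, 1.0) for _ in range(n_pilot))
    center = pilot[int((1.0 - p_tail) * (n_pilot - 1))]

    # Stage 2: draw from a modified target concentrated near that location,
    # keeping the weight original density / modified density for each draw.
    draws = []
    for _ in range(n_main):
        x = rng.gauss(center, 1.0)
        weight = math.exp(-center * x + 0.5 * center**2)
        draws.append((x, weight))

    # Walk down from the largest draw, accumulating the estimated tail mass
    # under the original target until it reaches p_tail.
    draws.sort(reverse=True)
    mass = 0.0
    for x, weight in draws:
        mass += weight / n_main
        if mass >= p_tail:
            return x
    return draws[-1][0]


estimate = tail_quantile_two_stage(0.001)
exact = statistics.NormalDist().inv_cdf(0.999)
print(estimate, exact)
```

Here the weights are available in closed form because both densities are normalized. With an unnormalized posterior, self-normalized weights would be needed, which is part of why this is harder for realistic models.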
After working with this model more, I’ve realized there’s an additional complexity to my approach that I ignored. Not only does the ratio P(+1)/P(-1) imply a cost ratio, but when doing hierarchical logistic regression (with group G), I need to preserve the class ratio within each group P(+1|G)/P(-1|G). Ignoring this has given me strange results, but once I adjusted my training data to have the correct proportions per group, things looked much better.
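One way to read "correct proportions per group" is to downsample negatives at the same rate inside every group, so each group's odds P(+1|G)/P(-1|G) are multiplied by the same factor and the differences between groups are left intact. The sketch below shows that reading. The function names, toy data, and keep rate are hypothetical, and this may not match the exact adjustment used above.

```python
import random
from collections import defaultdict


def downsample_negatives_by_group(labels, groups, keep_rate, seed=0):
    """Keep every positive and the same share of negatives within each group."""
    rng = random.Random(seed)
    keep = []
    negatives = defaultdict(list)
    for i, (y, g) in enumerate(zip(labels, groups)):
        if y == 1:
            keep.append(i)
        else:
            negatives[g].append(i)
    for idx in negatives.values():
        # Sampling within the group keeps its negative share fixed,
        # instead of letting a global sample drift between groups.
        keep.extend(rng.sample(idx, round(keep_rate * len(idx))))
    return sorted(keep)


def odds_by_group(labels, groups):
    """Return positives / negatives for each group."""
    pos, neg = defaultdict(int), defaultdict(int)
    for y, g in zip(labels, groups):
        if y == 1:
            pos[g] += 1
        else:
            neg[g] += 1
    return {g: pos[g] / neg[g] for g in neg}


# Toy data: group "a" has 10 positives and 100 negatives, group "b" has 5 and 200.
labels = [1] * 10 + [0] * 100 + [1] * 5 + [0] * 200
groups = ["a"] * 110 + ["b"] * 205

kept = downsample_negatives_by_group(labels, groups, keep_rate=0.25)
before = odds_by_group(labels, groups)
after = odds_by_group([labels[i] for i in kept], [groups[i] for i in kept])
print(before, after)
```

On the logit scale, multiplying every group's odds by 1 / keep_rate amounts to adding -log(keep_rate) to the intercept, so group-level effects should be unaffected. A single global downsample does not guarantee that for small groups.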