Cost-sensitive logistic regression?

rsteckel · June 25, 2018, 10:16pm

I recently came across this great article discussing predictive accuracy.

Bob Carpenter makes the point that binary classification in an applied setting really needs to account for the costs of false positives/negatives (if I understood correctly). This is just what I need from my model. Fraud investigators use my model output to determine what to investigate. False negatives will cost the company money, but false positives cost investigator time. I’d like the ability to tune the model accordingly.

Is it possible to incorporate these into a logistic regression model using Stan?

I’ve tried a cost-sensitive loss function using xgboost, and it works well. But I really like the interpretability of Stan models: using Stan, I can tell an investigator not only what to look at but also why. My current Stan model (a hierarchical logistic regression fit with brms) produces a very large number of false positives.

bgoodri · June 25, 2018, 11:00pm

You don’t incorporate the costs of misclassification into the model block. You fit the same model and evaluate your cost function in the generated quantities block.
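A minimal sketch of that separation, written in Python rather than Stan for illustration. It assumes you already have posterior draws of Pr(fraud) for each case (for example from brms's `posterior_epred()`, or from an `inv_logit` of the linear predictor computed in generated quantities). The costs `COST_FP` and `COST_FN` are hypothetical placeholders, and correct decisions are assumed to cost nothing. The model is untouched; only the decision step uses the costs.

```python
"""Pick 'investigate' vs 'skip' for each case by posterior expected cost.

Sketch only: `p_draws` holds posterior draws of Pr(fraud) for one case,
e.g. exported from a fitted brms/Stan model. Costs are hypothetical, and
correct decisions are assumed to cost nothing.
"""
from statistics import fmean

COST_FP = 1.0    # hypothetical cost of investigating a legitimate case
COST_FN = 100.0  # hypothetical cost of missing a fraudulent case


def expected_costs(p_draws, cost_fp=COST_FP, cost_fn=COST_FN):
    """Return (expected cost of investigating, expected cost of skipping)."""
    p = fmean(p_draws)  # posterior mean of Pr(fraud); both costs are linear in it
    cost_investigate = (1.0 - p) * cost_fp  # wasted effort if the case is legitimate
    cost_skip = p * cost_fn                 # loss if the case is actually fraud
    return cost_investigate, cost_skip


def decide(p_draws, cost_fp=COST_FP, cost_fn=COST_FN):
    """Return True if investigating has the lower posterior expected cost."""
    cost_investigate, cost_skip = expected_costs(p_draws, cost_fp, cost_fn)
    return cost_investigate < cost_skip


if __name__ == "__main__":
    cases = {"case_a": [0.002, 0.004, 0.003], "case_b": [0.02, 0.05, 0.03]}
    for name, draws in cases.items():
        print(name, decide(draws))  # case_a False, case_b True
```

Because both costs are linear in Pr(fraud), this rule reduces to investigating when the posterior mean exceeds COST_FP / (COST_FP + COST_FN), here 1/101. With non-linear costs you would compute the cost per draw and then average, which is what a generated quantities block does naturally.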

Bob_Carpenter · July 20, 2018, 4:47pm

The beauty of the Bayesian approach is that fitting the model can be factored from the decision theory.

rsteckel · July 20, 2018, 5:33pm

I’m beginning to understand that. The Bayesian Decision Theory chapter in Pattern Classification by Duda and Hart was very helpful, as was this paper:

http://www.jmlr.org/papers/volume11/dmochowski10a/dmochowski10a.pdf
(which references Duda and Hart)

To me, this line in the paper made things much clearer:

$$
\text{risk} = p(+1)\,c(+1)\,p(\text{error} \mid +1) + p(-1)\,c(-1)\,p(\text{error} \mid -1)
$$

Having a dataset where p(+1) = p(-1) implies equal costs. But because only the products p(+1)c(+1) and p(-1)c(-1) appear in the risk, changing p(+1) and p(-1) lets me imply a particular cost ratio c(-1)/c(+1) (e.g., a 1:100 ratio of false-positive cost to false-negative cost).
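A quick numerical check of this idea, as a Python sketch with made-up likelihoods and costs. By Bayes' rule, the risk-minimizing choice is to predict +1 when p(x|+1)p(+1)c(+1) > p(x|-1)p(-1)c(-1), so priors and costs only enter through their products. Folding the costs into an adjusted prior and then applying a plain 0.5 threshold should therefore give the same decisions as the explicit cost-sensitive rule. The function names are hypothetical.

```python
"""Check that costs and class priors enter the Bayes decision only as p(y)c(y).

Convention (matching the risk formula): c_pos is the cost of an error on a
true +1 (a false negative) and c_neg the cost of an error on a true -1
(a false positive). All numbers are illustrative.
"""


def posterior_pos(lik_pos, lik_neg, prior_pos):
    """Pr(+1 | x) from the class-conditional likelihoods and prior Pr(+1)."""
    num = lik_pos * prior_pos
    return num / (num + lik_neg * (1.0 - prior_pos))


def cost_sensitive_decision(lik_pos, lik_neg, prior_pos, c_pos, c_neg):
    """Predict +1 iff predicting -1 has the higher expected cost."""
    p = posterior_pos(lik_pos, lik_neg, prior_pos)
    return p * c_pos > (1.0 - p) * c_neg


def cost_adjusted_prior(prior_pos, c_pos, c_neg):
    """A prior Pr'(+1) proportional to p(+1)c(+1) that absorbs the costs."""
    w_pos = prior_pos * c_pos
    w_neg = (1.0 - prior_pos) * c_neg
    return w_pos / (w_pos + w_neg)


def equal_cost_decision(lik_pos, lik_neg, prior_pos):
    """Ordinary 0.5 threshold, i.e. the rule for equal misclassification costs."""
    return posterior_pos(lik_pos, lik_neg, prior_pos) > 0.5


if __name__ == "__main__":
    # FN cost 100x the FP cost, i.e. c(-1)/c(+1) = 1/100.
    prior, c_pos, c_neg = 0.1, 100.0, 1.0
    for lik_pos, lik_neg in [(2.0, 1.0), (0.1, 4.0)]:
        a = cost_sensitive_decision(lik_pos, lik_neg, prior, c_pos, c_neg)
        b = equal_cost_decision(lik_pos, lik_neg, cost_adjusted_prior(prior, c_pos, c_neg))
        print(a, b)  # the two rules agree
```

Note that with the original prior of 0.1 and equal costs, the first case would be classified -1; the cost ratio is what flips it.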

avehtari · July 21, 2018, 4:46pm

But the computation can be more efficient if they are combined. This is more common in importance sampling, where proposal distributions are often chosen to take into account the function being integrated as well as the distribution. It has also been proposed for distributional variational approximations. We could also combine MCMC and importance sampling (IS): intentionally sample more draws where they matter for the decision task, then use importance weighting to recover the correct expectations.
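To make the importance-sampling idea concrete, here is a toy Python sketch; the target, loss, and proposal are assumptions for illustration, not anything proposed in the thread. Because the loss is nonzero only in a far tail, plain draws from the target rarely hit it. A "defensive" mixture proposal puts half the draws near the tail, and importance weights p/q restore the correct expectation.

```python
"""Spend draws where the loss lives, then importance-weight back to the target.

Toy setting (an assumption, not from the thread): the target p is Normal(0, 1)
and the decision loss is nonzero only for theta > SHIFT. Exact normal draws
stand in for MCMC output.
"""
import math
import random

SHIFT = 3.0  # hypothetical start of the decision-relevant tail


def unnorm_normal(x, mu):
    """Normal(mu, 1) density without its 1/sqrt(2*pi) constant."""
    return math.exp(-0.5 * (x - mu) ** 2)


def loss(theta):
    """Hypothetical loss that is nonzero only in the far right tail."""
    return 1.0 if theta > SHIFT else 0.0


def plain_mc(n, rng):
    """Estimate E_p[loss] from n draws of the target itself."""
    return sum(loss(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


def defensive_is(n, rng):
    """Self-normalized IS with proposal q = 0.5*p + 0.5*Normal(SHIFT, 1).

    Half the draws land near the tail; keeping p in the mixture bounds the
    weights p/q by 2.
    """
    num = den = 0.0
    for _ in range(n):
        theta = rng.gauss(SHIFT if rng.random() < 0.5 else 0.0, 1.0)
        p = unnorm_normal(theta, 0.0)
        q = 0.5 * p + 0.5 * unnorm_normal(theta, SHIFT)  # same omitted constant
        w = p / q
        num += w * loss(theta)
        den += w
    return num / den


if __name__ == "__main__":
    exact = 0.5 * math.erfc(SHIFT / math.sqrt(2.0))  # Pr(theta > SHIFT)
    print(exact, plain_mc(20000, random.Random(1)), defensive_is(20000, random.Random(1)))
```

With 20,000 draws, plain Monte Carlo expects only about 27 tail hits, while about half of the proposal draws sit near the tail. Keeping the target in the mixture matters: a proposal that is only the shifted normal would give importance weights with very large variance here.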

Bob_Carpenter · July 22, 2018, 2:14pm

That would be great if we could do it stably. How do you intentionally take more draws from where they matter? Do you change the log density being sampled or change the algorithm somehow?

avehtari · July 22, 2018, 8:42pm

I would change the target in Stan. How to change it is probably not obvious before some initial sampling. For example, suppose we want to estimate an extreme tail quantile. We could do an initial run with the original log density and, after learning the approximate location of that quantile, change the target to have higher values near that location and lower values in the bulk of the distribution. For complex models and decision tasks that depend on, e.g., predictions, this is probably more difficult. If it were easy, people would be doing it more often.
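A rough Python sketch of that two-stage scheme for a tail quantile, under toy assumptions: the target is Normal(0, 1), exact normal draws stand in for MCMC, and the "changed target" is a 50/50 mixture of the original density and a unit normal centered at the pilot estimate, which raises the density near the tail and halves it in the bulk. In Stan the second stage would correspond to sampling a modified `target`; that step is not shown here, and the function names are hypothetical.

```python
"""Two-stage estimate of an extreme quantile: pilot run, then a boosted tail.

Assumptions (not from the thread): the target is Normal(0, 1), exact draws
stand in for MCMC, and the second-stage proposal is a 50/50 mixture of the
target and a unit normal centered at the pilot estimate.
"""
import math
import random


def unnorm_normal(x, mu):
    """Normal(mu, 1) density without its 1/sqrt(2*pi) constant."""
    return math.exp(-0.5 * (x - mu) ** 2)


def pilot_quantile(prob, n, rng):
    """Stage 1: rough quantile from ordinary draws of the target."""
    draws = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    return draws[min(n - 1, int(prob * n))]


def weighted_quantile(prob, draws, weights):
    """Smallest draw whose normalized cumulative weight reaches prob."""
    total = sum(weights)
    acc = 0.0
    for x, w in sorted(zip(draws, weights)):
        acc += w / total
        if acc >= prob:
            return x
    return max(draws)


def refined_quantile(prob, n, rng, center):
    """Stage 2: oversample near `center`, then reweight to the target."""
    draws, weights = [], []
    for _ in range(n):
        theta = rng.gauss(center if rng.random() < 0.5 else 0.0, 1.0)
        p = unnorm_normal(theta, 0.0)                     # original target
        q = 0.5 * p + 0.5 * unnorm_normal(theta, center)  # boosted near the tail
        draws.append(theta)
        weights.append(p / q)
    return weighted_quantile(prob, draws, weights)


if __name__ == "__main__":
    rng = random.Random(7)
    rough = pilot_quantile(0.999, 4000, rng)
    # The exact 0.999 quantile of Normal(0, 1) is about 3.090.
    print(rough, refined_quantile(0.999, 20000, rng, rough))
```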

rsteckel · August 30, 2018, 5:07pm

After working with this model more, I’ve realized there’s an additional complexity in my approach that I had ignored. Not only does the ratio P(+1)/P(-1) imply a cost ratio, but when fitting a hierarchical logistic regression (with group G), I also need to preserve the class ratio within each group, P(+1|G)/P(-1|G). Ignoring this gave me strange results, but once I adjusted my training data to have the correct proportions per group, things looked much better.
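One way to read "the correct proportions per group", sketched in Python under that assumption: keep every positive and downsample the negatives at the same rate within each group. Every group's odds P(+1|G)/P(-1|G) then scale by the same factor 1/keep_rate, so the between-group differences the hierarchical model needs to learn are preserved. The row format and function names are hypothetical.

```python
"""Downsample negatives within each group at a common rate (a sketch).

Rows are dicts with hypothetical keys "group" and "label" (1 = fraud, 0 = not).
"""
import random
from collections import defaultdict


def downsample_negatives_by_group(rows, keep_rate, rng):
    """Keep all positives and a keep_rate fraction of negatives in every group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["group"]].append(row)
    kept = []
    for group_rows in by_group.values():
        pos = [r for r in group_rows if r["label"] == 1]
        neg = [r for r in group_rows if r["label"] == 0]
        k = round(keep_rate * len(neg))  # same rate in every group
        kept.extend(pos)
        kept.extend(rng.sample(neg, k))
    return kept


def group_odds(rows):
    """Per-group odds P(+1|G)/P(-1|G) computed from counts."""
    counts = defaultdict(lambda: [0, 0])  # group -> [negatives, positives]
    for r in rows:
        counts[r["group"]][r["label"]] += 1
    return {g: pos / neg for g, (neg, pos) in counts.items()}
```

If calibrated probabilities for the original population are needed later, note that under this scheme (negatives sampled at random within each group) the fitted odds are inflated by 1/keep_rate in every group, so the logit can be shifted back by adding log(keep_rate).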

Hopefully, that helps someone else.

Bob_Carpenter · September 3, 2018, 10:45am

Thanks for reporting back—we appreciate the accumulation of advice!
