Observe importance of different features in making predictions

Howdy! @harrelfe asked a similar question in Model Selection in BRMS - #19 by harrelfe and also asked it on Stack Exchange (regression - Relative variable importance/explained variation from a single model fit - Cross Validated), where some of the replies could help. I don't have an answer to your question (or Frank's), as I don't recall seeing the term "importance" outside of random forests or CART. I have seen this question about "importance" a few times now on the Stan forum, and I would be curious what exactly the purpose of determining so-called importance would be, i.e. what is the end goal?

If the goal is to look at contributions to prediction, then one could compare models with and without a given predictor via some criterion like LOO-CV or K-fold CV (first sketch below). If the goal is feature selection, then something like projection predictive variable selection could be used: Projection Predictive Feature Selection • projpred (second sketch below). In a Bayesian model, though, as @avehtari writes in Cross-validation FAQ, you can avoid any model selection "using the model which includes all predictors and includes all uncertain things. Then optimal thing is to integrate over all the uncertainties. When including many components to a model, it is useful to think more carefully about the prior." You could, for example, use a regularized horseshoe prior in the case of many predictors (third sketch below). One could also think of "importance" in terms of the coefficients when all the predictors are standardized to the same scale (fourth sketch below).

What is the end goal of finding the "most important" variable in your model for sales? i.e. why do you want to know "how each feature contributes to the prediction"? Is it so that you know which predictor of sales is the biggest bang for the buck to intervene on when you can't intervene on every predictor? I think the end goal might help decide what you use to define "important."
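To make those suggestions concrete, here are some rough sketches in R. All the data and variable names (`df`, `sales`, `price`, `advertising`, `season`) are made up for illustration. First, comparing predictive contribution by dropping a predictor and looking at the difference in expected log predictive density via LOO-CV:

```r
library(brms)

# Full model and a model without the predictor of interest
fit_full <- brm(sales ~ price + advertising + season, data = df)
fit_drop <- brm(sales ~ price + season, data = df)

# PSIS-LOO estimates of expected log predictive density (elpd)
loo_full <- loo(fit_full)
loo_drop <- loo(fit_drop)

# elpd difference and its standard error; a difference that is small
# relative to its SE suggests the dropped predictor adds little predictively
loo_compare(loo_full, loo_drop)
```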
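Second, for feature selection with projpred, a sketch using the full brms fit above as the reference model:

```r
library(projpred)

# Cross-validated search over submodels of the reference model
vs <- cv_varsel(fit_full)

# Predictive performance as a function of submodel size
plot(vs, stats = "elpd")

# Heuristic for how many predictors are enough
suggest_size(vs)

# Which predictors enter the submodels, and in what order
summary(vs)
```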
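Third, if you keep everything in the model and let the prior do the regularizing, brms has the regularized horseshoe built in. The `par_ratio` value here is just a placeholder for your guess at the proportion of non-negligible effects:

```r
library(brms)

# Regularized horseshoe prior on the regression coefficients;
# par_ratio encodes a guess about the ratio of non-zero to zero effects
fit_hs <- brm(
  sales ~ price + advertising + season,
  data  = df,
  prior = set_prior(horseshoe(df = 1, par_ratio = 0.1), class = "b")
)
```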
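Finally, for the standardized-coefficients notion of importance, you could put the continuous predictors on the same scale before fitting, so the coefficients are comparable per standard deviation of each predictor:

```r
library(brms)

# Standardize continuous predictors so coefficients are on a per-SD scale
df_std <- df
df_std[c("price", "advertising")] <- scale(df[c("price", "advertising")])

fit_std <- brm(sales ~ price + advertising + season, data = df_std)

# Posterior summaries of the population-level (per-SD) coefficients
fixef(fit_std)
```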
