Observing the importance of different features in making predictions

I am using a Stan model through cmdstanpy. I want to understand how each feature contributes to the prediction. In machine learning models we generally use SHAP values, but for a Stan model I could not find a way to do this.

I am calculating the sales explained by each promotional spend, but I also want to see how the intercept and other key features contribute to the prediction.
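For a linear model, the decomposition I'm after can be read straight off the posterior: each observation's linear predictor splits exactly into the intercept plus one beta_k * x_k term per feature. Here is a sketch of what I mean, assuming posterior draws of `alpha` and `beta` are already in hand (all variable names here are my own, not from any package):

```python
import numpy as np

# Assumed shapes: alpha_draws (D,), beta_draws (D, K), X (N, K).
# For a linear model, eta = alpha + X @ beta, so each feature's
# contribution to each prediction is simply beta_k * x_{n,k}.
def contribution_summary(alpha_draws, beta_draws, X):
    contrib = beta_draws[:, None, :] * X[None, :, :]   # (D, N, K)
    intercept = np.broadcast_to(alpha_draws[:, None], contrib.shape[:2])
    return {
        "intercept": intercept.mean(),           # average intercept contribution
        "features": contrib.mean(axis=(0, 1)),   # average contribution per feature
    }
```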

Howdy! @harrelfe asked a similar question in Model Selection in BRMS - #19 by harrelfe and also asked it on Stack Exchange (regression - Relative variable importance/explained variation from a single model fit - Cross Validated), with some replies there that could help. I don't have an answer to your question (or Frank's), as I do not remember the term "importance" outside of random forests or CART.

I think I have seen this question about "importance" a few times now on the Stan forum, and I would be curious what exactly the purpose of determining so-called importance would be, i.e. what is the end goal? If the goal is to look at contributions to prediction, then one could compare models via some criterion like LOO-CV, K-fold CV, or something else. If the goal is feature selection, then something like projection predictive variable selection could be used (Projection Predictive Feature Selection • projpred). In a Bayesian model, though, as @avehtari writes in the Cross-validation FAQ, you can avoid any model selection "using the model which includes all predictors and includes all uncertain things. Then optimal thing is to integrate over all the uncertainties. When including many components to a model, it is useful to think more carefully about the prior." You could, for example, use a regularized horseshoe prior in the case of many predictors. One could also think of "importance" in terms of the coefficients when all the predictors are standardized to the same scale.

What is the end goal of finding the "most important" variable in your model for sales, i.e. why do you want to know "how each feature contributes to the prediction"? Is it so that you know which predictor of sales gives the biggest bang for the buck to intervene on when you can't have an intervention for every predictor? I think the end goal might help decide what you use to define "important."
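To make the standardized-coefficient idea concrete, here is a minimal sketch with cmdstanpy; the Stan program, file name, and data below are all invented for illustration. Because the predictors share a scale, the posterior magnitudes of `beta` are directly comparable:

```python
import numpy as np
from cmdstanpy import CmdStanModel

# A hypothetical regression model (not from the thread).
stan_code = """
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;   // predictors, standardized before fitting
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  beta ~ normal(0, 1);
  y ~ normal(alpha + X * beta, sigma);
}
"""
with open("reg.stan", "w") as f:
    f.write(stan_code)

# Simulated data; standardizing X puts all coefficients on the same scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.2, -0.5]) + rng.normal(size=100)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
data = {"N": 100, "K": 3, "X": Xs.tolist(), "y": y.tolist()}

fit = CmdStanModel(stan_file="reg.stan").sample(data=data)
beta = fit.stan_variable("beta")     # (draws, K)
print(np.abs(beta).mean(axis=0))     # crude "importance" ranking
```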


Great questions. Though many people use variable importance for feature selection, I don't do that. What I use variable importance for is descriptive: to tell medical researchers what the big players are in predicting Y. My physician colleagues have really liked relative explained variation in frequentist models, e.g., the ratio of a variable's likelihood ratio \chi^2 statistic to the overall model \chi^2. With ordinary linear regression we use the partial R^2 divided by the total R^2 for the model. I'd like to have a Bayesian version, especially one that is log-likelihood based. But the second solution proposed in the Stack Exchange posting today may be a good way to do it. It uses a linear model as a bridge model and uses standard partitioning of sums of squares in the linear model to derive relative explained variation. I'm going to code that to get HPD intervals for all the variables' relative explained variations using that trick.
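A sketch of how that trick could be coded (my guess at the intended implementation, not Frank's actual code): for each posterior draw of the linear predictor `eta`, fit an ordinary least-squares bridge model of `eta` on the design matrix `X`, partition its sums of squares, and summarize the per-draw relative explained variations with intervals (plain percentile intervals here rather than HPD, for brevity):

```python
import numpy as np

def partial_ss_fractions(X, y):
    """Partial sum of squares for each column of X (plus intercept),
    computed as the increase in residual SS when that column is dropped,
    scaled by the total sum of squares."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss_full = np.sum((y - X1 @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    out = np.empty(p)
    for j in range(p):
        Xr = np.delete(X1, j + 1, axis=1)        # drop predictor j
        br, *_ = np.linalg.lstsq(Xr, y, rcond=None)
        out[j] = (np.sum((y - Xr @ br) ** 2) - rss_full) / tss
    return out

# eta_draws: (n_draws, n_obs) posterior draws of the linear predictor;
# X: (n_obs, p) design matrix. Both names are hypothetical.
def relative_explained_variation(eta_draws, X):
    rev = np.array([partial_ss_fractions(X, eta) for eta in eta_draws])
    rev /= rev.sum(axis=1, keepdims=True)        # fractions summing to 1 per draw
    lo, hi = np.percentile(rev, [2.5, 97.5], axis=0)
    return rev.mean(axis=0), lo, hi              # posterior mean and 95% interval
```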


Interesting. That is what I thought most people were implying by the word "importance" when I have seen it here on this forum.

However, I am still a bit confused as to what is practically and usefully important about importance. What are your thoughts on the following example? What am I missing? I am particularly curious about this idea of "importance" because I have not only seen the question here on the forum, but have also encountered questions and discussions aiming at similar ideas from researchers I have worked with. I would be curious to hear how your physician colleagues use "importance" in practice. How exactly do they use the knowledge of "the big players in predicting Y"?

Let's assume an overly simple example (I'm not a physician, so ignore the potentially awful physiology here), where we have a regression model for systolic blood pressure, systolic_bp, with continuous standardized predictors drug, drinks, and age. Let's say that drug is a dose of blood pressure medication that affects systolic_bp, and drinks is the number of alcoholic drinks, which affects both systolic_bp and drug, i.e. a classic confounder. We fit the basic Gaussian linear regression model, systolic_bp ~ 1 + age + drug + drinks.
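To ground the example, here is a minimal simulation of that hypothetical setup (all effect sizes invented), fitting systolic_bp ~ 1 + age + drug + drinks by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical data-generating process: drinks confounds drug -> systolic_bp.
age = rng.normal(0, 1, n)
drinks = rng.normal(0, 1, n)
drug = 0.8 * drinks + rng.normal(0, 1, n)      # heavier drinkers get higher doses
systolic_bp = 0.3 * age - 0.5 * drug + 0.6 * drinks + rng.normal(0, 1, n)

# Fit systolic_bp ~ 1 + age + drug + drinks
X = np.column_stack([np.ones(n), age, drug, drinks])
coef, *_ = np.linalg.lstsq(X, systolic_bp, rcond=None)
print(dict(zip(["intercept", "age", "drug", "drinks"], coef.round(2))))
```

Whichever of drug or drinks looks more "important" in such a fit says nothing by itself about which is the better intervention target.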

Now, we find some measure of "importance" for the predictors. I can only think of 3 reasons why importance is practically and usefully important (but I may easily be overlooking something!!!): 1) a purely academic, statistical-modeling question; 2) for the purpose of intervening: while age cannot be intervened on, drug and possibly drinks could, so it would be nice to know which to intervene on to affect systolic_bp if one doesn't have the resources to intervene on both (i.e. biggest bang for the buck); 3) for the purpose of prediction when the physician has only limited information: given you know age only, can you make a good guesstimate at systolic_bp (obviously not, but this is all hypothetical)?

A physician would not be interested in (1), only (2) and (3). For (2), it seems that one would need to be very careful about the causal structure of the problem at hand. The most 'important' predictor may not be the best predictor to intervene on. It seems like one would need a good grasp of the causal structure, and then they could use the coefficients from standardized predictors to determine the biggest bang for the buck... would relative explained variation as "importance" be better in this scenario? For (3), it seems that one would want to run multiple models, since no measure of importance would seem to imply that a predictor is necessarily a good one on its own. For this case, it seems multiple models and model comparison, or something like projection predictive feature selection, would be better than measuring "importance"...?

I can think of a tempting 4th reason, but I think it would be incorrect: to imply some sort of discovery of causal structure. It would be extremely tempting, when seeing a regression with many predictors and a ranking of their "importance", and having a generally fuzzy or absent understanding of the causal structure, to attribute some causal connection between the most important predictor and the outcome. I think that this could be misleading, though, as good prediction doesn't imply causality. This might be extremely tempting for general users, and I would wonder, from a general user/practitioner/researcher perspective, if this isn't what people may have in mind when they are looking for the "biggest players in predicting Y."

I feel that you are making this far more complex than needed. We've been estimating importance for > 100 years using statistical models. Think of an analysis of variance table in a six-period, two-treatment crossover design. We partition the sum of squares into various sources of variation: between treatments, within patient, between patients, pure error, etc. This is all about understanding. Learning about the magnitudes of the various sources of variation is very, very important to understanding processes (this has been used extensively in manufacturing) and in planning future experiments, plus a host of other uses. In predictive modeling in medicine we frequently want to know how much of a patient's phenotype comes from DNA, how much from protein expression, and how much from environment. Getting the total importance of each of these types of variables is extremely informative.
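As a toy illustration of that kind of partitioning (numbers invented, not from any study), a one-way analysis of variance splits the total sum of squares exactly into between-group and within-group pieces:

```python
import numpy as np

# Toy data: three treatment groups (values invented for illustration).
groups = [np.array([5.1, 4.8, 5.5]),
          np.array([6.2, 6.0, 6.7]),
          np.array([4.0, 4.4, 3.9])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

ss_total = np.sum((all_y - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# The partition is exact: SS_total = SS_between + SS_within
print(ss_total, ss_between + ss_within)
print("fraction of variation explained by treatment:", ss_between / ss_total)
```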


Haha, maybe, but the reasons I gave, particularly the last one, are things I have seen people try to do, and that seems a legitimate concern for the general user.

Gotcha. That makes sense. Mostly when I have been asked about importance, it wasn't simply from the viewpoint of understanding variation.

Good points - I stay away from causal interpretations for the non-randomized portions of the model. It's true that many people misinterpret importance. But that doesn't make importance unimportant :-) Perhaps a better term would be relative predictive information, but importance is shorter.
