Multiple Variable Selection in Stan

Hi everyone. For a while now I have been searching for ways to implement variable selection with p > 15 covariates in Stan for user-defined models, not just the usual families like Normal, Binomial and Poisson (see the projpred library), and I have not found anything. In WinBUGS one can do this through posterior inclusion probabilities (for details, see Ntzoufras, Bayesian Modeling Using WinBUGS), which define a binary indicator parameter \gamma_{j} for each variable. But Stan cannot sample discrete parameters, and in this case I cannot see how to marginalize out the indicators in the approach Ntzoufras suggests.
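One workaround often used in Stan (not the WinBUGS formulation from the book) is a continuous spike-and-slab prior in which each indicator \gamma_j enters only through the prior of \beta_j, so it can be summed out analytically. The Python sketch below assumes a mixture of two zero-mean normals with hypothetical scales `slab_sd` and `spike_sd` and prior inclusion probability `pi_incl`; in a Stan program the corresponding term would be written with `log_mix`. Because \gamma_j depends on the data only through \beta_j, averaging `inclusion_prob_given_beta` over posterior draws of \beta_j gives an estimate of the posterior inclusion probability.

```python
import math

def normal_logpdf(x, sd):
    """Log density of Normal(0, sd) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * (x / sd) ** 2

def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def marginal_prior_logpdf(beta, pi_incl, slab_sd, spike_sd):
    """log p(beta_j) with the binary indicator gamma_j summed out:
    p(beta_j) = pi * N(beta_j | 0, slab_sd) + (1 - pi) * N(beta_j | 0, spike_sd)."""
    return log_sum_exp(math.log(pi_incl) + normal_logpdf(beta, slab_sd),
                       math.log1p(-pi_incl) + normal_logpdf(beta, spike_sd))

def inclusion_prob_given_beta(beta, pi_incl, slab_sd, spike_sd):
    """P(gamma_j = 1 | beta_j); average over posterior draws of beta_j
    to estimate the posterior inclusion probability of covariate j."""
    log_in = math.log(pi_incl) + normal_logpdf(beta, slab_sd)
    log_out = math.log1p(-pi_incl) + normal_logpdf(beta, spike_sd)
    return math.exp(log_in - log_sum_exp(log_in, log_out))
```

The spike scale is a modelling choice: the smaller it is, the closer the mixture gets to an exact point mass at zero, but the harder the posterior becomes for HMC to explore.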

Is there another way to do variable selection in Stan with many variables? Or maybe a procedure like stepwise selection based on the WAIC criterion?
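For reference, WAIC only needs the matrix of pointwise log-likelihoods (one row per posterior draw, one column per observation), which a user-defined Stan model can produce in `generated quantities`. A minimal Python sketch, assuming that matrix has already been extracted as a list of lists named `log_lik`; the R package loo provides `waic()` and `loo()` for the same purpose.

```python
import math

def waic(log_lik):
    """WAIC from an S x n matrix (list of S lists) of pointwise log-likelihoods.
    Returns (elpd_waic, p_waic, waic) with waic = -2 * elpd_waic."""
    S = len(log_lik)
    n = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n):
        col = [log_lik[s][i] for s in range(S)]
        m = max(col)
        # log of the posterior-mean likelihood of observation i, computed stably
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / S)
        mean = sum(col) / S
        # effective-parameter penalty: sample variance (S - 1) across draws
        p_waic += sum((v - mean) ** 2 for v in col) / (S - 1)
    elpd = lppd - p_waic
    return elpd, p_waic, -2.0 * elpd
```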

Thanks in advance for any suggestion!


Does this work for user-defined distributions (in other words, for distributions I have written myself in Stan code, without the rstanarm and brms libraries)? I have searched, but I could not find how to use a stanfit object (a model fitted with a user-defined distribution) with this procedure. I have only found it working for Normal, Binomial and Poisson.

You would have to implement it yourself for your likelihood function / model.

See Piironen and Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. If n >> p, stepwise selection may work, but the projection predictive approach implemented in projpred has much lower variance and is more stable when p is large.

See Piironen, Paasiniemi, and Vehtari (2018). Projective inference in high-dimensional problems: prediction and feature selection. arXiv preprint arXiv:1810.02406. If the covariates are correlated, the inclusion probabilities can be misleading.

If you tell me which model you would like to have, I can tell you how easy it would be to add to projpred. If the model belongs to the exponential family, then section 3 of Piironen, Paasiniemi, and Vehtari (2018) provides the equations you need.
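As an illustration of what such a projection looks like for one exponential-family case (a worked example for a Poisson likelihood with log link, not copied from the paper): for posterior draw s of the reference model with fitted rates \mu_i^{*(s)}, the submodel parameters are chosen to minimize the average Kullback-Leibler divergence between the predictive distributions, and the terms that do not depend on \theta drop out.

```latex
\begin{aligned}
\mathrm{KL}\big(\mathrm{Pois}(a)\,\|\,\mathrm{Pois}(b)\big) &= a\log\frac{a}{b} - a + b,\\
\theta_\perp^{(s)} &= \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n}
  \mathrm{KL}\big(\mathrm{Pois}(\mu_i^{*(s)})\,\|\,\mathrm{Pois}(\lambda_i(\theta))\big)\\
&= \arg\min_{\theta} \sum_{i=1}^{n}\big[\lambda_i(\theta) - \mu_i^{*(s)}\log\lambda_i(\theta)\big],
\qquad \lambda_i(\theta) = \exp(x_{i,\perp}^{\top}\theta).
\end{aligned}
```

The last line is the Poisson negative log-likelihood with the observations y_i replaced by the reference fit \mu_i^{*(s)}, so a Poisson GLM routine that accepts non-integer responses can perform the projection. As a check, for an intercept-only submodel \lambda_i = e^{\alpha}, setting the derivative n e^{\alpha} - \sum_i \mu_i^{*(s)} to zero gives e^{\alpha} = \frac{1}{n}\sum_i \mu_i^{*(s)}.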


I fit a Skellam model using Y_1, Y_2 as latent variables with corresponding linear predictors (as in Karlis and Ntzoufras). More specifically, my linear predictors are:

$\lambda_1 = \exp(\mu + \text{home} + \text{att}[\text{home}] + \text{defense}[\text{away}] + \text{skills\_home})$
$\lambda_2 = \exp(\mu + \text{att}[\text{away}] + \text{defense}[\text{home}] + \text{skills\_away})$

where skills_home is a matrix with 17 covariates for the home team and skills_away a matrix with 17 covariates for the away team.
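Since only the difference Y_1 - Y_2 is observed, the latent counts have to be summed out of the likelihood. A minimal Python sketch, useful for checking a user-defined Stan likelihood against, computes the Skellam log-pmf by summing over the latent Y_2 (truncated at a hypothetical `max_terms`). `rates` is a hypothetical helper; it takes the covariate terms already multiplied by their coefficients as plain numbers, since the post does not show that part.

```python
import math

def poisson_logpmf(y, lam):
    """log Pois(y | lam) for integer y >= 0 and lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def skellam_logpmf(k, lam1, lam2, max_terms=200):
    """log P(Y1 - Y2 = k) for independent Y1 ~ Pois(lam1), Y2 ~ Pois(lam2),
    computed by summing out the latent Y2 (truncated after max_terms terms)."""
    start = max(0, -k)  # Y1 = k + Y2 must be non-negative
    terms = [poisson_logpmf(k + y2, lam1) + poisson_logpmf(y2, lam2)
             for y2 in range(start, start + max_terms)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def rates(mu, home, att, defense, h, a, skill_home, skill_away):
    """Hypothetical helper: the two log-linear predictors from the post,
    with h and a the home and away team indices."""
    lam1 = math.exp(mu + home + att[h] + defense[a] + skill_home)
    lam2 = math.exp(mu + att[a] + defense[h] + skill_away)
    return lam1, lam2
```

The sum over the latent count is exactly the marginalization that makes the model usable in Stan, since the discrete Y_1, Y_2 never have to be sampled.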

Furthermore, I compare this model with an ordered logistic model of this form:
$\text{logit}(\gamma_{ij}) = \mu - (\text{team\_abil}[\text{home}] - \text{team\_abil}[\text{away}] + \text{skills})$
where skills has 17 features. I would also like to implement variable selection for this problem, as for the Skellam model.
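A minimal Python sketch of the category probabilities implied by such a model, assuming that \gamma_{ij} = P(y_i \le j) and that \mu plays the role of an increasing cutpoint c_j for each category boundary (the post does not show the index on \mu). The linear predictor would be `eta = team_abil[home] - team_abil[away] + skills`, with the skills term precomputed; this matches the convention of Stan's `ordered_logistic`, where a larger `eta` moves mass to higher categories.

```python
import math

def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities when logit P(y <= j) = c_j - eta,
    for increasing cutpoints c_1 < ... < c_{K-1}; returns K probabilities."""
    cum = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
```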

I have 34 variables and, if I have understood correctly, this stepwise procedure requires 2^34 different models. However, is there any existing Stan code for stepwise variable selection using the WAIC criterion?

Thank you for your time!

I’m not familiar with the Skellam distribution, but I read on Wikipedia that Y_1 and Y_2 are independently Poisson distributed (conditionally on \lambda_j). So it seems you could modify projpred for your model.

Again, the terms are conditionally independent, so you should be able to modify projpred.

Send me an email if you are interested in the projpred option, and I can check whether one of my students could help.

Stepwise selection requires at most (34^2+34)/2 = 595 different models (not 2^34, which would be needed to compare all subsets), and depending on the stopping rule, usually fewer.
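To see where this count comes from, here is a minimal Python sketch of greedy forward selection without a stopping rule. `score` is a hypothetical callback that would refit the Stan model on a subset of covariates and return, for example, its estimated elpd; at step k there are p - k + 1 candidates, so the full path scores p(p+1)/2 submodels (one more if the intercept-only model is also fitted).

```python
def forward_stepwise(p, score):
    """Greedy forward search over covariates 0..p-1.
    score(subset) -> float, larger is better (e.g. elpd of the refitted submodel).
    Returns the selection order and the number of submodels scored."""
    selected = []
    remaining = list(range(p))
    n_fits = 0
    while remaining:
        best_j, best_score = None, float("-inf")
        for j in remaining:
            s = score(selected + [j])  # one model fit per candidate
            n_fits += 1
            if s > best_score:
                best_j, best_score = j, s
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, n_fits
```

Every call to `score` is a separate model fit, which is why even 595 fits is expensive, and because each step picks the maximum of noisy estimates, the selected path can overfit the selection criterion.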

No, because it cannot be generally recommended (see the paper I mentioned).