Multiple Variable Selection in Stan

billas · February 15, 2019, 9:52am

Hi everyone. For some time now I have been searching for ways to implement variable selection in Stan with $p > 15$ covariates for user-defined models, not just for the usual families like Normal, Binomial and Poisson (see the projpred library), and I have not found anything. In WinBUGS one could implement it through posterior inclusion probabilities (for details see Ntzoufras, Bayesian Modeling Using WinBUGS), which define a binary indicator parameter $\gamma_j$ for each variable. But Stan cannot sample discrete parameters, and in this case we cannot marginalize out the indicators of this approach (from Ntzoufras).
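To make the marginalization problem concrete, here is a small Python sketch. It assumes a formulation in which each indicator $\gamma_j$ multiplies its coefficient inside a Poisson linear predictor (in the style of Kuo and Mallick), with independent Bernoulli priors on the indicators; the function names and toy data are hypothetical. Because the indicators enter the likelihood jointly, summing them out needs one term per configuration, i.e. $2^p$ terms for every evaluation of the log density.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def poisson_loglik(y, x_rows, beta, gamma):
    """Poisson log-likelihood with linear predictor eta_i = sum_j gamma_j * beta_j * x_ij."""
    total = 0.0
    for yi, xi in zip(y, x_rows):
        eta = sum(g * b * x for g, b, x in zip(gamma, beta, xi))
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


def marginal_loglik(y, x_rows, beta, incl_prob):
    """Sum the binary indicators gamma out by enumerating all 2^p configurations.

    Assumes independent Bernoulli(incl_prob) priors with 0 < incl_prob < 1.
    Returns the marginal log-likelihood and the number of terms summed.
    """
    p = len(beta)
    terms = []
    for gamma in itertools.product((0, 1), repeat=p):
        log_prior = sum(math.log(incl_prob) if g else math.log1p(-incl_prob)
                        for g in gamma)
        terms.append(log_prior + poisson_loglik(y, x_rows, beta, gamma))
    return log_sum_exp(terms), len(terms)


# Toy data (made up for illustration): 3 observations, p = 3 covariates.
y = [1, 0, 3]
x_rows = [[0.5, -1.0, 0.2], [1.0, 0.3, -0.4], [-0.2, 0.8, 1.1]]
lp, n_terms = marginal_loglik(y, x_rows, beta=[0.4, -0.3, 0.1], incl_prob=0.5)
print(n_terms)   # 8 terms for p = 3
print(2 ** 34)   # 17179869184 terms per log-density evaluation for p = 34
```

For $p = 34$ that is roughly $1.7 \times 10^{10}$ terms per evaluation, and HMC evaluates the log density and its gradient at every leapfrog step, which is why brute-force marginalization is not practical here.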

Is there another way to do variable selection in Stan with many variables? Or maybe a procedure like stepwise selection with the WAIC criterion, etc.?

Thanks in advance for any suggestion!

bgoodri · February 15, 2019, 3:49pm

billas · February 15, 2019, 4:04pm

And does this work for user-defined distributions (in other words, for distributions I have written in Stan code without the rstanarm and brms libraries)? I have searched and could not find how to use a stanfit object (a model run with a user-defined distribution) for this procedure. I have found it working only for Normal, Binomial and Poisson.

bgoodri · February 15, 2019, 4:29pm

You would have to implement it yourself for your likelihood function / model.

avehtari · February 15, 2019, 5:19pm

See Piironen and Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. If $n \gg p$, stepwise selection may work, but the projection predictive approach implemented in projpred has much lower variance and is more stable when $p$ is large.
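As a rough illustration of what "projection" means in the projection predictive approach, here is a minimal Python/NumPy sketch for a Gaussian observation model. Each reference posterior draw is projected onto a submodel by minimizing the KL divergence from the reference predictive distribution; for the Gaussian case this reduces to least squares on the reference fit plus an inflated noise scale. This is not projpred's code, and the simulated "posterior draws" below are hypothetical stand-ins for a real reference-model fit.

```python
import numpy as np


def project_gaussian(X, beta_draws, sigma_draws, subset):
    """Project each reference draw onto the submodel using only columns in `subset`.

    For a Gaussian observation model the KL projection of draw s is least squares
    of the reference fit mu_s = X @ beta_s on X[:, subset], plus an inflated noise
    scale sigma_perp^2 = sigma_s^2 + mean((mu_s - mu_perp)^2).
    """
    X_sub = X[:, subset]
    mu_ref = beta_draws @ X.T                      # shape (S, n): reference fits
    # One least-squares solve handles all draws (each column of mu_ref.T is a target).
    coef, *_ = np.linalg.lstsq(X_sub, mu_ref.T, rcond=None)
    mu_perp = (X_sub @ coef).T                     # shape (S, n): projected fits
    sigma_perp = np.sqrt(sigma_draws ** 2 + np.mean((mu_ref - mu_perp) ** 2, axis=1))
    return coef.T, sigma_perp


def projected_kl(X, beta_draws, sigma_draws, subset):
    """Average KL per observation from the reference to the projected predictive."""
    _, sigma_perp = project_gaussian(X, beta_draws, sigma_draws, subset)
    # At the optimum, mean squared discrepancy = sigma_perp^2 - sigma^2,
    # so the KL per observation simplifies to log(sigma_perp / sigma).
    return float(np.mean(np.log(sigma_perp / sigma_draws)))


rng = np.random.default_rng(1)
n, p, S = 100, 5, 200
X = rng.normal(size=(n, p))
# Hypothetical stand-in for reference-model posterior draws (not a real fit).
beta_draws = np.array([1.0, -0.5, 0.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(S, p))
sigma_draws = 1.0 + 0.01 * np.abs(rng.normal(size=S))
for subset in ([0, 1], [2, 3], [0, 1, 2, 3, 4]):
    print(subset, projected_kl(X, beta_draws, sigma_draws, subset))
```

Submodels that keep the relevant covariates get a projected KL close to zero, while dropping them inflates the projected noise scale. projpred builds on this idea with a search over submodel sizes and cross-validated predictive performance; see the papers cited here for the actual procedure.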

See Piironen, Paasiniemi, and Vehtari (2018). Projective inference in high-dimensional problems: prediction and feature selection. arXiv preprint arXiv:1810.02406. If the covariates are correlated, the inclusion probabilities are misleading.

If you tell me which model you would like to use, I can tell you how easy it would be to add to projpred. If the model belongs to the exponential family, then Section 3 of Piironen, Paasiniemi, and Vehtari (2018) provides the equations you need.
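For an exponential-family observation model, the KL projection of one reference draw amounts to fitting the submodel by maximum likelihood with the reference model's fitted means in place of the observed data. A minimal sketch for the Poisson case with log link (the family of the latent counts in the Skellam model discussed later in this thread) might look like this; it uses plain Newton iterations without a line search, and the names and data are hypothetical.

```python
import numpy as np


def project_poisson(X_sub, mu_ref, n_iter=50, tol=1e-10):
    """KL projection of reference Poisson means mu_ref onto a log-linear submodel.

    Minimizes sum_i [exp(x_i b) - mu_ref_i * x_i b], i.e. a Poisson GLM fit in
    which the reference model's fitted means stand in for the observed counts.
    """
    # Start from least squares on the log scale, then refine with Newton's method.
    b, *_ = np.linalg.lstsq(X_sub, np.log(mu_ref), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X_sub @ b)
        grad = X_sub.T @ (mu - mu_ref)              # gradient of the objective
        hess = X_sub.T @ (X_sub * mu[:, None])      # Hessian X^T diag(mu) X
        step = np.linalg.solve(hess, grad)
        b = b - step
        if np.max(np.abs(step)) < tol:
            break
    return b


rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), 0.5 * rng.normal(size=(n, 3))])  # intercept + 3 covariates
beta_ref = np.array([0.3, 0.5, -0.4, 0.0])
mu_ref = np.exp(X @ beta_ref)                # one hypothetical reference draw
b_full = project_poisson(X[:, :3], mu_ref)   # drops only the zero coefficient
b_small = project_poisson(X[:, :2], mu_ref)  # drops a relevant covariate
print(b_full, b_small)
```

In a full projection this fit would be repeated for every reference draw (or cluster of draws). When the submodel includes an intercept, the projected means sum to the same total as the reference means, which is a handy sanity check.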

billas · February 16, 2019, 9:35am

I apply a Skellam model using $Y_1$, $Y_2$ as latent variables with corresponding linear predictors (as in Karlis and Ntzoufras). More specifically, my linear predictors are:

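$$\lambda_1 = \exp(\mu + \mathrm{home} + \mathrm{att}[\mathrm{home}] + \mathrm{defense}[\mathrm{away}] + \mathrm{skills\_home})$$
$$\lambda_2 = \exp(\mu + \mathrm{att}[\mathrm{away}] + \mathrm{defense}[\mathrm{home}] + \mathrm{skills\_away})$$

where skills_home is a matrix with 17 covariates for the home team and skills_away a matrix with 17 covariates for the away team.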
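Since the likelihood has to be written by hand for a user-defined model, here is a minimal Python sketch of the Skellam log-likelihood with linear predictors of this form. Assumptions not in the post: each skills matrix enters through its own coefficient vector (the hypothetical `beta_home` and `beta_away`), `home_adv` stands for the home-advantage term written `home` above, and only two covariates are used instead of 17. The log-pmf sums over the latent away count, using that $Y_1$ and $Y_2$ are independent Poisson variables given $\lambda_1, \lambda_2$.

```python
import math


def poisson_logpmf(k, lam):
    """log of the Poisson(lam) probability mass at count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def skellam_logpmf(z, lam1, lam2, n_terms=200):
    """log P(Y1 - Y2 = z) for independent Y1 ~ Poisson(lam1), Y2 ~ Poisson(lam2).

    Sums over the latent away count y2 (truncated after n_terms terms) using
    log-sum-exp for numerical stability.
    """
    start = max(0, -z)  # y1 = z + y2 must be non-negative
    terms = [poisson_logpmf(z + y2, lam1) + poisson_logpmf(y2, lam2)
             for y2 in range(start, start + n_terms)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def linear_predictors(mu, home_adv, att, defense, beta_home, beta_away,
                      h, a, x_home, x_away):
    """Return (log lambda_1, log lambda_2) for one match, home team h vs away team a.

    beta_home and beta_away are hypothetical coefficient vectors for the skills rows.
    """
    eta1 = (mu + home_adv + att[h] + defense[a]
            + sum(b * x for b, x in zip(beta_home, x_home)))
    eta2 = (mu + att[a] + defense[h]
            + sum(b * x for b, x in zip(beta_away, x_away)))
    return eta1, eta2


# Hypothetical parameter values for two teams and two skill covariates.
att = [0.2, -0.1]
defense = [-0.1, 0.1]
eta1, eta2 = linear_predictors(0.1, 0.25, att, defense, [0.05, -0.02], [0.03, 0.01],
                               h=0, a=1, x_home=[1.2, 0.4], x_away=[0.8, -0.3])
lam1, lam2 = math.exp(eta1), math.exp(eta2)
print(skellam_logpmf(1, lam1, lam2))  # log-probability of a one-goal home margin
```

The truncated sum is ample for football-scale rates; a closed form via the modified Bessel function $I_{|z|}$ also exists.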

Furthermore, I compare this model with an ordered logistic model of this form:

$$\operatorname{logit}(\gamma_{ij}) = \mu - (\mathrm{team\_abil}[\mathrm{home}] - \mathrm{team\_abil}[\mathrm{away}] + \mathrm{skills})$$

where skills has 17 features. I would also like to implement variable selection for this model, as for the Skellam.
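Reading $\gamma_{ij}$ as the cumulative probability $P(y_i \le j)$ and $\mu$ as the cutpoint $\mu_j$ (my assumption about the notation), the category probabilities are differences of consecutive cumulative probabilities, the same parameterization as Stan's `ordered_logistic`. A hedged Python sketch with hypothetical values, where `beta` is an assumed coefficient vector for the skills features (two instead of 17) and the three categories are assumed to be away win, draw and home win:

```python
import math


def inv_logit(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities when logit(P(y <= j)) = cutpoints[j] - eta.

    cutpoints must be strictly increasing; K - 1 cutpoints give K categories.
    """
    cum = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical values: home team 0 vs away team 1, two skill covariates.
team_abil = [0.4, -0.2]
beta = [0.3, -0.1]
skills = [0.5, 1.0]
eta = team_abil[0] - team_abil[1] + sum(b * x for b, x in zip(beta, skills))
probs = ordered_logistic_probs(eta, cutpoints=[-0.5, 0.7])
print(probs)  # [P(away win), P(draw), P(home win)]
```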

I have 34 variables, and if I have understood correctly, this stepwise procedure requires $2^{34}$ different models. However, is there any Stan code for stepwise variable selection with the WAIC criterion?

Thank you for your time!

avehtari · February 16, 2019, 10:09am

I’m not familiar with the Skellam distribution, but I read on Wikipedia that $Y_1$, $Y_2$ are independently Poisson distributed (conditionally on $\lambda_j$). So it seems you could modify projpred for your model.

For the ordered logistic model, again the terms are conditionally independent, and you should be able to modify projpred.

Send me an email if you are interested in the projpred option, and I can check whether one of my students could help.

Stepwise selection requires at most $(34^2+34)/2 = 595$ different models, but depending on the stopping rule, usually fewer.
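To check the arithmetic, here is a generic greedy forward search in Python that counts how many candidate submodels it scores. The criterion `score` is a hypothetical user-supplied function (lower is better), and the toy criterion in the example only exists to drive the search; this is a counting aid, not an endorsement of stepwise selection with WAIC.

```python
def forward_search(p, score, max_size=None):
    """Greedy forward selection over covariates 0..p-1.

    score(subset) is a user-supplied criterion where lower is better.
    Returns the selection order and the number of candidate submodels scored.
    """
    selected, remaining, n_scored = [], list(range(p)), 0
    max_size = p if max_size is None else max_size
    while remaining and len(selected) < max_size:
        # Score every one-variable extension of the current model.
        scores = {j: score(selected + [j]) for j in remaining}
        n_scored += len(scores)
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected, n_scored


# Toy criterion (hypothetical): covariates with a smaller index look "better".
order, n_scored = forward_search(34, score=lambda subset: sum(subset))
print(n_scored, 2 ** 34)  # 595 17179869184
```

Without a stopping rule a forward search over $p$ covariates scores $p + (p-1) + \dots + 1 = p(p+1)/2$ candidates, compared with $2^p$ for all subsets; a stopping rule (here `max_size`) reduces the count further.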

No, because stepwise selection cannot be generally recommended (see the paper I mentioned).
