Latent indicator variable for each regression coefficient

Dear Stan community,

I would like to include a latent indicator variable for each independent variable in my regression model. The indicator variables have independent Bernoulli priors whose success probabilities are, say, iid standard uniform random variables. This is one way of doing variable selection: one examines the posterior distribution of each success probability to judge whether the corresponding variable belongs in the model. How can I implement such a model in Stan?

My current implementation, which of course doesn’t work, includes the following sampling statement.

y ~ bernoulli_logit(beta0 + (X .* rep_matrix(indicators, n)) * beta);

Here ‘X’ is the design matrix, ‘indicators’ is the vector of indicator variables, ‘beta’ is the vector of regression coefficients, and ‘n’ is the number of observations. It seems sensible to compute the Hadamard product (X .* rep_matrix(indicators, n)) so that, at each iteration, only the currently selected subset of explanatory variables enters the linear predictor.

The rest of my model is implemented satisfactorily, but I’m stuck on this part. Many thanks in advance for your guidance.

Warmly,

John

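Hey John, those indicators would be latent discrete parameters, which Stan doesn’t allow: its samplers need continuous parameters, so discrete unknowns can’t be declared in the parameters block and would have to be marginalized out instead. For variable selection, I think you could try model stacking with loo and/or implementing R2D2 priors.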
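
In case it helps, here is a minimal sketch of what an R2D2-style shrinkage prior could look like for your logistic regression. Treat it as an illustration under assumptions, not a reference implementation: it uses a normal kernel for the coefficients, fixes the residual scale at 1 (a logit model has no sigma), and assumes the columns of X are standardized. The names a_r2, b_r2, conc, z, phi and tau2 are made up for this example, and the intercept prior is just a placeholder.

```stan
data {
  int<lower=1> n;                       // number of observations
  int<lower=1> p;                       // number of predictors
  matrix[n, p] X;                       // design matrix (assumed standardized)
  array[n] int<lower=0, upper=1> y;     // binary outcome
  real<lower=0> a_r2;                   // hypothetical: first shape of the Beta prior on R2
  real<lower=0> b_r2;                   // hypothetical: second shape of the Beta prior on R2
  real<lower=0> conc;                   // hypothetical: Dirichlet concentration
}
parameters {
  real beta0;                           // intercept
  vector[p] z;                          // unscaled coefficients
  real<lower=0, upper=1> R2;            // proportion of variance explained
  simplex[p] phi;                       // share of explained variance per predictor
}
transformed parameters {
  // Global scale implied by R2; residual scale fixed at 1 (assumption).
  real<lower=0> tau2 = R2 / (1 - R2);
  // Non-centered coefficients: beta[j] ~ normal(0, sqrt(phi[j] * tau2)).
  vector[p] beta = z .* sqrt(phi * tau2);
}
model {
  beta0 ~ normal(0, 2.5);               // placeholder intercept prior
  z ~ std_normal();
  R2 ~ beta(a_r2, b_r2);
  phi ~ dirichlet(rep_vector(conc, p));
  y ~ bernoulli_logit(beta0 + X * beta);
}
```

Every parameter here is continuous, so Stan can sample it. A small posterior phi[j] means predictor j gets little of the explained variance, which plays a role loosely similar to your indicators, although this is shrinkage rather than exact selection.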
