Mix of continuous and categorical predictors

Hi,

How do I model a mix of continuous and categorical predictors in Stan? There are 4 continuous and 8 categorical predictors. In https://stackoverflow.com/questions/29183577/how-to-represent-a-categorical-predictor-rstan dummy variables are suggested, and Ben recommends building a design matrix of predictors. I wonder if a categorical predictor should be a simplex instead. What is the best practice?

Another question is about the choice of distribution. The outcome is 0 or 1, but at each design point the experiment is replicated 10 times: should a bernoulli or a binomial likelihood be used?

Linas

If you want to work with your predictors in a regression framework, I suggest, as Ben did, working with design matrices and using, for instance, dummy coding for the categorical predictors (although the type of coding depends on what you are interested in).
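
Here is a minimal sketch of what that can look like in R; the variable names (x1, color) are illustrative, not from your data:

d <- data.frame(
  x1    = rnorm(6),                        # continuous predictor
  color = factor(c("red", "blue", "red",
                   "blue", "red", "blue")) # categorical predictor
)
X <- model.matrix(~ x1 + color, data = d)  # intercept, x1, and a 0/1 dummy for "red"
X

The resulting matrix (N rows, one column per coefficient) can then be passed to Stan as the predictor matrix; other codings are available via contrasts() if they fit your question better.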

If you don’t expect anything to change across these 10 trials, you can use a binomial model on the aggregated counts, which is faster than a bernoulli model on the individual trials but otherwise equivalent in this case.
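
The equivalence is easy to check numerically; a small sketch in R (the replicate outcomes below are made up for illustration):

y <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)  # outcomes of 10 replicates at one design point
p <- 0.6                              # some success probability
sum(dbinom(y, size = 1, prob = p, log = TRUE))                         # sum of bernoulli log-likelihoods
dbinom(sum(y), size = 10, prob = p, log = TRUE) - lchoose(10, sum(y))  # binomial log-likelihood minus a constant

The two expressions agree for any p, so the posteriors are identical; the binomial version just evaluates fewer likelihood terms.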

Many thanks. Is it OK if, instead of dummies, I assign the predictor values “red” and “blue” to 1 and 2?

The response is binary, 0 or 1. Any suggestions on how to speed up the attached Stan code?
I am still not sure that logistic regression is the way to go. Are there alternative approaches for handling binary responses?

The design matrix x has N rows and Nx columns. The response is a matrix consisting of 0s & 1s with Ny columns.

Thanks for any suggestions.

Linas

ablation.stan (1.68 KB)

I don’t see any good reason why you should prefer 1 & 2 over 0 & 1 for coding your categorical predictors; for a two-level predictor the two codings give the same model, only the intercept shifts.
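
A quick illustration with simulated data (hypothetical, just to show the point):

set.seed(1)
color01 <- rbinom(100, 1, 0.5)                     # 0/1 coding of a two-level predictor
color12 <- color01 + 1                             # the same predictor coded 1/2
y <- rbinom(100, 1, plogis(-0.5 + 1.2 * color01))  # simulated binary response
coef(glm(y ~ color01, family = binomial()))        # reference fit
coef(glm(y ~ color12, family = binomial()))        # same slope, intercept shifted by -slope

With more than two levels, however, a single numeric code imposes an ordering and equal spacing that dummy variables do not.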

If you are an R user and want to do logistic regression with Stan, I recommend the brms or rstanarm packages. For instance:

brms::brm(response ~ predictor1 + predictor2 + …, data = your_data, family = bernoulli())

or

rstanarm::stan_glm(response ~ predictor1 + predictor2 + …, data = your_data, family = binomial())

Many thanks. Can I calculate the posterior predictive distribution in rstanarm? I am familiar with rstan, but maybe it is time to explore…

What I meant by alternatives: is logistic regression the only way to fit binary responses?

Linas

See for instance posterior_predict and pp_check. You can also change the link function of the binomial / bernoulli family. Logistic regression is not the only model, but I would go with it unless you have strong reasons to use something else (something you know works better in your particular situation).
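
A minimal sketch, assuming fit is a model returned by rstanarm::stan_glm() or brms::brm():

yrep <- posterior_predict(fit)   # draws from the posterior predictive distribution
pp_check(fit)                    # graphical posterior predictive check
# a different link function, e.g. probit instead of the default logit:
# rstanarm::stan_glm(response ~ predictor1 + predictor2, data = your_data,
#                    family = binomial(link = "probit"))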

Thanks a lot.

The reason I want to try different models is that I am a bit unsatisfied with the predictive power of logistic regression. I am new to such problems (I was in the continuous domain before), so I value any insight. In my problem I have a binary response vector of length Ny for each design point. What I really want is for the model to predict the number of 0s in the response vector. A good model in my case is one whose predictions match the observed number of 0s well. For some reason I thought that logistic regression would help, but the posterior predictive distribution of the number of 0s didn’t match the observed number of 0s.
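
For reference, this is roughly how I run that comparison with bayesplot (a sketch; fit and y stand for my fitted model and the observed 0/1 response):

library(bayesplot)
yrep <- posterior_predict(fit)                     # draws x observations matrix
ppc_stat(y, yrep, stat = function(v) sum(v == 0))  # observed vs predicted number of 0s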

Any ideas why?

Linas

Maybe your expectations are unrealistic for this data, or maybe your predictors don’t have enough predictive power, or there may be non-linear relationships which you haven’t modeled yet. You might also try something like random forests or similar typical machine learning models, but that’s not an area of my expertise.
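
If you want to stay in the regression framework, one way to allow for non-linear effects is a spline term in brms (a sketch; the predictor names are illustrative):

brms::brm(response ~ s(x1) + x2 + predictor3 + predictor4,
          data = your_data, family = brms::bernoulli())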