How do I model a mix of continuous and categorical predictors in Stan? There are 4 continuous and 8 categorical predictors. In https://stackoverflow.com/questions/29183577/how-to-represent-a-categorical-predictor-rstan dummy variables are suggested. Ben recommends using a design matrix of predictors. I wonder whether a categorical predictor should be a simplex instead. What is the best practice?

Another question is about the choice of distribution. If the outcome is 0 or 1 but the experiment is replicated 10 times at each design point, should bernoulli or binomial be used?

If you want to work with your predictors in a regression framework, I suggest, as Ben did, working with design matrices and using, for instance, dummy coding for the categorical predictors (although the right type of coding depends on what you are interested in).
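To make the dummy-coding suggestion concrete, here is a minimal sketch in plain Python of how a treatment-coded design matrix could be built before passing it to Stan as data. The toy rows, the column names `dose` and `color`, and the helper `design_matrix` are all made up for illustration; the first level of each factor (in sorted order) is taken as the reference level, which is only one of several possible coding schemes.

```python
# Hypothetical toy data: one continuous and one categorical predictor.
rows = [
    {"dose": 0.5, "color": "red"},
    {"dose": 1.2, "color": "blue"},
    {"dose": 0.9, "color": "green"},
]


def design_matrix(rows, continuous, categorical):
    """Build a treatment-coded (dummy-coded) design matrix with an intercept."""
    # The first level of each factor, in sorted order, is the reference level
    # and gets no column of its own.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}

    names = ["(Intercept)"] + list(continuous)
    for c in categorical:
        names += [f"{c}[{lv}]" for lv in levels[c][1:]]

    X = []
    for r in rows:
        x = [1.0] + [float(r[v]) for v in continuous]
        for c in categorical:
            # One 0/1 indicator column per non-reference level.
            x += [1.0 if r[c] == lv else 0.0 for lv in levels[c][1:]]
        X.append(x)
    return names, X


names, X = design_matrix(rows, continuous=["dose"], categorical=["color"])
for name_row in (names, *X):
    print(name_row)
```

A factor with K levels contributes K - 1 columns here, so the coefficients are contrasts against the reference level rather than a simplex.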

If you don't expect anything to change for these 10 trials, you can use a binomial model, which is faster than a bernoulli model for the single trials, but otherwise equivalent in this case.
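The equivalence can be checked numerically. The sketch below, using a made-up vector of 10 replicates at a single design point, compares the summed Bernoulli log-likelihood with the binomial log-likelihood of the success count. The two differ only by the log binomial coefficient, which does not depend on the success probability `p`, so inference about `p` is the same under either model.

```python
import math


def bernoulli_loglik(ys, p):
    """Sum of Bernoulli log-likelihoods over the individual 0/1 trials."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in ys)


def binomial_loglik(k, n, p):
    """Binomial log-likelihood of k successes out of n trials."""
    return (
        math.log(math.comb(n, k))
        + k * math.log(p)
        + (n - k) * math.log(1.0 - p)
    )


# Hypothetical data: 10 replicates at one design point.
ys = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
k, n = sum(ys), len(ys)

# The difference is the same constant, log(choose(n, k)), for every p.
diffs = [binomial_loglik(k, n, p) - bernoulli_loglik(ys, p) for p in (0.2, 0.5, 0.8)]
print(diffs, math.log(math.comb(n, k)))
```

The speed gain comes from evaluating one likelihood term per design point instead of one per trial.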

Many thanks. Is it OK if, instead of dummies, I code the predictor values "red" and "blue" as 1 and 2?

The response is binary: 0 or 1. Any suggestions on how to speed up the attached Stan code?
I am still not sure that logistic regression is the way to go. Are there alternative approaches for handling binary responses?

The design matrix x has N rows and Nx columns. The response is a matrix consisting of 0s & 1s with Ny columns.

See, for instance, posterior_predict and pp_check. You can change the link function of the binomial / bernoulli family. Logistic regression is not the only model, but I would go with it unless you have strong reasons to use something else (something you know is better in a particular situation).
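For a feel of what changing the link function does, here is a small sketch comparing three common inverse links for a binary outcome: logit, probit, and complementary log-log. The Python function names are only illustrative, although Stan provides corresponding built-ins (`inv_logit`, `Phi`, `inv_cloglog`). All three map the linear predictor `eta` to a probability in (0, 1), but they differ in the tails and, for cloglog, in symmetry.

```python
import math


def inv_logit(eta):
    """Inverse logit link: the usual logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log link (asymmetric around eta = 0)."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the implied probabilities over a small grid of linear predictors.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

Note that coefficients are on different scales under different links, so they should be compared through predicted probabilities rather than directly.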

The reason I want to try different models is that I am not quite satisfied with the predictive power of logistic regression. I am new to such problems (I worked in the continuous domain before), so I value any insight. In my problem I have a binary response vector of length Ny for each design point. What I really want is for the model to predict the number of 0s in the response vector. A good model in my case is one that matches the observed number of 0s in the response vector well. For some reason I thought that logistic regression would help, but the posterior predictive distribution of the number of 0s didn't match the observed number of 0s.
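A check of this kind at a single design point can be sketched as follows. The code assumes posterior draws of P(y = 1) are already available from the fitted model. Here they are replaced by placeholder draws from a Beta distribution, and `ny`, `observed_zeros`, and the helper `ppc_zero_counts` are hypothetical as well. For each draw it simulates a response vector of length `ny`, counts the 0s, and compares the replicated counts with the observed one.

```python
import random


def ppc_zero_counts(p_draws, ny, rng):
    """For each posterior draw of P(y = 1), simulate ny outcomes and count the 0s."""
    counts = []
    for p in p_draws:
        # rng.random() < p happens with probability p (a 1), otherwise a 0.
        zeros = sum(1 for _ in range(ny) if rng.random() >= p)
        counts.append(zeros)
    return counts


rng = random.Random(1)

# Placeholder for real posterior draws of P(y = 1) at one design point.
p_draws = [rng.betavariate(7, 4) for _ in range(2000)]

ny = 10              # hypothetical length of the response vector
observed_zeros = 4   # hypothetical observed number of 0s

rep_zeros = ppc_zero_counts(p_draws, ny, rng)

# Share of replicated data sets with at least as many 0s as observed.
tail_prob = sum(z >= observed_zeros for z in rep_zeros) / len(rep_zeros)
print(sum(rep_zeros) / len(rep_zeros), tail_prob)
```

A tail probability near 0 or 1 across many design points would suggest the model is systematically off, whereas values spread through the middle are what a reasonably calibrated model would produce.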

Maybe your expectations are unrealistic for this data, or maybe your predictors do not have enough predictive power, or there may be non-linear relationships that you haven't modeled yet. You might as well try something like random forests or similar typical machine learning models, but that's not an area of my expertise.