Using categorical predictors

nipnipj · June 18, 2020, 4:01pm

Hello!

I’m learning how to use RStan, and I have already done my first multiple linear regression and 2 point predictions. Now I’m trying to go a bit further by using categorical predictors. I transformed the categorical predictor into k-1 binary predictors (where k is the number of categories in that predictor), but a wild noob question appeared: what priors are recommended for binary predictors?
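For reference, the k-1 coding described above (usually called treatment or dummy coding) can be sketched as follows. Python is used here only as a neutral illustration; in R, `model.matrix()` with the default treatment contrasts produces the same dummy columns plus an intercept column. The function name `dummy_code` and the example labels are hypothetical.

```python
def dummy_code(labels, levels=None):
    """Treatment (reference) coding: one 0/1 column per non-reference level.

    The first level is the reference category and is represented by a row
    of all zeros, so k levels produce k - 1 columns.
    """
    if levels is None:
        # Default: sorted unique labels; the first one becomes the reference.
        levels = sorted(set(labels))
    non_reference = levels[1:]
    rows = []
    for label in labels:
        if label not in levels:
            raise ValueError(f"unknown level: {label!r}")
        # 1 in the column matching this label, 0 elsewhere (all 0 for the reference).
        rows.append([1 if label == level else 0 for level in non_reference])
    return non_reference, rows


# Hypothetical predictor with k = 3 categories.
columns, rows = dummy_code(["a", "b", "c", "a"])
print(columns)  # ['b', 'c']
print(rows)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Each coefficient on these columns is then the shift in the response relative to the reference category, which is the quantity the prior has to describe.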

Also, how do I make RStan distinguish binary, integer, and continuous predictors in the same predictor matrix?

FJCC · June 18, 2020, 6:35pm

I will try to answer this in part, but keep in mind that I am a novice with Stan and I may not get everything right.
Setting priors is a big topic and it is hard to cover every situation in a single post. In the context of linear regression with a binary predictor, keep in mind that the prior describes the size of the effect when the category coding goes from zero to one. If you are modeling the effect of a category on the height of an adult human measured in meters, a plausible effect would be much less than 1 in absolute value, so a standard normal prior would not be very informative at all. If you are modeling a categorical effect on adult human weight in kg, a coefficient of 10 might not be out of the question, and a standard normal prior in that case might be very strong.
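One way to make this concrete is to ask how much prior mass a normal(0, sd) prior puts on effects larger than some size. For a zero-mean normal, P(|beta| > b) = erfc(b / (sd * sqrt(2))). The sketch below uses that identity; the function name and the 0.3 m and 10 kg thresholds are hypothetical illustrations, not values from the thread.

```python
import math


def prob_effect_exceeds(bound, prior_sd):
    """P(|beta| > bound) when beta ~ normal(0, prior_sd)."""
    # Two-sided tail of a zero-mean normal, written with the complementary error function.
    return math.erfc(bound / (prior_sd * math.sqrt(2)))


# Height in meters: a standard normal prior puts most of its mass on
# effects larger than 30 cm, so it says very little.
print(round(prob_effect_exceeds(0.3, 1.0), 3))   # 0.764

# Weight in kg: the same prior makes a 10 kg effect essentially impossible.
print(prob_effect_exceeds(10.0, 1.0) < 1e-20)    # True
```

The same normal(0, 1) prior is therefore nearly flat on one scale and almost a hard constraint on another, which is why the units of the response matter.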
Your domain knowledge should guide you towards what is plausible and what is not. You may not know whether a category has an effect but you probably can put some bounds on effects that go beyond surprising into the realm of unbelievable.
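Going the other way, if you can name an effect size that would be unbelievable, you can pick the prior scale so that only a small fraction of prior mass lies beyond it. This is one heuristic among many, sketched under the assumption of a zero-mean normal prior; the function name, the 0.5 m bound, and the 1% tail are hypothetical choices.

```python
from statistics import NormalDist


def prior_sd_for_bound(unbelievable, tail_prob=0.01):
    """Normal(0, sd) scale leaving `tail_prob` prior mass beyond +/- unbelievable."""
    # Two-sided: put half of tail_prob in each tail.
    z = NormalDist().inv_cdf(1 - tail_prob / 2)
    return unbelievable / z


# Hypothetical: a category shifting adult height by more than 0.5 m is unbelievable.
sd = prior_sd_for_bound(0.5)
print(round(sd, 3))  # 0.194
```

The resulting scale would then go into the model as, for example, a normal(0, 0.194) prior on that coefficient.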
I hope that helps.

nipnipj · June 18, 2020, 7:06pm

The prior distribution of the coefficients can be any continuous distribution, right? Of course, these distributions and their bounds should make sense.

But how can I declare in Stan a matrix composed of binary, integer, and continuous variables (column vectors)?

FJCC · June 18, 2020, 7:46pm

I am not sure I understand your problem with the matrix. I think vectors and matrices are always real. Version 2.23 of the Reference Manual says

Vectors and matrices cannot be typed to return integer values. They are restricted to real values.

A column of a matrix may happen to contain integer values but the variable type is real. Are you encountering an error or is this a problem you are expecting?
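In practice this means the mixed columns are simply stacked into one real-valued matrix before being passed to Stan; the coefficient multiplies whatever value is in the column, so no per-column type is needed. The Python sketch below illustrates the idea; `design_matrix` and the example columns are hypothetical. Small integers such as 0/1 or counts are represented exactly as floating-point numbers, so nothing is lost in the conversion.

```python
def design_matrix(*columns):
    """Stack columns into rows of floats, mirroring Stan's real-only matrix type."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    # Row i collects the i-th entry of every column, converted to float.
    return [[float(col[i]) for col in columns] for i in range(n)]


# Hypothetical columns: a 0/1 dummy, an integer count, a continuous measurement.
is_group_b = [0, 1, 1, 0]
n_visits = [3, 0, 7, 2]
age = [34.5, 51.2, 28.0, 45.9]

X = design_matrix(is_group_b, n_visits, age)
print(X[2])  # [1.0, 7.0, 28.0]
```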

nipnipj · June 18, 2020, 8:00pm

No problem at all with the software. However, I was just wondering whether I was doing it correctly, since I’m trying to use 1 categorical predictor (i.e. k-1 binary predictors), 3 continuous predictors, and 1 continuous response.

So, I declare a predictor matrix for these 4 predictor variables and a vector for the 1 continuous response using:

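data {
  int<lower=1> N;   // number of observations
  int<lower=1> K;   // number of predictor columns: (k-1) dummies + 3 continuous
  vector[N] y;      // continuous response
  matrix[N, K] X;   // predictor matrix; every column is stored as real
}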

Will that be enough?

FJCC · June 18, 2020, 8:09pm

Yes, that seems fine. I will admit to a nagging fear that I am forgetting something but your data set up seems completely reasonable.
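For completeness, the data passed to this data block (in RStan, a named list with elements N, K, y, and X) has to agree with the declared sizes: K must equal the number of columns, here (k-1) + 3. The Python sketch below checks those constraints before building the data; `make_stan_data` and the numbers are hypothetical.

```python
def make_stan_data(y, X):
    """Build a dict matching the data block: N, K, y (vector[N]), X (matrix[N, K])."""
    N = len(y)
    if N < 1 or len(X) != N:
        raise ValueError("X must have one row per observation in y")
    K = len(X[0])
    if K < 1 or any(len(row) != K for row in X):
        raise ValueError("every row of X must have the same K >= 1 columns")
    return {
        "N": N,
        "K": K,
        "y": [float(v) for v in y],
        "X": [[float(v) for v in row] for row in X],
    }


# Hypothetical: k = 3 categories -> 2 dummy columns, plus 3 continuous columns.
X = [
    [0, 0, 1.2, -0.3, 0.8],
    [1, 0, 0.4, 1.1, -1.5],
    [0, 1, -0.7, 0.2, 0.3],
]
y = [2.1, 3.4, 1.9]
data = make_stan_data(y, X)
print(data["N"], data["K"])  # 3 5
```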

abartonicek · June 19, 2020, 5:39am

You always have to consider the prior in the context of the response variable/likelihood (see https://arxiv.org/abs/1708.07487). AFAIK, the type of the predictor (continuous/binary) generally matters less, as long as the predictors are roughly on the same scale (i.e. all continuous predictors are scaled/normalized). Still, you generally want the prior to reflect the values the parameter could reasonably be expected to take. For example, in social science research, if you have a scaled continuous response and scaled continuous/binary predictors, it’s rare to see absolute multiple regression coefficients bigger than 0.2-0.3, so a normal(0, 1) prior could be considered weakly informative, in the sense that you’d be surprised by parameter values falling outside the (-2, 2) range.
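The scaling mentioned above usually means standardizing each continuous variable to mean 0 and standard deviation 1, so a coefficient is read as "change in SDs of the response per SD of the predictor". A minimal Python sketch, assuming the 0/1 dummy columns are left unscaled (one common convention, not the only one); `standardize` and the example values are hypothetical.

```python
import statistics


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical continuous predictor; apply the same function to each continuous
# predictor column and to the response, but not to the 0/1 dummy columns.
age = [34.5, 51.2, 28.0, 45.9, 39.4]
age_z = standardize(age)
print(statistics.fmean(age_z), statistics.stdev(age_z))
```

With everything on this scale, a normal(0, 1) prior on every coefficient is comparable across predictors, which is the situation described in the post.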
