Representing categorical predictor variables

pyrena · April 17, 2019, 4:53pm

Suppose I have a regression problem y_{i} \sim N(\alpha + x_{i}\beta, \sigma), where for each sample i I observe some continuous y_{i} and x_{i}. If now each sample i also belongs to a group kk_{i} and I expect \beta to vary by group, am I allowed to write the model like this?

data {
  int N;
  real y[N];
  real x[N];
  int K;
  int<lower=1, upper=K> kk[N];  // group index for each of the N samples
}
 
parameters {
  real alpha;
  real<lower=0> sigma;
  vector[K] beta;
}

model {
  for (i in 1:N)
    y[i] ~ normal(alpha + beta[kk[i]] * x[i], sigma);
}

What confuses me is that most people seem to use a (K-1)-dimensional design matrix when representing a K-dimensional categorical variable, whereas here I have a K-dimensional vector \beta plus the intercept. I’ve tried it with some fake data and it seems to recover the original values, so is there anything obvious I’m missing here (aside from things like computational speed)?

Petulla · April 17, 2019, 6:31pm

The K-1 comes from resolving identifiability issues. There are a few good threads referenced in this issue that might help. I also found this vignette by @rtrangucci helpful.
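
To make the identifiability point concrete, here is a short sketch in the thread's notation. The group offset \gamma_k and the constant c are symbols introduced here for illustration only; they do not appear in the original post. The first line is the case where the K-1 convention is usually needed (a shared intercept plus K group offsets); the second is the model as posted, where the group effect is a slope.

```latex
% Shared intercept plus K group offsets: the mean is unchanged by a shift c
\mu_i = \alpha + \gamma_{kk_i} + x_i \beta
      = (\alpha + c) + (\gamma_{kk_i} - c) + x_i \beta
\quad \text{for any constant } c

% Model as posted: the group effect multiplies x_i
\mu_i = \alpha + x_i \beta_{kk_i}
```

In the first case the likelihood cannot tell (\alpha, \gamma) apart from (\alpha + c, \gamma - c), so one offset is fixed (leaving K-1 free) or the offsets are constrained to sum to zero. In the second case, absorbing a shift c in \alpha would require \beta_k to change by -c / x_i, which is not a single number whenever x varies within a group. That would be consistent with the fake-data fit recovering the true values, provided x is not constant within every group.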

torkar · April 18, 2019, 6:13am

@martinmodrak wrote up a nice piece on non-identifiability a while back that might come in handy:
https://www.martinmodrak.cz/2018/05/14/identifying-non-identifiability/

pyrena · April 18, 2019, 8:48am

Thank you all for your replies! I expected it to have something to do with identifiability, but I find it really hard to see where it comes from. If I imagine simply splitting the dataset by group, I’d also get K different estimates for \beta.
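
For contrast with the split-by-group picture: splitting the data gives each group its own intercept as well as its own slope, with no shared intercept, so nothing is redundant. The redundancy tends to appear only when a shared intercept is combined with K additive group offsets. Below is a minimal sketch of that case using a reference level, assuming the same data layout as the model above. The names `gamma_free` and `gamma` are hypothetical and introduced here only for illustration, and the array declarations follow the older syntax used in this thread (recent Stan releases write `array[N] real y;` instead).

```stan
data {
  int<lower=1> N;
  real y[N];
  real x[N];
  int<lower=2> K;
  int<lower=1, upper=K> kk[N];  // group index for each sample
}

parameters {
  real alpha;                   // intercept of the reference group (group 1)
  real beta;                    // slope shared by all groups
  vector[K - 1] gamma_free;     // offsets of groups 2..K relative to group 1
  real<lower=0> sigma;
}

transformed parameters {
  // Pin the first offset to zero so alpha and gamma cannot trade off.
  vector[K] gamma = append_row(0.0, gamma_free);
}

model {
  for (i in 1:N)
    y[i] ~ normal(alpha + gamma[kk[i]] + beta * x[i], sigma);
}
```

With a free `vector[K] gamma` in place of `gamma_free`, adding a constant to `alpha` and subtracting it from every element of `gamma` would leave the likelihood unchanged, which is the situation the K-1 coding avoids.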

torkar · April 18, 2019, 9:10am

Hi,

perhaps @Max_Mantei’s post will make things more concrete (it did help me):
