I’m running a simple linear model that has individual-specific intercepts and a categorical variable capturing the effect of the day of the week.
data {
int<lower=1> N; // number of observations
int<lower=1> K; // number of weekdays - 1
matrix[N,K] X; // design matrix including weekdays (except from the reference category Tuesday)
vector[N] y; // outcome
int<lower=0> id[N]; // identifier of individuals
int<lower = 1> H; // number of individuals
}
parameters {
real alphai[H];
vector[K] beta;
real<lower=0> sigma;
}
model {
for (n in 1:N){
y[n] ~ normal(alphai[id[n]] + X[n]*beta, sigma_y);
}
// priors
}
I construct the design matrix X using the following code:
X <- model.matrix(outcome ~ weekday, data = da)[,-1]
Thus, I do not have an (additional) intercept in the predictor design matrix, and it looks like this (6 columns with dummary variables for all weekdays except from Tuesday, which is the reference category):
When I now make predictions from the posterior output of the model, the prediction for the reference category (in which I set all predictors included in X to zero) looks strange:
I get the predictions using (ex post in R):
tuesday = c(0,0,0,0,0,0)
y_tuesday = mean(alphai) + tuesday %*% beta
monday = c(0,0,0,1,0,0)
y_monday = mean(alphai) + monday %*% beta
etc.
My questions are:
- Do I need to add a global intercept to the model (capturing the effect of the reference category) to make correct predictions from the model for the reference category? The mean of the individual-specific intercepts may not adequately reflect the effect of the reference category?
- With my model only including the individual-specific intercept alphai, can I somehow get correct predictions for my reference category?
I checked that Tuesdays should not be so much different from other days summarizing the mean of my outcome of interest for different weekdays and running simple non-Bayesian models, so something is indeed wrong with my predictions.
Thanks in advance.