Hi,
I’ve been running a hierarchical multinomial logit model using the Stan code shown below. The model appears to overfit when I compare in-sample and holdout accuracy: for each respondent I hold out the responses to some questions, fit the model on the remaining questions, and score accuracy by how well the model predicts the respondent’s choices. The in-sample accuracies are much higher than the holdout accuracies (e.g. 100% vs. 60%). A sketch of how the holdout questions could be scored inside the program is included after the model code.
Is there anything I can change in the model to reduce the amount of overfitting, i.e. to narrow the gap between the in-sample and holdout accuracies while still maintaining or improving the holdout performance? I have tried tightening the gamma and LKJ priors to reduce the variation in the respondent-level parameters, but this tends to decrease performance. Any helpful feedback would be welcome. Thanks!
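(For reference on those two knobs: with sigma[v] ~ gamma(gamma_alpha, gamma_beta), the prior mean of each scale is gamma_alpha / gamma_beta and the prior sd is sqrt(gamma_alpha) / gamma_beta, so reducing respondent variation amounts to pushing gamma_alpha / gamma_beta toward zero. And with lkj_shape > 1, the LKJ prior concentrates the correlation matrix around the identity, more strongly as the shape grows.)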
data {
  int<lower=2> C; // Number of alternatives (choices) in each question
  int<lower=1> R; // Number of respondents
  int<lower=1> S[R]; // Number of questions per respondent
  int<lower=1> RS; // sum(S): total number of questions across respondents
  int<lower=1> A; // Number of attributes
  int<lower=1> V; // Number of parameters
  int<lower=1> V_attribute[A]; // Number of parameters in each attribute
  int<lower=1,upper=C> Y[RS]; // Observed choices
  matrix[C, V] X[RS]; // Matrix of attributes for each observation
  vector[V] prior_mean; // Prior mean for theta
  vector[V] prior_sd; // Prior sd for theta
  real<lower=1> lkj_shape; // Shape parameter for the LKJ prior
  real<lower=0> gamma_alpha; // Alpha parameter of the gamma prior for sigma
  real<lower=0> gamma_beta; // Beta parameter of the gamma prior for sigma
}
parameters {
  vector<lower=0>[V] sigma; // Scales of the respondent-level deviations
  row_vector[V] theta; // Population-level mean utilities
  cholesky_factor_corr[V] L_omega; // Cholesky factor of the correlation matrix
  matrix[V, R] standard_normal; // Standard-normal draws for the non-centered parameterization
}
transformed parameters {
  matrix[V, V] L_sigma;
  matrix[R, V] beta; // Respondent-level utilities
  L_sigma = diag_pre_multiply(sigma, L_omega); // Cholesky factor of the covariance matrix
  beta = rep_matrix(theta, R) + (L_sigma * standard_normal)'; // Non-centered parameterization
}
model {
  int rs = 1;
  sigma ~ gamma(gamma_alpha, gamma_beta);
  theta ~ normal(prior_mean, prior_sd);
  L_omega ~ lkj_corr_cholesky(lkj_shape);
  to_vector(standard_normal) ~ normal(0, 1); // Implies beta[r] ~ MVN(theta, L_sigma * L_sigma')
  for (r in 1:R) {
    for (s in 1:S[r]) {
      Y[rs] ~ categorical_logit(X[rs] * to_vector(beta[r]));
      rs += 1;
    }
  }
}
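For reference, here is a minimal sketch of what scoring the holdout questions inside the same program might look like. The *_holdout data names and the first-choice (highest-utility) prediction rule are illustrative assumptions on my part, not part of the model above:

data {
  // ... everything in the data block above, plus the held-out questions:
  int<lower=1> RS_holdout; // Total number of held-out questions
  int<lower=1,upper=R> resp_holdout[RS_holdout]; // Respondent for each held-out question
  int<lower=1,upper=C> Y_holdout[RS_holdout]; // Held-out choices
  matrix[C, V] X_holdout[RS_holdout]; // Attribute matrices for the held-out questions
}
generated quantities {
  real holdout_accuracy; // Per-draw fraction of held-out choices predicted correctly
  {
    int hits = 0;
    for (n in 1:RS_holdout) {
      vector[C] utility = X_holdout[n] * to_vector(beta[resp_holdout[n]]);
      // First-choice rule: predict the alternative with the highest utility
      if (sort_indices_desc(utility)[1] == Y_holdout[n])
        hits += 1;
    }
    holdout_accuracy = hits * 1.0 / RS_holdout;
  }
}

Averaging holdout_accuracy over the posterior draws gives the holdout hit rate; the same construction applied to X and Y gives the in-sample figure.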