Overfitting in hierarchical multinomial logit model


#1

Hi,

I’ve been running a hierarchical multinomial logit model using the Stan code shown below. I’m finding that the model tends to overfit when comparing in-sample and holdout accuracies (for each respondent I hold out the responses to some questions and fit the model on the remaining questions; accuracy is measured by how well the model predicts the respondent’s choices). The in-sample accuracies are much larger than the holdout accuracies (e.g. 100% vs 60%).

Is there anything I can change in the model to reduce the amount of overfitting, i.e. the difference between in-sample and holdout accuracies, while still maintaining or improving the holdout performance? I have tried tweaking the gamma and LKJ settings to reduce the amount of respondent parameter variation, but this tends to decrease performance. Any helpful feedback would be welcome. Thanks!
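(For context on what “tweaking the gamma and LKJ settings” means here: each element of sigma has the prior sigma_v ~ gamma(gamma_alpha, gamma_beta), which implies E[sigma_v] = gamma_alpha / gamma_beta and SD[sigma_v] = sqrt(gamma_alpha) / gamma_beta, so shrinking that prior mean pulls the respondent-level variation towards zero; similarly, raising lkj_shape above 1 concentrates the correlation matrix towards the identity.)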

data {
    int<lower=2> C; // Number of alternatives (choices) in each question
    int<lower=1> R; // Number of respondents
    int<lower=1> S[R]; // Number of questions per respondent
    int<lower=1> RS; // sum(S)
    int<lower=1> A; // Number of attributes
    int<lower=1> V; // Number of parameters
    int<lower=1> V_attribute[A]; // Number of parameters in each attribute
    int<lower=1,upper=C> Y[RS]; // choices
    matrix[C, V] X[RS]; // matrix of attributes for each obs
    vector[V] prior_mean; // Prior mean for theta
    vector[V] prior_sd; // Prior sd for theta
    real<lower=1> lkj_shape; // shape parameter for LKJ prior
    real<lower=0> gamma_alpha; // alpha parameter for gamma prior for sigma
    real<lower=0> gamma_beta; // beta parameter for gamma prior for sigma
}

parameters {
    vector<lower=0>[V] sigma;
    row_vector[V] theta;
    cholesky_factor_corr[V] L_omega;
    matrix[V, R] standard_normal;
}

transformed parameters {
    matrix[V, V] L_sigma;
    matrix[R, V] beta;

    L_sigma = diag_pre_multiply(sigma, L_omega);
    beta = rep_matrix(theta, R) + (L_sigma * standard_normal)';
}

model {
    int rs = 1;
    vector[V * R] vector_normal;

    sigma ~ gamma(gamma_alpha, gamma_beta);

    theta ~ normal(prior_mean, prior_sd);
    L_omega ~ lkj_corr_cholesky(lkj_shape);

    vector_normal = to_vector(standard_normal);
    vector_normal ~ normal(0, 1);

    for (r in 1:R)
    {
        for (s in 1:S[r])
        {
            Y[rs] ~ categorical_logit(X[rs] * to_vector(beta[r]));
            rs += 1;
        }
    }
}
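To make the accuracy calculation concrete, here is a rough sketch (not part of the fitted code above) of a generated quantities block that produces the modal predicted choice for each in-sample question; the holdout accuracy is computed the same way, just with the held-out questions’ X matrices and choices swapped in:

generated quantities {
    // Modal predicted choice for each question, used for the hit rate
    int<lower=1,upper=C> Y_pred[RS];
    {
        int rs = 1;
        for (r in 1:R) {
            for (s in 1:S[r]) {
                // utilities of the C alternatives for this question
                vector[C] utility = X[rs] * to_vector(beta[r]);
                Y_pred[rs] = 1;
                for (c in 2:C)
                    if (utility[c] > utility[Y_pred[rs]])
                        Y_pred[rs] = c;
                rs += 1;
            }
        }
    }
}

The in-sample hit rate is then just the proportion of questions where Y_pred matches Y, averaged over posterior draws.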

#2

I’ve done some previous work with multilevel, multinomial models in Stan.

It would be helpful to learn more about the research and data structure. It sounds like you have data on respondents who make multiple choices throughout the study. Can you tell us more?


#3

You could add a random intercept for each observation.
Second, you could add a noise nugget to your covariance matrix:
\Sigma would become \Sigma + \sigma I,
with I the identity matrix. No diagonal update of the Cholesky factor
is known, so you would need to form the full covariance matrix and take
the Cholesky decomposition of the sum.
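In the model above, the nugget version would look roughly like this (an untested sketch; tau is a new nugget scale that would also need its own prior in the model block):

parameters {
    // sigma, theta, L_omega, standard_normal as in the original model
    real<lower=0> tau; // nugget scale (new)
}

transformed parameters {
    matrix[V, V] Sigma;   // full covariance is needed because the nugget
    matrix[V, V] L_sigma; // cannot be added to the Cholesky factor directly
    matrix[R, V] beta;

    Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(sigma, L_omega))
            + diag_matrix(rep_vector(tau, V));
    L_sigma = cholesky_decompose(Sigma);
    beta = rep_matrix(theta, R) + (L_sigma * standard_normal)';
}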


#4

Yes, the data come from a conjoint analysis in which survey respondents each answer several questions, choosing between several alternatives with different attributes in each question. The matrix beta contains the parameters (utilities) for each respondent * attribute, X contains the attribute levels for each question, and Y contains the choices made by the respondents.


#5

You might benefit from this thread (though it doesn’t directly relate to your question about tuning parameters):


#6

Thanks, I’ll have a read!


#7

Thanks Andre, I’ll look into your suggestions!