Addition of seemingly-innocuous predictor variables ruins sampling

I have a model that samples well, but when I add a few additional predictors (marked in the code below), sampling tanks. These are basically dummy variables coded 1 = yes, 2 = no, and, for smk and etoh, 3 = former (smoker or alcoholic). What has happened, and how can I fix it?

Update: thinking about this more, four of these (dm = diabetes, htn = high blood pressure, hpl = high cholesterol, cad = heart disease) are probably highly correlated. Could this disrupt sampling?

Many thanks!

data{
    int rows;
// start of seemingly-innocuous predictor variables
    int dm[rows];
    int htn[rows];
    int hpl[rows];
    int cad[rows];
    int smk[rows];
    int etoh[rows];
// end of seemingly-innocuous predictor variables
    int metSyn_alpha[4];
    int metSyn[rows];
    vector[rows] bmi;
    int rupt[rows];
    vector[rows] mean_lnInc;
    int ins[rows];
    int FU[rows];
    int mRS_preop[rows];
    int gcs[rows];
    int stroke_tia[rows];
    int prev[rows];
    int age2[rows];
    int age[rows];
    int mRS_discharge[rows];
    int comp[rows];
    int migraine[rows];
    vector[rows] time;
    int sex[rows];
    int dist[rows];
    vector[rows] sd_lnInc;
    vector[rows] sem_lnInc;
    int rowid[rows];
    int insXrupt[rows];
    real grandSEM_lnInc;
    real grandMean_lnInc;
    vector[6] mRS_discharge_alpha;
    vector[12] gcs_alpha;
    vector[6] mRS_preop_alpha;
}
parameters{
    vector[4] a_ins;
    vector[rows] nu;
    vector[rows] stdN1;
    vector[rows] stdN2;
    vector[4] b_lnInc;
    real b_dist;
    vector[2] a_sex;
    real b_time;
    vector[2] a_migraine;
    vector[3] a_comp;
    real b_mRS_discharge;
    simplex[6] DELTA_mRS_discharge;
    real b_age;
    real b_age2;
    vector[2] a_prev;
    vector[2] a_stroke_tia;
    real b_gcs;
    simplex[12] DELTA_gcs;
    real b_mRS_preop;
    simplex[6] DELTA_mRS_preop;
// start of seemingly-innocuous predictor variables
    vector[2] a_dm;
    vector[2] a_htn;
    vector[2] a_hpl;
    vector[2] a_cad;
    vector[3] a_etoh;
    vector[3] a_smk;
// end of seemingly-innocuous predictor variables
    real<lower=0> phi;
}
model{
    vector[rows] lambda;
    vector[7] delta_mRS_discharge;
    vector[13] delta_gcs;
    vector[7] delta_mRS_preop;
    phi ~ exponential( 2 );
    DELTA_mRS_preop ~ dirichlet( mRS_preop_alpha );
    delta_mRS_preop = append_row(0, DELTA_mRS_preop);
    b_mRS_preop ~ normal( 0 , 1 );
    DELTA_gcs ~ dirichlet( gcs_alpha );
    delta_gcs = append_row(0, DELTA_gcs);
    b_gcs ~ normal( 0 , 1 );
    a_stroke_tia ~ normal( 0 , 1 );
    a_prev ~ normal( 0 , 1 );
    b_age2 ~ normal( 0 , 1 );
    b_age ~ normal( 0 , 1 );
    DELTA_mRS_discharge ~ dirichlet( mRS_discharge_alpha );
    delta_mRS_discharge = append_row(0, DELTA_mRS_discharge);
    b_mRS_discharge ~ normal( 0 , 1 );
    a_comp ~ normal( 0 , 1 );
    a_migraine ~ normal( 0 , 1 );
    b_time ~ normal( 1 , 1 );
    a_sex ~ normal( 0 , 1 );
    b_dist ~ normal( 0 , 1 );
    b_lnInc ~ normal( 0 , 1 );
    stdN2 ~ normal( 0 , 1 );
    stdN1 ~ normal( 0 , 1 );
    nu ~ normal( grandMean_lnInc , grandSEM_lnInc );
    a_ins ~ normal( 0 , 1 );
    for ( i in 1:rows ) {
        lambda[i] = a_ins[insXrupt[i]] + 
                    b_lnInc[insXrupt[i]] * ((nu[rowid[i]] + stdN2[rowid[i]] * sem_lnInc[i]) + stdN1[rowid[i]] * sd_lnInc[i]) + 
                    b_dist * dist[i] + 
                    a_sex[sex[i]] + 
                    b_time * time[i] + 
                    a_migraine[migraine[i]] + 
                    a_comp[comp[i]] + 
                    b_mRS_discharge * sum(delta_mRS_discharge[1:mRS_discharge[i]]) + 
                    b_age * age[i] + b_age2 * age2[i] + 
                    a_prev[prev[i]] + 
                    a_stroke_tia[stroke_tia[i]] + 
                    b_gcs * sum(delta_gcs[1:gcs[i]]) + 
                    b_mRS_preop * sum(delta_mRS_preop[1:mRS_preop[i]]) + 
// start of seemingly-innocuous predictor variables
                    a_dm[dm[i]] + 
                    a_htn[htn[i]] + 
                    a_hpl[hpl[i]] + 
                    a_cad[cad[i]] + 
                    a_etoh[etoh[i]] + 
                    a_smk[smk[i]];
// end of seemingly-innocuous predictor variables
        lambda[i] = exp(lambda[i]);
    }
    FU ~ neg_binomial_2( lambda , phi );
}
generated quantities{
    vector[rows] lnInc;
    vector[rows] mu;
    vector[rows] log_lik;
    vector[rows] lambda;
    vector[7] delta_mRS_discharge;
    vector[13] delta_gcs;
    vector[7] delta_mRS_preop;
    delta_mRS_preop = append_row(0, DELTA_mRS_preop);
    delta_gcs = append_row(0, DELTA_gcs);
    delta_mRS_discharge = append_row(0, DELTA_mRS_discharge);
    
    for(i in 1:rows) lnInc[i] = ((nu[rowid[i]] + stdN2[rowid[i]] * sem_lnInc[i]) + stdN1[rowid[i]] * sd_lnInc[i]);
    
    for(i in 1:rows) mu[i] = (nu[rowid[i]] + stdN2[rowid[i]] * sem_lnInc[i]);
      

    for ( i in 1:rows ) {
        lambda[i] = a_ins[insXrupt[i]] + 
                    b_lnInc[insXrupt[i]] * ((nu[rowid[i]] + stdN2[rowid[i]] * sem_lnInc[i]) + stdN1[rowid[i]] * sd_lnInc[i]) + 
                    b_dist * dist[i] + 
                    a_sex[sex[i]] + 
                    b_time * time[i] + 
                    a_migraine[migraine[i]] + 
                    a_comp[comp[i]] + 
                    b_mRS_discharge * sum(delta_mRS_discharge[1:mRS_discharge[i]]) + 
                    b_age * age[i] + b_age2 * age2[i] + 
                    a_prev[prev[i]] + 
                    a_stroke_tia[stroke_tia[i]] + 
                    b_gcs * sum(delta_gcs[1:gcs[i]]) + 
                    b_mRS_preop * sum(delta_mRS_preop[1:mRS_preop[i]]) + 
                    a_dm[dm[i]] + 
                    a_htn[htn[i]] + 
                    a_hpl[hpl[i]] + 
                    a_cad[cad[i]] + 
                    a_etoh[etoh[i]] + 
                    a_smk[smk[i]];
        lambda[i] = exp(lambda[i]);
    }

    for ( i in 1:rows ) log_lik[i] = neg_binomial_2_lpmf( FU[i] | lambda[i] , phi );
}

One possible thing stands out to me: usually k-level categorical predictors are coded as k - 1 dummy variables to avoid multicollinearity with the intercept term, though with sufficient regularization or strong priors the one-hot coding should work.
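
For concreteness, a minimal sketch of the k - 1 coding for one of the two-level predictors. The 0/1 recoding and the name dm01 are assumptions about the data prep, not anything in the original model:

data {
    int<lower=1> rows;
    int<lower=0, upper=1> dm01[rows];   // recoded: 0 = no, 1 = yes
}
parameters {
    real a;       // shared intercept (absorbs the dm01 == 0 baseline)
    real b_dm;    // shift for dm01 == 1, relative to that baseline
}
model {
    a ~ normal( 0 , 1 );
    b_dm ~ normal( 0 , 1 );
    // in the linear predictor, a_dm[dm[i]] becomes b_dm * dm01[i]
}

The three-level predictors (smk, etoh) would get two 0/1 indicators each, again with one level absorbed into the intercept.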

Can you be more specific about what happens? Negative binomials are very hard to fit: under the neg_binomial_2 parameterization the variance is lambda + lambda^2 / phi, so the model can explain the data either by shifting the mean or by increasing the overdispersion (driving phi toward zero).

Again, not sure what you mean by “disrupt” here. Correlations will make sampling slower, for sure, but it should still work.

Your “innocuous” predictors should probably be declared with bounds, since presumably they take on finitely many values for which you have random effects (see the sketch below). The much more likely problem is what @js592 points out: using K variables instead of K - 1 leads to greater identifiability problems.
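
If you keep the 1/2(/3) integer coding, the bounded declarations would look something like this (data block lines only, as a sketch):

data {
    // ... other declarations unchanged ...
    int<lower=1, upper=2> dm[rows];     // 1 = yes, 2 = no
    int<lower=1, upper=2> htn[rows];
    int<lower=1, upper=2> hpl[rows];
    int<lower=1, upper=2> cad[rows];
    int<lower=1, upper=3> smk[rows];    // 3 = former
    int<lower=1, upper=3> etoh[rows];   // 3 = former
}

Stan checks the bounds when the data are read in, so a mis-coded value is caught at data load rather than surfacing as an indexing error mid-sampling.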

I don’t see any priors for those new effects, and if two of them really are nearly perfectly correlated, that’ll cause huge problems by introducing a non-identifiable ridge into the posterior.
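
For reference, a minimal sketch of the missing priors, written to mirror the normal( 0 , 1 ) priors the model block already uses for the other effects:

    // add to the model block: priors for the new categorical effects,
    // which are currently left with improper flat priors
    a_dm ~ normal( 0 , 1 );
    a_htn ~ normal( 0 , 1 );
    a_hpl ~ normal( 0 , 1 );
    a_cad ~ normal( 0 , 1 );
    a_etoh ~ normal( 0 , 1 );
    a_smk ~ normal( 0 , 1 );

Proper priors bound the ridge but don't remove it; dropping or combining the redundant predictors (or switching to k - 1 coding) addresses the identifiability directly.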


This was the problem.
