Exponentiated predictor variable leads to poor sampling in neg-binomial GLM

Nicklaus_Millican · August 19, 2022, 10:16pm

[edit: escape Stan code]

data{
    int smk[626];
    int metSyn_alpha[3];
    int metSyn[626];
    int etoh[626];
    int cad[626];
    vector[626] bmi;
    int stroke_tia[626];
    int prev[626];
    int mRS_preop_alpha[6];
    int mRS_preop[626];
    int gcs_alpha[12];
    int gcs[626];
    int age2[626];
    int age[626];
    vector[626] time;
    int sex[626];
    int rupt[626];
    int migraine[626];
    int mRS_discharge_alpha[6];
    int mRS_discharge[626];
    int insXrupt[626];
    int dist[626];
    int comp[626];
    int FU[626];
    int ins[626];
    vector[626] sd_lnInc;
    vector[626] sem_lnInc;
    real grandSEM_lnInc;
    real grandMean_lnInc;
    vector[626] mean_lnInc;
}
parameters{
    vector[2] a_ins;
    real lnInc;
    real mu;
    vector[2] b_inc;
    real<lower=0> phi;
}
model{
    vector[626] lambda;
    phi ~ exponential( 3 );
    b_inc ~ normal( 0 , 1 );
    mean_lnInc ~ normal( grandMean_lnInc , grandSEM_lnInc );
    mu ~ normal( mean_lnInc , sem_lnInc );
    lnInc ~ normal( mu , sd_lnInc );
    a_ins ~ normal( 0 , 1 );
    for ( i in 1:626 ) {
        lambda[i] = a_ins[ins[i]] + b_inc[ins[i]] * (lnInc);
        lambda[i] = exp(lambda[i]);
    }
    FU ~ neg_binomial_2( lambda , phi );
}
generated quantities{
    vector[626] log_lik;
    vector[626] lambda;
    for ( i in 1:626 ) {
        lambda[i] = a_ins[ins[i]] + b_inc[ins[i]] * (lnInc);
        lambda[i] = exp(lambda[i]);
    }
    for ( i in 1:626 ) log_lik[i] = neg_binomial_2_lpmf( FU[i] | lambda[i] , phi );
}

Okay, so the variable lnInc is log income. But I want this to be in income. However, when using either
b_inc[ins[i]]*exp(lnInc) or lnInc ~ lognormal, I get treedepth, BFMI, and/or divergent warnings and very bad sampling. I have also tried non-centering reparameterization, but to no avail.

Bob_Carpenter · August 22, 2022, 6:56pm

But does it work OK if you use log income?

If lnInc is log income but you’d rather parameterize in terms of income, then you need to declare the linear form with a lower bound constraint and then use it in the lognormal:

parameters {
  real<lower=0> inc;
}
model {
  inc ~ lognormal(mu, sd_lnInc);
}

That should give you the same result as what you wrote. Using the lnInc parameterization is a bit more efficient and a bit more robust, as it’s not going back and forth multiple times.

I don’t understand where you’d have something like a non-centered parameterization as you don’t have a hierarchical model (though a_ins and b_inc are two-dimensional, that’s not enough to fit a hierarchical model).

As a side note, this code

can be vectorized and converted to the log scale with this:

FU ~ neg_binomial_2_log(a_ins[ins] + b_inc[ins] .* lnInc, phi);

Nicklaus_Millican · August 22, 2022, 8:23pm

Great! Thank you, this worked well

Topic		Replies	Views
Pooled Nonlinear Regression Modeling specification	12	571	November 9, 2020
Efficient implementation of the for loop? Modeling rstan , fitting-issues	1	536	February 18, 2022
Multiple logistic regression Modeling rstan	1	501	January 18, 2022
Pathologies with respect to lp__ and energy__ Modeling fitting-issues	3	724	August 11, 2020
Compiling error help (beginner) Modeling rstan , fitting-issues	1	315	January 21, 2021

Exponentiated predictor variable leads to poor sampling in neg-binomial GLM

Related Topics