Exponentiated predictor variable leads to poor sampling in neg-binomial GLM

[edit: escape Stan code]

data{
    int smk[626];
    int metSyn_alpha[3];
    int metSyn[626];
    int etoh[626];
    int cad[626];
    vector[626] bmi;
    int stroke_tia[626];
    int prev[626];
    int mRS_preop_alpha[6];
    int mRS_preop[626];
    int gcs_alpha[12];
    int gcs[626];
    int age2[626];
    int age[626];
    vector[626] time;
    int sex[626];
    int rupt[626];
    int migraine[626];
    int mRS_discharge_alpha[6];
    int mRS_discharge[626];
    int insXrupt[626];
    int dist[626];
    int comp[626];
    int FU[626];
    int ins[626];
    vector[626] sd_lnInc;
    vector[626] sem_lnInc;
    real grandSEM_lnInc;
    real grandMean_lnInc;
    vector[626] mean_lnInc;
}
parameters{
    vector[2] a_ins;
    real lnInc;
    real mu;
    vector[2] b_inc;
    real<lower=0> phi;
}
model{
    vector[626] lambda;
    phi ~ exponential( 3 );
    b_inc ~ normal( 0 , 1 );
    mean_lnInc ~ normal( grandMean_lnInc , grandSEM_lnInc );
    mu ~ normal( mean_lnInc , sem_lnInc );
    lnInc ~ normal( mu , sd_lnInc );
    a_ins ~ normal( 0 , 1 );
    for ( i in 1:626 ) {
        lambda[i] = a_ins[ins[i]] + b_inc[ins[i]] * (lnInc);
        lambda[i] = exp(lambda[i]);
    }
    FU ~ neg_binomial_2( lambda , phi );
}
generated quantities{
    vector[626] log_lik;
    vector[626] lambda;
    for ( i in 1:626 ) {
        lambda[i] = a_ins[ins[i]] + b_inc[ins[i]] * (lnInc);
        lambda[i] = exp(lambda[i]);
    }
    for ( i in 1:626 ) log_lik[i] = neg_binomial_2_lpmf( FU[i] | lambda[i] , phi );
}

Okay, so the variable lnInc is log income. But I want this to be in income. However, when using either
b_inc[ins[i]]*exp(lnInc) or lnInc ~ lognormal, I get treedepth, BFMI, and/or divergent warnings and very bad sampling. I have also tried non-centering reparameterization, but to no avail.

But does it work OK if you use log income?

If lnInc is log income but you’d rather parameterize in terms of income, then you need to declare the linear form with a lower bound constraint and then use it in the lognormal:

parameters {
  real<lower=0> inc;
}
model {
  inc ~ lognormal(mu, sd_lnInc);
}

That should give you the same result as what you wrote. Using the lnInc parameterization is a bit more efficient and a bit more robust, as it’s not going back and forth multiple times.

I don’t understand where you’d have something like a non-centered parameterization as you don’t have a hierarchical model (though a_ins and b_inc are two-dimensional, that’s not enough to fit a hierarchical model).

As a side note, this code

can be vectorized and converted to the log scale with this:

FU ~ neg_binomial_2_log(a_ins[ins] + b_inc[ins] .* lnInc, phi);

Great! Thank you, this worked well