Nonlinear formulation changes parameter estimates of logistic regression

Hi,

I am analyzing survival of individuals (binary response) in relation to three predictors, two of which have to raised to an exponent. I had previously posted about this nonlinear model here. This is the model I set up:

fit_1 = brm(
  bf(status ~ b0 + b1 * var1^c1 + b2 * var2^c2 + b3*var3,
     b0 ~ 1 + (1|g)  
     b1 ~ 1 + (1|g), 
     b2 ~ 1 + (1|g), 
     c1 ~ 1 , 
     c2 ~ 1 ,
     nl = TRUE)

While working through models of different complexity, I also set up this model:

fit_2 = brm(
  bf(status ~ b0 + b1 * var1 + b2 * var2 + b3*var3,
     b0 ~ 1 + (1|g)  
     b1 ~ 1 + (1|g), 
     b2 ~ 1 + (1|g), 
     b3 ~ 1 + (1|g), 
     nl = TRUE)

Now, fit_2 should be same as this:

fit_3 = brm(status ~ var1+ var2 + var3 + 
                  (1+ var1+ var2 + var3 | g),
                family=bernoulli, data = data,
                chains = 4, iter = 2000, warmup = 1000, cores = 4, thin = 2)

However, fit_2 and fit_3 give very different parameter estimates, which I found surprising. For example:

> fixef(fit_1)
                Estimate Est.Error        Q2.5     Q97.5
b0_Intercept  0.55315545 0.2764453  0.01718413 1.1042612
b1_Intercept  0.06040814 0.0726198 -0.08236811 0.2103593
b2_Intercept -0.13928427 0.2540105 -0.64212694 0.3650797
b3_Intercept  0.80389800 0.2513916  0.30919201 1.3099177
c1_Intercept -1.63763565 0.9386900 -3.55646061 0.2719193
c2_Intercept -0.99848071 2.2785871 -5.83316607 2.8979777

> fixef(fit_2)
                Estimate  Est.Error        Q2.5     Q97.5
b0_Intercept  0.51183385 0.28255293 -0.03478977 1.0704994
b1_Intercept  0.04841162 0.07470821 -0.09886431 0.1956527
b2_Intercept -0.03542412 0.24820811 -0.51505618 0.4612422
b3_Intercept  0.84898955 0.23663168  0.38540498 1.3049237

> fixef(fit_3)
            Estimate  Est.Error         Q2.5     Q97.5
Intercept -0.6652988 0.60257372  -1.87421760 0.4653411
var1      -6.3638288 3.90875670 -15.13751383 0.5118952
var2       0.7459177 0.99294856  -1.21309907 2.7805406
var3       0.1471336 0.06510745   0.03026837 0.2862436

Why are the estimates for 2 and 3 so different? All models had converged, with decent behavior in PP checks. However, the nonlinear model was quite sensitive to prior specification. It looks like the nonlinear formulation in general is doing something different. I checked the stan code for the two models and there seems to be no difference except in the generated quantities, but I cannot figure out how or why that should create such different estimates.

I would greatly appreciate help with this issue.

Thanks,
Meghna

Side question: I was also wondering whether linear and nonlinear formulations of the same data can be compared using LOO-IC?

I’m not sure everything else is equivalent, but a clear difference is that fit_3 has a multivariate normal for coefficients of (1, var1, var2, var3) with full covariance, while fit_2 has separate normals for each four (or one multivariate with diagonal covariance, if you wish). That’s part of the semantics of putting many things related to the same grouping into the same term in formula as you have in fit_3.

This could be enough if the model is very sensitive to priors to start with. The difference is quite large. Obviously neither models is really useful with their current data.

Hi. Thanks for your input. To try whether MV normal made a difference to the estimate, I set up fit_3 such that each parameter was drawn from univariate distributions, like so:

make_stancode(status ~ var1+ var2 + var3 +
                      (1+var1|x|sp) + (0+var2|x|sp) + (0+var3|x|sp),
              family=bernoulli, data = bci.se.nosp$data)

The estimates remain the same as for the formulation with multivariate distribution. So, it would appear that the difference between setting up [what I think] is the same model using a nonlinear syntax and GLMM syntax in brms is not due to the specification of the distributions.

1 Like

Hi again…bumping this up as I have not been able to solve the problem.

@paul.buerkner : would you have any inputs?

Thanks,
Meghna

Can you provide the full code that went into brm() also for fit_1 and fit_2? I am asking because in non-linear syntax you need to specify priors and I don’t see what you used from your description.

Hi Paul,

Thanks for taking the time to respond. Please excuse my delayed response - I was trying to fix a few other matters in the model before writing back.

Below are the priors I used. A note about the priors: in a previous thread I had mentioned that the nonlinear model(s) had divergences, which seemed sensitive to prior specification. So, I had to set fairly tight priors for NLM based on previous analyses, but the GLMM had diffuse priors.

> fit_2$prior
                  prior class      coef           group resp dpar nlpar bound       source
                 (flat)     b                                        b0            default
         normal(1, 0.3)     b Intercept                              b0               user
                 (flat)     b                                        b1            default
        normal(0, 0.25)     b Intercept                              b1               user
                 (flat)     b                                        b2            default
        normal(0, 0.25)     b Intercept                              b2               user
                 (flat)     b                                        b3            default
        normal(1, 0.25)     b Intercept                              b3               user
 lkj_corr_cholesky(0.8)     L                                                         user
 lkj_corr_cholesky(0.8)     L                        g                       (vectorized)
   student_t(3, 0, 2.5)    sd                                       b0            default
             gamma(5,2)    sd                                       b1               user
             gamma(5,2)    sd                                       b2               user
             gamma(3,1)    sd                                       b3               user
             gamma(5,2)    sd                        g              b0               user
             gamma(5,2)    sd Intercept              g              b0       (vectorized)
             gamma(5,2)    sd                        g              b1       (vectorized)
             gamma(5,2)    sd Intercept              g              b1       (vectorized)
             gamma(5,2)    sd                        g              b2       (vectorized)
             gamma(5,2)    sd Intercept              g              b2       (vectorized)
             gamma(3,1)    sd                        g              b3       (vectorized)
             gamma(3,1)    sd Intercept              g              b3       (vectorized)

This is the prior for GLMM:

> fit_3$prior
                prior     class      coef   group resp dpar nlpar bound       source
               (flat)         b                                              default
               (flat)         b   var1                               (vectorized)
               (flat)         b   var2                              (vectorized)
               (flat)         b   var3                               (vectorized)
 student_t(3, 0, 2.5) Intercept                                              default
 lkj_corr_cholesky(1)         L                                              default
 lkj_corr_cholesky(1)         L                g                       (vectorized)
 student_t(3, 0, 2.5)        sd                                              default
 student_t(3, 0, 2.5)        sd                g                       (vectorized)
 student_t(3, 0, 2.5)        sd   var1         g                       (vectorized)
 student_t(3, 0, 2.5)        sd   var2         g                       (vectorized)
 student_t(3, 0, 2.5)        sd   var3         g                       (vectorized)
 student_t(3, 0, 2.5)        sd Intercept      g                       (vectorized)
Best wishes,
Meghna

@paul.buerkner : do you have any inputs on what might be happening with the NLM vs GLMM? I shared the priors in the last message - please let me know if you need more information.

  • Meghna

Would it be possible that these difference in priors explain the differences in results?

What happens if you also use diffuse priors for the non-linear model and check the model results? Yes, they may be biased by divergent transitions, but at least we may get an indication if the currently used informative priors on the parameters of the non-linear modelswhere perhaps quite influencial for the posterior.

One reason why the one model had divergences while the other has not is the linear model formulation applies automatic centering of the intercept (which is not visible in the results as it is corrected for already in the transformed parameters block of Stan), that can often both increase sampling efficiency and reduce overall sampling problems.

Hi Paul,

Thanks for the response. I have run the NLM with more diffuse priors and the parameter estimates do not change much. I thought that the NLM also centered the intercept - guess I was wrong there. Is there a way to specify centering for NLM?

Alternatively, I can use the code generated for the GLMM and tweak it into a NL form, and then run the models in Stan. Might that work?

Any other suggestions to deal with the problem?

In general, when we have such cases, how do we know which model is reliable? I have run necessary PP checks, including simulation based checks using package SBC. The NLM does not throw out any major issues.

  • Meghna

you can enfore centering via the center argument of bf() and lf().

Your idea of teaking the Stan code sounds sensible.

For me to really understand what is going on there, I would need a fully reproducible example, that is, code I can take and just run on my own machine, ideally waiting only shortly for the models to finish running.

Let me try using the centering and get back to you in a couple of days. These models take super long, even using a reasonable subset of the data that still captures the expected variability in response.

If that doesn’t work, I will try to generate some data and share with you.

Thanks for taking the time!