Stan/brms fit is worse than simple linear mixed model fit in terms of RSS, what could be the reasons?

I am using the R package brms as an interface to Stan, and I assume that I should get a similar fit from Stan to the one from the lme4 package, given not-too-tight priors and similar model structures. I am asking the question here because I am not sure whether the "nonlinear" way of constructing the linear model in brms could affect the fit dramatically.

The model basically assumes a linear relationship between y (log of drift deposition) and x (log of distance) for each trial. One trial can have multiple lines along x (distance). The project is related to pesticide drift deposition for regulatory use.

The model structure can be written as follows:

y \sim b1 + b2 * x + \epsilon

Intercept: b1 \sim 1 + Speed + Pressure + Temp + Wind speed + Boom height + \epsilon_{Trial} + \epsilon_{Line}

Slope: b2 \sim 1 + Speed + Pressure + Temp + Wind speed + \epsilon_{Trial}

Although not exactly the same, the above structure can be fitted directly as a linear mixed model: substituting b1 and b2 into the first equation yields the covariate main effects plus x-by-covariate interactions. The R code (for a reduced model without random slopes) would be:

lmeRes <- lmer(y ~ x + Speed + Pressure + Temp_1 + Windspeed + Boom +
                 x:Speed + x:Temp_1 + x:Windspeed + (1 | TrialG/LineG),
               data = dat)

The code I used in R with brms is below. It is specified in more of a nonlinear-model way, with the coefficients of the linear model being fitted by a second-level linear model.

priors1 <- c(set_prior("student_t(5, 0, 2)", nlpar = "b1"),
             set_prior("student_t(5, 0, 2)", nlpar = "b2"),
             set_prior("cauchy(0, 1)", class = "sd", nlpar = "b1", group = "TrialG"),
             # set_prior("cauchy(0, 1)", class = "sd", nlpar = "b1", group = "LineG"),
             set_prior("cauchy(0, 1)", class = "sd", nlpar = "b2", group = "TrialG"))

fit <- brm(bf(y ~ b1 + b2 * x,
              b1 ~ 1 + Speed + Pressure + Temp_1 + Windspeed + Boom + Crop.Height + (1 | TrialG/LineG),
              b2 ~ 1 + Speed + Pressure + Temp_1 + Windspeed + Boom + Crop.Height + (1 | TrialG),
              nl = TRUE),
           data = dat, prior = priors1,
           iter = 2000, chains = 3, # warmup = 1000,
           control = list(adapt_delta = 0.88, max_treedepth = maxTree), cores = 4,
           save_model = paste0(filename, ".txt"))

An example part of the data looks like this:

          y          x Speed Pressure Temp_1 Windspeed Boom TrialG LineG TrialG:LineG Using
1 -3.329807 -0.6931472     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE
2 -4.816369  0.0000000     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE
3 -4.852973  0.6931472     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE
4 -5.008920  1.0986123     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE
5 -5.614826  1.6094379     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE
6 -6.835159  2.3025851     8      300  23.84  3.144205  0.5      1     1          1_1 FALSE

I cannot provide the full data here without permission, but I can ask for permission if necessary.

Thank you for any hints and suggestions.

Also, I am not sure if this is a Modeling topic or just a question about the brms interface.

There is no need to use the non-linear syntax of brms here. You can directly use the lme4 syntax in brms as well.

Hi Paul, thank you for the quick reply.

I know that is true. I thought that with the nonlinear structure (adding random slopes for each trial), the model structure would be clearer (even if more complex) to non-statisticians and easier to explain, also in the summary tables, plots, and so on.

Anyway, I will try to use the lme4 syntax to refit the model, so that I know whether the difference is caused by the nonlinear specification.

I just would like to make sure. Also, what do you mean by “you would expect similar fit”?


I assume I will get similar fit performance. If the data really are Gaussian distributed, I would even get similar parameter estimates.

Performance as measured by what? If it is some sort of in-sample fit (i.e., fit to the data used to fit the model), one of the two showing better fit doesn't tell you much, if anything.

However, I would agree that at least the main regression coefficients should be rather similar if you don’t use informative priors.


I finally got the time to rerun the model with the linear specification. And yes, the coefficient estimates are almost the same. The adjusted R2 of the lmer output is similar to the Bayes R2 of the brms output. Only when I calculate the correlation between y and fitted y does the \hat y of the linear mixed model show a higher correlation coefficient. I am guessing the fitted values from brms are not calculated directly from the parameter point estimates, is that right?

Another thing I am confused about is the fitted and predict methods for a brms fit.

Please ignore the blue and purple lines below. For the first picture, the intervals/bands are derived with the fitted function, while for the second picture, the intervals are derived from the predict function. The only difference between these two methods is that the predict function adds the residual variance. If sampling from the old levels, then only one level will be sampled for the uncertainty calculation, which is not meaningful if I want some sort of uncertainty bands. Do I understand this correctly? Could I account for only one grouping level of variance, not all hierarchical groupings, by specifying re_formula = ~ (1|TrialG) instead of re_formula = NULL?
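To check my understanding, here is a sketch of the calls as I read them (hypothetical, not run: it assumes a fitted brmsfit `fit1` and a one-row data frame `newavg` as in my plots; please correct me if the comments are wrong):

```r
library(brms)

# fitted(): uncertainty of the expected value E[y | x] only. The "Estimate"
# column summarizes posterior draws of the expected value, i.e. it is a
# posterior mean, not a plug-in of parameter point estimates.
# With re_formula = NA all group-level terms are set to zero.
fe <- fitted(fit1, newdata = newavg, re_formula = NA)

# predict(): additionally draws from the residual distribution, so these
# intervals are wider (posterior predictive intervals for new observations).
pp <- predict(fit1, newdata = newavg, re_formula = NA)

# re_formula can also retain only a subset of the grouping terms, e.g. keep
# the trial-level intercepts while integrating out the line level:
fe_trial <- fitted(fit1, newdata = newavg, re_formula = ~ (1 | TrialG),
                   allow_new_levels = TRUE, sample_new_levels = "uncertainty")
```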

The black lines are the fitted line for one row of new data. The yellow band is the confidence band calculated with fitted(fit1, newdata = newavg, re_formula = NA)[, -2]; the grey bands use the argument sample_new_levels = "old_levels", i.e. fitted(fit1, newdata = newavg, re_formula = NULL, allow_new_levels = TRUE, sample_new_levels = "old_levels", probs = c(0.05, 0.95))[, -2]; and the light green bands are from fitted(fit1, newdata = newavg, re_formula = NULL, allow_new_levels = TRUE, sample_new_levels = "uncertainty", probs = c(0.05, 0.95))[, -2].

fit2 <- brm(y ~ x + Speed + Pressure + Temp_1 + Windspeed +
              x:Speed + x:Temp_1 + x:Windspeed + (1 + x | TrialG/LineG),
            data = dat, prior = bprior1,
            iter = 2000, chains = 3, # warmup = 1000,
            control = list(adapt_delta = 0.88, max_treedepth = maxTree), cores = 4,
            save_model = paste0(filename, "_randomSlope.txt"))

I also wonder why specifying almost the same model in the nonlinear way does not work as expected. Any possible reasons?

Hi Paul, I am sorry that I have to come back to the same question after one year. The main regression coefficients are indeed similar without informative priors. I also agree that in-sample fit does not mean too much. However, what should I use as a fair model comparison criterion?

For example, I can easily compare two brmsfit objects with loo.

## library(performance)
bprior1 <- prior(student_t(5, 0, 10), class = b) + prior(cauchy(0, 2), class = sd)
fit1 <- brm(count ~ zAge + zBase * Trt + (1 | patient),
            data = epilepsy, family = poisson(), prior = bprior1)
fit1 <- add_criterion(fit1, c("loo", "waic", "kfold"), K = 10)
fit1.0 <- brm(count ~ zBase * Trt + (1 | patient),
              data = epilepsy, family = poisson(), prior = bprior1)
fit1.0 <- add_criterion(fit1.0, c("loo", "waic", "kfold"), K = 10)
loo_compare(fit1, fit1.0, criterion = "loo")
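One caveat from my reading of the brms documentation (worth double-checking): the criterion argument of loo_compare is matched to a single value, so comparing on all three criteria takes one call per criterion, e.g. (sketch, assumes fit1 and fit1.0 from above):

```r
loo_compare(fit1, fit1.0, criterion = "loo")
loo_compare(fit1, fit1.0, criterion = "waic")
loo_compare(fit1, fit1.0, criterion = "kfold")
```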

But with the corresponding frequentist fits:

fit0 <- glmer(count ~ zAge + zBase * Trt + (1 | patient), data = epilepsy, family = poisson())
fit00 <- glm(count ~ zAge + zBase * Trt, data = epilepsy, family = poisson())

| Model | Type     | AIC     | BIC     | RMSE |
|-------|----------|---------|---------|------|
| fit0  | glmerMod | 1357.58 | 1378.37 | 1.32 |
| fit00 | glm      | 1729.92 | 1747.24 | 2.01 |

In terms of RMSE, the brms fit fit1 gives an RMSE of 4.75, almost four times that of glmer. How should I explain this in-sample difference in terms of goodness of fit?

How did you compute the RMSE for both packages exactly?

When computed correctly, I doubt the RMSEs of the point estimates will differ much. But keep in mind that what Bayes adds is usually better uncertainty intervals, which are not accounted for by any measure based solely on point estimates.
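A sketch of how one could put both fits' RMSEs on the same footing (the fit0/fit1 calls refer to the models fitted above and are only shown in comments; the helper itself is checked on a toy base-R model):

```r
# In-sample RMSE from point predictions on the response (count) scale.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# For the lme4 fit, fitted() already returns conditional means on the
# response scale:
#   rmse(epilepsy$count, fitted(fit0))
# For the brms fit, take the posterior mean of the expected count, which is
# the comparable point prediction (using predict() draws instead would add
# observation noise and inflate the RMSE):
#   rmse(epilepsy$count, fitted(fit1)[, "Estimate"])

# Toy check of the helper with an exactly linear base-R model:
d <- data.frame(x = 1:10)
d$y <- 2 * d$x
m <- lm(y ~ x, data = d)
rmse(d$y, fitted(m))  # 0 up to numerical error, since the fit is exact
```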