brms model: linear vs. non-linear

Good morning,

I am running both a linear and a non-linear model in brms, and the results are quite similar. First, I fit the model with my response variable, two predictors, and one random effect. Below are the results I obtained:

Family: lognormal
Links: mu = identity; sigma = identity
Formula: Social_dist ~ 1 + X50 + Num_Unique_Healthy_Ids + (1 | p | Name)
Data: final_data (Number of observations: 30538)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 4;
total post-warmup draws = 4000

Group-Level Effects:
~Name (Number of levels: 292)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.30      0.01     0.27     0.33 1.00     2133     3041

Population-Level Effects:
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                  1.45      0.02     1.41     1.49 1.00     1027     2006
X50                        0.10      0.01     0.08     0.12 1.00     3443     3890
Num_Unique_Healthy_Ids    -0.11      0.01    -0.12    -0.09 1.00     3532     3737

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.83      0.00     0.82     0.83 1.00     4041     3974

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

I have also created a non-linear model. After extensive research I'm still not entirely certain this is the correct approach for a non-linear model, but I have included quadratic effects for both predictors.

Family: lognormal
Links: mu = identity; sigma = identity
Formula: Social_dist ~ 1 + X50^2 + Num_Unique_Healthy_Ids^2 + (1 | p | Name)
Data: final_data (Number of observations: 30538)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 4;
total post-warmup draws = 4000

Group-Level Effects:
~Name (Number of levels: 292)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.30      0.01     0.27     0.33 1.00     2033     3073

Population-Level Effects:
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                  1.45      0.02     1.41     1.49 1.00     1065     2007
X50                        0.10      0.01     0.08     0.12 1.00     3478     3663
Num_Unique_Healthy_Ids    -0.11      0.01    -0.13    -0.09 1.00     3618     3818

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.83      0.00     0.82     0.83 1.00     3740     3356

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

This is the LOO comparison.

loo_compare(loo_linear, loo_nonlinear)
              elpd_diff se_diff
fit_log             0.0     0.0
fit_nonlinear      -1.0     0.4

After doing some reading, I think the models have been executed well, but I’m not entirely sure. Could you please provide me with some assistance? Thank you very much!

Cheers,
Felix

Welcome to the Stan community, @Felix_Requena!

It looks like you're fitting two candidate models: one with untransformed predictors and one where you intended to square the predictors. Importantly, both models still fall within the linear model framework. (brms has a separate syntax for fitting a wider range of non-linear functional forms, but simple polynomial terms are better fit using the standard linear syntax, as you do here.) One caveat: in R's formula syntax, X50^2 without I() is interpreted as an interaction of X50 with itself and reduces to plain X50, which is why your two sets of estimates are essentially identical. To literally square a predictor, write I(X50^2). Squaring a predictor throws away information about its sign, which is OK if only magnitude matters, but it also makes the fit more sensitive to observations with large predictor values. For many problems it is useful to include both linear and quadratic terms for a predictor. It's difficult to say what's reasonable in your case without knowing more about your specific motivations and having domain expertise in your field.
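If quadratic effects are what you're after, the model could be written along these lines (a sketch only: fit_quad is a hypothetical name, priors are left at brms defaults, and the sampler settings are copied from your output):

library(brms)

# Explicit linear + quadratic terms; I() protects the squaring from
# R's formula operators, which otherwise read X50^2 as an interaction.
fit_quad <- brm(
  Social_dist ~ 1 + X50 + I(X50^2) +
    Num_Unique_Healthy_Ids + I(Num_Unique_Healthy_Ids^2) +
    (1 | Name),
  data = final_data, family = lognormal(),
  chains = 4, iter = 5000, warmup = 1000, thin = 4
)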

On a minor note, your group-level intercept for Name doesn’t need the (... | p | ...). It’s not hurting anything, but isn’t being used either. This syntax is useful for setting up correlations between the group-level effects for models with multivariate responses. (1 | Name) will do just fine for your models.
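For context, here's the kind of multivariate setup where |p| earns its keep (illustrative only; Other_response is a made-up second outcome):

# The shared |p| term correlates the Name-level intercepts
# across the two response formulas.
bf_mv <- bf(Social_dist ~ 1 + X50 + (1 | p | Name)) +
  bf(Other_response ~ 1 + X50 + (1 | p | Name)) +
  set_rescor(FALSE)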

Both of your candidate models seem to fit without any obvious problems based on the R-hat statistics you’ve provided. Any warning messages about divergences wouldn’t show up in these summary tables, but would be reason to investigate before proceeding further. A posterior predictive check is another good step to add to your post-fitting workflow.
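In brms that can be as simple as, for example:

# Overlay densities from replicated datasets on the observed data
pp_check(fit_log, type = "dens_overlay", ndraws = 100)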

Lastly, you use leave-one-out cross-validation (LOO) to estimate the expected predictive accuracy of the models and to compare the two fits. By this metric fit_log comes out slightly ahead, although an elpd difference of 1.0 (SE 0.4) is small, so the two models predict about equally well, as you'd expect given the essentially identical estimates.

You’re doing a great job thinking carefully, asking questions, and sharing details for others to help you navigate your issue. Please feel welcome to add more to the thread if this didn’t answer your question and to open new topics as challenges arise.


Thank you very much for the warm welcome and your assistance. I'll try to clarify my aims further.

To explain my models: I'm exploring how my dependent variable changes based on mixed effects, represented by the number of healthy individuals within a home range, denoted X50. I've built the same model for diseased individuals.

I'm trying to construct both non-linear and linear models. In animals, it has previously been observed that when there are more healthy individuals, they tend to stay closer together, suggesting a linear relationship. However, with the number of individuals contracting disease increasing each year, I'm investigating the possibility of a non-linear relationship. Additionally, my findings deviate somewhat from prior research: specifically, I'm noticing that animals close to diseased animals tend to draw closer, whereas conventional wisdom suggests that more diseased animals should lead to greater separation while healthy animals cluster together.

As the results were quite similar between the linear and non-linear models (see first comment), I did more reading and found that I could use poly() terms to give the non-linear model more flexibility, and with these I found some non-linear effects. When I compared it with the original model via LOO (see first comment), this model was a better fit.

These are the outcomes I have obtained with my new models:
1- Healthy individuals

H_fit_nonlinear_improved1 <- bf(Social_dist ~ 1 + poly(X50, 2) + poly(Num_Unique_Healthy_Ids, 2) + (1 | Name)) + lognormal()

Outcome

Group-Level Effects: 
~Name (Number of levels: 292) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.27      0.01     0.25     0.30 1.00     2163     3246

Population-Level Effects: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                        1.44      0.02     1.41     1.48 1.00     1245     2306
polyX5021                       17.16      1.74    13.75    20.64 1.00     3496     3724
polyX5022                       -7.08      1.40    -9.87    -4.39 1.00     3624     3679
polyNum_Unique_Healthy_Ids21   -18.88      1.55   -21.91   -15.86 1.00     2995     3430
polyNum_Unique_Healthy_Ids22     6.52      1.31     3.96     9.06 1.00     3995     3932

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.83      0.00     0.82     0.83 1.00     3709     3518

[Posterior predictive check plot]

2- Diseased individuals

fit_nonlinear_improved1 <- bf(Social_dist ~ 1 + poly(X50, 2) + poly(Num_Unique_Diseased_Ids, 2) + (1 | Name)) + lognormal()

Outcome

Group-Level Effects: 
~Name (Number of levels: 292) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.33      0.02     0.30     0.37 1.00     1840     2805

Population-Level Effects: 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                         1.45      0.02     1.41     1.49 1.00      835     1586
polyX5021                        13.31      1.63    10.09    16.47 1.00     3128     3739
polyX5022                        -4.61      1.41    -7.42    -1.92 1.00     3321     3420
polyNum_Unique_Diseased_Ids21   -13.32      1.65   -16.61   -10.07 1.00     3003     3739
polyNum_Unique_Diseased_Ids22     2.29      1.29    -0.25     4.76 1.00     3700     3749

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.83      0.00     0.82     0.83 1.00     3755     3667

[Posterior predictive check plot]

The posterior checks are not perfect, but the observed data track the predictions well, and from what I can see the models are well built. Could you please advise which model would be better to use in my case? I have been working on these models for four months, and I am unsure whether the results I am obtaining are accurate.

Thank you very much.
Cheers,
F

Thanks for the additional information. It sounds like you have a good rationale for using squared predictors in your model, and I agree that poly(..., 2) is a good way to express them in your formula. (Note that it's equivalent to also including a linear/untransformed term for the predictor, as I suggested above, i.e., y ~ 1 + x + I(x^2).)
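To make the equivalence concrete, here's a sketch reusing your variable names. poly() defaults to orthogonal polynomials, which reduces correlation between the linear and quadratic coefficients; raw = TRUE reproduces the I(x^2) coding directly:

# These two formulas span the same model space:
bf(Social_dist ~ 1 + X50 + I(X50^2) + (1 | Name)) + lognormal()
bf(Social_dist ~ 1 + poly(X50, 2, raw = TRUE) + (1 | Name)) + lognormal()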

If I understand you correctly, you've now fit four candidate models: fit_log, fit_nonlinear, and H_fit_nonlinear_improved1 include the same predictor variables, just with increasing complexity. The fourth model, fit_nonlinear_improved1, mirrors the third but substitutes diseased individuals for healthy individuals as a predictor.

There isn't a universal meaning of "better" when it comes to modeling. Better for what purpose? If your goal is prediction, you can compare all four models with LOO to find the one with the best predictive performance. If you want to evaluate the evidence for or against a given predictor, you should simply include it in the model and examine the posterior distribution of its coefficient. For example, it sounds like there is an interesting hypothesis about the effects of healthy vs. diseased individuals on the response; for this, you'd need both predictors in a single model. Note that this approach is very different from the model-selection route you've been taking. Doing model selection prior to inference on the same data set will bias your inferences (Tredennick et al. 2021), although there are regularization approaches (lasso, ridge, and horseshoe priors) that will downweight predictors that appear to be less important.
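As a sketch of that regularization route (the formula and prior here are illustrative, not a recommendation):

# All predictors in one model, with a regularized horseshoe prior on the
# population-level coefficients to shrink weakly supported terms toward zero.
fit_hs <- brm(
  Social_dist ~ 1 + X50 + Num_Unique_Healthy_Ids +
    Num_Unique_Diseased_Ids + (1 | Name),
  data = final_data, family = lognormal(),
  prior = set_prior(horseshoe(df = 1), class = "b")
)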

All of your models seem to be fitting without obvious technical problems, which is where the Stan forums are most useful. Which one you should use is a scientific question and will rely on your goals and expertise to move ahead.


Sorry for the delay in replying. Thank you very much for your help. I tried to follow one of your pieces of advice: I decided to include both predictors in a single model with an interaction effect. After discussing it with my supervisor, this seems like the better approach. I am now using this model:

fit_log_all_int <- bf(Social_dist ~ 1 + X50 +
                        Num_Unique_Diseased_Ids * Num_Unique_Healthy_Ids +
                        (1 | Name)) + lognormal()

Population-Level Effects: 
                                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                                          1.37      0.03     1.32     1.42 1.00     1895     2815
X50                                                0.13      0.01     0.11     0.15 1.00     3658     3797
Num_Unique_Diseased_Ids                           -0.05      0.01    -0.07    -0.03 1.00     3437     3629
Num_Unique_Healthy_Ids                            -0.17      0.01    -0.19    -0.15 1.00     3392     3809
Num_Unique_Diseased_Ids:Num_Unique_Healthy_Ids     0.03      0.01     0.02     0.04 1.00     3805     3892

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.82      0.00     0.82     0.83 1.00     3625     3895

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
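To help me interpret the interaction, I am also looking at the conditional effects, along these lines (sketch):

# How the effect of diseased IDs changes across levels of healthy IDs
conditional_effects(
  fit_log_all_int,
  effects = "Num_Unique_Diseased_Ids:Num_Unique_Healthy_Ids"
)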

I think that, up to now, the outcome is giving me what I need. The posterior checks look exactly the same as in my previous comments. Thank you so much again. One last piece of advice?
Best,
F

I don’t understand—what part are you looking for feedback on?

Sorry, I think I sent a previous version of the draft. I believe I am getting to the point, but I will also spend some time thinking deeply about it. Thank you very much for all your help.

Cheers,
Felix
