Additive model hits the maximum treedepth: what could be wrong?

I am trying to run some additive models using brms.

Some context:
I am trying to model some infant pupillometry data (for background, see my previous post: Additive model with random smooth for participant). I have successfully run all my models with mgcv and am now trying to move to brms.

I successfully ran my main model, which tests for:

  • Interaction between Event (variable of interest; factorial with 2 levels) and Trial number
  • Smooth term on time (ms) for each Event (factorial with 2 levels)
  • Random smooth for each Subject
  • Random intercept for each subject (additional random intercept)

It is very slow (2-3 days) but it works perfectly.

brm(mean_pupil ~ Events * Trials
    + s(Time, by = Events, k = 8)
    + s(Time, Subject, bs = "fs", m = 1)
    + (1 | Subject),
    data = db, family = student,
    chains = 4, cores = 4,
    backend = "cmdstanr", threads = threading(2),
    iter = 8000, warmup = 6000,
    control = list(adapt_delta = 0.99, max_treedepth = 12))

The second model is very similar with some changes:

  • Interaction between Event (variable of interest; factorial with 2 levels) and Gen (factorial with 2 levels)
  • Controlling for Trials (numeric)
  • Smooth term on time (ms) for each Event (factorial with 2 levels)
  • Smooth term on time (ms) for each Gen (factorial with 2 levels)
  • Random smooth for each Subject
  • Random intercept for each subject (additional random intercept)

The second model is slightly more complex and, unfortunately, relies on fewer data points (13,969 instead of 26,108).
When I run the second model I run into some problems. I am trying to figure out how to solve them, but I am not sure what the cause could be or where to start.

brm(mean_pupil ~ Events * Gen + Trials
    + s(Time, by = Events, k = 4)
    + s(Time, by = Gen, k = 4)
    + s(Time, Subject, bs = "fs", m = 1)  # random smooth per subject
    + (1 | Subject),                      # random intercept per subject
    data = df, family = student,
    chains = 4, cores = 4,
    backend = "cmdstanr", threads = threading(2),
    iter = 16000, warmup = 1000,
    control = list(adapt_delta = 0.99, max_treedepth = 12))

The output of the second model is:

Warning: 16000 of 16000 (100.0%) transitions hit the maximum treedepth limit of 12.

and

Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.

As you can see, I have already increased the number of iterations and the treedepth, but I am not sure whether the problem could be caused by something else. I could increase the treedepth further, but everything would become even slower, so I wanted to check whether there is something wrong with my model specification before running such an "expensive" model again.

Would anyone have any idea/solution?
Thank you a lot!

  • Operating system: I run the model on my Windows machine and on the university HPC cluster
  • brms version: 2.18.1/2.18.0

I'm not an expert on GAMs or brms, but I have run into similar issues with GAM models fitted with brms, so perhaps I can help.

First, one thing to check is your divergences (when and where they occurred). I prefer to use pairs() or plot() on the brms model object. Check this excellent vignette, especially the part that shows how to display divergences in trace plots: Visual MCMC diagnostics using the bayesplot package • bayesplot. Also see: pairs.brmsfit: Create a matrix of output plots from a 'brmsfit' object in brms: Bayesian Regression Models using 'Stan'.
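For example, something like this (a minimal sketch; the object name fit stands for your fitted brmsfit, and the parameter selection is only an illustration):

library(brms)
library(bayesplot)

np <- nuts_params(fit)  # sampler diagnostics: divergences, treedepth, energy
sum(subset(np, Parameter == "divergent__")$Value)  # total divergent transitions

# trace plots of the smooth standard deviations, with divergences marked
mcmc_trace(as.array(fit), regex_pars = "^sds_", np = np)

# pairs plot of a few parameters to see where problems concentrate
pairs(fit, variable = variables(fit)[1:4])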

Doing prior predictive simulations to set weakly informative priors is also a good idea. To do this, you can set sample_prior = "only". To get a list of all parameters on which you can set priors, use get_prior(): get_prior: Overview on Priors for 'brms' Models in brms: Bayesian Regression Models using 'Stan'. Then I like to plot the predictions from the model using tidybayes.
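For your second model, the mechanics would look roughly like this (a sketch; the priors shown are placeholders to illustrate the workflow, not recommendations for your data):

# list every parameter class you can set a prior on
get_prior(mean_pupil ~ Events * Gen + Trials
          + s(Time, by = Events, k = 4)
          + s(Time, by = Gen, k = 4)
          + s(Time, Subject, bs = "fs", m = 1),
          data = df, family = student)

# sample from the priors alone (the likelihood is ignored)
fit_prior <- brm(mean_pupil ~ Events * Gen + Trials
                 + s(Time, by = Events, k = 4)
                 + s(Time, by = Gen, k = 4)
                 + s(Time, Subject, bs = "fs", m = 1),
                 data = df, family = student,
                 prior = c(prior(normal(0, 1), class = "b"),
                           prior(student_t(3, 0, 1), class = "sds")),
                 sample_prior = "only",
                 chains = 2, cores = 2)

# visualise draws from the prior predictive distribution
pp_check(fit_prior, ndraws = 50)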

I generally find it tricky to set priors on the sds parameters of my GAM models; these regularise the wiggliness of the smooths. When I make predictions from my assumptions (before seeing the data), I am often wrong, which leads to divergences. So I often need to set tighter priors on these parameters, which solves most of my issues.
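For instance, something like this could go into the prior argument (the scale of 0.5 is only illustrative; choose it with the prior predictive checks above):

# a tighter half-Student-t prior on all smooth standard deviations
prior(student_t(3, 0, 0.5), class = "sds")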

Can you perhaps try this?

Additional thoughts:
Do you need the (1|Subject) part in the model? To get a varying smooth, you only need to include the varying smooth itself, not the intercept. I hope someone else can comment on this, but when you fit a varying smooth in mgcv you will generally remove the random intercept, because it is already incorporated in the random smooth; a sketch follows below. See this paper for a reference: Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English - ScienceDirect.
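In other words, the formula could be reduced to something like this (a sketch under that assumption):

# the bs = "fs" smooth already absorbs subject-level intercept differences,
# so the separate (1 | Subject) term can be dropped
mean_pupil ~ Events * Gen + Trials +
  s(Time, by = Events, k = 4) +
  s(Time, by = Gen, k = 4) +
  s(Time, Subject, bs = "fs", m = 1)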

Best regards,
Christian


@cmagelssen
Sorry it took me a while to get back to the model. I followed your suggestions and set informative priors instead of using the defaults. That made all the difference! The model converged and was much faster.

Thank you for all the suggestions!!
