Difficulties getting models with weights() and se() to converge for large meta-analysis datasets

Hi everyone,

I am building a generalized additive model using brms. I am using the brms package because I am gathering data from the literature to build my model, so combining them requires meta-analytic methods, and brms has the se() term for inputting standard errors while mgcv does not. My covariates have all been standardized, and I added one to my response variable to avoid log(0) errors, since I am using a log link with a gaussian family. My model has this structure:

fit1 <- brm(response | weights(Weight) + se(stderr2, sigma = TRUE) ~
              x1 + x2 + x3 + offset(control_variable),
            data = Data, family = gaussian(link = "log"),
            iter = 5000, warmup = 500, chains = 4, cores = 2,
            control = list(adapt_delta = 0.99, max_treedepth = 18),
            save_pars = save_pars(all = TRUE))
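For reference, the preprocessing I mentioned above looks roughly like this (a minimal sketch; x1, x2, x3 and response are placeholder names for my actual columns):

# standardize the covariates and shift the response so log(0) cannot occur
Data[, c("x1", "x2", "x3")] <- scale(Data[, c("x1", "x2", "x3")])
Data$response <- Data$response + 1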

I have come across some barriers while running my models that are beyond my ability to troubleshoot, so I am including my diagnostic output below in the hopes that you can help me figure out the issue:

I am having difficulty getting the chains of two of my models to converge; let's call them fit1 and fit2. I have tried assigning pseudo-random inits to all of my covariates as well as the Intercept and sigma, drawn from the l-95% and u-95% bounds of previous model runs, but since those earlier chains did not converge, the inits I am assigning may not be helpful after all.
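Here is roughly how I generated those inits. The interval bounds below are hypothetical placeholders standing in for the l-95% / u-95% values from a previous run, and the parameter names (Intercept, b, sigma) follow the Stan code that brms generates:

make_inits <- function() {
  list(
    Intercept = runif(1, -0.5, 0.5),  # placeholder bounds from an earlier fit
    b         = runif(9, -1, 1),      # one draw per population-level coefficient
    sigma     = runif(1, 0.1, 2)
  )
}
my_second_list_of_inits <- replicate(4, make_inits(), simplify = FALSE)  # one list per chain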

My dataset for fit1 has 1000 datapoints and nine covariates, but the chains do not converge even with adapt_delta = 0.99 and iter = 10000.

My other dataset for fit2 contains approximately 9000 datapoints, and I have ten covariates in my model. The chains are not converging no matter how many iterations I assign, even with iter = 15000.

For the meta-analysis approach, where I can enter standard errors into my model via se(), there are only three distribution families available: skew_normal, gaussian, and student. I am using gaussian because it gives fewer divergent transitions.

For the smaller dataset of 1000 points, the model (fit1) gives lower R-hat values (though still > 1.01) and runs more quickly. These data measure a similar phenomenon to the fit2 data; however, the actual measurement method is different. Otherwise the covariates and model structure are the same.

For the large dataset of 9000 points, the model (fit2) takes a very long time to run: over two weeks. I tried removing one of a pair of highly correlated covariates, but that did not speed up the model very much. I find that using the default max_treedepth, rather than manually assigning one higher than 10, speeds the model up significantly.
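The correlation screening itself was along these lines (x1, x2, x3 are placeholders for my actual covariate columns):

covs <- Data[, c("x1", "x2", "x3")]
cor_mat <- cor(covs, use = "pairwise.complete.obs")
which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)  # flag pairs with |r| > 0.8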

I have included my response variable histograms, model details, and traceplots for fit1 and fit2 for your perusal (attached below).
Model details and traceplots for Stan Forum.pdf (205.1 KB)

fit1 takes around 24 hours to run. To try to figure out why fit2 takes so long (over two weeks), I tried a little experiment: I randomly sampled 1000 points from the fit2 dataset four times to make datasets of similar length to fit1's (n = 1000), and ran those four mini datasets to see whether that speeds things up. If it does, the slowness is due to computational expense; if the subsets are still much slower than fit1 (the ROV dataset), something intrinsic to the data is slowing things down; for instance, the existence of outliers can make it difficult to fit a distribution to the data.
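In code, the experiment looked roughly like the following (Data2 is a stand-in name for the fit2 dataset):

set.seed(1)
mini_fits <- lapply(1:4, function(i) {
  mini <- Data2[sample(nrow(Data2), 1000), ]
  update(fit2, newdata = mini)  # refit the same model on a 1000-point subset
})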

It turns out that the smaller datasets (n = 1000) randomly sampled from the fit2 data still take much longer to run than fit1, so there is something inherent in the data that makes it difficult to fit a gaussian distribution. Also, as with the full models, there are always one or two chains that finish right away while the other two are very slow. It is as if the chains get stuck in parameter space and are unable to push through to the values they are supposed to take on.

Finally, I tried using priors:

fit1 <- brm(abundanceMod | weights(Weight) + se(stderr2, sigma = TRUE) ~
              offset(area_sampled) + median_year + depth + slope + ship +
              yannick + latitude + longitude,
            family = gaussian(link = "log"),
            chains = 4, iter = 1000, warmup = 500,
            data = ROV, cores = 2,
            control = list(adapt_delta = 0.99, max_treedepth = 18),
            inits = my_second_list_of_inits,
            prior = c(prior(normal(0, 100), class = Intercept),
                      prior(normal(0, 10), class = b),
                      prior(cauchy(0, 2), class = sigma)))

The chains converge only if I use at most 6 of the 9 covariates for fit1, and the fit2 chains still do not converge regardless of how many covariates I remove, even with priors.

I am really at a loss as to what to try next. I have tried setting initial values, setting priors, increasing iter, standardizing all of the covariates, parallelizing by increasing cores, and removing a highly correlated covariate. I still cannot get the chains to converge, and the models take a very long time to run. So I am wondering: do you have any thoughts on why my models are not converging, and how to mitigate this issue?

Here are some of my thoughts:

  • perhaps the combination of averaged data (with an SE) and raw data (with SE = 0) takes more of a computational toll
  • my response variables are highly right-skewed, yet I cannot use Tweedie or negative binomial distributions because the se() term in brms does not support those families
  • there might be outliers in the dataset that make fitting the data difficult (a quick check is sketched just after this list)
  • there may be multiple highly correlated covariates that I have not yet removed; however, even when I ran the fit2 model with just 6 of my 10 covariates, the chains still did not converge.
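The quick outlier/skew check I have in mind is something like this (response is a placeholder for my actual column):

hist(Data$response, breaks = 100)              # how right-skewed is the response?
quantile(Data$response, c(0.5, 0.9, 0.99, 1))  # how heavy is the right tail?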

Your help is greatly appreciated!

Thank you,
Alice