Some chains hanging when fitting a random-effects model to a large dataset

I’m working with a relatively large dataset (~210,000 rows) with multiple random effects. The model is:

photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 + (1+contrast_1+contrast_2|country) + (1|unique_id)

where photo_rating is a Likert rating in response to a photo, positiveemotion_baseline_mean is the average response to several Likert questions answered prior to viewing the set of photos, contrast_1 and contrast_2 are contrast codes for a four-level condition variable, country is a set of ~70 country IDs, and unique_id is an ID for each participant (~21,000 participants in total).

The model uses the following two priors on contrast_1 and contrast_2:

prior_1 <- prior(cauchy(0, .18), class = b, coef = contrast_1)
prior_2 <- prior(cauchy(0, .25), class = b, coef = contrast_2)

The call to brm is:

m <- brm(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  prior = c(prior_1, prior_2),
  cores = 4,
  iter = 1.1e4,
  warmup = 1e3,
  sample_prior = TRUE,
  save_all_pars = TRUE,
  data = pos_photo_long
)

Once the model is estimated, I need to compute a Bayes factor for contrast_1 and contrast_2. I plan to do so using bayestestR::bayesfactor_models. (Before anyone says so: yes, I realize there are reasons not to use Bayes factors; I don't have much flexibility in the testing approach for this task.)
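For concreteness, this is roughly the comparison I have in mind (a sketch only; m_null below is a hypothetical reduced model without the contrast terms, and whether to keep the random slopes in the null is a modeling choice; bayesfactor_models() uses bridge sampling for brmsfit objects, which is why save_all_pars = TRUE is set):

library(brms)
library(bayestestR)

# Hypothetical null model: same structure, but without the population-level contrasts
m_null <- brm(
  photo_rating ~ positiveemotion_baseline_mean +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  data = pos_photo_long,
  cores = 4, iter = 1.1e4, warmup = 1e3,
  sample_prior = TRUE, save_all_pars = TRUE
)

# Bayes factor of the full model m against the null model
bayesfactor_models(m, denominator = m_null)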

I’ve been having trouble getting the model to fit. I have already successfully fit very similar models to this same dataset (i.e., with different outcomes that are not measured multiple times per participant). However, when I try this one, one of the chains hangs at the warmup stage each time, without progressing to the sampling stage.

I’ve tried each of the following:

  1. Reducing the number of iterations to, say, 4e3. This doesn’t help because the problem occurs at the very start of estimation (the chain hangs right at the beginning of warmup)
  2. Fitting the model to a smaller subset of the data (~10%). This worked on the subset that I tried
  3. Changing “inits” to “0”. This didn’t work; some of the chains still hung at the beginning
  4. Changing “init_r” to other values (currently in progress; I’m trying .5). I’m not sure whether this will work yet (rough sketches of the calls for 2–4 are below)
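
For reference, the kinds of calls I’ve been using for points 2–4 (sketches only; the exact argument names, e.g. inits vs. init, depend on the brms/rstan versions):

# (2) Fit to a ~10% subset of participants
set.seed(1)
ids     <- unique(pos_photo_long$unique_id)
sub_ids <- sample(ids, size = round(0.1 * length(ids)))
pos_sub <- pos_photo_long[pos_photo_long$unique_id %in% sub_ids, ]

# (3)/(4) Start chains at zero, or shrink the range of the random inits
# (init_r is passed through to rstan::sampling() with the rstan backend)
m_test <- brm(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  prior = c(prior_1, prior_2),
  data = pos_sub,
  cores = 4, iter = 4e3, warmup = 1e3,
  inits = "0"      # or: init_r = 0.5
)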

I could use advice on how to troubleshoot these problems. Model-fitting takes a long time so troubleshooting this issue is very time-consuming!

I can think of two things to try here. The first, a quick one, would be to try running your models with the decomp = "QR" option, which can help with correlated predictors.
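In brms the QR decomposition is requested through bf()/brmsformula() rather than brm() itself; a sketch (it’s worth double-checking how your coefficient-level priors interact with the QR parameterisation):

# Sketch: QR decomposition of the population-level design matrix
m_qr <- brm(
  bf(photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
       (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
     decomp = "QR"),
  prior = c(prior_1, prior_2),
  data = pos_photo_long,
  cores = 4, iter = 1.1e4, warmup = 1e3,
  sample_prior = TRUE, save_all_pars = TRUE
)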

The second would be to change your priors (for parameters other than contrast_1 and contrast_2 as well). The default priors might not be very consistent with your data, or they could be inducing tricky posterior geometry that the sampler is getting stuck in.
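A good starting point is to inspect the defaults with get_prior() and then override whichever look implausibly wide for a Likert outcome; a sketch (the scales below are placeholders, not recommendations):

# See what brms would use by default for this formula/data
get_prior(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  data = pos_photo_long
)

# Example of overriding some of the defaults (placeholder scales)
priors <- c(
  prior_1, prior_2,
  prior(normal(0, 1), class = b, coef = positiveemotion_baseline_mean),
  prior(student_t(3, 0, 1), class = sd),
  prior(student_t(3, 0, 2.5), class = Intercept)
)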

Given that a dataset of this size is likely to be fairly slow to fit, I’d also suggest installing the cmdstanr package (instructions here) and running your models with the backend = "cmdstanr" option. This allows you to use within-chain parallelisation (more background here). Alternatively, if you have a discrete GPU in your system, you can set up OpenCL to GPU-accelerate your models. Let me know if you want more info on that.
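Roughly like this (a sketch; the number of threads per chain and the OpenCL device IDs depend on your hardware):

# Sketch: within-chain parallelisation via the cmdstanr backend
m <- brm(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  prior = c(prior_1, prior_2),
  data = pos_photo_long,
  iter = 1.1e4, warmup = 1e3,
  sample_prior = TRUE, save_all_pars = TRUE,
  backend = "cmdstanr",
  chains = 4, cores = 4,
  threads = threading(2)   # 2 threads per chain = 8 CPU cores in total
)

# With OpenCL configured, GPU acceleration looks like:
# m <- brm(..., backend = "cmdstanr", opencl = opencl(c(0, 0)))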


Thanks, this is very helpful!

It seems quite plausible to me that the estimation problems come from the (1|unique_id) random intercepts – the models that I’ve already fit don’t include this term (those variables weren’t measured within participants), and they fit just fine.

Do you have any general thoughts on how I should try adjusting the prior for (1|unique_id)? Maybe make the prior narrower than the default?

A quick way to check whether that’s the case would be to fit the model without that random effect; if the model makes it through warmup, then you know you’re on to something.
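Something like this (a sketch; a shorter run is fine here, since you only need to see whether warmup progresses):

# Same model minus the participant-level intercepts
m_check <- brm(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country),
  prior = c(prior_1, prior_2),
  data = pos_photo_long,
  cores = 4, iter = 2e3, warmup = 1e3
)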

For a rough sense of the scale of variation that the prior for (1|unique_id) should allow, you can look at the variance of the per-participant means of photo_rating (or even just their quantiles). This will give you a (very) rough indication of how wide the distribution of random intercepts needs to be. Then, if you run your brm model with sample_prior = "only", you get the implied prior distributions for the random effects and can see how wide or narrow they are compared to your data.
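Concretely, something along these lines (a sketch; the student_t scale below is an example value, not a recommendation):

# Rough empirical scale: variability of per-participant mean ratings
participant_means <- tapply(pos_photo_long$photo_rating,
                            pos_photo_long$unique_id, mean)
sd(participant_means)
quantile(participant_means, c(0.025, 0.25, 0.5, 0.75, 0.975))

# Prior-only model: sample from the priors alone and inspect what they imply
m_prior <- brm(
  photo_rating ~ positiveemotion_baseline_mean + contrast_1 + contrast_2 +
    (1 + contrast_1 + contrast_2 | country) + (1 | unique_id),
  prior = c(prior_1, prior_2,
            prior(student_t(3, 0, 0.5), class = sd, group = unique_id)),
  data = pos_photo_long,
  sample_prior = "only",
  cores = 4, iter = 2e3, warmup = 1e3
)
pp_check(m_prior)   # compare prior predictive draws to the observed ratings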


Wonderful! This gives me a lot to go on. I’ll give these ideas a shot.