Symptom:
One or more chains do not finish within a reasonable time (more than 10x the run time of the fastest chain).
Platform: brms on rocker-based Docker containers (earthlab/r-greta:latest, methodsconsultants/tidyverse-h2o:latest).
With several different models, and with future set to both TRUE and FALSE to see whether that made a difference, I get chains that ‘hang’: no visible progress, but full CPU usage.
Terminating the process results in the loss of the information from the other chains.
Both adapt_delta and max_treedepth have been increased, as suggested by prior test runs.
The image shows the latest attempt, which used the following code.
chains <- 8
model5 <- brm(
  formula = y ~ -1 + antal_0 + antal_1 + antal_2 + antal_3 + antal_4 + antal_5 +
    antal_6 + antal_7 + antal_8 + antal_9 + antal_10 + antal_11 + antal_12 + antal_13 +
    antal_14 + antal_15 + antal_16 + (1 | omNavn) + (1 | monthNr),
  chains = chains, iter = 50000,
  data = brmData, control = list(max_treedepth = 20, adapt_delta = 0.9999),
  inits = initfun, prior = set_prior("exponential(0.1)"), cores = chains
)
The data has 84 rows; omNavn has 7 levels and monthNr has 12 levels.
Selected output:
Fastest:
Chain 6: Elapsed Time: 612.074 seconds (Warm-up)
Chain 6: 35.4076 seconds (Sampling)
Chain 6: 647.482 seconds (Total)
Slowest (but finished):
Chain 5: Elapsed Time: 1687.63 seconds (Warm-up)
Chain 5: 196.166 seconds (Sampling)
Chain 5: 1883.79 seconds (Total)
Chains that did not complete (last message):
Chain 1: Iteration: 20000 / 50000 [ 40%] (Warmup)
Chain 3: Iteration: 5000 / 50000 [ 10%] (Warmup)
Chain 4: Iteration: 35000 / 50000 [ 70%] (Sampling)
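To put numbers on the output above, here is a small sketch in base R that pulls the total run time of each finished chain out of the console lines and lists the chains that never reported a total. The helper name chain_totals is my own and is not part of brms or rstan; the input is just the selected lines copied from above.

```r
# Console lines copied from the selected output above.
log_lines <- c(
  "Chain 6: Elapsed Time: 612.074 seconds (Warm-up)",
  "Chain 6: 35.4076 seconds (Sampling)",
  "Chain 6: 647.482 seconds (Total)",
  "Chain 5: Elapsed Time: 1687.63 seconds (Warm-up)",
  "Chain 5: 196.166 seconds (Sampling)",
  "Chain 5: 1883.79 seconds (Total)",
  "Chain 1: Iteration: 20000 / 50000 [ 40%] (Warmup)",
  "Chain 3: Iteration: 5000 / 50000 [ 10%] (Warmup)",
  "Chain 4: Iteration: 35000 / 50000 [ 70%] (Sampling)"
)

# Hypothetical helper: one row per chain that printed a "(Total)" line.
chain_totals <- function(lines) {
  total <- grep("seconds \\(Total\\)", lines, value = TRUE)
  chain <- as.integer(sub("^Chain ([0-9]+):.*$", "\\1", total))
  secs <- as.numeric(
    sub("^Chain [0-9]+:[[:space:]]*([0-9.]+) seconds \\(Total\\)$", "\\1", total)
  )
  data.frame(chain = chain, seconds = secs)
}

totals <- chain_totals(log_lines)

# Run time of each finished chain relative to the fastest one.
totals$ratio <- totals$seconds / min(totals$seconds)

# Chains that appear in the log but never reported a total.
seen <- unique(as.integer(sub("^Chain ([0-9]+):.*$", "\\1", log_lines)))
unfinished <- setdiff(seen, totals$chain)

print(totals)
print(unfinished)
```

On these lines the slowest finished chain took roughly 2.9 times as long as the fastest, so the 10x criterion only concerns the chains that never finished.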
I hope you can help me clarify where to look for the cause, and whether there is a way to end a chain without losing the information from the chains that have completed. Or is this a sign of a model problem, with the slower chains stuck in hard-to-diagnose regions of the distribution?
I have seen the problem with this model, but also with other types of models, and searching the net has not given me any real idea of why some chains ‘die’ on me.
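One workaround I can imagine for keeping the completed chains, as an untested sketch: have Stan write the draws of each chain to disk while sampling, and read back only the finished chains after killing the run. This assumes that brm() forwards sample_file through its ... argument to rstan::sampling(), and that rstan names the per-chain files by inserting _<chain> before the .csv extension; both should be checked against the files actually written. The function names chain_files, fit_with_files and read_finished are made up for this illustration.

```r
# Sketch only: brms and rstan must be installed when these functions are called.

# Hypothetical helper: per-chain file names, ASSUMING rstan inserts
# "_<chain>" before the ".csv" extension of `sample_file`.
chain_files <- function(stem, chains) {
  sprintf("%s_%d.csv", sub("\\.csv$", "", stem), chains)
}

# Fit as usual, but ask Stan to stream the draws to disk while sampling.
# ASSUMPTION: brm() passes `sample_file` on to rstan::sampling() via `...`.
fit_with_files <- function(formula, data, stem = "model5_draws.csv", ...) {
  brms::brm(formula = formula, data = data, sample_file = stem, ...)
}

# After killing a hung run, read back only the chains known to have finished.
# Files from interrupted chains are left out on purpose, since they are
# shorter than the completed ones.
read_finished <- function(stem, finished) {
  files <- chain_files(stem, finished)
  rstan::read_stan_csv(files[file.exists(files)])
}
```

For the run above, read_finished("model5_draws.csv", c(5, 6)) would be the call for chains 5 and 6. The result would be a stanfit object rather than a brmsfit, so the brms post-processing functions would not apply to it directly.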
Kind regards