Erratic running time for hierarchical poisson model

Hi guys,

I am characterizing COVID-19 deaths in Ohio by zip-code using a poison regression. I did not have several issues building the unadjusted version against age-distribution. However, I once run the full model including all covariates. I noted one weird behavior.
It starts blazing-fast ending 3/4 chains in less than 3 minutes. Then, the four chain can take almost 1 day and a half to finish! :(

Chain 1: Iteration: 7600 / 8000 [ 95%]  (Sampling)
Chain 1: Iteration: 8000 / 8000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 12.3 seconds (Warm-up)
Chain 1:                38.835 seconds (Sampling)
Chain 1:                51.135 seconds (Total)

Another issue is that the output sometimes converges, sometimes does not.
I am scaling my predictors and using weak informative priors for both betas and random effects.

#scale
brmsDFs.whitePE<-scale(brmsDF$whitePE,scale = F)
#formula
formula<-formula(Covid.Deaths ~ s.pm25+s.EP_POV+s.whitePE+s.blackPE+s.latinoPE+s.clrd.SMR+s.diabetes.SMR+s.hta.SMR +s.ischemic.SMR+ s.aU25+s.a2534+s.a3544+s.a4559+s.a6074+s.aOver75 + (1|County))
bprior <- c(prior_string("normal(0,10)", class = "b"),
                  prior_(~cauchy(0,2), class = ~sd, group = ~County, coef = ~Intercept))
brm.proto6.sf <- brm(Covid.Deaths ~ offset(log(Ecovid)) +s.pm25+s.EP_POV+s.whitePE+s.blackPE+s.latinoPE+s.clrd.SMR+s.diabetes.SMR+s.hta.SMR +s.ischemic.SMR+ s.aU25+s.a2534+s.a3544+s.a4559+s.a6074+s.aOver75 + (1|County),
              data = brmsDF,
              prior = bprior,
              family = "poisson",chains=4,warmup=2000,
              iter = 8000, cores  = 20,seed = 2020,
              control = list(adapt_delta = 0.95,max_treedepth=15))

The output looks like this:

There were 13012 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmupThere were 5853 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 15. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceededThere were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems
The largest R-hat is 5.54, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hatBulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-essTail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
> summary(brm.proto5b.sf)
Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.There were 13012 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup Family: poisson 
  Links: mu = log 
Formula: Covid.Deaths ~ offset(log(Ecovid)) + s.pm25 + s.EP_POV + s.whitePE + s.blackPE + s.latinoPE + s.malePE + s.femalePE + s.clrd.SMR + s.diabetes.SMR + s.hta.SMR + s.ischemic.SMR + s.aU25 + s.a2534 + s.a3544 + s.a4559 + s.a6074 + s.aOver75 + (1 | County) 
   Data: brmsDF (Number of observations: 2438) 
Samples: 4 chains, each with iter = 8000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~County(Number of levels: 42) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.21      1.26     0.00     3.32 3.57        4       11

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -1.14      1.26    -2.36     0.77 2.79        5    24000
s.pm25            -0.17      1.03    -1.90     0.78 4.11        4    24000
s.EP_POV           0.56      0.94    -0.70     1.69 5.09        4        5
s.whitePE         -0.10      0.90    -0.91     1.35 2.85        5    24000
s.blackPE          0.22      1.06    -1.11     1.85 4.03        4       11
s.latinoPE         0.31      0.85    -0.94     1.18 3.88        4       11
s.malePE          -0.94      3.17    -9.17     1.75 2.92        5       43
s.femalePE        -0.77      3.25    -9.07     1.75 3.14        5       43
s.clrd.SMR         0.19      0.30    -0.06     0.72 3.63        4       11
s.diabetes.SMR     0.05      0.67    -1.12     0.75 4.28        4       11
s.hta.SMR          0.20      0.17     0.01     0.47 3.66        4    24000
s.ischemic.SMR    -0.33      0.32    -0.87     0.01 3.35        4       11
s.aU25             0.53      1.06    -0.80     1.89 5.31        4        5
s.a2534            0.37      1.20    -1.39     1.61 5.16        4        5
s.a3544            0.30      1.26    -1.52     1.53 3.48        4       11
s.a4559           -0.05      1.29    -1.65     1.86 5.38        4    24000
s.a6074            0.10      0.61    -0.70     0.71 4.32        4    24000
s.aOver75         -0.52      0.90    -1.86     0.50 4.07        4    24000
  • Operating System: Windows 10 x64
  • brms Version: 2.12.0

Any help related to understand the unpredictable running time or the model output is greatly appreciated.

Thanks,
Esteban

Oke simple thing could be to set inits = 0 and see if that stabilizes estimation.

Hi Paul,

After running three times the model I can confirm inits=“0” stabilized the way how the chains executed and the results. I mean all four chains behaving similar and ending with little time difference and 700 divergent transitions that might be reasonable but having RHAT<1.05 with good Bulk_ESS.

I would like to say I am happy with the results but something is weird with two covariates. Specifically s.malePE and s.femalePE who had a very large SE (7.02) and an uncommon credible interval [-13.86, 13.67 ]. Although those two covariates seem to not affect the model. The way how they have similar values makes me worry a little bit

> bprior <- c(prior_string("normal(0,10)", class = "b"),
+             prior_(~cauchy(0,2), class = ~sd, 
+                    group = ~County, coef = ~Intercept))
> 
> brm.proto5b.sf <- brm(Covid.Deaths ~ offset(log(Ecovid)) +s.pm25+s.EP_POV+s.whitePE+s.blackPE+s.latinoPE+s.malePE+s.femalePE+s.clrd.SMR+s.diabetes.SMR+s.hta.SMR +s.ischemic.SMR+ s.aU25+s.a2534+s.a3544+s.a4559+s.a6074+s.aOver75 + (1|County),
+               data = brmsDF,
+               prior = bprior,
+               family = "poisson",chains=4,warmup=2000,inits="0",
+               # family = poisson(),
+               iter = 8000, cores  = 20,seed = 2020,
+               control = list(adapt_delta = 0.95,max_treedepth=15))
Rows containing NAs were excluded from the model.Compiling the C++ model
Start sampling
There were 700 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 15. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceededExamine the pairs() plot to diagnose sampling problems
summary(brm.proto5b.sf)
 Family: poisson 
  Links: mu = log 
Formula: Covid.Deaths ~ offset(log(Ecovid)) + s.pm25 + s.EP_POV + s.whitePE + s.blackPE + s.latinoPE + s.malePE + s.femalePE + s.clrd.SMR + s.diabetes.SMR + s.hta.SMR + s.ischemic.SMR + s.aU25 + s.a2534 + s.a3544 + s.a4559 + s.a6074 + s.aOver75 + (1 | County) 
   Data: brmsDF (Number of observations: 2438) 
Samples: 4 chains, each with iter = 8000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~County (Number of levels: 42) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.96      0.12     0.76     1.24 1.00     3092     5649

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -0.85      0.15    -1.15    -0.55 1.00     2073     3688
s.pm25             0.14      0.01     0.12     0.16 1.00    16277    15923
s.EP_POV           0.02      0.00     0.02     0.03 1.00    25185    18169
s.whitePE         -0.02      0.00    -0.02    -0.01 1.00    25290    18712
s.blackPE          0.00      0.00    -0.00     0.00 1.00    25074    19348
s.latinoPE         0.01      0.00     0.01     0.02 1.00    28391    21988
s.malePE          -0.08      7.02   -13.86    13.67 1.00    12186    14499
s.femalePE         0.01      7.02   -13.77    13.76 1.00    12186    14499
s.clrd.SMR        -0.01      0.00    -0.01    -0.01 1.00    32953    21542
s.diabetes.SMR    -0.00      0.00    -0.00     0.00 1.00    37276    19736
s.hta.SMR          0.00      0.00     0.00     0.00 1.00    32451    21629
s.ischemic.SMR     0.00      0.00    -0.00     0.00 1.00    26624    23182
s.aU25             0.03      0.08    -0.13     0.18 1.00     4875     7443
s.a2534            0.04      0.08    -0.11     0.19 1.00     4881     7367
s.a3544            0.14      0.08    -0.01     0.29 1.00     4981     7416
s.a4559            0.19      0.08     0.04     0.34 1.00     4906     7430
s.a6074           -0.01      0.08    -0.17     0.14 1.00     4896     7513
s.aOver75          0.31      0.08     0.16     0.46 1.00     5003     7560

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Again, any help related to understand that output is greatly appreciated.

Thanks,
Esteban

I don’t know what these covariates mean so I cannot really be helpful. But could it be that you just want to include one of them?

I finally solved. The inits=“0” was perfect to align the chains execution

Thanks,