Divergent transitions due to group level standard deviations

Hi, I am currently measuring the effect of pollination on a number of variables associated with plant yields in brms(version 2.10.0). The model consists of total plant weight (log transformed) a single population level effect (Po- (n=12) or Po+ (n=32)). I have a nested group level structure (Block (3 levels)/Colony (27 levels).
Po+ (Min. 1st Qu. Median Mean 3rd Qu. Max. )
0.000 0.687 3.557 7.586 10.895 40.619
Po- Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 1.090 1.415 12.721
fins.csv (1.3 KB)

When I run the following model i currently get divergent transitions warnings.

norm_c <- brm(log(sum_pod_weight) ~ 1+Pol+ (1|Block/Colony1), 
                data = fins,
                  family = "normal",prior = c(
                  set_prior ("normal(0,100)",class="Intercept"),
                   set_prior("normal(0,100)",class="b")),
                chains = 4,
                iter = 2000,control = list(adapt_delta =0.99 ,max_treedepth=13) ,
                warmup = 1000)

when i check the output i can see that the sd(Intercept) for the block term is large

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(sum_pod_weight) ~ 1 + Pol + (1 | Block/Colony1) 
   Data: fins (Number of observations: 44) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~Block (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.22      2.46     0.33     9.94 1.00      759      493

~Block:Colony1 (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.34      0.23     0.02     0.86 1.00     1033     1485

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.43      2.06    -3.18     4.01 1.00      786      491
PolPoP        1.03      0.39     0.25     1.80 1.00     3405     2751

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.99      0.13     0.76     1.27 1.00     2104     2149

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 22 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Unfortunately, i think the size of the sd for the block effect is leading to divergent transmissions and to unrealistic values for my parameters being sampled, particularly as the variable is on the log scale. This is evident for example when i run:

> hypothesis (norm_c, "exp(Intercept+PolPoP)=exp(Intercept)")
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (exp(Intercept+Po... = 0 17691.03  435325.3     0.02   103.15         NA        NA
  Star
1    *

Further when i drop the block group level effect i get no divergent transitions and estimates far closer to what i would expect

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(sum_pod_weight) ~ 1 + Pol + (1 | Colony1) 
   Data: fins (Number of observations: 44) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~Colony1 (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.67      0.27     0.10     1.19 1.00      789      830

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.99      0.36     0.28     1.70 1.00     3221     2354
PolPoP        0.62      0.46    -0.26     1.52 1.00     2571     2466

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.01      0.16     0.74     1.36 1.00     1366     2692

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
> hypothesis (norm_c, "exp(Intercept+PolPoP)=exp(Intercept)")
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (exp(Intercept+Po... = 0      2.3       1.8    -1.15      5.9         NA        NA
  Star
1     
---

I’ve have tried sqrt transformation and skew normal, gamma, weibull distributions but this increases the number of divergent transmissions and aren’t particularly good fit to the data. My first thought is to use a much tighter prior on the intercept and b parameters, but i feel this is potentially masking a larger issue and does not eliminate the divergent transitions warnings. I’m currently using the default prior in brms on the sd of the group level effects, so was wondering whether there may be other priors to try and constrain the sd of the group level effect of block. I also wonder as Block only has 3 levels whether this may be causing issues. While dropping block would solve the issue i think it is important it is included as it accounts for variation in the length of treatments. Any suggestions would be greatly welcomed. Data is attached. Thanks.

Since nobody else answered, I will give it a try, despite not being really an expert on this.

I think the problem is that your data cannot identify the Block effects. Note that in your data, Colony1 uniquely identifies Block, but for Block effects to be identifiable when Colony1 is included, Colony1 has to cut across Blocks. Does that make sense?

Also your priors are IMHO waaaay too wide for log scale. Since the log of your data is between -0.85 and 1.6, it is IMHO completely justified to put normal(0,1) on both Intercept, b and sd - and that still may be too wide. Your current priors put reasonable probability for sum_pod_weight of exp(200) > 10^85.

2 Likes

Thank you for the response its been really helpful. I had suspected the same thing and i am going to reconsider the random effects structure. When i include just the Colony1 effect it becomes far less sensitive to priors which also alerted me to the issue surrounding the Block random effect.

1 Like