Hi, I am currently measuring the effect of pollination on a number of variables associated with plant yields in brms(version 2.10.0). The model consists of total plant weight (log transformed) a single population level effect (Po- (n=12) or Po+ (n=32)). I have a nested group level structure (Block (3 levels)/Colony (27 levels).

Po+ (Min. 1st Qu. Median Mean 3rd Qu. Max. )

0.000 0.687 3.557 7.586 10.895 40.619

Po- Min. 1st Qu. Median Mean 3rd Qu. Max.

0.000 0.000 0.000 1.090 1.415 12.721

fins.csv (1.3 KB)

When I run the following model i currently get divergent transitions warnings.

```
norm_c <- brm(log(sum_pod_weight) ~ 1+Pol+ (1|Block/Colony1),
data = fins,
family = "normal",prior = c(
set_prior ("normal(0,100)",class="Intercept"),
set_prior("normal(0,100)",class="b")),
chains = 4,
iter = 2000,control = list(adapt_delta =0.99 ,max_treedepth=13) ,
warmup = 1000)
```

when i check the output i can see that the sd(Intercept) for the block term is large

```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log(sum_pod_weight) ~ 1 + Pol + (1 | Block/Colony1)
Data: fins (Number of observations: 44)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~Block (Number of levels: 3)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.22 2.46 0.33 9.94 1.00 759 493
~Block:Colony1 (Number of levels: 27)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.34 0.23 0.02 0.86 1.00 1033 1485
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.43 2.06 -3.18 4.01 1.00 786 491
PolPoP 1.03 0.39 0.25 1.80 1.00 3405 2751
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.99 0.13 0.76 1.27 1.00 2104 2149
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 22 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```

Unfortunately, i think the size of the sd for the block effect is leading to divergent transmissions and to unrealistic values for my parameters being sampled, particularly as the variable is on the log scale. This is evident for example when i run:

```
> hypothesis (norm_c, "exp(Intercept+PolPoP)=exp(Intercept)")
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (exp(Intercept+Po... = 0 17691.03 435325.3 0.02 103.15 NA NA
Star
1 *
```

Further when i drop the block group level effect i get no divergent transitions and estimates far closer to what i would expect

```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log(sum_pod_weight) ~ 1 + Pol + (1 | Colony1)
Data: fins (Number of observations: 44)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~Colony1 (Number of levels: 27)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.67 0.27 0.10 1.19 1.00 789 830
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.99 0.36 0.28 1.70 1.00 3221 2354
PolPoP 0.62 0.46 -0.26 1.52 1.00 2571 2466
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.01 0.16 0.74 1.36 1.00 1366 2692
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
> hypothesis (norm_c, "exp(Intercept+PolPoP)=exp(Intercept)")
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (exp(Intercept+Po... = 0 2.3 1.8 -1.15 5.9 NA NA
Star
1
---
```

I’ve have tried sqrt transformation and skew normal, gamma, weibull distributions but this increases the number of divergent transmissions and aren’t particularly good fit to the data. My first thought is to use a much tighter prior on the intercept and b parameters, but i feel this is potentially masking a larger issue and does not eliminate the divergent transitions warnings. I’m currently using the default prior in brms on the sd of the group level effects, so was wondering whether there may be other priors to try and constrain the sd of the group level effect of block. I also wonder as Block only has 3 levels whether this may be causing issues. While dropping block would solve the issue i think it is important it is included as it accounts for variation in the length of treatments. Any suggestions would be greatly welcomed. Data is attached. Thanks.