Convergence failure (maybe) in brms

Benambridge · May 9, 2018, 6:39pm

Hi everyone

I’m getting a convergence failure message:

Warning message:
The model has not converged (some Rhats are > 1.1). Do not analyse the results!
We recommend running more iterations and/or setting stronger priors.

But when I look at the Rhats, all are 1.0 and the effective sample size also looks fine. Can anyone help?

Thanks
Ben

fit4a <- brm(formula = DV ~ (1 + Stype*PCA1 + Stype*Total_Active_Freq + Stype*Total_Passive_Freq | Name) + (1 + Stype | verb) + Stype*PCA1 + Stype*Total_Active_Freq + Stype*Total_Passive_Freq,
data = Kids,
family = bernoulli(link = "logit"),
prior = set_prior("normal(0,0.72)", class = "b"),
warmup = 2000, iter = 10000, chains = 1, cores = 4, save_all_pars = TRUE, control = list(adapt_delta = 0.99)) # All sentences

fit4a
Family: bernoulli
Links: mu = logit
Formula: DV ~ (1 + Stype * PCA1 + Stype * Total_Active_Freq + Stype * Total_Passive_Freq | Name) + (1 + Stype | verb) + Stype * PCA1 + Stype * Total_Active_Freq + Stype * Total_Passive_Freq
Data: Kids (Number of observations: 2160)
Samples: 1 chains, each with iter = 10000; warmup = 2000; thin = 1;
total post-warmup samples = 8000
ICs: LOO = NA; WAIC = NA; R2 = NA

Group-Level Effects:
~Name (Number of levels: 60)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 0.54 0.14 0.27 0.81 2963 1.00
sd(StypePASS) 0.95 0.17 0.63 1.30 2126 1.00
sd(PCA1) 0.23 0.10 0.03 0.44 2939 1.00
sd(Total_Active_Freq) 0.14 0.10 0.01 0.38 3196 1.00
sd(Total_Passive_Freq) 0.32 0.22 0.01 0.82 2720 1.00
sd(StypePASS:PCA1) 0.22 0.14 0.01 0.52 2597 1.00
sd(StypePASS:Total_Active_Freq) 0.12 0.10 0.01 0.37 4058 1.00
sd(StypePASS:Total_Passive_Freq) 0.25 0.20 0.01 0.74 4243 1.00
cor(Intercept,StypePASS) -0.43 0.20 -0.74 0.03 2012 1.00
cor(Intercept,PCA1) 0.31 0.28 -0.32 0.76 8000 1.00
cor(StypePASS,PCA1) 0.10 0.28 -0.46 0.61 8000 1.00
cor(Intercept,Total_Active_Freq) -0.05 0.32 -0.64 0.58 8000 1.00
cor(StypePASS,Total_Active_Freq) 0.07 0.30 -0.54 0.63 8000 1.00
cor(PCA1,Total_Active_Freq) 0.16 0.33 -0.52 0.73 8000 1.00
cor(Intercept,Total_Passive_Freq) 0.00 0.31 -0.59 0.62 8000 1.00
cor(StypePASS,Total_Passive_Freq) 0.10 0.31 -0.53 0.65 8000 1.00
cor(PCA1,Total_Passive_Freq) 0.20 0.33 -0.49 0.75 8000 1.00
cor(Total_Active_Freq,Total_Passive_Freq) -0.08 0.34 -0.70 0.58 8000 1.00
cor(Intercept,StypePASS:PCA1) 0.04 0.31 -0.57 0.63 8000 1.00
cor(StypePASS,StypePASS:PCA1) 0.12 0.30 -0.48 0.67 8000 1.00
cor(PCA1,StypePASS:PCA1) -0.08 0.33 -0.68 0.58 8000 1.00
cor(Total_Active_Freq,StypePASS:PCA1) 0.01 0.33 -0.61 0.63 6503 1.00
cor(Total_Passive_Freq,StypePASS:PCA1) 0.03 0.33 -0.60 0.65 6116 1.00
cor(Intercept,StypePASS:Total_Active_Freq) -0.02 0.32 -0.62 0.58 8000 1.00
cor(StypePASS,StypePASS:Total_Active_Freq) -0.07 0.32 -0.66 0.56 8000 1.00
cor(PCA1,StypePASS:Total_Active_Freq) -0.07 0.33 -0.67 0.58 8000 1.00
cor(Total_Active_Freq,StypePASS:Total_Active_Freq) -0.11 0.34 -0.71 0.56 8000 1.00
cor(Total_Passive_Freq,StypePASS:Total_Active_Freq) -0.11 0.34 -0.72 0.57 8000 1.00
cor(StypePASS:PCA1,StypePASS:Total_Active_Freq) 0.01 0.33 -0.63 0.64 8000 1.00
cor(Intercept,StypePASS:Total_Passive_Freq) 0.01 0.33 -0.62 0.63 8000 1.00
cor(StypePASS,StypePASS:Total_Passive_Freq) -0.04 0.33 -0.64 0.59 8000 1.00
cor(PCA1,StypePASS:Total_Passive_Freq) -0.04 0.33 -0.66 0.61 8000 1.00
cor(Total_Active_Freq,StypePASS:Total_Passive_Freq) -0.09 0.34 -0.71 0.59 8000 1.00
cor(Total_Passive_Freq,StypePASS:Total_Passive_Freq) -0.10 0.35 -0.72 0.58 8000 1.00
cor(StypePASS:PCA1,StypePASS:Total_Passive_Freq) 0.02 0.33 -0.62 0.65 8000 1.00
cor(StypePASS:Total_Active_Freq,StypePASS:Total_Passive_Freq) -0.08 0.35 -0.69 0.60 5588 1.00

~verb (Number of levels: 72)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 0.59 0.13 0.35 0.85 3550 1.00
sd(StypePASS) 0.84 0.17 0.53 1.18 3241 1.00
cor(Intercept,StypePASS) -0.95 0.06 -1.00 -0.81 3638 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 1.41 0.14 1.15 1.70 8000 1.00
StypePASS -1.37 0.20 -1.76 -0.98 8000 1.00
PCA1 0.32 0.11 0.09 0.54 8000 1.00
Total_Active_Freq 0.02 0.24 -0.44 0.50 8000 1.00
Total_Passive_Freq 0.16 0.51 -0.85 1.16 8000 1.00
StypePASS:PCA1 -0.05 0.15 -0.35 0.24 8000 1.00
StypePASS:Total_Active_Freq -0.05 0.28 -0.58 0.49 8000 1.00
StypePASS:Total_Passive_Freq -0.06 0.58 -1.20 1.08 8000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

bgoodri · May 9, 2018, 8:11pm

It is probably the Rhat of the lp__

Benambridge · May 9, 2018, 9:07pm

Thanks! So what’s the solution - just running it for longer?

Thanks
Ben

bgoodri · May 9, 2018, 9:10pm

I would run more chains before running the existing chains longer, but I don’t pay much attention to Rhat anyways, especially for lp__.

avehtari · May 9, 2018, 11:20pm

Yes, run more chains first. With 1 chain, Rhat doesn't work well. Run at least 4 chains.

You should pay attention to Rhat, though. Also for lp__
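
To illustrate why a single chain gives Rhat little to work with, here is a minimal base R sketch. It is a simplified split-Rhat written by hand, not Stan's exact implementation (which also rank-normalizes the draws), and the "chains" are simulated, made-up data. The function name `split_rhat` is hypothetical.

```r
# Simplified split-Rhat for one parameter (sketch, not Stan's exact code)
split_rhat <- function(draws) {
  # draws: iterations x chains matrix of post-warmup draws
  draws <- as.matrix(draws)
  half <- floor(nrow(draws) / 2)
  # Split every chain in half so within-chain drift also shows up
  halves <- cbind(draws[1:half, , drop = FALSE],
                  draws[(nrow(draws) - half + 1):nrow(draws), , drop = FALSE])
  n <- nrow(halves)
  W <- mean(apply(halves, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(halves))       # between-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# Four simulated "chains"; the fourth sits in a different region
chains <- sapply(c(0, 0, 0, 3), function(mu) rnorm(1000, mean = mu))

rhat_one_chain <- split_rhat(chains[, 4, drop = FALSE])  # fourth chain alone
rhat_all_chains <- split_rhat(chains)                    # all four together
print(c(one_chain = rhat_one_chain, four_chains = rhat_all_chains))
```

The fourth chain looks fine on its own because it is internally stable. Only the comparison against the other chains reveals that it is exploring a different region.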

paul.buerkner · May 10, 2018, 10:20am

You may want to look at rhat(fit4a) to see which parameters have high Rhat.
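
A small helper can make that output easier to scan. This is only a sketch: it assumes `rhat(fit4a)` returns a named numeric vector, the function name `flag_rhat` is hypothetical, and the example values are made up rather than taken from the model above.

```r
# Split a named vector of Rhat values into "too high" and "undefined"
flag_rhat <- function(rhats, threshold = 1.1) {
  list(
    high = rhats[!is.na(rhats) & rhats > threshold],  # above the threshold
    undefined = names(rhats)[is.na(rhats)]            # NaN or NA entries
  )
}

# Made-up values for illustration only
example_rhats <- c(b_Intercept = 1.00, sd_Name__Intercept = 1.00,
                   lp__ = 1.15, "L_1[1,2]" = NaN)
flagged <- flag_rhat(example_rhats)
print(flagged)
```

With a fitted model, the call would be `flag_rhat(rhat(fit4a))`.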

betanalpha · May 10, 2018, 1:50pm

To bring Ben and Aki’s comments into context – convergence diagnostics like \hat{R} consider the marginal behavior of your chains, and that can be very different for different variables. Some variables converge quickly, and their expectation values can be estimated quickly, while some converge more slowly and require longer running times.

If \hat{R} is close enough to one for all of your variables except for lp__ then the expectation value estimates for those variables are probably okay, especially if none of the other diagnostics are indicating problems.
But it doesn't mean that the estimates for expectations of any function of those variables will be okay! Variables can correlate with each other in ways that make the convergence of a function worse and hence the estimate untrustworthy.

lp__ tends to be extremely sensitive to the autocorrelation of the Markov chain and hence provides a reasonable bound on how well any function of the variables will converge. In other words, ensuring that the diagnostics for lp__ are good gives you the strongest evidence that your fit is okay. If you focus only on a few variables and carefully check the diagnostics for those variables, though, you may be able to ignore lp__ in that very specific context.

Benambridge · May 10, 2018, 5:01pm

Thanks so much everyone! OK, so I reran it with 4 chains, 10,000 iterations and adapt_delta = 0.99, but I still get the same warning. Looking at the Rhats now, none are above 1.001 (many are 0.999), but the following are all NaN, which presumably suggests some problem with the model. Does anyone have any idea what that might be?

Thanks
Ben

L_1[1,1] L_1[1,2] L_1[1,3] L_1[1,4] L_1[1,5] L_1[1,6] L_1[1,7] L_1[1,8]
L_1[2,3] L_1[2,4] L_1[2,5] L_1[2,6] L_1[2,7] L_1[2,8]
L_1[3,4] L_1[3,5] L_1[3,6] L_1[3,7] L_1[3,8]
L_1[4,5] L_1[4,6] L_1[4,7] L_1[4,8]
L_1[5,6] L_1[5,7] L_1[5,8]
L_1[6,7] L_1[6,8]
L_1[7,8]
L_2[1,1] L_2[1,2]

betanalpha · May 10, 2018, 5:02pm

The upper triangular components of a Cholesky factor are constant: they are fixed at zero, and for the Cholesky factor of a correlation matrix the [1,1] element is fixed at one. A constant has zero empirical variance, which causes the \hat{R} calculation to explode into a NaN. You can safely ignore those NaNs.
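
The effect can be reproduced in a few lines of base R. This sketch uses a simplified between/within-chain Rhat (no chain splitting, not Stan's exact implementation) and a made-up 3 x 3 correlation matrix. The function name `basic_rhat` is hypothetical.

```r
# Simplified Rhat: compares between-chain and within-chain variance
basic_rhat <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))  # mean within-chain variance
  B <- n * var(colMeans(draws))    # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

# Cholesky factor of a made-up correlation matrix
R <- matrix(c(1.0, 0.3, -0.2,
              0.3, 1.0, 0.5,
              -0.2, 0.5, 1.0), nrow = 3)
L <- t(chol(R))  # chol() returns the upper factor; transpose for the lower one
print(L)         # L[1, 1] is 1 and every entry above the diagonal is 0

# A fixed element takes the same value in every draw of every chain
constant_draws <- matrix(L[1, 2], nrow = 1000, ncol = 4)
rhat_constant <- basic_rhat(constant_draws)
print(rhat_constant)  # both variances are 0, so the ratio is 0 / 0
```

Both variance terms are exactly zero for a constant, so the ratio inside the square root is 0 / 0 and the result is NaN regardless of how well the sampler behaved.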
