Few effective samples for a nested three-level logistic model with a random intercept

Please also provide the following information in addition to your question:

  • Operating System: Win10
  • brms Version: 2.8.0

Hello! I have built a three-level binary logistic model where a small group is nested within a big group. I have six population-level variables (a1–a5 are categorical and a6 is numerical) and one group-level variable (b1, a variable at the big-group level). As a quick, wide-coverage choice, I adopted a Cauchy(0, 2.5) prior for “sd”. However, I ran into a problem with my model results: there are very few effective samples for the “sd” estimates at both group levels, and also few effective samples for the “Intercept” and the group-level variable “b1”. Could anyone help me find the problem with my model? Are there any suggestions for dealing with it? Many thanks, Diva
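Roughly, the model was set up like this (a sketch reconstructed from the formula, prior, and sampler settings shown in the output below; the exact arguments may have differed):

library(brms)

# Sketch of the model call; formula, data name, family, prior, and
# sampler settings are taken from the summary output below.
fit <- brm(
  Rf ~ a1 + a2 + a3 + a4 + a5 + a6 + b1 + (1 | BigGroup/SmallGroup),
  data = orgin,
  family = bernoulli(),
  prior = set_prior("cauchy(0, 2.5)", class = "sd"),
  chains = 2, iter = 4000, warmup = 2000
)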

The output is shown below:

Family: bernoulli
Links: mu = logit
Formula: Rf ~ a1 + a2 + a3 + a4 + a5 + a6 + b1 + (1 | BigGroup/SmallGroup)
Data: orgin (Number of observations: 5314)
Samples: 2 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup samples = 4000

Group-Level Effects:
~BigGroup (Number of levels: 14)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 1.00 0.34 0.54 1.79 578 1.00

~BigGroup:SmallGroup (Number of levels: 115)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 1.24 0.11 1.03 1.47 859 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -0.91 0.33 -1.57 -0.27 1272 1.00
a1F -0.09 0.07 -0.23 0.06 3535 1.00
a2W 0.31 0.11 0.10 0.52 2588 1.00
a2G 0.78 0.17 0.46 1.11 2646 1.00
a2S 0.45 0.19 0.09 0.80 2701 1.00
a3dd -0.16 0.10 -0.35 0.04 3081 1.00
a3nn -0.06 0.13 -0.31 0.18 2769 1.00
a3tm -0.13 0.11 -0.36 0.10 2998 1.00
a3ot -0.29 0.16 -0.61 0.01 3326 1.00
a4low -0.04 0.10 -0.23 0.15 3390 1.00
a4mid 0.05 0.12 -0.18 0.29 3150 1.00
a5HH 0.49 0.09 0.32 0.66 3228 1.00
a6 -0.06 0.07 -0.20 0.08 2324 1.00
b1 -0.28 0.34 -0.84 0.41 784 1.00

To me those look like reasonably healthy diagnostics, and I wouldn’t worry much as long as you get at least a few hundred effective samples. Note that 578 effective samples is plenty to determine the overall tendency and likely still enough for 90% credible intervals. Wider intervals (e.g. 95%) might be slightly unstable, but if you need good resolution for those, just run more iterations (via the iter argument), as in the sketch below.
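For example, assuming your fitted model object is called fit (name assumed), you can refit with more iterations via update():

# Refit with more post-warmup draws; update() re-runs the sampler
# with the new settings without changing the model itself.
fit_more <- update(fit, iter = 8000, warmup = 2000)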

That is actually quite a wide prior for logistic regression. Quickly simulating with

N_samples <- 1e4
# Draw an sd from a half-Cauchy(0, 2.5), draw an effect from N(0, sd),
# and estimate P(|effect| > log(100)), i.e. an odds ratio > 100 or < 0.01
mean(abs(rnorm(N_samples, 0, abs(rcauchy(N_samples, 0, 2.5)))) > log(100))

shows that this prior assumes almost a 25% probability of an effect with an odds ratio > 100 or < 0.01. This seems unrealistic to me for most contexts. But please make sure my reasoning and code make sense to you - I definitely do make mistakes :-)

A N(0, 2) prior puts the a priori probability of such large effects just below 5%, which is still quite wide but not unreasonably so.
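You can check this with the same simulation (brms restricts “sd” parameters to be positive, so a normal(0, 2) prior on “sd” acts as a half-normal, mimicked here with abs(rnorm(...))):

N_samples <- 1e4
# Same check as above, but with sd ~ half-normal(0, 2)
mean(abs(rnorm(N_samples, 0, abs(rnorm(N_samples, 0, 2)))) > log(100))
# comes out just below 0.05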

Hope that helps.

Thanks for your reply. Another question: how can I extract 90% credible intervals instead of the default 95% credible intervals? Thanks in advance.

Best,

Diva

Most functions, such as summary, posterior_interval, and many more, accept a prob argument, which you can set to prob = 0.9 for a 90% posterior interval. Check the documentation of the functions you use to summarize your results.
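For example, assuming your fitted model object is called fit:

summary(fit, prob = 0.9)              # 90% intervals in the summary table
posterior_interval(fit, prob = 0.9)   # 90% intervals for all parameters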

Thanks for your answer. I have a new problem, this time with an interaction. I constructed a formula like bf(Rf ~ a1 + a2 + a3 + a4 + a5 + a6 * b1 + (1 | BigGroup/SmallGroup)). But when I run the model in brms, errors appear such as “Error in unserialize(socklist[[n]]): error reading from connection” and “Error in serialize(data, node$con, xdr = FALSE): error writing to connection”. Could you please help me check what’s wrong with my model? Is the interaction correctly symbolized by “a6 * b1” in the formula?

Do you use the future package?

No, just brms & rstan. Is there any documentation explaining how to construct a formula with interaction terms?

This is almost certainly not (directly) about the interaction term. These errors appear when one of the parallel Stan worker processes crashes without producing a proper error message - this is typical of crashes within the C++ code. It might even be a bug in Stan, but the cause is often more innocuous (e.g. running out of memory). To troubleshoot, you should definitely run with chains = 1 and possibly also verbose = TRUE.
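A minimal sketch of such a debugging run (formula and names taken from the posts above; cores = 1 is added here so sampling stays in the main R process, where a crash surfaces as a real error instead of a broken worker connection):

# Note: a6 * b1 is standard R formula shorthand for a6 + b1 + a6:b1,
# i.e. both main effects plus their interaction - the syntax itself is fine.
fit_debug <- brm(
  Rf ~ a1 + a2 + a3 + a4 + a5 + a6 * b1 + (1 | BigGroup/SmallGroup),
  data = orgin, family = bernoulli(),
  chains = 1, cores = 1, verbose = TRUE
)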

My best bet would be memory - how many levels do you have for the a6 and b1 terms? If that number is very large, you could indeed run out of memory.