Bulk_ESS too high!?

I am running a Wiener diffusion model with default priors in brms, using the cmdstanr backend to improve runtime. Everything looks fine except that Bulk_ESS is higher than the number of post-warmup draws (4000); for some parameters it is even slightly higher than the total number of iterations including warmup (8000).

 Family: wiener 
  Links: mu = identity; bs = identity; ndt = identity; bias = identity 
Formula: RT1 | dec(cor) ~ cong + time + Raven_Score_c + concept_sum_c + Group + cong:time + cong:Raven_Score_c + cong:concept_sum_c + cong:Group + time:Raven_Score_c + time:concept_sum_c + time:Group + Raven_Score_c:concept_sum_c + Raven_Score_c:Group + (1 | ID) + (1 | subtopic) 
   Data: Wiener2 (Number of observations: 10577) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~ID (Number of levels: 102) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.05      0.01     0.03     0.06 1.00     1956     2695

~subtopic (Number of levels: 14) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.11      0.03     0.07     0.17 1.00      933     2149

Regression Coefficients:
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                         0.48      0.03     0.41     0.54 1.00      952     1555
congincongruent                  -0.36      0.02    -0.39    -0.33 1.00     4841     3194
timepre                          -0.13      0.02    -0.16    -0.10 1.00     4427     3104
Raven_Score_c                     0.01      0.00     0.01     0.02 1.00     4842     3513
concept_sum_c                     0.00      0.00    -0.00     0.01 1.00     4793     3643
GroupTwin                        -0.05      0.02    -0.10    -0.01 1.00     3472     3072
congincongruent:timepre           0.02      0.02    -0.02     0.05 1.00     5229     3383
congincongruent:Raven_Score_c    -0.00      0.00    -0.01    -0.00 1.00     4930     3374
congincongruent:concept_sum_c    -0.00      0.00    -0.01     0.00 1.00     5750     3327
congincongruent:GroupTwin         0.04      0.02    -0.00     0.08 1.00     5605     3159
timepre:Raven_Score_c            -0.01      0.00    -0.01    -0.00 1.00     5631     3463
timepre:concept_sum_c             0.00      0.00    -0.00     0.01 1.00     6049     3310
timepre:GroupTwin                 0.03      0.02    -0.01     0.07 1.00     5242     2900
Raven_Score_c:concept_sum_c      -0.00      0.00    -0.00     0.00 1.00     4304     3504
Raven_Score_c:GroupTwin          -0.00      0.00    -0.01     0.00 1.00     3892     3549

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
bs       4.71      0.02     4.67     4.76 1.00     8193     3077
ndt      0.79      0.01     0.77     0.81 1.00     7627     3318
bias     0.48      0.00     0.47     0.49 1.00     8290     2913

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Maybe this is not an issue, but I wanted to check with someone with more expertise. Any idea what to investigate?

I ran a posterior predictive check; the estimated curves sit slightly below the curve of the data, which suggests the uninformative default priors still have some influence. However, I am not sure what could affect the Bulk_ESS values.
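
For reference, this is roughly how the ESS values can be recomputed directly from the draws with the posterior package (a sketch; fit stands for the brmsfit object here, and "bs" is assumed to be the name of the boundary-separation variable in the draws):

library(posterior)

draws <- as_draws_array(fit)                       # iterations x chains x variables
summarise_draws(draws, "rhat", "ess_bulk", "ess_tail")

# ESS for a single parameter, e.g. the boundary separation:
ess_bulk(extract_variable_matrix(draws, "bs"))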


It’s a feature, not a bug. :-)

The effective sample size indirectly tells you how large the Monte Carlo variance of your quantity of interest is: it answers the question "How many independent draws would I need to get the same amount of Monte Carlo error?" (although the ESS values have a substantial amount of uncertainty themselves, especially when they are small). But it turns out that independent draws are not the best you can do. If you spread the draws out strategically to avoid clusters, you get a lower Monte Carlo variance than with independent draws, so you would need more independent draws than actual draws to match it.

(see the Posterior Analysis chapter of the Stan documentation for more on the definitions)

This is, for instance, the idea behind quasi-Monte Carlo integration. The NUTS sampler can't quite match a perfect quasi-Monte Carlo estimator, but it is biased towards longer jumps between draws, which tends to push the effective sample size above the number of draws when it is working well.
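
As a toy illustration (this is not the sampler itself; it just uses a negatively autocorrelated AR(1) process as a stand-in for the antithetic behaviour, and assumes the posterior package is available):

library(posterior)
set.seed(1)

n_iter  <- 1000
n_chain <- 4

# 4 "chains" with negative lag-1 autocorrelation (mimicking antithetic draws)
x <- sapply(seq_len(n_chain),
            function(i) as.numeric(arima.sim(list(ar = -0.5), n = n_iter)))

# 4 chains of truly independent draws, for comparison
y <- matrix(rnorm(n_iter * n_chain), nrow = n_iter, ncol = n_chain)

ess_bulk(y)   # roughly the total number of draws (4000)
ess_bulk(x)   # well above 4000: consecutive draws partly cancel each other's error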


Just to complement @aseyboldt's good answer:

Independent draws are not the best you can do for a single, fixed quantity. When successive draws are negatively correlated (which frequently happens with NUTS and is also called "antithetic behaviour"), the ESS for the mean of the distribution, or a similar measure of centrality (which is what Bulk_ESS captures), can exceed the number of draws, but this comes at the cost of a lower ESS for some other quantities (e.g. the variance).

There is no free lunch and you can never improve over independent draws (i.e. increase ESS beyond the number of draws) for all quantities at the same time.

This is also the reason why ESS is always computed for a specific quantity: it can differ quite a bit depending on the quantity of interest.
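
To make that concrete with the same kind of toy example (again just an AR(1) stand-in for antithetic draws, assuming the posterior package):

library(posterior)
set.seed(1)

# one "chain" per column, with negative lag-1 autocorrelation
x <- sapply(1:4, function(i) as.numeric(arima.sim(list(ar = -0.5), n = 1000)))

ess_mean(x)             # above the 4000 draws: negative correlation helps the mean
ess_sd(x)               # below 4000: the squared draws are positively correlated,
                        # so dispersion estimates are hurt by the same dependence
ess_quantile(x, 0.975)  # and tail quantities get yet another ESS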
