I’m wondering if anyone has tips on how to improve model sampling so that all parameters are sampled with similar efficiency (i.e. similar ESS values).
> summary(model2)
Family: bernoulli
Links: mu = identity
Formula: index ~ inv_logit(int + slope * Phi)
int ~ 0 + category
slope ~ 0 + category
Phi ~ 0 + g_id
Data: sdata_2c (Number of observations: 80502)
Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup draws = 8000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
int_category1 0.26 0.03 0.20 0.31 1.00 9329 5800
int_category2 0.30 0.07 0.16 0.42 1.00 2869 3545
int_category3 -0.05 0.04 -0.11 0.02 1.00 5775 5344
int_category4 -0.22 0.04 -0.30 -0.15 1.00 5331 4111
int_category5 0.31 0.04 0.23 0.40 1.00 3657 3771
int_category6 -0.94 0.05 -1.03 -0.85 1.00 3466 4421
int_category7 0.33 0.05 0.24 0.42 1.00 3388 3927
int_category8 -0.11 0.03 -0.18 -0.04 1.00 7077 5611
int_category9 -0.10 0.05 -0.20 -0.01 1.00 3132 4040
int_category10 -0.41 0.03 -0.48 -0.35 1.00 6692 5429
slope_category1 -0.13 0.02 -0.17 -0.10 1.02 228 573
slope_category2 2.43 0.32 1.85 3.11 1.02 220 477
slope_category3 0.46 0.06 0.35 0.60 1.02 206 460
slope_category4 0.62 0.08 0.47 0.79 1.02 205 483
slope_category5 -1.02 0.14 -1.30 -0.78 1.02 202 465
slope_category6 1.05 0.14 0.80 1.34 1.02 198 439
slope_category7 -1.23 0.16 -1.56 -0.94 1.02 199 424
slope_category8 -0.40 0.05 -0.51 -0.30 1.02 211 453
slope_category9 1.33 0.18 1.02 1.70 1.02 207 478
slope_category10 0.35 0.05 0.26 0.45 1.02 209 451
Phi_g_id1 0.12 0.08 0.01 0.30 1.00 4512 3352
Phi_g_id2 1.67 0.25 1.23 2.22 1.02 282 666
Phi_g_id3 0.23 0.11 0.04 0.45 1.00 2096 2230
Phi_g_id4 1.60 0.25 1.15 2.16 1.02 288 766
Phi_g_id5 9.40 1.77 6.39 13.30 1.01 410 1022
Phi_g_id6 4.10 0.63 3.04 5.47 1.02 264 707
Phi_g_id7 3.30 0.50 2.42 4.38 1.02 273 727
Phi_g_id8 1.55 0.25 1.12 2.09 1.01 317 874
[ reached getOption("max.print") -- omitted 192 rows ]
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
>
As you can see, the int_categoryX parameters are sampled much more efficiently (much higher ESS) than the other two groups of parameters, slope_categoryX and Phi_g_idX. Are there settings or rescalings I can use to even the sampling out and thus, presumably, make the model run more efficiently?
Intuitively I take this to mean we have a lot more information on the int_categoryX parameters, but ideally I would want that reflected in my posterior intervals, not in my ESS.
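One thing that may be relevant here: in the formula above, slope and Phi only enter through their product, so the likelihood on its own cannot tell slope * Phi apart from (k * slope) * (Phi / k). That might be part of why the slope and Phi parameters show similarly low ESS. A minimal base-R sketch of that trade-off, using made-up values (int, slope, Phi and k below are illustrative assumptions, not estimates from model2):

```r
# Illustrative values only (hypothetical, not taken from model2)
int   <- 0.3
slope <- 1.2
Phi   <- 2.5
k     <- 4    # arbitrary positive rescaling factor

# Success probability under the model formula: inv_logit(int + slope * Phi)
# plogis() is base R's inverse-logit function
p_original <- plogis(int + slope * Phi)

# Scale slope up and Phi down by the same factor k
p_rescaled <- plogis(int + (k * slope) * (Phi / k))

# Compare the two probabilities, allowing for floating-point error
same_probability <- isTRUE(all.equal(p_original, p_rescaled))
print(same_probability)
```

If the two probabilities match, then only the priors (or constraints) on slope and Phi pin down their individual scales, which is the kind of structure a rescaling or reparameterisation would need to address.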
For reference, here are some slices of the data:
> sdata_2c[c(1:5,1000:1005, 10000:10005), ]
# A tibble: 17 × 5
g_id category phi pr index
<fct> <fct> <dbl> <dbl> <fct>
1 1 9 0.0185 0.470 1
2 1 2 0.0185 0.596 0
3 1 4 0.0185 0.446 0
4 1 8 0.0185 0.464 1
5 1 1 0.0185 0.572 0
6 3 9 0.345 0.561 1
7 3 3 0.345 0.518 0
8 3 9 0.345 0.561 0
9 3 7 0.345 0.502 1
10 3 3 0.345 0.518 0
11 3 1 0.345 0.563 1
12 25 9 0.113 0.496 1
13 25 5 0.113 0.554 1
14 25 5 0.113 0.554 0
15 25 9 0.113 0.496 0
16 25 3 0.113 0.497 1
17 25 1 0.113 0.569 1