Determining whether a group is informative or uninformative

Hi,

I am applying the Hierarchical Partial Pooling for Repeated Binary Trials case study to my own dataset.
As I understand it, the centered parameterization works better for informative data (large N relative to \sigma), while the non-centered parameterization is better for uninformative data (small N relative to \sigma).
My dataset is quite unbalanced in the number of observations per group:
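For reference, the two parameterizations describe the same prior for the group-level log-odds. They differ only in which quantity the sampler explores. Below is a minimal NumPy sketch. It assumes a normal population distribution with mean `mu` and scale `tau` on the logit scale. The names and values are illustrative and are not taken from the case study's code.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau, n_draws = 0.2, 0.5, 200_000  # illustrative population mean and scale (logit scale)

# Centered: the sampler works directly with theta ~ Normal(mu, tau).
theta_centered = rng.normal(mu, tau, size=n_draws)

# Non-centered: the sampler works with eta ~ Normal(0, 1),
# and theta is a deterministic shift-and-scale of eta.
eta = rng.standard_normal(n_draws)
theta_noncentered = mu + tau * eta

# Both give the same prior for theta (up to Monte Carlo error).
print(theta_centered.mean(), theta_noncentered.mean())
print(theta_centered.std(), theta_noncentered.std())
```

In the non-centered form, the prior for `eta` does not depend on `tau`. This removes the funnel-shaped geometry when the data say little about each group. When the data pin down each `theta` tightly, the likelihood couples `eta` and `tau` instead, which is why the centered form tends to do better in that regime.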

opponent_id serves_won serves
11264537 66 125
11264538 119 210
11274839 40 84
11353182 80 125
11400167 13 29
11436659 53 83
11436661 107 201
11454538 16 30
11459087 20 26
11467661 51 88
11522810 17 23
11561252 218 443
11601209 22 34
11697312 353 659
11763362 415 773
12377036 555 961
12403590 19 42
12462353 17 36
12965284 39 50
12965334 91 147
13006940 246 399
13222504 13 29
13227540 22 43
13244072 43 80
13246476 19 25
13247068 269 536
13435347 14 35
13770081 140 267
13777109 107 191
13884861 351 643
13949665 31 57
13984211 353 607
14815169 54 83
15237615 366 687
15404565 21 28
15468161 95 176
15557577 19 27
15716291 44 82
15924987 20 29
16341732 46 85
16393540 88 171
16435080 57 96
16513248 43 72
16519492 235 401
16598546 40 71
17007818 51 101
17082716 27 42
17808804 323 637
18624670 21 37
20102625 114 206
20102747 56 114
20176895 51 88
20237775 166 305
21136359 127 201
21205999 59 133
21206001 54 93
21297009 52 88
21304659 20 32
21523149 19 30
21656423 290 551
21677459 22 42
21706089 20 35
21841231 17 40
21985175 64 121
22099327 501 911
22529357 31 75
22539259 130 299
22610253 37 73
22654195 47 82
22663097 67 136
22664157 101 196
22742003 18 27
22790429 88 177
22957273 105 196
22957275 42 63
23161107 75 135
23191655 66 128
23538323 35 66
24255107 44 78
24284399 19 36
24504419 15 31
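To quantify the imbalance, here is a small sketch that parses rows in the whitespace-separated format above. Only a few rows are copied into the string, so paste the full table to use it. The sketch reports the observed win rate per group and the ratio of the largest to the smallest group size.

```python
import io

import numpy as np

# A few rows copied from the table above; paste the full table in practice.
TABLE = """opponent_id serves_won serves
11522810 17 23
13246476 19 25
12377036 555 961
22099327 501 911
"""

rows = np.loadtxt(io.StringIO(TABLE), skiprows=1, dtype=np.int64)
ids, won, served = rows[:, 0], rows[:, 1], rows[:, 2]
rate = won / served  # observed win rate per group

for gid, n, r in zip(ids, served, rate):
    print(f"{gid}: N={n}, win rate={r:.3f}")
print("largest / smallest N:", served.max() / served.min())
```

The excerpt contains both the smallest (23) and the largest (961) group in the full table, so the ratio is about 42 either way.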

I wanted to try the mixed version on my dataset. The problem is that you have to split the groups into a centered set and a non-centered set in the data block. Based on what is shared in this topic, I conclude that Stan cannot determine this split itself.
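In the mixed version, the data block needs one index array per parameterization. Below is a minimal sketch of building those arrays in Python from a per-group boolean mask. The names `K_cp`, `cp_idx`, `K_ncp`, `ncp_idx` are hypothetical. Rename them to match the data block of your own model.

```python
import numpy as np


def split_indices(centered_mask):
    """Return Stan-style (1-based) index arrays for centered and non-centered groups.

    centered_mask[j] is True if group j (0-based here) should use the centered form.
    """
    mask = np.asarray(centered_mask, dtype=bool)
    all_idx = np.arange(1, mask.size + 1)  # Stan arrays are 1-based
    cp_idx = all_idx[mask]
    ncp_idx = all_idx[~mask]
    # Hypothetical data-block names; rename to match your own model.
    return {
        "K_cp": int(cp_idx.size),
        "cp_idx": cp_idx.tolist(),
        "K_ncp": int(ncp_idx.size),
        "ncp_idx": ncp_idx.tolist(),
    }


data_split = split_indices([False, True, False, True, True])
print(data_split)
# {'K_cp': 3, 'cp_idx': [2, 4, 5], 'K_ncp': 2, 'ncp_idx': [1, 3]}
```

Every group index appears in exactly one of the two arrays. The model then fills each group's parameter either directly (centered) or as a shift-and-scale of a standard-normal variable (non-centered).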

So how would I decide the split myself? I know N, but I do not know \sigma.
Thanks
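One rough heuristic for deciding the split when N is known but \sigma is not works in three steps. First, fit the fully non-centered model and take its estimate of \sigma as a pilot value. Next, compute the approximate standard error of each group's observed log-odds, 1 / sqrt(N p (1 - p)). Finally, compare the two. Groups whose own data are much more precise than the population spread are candidates for centering. This rule, the small smoothing of p, and the threshold `ratio` are assumptions for illustration. None of them is established in this thread.

```python
import numpy as np


def centered_mask(wins, trials, sigma_hat, ratio=1.0):
    """Flag groups whose own data are precise relative to the population scale.

    wins, trials: per-group successes and attempts (e.g. serves_won, serves).
    sigma_hat: an assumed estimate of the population sd on the logit scale,
        e.g. the posterior mean from a pilot non-centered fit.
    ratio: hypothetical tuning knob; center a group when se_j < ratio * sigma_hat.
    """
    wins = np.asarray(wins, dtype=float)
    trials = np.asarray(trials, dtype=float)
    # Smooth the empirical rate slightly so p stays strictly inside (0, 1).
    p = (wins + 0.5) / (trials + 1.0)
    # Normal-approximation standard error of the logit of the observed rate.
    se = 1.0 / np.sqrt(trials * p * (1.0 - p))
    return se < ratio * sigma_hat, se


# Four rows from the table; sigma_hat = 0.2 is purely illustrative.
mask, se = centered_mask([17, 19, 555, 501], [23, 25, 961, 911], sigma_hat=0.2)
print(np.round(se, 3), mask)
```

With these rows, the small groups (N = 23, 25) have standard errors near 0.47 and stay non-centered. The large groups (N = 961, 911) have standard errors near 0.07 and would be centered. A different pilot estimate of \sigma moves that boundary.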


I would start by non-centering everything. I would guess that none of these rows is so strongly informed that the non-centered version will fail. If that doesn't work (i.e., it yields divergences that cannot be suppressed by a modest increase in adapt_delta), I would try centering everything. If that still doesn't work, I would start fiddling around with centering some groups and not others. However, I think @Niko can do far cooler and more impressive things to choose the degree of centering optimally on the fly.
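The escalation order suggested here can be written as a small driver loop. This is a sketch only. `run_fit` is a hypothetical callable that you would implement with your Stan interface of choice. It should return the number of divergent transitions for a given strategy and `adapt_delta`. The strategy labels and the `adapt_delta` values (starting from Stan's default of 0.8) are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple


def choose_parameterization(
    run_fit: Callable[[str, float], int],
    strategies: Sequence[str] = ("non-centered", "centered", "mixed"),
    adapt_deltas: Sequence[float] = (0.8, 0.9, 0.95),
) -> Optional[Tuple[str, float]]:
    """Try each parameterization in order, raising adapt_delta modestly before moving on.

    run_fit(strategy, adapt_delta) is a hypothetical user-supplied function that
    fits the model and returns the number of divergent transitions.
    Returns the first (strategy, adapt_delta) pair with zero divergences, or None.
    """
    for strategy in strategies:
        for delta in adapt_deltas:
            if run_fit(strategy, delta) == 0:
                return strategy, delta
    return None


def fake_fit(strategy: str, adapt_delta: float) -> int:
    # Stand-in for a real fit: pretend only the centered model at 0.9+ is clean.
    return 0 if strategy == "centered" and adapt_delta >= 0.9 else 5


print(choose_parameterization(fake_fit))  # ('centered', 0.9)
```

The "mixed" step is where a per-group split between centered and non-centered groups would come in. A real `run_fit` should also check other diagnostics, not just divergences.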


Thanks for your answer! I will follow your steps.
@Niko, I'd be happy to hear your tips!

That’s correct. Although it would be quite easy to implement, I doubt it will come to Stan.

I do have a working Julia prototype (which can interface with your Stan models) that should be public in a few weeks or months. For now, if you share your data and Stan model, I can probably just tell you which parametrization to choose for each parameter.
