Hi,
I am applying the Hierarchical Partial Pooling for Repeated Binary Trials case study to my own dataset.
As I understand it, the centered parameterization works better for informative data (large N relative to \sigma), while the non-centered parameterization works better for uninformative data (small N relative to \sigma).
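To make clear which two parameterizations I mean, here is how I would write them for a binomial model on the log-odds scale. The notation is my own assumption: $k$ indexes the groups, $N_k$ is the number of trials, $y_k$ the number of successes, and $\tilde{\alpha}_k$ is the standardized group effect.

Centered:

$$
\alpha_k \sim \mathrm{Normal}(\mu, \sigma), \qquad
y_k \sim \mathrm{Binomial}\!\left(N_k, \mathrm{logit}^{-1}(\alpha_k)\right)
$$

Non-centered:

$$
\tilde{\alpha}_k \sim \mathrm{Normal}(0, 1), \qquad
\alpha_k = \mu + \sigma\,\tilde{\alpha}_k, \qquad
y_k \sim \mathrm{Binomial}\!\left(N_k, \mathrm{logit}^{-1}(\alpha_k)\right)
$$

Both imply the same distribution for $\alpha_k$; they differ only in which quantity is sampled directly.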
My dataset is quite unbalanced in the number of observations per group:
opponent_id | serves_won | serves |
---|---|---|
11264537 | 66 | 125 |
11264538 | 119 | 210 |
11274839 | 40 | 84 |
11353182 | 80 | 125 |
11400167 | 13 | 29 |
11436659 | 53 | 83 |
11436661 | 107 | 201 |
11454538 | 16 | 30 |
11459087 | 20 | 26 |
11467661 | 51 | 88 |
11522810 | 17 | 23 |
11561252 | 218 | 443 |
11601209 | 22 | 34 |
11697312 | 353 | 659 |
11763362 | 415 | 773 |
12377036 | 555 | 961 |
12403590 | 19 | 42 |
12462353 | 17 | 36 |
12965284 | 39 | 50 |
12965334 | 91 | 147 |
13006940 | 246 | 399 |
13222504 | 13 | 29 |
13227540 | 22 | 43 |
13244072 | 43 | 80 |
13246476 | 19 | 25 |
13247068 | 269 | 536 |
13435347 | 14 | 35 |
13770081 | 140 | 267 |
13777109 | 107 | 191 |
13884861 | 351 | 643 |
13949665 | 31 | 57 |
13984211 | 353 | 607 |
14815169 | 54 | 83 |
15237615 | 366 | 687 |
15404565 | 21 | 28 |
15468161 | 95 | 176 |
15557577 | 19 | 27 |
15716291 | 44 | 82 |
15924987 | 20 | 29 |
16341732 | 46 | 85 |
16393540 | 88 | 171 |
16435080 | 57 | 96 |
16513248 | 43 | 72 |
16519492 | 235 | 401 |
16598546 | 40 | 71 |
17007818 | 51 | 101 |
17082716 | 27 | 42 |
17808804 | 323 | 637 |
18624670 | 21 | 37 |
20102625 | 114 | 206 |
20102747 | 56 | 114 |
20176895 | 51 | 88 |
20237775 | 166 | 305 |
21136359 | 127 | 201 |
21205999 | 59 | 133 |
21206001 | 54 | 93 |
21297009 | 52 | 88 |
21304659 | 20 | 32 |
21523149 | 19 | 30 |
21656423 | 290 | 551 |
21677459 | 22 | 42 |
21706089 | 20 | 35 |
21841231 | 17 | 40 |
21985175 | 64 | 121 |
22099327 | 501 | 911 |
22529357 | 31 | 75 |
22539259 | 130 | 299 |
22610253 | 37 | 73 |
22654195 | 47 | 82 |
22663097 | 67 | 136 |
22664157 | 101 | 196 |
22742003 | 18 | 27 |
22790429 | 88 | 177 |
22957273 | 105 | 196 |
22957275 | 42 | 63 |
23161107 | 75 | 135 |
23191655 | 66 | 128 |
23538323 | 35 | 66 |
24255107 | 44 | 78 |
24284399 | 19 | 36 |
24504419 | 15 | 31 |
I wanted to explore the mixed version on my dataset. The problem is that you have to split the groups into centered and non-centered in the data block. Based on the knowledge shared in this topic, I conclude that Stan cannot determine the mixing itself.
So, how would I do the mixing myself? I know N, but I do not know \sigma.
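To make the question concrete, here is a minimal Python sketch of what a manual split could look like if I simply used a cutoff on the number of serves. Everything in it is an assumption for illustration: the cutoff of 100 is arbitrary and not derived from \sigma, and the names `split_groups`, `K_c`, `K_nc`, `idx_c` and `idx_nc` are hypothetical, not taken from any existing model.

```python
def split_groups(serves, threshold=100):
    """Split groups by trial count.

    Returns two lists of 1-based group indices (Stan indexes from 1):
    groups with at least `threshold` trials, and groups with fewer.
    The threshold is an arbitrary placeholder, not a recommended value.
    """
    centered = [k for k, n in enumerate(serves, start=1) if n >= threshold]
    noncentered = [k for k, n in enumerate(serves, start=1) if n < threshold]
    return centered, noncentered


# First five rows of the table above (serves_won and serves columns).
serves_won = [66, 119, 40, 80, 13]
serves = [125, 210, 84, 125, 29]

centered_idx, noncentered_idx = split_groups(serves, threshold=100)

# Hypothetical data dictionary for a mixed model; the field names are
# placeholders that would have to match the model's data block.
stan_data = {
    "K": len(serves),
    "y": serves_won,
    "N": serves,
    "K_c": len(centered_idx),
    "K_nc": len(noncentered_idx),
    "idx_c": centered_idx,
    "idx_nc": noncentered_idx,
}
```

With this cutoff, the large groups would be treated as centered and the small ones as non-centered, but I do not see how to justify the cutoff without knowing \sigma.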
Thanks