Hi,
I am applying the models from the "Hierarchical Partial Pooling for Repeated Binary Trials" case study to my own dataset.
As I understand it, the centered parameterization works better for informative data (large N relative to \sigma), while the non-centered parameterization works better for uninformative data (small N relative to \sigma).
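To make the distinction concrete, here is a minimal Python sketch of the two ways of generating group effects \theta ~ Normal(\mu, \sigma). It uses only the standard library, it is not Stan code, and the function names are hypothetical. Both versions give the same distribution for \theta. They differ only in which quantity the sampler explores: \theta itself (centered) or a standard-normal \eta that is independent of \mu and \sigma a priori (non-centered).

```python
import random
import statistics


def draw_centered(mu, sigma, n, rng):
    # Centered: sample theta directly from Normal(mu, sigma).
    return [rng.gauss(mu, sigma) for _ in range(n)]


def draw_non_centered(mu, sigma, n, rng):
    # Non-centered: sample a standard-normal eta, then shift and scale it.
    # theta = mu + sigma * eta has the same Normal(mu, sigma) distribution,
    # but eta does not depend on mu or sigma a priori.
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    c = draw_centered(0.2, 0.5, 100_000, rng)
    nc = draw_non_centered(0.2, 0.5, 100_000, rng)
    print(statistics.mean(c), statistics.stdev(c))
    print(statistics.mean(nc), statistics.stdev(nc))
```

In a Stan model, the same change is made by declaring \eta as the parameter and computing \theta in `transformed parameters`.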
My dataset is quite unbalanced in the number of observations per group (from 23 to 961 serves per opponent):
| opponent_id | serves_won | serves |
|---|---|---|
| 11264537 | 66 | 125 |
| 11264538 | 119 | 210 |
| 11274839 | 40 | 84 |
| 11353182 | 80 | 125 |
| 11400167 | 13 | 29 |
| 11436659 | 53 | 83 |
| 11436661 | 107 | 201 |
| 11454538 | 16 | 30 |
| 11459087 | 20 | 26 |
| 11467661 | 51 | 88 |
| 11522810 | 17 | 23 |
| 11561252 | 218 | 443 |
| 11601209 | 22 | 34 |
| 11697312 | 353 | 659 |
| 11763362 | 415 | 773 |
| 12377036 | 555 | 961 |
| 12403590 | 19 | 42 |
| 12462353 | 17 | 36 |
| 12965284 | 39 | 50 |
| 12965334 | 91 | 147 |
| 13006940 | 246 | 399 |
| 13222504 | 13 | 29 |
| 13227540 | 22 | 43 |
| 13244072 | 43 | 80 |
| 13246476 | 19 | 25 |
| 13247068 | 269 | 536 |
| 13435347 | 14 | 35 |
| 13770081 | 140 | 267 |
| 13777109 | 107 | 191 |
| 13884861 | 351 | 643 |
| 13949665 | 31 | 57 |
| 13984211 | 353 | 607 |
| 14815169 | 54 | 83 |
| 15237615 | 366 | 687 |
| 15404565 | 21 | 28 |
| 15468161 | 95 | 176 |
| 15557577 | 19 | 27 |
| 15716291 | 44 | 82 |
| 15924987 | 20 | 29 |
| 16341732 | 46 | 85 |
| 16393540 | 88 | 171 |
| 16435080 | 57 | 96 |
| 16513248 | 43 | 72 |
| 16519492 | 235 | 401 |
| 16598546 | 40 | 71 |
| 17007818 | 51 | 101 |
| 17082716 | 27 | 42 |
| 17808804 | 323 | 637 |
| 18624670 | 21 | 37 |
| 20102625 | 114 | 206 |
| 20102747 | 56 | 114 |
| 20176895 | 51 | 88 |
| 20237775 | 166 | 305 |
| 21136359 | 127 | 201 |
| 21205999 | 59 | 133 |
| 21206001 | 54 | 93 |
| 21297009 | 52 | 88 |
| 21304659 | 20 | 32 |
| 21523149 | 19 | 30 |
| 21656423 | 290 | 551 |
| 21677459 | 22 | 42 |
| 21706089 | 20 | 35 |
| 21841231 | 17 | 40 |
| 21985175 | 64 | 121 |
| 22099327 | 501 | 911 |
| 22529357 | 31 | 75 |
| 22539259 | 130 | 299 |
| 22610253 | 37 | 73 |
| 22654195 | 47 | 82 |
| 22663097 | 67 | 136 |
| 22664157 | 101 | 196 |
| 22742003 | 18 | 27 |
| 22790429 | 88 | 177 |
| 22957273 | 105 | 196 |
| 22957275 | 42 | 63 |
| 23161107 | 75 | 135 |
| 23191655 | 66 | 128 |
| 23538323 | 35 | 66 |
| 24255107 | 44 | 78 |
| 24284399 | 19 | 36 |
| 24504419 | 15 | 31 |
I wanted to try the mixed version (some groups centered, some non-centered) on my dataset. The problem is that the split into centered and non-centered groups has to be specified in the data block. Based on what has been shared in this topic, I conclude that Stan cannot determine the mixing by itself.
So how would I choose the mixing myself? I know N for each group, but I do not know \sigma.
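One possible heuristic is sketched below in Python. It is an assumption, not an established recipe. The idea is to get a rough value for \sigma on the log-odds scale, then compare it with each group's approximate logit standard error. A group whose likelihood is narrower than \sigma is treated as informative and placed in the centered set. All other groups go in the non-centered set. The method-of-moments estimate `rough_sigma` is a crude stand-in I am assuming for illustration. A posterior mean of \sigma from a pilot fit (for example, an all non-centered model) could replace it. The helper names, and the idea of passing 1-based index arrays to the data block, are hypothetical. Only a few rows of the table are copied into `GROUPS`.

```python
import math
import statistics

# (serves_won, serves) per opponent_id; a subset of the table above.
# Replace with the full table; insertion order defines the 1-based group index.
GROUPS = {
    11264537: (66, 125),
    11400167: (13, 29),
    11522810: (17, 23),
    11697312: (353, 659),
    12377036: (555, 961),
    12965284: (39, 50),
    13247068: (269, 536),
    21841231: (17, 40),
}


def empirical_logit(k, n):
    # Logit of the observed success rate, with a 0.5 continuity correction.
    return math.log((k + 0.5) / (n - k + 0.5))


def logit_se(k, n):
    # Approximate standard error of the empirical logit.
    return math.sqrt(1.0 / (k + 0.5) + 1.0 / (n - k + 0.5))


def rough_sigma(groups):
    # Crude method-of-moments estimate of the between-group sd (logit scale):
    # observed variance of the logits minus the average sampling variance,
    # floored at zero.
    ys = [empirical_logit(k, n) for k, n in groups.values()]
    vs = [logit_se(k, n) ** 2 for k, n in groups.values()]
    return math.sqrt(max(0.0, statistics.variance(ys) - statistics.mean(vs)))


def split_groups(groups, sigma):
    # Heuristic split: groups whose logit SE is below sigma are "informative"
    # and go centered; the rest go non-centered. Returns 1-based indices.
    centered, non_centered = [], []
    for i, (k, n) in enumerate(groups.values(), start=1):
        (centered if logit_se(k, n) < sigma else non_centered).append(i)
    return centered, non_centered


if __name__ == "__main__":
    sigma_hat = rough_sigma(GROUPS)
    c, nc = split_groups(GROUPS, sigma_hat)
    print(f"sigma_hat={sigma_hat:.3f}", "centered:", c, "non-centered:", nc)
```

Because the split depends on \sigma, which is itself uncertain, it may be worth rerunning with a somewhat smaller and larger \sigma and checking whether the sampler diagnostics change.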
Thanks