Determine whether group is informative vs uninformative

HJAM24 · April 8, 2023, 8:54am

Hi,

I am applying the Hierarchical Partial Pooling for Repeated Binary Trials case study to my own dataset.
As I understand it, the centered parameterization works better for informative data (large N relative to \sigma), while the non-centered parameterization is better for uninformative data (small N relative to \sigma).
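
For readers unfamiliar with the two parameterizations, here is a minimal sketch of how they relate. It is not code from the case study, and the values of `mu` and `sigma` are illustrative placeholders rather than estimates from any data. Both versions imply the same normal(mu, sigma) distribution for a group-level log-odds; they differ only in which variable is sampled directly.

```python
import random
import statistics


def draw_centered(mu, sigma, n, rng):
    # Centered: sample each group-level log-odds directly from normal(mu, sigma).
    return [rng.gauss(mu, sigma) for _ in range(n)]


def draw_noncentered(mu, sigma, n, rng):
    # Non-centered: sample a standard-normal offset, then shift and scale it.
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(1)
mu, sigma = 0.2, 0.5  # illustrative values only (assumption, not from the data)

centered = draw_centered(mu, sigma, 20000, rng)
noncentered = draw_noncentered(mu, sigma, 20000, rng)

# Both sets of draws should show roughly the same mean and standard deviation.
print(statistics.mean(centered), statistics.stdev(centered))
print(statistics.mean(noncentered), statistics.stdev(noncentered))
```

The implied distribution is identical in both cases; what changes in a hierarchical model is the geometry the sampler has to explore, which is why one version can sample well where the other produces divergences.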
My dataset is quite unbalanced in the number of observations per group:

opponent_id	serves_won	serves
11264537	66	125
11264538	119	210
11274839	40	84
11353182	80	125
11400167	13	29
11436659	53	83
11436661	107	201
11454538	16	30
11459087	20	26
11467661	51	88
11522810	17	23
11561252	218	443
11601209	22	34
11697312	353	659
11763362	415	773
12377036	555	961
12403590	19	42
12462353	17	36
12965284	39	50
12965334	91	147
13006940	246	399
13222504	13	29
13227540	22	43
13244072	43	80
13246476	19	25
13247068	269	536
13435347	14	35
13770081	140	267
13777109	107	191
13884861	351	643
13949665	31	57
13984211	353	607
14815169	54	83
15237615	366	687
15404565	21	28
15468161	95	176
15557577	19	27
15716291	44	82
15924987	20	29
16341732	46	85
16393540	88	171
16435080	57	96
16513248	43	72
16519492	235	401
16598546	40	71
17007818	51	101
17082716	27	42
17808804	323	637
18624670	21	37
20102625	114	206
20102747	56	114
20176895	51	88
20237775	166	305
21136359	127	201
21205999	59	133
21206001	54	93
21297009	52	88
21304659	20	32
21523149	19	30
21656423	290	551
21677459	22	42
21706089	20	35
21841231	17	40
21985175	64	121
22099327	501	911
22529357	31	75
22539259	130	299
22610253	37	73
22654195	47	82
22663097	67	136
22664157	101	196
22742003	18	27
22790429	88	177
22957273	105	196
22957275	42	63
23161107	75	135
23191655	66	128
23538323	35	66
24255107	44	78
24284399	19	36
24504419	15	31

I wanted to explore the mixed version on my dataset. The problem is that you have to split the groups into centered and non-centered in the data block. Based on the knowledge shared in this topic, I conclude that Stan cannot determine the mixing itself.

So how would I do the mixing myself? I know N, but I do not know \sigma.
Thanks
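
One rough way to frame the question above is to compare each group's likelihood precision with an assumed population scale. The sketch below is a heuristic and not a method proposed in this thread. `SIGMA_GUESS` is a hypothetical placeholder (in practice it might come from the posterior of \sigma in a preliminary fit), and the cutoff "standard error smaller than \sigma" is an assumption chosen for illustration. The rows are a small subset of the table above.

```python
import math


def logit_se(successes, trials):
    # Approximate standard error of the empirical log-odds,
    # with 0.5 added to each cell to avoid division by zero.
    failures = trials - successes
    return math.sqrt(1.0 / (successes + 0.5) + 1.0 / (failures + 0.5))


def split_groups(rows, sigma_guess):
    # rows: (opponent_id, serves_won, serves) tuples.
    # A group is flagged for centering when its likelihood is tight
    # relative to the assumed population scale (heuristic cutoff).
    centered, noncentered = [], []
    for opponent_id, won, serves in rows:
        if logit_se(won, serves) < sigma_guess:
            centered.append(opponent_id)
        else:
            noncentered.append(opponent_id)
    return centered, noncentered


# A few rows copied from the table above.
rows = [
    (11264537, 66, 125),
    (11400167, 13, 29),
    (11459087, 20, 26),
    (11763362, 415, 773),
    (12377036, 555, 961),
    (12965284, 39, 50),
]

SIGMA_GUESS = 0.25  # hypothetical placeholder, not an estimate from this data

centered, noncentered = split_groups(rows, SIGMA_GUESS)
print("candidates for centering:", centered)
print("candidates for non-centering:", noncentered)
```

The resulting split depends entirely on the assumed \sigma, so it is best read as a ranking of groups by informativeness rather than a firm rule.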

jsocolar · April 8, 2023, 1:34pm

I would start by non-centering everything. I would guess that none of these rows is so strongly informed that the non-centered version will fail. If that doesn’t work (i.e. yields divergences that cannot be suppressed by a modest increase in adapt_delta), I would try centering everything. If that still doesn’t work, I would start fiddling around with centering some groups and not others. However, I think @Niko can do far cooler and more impressive things to optimally choose the degree of centering on-the-fly.
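
The order of attempts described above can be sketched as a small decision loop. This is only an illustration: `count_divergences` is a hypothetical callback standing in for an actual Stan run, and the `adapt_delta` ladder beyond Stan's default of 0.8 is an assumed reading of "a modest increase".

```python
def choose_parameterization(count_divergences, adapt_deltas=(0.8, 0.95, 0.99)):
    # count_divergences(parameterization, adapt_delta) -> int is a
    # hypothetical callback that fits the model and returns the number
    # of divergent transitions.
    for parameterization in ("noncentered", "centered"):
        for adapt_delta in adapt_deltas:
            if count_divergences(parameterization, adapt_delta) == 0:
                return parameterization, adapt_delta
    # Neither all-non-centered nor all-centered worked:
    # fall back to centering some groups and not others.
    return "mixed", None


def fake_count_divergences(parameterization, adapt_delta):
    # Stand-in for a real fit: pretend the non-centered model stops
    # diverging once adapt_delta reaches 0.95.
    if parameterization == "noncentered" and adapt_delta >= 0.95:
        return 0
    return 7


result = choose_parameterization(fake_count_divergences)
print(result)
```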

HJAM24 · April 10, 2023, 11:16am

Thanks for your answer! I will follow your steps.
@Niko, happy to hear your tips!

Niko · April 11, 2023, 7:10am

That’s correct. Although it’s quite easy to do, I doubt it will come to Stan.

I do have a working Julia prototype (which could interface with your Stan models) which should be public in a few weeks/months. For now, if you share your data and Stan model, I can probably just tell you which parametrization to choose for which parameter.
