I’m comparing operationalizations A, B, and C of the same predictor in an unordered categorical multilevel model. There’s a lot of uncertainty due to data sparseness (30 covariates in a dataset of 2217 observations).

In a single comparison, operationalization B looks the most promising, with elpd_diff = 2.9 and se_diff = 5.0 compared to operationalization A. Operationalization C doesn’t look so good, with elpd_diff = -1.1 and se_diff = 2.0. But with the SEs larger than the differences, there’s reason to fear that this ranking might reflect nothing but random fluctuations of elpd_diff.

To guard against this danger, I have now fit each of the three models five times, using a different seed for each fit. The results are shown below. The first fit of operationalization A (A.1) serves as the baseline in each comparison.

```
     elpd_loo se_elpd_loo elpd_diff se_diff
B.1 -2085.783      40.930     2.912   5.011
B.5 -2086.004      40.936     2.692   5.012
B.4 -2086.316      40.980     2.380   5.042
B.2 -2087.295      40.990     1.400   5.037
A.2 -2087.949      40.952     0.747   0.512
B.3 -2088.020      41.031     0.676   5.055
A.4 -2088.648      40.983     0.047   0.514
A.1 -2088.696      40.946     0.000   0.000
A.3 -2088.921      41.001    -0.225   0.508
C.3 -2089.539      40.971    -0.843   2.181
C.1 -2089.840      40.962    -1.144   2.173
A.5 -2089.952      40.992    -1.256   0.524
C.2 -2090.632      40.997    -1.936   2.176
C.4 -2091.067      41.021    -2.372   2.175
C.5 -2091.520      41.040    -2.824   2.200
```

(I would round these to one digit if presented in my paper.)

To my mind, these results offer tentative assurance that the initial difference (between A.1 and B.1) is not a mere chance fluctuation, and that operationalization B is probably the best. I’m thinking of characterizing operationalization B as “our best guess at the best operationalization, based on a noisy signal”.
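One small numeric check on that reasoning, using only the elpd_loo values already shown in the table above: the seed-to-seed spread of elpd_loo within each operationalization estimates the pure refit noise, which can then be eyeballed against the roughly 2–3 point gap between B and A. (This is just my own back-of-the-envelope sketch, not output from loo itself.)

```r
# Seed-to-seed spread of elpd_loo within each operationalization,
# typed in from the table above (fits 1-5 in order for A, B, C).
d <- data.frame(
  Oper = rep(c("A", "B", "C"), each = 5),
  elpd_loo = c(
    -2088.696, -2087.949, -2088.921, -2088.648, -2089.952,  # A.1-A.5
    -2085.783, -2087.295, -2088.020, -2086.316, -2086.004,  # B.1-B.5
    -2089.840, -2090.632, -2089.539, -2091.067, -2091.520   # C.1-C.5
  )
)
# Standard deviation across seeds, per operationalization
sds <- round(tapply(d$elpd_loo, d$Oper, sd), 2)
sds  # A 0.72, B 0.94, C 0.83 -- refit noise below the B-A gap of ~2
```

The refit-to-refit wobble (well under 1 elpd point) is small relative to the B–A separation, which is what the claim of “not a mere chance fluctuation” rests on.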

The table above is pretty large and detailed. Picking the plausibly best operationalization for this particular predictor is only one step in a long and onerous variable-selection process, so if possible, I’d rather present something more succinct. **Would it be alright if I just averaged the elpd_diffs and se_diffs over the five fits?** Like so:

```
comparisons$Oper <- substr(rownames(comparisons), 1, 1)
means <- cbind(
  mean_elpd_diff = tapply(comparisons$elpd_diff, comparisons$Oper, mean),
  mean_se_diff   = tapply(comparisons$se_diff,   comparisons$Oper, mean)
)
round(means[order(means[, "mean_elpd_diff"], decreasing = TRUE), ], digits = 3)
#>   mean_elpd_diff mean_se_diff
#> B          2.012        5.031
#> A         -0.138        0.412
#> C         -1.824        2.181
```

This smaller table would likewise be rounded to one digit in the paper. To my mind, it makes the same point as the bigger one without burying my readers in detail. What say our wise forum gurus?