So, take a hierarchical data scenario where each of multiple subjects in an experiment is measured many times in each condition of some manipulated variable `IV`, a saturated model for which would be (using lmer syntax):

```
DV ~ 1+IV + ( 1+IV | SUBJECT )
```

This will estimate an intercept and an effect of `IV` for each level of `SUBJECT`, plus the correlation between them.

If one has two sessions of data, it is fairly straightforward to achieve inference on the traditional “test-retest” correlations by binding the rows of the two sessions together and using indicator contrasts for each session, resulting in the following unique rows in the contrast matrix:

```
# A tibble: 4 x 4
  `Intercept_1`  IV_1 `Intercept_2`  IV_2
          <dbl> <dbl>         <dbl> <dbl>
1             1     1             0     0
2             1    -1             0     0
3             0     0             1     1
4             0     0             1    -1
```
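For concreteness, here is a small pure-Python sketch (my own illustration, not from any package) of how those four unique rows arise from stacking the two sessions and zeroing out the other session's columns:

```python
# Hypothetical sketch: build the unique rows of the stacked two-session
# contrast matrix. IV is coded +1/-1 within each session, and the other
# session's columns are zeroed out by the session indicator.
columns = ["Intercept_1", "IV_1", "Intercept_2", "IV_2"]

rows = []
for session in (1, 2):
    for iv in (1, -1):
        in_s1 = 1 if session == 1 else 0
        in_s2 = 1 if session == 2 else 0
        rows.append([in_s1, iv * in_s1, in_s2, iv * in_s2])

for r in rows:
    print(dict(zip(columns, r)))
```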

With that formulation, the correlation between `Intercept_1` & `Intercept_2` encodes the test-retest reliability of the intercept, and the correlation between `IV_1` & `IV_2` encodes the test-retest reliability of the effect of the variable `IV`. And note that each reliability comes with a full posterior distribution.

However, I *feel* that, given the existence of classical quantities such as the intraclass correlation, there should be a way to compute reliability from even a single session. Gelman & Pardoe (2006) show a simple computation for a quantity I think is related to this, the “pooling factor” (\lambda), but that quantity takes the form of a single point value for each term, and I (possibly naively) was hoping there was a sensible quantity for which we could obtain a full posterior distribution.

Now, to be clear, G&P2006 define \lambda for a given term (i.e. the intercept or the effect of IV) as (with my shorthand at the far right):

$$\lambda = 1 - \frac{V^{K}_{k=1}\big(E(\epsilon_k)\big)}{E\big(V^{K}_{k=1}(\epsilon_k)\big)} = 1 - \frac{\lambda_n}{\lambda_d}$$

Here \lambda_n, the numerator of the fraction, is straightforward, representing the between-subjects variance (though possibly not a quantity identical to the latent between-subjects variance that we typically model, but rather a quasi-empirical variant? At least, in G&P2006’s R code they seem to use the latter even though the BUGS model incorporates the former).
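To make the shorthand concrete, here is a hedged numeric sketch (simulated draws standing in for a real posterior; all numbers are made up) of computing \lambda for one term from a draws-by-subjects matrix of errors:

```python
import random
import statistics

# Hedged sketch of G&P2006's pooling factor for one term:
#   lambda = 1 - Var_k(E(eps_k)) / E(Var_k(eps_k))
# computed from posterior draws of the per-subject errors eps_k
# (here just simulated noise; in practice these come from the fit).
random.seed(1)
n_draws, n_subjects = 2000, 30
true_eps = [random.gauss(0, 1) for _ in range(n_subjects)]
draws = [[e + random.gauss(0, 0.5) for e in true_eps] for _ in range(n_draws)]

# numerator: variance across subjects of each subject's posterior-mean error
post_means = [statistics.mean(d[k] for d in draws) for k in range(n_subjects)]
lambda_n = statistics.pvariance(post_means)

# denominator: posterior mean of the across-subject variance, draw by draw
lambda_d = statistics.mean(statistics.pvariance(d) for d in draws)

pooling = 1 - lambda_n / lambda_d
print(pooling)
```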

The denominator, \lambda_d, is a bit more complicated, but I think it’s intended to reflect the expected variance of the term *within* each subject. With multiple sessions it’s straightforward to compute this for a given term (e.g. the intercept): compute the term for each subject and session, then within each subject take the variance across sessions, and finally collapse the resulting collection of variances to a single mean across subjects. I expect that value would correspond well with the correlations derived above.
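As a hedged sketch of that multi-session recipe, with made-up per-subject, per-session values of one term:

```python
import statistics

# Made-up per-session values of one term (say, the intercept) per subject
term_values = {
    "subj1": [0.9, 1.1, 1.0],  # one value per session
    "subj2": [0.4, 0.6, 0.5],
    "subj3": [1.5, 1.3, 1.4],
}

# variance across sessions within each subject, then the mean across subjects
within_subject_var = [statistics.pvariance(v) for v in term_values.values()]
lambda_d = statistics.mean(within_subject_var)
print(lambda_d)
```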

With one session, however, it seems more difficult to derive the expected within-subject variance of a term. G&P2006 propose using the variance *across samples in the posterior*, thus yielding a single point value rather than a full posterior.

In principle, I guess one still has a full posterior on the between-subjects variance (at least if one uses the model parameter for this quantity, c.f. G&P2006’s use of a quasi-empirical quantity), so one could divide each posterior draw by the point value for \lambda_d to yield a full posterior on the pooling factor, but I still think I’m missing something with that “solution”.
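Mechanically, that “solution” would look something like the following hedged sketch (fabricated posterior draws and a made-up point value, purely for illustration):

```python
import random

# Pair a full posterior on the between-subjects variance (fabricated
# draws here) with a single point value for lambda_d, giving a
# distribution of pooling-factor values.
random.seed(2)
sigma2_between_draws = [abs(random.gauss(1.0, 0.2)) for _ in range(4000)]
lambda_d_point = 1.3  # made-up point estimate of the within-subject term

lambda_draws = [1 - s2 / lambda_d_point for s2 in sigma2_between_draws]
print(min(lambda_draws), max(lambda_draws))
```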

I presume this is all old hat somewhere and I’m just terrible at searching for the relevant literature on this computation. Anyone have any tips? (@andrewgelman yourself perhaps?)

One idea I’ve had is to use the posterior samples to literally generate many simulated sessions of data for many simulated subjects for each sample in the posterior, but that feels like a lot of compute, and again I have this nagging feeling there’s an analytic solution I’m missing. If the model were just an intercept, then the magnitude of the measurement noise and the number of samples would determine the within-subject variance of the intercept term, but with a more complicated model including effects reflecting difference scores between conditions, I’m finding it harder to work out the formula for the implied magnitude of within-subject variance for all terms.
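For the intercept-only case, the brute-force idea can be sketched as follows (fabricated variance parameters standing in for real posterior draws; everything here is illustrative):

```python
import random
import statistics

# Hedged sketch of the simulation idea for an intercept-only model:
# treat each (fabricated) posterior draw's variance parameters as truth,
# simulate subjects and two sessions of trials each, and take the mean
# within-subject variance of the per-session intercept estimates.
random.seed(3)
n_draws, n_subjects, n_trials = 200, 50, 20

lambda_d_draws = []
for _ in range(n_draws):
    sigma_b, sigma_e = 1.0, 0.8  # in practice: this draw's variance params
    within_vars = []
    for _ in range(n_subjects):
        mu = random.gauss(0, sigma_b)  # this subject's true intercept
        ests = [statistics.mean(random.gauss(mu, sigma_e) for _ in range(n_trials))
                for _session in range(2)]
        within_vars.append(statistics.variance(ests))  # var across sessions
    lambda_d_draws.append(statistics.mean(within_vars))

# For an intercept-only model this should approach sigma_e**2 / n_trials,
# matching the intuition that noise magnitude and trial count determine it.
print(statistics.mean(lambda_d_draws))
```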