Hello everyone,

I have been asked to re-examine a small dataset consisting of 36 mothers who were administered a PTSD questionnaire. The original analysis included a set of exploratory analyses in which each of the four instrument subscales was regressed on a set of nine participant characteristics. The motivation for my re-analysis is to take advantage of the shrinkage properties of the linear models in rstanarm to help rein in implausible regression slopes.

I had originally set out to repeat the analyses as four separate regressions, as in the original paper, but I wondered whether approaching this as a multivariate problem might provide some added protection against inflated estimates. To set up this analysis, I arranged my data as follows:

studyid = vector of participant ids

predictorx… to n = columns of coefficient values

outcome = index indicating the subscale

score = score on a given subscale

My understanding of the goal here is that I provide an intercept for each outcome and a random slope for each predictor by outcome. Since the subscales have different maximums, I mean-centred and scaled each one individually before stacking into long format. All predictors are continuous and scaled the same way, since the original authors want the coefficients interpretable as standardized coefficients. My expectation is that this model will shrink the individual slopes towards the mean slope across outcomes.

For the sake of speed I am fitting preliminary models using lme4, but the plan is to move to rstanarm once I am confident I'm getting the right answer. I am currently working with a subset of predictors. In an effort to protect myself from happily reporting results from a model I have specified incorrectly, I did the following:
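For concreteness, here is a minimal sketch of the reshaping and scaling step, assuming the raw data are in wide format with one column per subscale; the column names (`subscale_B` … `subscale_E`) are hypothetical stand-ins, not the real variable names:

```r
library(tidyr)
library(dplyr)

example <- wide_data %>%
  # one row per participant-by-subscale, with an index for the subscale
  pivot_longer(cols = starts_with("subscale_"),
               names_to = "outcome", names_prefix = "subscale_",
               values_to = "score") %>%
  # centre and scale each subscale separately (they have different maximums)
  group_by(outcome) %>%
  mutate(score = as.numeric(scale(score))) %>%
  ungroup() %>%
  # standardize the continuous predictors so slopes read as standardized coefficients
  mutate(across(c(mi9t1, GA, Sex), ~ as.numeric(scale(.x))))
```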

```
## no pooling: a separate slope for each predictor-by-outcome combination
nopool <- lm(score ~ (mi9t1 + GA + Sex):outcome, data = example)
coef(nopool)
   (Intercept)  3.760697e-17
mi9t1:outcomeB  4.378735e-01
mi9t1:outcomeC  4.085112e-01
mi9t1:outcomeD  1.892909e-01
mi9t1:outcomeE  2.865741e-01
   GA:outcomeB -4.711827e-02
   GA:outcomeC -6.810386e-02
   GA:outcomeD  4.653566e-02
   GA:outcomeE  2.331514e-01
  Sex:outcomeB -8.990337e-02
  Sex:outcomeC  1.685825e-01
  Sex:outcomeD -2.087684e-01
  Sex:outcomeE -1.084210e-01

## complete pooling: one slope per predictor, shared across all outcomes
completepool <- lm(score ~ mi9t1 + GA + Sex, data = example)
coef(completepool)
(Intercept)  1.795290e-17
      mi9t1  3.305624e-01
         GA  4.111624e-02
        Sex -5.962755e-02
```

Finally my actual model:

```
## partial pooling
partial <- lmer(score ~ (1 | outcome) + (mi9t1 + GA + Sex - 1 | outcome),
                data = example)
coef(partial)
$outcome
      mi9t1         GA         Sex   (Intercept)
B 0.3566662 0.02388319 -0.04291586 -1.101495e-17
C 0.2919617 0.01955043 -0.03513029 -1.101495e-17
D 0.1792658 0.01200405 -0.02157016 -1.101495e-17
E 0.2293102 0.01535514 -0.02759175 -1.101495e-17
```

At first glance I was pretty happy that it seemed to work, but these results don't make sense to me. For mi9t1 the no-pooling estimate for outcome D was 0.189 and the complete-pooling estimate was 0.33, but the partial-pooling estimate is 0.179. Shouldn't this value have been shrunk up towards the complete-pooling mean instead of down below its original estimate?
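For reference, one thing I have been looking at while trying to diagnose this is where the shrinkage is being pulled from, by splitting the partial-pooling estimates into their parts (in lme4, `coef()` for a grouping factor is `fixef()` plus that factor's `ranef()`):

```r
## Decompose the partial-pooling estimates: coef(partial)$outcome is
## fixef(partial) plus ranef(partial)$outcome, row by row.
## Note the formula above has no fixed-effect slopes, so fixef() here
## contains only the intercept and the random slopes carry the whole
## per-outcome slope estimate.
fixef(partial)           # population-level terms (here just the intercept)
ranef(partial)$outcome   # per-outcome deviations (BLUPs)
```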

Any help would be much appreciated! I am absolutely positive I have made one or several mistakes along the way, but if I’m right that this CAN work I think it will be a huge benefit to the research we are doing.