Model comparison on separate validation data set

Hi there,

We have collected a pilot sample of 400 participants. Based on this, we have designed hypotheses (regressions) that we want to test in a second sample of 600 participants. We’re trying to figure out what the best way is to do this statistically. I need to decide now what to do as we’ll pre-register the analysis plan before collecting the second sample.

Our model is hierarchical, e.g.:

Y ~ X1 + X2 + X3 + (X1 + X2 + X3 | ParticipantID)

The hypothesis we want to test is whether the model also significantly predicts Y in the new sample. If the overall model is significant, we would then follow up on specific associations (e.g. test whether X1 or X2 specifically is a significant predictor).

I thought we could test this by comparing two models for the new data:

  1. Model where the population level effects are set to those from the pilot sample; group level effects for each participant are fitted (i.e. (X1 + X2 + X3 | ParticipantID)).
  2. Model where population level effects are set to 0; group level effects for each participant are fitted.

It seems to me that both models would then have the same number of regressors, so I can directly compare their fit (e.g. bayes_R2)? Would I do this by literally subtracting the posterior samples of R^2 of one model from those of the other and looking at the 95% CI of the resulting distribution? (I’ve looked at the brms package, and add_criterion can add R^2, but to compare models I could only find loo_compare, which does not accept R^2.)
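To make the "subtract the draws" idea concrete, here is a minimal sketch in plain Python. It assumes each model's posterior draws of R^2 are already available as equal-length lists; the draws below are simulated placeholders, not real model output, and `difference_interval` is a hypothetical helper name. Because the two models are fitted separately, their draws are independent and the pairing used for the subtraction is arbitrary.

```python
import random
import statistics

def difference_interval(draws_a, draws_b, prob=0.95):
    """Return the median and central `prob` interval of draws_a - draws_b."""
    if len(draws_a) != len(draws_b):
        raise ValueError("both models need the same number of draws")
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    tail = (1.0 - prob) / 2.0
    lower = diffs[int(tail * (n - 1))]
    upper = diffs[int(round((1.0 - tail) * (n - 1)))]
    return statistics.median(diffs), lower, upper

# Hypothetical stand-ins for posterior draws of R^2 from the two models.
rng = random.Random(1)
r2_pilot_effects = [rng.betavariate(40, 60) for _ in range(4000)]
r2_zero_effects = [rng.betavariate(30, 70) for _ in range(4000)]

median_diff, lower, upper = difference_interval(r2_pilot_effects, r2_zero_effects)
print(f"median difference {median_diff:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

If the interval clearly excludes 0, the two models differ in R^2; an interval that straddles 0 means the comparison is inconclusive.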

As an aside, another alternative I have considered, but which I think is not computationally feasible, would be:

  1. Model where the population level effects are set to those from the pilot sample; group level effects for each participant are fitted.
  2. Permutation tests: refit model 1 1000 times with Y shuffled, to get a distribution of R^2 values to compare the first model against.
    I’ve seen other papers in my field do this, but I’m not sure about the advantage compared to the first method.

I’d be very grateful for any pointers.


If you fix the “population level effect” to the pilot sample value or to 0, this corresponds to comparing two different priors on the same model. Note that if the group level effects vary a lot, you are unlikely to see much difference in this comparison.

See ?brms::bayes_R2 for how to get the posterior draws of R^2.

A permutation test is useful if you are worried that your model is fitting to the noise and the R^2 estimate is overoptimistic. With the permutations you can see how much overfitting there would be if there were actually no real effect. A permutation test is computationally more expensive than cross-validation, but can work better for a small number of observations.
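The permutation idea can be sketched as follows. This is a deliberately simplified illustration, not the hierarchical model from the question: it swaps in a single-predictor least-squares line so that it runs with the Python standard library alone, and the toy data and the function names `r_squared` and `permutation_null` are assumptions made for the example.

```python
import random

def r_squared(x, y):
    """R^2 of a least-squares line y ~ x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def permutation_null(x, y, n_perm, rng):
    """R^2 values obtained after shuffling y, which breaks any real x-y link."""
    shuffled = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(r_squared(x, shuffled))
    return null

rng = random.Random(2)
# Hypothetical toy data: a true slope of 0.5 plus unit-variance noise.
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]

observed = r_squared(x, y)
null = permutation_null(x, y, n_perm=1000, rng=rng)
# Share of shuffled fits that match or beat the observed R^2 (with +1 correction).
p_value = (1 + sum(r >= observed for r in null)) / (1 + len(null))
mean_null = sum(null) / len(null)
print(f"observed R^2 = {observed:.3f}, mean null R^2 = {mean_null:.3f}, p = {p_value:.3f}")
```

The mean of the shuffled R^2 values shows how much apparent fit arises from noise alone. With a hierarchical model every permutation needs a full refit, which is where the computational cost comes from.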

So what would you then think is the best way to proceed?
I can see these options:

  • Try this out on the pilot sample: fit half of the subjects, then apply the strategy above to the other half (comparing the model with population level effects set to the values from the first half against the model with them set to 0), to check whether there is too much variability between people for this to work.
  • Completely ignore the findings from the pilot sample and, for the confirmation sample, use model comparison (via loo_compare) to compare two models: one with all predictors included and one with all predictors completely removed (i.e. not just population effects set to 0).

Is there any other option?

Both of these are sensible. The first one is useful to set the expectations before making new observations. You can also simulate fake data with known properties and check how big the new sample needs to be to detect something interesting.
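A fake-data check of this kind might look like the sketch below. It is only a rough illustration under strong assumptions: it ignores the hierarchical structure, uses a single predictor with a hypothetical true slope of 0.15, and counts an effect as "detected" when an approximate Fisher z test on the correlation exceeds 1.96. A realistic version would simulate from the hierarchical model itself and refit that model to each simulated data set.

```python
import math
import random

def detects_effect(n, slope, rng, z_crit=1.96):
    """Simulate one data set and test the x-y correlation via Fisher's z."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * a + rng.gauss(0, 1) for a in x]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit

def estimated_power(n, slope, n_sims=500, seed=3):
    """Fraction of simulated data sets of size n in which the effect is detected."""
    rng = random.Random(seed)
    return sum(detects_effect(n, slope, rng) for _ in range(n_sims)) / n_sims

# Hypothetical sample sizes to compare; 600 matches the planned second sample.
for n in (50, 200, 600):
    print(n, estimated_power(n, slope=0.15))
```

Repeating this for a range of plausible effect sizes gives a feel for what the planned sample can and cannot detect before any new data are collected.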

Another option is a hierarchical model combining both the pilot and the new sample.