Dear all,
I have run a regression (with brms) on a sample of 500 participants (for a human psychology study). As I have made many decisions about exactly how to specify the model etc. while doing this, I think I need to have a replication sample to show that my results are real - by doing exactly the same analyses as I have done now. I was thinking of doing a power calculation to figure out how large the sample should be.
I have never done this before (for a regression analysis), so I am not exactly sure what to do. I have found G*Power online (https://stats.idre.ucla.edu/other/gpower/multiple-regression-power-analysis/) which has an option for computing power for a multiple regression. This requires R-squared. I see that brms can give me R-squared. So my thought was to compute R-squared for the model without my predictor of interest and for the model with my predictor and from this derive my needed sample size.
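To make the R-squared idea concrete, here is a minimal sketch of the effect size I believe G*Power expects for an "R-squared increase" test, namely Cohen's f-squared computed from the two R-squared values. The helper name `cohen_f2`, the model names `fit_full` and `fit_reduced`, and the example numbers are all hypothetical, not values from my data.

```r
# Cohen's f^2 for the predictor(s) added to the reduced model:
# f^2 = (R2_full - R2_reduced) / (1 - R2_full)
cohen_f2 <- function(r2_full, r2_reduced) {
  (r2_full - r2_reduced) / (1 - r2_full)
}

# With brms, the two inputs could come from bayes_R2(), e.g.
# (fit_full and fit_reduced are hypothetical brmsfit objects):
#   r2_full    <- bayes_R2(fit_full)[, "Estimate"]
#   r2_reduced <- bayes_R2(fit_reduced)[, "Estimate"]

# Made-up R-squared values, for illustration only
f2 <- cohen_f2(r2_full = 0.20, r2_reduced = 0.17)
f2
```

The resulting f-squared would then go into G*Power together with the desired power and the number of predictors.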
Does this seem sensible?
Many thanks
Jacquie
So I’ve had some other issues to iron out, but I’m now back at this.
I think I was looking at this the wrong way.
Some more detail about the data:
I have 400 participants. For each I have 4 psychological measures (questionnaire scores), 1 behavioural measure, and some covariates of no interest (age, gender). The 4 psychological measures are correlated.
I have fitted in brms (a very simple non-hierarchical model):
brm(formula = behaviour ~ psych1 + psych2 + psych3 + psych4 + age + gender,
    data = myData, family = 'gaussian')
I find, say, that for psych1 the 95% credible interval does not include zero. My aim is to find out how large a replication sample needs to be to replicate this effect with 95% power.
Now this thread (https://github.com/paul-buerkner/brms/issues/191) @tomwallis seems relevant, but I can’t quite work out what to do…
I’m wondering whether the steps should be:
1A) Fit a multivariate normal distribution to the data and use it to generate new participants (like here: https://baezortega.github.io/2018/05/28/robust-correlation/). This would allow me to generate combinations of psych1, psych2, psych3, psych4, behaviour, age and gender that respect the correlations in the real data.
1B) Fit a multivariate normal distribution to generate new participants only for the psychological measures and covariates (age and gender), and then generate the behavioural measure from the regression above, using predict() from brms? I’m not sure whether this would be any better or worse. It just seems that in 1A I did not actually use the brms output…
1C) Is there a way to not have to fit a multivariate model to generate new data, but instead only use the brms model?
2) Analyse the new participants with the same model as above and check the credible interval for psych1.
3) Repeat steps 1 and 2 for different sample sizes (say between 400 and 1000 participants), say 500 times each, to see what percentage of credible intervals do not contain zero. Pick the smallest sample size for which this is the case at least 95% of the time (to get 95% power).
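A minimal sketch of steps 1B, 2 and 3, under several loud assumptions: `simulate_power` is a hypothetical helper name; `lm()` with `confint()` is used as a fast stand-in for refitting with `brm()` and reading off the credible interval; all predictors (including gender, coded 0/1) are crudely treated as multivariate normal; and the coefficients, residual SD and toy data below are made up rather than taken from my model.

```r
library(MASS)  # mvrnorm(); MASS ships with standard R installations

# Hypothetical helper: estimated power for one replication sample size.
# data:       data frame with numeric predictor columns (gender coded 0/1)
# predictors: predictor column names; must include "psych1"
# beta:       intercept followed by one coefficient per predictor, same order
# sigma:      residual SD
simulate_power <- function(data, predictors, beta, sigma, n_new, n_sims = 500) {
  mu <- colMeans(data[, predictors, drop = FALSE])
  Sigma <- cov(data[, predictors, drop = FALSE])
  hits <- logical(n_sims)
  for (s in seq_len(n_sims)) {
    # Step 1B: new predictors that respect the observed correlations
    X <- mvrnorm(n_new, mu = mu, Sigma = Sigma)
    colnames(X) <- predictors
    sim <- as.data.frame(X)
    sim$behaviour <- beta[1] + as.vector(X %*% beta[-1]) + rnorm(n_new, sd = sigma)
    # Step 2: refit and check the 95% interval for psych1
    ci <- confint(lm(behaviour ~ ., data = sim), "psych1", level = 0.95)
    hits[s] <- ci[1] > 0 || ci[2] < 0
  }
  mean(hits)  # proportion of intervals that exclude zero
}

# Toy illustration with made-up numbers (not my data or estimates)
set.seed(1)
toy <- as.data.frame(mvrnorm(400, mu = c(psych1 = 0, psych2 = 0),
                             Sigma = matrix(c(1, 0.5, 0.5, 1), 2)))

# Step 3: repeat over candidate sample sizes
sizes <- c(400, 600, 800, 1000)
power <- sapply(sizes, function(n) {
  simulate_power(toy, c("psych1", "psych2"), beta = c(0, 0.15, 0.1),
                 sigma = 1, n_new = n, n_sims = 200)
})
data.frame(sizes, power)
```

For the real analysis, `beta` and `sigma` would presumably come from the fitted brms model, or the outcome could be drawn with `posterior_predict(fit, newdata = sim)` so that posterior uncertainty is carried through. Each simulated data set would then be refitted with `brm()`, which is much slower than `lm()`.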
In the same thread, @jonah also said they were working on maybe putting up some examples of how to do power calculations? (I’ve not been able to find anything.)