Question Context
This is a fairly open-ended question about best practices for open science when using restricted third-party data. Because of the data used in this study, I cannot share the raw data directly; access must be requested through the third party. I still want to share as much of my data analysis as possible. I will include the code used to convert the data file into the data set used in my study, so anyone with access to the data should be able to repeat all of my descriptives, analyses, etc. Since I have all the brms models saved, I would ideally share these fitted objects; however, these objects store a copy of the data used to fit the models.
I believe that this can be fairly easily resolved with the following:
modFit$data <- NULL
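One rough way to check the first question below empirically is to walk every component of the fitted object (list elements and S4 slots, since the Stan fit stored inside a brmsfit as modFit$fit is typically an S4 stanfit object) and flag anything that looks like per-observation data: data frames, or vectors and matrices with one entry or row per observation. The helper find_data_like() is hypothetical (not part of brms) and uses only base R. It is a heuristic: it will not catch information encoded in other forms, such as grouping-factor levels embedded in parameter names, and it can flag unrelated components that happen to have length N.

```r
# Hypothetical audit helper (not part of brms): recursively walk an R object's
# list elements and S4 slots and return the paths of components that are data
# frames, or atomic vectors/matrices with n_obs entries (rows).
find_data_like <- function(x, n_obs, path = "x", depth = 0, max_depth = 8) {
  if (depth > max_depth) return(character(0))
  # Flag data frames and anything with one entry/row per observation
  if (is.data.frame(x) || (is.atomic(x) && NROW(x) == n_obs)) {
    return(path)
  }
  hits <- character(0)
  if (is.list(x)) {
    nms <- names(x)
    if (is.null(nms)) nms <- rep("", length(x))
    for (i in seq_along(x)) {
      child <- if (nzchar(nms[i])) paste0(path, "$", nms[i]) else paste0(path, "[[", i, "]]")
      hits <- c(hits, find_data_like(x[[i]], n_obs, child, depth + 1, max_depth))
    }
  } else if (isS4(x)) {
    # Recurse into S4 slots (e.g. the stanfit object); environments are skipped
    for (s in methods::slotNames(x)) {
      hits <- c(hits, find_data_like(methods::slot(x, s), n_obs,
                                     paste0(path, "@", s), depth + 1, max_depth))
    }
  }
  hits
}

# Usage with a real model (not run here):
# n <- nrow(modFit$data)   # record N before removing the data
# modFit$data <- NULL
# find_data_like(modFit, n_obs = n, path = "modFit")
```

An empty result is not proof that nothing identifying remains, but any path it does return is worth inspecting before the object is shared.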
Questions
- Does this assignment successfully remove all trace of the data from the object? I know that there is a plan to clean up some redundancy in brmsfit objects, but there doesn't seem to be any additional copy of the data (at least per this).
- What model validations does removing the stored data prevent? I'm fairly sure that anything requiring predictions will be impossible (e.g., pp_check(), residuals(), etc.).
- Without the ability to perform certain inspections of the models, is it worth sharing the objects at all? I set a seed both in R and in the call to brm(..., seed = ###), so I'm hoping that rerunning the models in another R session/environment would yield very similar results anyway (though at the computational and time cost of someone else having to rerun the models).
- As an alternative to providing the entire brms fitted model as an .rds file, would it be more useful to share just the posteriors? I'll be providing supplementary material with diagnostics, fit results, model comparisons, etc., so the whole model may be unnecessary for 99% of readers (the remaining 1% could, in principle, rerun the models themselves using the provided R scripts). It seems to me that the only thing someone would want for future research would be the posteriors, but I may be overlooking something.
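For the question of which validations still work once the data are removed, a low-effort option is to try them on a stripped copy of the model and record which ones fail. The sketch below makes no assumptions about brms internals: probe_methods() is a hypothetical base-R helper that calls each supplied function on the object inside tryCatch() and returns "ok" or the error message. The commented usage shows how it might be pointed at real brms functions; which of those succeed without the stored data is exactly what the probe would reveal, so no outcome is assumed here.

```r
# Hypothetical helper (not part of brms): apply each function in `fns` to
# `fit` and report "ok" or the error message, so one failure doesn't stop
# the rest of the checks.
probe_methods <- function(fit, fns) {
  vapply(names(fns), function(nm) {
    tryCatch({
      fns[[nm]](fit)
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
}

# Usage with a real model (not run here):
# library(brms)
# stripped <- modFit
# stripped$data <- NULL
# probe_methods(stripped, list(
#   summary   = summary,
#   fixef     = fixef,
#   pp_check  = function(f) print(pp_check(f)),  # print() forces the plot to render
#   residuals = residuals,
#   loo       = loo
# ))
```

Wrapping plotting calls in print() is a precaution: a plot object may only hit some errors when it is drawn.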
I would appreciate any feedback or reading suggestions on navigating open science with R objects that store publicly available but access-restricted data, particularly for Bayesian results, since existing posteriors have implications for future priors.
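One detail that may matter for the posteriors-only option with restricted data: brms names group-level (varying) effects after the levels of the grouping factor, e.g. r_subject[S001,Intercept], so exported draws can still carry participant or site labels even with the data removed. The sketch below is a hypothetical helper, not part of brms or posterior, that swaps those labels for integer codes numbered within each grouping term. It assumes level labels contain no commas. With real draws, usage would be something like draws <- as_draws_df(modFit) followed by posterior::variables(draws) <- pseudonymize_levels(posterior::variables(draws)), keeping the label-to-code mapping private.

```r
# Hypothetical helper (not part of brms/posterior): replace the level label in
# group-level parameter names like "r_subject[S001,Intercept]" with an integer
# code, assigned in order of first appearance within each grouping term.
pseudonymize_levels <- function(var_names) {
  pat <- "^(r_[^\\[]+)\\[([^,]+),(.*)\\]$"
  is_r <- grepl(pat, var_names, perl = TRUE)
  if (!any(is_r)) return(var_names)
  term  <- sub(pat, "\\1", var_names[is_r], perl = TRUE)  # e.g. "r_subject"
  level <- sub(pat, "\\2", var_names[is_r], perl = TRUE)  # e.g. "S001"
  coef  <- sub(pat, "\\3", var_names[is_r], perl = TRUE)  # e.g. "Intercept"
  # Same level within a term always maps to the same code
  code <- ave(level, term, FUN = function(l) match(l, unique(l)))
  var_names[is_r] <- paste0(term, "[", code, ",", coef, "]")
  var_names
}

# Example: population-level names and "lp__" pass through unchanged.
# pseudonymize_levels(c("b_Intercept", "r_subject[S001,Intercept]", "lp__"))
```

Whether relabeling is needed at all depends on whether the level labels themselves are sensitive; dropping the r_ columns entirely is the blunter alternative.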