This is a follow-up to a message posted on the old forum, with link and responses here.
Basically, I am trying to fit a multilevel time series model using brms and have one question about implementation and another about efficiency. The basic information about the paradigm itself can be found here (quoted from the above link):
I am trying to fit a multilevel time series model to some physiological data I have from a psychological experiment. Specifically, in this experiment 36 participants completed a cognitive task, which involved 30 trials for each of 3 conditions. For each trial, I have a time series with approximately 200 time points reflecting physiological arousal throughout the trial. This means I have 36 x 30 x 3 x 200 data points (minus some missing data points) that I would like to model hierarchically as a function of condition while treating subject as a random effect.
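To make the size of the problem concrete, here is a minimal sketch of the full design grid described above. The column name `trial` and the condition labels are assumptions for illustration; `id`, `condition`, and `time` match the names used in the model formula.

```r
# Skeleton of the full (balanced) design, before any missing data are removed.
# `trial` and the labels "c1".."c3" are hypothetical names, not from my data.
design <- expand.grid(
  time      = seq_len(200),                  # ~200 time points per trial
  trial     = seq_len(30),                   # 30 trials per condition
  condition = factor(c("c1", "c2", "c3")),   # 3 conditions
  id        = factor(seq_len(36))            # 36 participants
)

nrow(design)  # 648000 rows = 36 * 30 * 3 * 200
```

So the model sees up to roughly 648,000 observations, slightly fewer once the missing time points are dropped.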
I was told last time I posted that I should try to find a faster computer, and am fitting the model using Google Compute as we speak on a Standard 8-core Machine with 30 GB of RAM running Windows Server. The code I am running at the moment is:
library(brms)

bm1 <- brm(
  formula = y ~ condition + s(time, by = condition) + (1 | id),
  data    = dat,
  family  = gaussian(),
  chains  = 8,   # one chain per core
  cores   = 8,
  iter    = 2000
)
Since beginning the fit, I spoke to someone with expertise in the mgcv package (which I believe the s() component above is borrowed from), who suggested that I should really be using s(time, id, bs = "fs", m = 1, by = condition). This would allow the curve to vary as a function of time, subject, and condition, which is analogous to a random slope in a more typical linear model (at present I include only a random intercept by subject). This is new to mgcv and does not appear to work in brms just yet. As an alternative, my colleague also proposed a combined predictor, fit with the following code:
dat$IdCond <- interaction(dat$id, dat$condition, drop=TRUE)
s(time, IdCond, bs = "fs", m = 1)
But this was presented as a less ideal option. My question is: are there plans to implement something like either option in the near future?
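For reference, this is roughly how I understand the combined-predictor option on the mgcv side. It is only a sketch on simulated toy data (4 subjects, 2 conditions, 50 time points, all values made up), fitted with mgcv::gam rather than brms, so it says nothing about whether brms accepts the same term.

```r
library(mgcv)  # recommended package, shipped with standard R installations

set.seed(1)

# Toy data, simulated purely for illustration (not my experiment)
dat <- expand.grid(
  time      = seq(0, 1, length.out = 50),
  id        = factor(paste0("s", 1:4)),
  condition = factor(c("A", "B"))
)

# Combined subject-by-condition factor, as suggested by my colleague
dat$IdCond <- interaction(dat$id, dat$condition, drop = TRUE)

# Condition-specific curve + one random offset per subject-condition cell + noise
offsets <- rnorm(nlevels(dat$IdCond), sd = 0.5)
dat$y <- sin(2 * pi * dat$time) * ifelse(dat$condition == "A", 1, 0.5) +
  offsets[as.integer(dat$IdCond)] +
  rnorm(nrow(dat), sd = 0.2)

# Population-level smooth per condition, plus a factor-smooth ("fs") term
# giving each subject-condition cell its own penalized deviation curve
fit <- gam(
  y ~ condition + s(time, by = condition) + s(time, IdCond, bs = "fs", m = 1),
  data   = dat,
  method = "REML"
)

summary(fit)
```

The "fs" term plays the role of the random curves: every level of IdCond gets its own smooth of time, with all of them sharing one smoothing parameter.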
My second question is a direct follow-up concerning efficiency. When I ran this model on my laptop, it took days to complete, so I tried moving to Google Compute. I can get more cores this way, which is great, but it is still taking ~4-5 days to fit, and I worry it is going to end with divergent transitions (it is still running). Is there any way to fit this model more efficiently? Or is this just the cost of using such a cutting-edge approach? I ask in part because if I work out how to fit the random curves per subject, I anticipate the fitting time will increase even more!
- Operating System: Windows Server (Google Compute Cluster)
- brms Version: 2.3.0 (fresh CRAN install)