I am working to fit a hierarchical IRT model using ‘brms’ in R, from which I intend to extract the estimate of a latent parameter (i.e. the person parameter in a GPCM). So far, I have managed to fit the full version of the model on a subset of the full dataset with no issues. However, I need to fit the full model on a dataset of roughly 1.2 million observations. To do so, I turned to AWS's EC2 service, as my personal machine is woefully under-equipped for the task. At the moment, I am working with a cloud instance with the following specs:
OS: Ubuntu 18.04
CPU: 16 cores (32 threads)
RAM: 128 GB
Storage: 30 GB SSD
The dataset itself is roughly 140 MB, and by my estimates the 30 GB of storage should be more than sufficient to hold both the dataset and the saved fitted model.
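For concreteness, here is a minimal sketch of the kind of call I mean. The formula, variable names (`resp`, `item`, `person`, `dat`), and settings are placeholders rather than my actual model, which is more involved; the adjacent-category family is one way brms can express GPCM-type models, and within-chain threading via the cmdstanr backend is one way to use the 32 available threads:

```r
library(brms)

# Sketch of a GPCM-style hierarchical fit; names are placeholders.
fit <- brm(
  resp ~ 1 + (1 | item) + (1 | person),  # person parameter as a group-level effect
  data    = dat,
  family  = acat("logit"),        # adjacent-category logit, used for (G)PCM-type models
  backend = "cmdstanr",           # cmdstanr backend allows within-chain threading
  chains  = 4, cores = 4,
  threads = threading(8),         # 4 chains x 8 threads = 32 threads
  refresh = 50                    # report progress every 50 iterations
)
```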
My issue is that when I start to fit the full model, R becomes unresponsive after it reports beginning the first warm-up samples. Given the specs I have rented and the time taken to fit the model on the subset, I expect the full fit to take roughly 20 hours. However, after roughly 4 hours I had seen no progress in the sampling.
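One diagnostic I am considering (a sketch only; `fit_subset` and `full_dat` are placeholder names for my subset fit and full dataset) is a very short probe run via `update()`, just to confirm the sampler makes any progress at all on the full data and to extrapolate the per-iteration cost:

```r
# Probe run: a handful of iterations on the full data with frequent
# progress reports, to check the sampler is actually moving.
probe <- update(
  fit_subset,
  newdata = full_dat,   # the full ~1.2M-row dataset (placeholder name)
  chains  = 1,
  iter    = 20,
  warmup  = 10,
  refresh = 1           # print after every iteration
)
```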
I would love to hear suggestions for fitting hierarchical models on large datasets. Might I need to rent an instance with greater compute capacity, or is it possible to use a GPU when fitting a model through brms in R? I could also be missing an obvious mistake in how I have used AWS's services.