I’m testing parameter-recovery performance for different models using fairly large datasets (although I’m not sure what counts as large in Stan), and I’m running out of memory/RAM when fitting.
More specifically, I am testing a two-armed bandit model from the hBayesDM package. My version has one slight customization: the “tau” parameter has an upper bound of 20 instead of the 5 used in the linked code.
I’ve simulated 1000 subjects using the same model, and each subject does 1000 bandit trials. Each subject has randomly generated values for the two parameters.
I’m doing this to get decent coverage of the parameter space, as these kinds of models tend to have trouble fitting well in some parameter ranges.
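For context, here’s roughly what my simulation code looks like (a simplified sketch: the arm reward probabilities and the outcome coding are placeholders, not exactly what I ran):

```python
import numpy as np

rng = np.random.default_rng(2024)

N_SUBJ, N_TRIALS = 1000, 1000
REWARD_PROBS = np.array([0.7, 0.3])  # placeholder reward probabilities for the two arms

def simulate_subject(A, tau, n_trials):
    """Delta-rule (Rescorla-Wagner) learner with softmax choice on a 2-armed bandit."""
    Q = np.zeros(2)
    choice = np.empty(n_trials, dtype=int)
    outcome = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # softmax over the two action values, scaled by inverse temperature tau
        p_arm2 = 1.0 / (1.0 + np.exp(-tau * (Q[1] - Q[0])))
        c = 1 if rng.random() < p_arm2 else 0
        r = 1 if rng.random() < REWARD_PROBS[c] else -1   # win/loss coded as +1/-1
        Q[c] += A * (r - Q[c])                            # prediction-error update
        choice[t], outcome[t] = c + 1, r                  # choices 1-indexed for Stan
    return choice, outcome

# one randomly drawn (A, tau) pair per subject, to cover the parameter space
A_true = rng.uniform(0.0, 1.0, N_SUBJ)     # learning rate in [0, 1]
tau_true = rng.uniform(0.0, 20.0, N_SUBJ)  # inverse temperature, capped at 20 in my modified model

choice = np.empty((N_SUBJ, N_TRIALS), dtype=int)
outcome = np.empty((N_SUBJ, N_TRIALS), dtype=int)
for i in range(N_SUBJ):
    choice[i], outcome[i] = simulate_subject(A_true[i], tau_true[i], N_TRIALS)
```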
MCMC is infeasible for this dataset (50 subjects with 1000 trials already takes around 2 hours on my laptop), so I’m using the variational method.
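This is roughly how I launch the variational fit in cmdstanpy, using the simulated choice/outcome arrays from above (the model filename is a placeholder, and the data field names follow my recollection of the hBayesDM layout, so they may not match exactly):

```python
from cmdstanpy import CmdStanModel

# placeholder path to my modified version of the hBayesDM bandit model
model = CmdStanModel(stan_file="bandit2arm_delta_tau20.stan")

stan_data = {
    "N": N_SUBJ,                   # number of subjects
    "T": N_TRIALS,                 # max trials per subject
    "Tsubj": [N_TRIALS] * N_SUBJ,  # trials completed by each subject
    "choice": choice,              # N x T array of choices (1 or 2)
    "outcome": outcome,            # N x T array of outcomes (+1 / -1)
}

fit = model.variational(
    data=stan_data,
    algorithm="meanfield",
    draws=1000,  # approximate-posterior draws ("output_samples" in older cmdstanpy)
    seed=1,
)
```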
The fitting process itself completes: the output gets to where it says it’s drawing samples from the posterior and then says “COMPLETE” for that step. That’s the point where memory use increases enormously, resulting in a crash.
I’ve tested the same data/model combination in regular rstan and in cmdstanpy, and the same thing happens, so it seems to be a general Stan issue rather than a problem with a specific interface. I started on Windows, but I’ve also tried Linux on the same laptop, using a minimal Fedora install without a desktop environment to maximise the available memory, and got the same issue there.
Since the fitting itself appears to complete, I’m guessing it has something to do with how Stan collates all the output at the end?
For now, I’ve settled on 500 subjects, which runs to completion, with RAM use peaking just below my limit.
Am I doing something wrong? Are there any settings I could change that could alleviate the issue?