- Operating System: CentOS
- rstanarm Version: 2.15.3

Hi,

I am trying to fit a Stan model on a rather large data set with ~3 million observations. The model itself is relatively simple, with one smoothed variable, three binary variables, and one random effect:

`Outcome ~ spline(age) + A + B + C + 1|Year`

However, I'm quickly running out of memory when trying to run this model (the server has 1.1 TB of RAM, but that's not enough). I've tracked the problem down to the computations done in the `mgcv::jagam` function.

Is there any hope of running this model on the full data, or am I better off sub-sampling, say, 200,000 data points? Looking at the data, the effect of age is highly non-linear, hence the need for some smoothing.
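In case it helps to show what I mean by sub-sampling, here is a minimal sketch of what I would try, assuming the model is fit with `rstanarm::stan_gamm4` and the data live in a data frame called `dat` (both names are placeholders for my actual setup):

```r
library(rstanarm)

# Sub-sample 200,000 of the ~3 million rows before fitting
# (set a seed so the subsample is reproducible)
set.seed(123)
sub <- dat[sample(nrow(dat), 2e5), ]

# Smooth term for age, fixed effects A/B/C, random intercept by Year
fit <- stan_gamm4(Outcome ~ s(age) + A + B + C,
                  random = ~ (1 | Year),
                  data = sub)
```

My worry is that this throws away most of the data, so I'd be happy to hear about alternatives that keep the full sample.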

Thanks in advance.