Hi, I am trying to fit a large Item Response Theory (IRT) model with 39,631 students and around 100 questions in total (the response matrix is sparse, since most students typically answer only a few questions).
I’m trying to run a single chain with 1000 samples, but I run into memory issues even though I have around 750 GB of RAM.
I assume a posterior sample of the response matrix is stored at every iteration, which would blow up memory, so I’m wondering whether there’s a way to avoid storing those samples, or some other best practice for scaling the model?
Welcome to the Stan community. Could you share your model code? That would help in diagnosing any issues.
Only draws of parameters, transformed parameters, and generated quantities are stored. The response matrix would not be stored (assuming it is passed as data).
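As an aside, since the response matrix is sparse, a common memory-friendly pattern is to pass the data in long format (one row per observed student–item response) rather than as a wide matrix full of missing entries. A minimal sketch with a small simulated matrix (all names and sizes here are illustrative, not from the actual model):

```python
import numpy as np

# Simulated wide response matrix: NaN marks an unanswered question.
rng = np.random.default_rng(1)
n_students, n_items = 5, 4
wide = rng.integers(0, 2, size=(n_students, n_items)).astype(float)
wide[rng.random((n_students, n_items)) < 0.5] = np.nan  # sparsify

# Convert to long format: only the observed responses are kept.
student_idx, item_idx = np.nonzero(~np.isnan(wide))
data = {
    "N": len(student_idx),                        # number of observations
    "J": n_students,                              # number of students
    "K": n_items,                                 # number of items
    "jj": student_idx + 1,                        # Stan uses 1-based indexing
    "kk": item_idx + 1,
    "y": wide[student_idx, item_idx].astype(int), # observed 0/1 responses
}
print(data["N"], "observed responses out of", n_students * n_items)
```

A dictionary in this shape can then be passed directly as Stan data, so the model only loops over the N observed responses.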
Thanks for sharing. Unfortunately, I don’t see any obvious ways to make your model more memory-efficient. Maybe someone else will have some suggestions.
You could always use the thin argument to save only every n-th draw. You could then run multiple thinned chains in sequence and combine the draws after the fact.
I’m using pystan 3.3, and thin is not one of the allowed keyword arguments to the sampler; when I try to pass it I get ValueError: {'json': {'thin': ['Unknown field.']}}