# Reducing memory consumption for small p large N mixture model

Small p, large N: any tricks to reduce memory consumption? Here’s a simplified version of my model (a two-component mixture of a uniform + Beta):

```stan
data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real<lower=0, upper=1> pi0;   // mixture weight on the uniform component
  real<lower=0, upper=1> alpha;
  real<lower=1> beta;
}
model {
  vector[N] summands;
  for (i in 1:N) {
    // two-component mixture: Beta(1,1) (i.e. uniform) + Beta(alpha, beta)
    summands[i] = log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                              log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}
```

So I have only p=3 real parameters, but for my application in genetics N can be ~1e8. With N=1e7, `x` should occupy a ballpark 80 MB, but running `optimizing` at this scale takes ~3 GB, and a similar but slightly more involved model (https://github.com/davidaknowles/pisquared/blob/master/inst/stan/pi2.stan) takes 8 GB. With seemingly roughly linear scaling in N, running with N=1e8 has a very heavy memory profile.

So, are there any tricks I could be employing to get the memory footprint down? My understanding is that I can’t currently switch out double-precision reals for floats?

Would it help to avoid building `summands` by doing `target +=` directly inside the loop?
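
Something like this, I mean (an untested sketch of the same model block, with the accumulation moved into the loop):

```stan
model {
  // hypothetical variant: accumulate into target directly
  // instead of materializing the summands vector
  for (i in 1:N) {
    target += log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                          log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
}
```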

My other guess would be vectorizing the expression by not indexing `x`.

Ha, that’s actually how I had it originally (with `target +=` inside the loop). The docs here suggest the `summands` version above should be preferable; from what I can tell it doesn’t make a difference. My attempt at reading the generated C++ suggests you end up with the same thing either way.

I don’t know how to vectorize this: `beta_lpdf(x | 1, 1)` gives the sum of the individual log-likelihood terms, whereas the mixture needs each term separately.
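
One workaround might be writing the Beta log density out element-wise by hand, something like this untested sketch, though I haven’t checked whether it actually saves memory:

```stan
model {
  // hand-written element-wise Beta(alpha, beta) log density,
  // so each observation's term stays separate
  vector[N] lp_alt = log1m(pi0)
                     + (alpha - 1) * log(x)
                     + (beta - 1) * log1m(x)
                     - lbeta(alpha, beta);
  real lp_unif = log(pi0);  // Beta(1,1) log density is 0, so only log(pi0) remains
  for (i in 1:N)
    target += log_sum_exp(lp_unif, lp_alt[i]);
}
```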

Shows what I know! I hadn’t seen that doc. Sorry, my implementation tricks are lacking. Curious to see what others suggest here.

No worries, thanks for the thought!

Maybe it’s also negligible, but you could calculate `log(pi0)` and `log1m(pi0)` outside the loop.
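
i.e. something like this sketch, with those two values hoisted out of the loop:

```stan
model {
  real log_pi0 = log(pi0);      // computed once rather than N times
  real log1m_pi0 = log1m(pi0);  // ditto, so one node on the autodiff stack instead of N
  vector[N] summands;
  for (i in 1:N) {
    summands[i] = log_sum_exp(log_pi0 + beta_lpdf(x[i] | 1, 1),
                              log1m_pi0 + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}
```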

I think @rybern was still right. That doc is about local variables, but `target` is a special accumulator object that does the vectorization optimization automatically.
But in general, yes, I think `sum([a, b, c])` is a bit more memory-efficient than `a + b + c`.


Yup, I was wondering if `target` is treated differently.

I have to admit I didn’t think precomputing `log_pi0` would make much difference, but it does seem to get memory consumption down about 30%, which is a start. Not quite the order of magnitude I was looking for, but thanks nonetheless!