# Reducing memory consumption for small p large N mixture model

Small p, large N: any tricks to reduce memory consumption? Here’s a simplified version of my model (a two-component mixture of a uniform + Beta):

```stan
data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real<lower=0, upper=1> pi0;   // mixture weight on the uniform component
  real<lower=0, upper=1> alpha;
  real<lower=1> beta;
}
model {
  vector[N] summands;
  for (i in 1:N) {
    // two-component mixture: Beta(1,1) (i.e. uniform) + Beta(alpha, beta)
    summands[i] = log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                              log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}
```

So I have only p=3 real parameters, but for my application in genetics N can be ~1e8. With N=1e7, `x` should occupy a ballpark 80 MB, but running `optimizing` at this scale takes ~3 GB, and a similar but slightly more involved model (https://github.com/davidaknowles/pisquared/blob/master/inst/stan/pi2.stan) takes 8 GB. With seemingly roughly linear scaling in N, running with N=1e8 has a very heavy memory profile.

So, are there any tricks I could be employing to get the memory footprint down? My understanding is that I can’t currently switch out double-precision reals for floats?

Would it help to avoid building `summands` by doing `target +=` directly inside the loop?
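
Something like this, I mean (an untested sketch of the same model block, with the accumulation moved into the loop):

```stan
model {
  // hypothetical variant: accumulate into target directly
  // instead of materializing the summands vector
  for (i in 1:N) {
    target += log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                          log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
}
```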

My other guess would be vectorizing the expression by not indexing `x`.

Ha, that’s actually how I had it originally (with `target +=` inside the loop). The docs here suggest the `summands` version above should be preferable; from what I can tell it doesn’t make a difference. My attempt at reading the generated C++ suggests you end up with the same thing either way.

I don’t know how to vectorize this: `beta_lpdf(x | 1, 1)` gives the sum of the individual log-likelihood terms, whereas the mixture needs each term separately.
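
One workaround might be writing the Beta log density out element-wise by hand, something like this untested sketch, though I haven’t checked whether it actually saves memory:

```stan
model {
  // hand-written element-wise Beta(alpha, beta) log density,
  // so each observation's term stays separate
  vector[N] lp_alt = log1m(pi0)
                     + (alpha - 1) * log(x)
                     + (beta - 1) * log1m(x)
                     - lbeta(alpha, beta);
  real lp_unif = log(pi0);  // Beta(1,1) log density is 0, so only log(pi0) remains
  for (i in 1:N)
    target += log_sum_exp(lp_unif, lp_alt[i]);
}
```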

Shows what I know! I hadn’t seen that doc. Sorry, my implementation tricks are lacking. Curious to see what others suggest here.

No worries, thanks for the thought!

Maybe it’s also negligible, but you could calculate `log(pi0)` and `log1m(pi0)` outside the loop.
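
i.e. something like this sketch, with those two values hoisted out of the loop:

```stan
model {
  real log_pi0 = log(pi0);      // computed once rather than N times
  real log1m_pi0 = log1m(pi0);  // ditto, so one node on the autodiff stack instead of N
  vector[N] summands;
  for (i in 1:N) {
    summands[i] = log_sum_exp(log_pi0 + beta_lpdf(x[i] | 1, 1),
                              log1m_pi0 + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}
```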

I think @rybern was still right. That doc is about local variables, but `target` is a special accumulator object that does the vectorization optimization automatically.
But in general, yes, I think `sum([a, b, c])` is a bit more memory-efficient than `a + b + c`.


Yup, I was wondering if `target` is treated differently.

I have to admit I didn’t think precomputing `log_pi0` would make much difference, but it does seem to get memory consumption down about 30%, which is a start. Not quite the order of magnitude I was looking for, but thanks nonetheless!