Reducing memory consumption for a small-p, large-N mixture model

Small p, large N: are there any tricks to reduce memory consumption? Here’s a simplified version of my model (a two-component mixture of a uniform and a Beta):

data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real<lower=0, upper=1> pi0;   // mixture weight on the uniform component
  real<lower=0, upper=1> alpha;
  real<lower=1> beta;
}
model {
  vector[N] summands;
  for (i in 1:N) {
    // per-observation mixture log likelihood: Beta(1,1) (i.e. uniform) vs Beta(alpha, beta)
    summands[i] = log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                              log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}

So I have only p=3 real parameters, but for my application in genetics N can be ~1e8. With N=1e7, x should occupy ballpark 80 MB (1e7 doubles at 8 bytes each), but running optimizing at this scale takes ~3 GB, and a similar but slightly more involved model (https://github.com/davidaknowles/pisquared/blob/master/inst/stan/pi2.stan) takes 8 GB. Memory use seems to scale roughly linearly in N, so running with N=1e8 has a very heavy memory profile.

So, are there any tricks I could be employing to get the memory footprint down? My understanding is that I can’t currently swap out double-precision reals for single-precision floats?

Would it help to avoid building summands by doing target += directly inside the loop?
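
Something like this, i.e. keeping the same data and parameters blocks but writing the model block as (just a sketch of what I mean, untested):

model {
  for (i in 1:N) {
    target += log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                          log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
}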

My other guess would be vectorizing the expression by not indexing x.

Ha, that’s actually how I had it originally (with target += inside the loop). The docs here suggest the summands version above should be preferable, but from what I can tell it doesn’t make a difference. My attempt at reading the generated C++ suggests you end up with the same thing either way.

I don’t know how to vectorize: beta_lpdf(x | 1, 1) gives the sum of the individual log-likelihood terms, whereas the mixture needs the individual terms.

Shows what I know! I hadn’t seen that doc. Sorry, my implementation tricks are lacking. Curious to see what others suggest here.

No worries, thanks for the thought!

Maybe it’s also negligible, but you could calculate log(pi0) and log1m(pi0) outside the loop.
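
i.e. something like this (just a sketch):

model {
  real log_pi0 = log(pi0);
  real log1m_pi0 = log1m(pi0);
  vector[N] summands;
  for (i in 1:N) {
    summands[i] = log_sum_exp(log_pi0 + beta_lpdf(x[i] | 1, 1),
                              log1m_pi0 + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}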

I think @rybern was still right. That doc is about local variables, but target is a special accumulator object that does the vectorization optimization automatically.
But in general, yes, I think sum([a, b, c]) is a bit more memory-efficient than a + b + c.
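
For example (a toy sketch; a, b, c are just placeholder terms computed from the first three observations):

model {
  real a = beta_lpdf(x[1] | alpha, beta);
  real b = beta_lpdf(x[2] | alpha, beta);
  real c = beta_lpdf(x[3] | alpha, beta);
  // a + b + c builds intermediate autodiff nodes for (a + b) and ((a + b) + c):
  // target += a + b + c;
  // a single reduction over the terms instead:
  target += sum([a, b, c]);
}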

Yup, I was wondering if target is treated differently.

I have to admit I didn’t think precomputing log_pi0 would make much difference, but it does seem to get memory consumption down by about 30%, which is a start. Not quite the order of magnitude I was looking for, but thanks nonetheless!