Reducing memory consumption for a small-p, large-N mixture model

Small p, large N: are there any tricks to reduce memory consumption? Here’s a simplified version of my model (a two-component mixture of a uniform and a Beta):

data {
  int<lower=0> N;
  vector[N] x;
}
parameters {
  real<lower=0, upper=1> pi0;   // mixture weight on the uniform component
  real<lower=0, upper=1> alpha;
  real<lower=1> beta;
}
model {
  vector[N] summands;
  for (i in 1:N) {
    // per-observation mixture log likelihood: Beta(1,1) (i.e. uniform) vs Beta(alpha, beta)
    summands[i] = log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                              log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}

So I have only p=3 real parameters, but for my application in genetics N can be ~1e8. With N=1e7, x should occupy ballpark 80 MB (1e7 doubles at 8 bytes each), but running optimizing at this scale takes ~3 GB, and a similar but slightly more involved model (https://github.com/davidaknowles/pisquared/blob/master/inst/stan/pi2.stan) takes 8 GB. Memory use seems to scale roughly linearly in N, so running with N=1e8 has a very heavy memory profile.

So, are there any tricks I could be employing to get the memory footprint down? My understanding is that I can’t currently swap out double-precision reals for single-precision floats?

Would it help to avoid building summands by doing target += directly inside the loop?
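
Something like this, i.e. keeping the same data and parameters blocks but writing the model block as (just a sketch of what I mean, untested):

model {
  for (i in 1:N) {
    target += log_sum_exp(log(pi0) + beta_lpdf(x[i] | 1, 1),
                          log1m(pi0) + beta_lpdf(x[i] | alpha, beta));
  }
}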

My other guess would be vectorizing the expression by not indexing x.

Ha, that’s actually how I had it originally (with target += inside the loop). The docs here suggest the summands version above should be preferable, but from what I can tell it doesn’t make a difference. My attempt at reading the generated C++ suggests you end up with the same thing either way.

I don’t know how to vectorize: beta_lpdf(x | 1, 1) gives the sum of the individual log-likelihood terms, whereas the mixture needs the individual terms.

Shows what I know! I hadn’t seen that doc. Sorry, my implementation tricks are lacking. Curious to see what others suggest here.

No worries, thanks for the thought!

Maybe it’s also negligible, but you could calculate log(pi0) and log1m(pi0) outside the loop.
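
i.e. something like this (just a sketch):

model {
  real log_pi0 = log(pi0);
  real log1m_pi0 = log1m(pi0);
  vector[N] summands;
  for (i in 1:N) {
    summands[i] = log_sum_exp(log_pi0 + beta_lpdf(x[i] | 1, 1),
                              log1m_pi0 + beta_lpdf(x[i] | alpha, beta));
  }
  target += sum(summands);
}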

I think @rybern was still right. That doc is about local variables, but target is a special accumulator object that does the vectorization optimization automatically.
But in general, yes, I think sum([a, b, c]) is a bit more memory-efficient than a + b + c.
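
For example (a toy sketch; a, b, c are just placeholder terms computed from the first three observations):

model {
  real a = beta_lpdf(x[1] | alpha, beta);
  real b = beta_lpdf(x[2] | alpha, beta);
  real c = beta_lpdf(x[3] | alpha, beta);
  // a + b + c builds intermediate autodiff nodes for (a + b) and ((a + b) + c):
  // target += a + b + c;
  // a single reduction over the terms instead:
  target += sum([a, b, c]);
}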

Yup, I was wondering if target is treated differently.

I have to admit I didn’t think precomputing log_pi0 would make much difference, but it does seem to get memory consumption down by about 30%, which is a start. Not quite the order of magnitude I was looking for, but thanks nonetheless!