Memory usage in model with matrix exponentials

Could someone help me understand the memory usage of this model, which does lots of matrix exponential calculations serially without storing them?

On either Windows (RStudio) or Linux, using rstan, memory goes up from a baseline of 200 MB to around 1.7 GB while sampling is running. Is this to be expected, and if so, what is it all being used for?

The memory use seems to scale with the size of the data (5000 observations here), and the memory appears not to be freed when the sampler stops. The number of iterations doesn’t seem to affect it. Optimisation is a little cheaper, but still uses about 1.1 GB.

This is an artificial model to illustrate the issue - in reality Q will be different for each i, so I’ll need to recalculate those exponentials. I’m trying to determine the scalability of continuous-time Markov models that look somewhat like this.

Thanks,
Chris

library(rstan)

mtest <- stan_model(model_code="data {
  int<lower=1> nobs;
  array[nobs] int<lower=1,upper=5> state;
}

parameters {
  array[10] real<lower=0> q;
}

transformed parameters {
  matrix[5,5] Q = rep_matrix(0, 5, 5);
  matrix[5,5] P;
  real loglik=0;
  Q[1,2] = q[1]; Q[1,5] = q[2]; Q[2,1] = q[3]; Q[2,3] = q[4]; Q[2,5] = q[5];
  Q[3,2] = q[6]; Q[3,4] = q[7]; Q[3,5] = q[8]; Q[4,3] = q[9]; Q[4,5] = q[10];
  for (k in 1:5) {
    Q[k,k] = -sum(Q[k,1:5]);  // constrain rows to add to zero
  }

  for (i in 2:nobs){
    P = matrix_exp(Q);
    loglik += categorical_lpmf(state[i] | to_vector(P[state[i-1],]));
  }
}

model {
  target += loglik;
  for (i in 1:10) {
    q[i] ~ lognormal(0,2);
  }
}

")

Q5 <- rbind(c(0, 1, 0, 0, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 1, 1), c(0, 0, 1, 0, 1), c(0, 0, 0, 0, 0))
diag(Q5) <- -rowSums(Q5)
P <- expm::expm(Q5)
nobs <- 5000 
## simulate data from a continuous-time Markov model with equally-spaced observations
state <- numeric(nobs)
set.seed(1)
state[1] <- 1
for (i in 2:nobs){
  state[i] <- sample(1:5, size=1, prob=P[state[i-1],])
}

opt <- optimizing(mtest, data=list(nobs=nobs, state=state))
sam <- sampling(mtest, data=list(nobs=nobs, state=state),
                chains=1, iter=100)

I might be missing something, but it looks like the matrix_exp(Q) call has no need to be in the loop. The primary memory cost of operations like this is in autodiff storage, and this will end up storing nobs copies of the same autodiff information, which will be very expensive.
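For example, something like this (just a sketch of the transformed parameters loop with the exponential hoisted out):

  P = matrix_exp(Q);   // computed once instead of nobs - 1 times
  for (i in 2:nobs) {
    loglik += categorical_lpmf(state[i] | to_vector(P[state[i-1], ]));
  }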


I know - I explained this in the last paragraph of my post: in the real model, Q will be different for each data point i. I just wrote it like that to keep the MRE minimal. So is this memory usage as expected for this model?


Ah, I see. Then yes, it is likely that your model will need memory which increases as you increase your data size.

The reason you don’t see it change with the number of iterations, and that it only goes down when the sampler is finished, is that the memory is re-used between evaluations. Doing this is dramatically faster than releasing it back to the operating system only to ask for it again the next iteration.


OK, it is clear that memory should increase with the data size, but I was asking why the consumption is so large in this specific example. From these posts (Dealing with memory issues in Markov chain style model - #2 by Bob_Carpenter) and (Memory issues with custom model - #4 by bbbales2) I understand that computations that connect the data and parameters will consume memory due to autodiff. If this is about 40 bytes per elementary computation, and 5000 exponentiations of a 5 x 5 matrix use 1 GB, this suggests that one of those matrix exponentiations involves a few thousand computations. I can believe this.
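As a quick check on that arithmetic (the 40 bytes per operation and the 1 GB are only my rough figures):

  bytes_total   <- 1e9                    # ~1 GB above baseline while sampling
  n_exp         <- 5000                   # roughly one matrix_exp per observation
  bytes_per_exp <- bytes_total / n_exp    # ~200 KB per exponentiation
  bytes_per_exp / 40                      # ~5000 elementary autodiff operations each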

Even so, it’s not clear to me why the memory usage doesn’t appear to go down when rstan::sampling() returns?

Thanks,
Chris


Part of this is because you’re using rstan and R: all of the draws from sampling are stored in memory. So you’ll see greater memory usage when you have more parameters/iterations/chains, and the memory usage will remain after sampling.

Using cmdstanr will result in lower memory usage while sampling (as the results are written directly to disk), but then the results will need to be read into memory afterwards for processing.
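For example (a sketch, assuming the model code above is stored in a string called stan_code, with the same data list as in your rstan calls):

  library(cmdstanr)
  stan_file <- write_stan_file(stan_code)   # write the model string to a .stan file
  mod <- cmdstan_model(stan_file)
  fit <- mod$sample(data = list(nobs = nobs, state = state),
                    chains = 1, iter_warmup = 50, iter_sampling = 50)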

I don’t think saved parameter storage is the issue in this case. Storing 10 parameters (and a couple of small matrices) for one iteration will not cost 1.7GB. Empirically, when I use just 1 iteration, this memory usage does not go down, and when I switch to cmdstanr, the same amount of memory is used by the Stan process that is invoked from R.

I suspect part of this is the expense of autodiff with matrix exponentiation, but I’d appreciate it if someone could confirm my back-of-the-envelope thinking about this above. I don’t know if there is any saving possible there.

Aside from that, it still doesn’t seem right that the memory involved in that autodiff is not freed when the sampler returns?

Storing 10 parameters (and a couple of small matrices) for one iteration will not cost 1.7GB.

It’s not 1 iteration - it’s every iteration for every chain.

Aside from that, it still doesn’t seem right that the memory involved in that autodiff is not freed when the sampler returns?

It’s not freed because every iteration for every chain is still stored in memory by R.

If you want to see the memory usage for sampling alone, then try running your model with CmdStan directly, as the results are written directly to disk and not stored in memory.


Is it only in models with a matrix exponential that you see this level of memory usage? If you remove the matrix exponential from your model, does the usage change significantly?

If possible, can you compare a model with and without a matrix exponential outside of rstan? Just so we can narrow down a bit more where exactly the usage is coming from.
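Purely as a memory comparison (not an equivalent model), one way to keep the likelihood dependent on q while dropping the exponential would be something like:

  for (i in 2:nobs) {
    // softmax of a Q row stands in for the matrix_exp row, just to compare memory
    loglik += categorical_lpmf(state[i] | softmax(to_vector(Q[state[i-1], ])));
  }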

Besides what @andrjohns is saying (which is definitely correct), I think RStan also just never returns memory to the system. If you sample multiple times in the same R session, I believe this just gets re-used. This probably isn’t as bad as it sounds, since the memory Stan has reserved should be paged out if your system needs it.

But, we could also consider calling free_all on our stack allocator at the end of the stan services functions. In CmdStan this doesn’t matter, since the process teardown will release all the memory, but in an in-memory interface like RStan it could make a difference.

@WardBrian Thanks - that clears up why the memory usage doesn’t go down when sampling stops, but I am still not 100% clear on where the usage comes from in the first place.

@andrjohns Using cmdstanr, the Stan process still uses this large amount of memory while sampling. It seems clear to me that this is related to the repeated matrix exponentiation, because when the initial example is modified so that only one exponentiation is done (taking the definition of P outside the loop), the memory does not explode. Does this imply that one 5x5 matrix exponentiation is costing ~200 KB of memory (~1 GB / (nobs = 5000)) for autodiff?