Exception: std::bad_alloc during execution

I have a model whose covariance matrix depends on 2 parameters that I want to infer. To do so,
I wrote the following Stan code (which I’m running with PyStan):

data {
  int<lower=1> Mn;
  int<lower=1> Mm;
  int<lower=0> N;
  row_vector[Mm] y[N];
  matrix[Mm,3*Mn] Umodes;
  vector<lower=0>[Mm] singval;
  matrix<lower=0>[Mn,Mn] dij;
}
transformed data {
  matrix[Mm,Mm] diagSigma;
  matrix[Mm,Mn] Umodes1;
  matrix[Mm,Mn] Umodes2;
  matrix[Mm,Mn] Umodes3;
  diagSigma = diag_matrix(singval);
  Umodes1[1:Mm,1:Mn] = Umodes[1:Mm,1:Mn];
  Umodes2[1:Mm,1:Mn] = Umodes[1:Mm,(Mn+1):(2*Mn)];
  Umodes3[1:Mm,1:Mn] = Umodes[1:Mm,(2*Mn+1):(3*Mn)];
}
parameters {
  row_vector[Mm] lambdaHat;
  real var1;
  real var2;
}
transformed parameters {
  row_vector[Mm] lambdaCoeff=lambdaHat*diagSigma;
  cov_matrix[Mm] sigmareduced; //reduced covariance
  real l=exp(var1);
  real nusq=exp(var2);
  matrix[Mn,Mn] pcordsigma=exp(-(1.0/l)*dij); // Full covariance for a single coordinate
  matrix[Mm,Mn] V1=Umodes1*pcordsigma;
  matrix[Mm,Mn] V2=Umodes2*pcordsigma;
  matrix[Mm,Mn] V3=Umodes3*pcordsigma;
}
model {
  lambdaHat ~ std_normal();
  y ~ multi_normal(lambdaCoeff, sigmareduced);
}
generated quantities{
  real nu=exp(var2/2);
  real l=exp(var1);
}

When tested with a dataset of N=14 samples, Mn=655 and Mm=55, everything worked fine. When I tried a dataset with the same N=14 and Mm=55 but with Mn=13569, I got the following error:

Exception: std::bad_alloc (in 'shapeInferenceReducedUnboundedLog.stan' at line 31) [origin: bad_alloc]

I guess the problem arises when Stan tries to allocate memory for the matrix pcordsigma. However, with Mn=13569 I expect that matrix to require at most 1.4 GB and the other data no more than 3 GB,
hence less than 5 GB in total. I would rule out my desktop running out of memory, since it has 31.4 GB of physical memory (and swap space of the same size).
Are there workarounds to run the code on my desktop machine with that problem size?



That’s not a safe assumption, since there can be memory overhead from the data structures used to store the data, memory used to store temporary copies or partial results, etc. Also, even if not all your memory is used up, there may not be a single contiguous chunk of 1.4 GB of free memory available.

Offhand, I’d suggest looking for approximations to your algorithm that are less memory hungry. Chapter 8 of Gaussian Processes for Machine Learning may be a good start.

ETA: Another place to start may be the article “Understanding Probabilistic Sparse Gaussian Process Approximations”. The negative of equation (5) in that article can be used to define a custom log-likelihood that you can use in place of multi_normal_lpdf. However, you’ll need the Woodbury identities (i.e., eqs. (A.9) and (A.10) in Gaussian Processes for Machine Learning) in order to evaluate the second and third terms of equation (5) without building an $Mn \times Mn$ matrix.
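
For reference, here are the two identities in their usual textbook form. The symbols are generic (Z is n×n and invertible, W is m×m and invertible, U and V are n×m) and are not tied to the variables in the Stan program above; mapping them onto the terms of equation (5) is left to the reader.

```latex
% Matrix inversion lemma (Woodbury identity)
(Z + U W V^\top)^{-1}
  = Z^{-1} - Z^{-1} U \left( W^{-1} + V^\top Z^{-1} U \right)^{-1} V^\top Z^{-1}

% Companion identity for determinants
\lvert Z + U W V^\top \rvert
  = \lvert Z \rvert \, \lvert W \rvert \, \lvert W^{-1} + V^\top Z^{-1} U \rvert
```

When m is much smaller than n and Z is cheap to invert (for example, diagonal), the right-hand sides only require solving an m×m system.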

ETA: After looking at your Stan code more closely, it looks like you already are using some kind of reduced covariance matrix, but you are still creating a very large temporary matrix pcordsigma in the process. See if you can rework your algorithm to avoid explicitly creating that matrix.
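
One way to picture "not creating the matrix explicitly" is to form the product Umodes1*pcordsigma one column block at a time, so that only a slice of the kernel exists at any moment. The sketch below is plain Python rather than Stan, and the function name `kernel_times_modes` and the `block` parameter are made up for illustration. It only demonstrates the arithmetic; whether a blocked formulation actually lowers memory inside Stan depends on how autodiff records the intermediate values, which this sketch does not address.

```python
import math

def kernel_times_modes(U, dist, length_scale, block=2):
    """Return V = U * exp(-dist / length_scale), built block by block.

    U is an Mm x Mn list of lists and dist is an Mn x Mn list of lists.
    Only an Mn x block slice of the kernel matrix is held at a time.
    """
    Mm, Mn = len(U), len(dist)
    V = [[0.0] * Mn for _ in range(Mm)]
    for start in range(0, Mn, block):
        stop = min(start + block, Mn)
        # Kernel slice covering columns start..stop-1.
        K_block = [
            [math.exp(-dist[i][j] / length_scale) for j in range(start, stop)]
            for i in range(Mn)
        ]
        # Fill the matching columns of V.
        for r in range(Mm):
            for c in range(stop - start):
                V[r][start + c] = sum(U[r][i] * K_block[i][c] for i in range(Mn))
    return V

# Tiny usage example with made-up numbers.
U_demo = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
dist_demo = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
V_demo = kernel_times_modes(U_demo, dist_demo, length_scale=1.5, block=2)
```

The result has the same Mm×Mn shape as V1 in the original program, so downstream code would not need to change.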

Unfortunately, once you account for the stack used for autodiff and the tree built for the HMC trajectory within each iteration, memory use ends up several times higher than you expected.
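
To make the gap concrete, here is a rough back-of-the-envelope calculation in Python. The first figure is just the size of an Mn×Mn matrix of 8-byte doubles. The second uses two assumed numbers (bytes per autodiff node and nodes recorded per matrix entry) that are purely illustrative and not measured from Stan, so treat the result as an order-of-magnitude guess.

```python
def dense_bytes(rows, cols, bytes_per_entry=8):
    """Bytes needed to store a dense rows x cols matrix."""
    return rows * cols * bytes_per_entry

Mn = 13569

# Plain double-precision storage for one Mn x Mn matrix.
plain_doubles = dense_bytes(Mn, Mn)

# Hypothetical overhead figures, for illustration only.
ASSUMED_BYTES_PER_AUTODIFF_NODE = 24
ASSUMED_NODES_PER_ENTRY = 2  # e.g. the scaled distance and its exponential

autodiff_estimate = dense_bytes(
    Mn, Mn, ASSUMED_BYTES_PER_AUTODIFF_NODE * ASSUMED_NODES_PER_ENTRY
)

print(f"plain doubles:     {plain_doubles / 1e9:.2f} GB")
print(f"autodiff estimate: {autodiff_estimate / 1e9:.2f} GB")
```

Under these assumptions a single parameter-dependent Mn×Mn matrix already costs several times the plain 1.4 GB estimate, before counting the matrix products or anything kept across the HMC trajectory.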