Error: cannot allocate vector of size 703998.9 Gb

This usually means that RTools 4.0 was not found, or that it appears after an older RTools35 installation on the PATH.
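One quick check is to print the PATH from inside R and confirm that the RTools 4.0 directories come before any older Rtools entries (a minimal check; the paths in the comments are just the default install locations):

writeLines(strsplit(Sys.getenv("PATH"), ";")[[1]])
# entries like C:\rtools40\usr\bin or C:\RBuildTools\4.0\usr\bin should
# appear before old entries like C:\Rtools\bin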

I uninstalled everything and did a fresh install of R, RStudio, and Stan, but now I am back at the first error. When I run more than one chain I get "cannot allocate vector of size 2.2 Gb".

My Makevars file looks like this:
CXX14FLAGS=-O3 -march=native -mtune=native
CXX11FLAGS=-O3 -march=corei7 -mtune=corei7
CXX14=C:/RBuildTools/4.0/mingw64/bin/g++ -m$(WIN) -v

Please help!

This is my output for:

Sys.which("make")

"C:\RBUILD~1\4.0\usr\bin\make.exe"

Sys.getenv("BINPREF")
" "

Sys.getenv("PATH")
"C:\RBuildTools\4.0\usr\bin;C:\Program Files\R\R-4.0.2\bin\x64;C:\Program Files\R\R-4.0.2\bin\x64;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\SASHome\SASFoundation\9.4\core\sasexe;C:\Program Files\SASHome\SASFoundation\9.4\ets\sasexe;C:\Program Files\SASHome\Secure\ccme4;c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;c:\R\bin;C:\RBuildTools\4.0\usr\bin;"

Hi AAkash,

Some of those Makevars flags will cause issues with RTools 4.0. The first step is to reset those flags to only those that you need. You can do this from R via:

cat("CXX14FLAGS += -O3 -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -Wno-ignored-attributes",file = "~/.R/Makevars.win", sep = "\n", append = FALSE)

After that, can you try running the RStan example model:

library(rstan)
example(stan_model, run.dontrun = TRUE, verbose = TRUE)

And post any lines that start with "error:"?

I ran the example model you suggested and got some warnings:

Warning messages:
1: In find.package(package, lib.loc, verbose = verbose) :
package ‘base’ found more than once, using the first from
“C:/R/R-40~1.2/library/base”,
“C:/R/R-4.0.2/library/base”
2: In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
‘C:/RBUILD~1/4.0/usr/mingw_/bin/g++’ not found

Are these serious or can they be ignored?

This might be a stupid question, but could this error be because of a large dataset? In the past I have analysed big data with Stan without issues, so I am not sure.

Those warnings are safe to ignore.

If the example models work but your own model gives the vector size error, then there's a good chance that it's the model and/or data. Try fitting the model to a subset of your data and see if you still get the error. Also, if you post your model and some information about your data, I can make some suggestions on how to cut down the memory usage.
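For example, something like this (a sketch; df, mod, and stan_data_sub are placeholders for your own objects):

set.seed(1)
df_sub <- df[sample(nrow(df), 2000), ]             # random subset of 2000 rows
# rebuild the Stan data list from df_sub, then refit:
fit_sub <- sampling(mod, data = stan_data_sub, chains = 1)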

I am actually running it on a subset of 2000 observations. I reran the model and got the following error:

Warning messages:
1: In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
‘C:/RBUILD~1/4.0/usr/mingw_/bin/g++’ not found
2: In .local(object, …) :
some chains had errors; consider specifying chains = 1 to debug

No sampling was done.
I have tried deleting the .rds file and recompiling.
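Roughly like this, with placeholder file names:

file.remove("my_model.rds")         # remove the cached compiled model
mod <- stan_model("my_model.stan")  # force a fresh compile from source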

data {
  int<lower=0> N;               // Number of observations
  int<lower=1> J;               // Number of predictors with random slope
  int<lower=1> K;               // Number of predictors with non-random slope
  int<lower=1> L;               // Number of customers/groups
  int<lower=0,upper=1> y[N];    // Binary response variable
  int<lower=1,upper=L> ll[N];   // Group index for each observation
  row_vector[K] x1[N];
  row_vector[J] x2[N];
}
parameters {
  real rbeta_mu[J];             // mean of distribution of beta parameters
  real<lower=0> rbeta_sigma[J]; // scale (sd) of distribution of beta parameters
  vector[J] beta_raw[L];        // group-specific parameters beta
  vector[K] beta;
}
transformed parameters {
  vector[J] rbeta[L];
  for (l in 1:L)
    for (j in 1:J)
      rbeta[l][j] = rbeta_mu[j] + rbeta_sigma[j] * beta_raw[l][j]; // coefficients on x
}
model {
  vector[N] p;
  rbeta_mu ~ normal(0, 5);
  rbeta_sigma ~ gamma(1, 1);
  beta ~ normal(0, 5);
  for (l in 1:L)
    beta_raw[l] ~ normal(0, 5);
  for (n in 1:N)
    p[n] = (x1[n] * beta) + (x2[n] * rbeta[ll[n]]);
  y ~ bernoulli_logit(p);
}

This is my model. I have 19 predictors, 4 of which are fixed effects and 15 of which are random effects (including the intercept). Of the 4 fixed effects, 3 are dummy variables taking values 0 or 1, and the fourth is a count variable.

As for the random effects, all but two predictors take values between 0 and 5. The other two are positive continuous variables with maximums of 2000 and 9999 respectively.

Any suggestions on improving efficiency are greatly appreciated.

This is a new error:

2: In .local(object, …) :
some chains had errors; consider specifying chains = 1 to debug

If you run with chains=1, what error is reported?
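For example (assuming mod is your compiled stanmodel and stan_data is your data list):

fit1 <- sampling(mod, data = stan_data, chains = 1)
# any initialization or runtime error should now print directly to the console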

It has started sampling, but it is very slow, running 2000 observations for 2000 iterations with 1000 burn-in. Is there anything I could do to speed it up?

Did the model finish without erroring? If it did, are you able to then run the model with chains=4 without errors?

Still sampling and stuck at 0%. Trying again with 1000 iterations and 500 burn-in.

To optimise your model, you can try using matrix arithmetic rather than loops to construct your p vector. To do this, you should pass your covariates in as matrices.

Additionally, I believe you had a typo in the prior for beta_raw. Since you are using it for a non-centered parameterisation, beta_raw should have a std_normal prior: if beta_raw ~ normal(0, 1), then rbeta_mu + rbeta_sigma * beta_raw is distributed normal(rbeta_mu, rbeta_sigma), whereas normal(0, 5) inflates the implied scale of rbeta by a factor of 5.

Full syntax below:

data {
  int<lower=0> N;//Number of observations
  int<lower=1> J;//Number of predictors with random slope
  int<lower=1> K;//Number of predictors with non-random slope
  int<lower=1> L;//Number of customers/groups
  int<lower=0,upper=1> y[N];//Binary response variable
  int<lower=1,upper=L> ll[N];//Number of observations in groups
  matrix[N,K] x1;
  matrix[N,J] x2;
}
transformed data {
  vector[J] ones = rep_vector(1, J);
}
parameters {
  row_vector[J] rbeta_mu; //mean of distribution of beta parameters
  row_vector<lower=0>[J] rbeta_sigma; //scale (sd) of distribution of beta parameters
  row_vector[J] beta_raw[L]; //group-specific parameters beta
  vector[K] beta;
}
transformed parameters {
  matrix[L,J] rbeta;
  for (l in 1:L)
    rbeta[l] = rbeta_mu + rbeta_sigma .* beta_raw[l]; // coefficients on x
}
model {
  vector[N] p;
  rbeta_mu ~ normal(0,5);
  rbeta_sigma ~ gamma(1,1);
  beta~normal(0,5);
  for (l in 1:L)
    beta_raw[l] ~ std_normal();

  p = x1 * beta + (x2 .* rbeta[ll]) * ones; // Multiplication by vector of ones as a row-wise summation of matrix
  y~bernoulli_logit(p);
}
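Since x1 and x2 are now declared as matrices, the data list on the R side should pass them as plain matrices rather than arrays of row vectors. A minimal sketch (the object names here are hypothetical):

stan_data <- list(
  N  = nrow(x1),
  J  = ncol(x2),
  K  = ncol(x1),
  L  = max(ll),
  y  = y,
  ll = ll,
  x1 = as.matrix(x1),
  x2 = as.matrix(x2)
)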

The model just finished without any errors. However, it did not converge.

Thank you very much for helping to optimise the code.

I want to ask: given that the code ran for 2000 observations with just 1000 iterations, will I be able to run longer chains on bigger subsets without the vector size allocation error? I ask because my actual dataset is large, at 600,000 observations.

Unfortunately that’s not something I can give you a definitive answer for. How much RAM does your computer have?
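For a rough sense of scale, here is a back-of-the-envelope estimate of the memory needed just to store the posterior draws (assumptions: 8-byte doubles, all parameters and transformed parameters saved, and the value of L below is a hypothetical guess at your number of groups):

chains    <- 4
iter_kept <- 1000                         # post-warmup iterations per chain
L <- 10000; J <- 15; K <- 4               # L is hypothetical
n_saved <- 2 * L * J + 2 * J + K + 1      # beta_raw + rbeta + rbeta_mu + rbeta_sigma + beta + lp__
chains * iter_kept * n_saved * 8 / 1e9    # approximate Gb for the draws alone

With L = 10000 this is already around 10 Gb, so the number of groups in the full dataset matters at least as much as the number of observations.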