# Encountering "Error: cannot allocate vector of size 174 Kb"

Hi, I have encountered the error “Error: cannot allocate vector of size 174 Kb.” I’m running 64-bit R on Windows 10 with RStan version 2.19.2. Note that this only happens WHEN I INCLUDE THE GENERATED QUANTITIES BLOCK, which makes me think I may be doing something wrong there.

Am I writing the generated quantities correctly? The model samples MUCH faster without them. Basically, I’m trying to generate, for each observation, the mean, a prediction interval, and the log likelihood.

``````
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> N_obs;
// level 1 categorical predictor
int upc_id[N_obs];
// Number of level-1 categorical predictors
int<lower=0> N_upc;
// Continuous outcome
vector[N_obs] Price;
}

transformed data{
vector[N_obs] Price_norm;
Price_norm = (Price-mean(Price))/sd(Price);
}

parameters {
// Population intercept
real beta_0;
// Population Slope- a different slope for each factor
vector[N_upc] beta_1;
// Level-1 errors
real<lower=0> sigma_e0;
}

model {
vector[N_obs] mu;
mu = beta_0 + beta_1[upc_id];
Price_norm ~ normal(mu, sigma_e0);
//priors
sigma_e0  ~ exponential(1);
beta_0 ~ normal(0, 1);
beta_1 ~ normal(0, 1);
}

generated quantities {
vector[N_obs] log_lik;
vector[N_obs] y_pred;
vector[N_obs] mu;
for (n in 1:N_obs) mu[n] = beta_0 + beta_1[upc_id][n];
for (n in 1:N_obs) log_lik[n] = normal_lpdf(Price_norm[n] | beta_0 + beta_1[upc_id][n] , sigma_e0);
for (n in 1:N_obs) y_pred[n] = normal_rng(mu[n] , sigma_e0);
}
``````

stan_data_dump.R (223.6 KB)


It is possible that with your dataset, the generated quantities block is just enough to make it run out of RAM. You may be able to proceed without the generated quantities block, and then use the `gqs` function in the rstan package to evaluate a standalone generated quantities block afterward.
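For reference, here is a sketch of what such a standalone program could look like (the `parameters` block must redeclare exactly the parameters that were sampled; the `data` and `transformed data` blocks are carried over from the original model):

```stan
data {
  int<lower=0> N_obs;
  int upc_id[N_obs];
  int<lower=0> N_upc;
  vector[N_obs] Price;
}
transformed data {
  vector[N_obs] Price_norm = (Price - mean(Price)) / sd(Price);
}
parameters {
  // must match the parameters of the fitted model
  real beta_0;
  vector[N_upc] beta_1;
  real<lower=0> sigma_e0;
}
generated quantities {
  vector[N_obs] log_lik;
  vector[N_obs] y_pred;
  for (n in 1:N_obs) {
    // recompute mu locally per observation; it is not stored
    real mu = beta_0 + beta_1[upc_id[n]];
    log_lik[n] = normal_lpdf(Price_norm[n] | mu, sigma_e0);
    y_pred[n] = normal_rng(mu, sigma_e0);
  }
}
```

You would then call something like `gqs(stan_model("gq.stan"), data = stan_data, draws = as.matrix(fit))` in R; the file name `gq.stan` and the objects `stan_data` and `fit` are placeholders here.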

There are a few places that you can optimise here, which will help cut down the runtime to something more manageable.

First, if you put the creation of `mu` in the `transformed parameters` block, you can re-use `mu` in the `generated quantities` block without looping.

For the `beta_0` and `beta_1` parameters, there’s a `std_normal()` distribution that you could use.

The big slowdown with the generated quantities block is the three loops:

``````
for (n in 1:N_obs) mu[n] = beta_0 + beta_1[upc_id][n];
for (n in 1:N_obs) log_lik[n] = normal_lpdf(Price_norm[n] | beta_0 + beta_1[upc_id][n] , sigma_e0);
for (n in 1:N_obs) y_pred[n] = normal_rng(mu[n] , sigma_e0);
``````

Because each of these loops runs `N_obs` times, the model has to iterate over 30,000 times to generate these quantities. If you move the creation of `mu` from the `model` block to the `transformed parameters` block, you can re-use it here and remove one loop. Then, since the `normal_rng` function is vectorised, you can remove another loop and just declare:

``````
real y_pred[N_obs] = normal_rng(mu, sigma_e0);
``````

After making these changes, the model runtime goes from 920 seconds to 240 seconds for me (using cmdstanr).

Full code here:

``````
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> N_obs;
// level 1 categorical predictor
int upc_id[N_obs];
// Number of level-1 categorical predictors
int<lower=0> N_upc;
// Continuous outcome
vector[N_obs] Price;
}

transformed data{
vector[N_obs] Price_norm = (Price-mean(Price))/sd(Price);
}

parameters {
// Population intercept
real beta_0;
// Population Slope- a different slope for each factor
vector[N_upc] beta_1;
// Level-1 errors
real<lower=0> sigma_e0;
}

transformed parameters {
vector[N_obs] mu = beta_0 + beta_1[upc_id];
}

model {
//priors
sigma_e0 ~ exponential(1);
beta_0 ~ std_normal();
beta_1 ~ std_normal();

Price_norm ~ normal(mu, sigma_e0);
}

generated quantities {
vector[N_obs] log_lik;
real y_pred[N_obs] = normal_rng(mu, sigma_e0);

for (n in 1:N_obs)
log_lik[n] = normal_lpdf(Price_norm[n] | mu[n], sigma_e0);
}
``````

Hi, thanks for the responses, but I’m still encountering a similar error, even with @andrjohns’s code. I should add that I’m running four chains with 6,000 iterations each.

I have a fairly decent laptop (4 cores, 8 GB of RAM), so I don’t think this error should be happening.

Update: I was able to run 4 chains with 2000 iterations each. My RData file, which contains only these objects in the environment, is 2.8 GB, and I’m still getting the Tail ESS and Bulk ESS warnings.

Besides going to a more powerful computer, are there any other steps recommended?

If your main problem is memory, I can’t see what else to do other than running the generated quantities via R after sampling. And if you do that you can remove `mu` from the transformed parameters and put it back into the model block, where it’s a local variable and won’t be stored.
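Concretely, the model block would then look like this (same model as above, with `mu` declared as a local variable so it is not written to the output):

```stan
model {
  vector[N_obs] mu = beta_0 + beta_1[upc_id];  // local: not stored in the draws

  // priors
  sigma_e0 ~ exponential(1);
  beta_0 ~ std_normal();
  beta_1 ~ std_normal();

  Price_norm ~ normal(mu, sigma_e0);
}
```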

I would expect the ESS messages to go away if you ran your chains for longer.

The only other thing you could try is `normal_id_glm` (https://mc-stan.org/docs/2_21/functions-reference/normal-id-glm.html), but that may only speed up the sampling; I don’t expect it to help with the errors/warnings you are seeing (although with it you won’t need to compute `mu`, so perhaps it could have an effect).
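A sketch of what the `normal_id_glm` version might look like; note this assumes you build a one-hot design matrix `X` (`N_obs` rows, `N_upc` columns, encoding `upc_id`) in R and pass it in as data, which is not part of the original model:

```stan
model {
  // X is an N_obs x N_upc one-hot encoding of upc_id, passed in as data (assumption)
  sigma_e0 ~ exponential(1);
  beta_0 ~ std_normal();
  beta_1 ~ std_normal();
  Price_norm ~ normal_id_glm(X, beta_0, beta_1, sigma_e0);
}
```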

@mcol, (referencing @andrjohns’s code) the code runs quickly and without errors when I pull `mu` out of the `transformed parameters` block and put it in the model block. However, when I try to create generated quantities with `gqs()`, I get the error:

``````SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "mu" does not exist.
error in 'model488420da1770_ee8e70e4920d7c95d9fb55e73f47629e' at line 13, column 36
``````

It seems to me that, for any generated quantities block run through `gqs()`, the parameters (transformed or not) need to have been sampled so that they show up in the draws passed to `gqs()`. If this is the case, how can I run the code in the way you mentioned, moving the computation of `mu` to the model block, and still run `gqs()`? I agree that this is the ideal solution, but I can’t get it to work. What am I missing?

To get around this, I kept the computation of `mu` in the `transformed parameters` block and was able to sample with fewer iterations without error. Unfortunately, when I tried to generate quantities using the following script:

``````
parameters {
vector[10598] mu;
// Level-1 errors
real<lower=0> sigma_e0;
}

generated quantities {
real y_pred[10598] = normal_rng(mu, sigma_e0);
}
``````

I encountered the error

``````Error in draws[, p_names, drop = FALSE] : subscript out of bounds
``````

This error has been addressed in the GitHub issue here, but the issue is still relatively recent, and in fact I am using the same version of RStan as the person who posted it, 2.19.2. At this point I don’t have a viable plan B, so I’m asking for additional help. Thanks.

Didn’t you have a version in which `mu` was recomputed in the generated quantities? Does that work with `gqs`?

If I move `mu` from the transformed parameters to the generated quantities block in the initial sampling, and then try `gqs()` with the code in my previous post, I still encounter the same error.

I would not store `mu` at all, as in:

``````
generated quantities {
vector[N_obs] log_lik;
vector[N_obs] y_pred;
for (n in 1:N_obs) {
real mu = beta_0 + beta_1[upc_id][n];
log_lik[n] = normal_lpdf(Price_norm[n] | mu, sigma_e0);
y_pred[n] = normal_rng(mu, sigma_e0);
}
}
``````

@bgoodri thanks, I was able to run the script without the “Error in draws” message, but that brings me back to the original allocation error. At this point, I think I’ll proceed with running the model in the cloud. Thanks for your help, everyone.