Hi, I have encountered the error “Error: cannot allocate vector of size 174 Kb.” I’m running Windows 10, 64-bit R, and RStan version 2.19.2. Note that this only happens WHEN I’M INCLUDING THE GENERATED QUANTITIES BLOCK, which makes me think that I may be doing something wrong there.
Am I writing the generated quantities correctly? The model samples MUCH faster without the generated quantities. Basically, I’m trying to create generated quantities for each mean, predicted interval and log likelihood.
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> N_obs;
// Level-1 categorical predictor
int upc_id[N_obs];
// Number of levels of the level-1 categorical predictor
int<lower=0> N_upc;
// Continuous outcome
vector[N_obs] Price;
}
transformed data{
vector[N_obs] Price_norm;
Price_norm = (Price-mean(Price))/sd(Price);
}
parameters {
// Population intercept
real beta_0;
// Population Slope- a different slope for each factor
vector[N_upc] beta_1;
// Level-1 errors
real<lower=0> sigma_e0;
}
model {
vector[N_obs] mu;
mu = beta_0 + beta_1[upc_id];
Price_norm ~ normal(mu, sigma_e0);
//priors
sigma_e0 ~ exponential(1);
beta_0 ~ normal(0, 1);
beta_1 ~ normal(0, 1);
}
generated quantities {
vector[N_obs] log_lik;
vector[N_obs] y_pred;
vector[N_obs] mu;
for (n in 1:N_obs) mu[n] = beta_0 + beta_1[upc_id][n];
for (n in 1:N_obs) log_lik[n] = normal_lpdf(Price_norm[n] | beta_0 + beta_1[upc_id][n] , sigma_e0);
for (n in 1:N_obs) y_pred[n] = normal_rng(mu[n] , sigma_e0);
}
It is possible that with your dataset, the generated quantities block is just enough to make it run out of RAM. You may be able to proceed without the generated quantities block, and then use the gqs function in the rstan package to evaluate a standalone generated quantities block afterward.
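As a rough sketch of what such a standalone program might look like (an assumption based on the model above, not something tested against your data): it repeats the data and parameters blocks so that the draws can be matched to parameter names, has no model block, and recomputes mu inside generated quantities, because a mu that was only a local variable in the original model block is not saved in the draws.

```stan
data {
  int<lower=0> N_obs;
  int upc_id[N_obs];
  int<lower=0> N_upc;
  vector[N_obs] Price;
}
transformed data {
  // Same standardisation as in the model used for sampling
  vector[N_obs] Price_norm = (Price - mean(Price)) / sd(Price);
}
parameters {
  // Declared only so the posterior draws can be read in by name
  real beta_0;
  vector[N_upc] beta_1;
  real<lower=0> sigma_e0;
}
generated quantities {
  // mu is rebuilt from the parameter draws; it does not need to be stored upstream
  vector[N_obs] mu = beta_0 + beta_1[upc_id];
  vector[N_obs] log_lik;
  real y_pred[N_obs] = normal_rng(mu, sigma_e0);
  for (n in 1:N_obs)
    log_lik[n] = normal_lpdf(Price_norm[n] | mu[n], sigma_e0);
}
```

With rstan, this would be compiled with stan_model() and passed to gqs() together with the original data list and draws = as.matrix(fit), where fit is a placeholder name for the stanfit object from the first run. Keep in mind that gqs still has to hold N_obs values of each quantity for every draw, so this may ease memory pressure during sampling without removing it altogether.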
There are a few places that you can optimise here, which will help cut down the runtime to something more manageable.
First, if you put the creation of mu in the transformed parameters block, you can re-use mu in the generated quantities without looping.
For the beta_0 and beta_1 parameters, there’s a std_normal() distribution that you could use.
The big slowdown with the generated quantities block is the three loops:
for (n in 1:N_obs) mu[n] = beta_0 + beta_1[upc_id][n];
for (n in 1:N_obs) log_lik[n] = normal_lpdf(Price_norm[n] | beta_0 + beta_1[upc_id][n] , sigma_e0);
for (n in 1:N_obs) y_pred[n] = normal_rng(mu[n] , sigma_e0);
Because each of these loops is being iterated N_obs times, the model has to iterate over 30000 times to generate these quantities. If you move the creation of mu from the model to the transformed parameters block, you can re-use it here and remove one loop. Then, the normal_rng function is vectorised, so you can remove another loop and just declare:
real y_pred[N_obs] = normal_rng(mu, sigma_e0);
After making these changes, the model runtime goes from 920 seconds to 240 seconds for me (using cmdstanr).
Full code here:
data {
// Define variables in data
// Number of level-1 observations (an integer)
int<lower=0> N_obs;
// Level-1 categorical predictor
int upc_id[N_obs];
// Number of levels of the level-1 categorical predictor
int<lower=0> N_upc;
// Continuous outcome
vector[N_obs] Price;
}
transformed data{
vector[N_obs] Price_norm = (Price-mean(Price))/sd(Price);
}
parameters {
// Population intercept
real beta_0;
// Population Slope- a different slope for each factor
vector[N_upc] beta_1;
// Level-1 errors
real<lower=0> sigma_e0;
}
transformed parameters {
vector[N_obs] mu = beta_0 + beta_1[upc_id];
}
model {
//priors
sigma_e0 ~ exponential(1);
beta_0 ~ std_normal();
beta_1 ~ std_normal();
Price_norm ~ normal(mu, sigma_e0);
}
generated quantities {
vector[N_obs] log_lik;
real y_pred[N_obs] = normal_rng(mu, sigma_e0);
for (n in 1:N_obs)
log_lik[n] = normal_lpdf(Price_norm[n] | mu[n], sigma_e0);
}
Hi, thanks for the responses, but I’m still encountering a similar error, even using andrjohns’s code. I should add that I’m running the code on four chains with 6000 iterations each.
I have a fairly decent laptop (4 cores, 8 GB RAM) so I don’t think this error should be happening.
Update: I was able to run 4 chains with 2000 iterations each. My RData file, which contains only these objects in the environment, is 2.8 GB, and I’m still getting the Tail ESS and Bulk ESS warnings.
Besides going to a more powerful computer, are there any other steps recommended?
If your main problem is memory, I can’t see what else to do other than running the generated quantities via R after sampling. And if you do that you can remove mu from the transformed parameters and put it back into the model block, where it’s a local variable and won’t be stored.
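A minimal sketch of that sampling-only version, assuming the same data and priors as in the code above: mu is declared inside the model block, so it is used for the likelihood and then discarded, and each saved draw carries only beta_0, beta_1 and sigma_e0.

```stan
data {
  int<lower=0> N_obs;
  int upc_id[N_obs];
  int<lower=0> N_upc;
  vector[N_obs] Price;
}
transformed data {
  vector[N_obs] Price_norm = (Price - mean(Price)) / sd(Price);
}
parameters {
  real beta_0;
  vector[N_upc] beta_1;
  real<lower=0> sigma_e0;
}
model {
  // Local variable: not written to the output, unlike a transformed parameter
  vector[N_obs] mu = beta_0 + beta_1[upc_id];
  sigma_e0 ~ exponential(1);
  beta_0 ~ std_normal();
  beta_1 ~ std_normal();
  Price_norm ~ normal(mu, sigma_e0);
}
```

Any generated quantities program run afterwards then has to rebuild mu from beta_0 and beta_1 itself, since mu will not be among the saved draws.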
I would expect the ESS messages to go away if you ran your chains for longer.
The only other thing you could try is using normal_id_glm (https://mc-stan.org/docs/2_21/functions-reference/normal-id-glm.html), but that may only speed up the sampling; I don’t expect it to help with the errors/warnings you are seeing (although with that you won’t need to compute mu, so perhaps it could have an effect).
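For illustration only, here is one way the model might be written with normal_id_glm. The one-hot design matrix X is my own assumption about how to encode upc_id for this function; it is dense, N_obs by N_upc, so it costs memory of its own (once, not per draw) and may not be practical with many UPCs.

```stan
data {
  int<lower=0> N_obs;
  int upc_id[N_obs];
  int<lower=0> N_upc;
  vector[N_obs] Price;
}
transformed data {
  vector[N_obs] Price_norm = (Price - mean(Price)) / sd(Price);
  // Hypothetical encoding: row n has a 1 in column upc_id[n], 0 elsewhere
  matrix[N_obs, N_upc] X = rep_matrix(0, N_obs, N_upc);
  for (n in 1:N_obs)
    X[n, upc_id[n]] = 1;
}
parameters {
  real beta_0;
  vector[N_upc] beta_1;
  real<lower=0> sigma_e0;
}
model {
  sigma_e0 ~ exponential(1);
  beta_0 ~ std_normal();
  beta_1 ~ std_normal();
  // Linear predictor beta_0 + X * beta_1 is formed inside the function, so no mu is needed
  Price_norm ~ normal_id_glm(X, beta_0, beta_1, sigma_e0);
}
```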
@mcol, (referencing @andrjohns’s code) the code runs quickly and without any errors when I pull mu out of the transformed parameters block and put it in the model block. However, when trying to create generated quantities with gqs(), I get the error:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "mu" does not exist.
error in 'model488420da1770_ee8e70e4920d7c95d9fb55e73f47629e' at line 13, column 36
It seems to me that, for any generated quantities block run through gqs(), the parameters (transformed or not) need to have been sampled so that they show up in the “draws” argument of the gqs() function. If this is the case, how can I run the code in the way you mentioned (moving the definition of mu into the model block) and still run gqs()? I agree that this is the ideal solution, but I can’t get it to work. What am I missing?
To get around this, I was able to include the likelihood function in the transformed parameters block and sample with fewer iterations without error. Unfortunately, when I tried to generate quantities using the following script:
Error in draws[, p_names, drop = FALSE] : subscript out of bounds
This error has been addressed in the GitHub issue here, but the issue is still relatively recent, and in fact I am using the same version of RStan (2.19.2) as the person who posted it. At this point, I don’t have a viable plan B, so I’m asking for additional help. Thanks.
If I move mu from the transformed parameters to the generated quantities block in the initial sampling, and then I try gqs() with the code in my previous post, I still encounter the same error.
@bgoodri thanks, I was able to implement the script without the “Error in draws”, but that actually gets me back to the original error. At this point, I think I’ll proceed with running the model in the cloud. Thanks for your help everyone.