Hello,
I recently switched from rstan to cmdstanr for a modeling project with a large number of parameters. The CSV files output by cmdstanr take a very long time to load into R with as_cmdstan_fit and read_cmdstan_csv; I have been killing the job after a couple of hours. After running some tests, I found that the problem occurs when the generated quantities block is included in my Stan model.
For example, when I use the following model without generated quantities:
data {
  int<lower=0> N;
}
parameters {
  vector[N] big_mat[N];
}
model {
  for (n in 1:N)
    big_mat[n] ~ std_normal();
}
The CSV files load into R in 3.602 seconds with as_cmdstan_fit.
However, when I add the generated quantities block to the model:
data {
  int<lower=0> N;
}
parameters {
  vector[N] big_mat[N];
}
model {
  for (n in 1:N)
    big_mat[n] ~ std_normal();
}
generated quantities {
  vector[N] big_mat2[N];
  for (n in 1:N) {
    big_mat2[n] = big_mat[n] + 1;
  }
}
The CSV files take 6.071 seconds to load into R using as_cmdstan_fit.
I have also tested this with my actual model, which saves draws for many more parameters than this example. When the generated quantities section is commented out, the files load instantly. When it is included, the read gets stuck, uses all of the available memory, and after a couple of hours I shut it down.
Our current workaround is to calculate the generated quantities we need in R instead of in the Stan model, so that the CSV files from cmdstanr are usable. Can anyone shed light on why including the generated quantities block makes the files so slow to read?
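For the toy model above, that kind of workaround can be sketched roughly as follows. This is only an illustration under stated assumptions: the draws matrix is simulated as a stand-in for the big_mat draws of a real fit, the column labels follow the name[i,j] style that cmdstanr uses, and compute_big_mat2 is a hypothetical helper name, not part of any package.

```r
# Stand-in for posterior draws of big_mat (assumption: simulated here, not
# read from a real fit). Rows are draws, columns are the N * N elements.
set.seed(1)
N <- 3
n_draws <- 100
idx <- expand.grid(i = seq_len(N), j = seq_len(N))
big_mat_draws <- matrix(
  rnorm(n_draws * N * N),
  nrow = n_draws,
  dimnames = list(NULL, sprintf("big_mat[%d,%d]", idx$i, idx$j))
)

# Hypothetical helper: mirrors `big_mat2[n] = big_mat[n] + 1` from the
# generated quantities block, applied to every draw at once.
compute_big_mat2 <- function(draws) {
  out <- draws + 1
  colnames(out) <- sub("^big_mat\\[", "big_mat2[", colnames(draws))
  out
}

big_mat2_draws <- compute_big_mat2(big_mat_draws)
dim(big_mat2_draws)
```

With a real fit, the simulated matrix would be replaced by the big_mat draws extracted from the fitted object in matrix form; the rest of the sketch would stay the same.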