Long waiting time before the first iteration when Including an array of matrices in Generated Quantities

anamartinovici · January 3, 2019, 1:37pm

Hi,

I am using K-fold cross-validation for model comparison, so I need to use the generated quantities block to get out-of-sample predictions. I have done this before using Stan and it went just fine. However, I am now running into some weird problem:

Once I add an array of matrices in the generated quantities, it takes a very long time until the first iteration.
Can you please let me know what I can do to improve this?

I find this weird because I though that the generated quantities block is only executed once per iteration. So if the gq block is somehow computationally intensive or inefficient, this would increase the time between iterations. But it shouldn’t influence the time before the first iteration.

Description of the example.

estimating a very simple linear model: y = \alpha + \beta * x + \epsilon
in the generated quantities block, I include an array of matrices (or vectors) and fill it with zeros. The zero matrix (vector) is created in transformed data and then just assigned to the array.
stan.R generates data, compiles stan code, and samples
stan_wo_gq.stan: estimates \alpha, \beta, and \sigma (the sd of \epsilon) but doesn’t have the generated quantities block
stan_w_gq_matrix.stan: estimates the same same model, but creates and array of matrices in the gq block
stan_w_gq_vector.stan: estimates the same same model, but creates and array of vectors in the gq block

stan.R (5.2 KB)
stan_w_gq_matrix.stan (1.0 KB)
stan_w_gq_vector.stan (1.1 KB)
stan_wo_gq.stan (877 Bytes)

Without any gq, the estimation is very fast:

> print(get_elapsed_time(res_wo_gq))
        warmup sample
chain:1  0.422  0.314
chain:2  0.416  0.284

With the gq block:

transformed data {
    vector[m] O_m;
    matrix[m, n] O_m_n;
    
    O_m   = rep_vector(0.0, m);
    O_m_n = rep_matrix(0.0, m, n);
}
...
generated quantities {
    matrix[m, n] sim_matrix[p, q];
    
    for(i in 1:p) {
        for(j in 1:q) {
            sim_matrix[i, j] = O_m_n;
        }
    }
    print("End gq ");
}

For matrix[70, 5] sim_matrix[20, 50], it takes way to long until the first iteration (more than 30 minutes).
For matrix[70, 5] sim_matrix[20, 20], it takes about 5 minutes until the first iteration, then about 20 times longer for estimation (as compared to wo_gq). In total, it takes about 9 minutes (it seems to get stuck also after sampling is done).

> print(get_elapsed_time(res_w_gq_matrix))
        warmup sample
chain:1  2.098  2.062
chain:2  2.197  2.173

This is what I have tried so far:

the good old Google search. I’ve found some previous discussions that seem related to this, but I didn’t find an answer and it’s not clear if this issue has been solved yet (It is closed on GitHub).
https://github.com/stan-dev/stan/issues/2516
added print statements in the stan code to see where it gets stuck.
tried with an array of vectors instead or matrices, but kept the same number of values to be stored.
played with the dimension of the matrices and of the arrays, keeping the same number of values
initialized the array in the gq
used pars = c("sim_matrix"), include = TRUE
Nothing seems to work. I would need 3 of these arrays with dims matrix[70, 5] sim_matrix[20, 50]. Is that too much? I would think the PC I am using has enough RAM (64GB) and during estimation it never went even close to the max (it goes to 13GB at most).

I would appreciate some advice on what I can do to speed this up, while still getting the simulated values I need.

Ana

Viewer pane stays like this for about 30 minutes before the first iteration:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.18.2         StanHeaders_2.18.0-1 ggplot2_3.1.0        loo_2.0.0           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0         pillar_1.3.1       compiler_3.5.1     plyr_1.8.4         bindr_0.1.1        prettyunits_1.0.2  tools_3.5.1        pkgbuild_1.0.2    
 [9] tibble_1.4.2       gtable_0.2.0       pkgconfig_2.0.2    rlang_0.3.0.1      cli_1.0.1          rstudioapi_0.8     yaml_2.2.0         parallel_3.5.1    
[17] bindrcpp_0.2.2     gridExtra_2.3      withr_2.1.2        dplyr_0.7.8        stats4_3.5.1       grid_3.5.1         tidyselect_0.2.5   glue_1.3.0        
[25] inline_0.3.15      R6_2.3.0           processx_3.2.1     purrr_0.2.5        callr_3.1.1        magrittr_1.5       codetools_0.2-16   scales_1.0.0      
[33] ps_1.3.0           matrixStats_0.54.0 assertthat_0.2.0   colorspace_1.3-2   lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4

bgoodri · January 3, 2019, 2:49pm

There isn’t a lot you can do. It just takes a long time to recursively allocate space to hold those draws as a list of list of scalars.

anamartinovici · January 3, 2019, 3:22pm

Hi,

Thank you for your prompt reply.
I was hoping there’s some error in my code that can easily be fixed. Or some trick that can speed this up. But I guess sometimes all one can do is have patience and wait.

In the github issue conversation, bletham says that “It runs instantly in cmdstan, so it must be the interface”. Do you know if that’s the case? If R/RStudio/Rstan is what keeps this so slow, then I will just use cmdstan.

Ana

bgoodri · January 3, 2019, 3:25pm

Yes, it is related to keeping the draws in memory.

anamartinovici · January 4, 2019, 9:58am

Quick update, maybe useful for others who are facing similar problems.

I have just run this using cmdstan and the difference in speed is amazing. For the example included in this post (2 chains, 500 iterations, 50:50 warmup:sample, matrix[70, 5] sim_matrix[20, 20] it took only a few seconds:

It does indeed start to run instantly in cmdstan. For the larger array (70x5x20x50) it took a bit longer, but nothing as compared to the 1.5 hours that it took before.

Topic		Replies	Views
Encountering "Error: cannot allocate vector of size 174 Kb" RStan	9	2015	January 25, 2020
Question regarding generated quantities matrix output from cmdstanr / posterior Interfaces cmdstanr	4	524	November 6, 2020
Generated quantities from model block (i.e. not storing transformed parameters or computing predictions twice) General	5	1945	June 12, 2018
Stan running is so low General	5	781	June 15, 2018
"Fixed_param" mode agonisingly slow General	7	1039	August 3, 2019

Long waiting time before the first iteration when Including an array of matrices in Generated Quantities

Related topics