Hi,

I have a hierarchical linear model which takes a rather long time to fit properly. So far, I’ve managed to reduce the runtime to 21-26 hours (while maintaining convergence indicators), but recently I’ve hit a wall in terms of speed improvement. This is after a few months of testing different samples of the data and different code enhancements (mainly from the Stan Reference Manual and User Guide).

I’d love to hear everyone’s ideas on:

- In theory, what is the lowest runtime I can expect to achieve for this model, using existing Stan software and assuming I have unlimited compute resources?
- What other enhancements could I make to my code to speed up the model?
- What sort of compute setup would make this model faster?

This is a textbook example of simple hierarchical design. I am trying to predict a normal continuous outcome, with two sets of random effects and a few fixed effects. Each set of random effects corresponds to a natural grouping level in my data (technically, the groupings are fully nested, although for simplicity, I model them as crossed effects). For each of the random effects, I am modeling a set of intercepts and a set of slopes (partially-pooled, NCP with a Cholesky factorization).

Here is the Stan code, with a description of each input:

code.stan (4.1 KB)

I fit the model as follows:

model_7 <- rstan::stan(file=‘code.stan’,

data=data_final_5_stan,

seed=seed,

chains=4,

iter=4000,

warmup=3000,

control=list(adapt_delta=0.999999999,

max_treedepth=13))

Convergence diagnostics look OK to me but could be better. Highest Rhat is 1.07 and the lowest n_eff is 53. No divergent iterations and the posterior predictive check looks good. However, adapt_delta needs to be very high in order to keep Rhat within bounds.

I did some preprocessing before sending the data to Stan, so the magnitude of my predictors should not be an issue. The fixed effects are centered, scaled, and transformed to a normal shape. The target variable was logged to reduce skew and has a mean of 10.6 and a standard deviation of 0.95.

There is a lot of computation in the generated quantities block (two sets of predictions plus log_lik), but my understanding is that this block does not add much overhead compared to the model block.

I suspect what is consuming the most time here is the shape of my data, particularly the random effects. My first grouping variable has 2600+ unique levels of all different sizes. Unfortunately, the sizes of the groups are very right-skewed (majority <25 but up to 650 observations in one group). Since a different amount of pooling is happening in each group, I wonder if this is driving up the runtime.

My machine is a Windows server with 256 GB of RAM and a 32-core CPU E5. Rstan was set up with the recommended options:

```
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
```

The content of Makevars.win is:

```
CXX14FLAGS=-O3 -march=native -mtune=native
CXX11FLAGS=-O3 -march=native -mtune=native
```

The output of devtools::session_info(“rstan”) is:

```
- Session info -------------------------------------------------------------------------------
setting value
version R version 3.5.2 (2018-12-20)
os Windows Server 2008 R2 x64 SP 1
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Denver
date 2019-07-09
- Packages -----------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.1)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.2)
BH 1.69.0-1 2019-01-07 [1] CRAN (R 3.5.2)
callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.2)
cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.2)
colorspace 1.4-0 2019-01-13 [1] CRAN (R 3.5.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.2)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.2)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.2)
ggplot2 * 3.1.0 2018-10-25 [1] CRAN (R 3.5.2)
glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.1)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.5.1)
gtable 0.2.0 2016-02-26 [1] CRAN (R 3.5.1)
inline 0.3.15 2018-05-18 [1] CRAN (R 3.5.1)
labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
lattice * 0.20-38 2018-11-04 [2] CRAN (R 3.5.2)
lazyeval 0.2.1 2017-10-29 [1] CRAN (R 3.5.1)
loo * 2.0.0 2018-04-11 [1] CRAN (R 3.5.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)
MASS 7.3-51.1 2018-11-01 [2] CRAN (R 3.5.2)
Matrix 1.2-15 2018-11-01 [2] CRAN (R 3.5.2)
matrixStats 0.54.0 2018-07-23 [1] CRAN (R 3.5.1)
mgcv 1.8-26 2018-11-21 [2] CRAN (R 3.5.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1)
nlme 3.1-137 2018-04-07 [2] CRAN (R 3.5.2)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)
pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.2)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)
processx 3.2.1 2018-12-05 [1] CRAN (R 3.5.2)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)
R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.2)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.5.0)
Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.2)
RcppEigen 0.3.3.5.0 2018-11-24 [1] CRAN (R 3.5.2)
reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.1)
rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)
rstan * 2.18.2 2018-11-07 [1] CRAN (R 3.5.2)
scales * 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
StanHeaders * 2.18.1 2019-01-28 [1] CRAN (R 3.5.2)
stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.1)
stringr * 1.3.1 2018-05-10 [1] CRAN (R 3.5.1)
tibble * 2.0.1 2019-01-12 [1] CRAN (R 3.5.2)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.1)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.1)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)
[1] C:/Users/dcesar/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.2/library
```

Any suggestions for improvement welcome. Thanks in advance.