Hello, I am having very slow performance for time series modeling that involves matrix operations. I am showing a 10x slowdown over the equivalent operations with reals.
I have searched topics and found that perhaps it has to do with the compiler, but I don’t really know anything about compilers or configuring them. I’ve given some information about my setup at the end of this post.
The performance drop with matrix algebra is fairly prohibitive of the hierarchical state-space modeling that I am trying to do, which requires the kalman-bucy matrix operations at each time t.
I have modified the arma11.stan demo to create a minimal example. I added a simple line of algebra in the for loop. In one model, it operates on reals, and in the other, it operates on 1x1 matrices. The slowdown from 1.2s to 13s seems disproportionate.
The problem is identical between Rtools 35 and Rtools 40 / R 3.6.3 and R 4.0.0.
Is this typical, and is there any way to speed up matrix operations?
Model with reals:
data {
int<lower=1> T; // number of observations
real y[T]; // observed outputs
}
parameters {
real mu; // mean term
real phi; // autoregression coeff
real<lower=0> sigma; // noise scale
}
model {
vector[T] nu; // prediction for time t
vector[T] err; // error for time t
//ADDED TEST VARS
real b[T];
real a;
a = 1.0;
b[1] =1.0;
nu[1] <- mu + phi * mu; // assume err[0] == 0
err[1] <- y[1] - nu[1];
for (t in 2:T) {
//ADDED TEST ALGEBRA
b[t] = a*a'+t;
nu[t] <- mu + phi * y[t-1];
err[t] <- y[t] - nu[t];
}
// priors
mu ~ normal(0,10);
phi ~ normal(0,2);
sigma ~ cauchy(0,5);
// likelihood
err ~ normal(0,sigma);
}
Model with 1x1 matrices:
data {
int<lower=1> T; // number of observations
real y[T]; // observed outputs
}
parameters {
real mu; // mean term
real phi; // autoregression coeff
real<lower=0> sigma; // noise scale
}
model {
vector[T] nu; // prediction for time t
vector[T] err; // error for time t
//ADDED TEST VARS
matrix[1,1] b[T];
matrix[1,1] a;
a[1,1]= 1.0;
b[1][1,1] =1.0;
nu[1] <- mu + phi * mu; // assume err[0] == 0
err[1] <- y[1] - nu[1];
for (t in 2:T) {
//ADDED TEST ALGEBRA
b[t] = a*a'+t;
nu[t] <- mu + phi * y[t-1];
err[t] <- y[t] - nu[t];
}
// priors
mu ~ normal(0,10);
phi ~ normal(0,2);
sigma ~ cauchy(0,5);
// likelihood
err ~ normal(0,sigma);
}
R script for generating data and comparing the two models:
library(rstan)
Sys.setenv(LOCAL_CPPFLAGS = '-march=corei7 -mtune=corei7') #Recommended when rstan is called (PC has i7 3770)
options(mc.cores = parallel::detectCores())
m1<-stan_model("ar1_test_real.stan")
m2<-stan_model("ar1_test_matrix.stan")
mu <- -1.25;
sigma <- 0.75;
phi <- 0.2;
Time <- 1000;
err <- rnorm(Time,0,sigma);
nu <- rep(0,Time);
y <- rep(0,Time);
y[1] <- err[1] + mu + phi * mu;
for (t in 2:Time) y[t] <- err[t] + (mu + phi * y[t-1])
fit1 <- sampling(m1, data=list("T"=Time, "y"=y), chains=1, iter=2000, warmup=1000, seed=123)
fit2 <- sampling(m2, data=list("T"=Time, "y"=y),chains=1, iter=2000, warmup=1000, seed=123)
Console output with run times:
> fit1 <- sampling(m1, data=list("T"=Time, "y"=y), chains=1, iter=2000, warmup=1000, seed=123)
SAMPLING FOR MODEL 'ar1_test_real' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.637 seconds (Warm-up)
Chain 1: 0.607 seconds (Sampling)
Chain 1: 1.244 seconds (Total)
Chain 1:
> fit2 <- sampling(m2, data=list("T"=Time, "y"=y),chains=1, iter=2000, warmup=1000, seed=123)
SAMPLING FOR MODEL 'ar1_test_matrix' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.001 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 6.953 seconds (Warm-up)
Chain 1: 6.616 seconds (Sampling)
Chain 1: 13.569 seconds (Total)
Chain 1:
- Operating System: Windows 7
- RStan Version: 2.19.3
- Output of
writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))
:
CXX14FLAGS=-O3 -mtune=corei7 -march=corei7 -Wno-unused-variable -Wno-unused-function
Without really knowing anything about compilers, I did change -O2 flags to -O3 before running the above experiment.
Here are some of the modified flags in Makeconf
CXX11 = $(BINPREF)g++ $(M_ARCH)
CXX11FLAGS = -O3 -Wall $(DEBUGFLAG) -mfpmath=sse -msse2 -mstackrealign
CXX11PICFLAGS =
CXX11STD = -std=gnu++11
## these settings are for gcc >= 8
CXX14 = $(CXX11)
CXX14FLAGS = $(CXX11FLAGS)
CXX14PICFLAGS =
CXX14STD = -std=gnu++14
- Output of
devtools::session_info("rstan")
- Session info -----------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 4.0.0 (2020-04-24)
os Windows 7 x64 SP 1
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2020-05-31
- Packages ---------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
backports 1.1.6 2020-04-05 [1] CRAN (R 4.0.0)
BH 1.72.0-3 2020-01-08 [1] CRAN (R 4.0.0)
callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.0)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.0)
ggplot2 * 3.3.1 2020-05-28 [1] CRAN (R 4.0.0)
glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
inline 0.3.15 2018-05-18 [1] CRAN (R 4.0.0)
isoband 0.2.1 2020-04-12 [1] CRAN (R 4.0.0)
labeling 0.3 2014-08-23 [1] CRAN (R 4.0.0)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.0)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
loo 2.2.0 2019-12-19 [1] CRAN (R 4.0.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
MASS 7.3-51.5 2019-12-20 [1] CRAN (R 4.0.0)
Matrix 1.2-18 2019-11-27 [1] CRAN (R 4.0.0)
matrixStats 0.56.0 2020-03-13 [1] CRAN (R 4.0.0)
mgcv 1.8-31 2019-11-09 [1] CRAN (R 4.0.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
nlme 3.1-147 2020-04-13 [1] CRAN (R 4.0.0)
pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0)
pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
praise 1.0.0 2015-08-11 [1] CRAN (R 4.0.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.0)
Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
RcppEigen 0.3.3.7.0 2019-11-16 [1] CRAN (R 4.0.0)
rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
rstan * 2.19.3 2020-02-11 [1] CRAN (R 4.0.0)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
StanHeaders * 2.19.2 2020-02-11 [1] CRAN (R 4.0.0)
testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
tibble 3.0.1 2020-04-20 [1] CRAN (R 4.0.0)
utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
vctrs 0.3.0 2020-05-11 [1] CRAN (R 4.0.0)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0)
withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
[1] D:/Program Files/R/R-4.0.0/library