Here are some benchmarks on a smaller data set, using R-3.5.0 (which, from my benchmarks on the PR, seems to be roughly twice as fast as R-3.4.x).
rstan 2.17:
> summary(replicate(10,system.time(x <- read_stan_csv("output.csv"))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.221 1.237 1.242 1.255 1.269 1.306
My PR:
> summary(replicate(10,system.time(x <- read_stan_csv("output.csv"))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.002 1.004 1.017 1.008 1.075
My branch using matrices rather than data frames:
> summary(replicate(10,system.time(x <- read_stan_csv("output.csv"))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8700 0.8740 0.8830 0.8819 0.8858 0.8980
The versions above do more than read the file: they also do the work needed to create the RStan object, including parsing the comments, splitting off the diagnostic parameters, and computing means for the parameters.
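To make that extra work concrete, here is a minimal sketch of the "split off the diagnostics, then compute means" step on a matrix of draws. The draws are made-up numbers, and treating every column whose name ends in `__` as a diagnostic is a simplifying assumption for illustration; rstan's own rule may differ (for example in how it handles `lp__`).

```r
# Hypothetical draws: 4 iterations, columns named as in a Stan CSV header.
# matrix() fills column by column, so each row of numbers below is one column.
draws <- matrix(
  c(-7.1, -6.9, -7.3, -7.0,    # lp__
     0.90, 0.95, 0.85, 0.99,   # accept_stat__
     1.0,  1.2,  0.8,  1.0,    # mu
     2.0,  2.5,  1.5,  2.0),   # sigma
  nrow = 4,
  dimnames = list(NULL, c("lp__", "accept_stat__", "mu", "sigma"))
)

# Assumption: any column ending in a double underscore is a diagnostic.
is_diag <- grepl("__$", colnames(draws))

diagnostics <- draws[, is_diag, drop = FALSE]
parameters  <- draws[, !is_diag, drop = FALSE]

# Per-parameter means over iterations.
param_means <- colMeans(parameters)
print(param_means)
```

On a numeric matrix both the column subsetting and `colMeans` operate on one block of doubles rather than a list of columns.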
For comparison, here are other functions that just read the CSV:
> summary(replicate(10,system.time(x <- scan("output.csv",comment.char="#",skip=39,sep=",",quiet=TRUE))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2770 0.2802 0.2835 0.2827 0.2848 0.2880
> summary(replicate(10,system.time(x <- fread('cat output.csv | sed "/^[^#]/!d"', sep=','))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1780 0.1820 0.1835 0.1836 0.1847 0.1900
> summary(replicate(10,system.time(x <- fread('grep -v "^#" output.csv', sep=','))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1830 0.1847 0.1885 0.1878 0.1908 0.1920
@sakrejda’s read_stan_csv:
> summary(replicate(10,system.time(x <- stannis::read_stan_csv("output.csv"))[1]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1850 0.1872 0.1880 0.1885 0.1905 0.1910
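For anyone who wants to try the `scan` approach without my `output.csv`, here is a small self-contained sketch. The file contents are invented, and locating the header by looking at the first 100 lines is an arbitrary choice for illustration (my benchmark above simply hard-coded `skip=39`).

```r
# Write a tiny, made-up Stan-style CSV: comment lines, a header, then draws,
# with comments also appearing between and after the data rows.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# model = example",
  "# method = sample",
  "lp__,accept_stat__,mu",
  "# Adaptation terminated",
  "-7.1,0.90,1.0",
  "-6.9,0.95,1.2",
  "# Elapsed Time: 0.01 seconds"
), path)

# The header is the first line that is not a comment.
head_lines <- readLines(path, n = 100L)
header_idx <- which(!startsWith(head_lines, "#"))[1]
col_names  <- strsplit(head_lines[header_idx], ",", fixed = TRUE)[[1]]

# Skip everything up to and including the header; scan() drops the
# remaining comment lines and returns one flat numeric vector.
values <- scan(path, comment.char = "#", skip = header_idx,
               sep = ",", quiet = TRUE)

# Reshape into an iterations x parameters matrix.
draws <- matrix(values, ncol = length(col_names), byrow = TRUE,
                dimnames = list(NULL, col_names))
print(draws)
```

This goes straight to a matrix, which is the same idea as the matrix branch above, just without any of the stanfit bookkeeping.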
So with R-3.5, the difference between stannis and scan drops from 3.5x to 1.5x. I think the big difference that @sakrejda was noticing was therefore due to buffering, which his implementation does and R-3.4.2’s scan implementation does not.
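The 1.5x figure can be checked directly from the mean timings reported above. This just tabulates those means and divides by the stannis time; the vector names are labels I made up for this snippet.

```r
# Mean elapsed times (seconds) copied from the summaries above.
mean_secs <- c(
  "rstan_2.17"    = 1.255,
  "pr"            = 1.017,
  "matrix_branch" = 0.8819,
  "scan"          = 0.2827,
  "fread_sed"     = 0.1836,
  "fread_grep"    = 0.1878,
  "stannis"       = 0.1885
)

# Each reader's time relative to stannis::read_stan_csv.
ratio <- round(mean_secs / mean_secs[["stannis"]], 2)
print(ratio)
```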
Looking at the profiling of my code, about 50% of the time is spent on the other steps needed to create the stanfit object, such as formatting the names properly, calculating the parameter means, etc. This is with 100 samples of 10,000 parameters, so the proportion may change for different dimensions of the dataset.
Using the more efficient CSV reader/data frame builder could potentially improve performance by an additional 30% or so (maybe more, since it looks like the stannis code does some additional reshaping that rstan does not) and still create the same RStan stanfit objects.