So I have a small model that is fast to run on its own, but I need to run it on a large number of different data sets.
In addition, I am trying to release a package that does this. When I developed my code outside of a package environment, everything worked fine, but now that I am building the package with rstantools, the model no longer samples.
In particular, I've used the multidplyr package to run the model on several data sets, as I am comfortable with this package and generally see a better speedup than with other parallel backends.
Here is the best MWE I could come up with to build the package and showcase the issue:
library("rstantools")
rstan_create_package(path = 'rstanlm')
setwd("rstanlm")
file.remove('Read-and-delete-me')
lm_file <- file('inst/stan/lm.stan')
writeLines(
'// Save this file as inst/stan/lm.stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real intercept;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(intercept + beta * x, sigma);
}
',
lm_file
)
close(lm_file)
rm(lm_file)
r_file <- file('R/lm_stan.R')
writeLines(
"#' Bayesian linear regression with Stan
#'
#' @export
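#' @param x Numeric vector of input values.
#' @param y Numeric vector of output values.
#' @param model A compiled `stanmodel`; defaults to the package model `stanmodels$lm`.
#' @param cluster Set to 1 to run sequentially; any other value runs the fits on a multidplyr cluster.
#' @param ... Intended for arguments to `rstan::sampling` (e.g. iter, chains); not yet forwarded in this MWE.
#' @return A tibble with a list-column `model` holding one `stanfit` object per data set.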
#'
lm_stan <- function(x, y, model = stanmodels$lm, cluster = 1, ...) {
  standata <- list(x = x, y = y, N = length(y))
  # Two copies of the same data stand in for many different data sets
  tib <- tibble::tibble(
    data = list(standata, standata)
  )
  if (cluster != 1) {
    # Make new cluster
    cl <- multidplyr::new_cluster(2)
    # Add needed packages
    multidplyr::cluster_library(
      cl,
      c('dplyr', 'purrr', 'rstan', 'StanHeaders', 'rstantools')
    )
    # Add stan model and wrapper
    multidplyr::cluster_copy(cl, c('model', 'soft_wrapper'))
    # If the model was passed as stanmodels$<name>, extract <name>
    # (NA otherwise) and copy the matching rstantools object as well
    model_from_package <- stringr::word(
      deparse(substitute(model)), 2, sep = stringr::fixed('$')
    )
    if (!is.na(model_from_package)) {
      multidplyr::cluster_copy(
        cl,
        paste0('rstantools_model_', model_from_package)
      )
    }
    # Partition the data over the clusters
    tib <- multidplyr::partition(tib, cl)
  }
  # Add column with models, then bring the results back to this session
  fits <- dplyr::mutate(
    tib,
    model = purrr::map(data, soft_wrapper, model)
  )
  dplyr::collect(fits)
}

soft_wrapper <- function(data, model) {
  rstan::sampling(model, data = data)
}
",
r_file
)
close(r_file)
rm(r_file)
example(source)
try(roxygen2::roxygenize(load_code = sourceDir), silent = TRUE)
roxygen2::roxygenize()
devtools::load_all()
# Works!
lm_stan(x = 1:10, y = 2+2*(1:10), cluster = 1)
# Fails
lm_stan(x = 1:10, y = 2+2*(1:10), cluster = 2)
.Last.value$model
[[1]]
Stan model 'lm' does not contain samples.
[[2]]
Stan model 'lm' does not contain samples.
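The empty fits above do not say what went wrong on the workers. As far as I can tell, rstan reports some sampler failures through messages or warnings rather than errors, and those stay on the worker process. Below is a minimal base-R sketch of one way to carry them back as data. `capture_conditions` and `fake_sampling` are hypothetical names introduced here, and `fake_sampling` only imitates a failing sampler so the sketch runs without Stan.

```r
# Hypothetical helper (not part of the MWE): run f(...) and return its value
# together with any errors, warnings and messages raised along the way.
capture_conditions <- function(f, ...) {
  notes <- character(0)
  value <- withCallingHandlers(
    tryCatch(
      f(...),
      error = function(e) {
        notes <<- c(notes, paste("error:", conditionMessage(e)))
        NULL
      }
    ),
    warning = function(w) {
      notes <<- c(notes, paste("warning:", conditionMessage(w)))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      notes <<- c(notes, paste("message:", trimws(conditionMessage(m))))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, notes = notes)
}

# Stand-in for a sampler that fails quietly (assumption: it signals a message
# and returns nothing useful, rather than throwing an error).
fake_sampling <- function(data) {
  message("failed to create the sampler; sampling not done")
  invisible(NULL)
}

res <- capture_conditions(fake_sampling, list(N = 10L))
print(res$notes)
```

In the MWE, `soft_wrapper` could presumably call `capture_conditions(rstan::sampling, model, data = data)` so that each row of the collected tibble carries its notes back from the worker; I have not tested that here.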
When I instead pass in a model compiled outside of the package, the same code works just fine.
I.e., if I do the following:
model <- rstan::stan_model('inst/stan/lm.stan')
# Works
lm_stan(x = 1:10, y = 2+2*(1:10), model = model, cluster = 2)
.Last.value$model
[[1]]
Inference for Stan model: lm.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
            mean se_mean   sd   2.5%    25%   50%    75%  97.5% n_eff Rhat
intercept   2.00    0.00 0.00   2.00   2.00   2.0   2.00   2.00  2202 1.00
beta        2.00    0.00 0.00   2.00   2.00   2.0   2.00   2.00  2245 1.00
sigma       0.00    0.00 0.00   0.00   0.00   0.0   0.00   0.00     3 4.62
lp__      110.32    2.15 3.54 102.87 108.27 110.4 113.34 115.62     3 3.10
Samples were drawn using NUTS(diag_e) at Fri May 27 14:14:25 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
[[2]]
Inference for Stan model: lm.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
            mean se_mean   sd   2.5%   25%    50%    75%  97.5% n_eff Rhat
intercept   2.00    0.00 0.00   2.00   2.0   2.00   2.00   2.00  2076 1.00
beta        2.00    0.00 0.00   2.00   2.0   2.00   2.00   2.00  2368 1.00
sigma       0.00    0.00 0.00   0.00   0.0   0.00   0.00   0.00     2 5.35
lp__      111.46    1.84 2.92 106.17 108.6 111.72 113.24 116.78     3 2.56
Samples were drawn using NUTS(diag_e) at Fri May 27 14:14:25 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
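One difference between the failing and the working call is the branch in `lm_stan` that parses the `model` argument. With the default `stanmodels$lm` it finds a model name and tries to copy `rstantools_model_lm` to the workers; with `model = model` it finds none and skips that copy. Here is a small base-R sketch of that parsing, using `strsplit` in place of `stringr::word`. `model_name_from_arg` is a hypothetical helper, and because `substitute` does not evaluate its argument, neither `stanmodels` nor `my_model` needs to exist for this to run.

```r
# Hypothetical helper mirroring the name extraction in lm_stan.
model_name_from_arg <- function(model = stanmodels$lm) {
  # substitute() returns the unevaluated argument expression; for a missing
  # argument this is the default expression.
  expr <- deparse(substitute(model))
  parts <- strsplit(expr, "$", fixed = TRUE)[[1]]
  if (length(parts) >= 2) parts[2] else NA_character_
}

from_default <- model_name_from_arg()                    # "lm"
from_variable <- model_name_from_arg(model = my_model)   # NA
print(c(from_default, from_variable))
```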
Any help in resolving this issue would be highly appreciated.
Edit (sessionInfo):
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 gert_1.5.0 multidplyr_0.1.1.9000 prettyunits_1.1.1 ps_1.6.0 rprojroot_2.0.3
[7] utf8_1.2.2 R6_2.5.1 sys_3.4 stats4_4.0.5 ggplot2_3.3.5 pillar_1.7.0
[13] rlang_1.0.2 rstudioapi_0.13 callr_3.7.0 desc_1.4.1 devtools_2.4.3 stringr_1.4.0
[19] loo_2.5.1 munsell_0.5.0 compiler_4.0.5 xfun_0.30 rstan_2.21.5 pkgconfig_2.0.3
[25] askpass_1.1 pkgbuild_1.3.1 openssl_2.0.0 tidyselect_1.1.2 tibble_3.1.6 gridExtra_2.3
[31] roxygen2_7.2.0 codetools_0.2-18 matrixStats_0.62.0 fansi_1.0.2 crayon_1.5.0 dplyr_1.0.8
[37] withr_2.5.0 brio_1.1.3 grid_4.0.5 gtable_0.3.0 lifecycle_1.0.1 magrittr_2.0.2
[43] credentials_1.3.2 StanHeaders_2.21.0-7 scales_1.1.1 RcppParallel_5.1.5 cli_3.3.0 stringi_1.7.6
[49] cachem_1.0.6 fs_1.5.2 remotes_2.4.2 testthat_3.1.2 xml2_1.3.3 ellipsis_0.3.2
[55] generics_0.1.2 vctrs_0.3.8 tools_4.0.5 glue_1.6.2 purrr_0.3.4 processx_3.5.2
[61] pkgload_1.2.4 parallel_4.0.5 fastmap_1.1.0 inline_0.3.19 colorspace_2.0-3 sessioninfo_1.2.2
[67] memoise_2.0.1 knitr_1.37 usethis_2.1.5
> devtools::session_info('rstan')
- Session info --------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 4.0.5 (2021-03-31)
os Windows 10 x64 (build 19042)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Chicago
date 2022-05-27
rstudio 2022.02.0+443 Prairie Trillium (desktop)
pandoc NA
- Packages ------------------------------------------------------------------------------------------------------------------------------------------------
package * version date (UTC) lib source
backports 1.4.1 2021-12-13 [1] CRAN (R 4.0.5)
BH 1.78.0-0 2021-12-15 [1] CRAN (R 4.0.5)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.5)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.5)
cli 3.3.0 2022-04-25 [1] CRAN (R 4.0.5)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.0.5)
crayon 1.5.0 2022-02-14 [1] CRAN (R 4.0.5)
desc 1.4.1 2022-03-06 [1] CRAN (R 4.0.5)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.0.5)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5)
fansi 1.0.2 2022-01-14 [1] CRAN (R 4.0.5)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.5)
ggplot2 3.3.5 2021-06-25 [1] CRAN (R 4.0.5)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.0.5)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.5)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.5)
inline 0.3.19 2021-05-31 [1] CRAN (R 4.0.5)
isoband 0.2.5 2021-07-13 [1] CRAN (R 4.0.5)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.3)
lattice 0.20-45 2021-09-22 [2] CRAN (R 4.0.5)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.5)
loo 2.5.1 2022-03-24 [1] CRAN (R 4.0.5)
magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.0.5)
MASS 7.3-55 2022-01-13 [2] CRAN (R 4.0.5)
Matrix 1.4-0 2021-12-08 [2] CRAN (R 4.0.5)
matrixStats 0.62.0 2022-04-19 [1] CRAN (R 4.0.5)
mgcv 1.8-39 2022-02-24 [2] CRAN (R 4.0.5)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.5)
nlme 3.1-155 2022-01-13 [2] CRAN (R 4.0.5)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.0.5)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.0.5)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.5)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.5)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.5)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.5)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.0.5)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.3)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.0.5)
RcppEigen 0.3.3.9.2 2022-04-08 [1] CRAN (R 4.0.5)
RcppParallel 5.1.5 2022-01-05 [1] CRAN (R 4.0.5)
rlang 1.0.2 2022-03-04 [1] CRAN (R 4.0.5)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.5)
rstan 2.21.5 2022-04-11 [1] CRAN (R 4.0.5)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.5)
StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.0.5)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.0.5)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.0.5)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.5)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.0.5)
[1] C:/Users/pb1015/Documents/R/win-library
[2] C:/R/R-4.0.5/library
-----------------------------------------------------------------------------------------------------------------------------------------------------------
> devtools::session_info('rstantools')
- Session info --------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 4.0.5 (2021-03-31)
os Windows 10 x64 (build 19042)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Chicago
date 2022-05-27
rstudio 2022.02.0+443 Prairie Trillium (desktop)
pandoc NA
- Packages ------------------------------------------------------------------------------------------------------------------------------------------------
package * version date (UTC) lib source
cli 3.3.0 2022-04-25 [1] CRAN (R 4.0.5)
desc 1.4.1 2022-03-06 [1] CRAN (R 4.0.5)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.0.5)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.0.5)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.0.5)
RcppParallel 5.1.5 2022-01-05 [1] CRAN (R 4.0.5)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.5)
rstantools 2.2.0.9000 2022-05-18 [1] Github (stan-dev/rstantools@fd53b22)
[1] C:/Users/pb1015/Documents/R/win-library
[2] C:/R/R-4.0.5/library
-----------------------------------------------------------------------------------------------------------------------------------------------------------