Help making a short overview of the Stan ecosystem in R?

There are lots of R packages available for Stan users but I think we’re lacking a good overview of what’s available. Would anyone be interested in helping to write a short overview of the Stan ecosystem in R for new users?

The focus (at least initially) would be on the packages that the Stan team has control over, but it would also be good to include other great packages being developed by Stan users (e.g., tidybayes, rethinking, etc.).

I could do this myself, but I think it would be ideal to have someone else either help me with this or take the lead. One reason is that this would benefit from the perspective of a user who has had the experience of navigating the Stan ecosystem in R. Another reason is that this could be an opportunity for someone to get involved and contribute to Stan.

This seems like a good idea. Incidentally, you would be unlikely to discover Stan from the Bayesian task view on CRAN unless you start from the bottom! https://cran.r-project.org/web/views/Bayesian.html

Yeah, you’re right. Some of our R packages aren’t listed in the task views at all. I’ll see if I can get the rest of them listed.

Inspired by this, I put together a little script that calculates total correlation (multivariate PMI) for CRAN’s dependency graph. The variables are “from”, “to”, the type of relation, and whether the relation is a reverse dependency. For relations involving rstan, the highest PMI/total correlation values are:

    from           to       type reverse       pmi
1: rstan       inline    imports   FALSE  5.270930
2: rstan    shinystan   suggests   FALSE  5.521288
3: rstan  StanHeaders    depends   FALSE  6.165161
4: rstan  StanHeaders linking to   FALSE  7.232202
5: rstan   pollimetry   suggests    TRUE  7.852269
6: rstan       BANOVA    imports    TRUE  6.765664
7: rstan GPRMortality    depends    TRUE 10.218909
8: rstan          cbq linking to    TRUE  9.206508

The code is on GitHub if you’re curious.
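
For anyone who wants to see the idea without opening the script, here is a minimal sketch of pointwise total correlation on a toy edge list. It is not the script linked above: the `edges` data are made up, `pointwise_tc` is a hypothetical helper name, and it assumes plain empirical frequencies with natural logs and no smoothing or penalty.

```r
# Toy edge list standing in for CRAN's dependency graph (values are made up).
edges <- data.frame(
  from    = c("rstan", "rstan", "brms", "brms", "loo"),
  to      = c("StanHeaders", "inline", "rstan", "loo", "matrixStats"),
  type    = c("depends", "imports", "imports", "imports", "imports"),
  reverse = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Pointwise total correlation for each row:
#   log p(x1, ..., xk) - sum_i log p(xi)
# with every probability estimated as an empirical frequency.
pointwise_tc <- function(df) {
  n <- nrow(df)
  # Joint frequency of the full (from, to, type, reverse) tuple
  joint_key <- do.call(paste, c(df, sep = "\r"))
  log_joint <- log(as.numeric(table(joint_key)[joint_key]) / n)
  # Marginal frequency of each column's value, one column per variable
  log_marginals <- vapply(df, function(col) {
    key <- as.character(col)
    log(as.numeric(table(key)[key]) / n)
  }, numeric(n))
  log_joint - rowSums(log_marginals)
}

edges$pmi <- pointwise_tc(edges)
edges[order(-edges$pmi), ]
```

With only two variables this reduces to ordinary PMI. Rare tuples get inflated scores under raw frequencies, which is one reason to consider a penalty or smoothing on real data.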

One additional update. I ran the varimax PCA proposed by @alexpghayes and co-authors on a positive PMI (PPMI) matrix of CRAN packages (here looking only at “from” and “to”) and calculated cosine similarities for a few common packages from the resulting embeddings. In each block below, the left-most name is the query package and the numeric values are the cosine similarities. Note that I used a penalty for the PMI value, but you can adjust that in the code.

rstan  rstantools StanHeaders         loo   bayesplot          PK   RcppEigen       dfcrm          BH   llogistic 
1.0000000   0.9822182   0.9769508   0.9468584   0.8566054   0.8395670   0.8196215   0.8027290   0.7685029   0.7003608 

brms    rstanarm  modelbased     sjstats  bayestestR  effectsize    projpred      sjPlot         see performance 
1.0000000   0.9249122   0.8816298   0.8725162   0.8650269   0.8572184   0.8498781   0.8116252   0.7960650   0.7944999

rstanarm        brms    projpred  bayestestR     sjstats  effectsize  modelbased performance  pollimetry         see 
1.0000000   0.9249122   0.8770820   0.8560088   0.8403272   0.8367295   0.8141717   0.8026197   0.7794552   0.7792965 

RcppEigen          BH StanHeaders          PK  rstantools       rstan       dfcrm         loo   RViennaCL   bayesplot 
1.0000000   0.9275742   0.8748471   0.8409405   0.8342655   0.8196215   0.7971657   0.6894355   0.6781214   0.5798502 

tidybayes bridgesampling      missingHE         BANOVA            mcp     dotwhisker           bcrm           BEST         bamdit         jarbes 
1.0000000      0.7625153      0.7307010      0.6469568      0.6376476      0.6305086      0.6214216      0.6153992      0.6126700      0.6102035 

 tidyverse    forcats      haven   labelled     pillar        hms  anomalize tidyselect    tsibble       fpp3 
 1.0000000  0.6063366  0.5700154  0.5699100  0.5321274  0.5254550  0.5219375  0.5166636  0.4811196  0.4768220 

Code is here if you want to explore.
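
In case it helps, the general shape of that pipeline can be sketched in base R as follows. This is only an illustration under stated assumptions, not the linked code and not necessarily the exact procedure from the paper: the `counts` matrix is made up, the embedding dimension `k = 2` is arbitrary, no PMI penalty is applied, and `cosine_sim` is a hypothetical helper.

```r
# Toy "from" x "to" co-occurrence counts (made-up values, not CRAN data).
counts <- matrix(
  c(4, 2, 0, 0,
    2, 3, 1, 0,
    0, 1, 3, 2,
    0, 0, 2, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("pkgA", "pkgB", "pkgC", "pkgD"),
                  c("dep1", "dep2", "dep3", "dep4"))
)

# Positive PMI: max(0, log(p_ij / (p_i * p_j))).
p_ij <- counts / sum(counts)
pmi  <- log(p_ij / outer(rowSums(p_ij), colSums(p_ij)))
ppmi <- pmax(pmi, 0)  # zero counts give log(0) = -Inf, clipped to 0 here

# Truncated SVD, then a varimax rotation of the row ("from") embeddings.
k <- 2
s <- svd(ppmi, nu = k, nv = k)
Z <- s$u %*% diag(sqrt(s$d[1:k]))
rownames(Z) <- rownames(ppmi)
rot   <- varimax(Z, normalize = FALSE)
Z_rot <- Z %*% rot$rotmat

# Cosine similarity between all pairs of rows.
cosine_sim <- function(M) {
  M_unit <- M / sqrt(rowSums(M^2))
  tcrossprod(M_unit)
}

sim <- cosine_sim(Z_rot)
round(sort(sim["pkgA", ], decreasing = TRUE), 3)
```

Sorting one row of `sim` gives the same kind of listing as above: the query package first, followed by its nearest neighbours.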

I wonder if it’s worth messaging the maintainer of the Bayesian task view (Jong Hee Park) to add a “Packages that connect to Stan” section. There are a lot of them, so it would probably be worth it.

I was curious how many packages depend on rstan, directly or indirectly. It looks like there are 117?

avail_pks <- available.packages()

# Recursive Depends/Imports/LinkingTo for every available package
deps <- tools::package_dependencies(packages = avail_pks[, "Package"], db = avail_pks, recursive = TRUE)

# Keep the packages whose dependencies include rstan
beeoboo <- c()
for (j in seq_along(deps)) {
  if ("rstan" %in% deps[[j]]) {
    beeoboo <- c(beeoboo, names(deps)[j])
  }
}
print(beeoboo)

beeoboo
  [1] "adnuts"             "autoTS"             "baggr"              "BANOVA"            
  [5] "bayes4psy"          "bayesbr"            "bayesdfa"           "bayesGAM"          
  [9] "BayesianFROC"       "BayesSenMC"         "bayesvl"            "beanz"             
 [13] "bellreg"            "bkmrhat"            "blavaan"            "bmgarch"           
 [17] "bmlm"               "bmscstan"           "bpcs"               "breathteststan"    
 [21] "brms"               "brxx"               "bsem"               "CausalQueries"     
 [25] "cbq"                "clinDR"             "CNVRG"              "conStruct"         
 [29] "CopulaDTA"          "ctsem"              "ctsemOMX"           "DAMisc"            
 [33] "dclone"             "dcmle"              "DCPO"               "DeLorean"          
 [37] "densEstBayes"       "dfped"              "dfpk"               "DrBats"            
 [41] "edstan"             "eefAnalytics"       "eggCounts"          "embed"             
 [45] "EpiNow2"            "escalation"         "ESTER"              "evidence"          
 [49] "fable.prophet"      "fergm"              "fishflux"           "FlexReg"           
 [53] "gastempt"           "ggfan"              "ggstatsplot"        "glmmfields"        
 [57] "GPP"                "gppm"               "GPRMortality"       "hBayesDM"          
 [61] "HCT"                "hsstan"             "iCARH"              "JMbayes"           
 [65] "llbayesireg"        "MADPop"             "MCMCvis"            "metaBMA"           
 [69] "MetaStan"           "MIXFIM"             "modeltime"          "modeltime.ensemble"
 [73] "modeltime.gluonts"  "modeltime.resample" "mrbayes"            "multinma"          
 [77] "OncoBayes2"         "PandemicLP"         "pcFactorStan"       "pivmet"            
 [81] "pollimetry"         "PosteriorBootstrap" "precautionary"      "promotionImpact"   
 [85] "prophet"            "psrwe"              "publipha"           "PVAClone"          
 [89] "qmix"               "rater"              "RBesT"              "Replication"       
 [93] "Rlgt"               "rmdcev"             "rmsb"               "rstanarm"          
 [97] "rstanemax"          "rstap"              "sharx"              "shinybrms"         
[101] "shinystan"          "spatialfusion"      "spsurv"             "ssMousetrack"      
[105] "statsExpressions"   "survHE"             "themetagenomics"    "thurstonianIRT"    
[109] "tidyBF"             "tidyposterior"      "tmbstan"            "trialr"            
[113] "varian"             "visit"              "walker"             "YPBP"              
[117] "YPPE"     

I think rstantools should be added to that as well.

(Edit: can a package depend on rstantools and not rstan?)
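
One way to check would be to filter the dependency lists directly. A rough sketch, where `has_a_not_b` is a hypothetical helper and the toy input uses made-up package names; the commented lines at the end show how it might be pointed at CRAN, which needs network access:

```r
# Names of packages whose dependency vector contains `a` but not `b`.
# `deps` is a named list of character vectors, the shape returned by
# tools::package_dependencies().
has_a_not_b <- function(deps, a, b) {
  keep <- vapply(deps, function(d) a %in% d && !(b %in% d), logical(1))
  names(deps)[keep]
}

# Toy input with hypothetical package names.
toy_deps <- list(
  pkgX = c("rstantools", "Rcpp"),
  pkgY = c("rstantools", "rstan", "Rcpp"),
  pkgZ = character(0)
)
has_a_not_b(toy_deps, "rstantools", "rstan")

# Against CRAN (needs network access):
# db   <- available.packages()
# deps <- tools::package_dependencies(db[, "Package"], db = db, recursive = TRUE)
# has_a_not_b(deps, "rstantools", "rstan")
```

If the CRAN call returns an empty vector, every package that pulls in rstantools also pulls in rstan, at least through Depends, Imports, and LinkingTo.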

You might find some additional interesting related packages if you look at the Y embeddings in addition to the Z embeddings. Note, though, that cosine similarities, like L2 distances, are invariant to orthogonal rotations, so the varimax step isn’t doing anything in your example beyond a standard SVD.
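
The rotation-invariance point is easy to check numerically. A small sketch with random data, where the embeddings `Z` and the orthogonal matrix `R` are simulated stand-ins rather than anything from the CRAN analysis:

```r
set.seed(1)

# Five hypothetical embeddings in three dimensions.
Z <- matrix(rnorm(5 * 3), nrow = 5)

# A random orthogonal matrix, taken from a QR decomposition.
R <- qr.Q(qr(matrix(rnorm(3 * 3), nrow = 3)))
Z_rot <- Z %*% R

cosine_sim <- function(M) tcrossprod(M / sqrt(rowSums(M^2)))

# Largest change in any pairwise cosine similarity or L2 distance
# after rotating: both are zero up to floating-point error.
cos_gap <- max(abs(cosine_sim(Z) - cosine_sim(Z_rot)))
l2_gap  <- max(abs(as.matrix(dist(Z)) - as.matrix(dist(Z_rot))))
c(cos_gap = cos_gap, l2_gap = l2_gap)
```

The individual coordinates do change under rotation, so varimax matters when you read or rank along single columns of the embedding, just not for pairwise similarities.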
